Historical record of incidents for Alloy
Report: "Increased API and Dashboard Errors"
Last update: We are currently observing widespread internet disruptions affecting Alloy and many other services across the web, including providers such as Equifax, Glean, and others. This appears to be a broader issue impacting infrastructure providers like Google Cloud and Cloudflare. At this time, the issue is not isolated to Alloy, and we are actively monitoring the situation as it evolves. We will continue to provide updates as more information becomes available. Thank you for your patience.
Report: "Degraded Dashboard Performance"
Last update: We are continuing to investigate this issue.
We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.
Report: "Degraded SDK performance"
Last update: We have received reports of degradation with the Alloy SDK Veriff plugin. We are currently working on a fix.
Report: "Degraded Dashboard Performance"
Last update: The issue has been fully resolved, and all functionality has been restored to normal operations. A postmortem will be provided as soon as possible.
We've received reports of performance degradation in the Alloy dashboard, impacting the loading of some evaluations. The incident began at 5:22 PM.
Report: "Degraded Dashboard Performance"
Last updateThe issue has been fully resolved, and all functionality has been restored to normal operations. A postmortem will be provided as soon as possible.
We've received reports of performance degradation in the Alloy dashboard, impacting the loading of some evaluations. The incident began at 5:22 PM.
Report: "Increased API Errors"
Last update: A faulty API deployment caused timeout-related issues. The incident was identified at 11:01 AM and resolved by reverting the deployment at 11:14 AM.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "Increased API Errors"
Last updateA faulty API deployment caused timeout-related issues. The incident was identified at 11:01 AM and resolved by reverting the deployment at 11:14 AM.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "Degraded Dashboard Performance"
Last update: We received reports of performance degradation affecting the Alloy dashboard between 15:34 EDT and 15:50 EDT. The issue has since been resolved. Our team is actively investigating the root cause and will share updates as more information becomes available.
Report: "Increased API Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "QualiFile integration outage"
Last update: QualiFile failures ceased at 8:30 AM EST. Any applications that failed during this time can be re-run. For additional information, please reach out to your FIS point of contact or open a ticket in the FIS support portal.
An outage with the QualiFile integration has been identified. The FIS technical team is working towards a resolution as quickly as possible. For the latest status of the outage, please reach out to your FIS point of contact or open a ticket in the FIS support portal.
An outage with QualiFile has been identified. We are awaiting resolution from the QualiFile team.
Report: "Socure3 Integration Error Rate Increase"
Last update: The incident has been reported as resolved by Socure, and Alloy has confirmed.
The issue has been identified and a fix is being implemented.
An outage with Socure has been identified. We are awaiting resolution from the Socure team.
Report: "Intermittent latencies with Alloy API are being investigated"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Increased latencies were observed from 12/18/2024 20:48 to 12/19/2024 2:18 EST. We are continuing to investigate the root causes.
We are continuing to investigate issues with observed increased latencies. We are still working to identify the root cause. We will follow up again with an update in 30 minutes.
Report: "SDK Unavailable for a few minutes"
Last update: The SDK was unavailable for most customers between 10:16am and 10:35am EST on December 12.
Report: "Evaluation Page is Not Loading"
Last update: A code release at 2:13 PM EST prevented our evaluations page from loading. We rolled back the release to resolve the issue at 2:24 PM EST.
Report: "Increased Socure Service Errors"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
Socure is currently experiencing high levels of errors, which are affecting Alloy clients who utilize Socure in their policies. We have reached out to Socure and are working towards a resolution.
Report: "Degraded Dashboard Performance"
Last update: We experienced a service degradation at 10:34 am EST on the evaluation page of the Alloy dashboard. To restore full capacity, we reverted the deployment, restoring service at 10:43 am EST.
Report: "Issues loading dashboard"
Last update: This incident has been resolved.
A fix has been implemented and all evaluation pages should load normally.
Some customers are having issues loading our dashboard; we are investigating.
Report: "Elevated errors on Transaction Evaluations"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are seeing elevated errors on our Transaction service and are investigating the root cause.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
All services have been restored and are currently being monitored.
We are continuing to investigate this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Latency and unavailability for API and Dashboard"
Last update: We have identified and fully addressed the root cause of the unavailability. The system is fully stable.
We have identified and mitigated the source of poor performance. API and Dashboard were unavailable for a total of 8 minutes between 10:56-10:58 ET and 11:05-11:10 ET.
We're seeing intermittent latency and unavailability for API and Dashboard. Currently investigating.
Report: "Transactions evaluation latency"
Last update: Latency on all APIs has returned to normal and the root cause has been identified and its impact mitigated.
Transaction Evaluations had elevated latency for about 50 minutes this morning. We have cancelled some background tasks and service is restored.
We are investigating an issue with transaction evaluations.
Report: "TransUnion Credit is experiencing a complete downtime"
Last update: This incident has been resolved.
TransUnion Credit is experiencing complete downtime. We are reaching out to their support team to find a quick resolution.
Report: "Evaluations Dashboard Availability"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.
Report: "Iovation Service Experiencing Errors"
Last update: The issue with the third-party service TransUnion Device Risk (Iovation) has been fully resolved as of 6:00 am UTC (2:00 am EDT). The most recent failure detected at Alloy occurred at 5:20 UTC (1:20 am EDT).
Our third-party service Iovation is returning errors, so evaluations that use Iovation will be affected. We are reaching out to Iovation to identify and address the issue.
Report: "Degraded SDK performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have received reports of degradation with the Alloy SDK. We are investigating this as a matter of priority.
Report: "Degraded SDK performance in US/EU"
Last update: On April 1, 2024, between 5:01 PM EST and 5:20 PM EST for the US region, and from 5:01 PM EST through April 2, 2024, 8:01 AM EST for EU regions, customers in these regions may have experienced issues with the Alloy SDK. Attempts to initialize the Alloy SDK had an increase in failures with 403 errors, and the front end would have displayed a blank modal. This issue has been fully resolved.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
We are monitoring for further degraded performance.
The issue has been identified and a fix was deployed.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Short networking issue"
Last update: During this time there was a brief loss of connectivity to our API and dashboard (about 4 minutes). Initially we believed this to be related to a routine security networking change, which we rolled back immediately. However, working with AWS on an RCA, we determined that the service interruption was actually the result of a rarely exposed load balancer bug within the AWS ALB service. A workaround is pending to prevent this bug from surfacing again.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: From 6:01am to 6:12am we experienced a spike in latency following a spike in throughput from a third-party vendor. Service is fully restored, and we will continue monitoring and follow up to prevent this moving forward.
We are continuing to monitor for any further issues.
Service has been restored and we will continue monitoring
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Evaluation page is down"
Last update: We released a change to our dashboard that caused the evaluation page not to display, which also affected the journey applications page. The outage lasted from 13:30 to 13:52 EST.
Report: "Database lock resulting in API outage"
Last update: Between the hours of 22:02 and 22:12 ET (10 minutes), a scheduled database migration that had been tested successfully in lower environments acquired an unintended long lock on a database table, resulting in a backup of queries and increasingly high latencies. The migration was automatically detected and killed, freeing up database performance and returning operations to normal. The postmortem process is underway and multiple new checks for this type of failure are already in progress.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
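The report above does not specify what the new checks are, but a common guard against a migration taking an unintended long table lock is to set a lock timeout so the migration fails fast instead of backing up production queries. A minimal sketch of that idea, assuming a PostgreSQL database driven from Python with psycopg2; the connection string, table, and column names are hypothetical and not taken from the incident:

    import psycopg2

    # Hypothetical connection details; replace with your own environment.
    conn = psycopg2.connect("dbname=example user=example password=example host=localhost")

    try:
        with conn.cursor() as cur:
            # Abort quickly if the table is busy, rather than queuing behind live traffic.
            cur.execute("SET lock_timeout = '2s'")
            # Also cap total runtime so a slow migration cannot hold locks indefinitely.
            cur.execute("SET statement_timeout = '30s'")
            cur.execute("ALTER TABLE example_table ADD COLUMN example_column integer")
        conn.commit()
    except psycopg2.errors.LockNotAvailable:
        # The lock could not be acquired in time; roll back and retry in a quieter window.
        conn.rollback()
    finally:
        conn.close()

If the timeout trips, the migration aborts and can be retried later instead of holding the lock while queries pile up behind it.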
Report: "Webhook Processing Delays"
Last update: This incident has been resolved.
There is a delay in the webhook service. We have identified the problem, applied a fix, and are waiting for the queue to normalize.
Report: "Production API Error Elevation"
Last update: Replication delays between our databases caused a period of increased latency for our API and dashboard from 8:35-8:41 AM ET. This resulted in 6 minutes of increased latency on some API endpoints. All data integrity was maintained during this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Intermittent errors across multiple services"
Last update: AWS has fixed the issue on their end and the intermittent errors on Alloy services have been resolved.
AWS has acknowledged they are experiencing a service outage which is impacting Alloy. Our engineers are working with them on identifying the services affected and the full impact on Alloy.
We are seeing intermittent errors across multiple services in the Alloy API and dashboard. We are investigating the impact and will continue to update.
Report: "Production API Error Elevation"
Last update: We had a spike in transaction latency this morning between 7:06 AM ET and 8:37 AM ET. The database queries that were backing up and causing this latency have been identified and corrected.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Our API and dashboard are currently functional - we are continuing to monitor as our services come back up.
We are continuing to investigate this issue. Dashboard access has partially recovered.
We are continuing to investigate the issue.
We are continuing to investigate this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard intermittent errors"
Last update: We have fully resolved the errors in the Alloy dashboard and restored functionality to normal operations.
We are currently investigating issues loading the evaluations and entities pages on our dashboard.
Report: "Production API Increased Latency"
Last update: Latency on our production API began rising around 5:30 AM ET and started to trip specific monitors around 8:06 AM ET this morning. Once we discovered and remediated the issue, latency returned to normal around 8:43 AM. This may have triggered timeouts or occasional 504 errors for some clients, but all API queries can now be rerun successfully.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard Intermittent Errors"
Last update: Between 10:28am and 10:54am (EST) we deployed a change to improve our database performance, which caused intermittent errors on our dashboard. The dashboard was the only impacted service and no data or API processing was affected. Errors were resolved when we rolled back our change.
Report: "API 206 response increase"
Last update: Our API experienced an increase in errors for certain workflows across a small number of clients between 14:38 and 15:44 ET today. This resulted in some API requests not being resolved completely. Any API requests that resulted in a 206 HTTP status code during this time can now be rerun and will process correctly.
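The remediation above is simply to rerun any request that came back with a 206 during the incident window. A minimal retry sketch, assuming the Python requests library; the URL, payload, and auth header are hypothetical placeholders, not the actual Alloy endpoint:

    import time
    import requests

    def post_with_206_rerun(url, json_body, headers, attempts=3, backoff_seconds=2):
        """Send a POST and rerun it if the response is an incomplete 206."""
        response = None
        for attempt in range(attempts):
            response = requests.post(url, json=json_body, headers=headers, timeout=30)
            if response.status_code != 206:
                return response  # normal success or a genuine error; stop here
            # Partial result: wait briefly, then rerun the same request.
            time.sleep(backoff_seconds * (attempt + 1))
        return response

    # Hypothetical usage:
    # resp = post_with_206_rerun(
    #     "https://api.example.com/evaluations",
    #     {"name_first": "Jane", "name_last": "Doe"},
    #     {"Authorization": "Bearer <token>"},
    # )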
Report: "Production API Error Elevation"
Last update: We had a period of database instability from 12:35 ET until 12:42 ET (7 minutes) that caused a minor service disruption for our API and dashboard. We have identified and remediated the issue and are continuing to monitor the situation.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Between 8:54 ET and 9:01 ET (7 minutes) there was a short period of degraded service to our API. We are monitoring the situation, but it was quickly resolved.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: The API and dashboard have been confirmed to be functioning normally. We have identified the root cause of the database latency issue, remediated it, and put checks in place to prevent this issue from occurring again. We continue to monitor the situation and will follow up early next week with a postmortem upon request.
As of 19:21 ET, the API and dashboard have been available but performance is still partially degraded. We are continuing to investigate and monitor.
We are currently having intermittent API and dashboard unavailability that is due to database performance degradation. We are actively investigating and will post updates once we know more.
We are triaging an increase in latency to our APIs because of database latency. Our on-call teams are investigating and remediating now.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Latency is back to normal on all endpoints. We're continuing to monitor overall latency.
We're seeing an increase in latency across certain applications, but no errors. We'll keep this event up to track that increase in latency, but there is nothing to indicate at this time that this is a larger incident.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: Due to a database issue, our API and dashboard experienced errors and were partially unavailable from 11:40 AM ET until 12:02 PM ET (22 minutes). The database issue was promptly resolved and all services began operating normally after a backlog of queries cleared. We are currently working on increased monitoring of this type of issue and checks to prevent it from happening again.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Database migration latency"
Last update: At 10:41 AM today, a database migration resulted in unexpected latency under load. Alloy’s incident response team was notified immediately by automated alerting, and was able to quickly diagnose the problem. The root cause was resolved within a couple of minutes, but it caused a backlog of requests that resulted in latency, timeouts, and, in some cases, failed requests until 10:54 AM.
Report: "Brief dashboard service instability"
Last update: A deployment failure with our core dashboard application caused "service unavailable" error messages to be returned for a short period. Full service has now been restored.
Report: "Production API Error Elevation"
Last update: From 12:52 EDT until 1:57 EDT, urgent maintenance running on our production database caused a significant increase in processing time for our dashboard and API actions. From 1:58 EDT until 2:15 EDT, queries to our API were unresponsive. During this time, we worked to reduce the load on our database and restore service to normal. We are making adjustments to our internal processes to ensure that we have no recurrences of similar issues.
We've applied the fix and are monitoring.
Some clients may be experiencing high latency when sending requests. We've identified the problem and are working on a fix.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: Between 10:34 and 10:37 ET, we rolled out a scaling event to our API to mitigate some of the issues we are seeing with the AWS outage in one of the us-east-1 datacenters we use. This scaling event did not execute correctly due to the ongoing issues, making the situation temporarily worse. We immediately rolled back that change and are continuing to monitor the situation with AWS.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved, and the indicators that we have been observing on our side have recovered to the extent that we're confident in the current health of the system. The incident's effects lasted from approximately 20:39 - 22:28 ET, though the impact was intermittent during that time and mostly affected our APIs. The root cause appears to be a cleanup process initiated by our Amazon Web Services Aurora database that caused some read processes to slow down, resulting in most API queries failing. We were able to restore service by moving these workloads to a different location while the process finished. We are working with AWS and our internal teams to avoid both this specific issue and any related issues in the future.
We've moved some of our load off of read replicas, which seems to have mitigated the issues we were seeing so far. We are still monitoring and discussing the root cause. We will continue to monitor this to make sure it is stable before closing this incident.
We're still investigating - something is causing massive issues with all read replicas of our production database cluster. We've tried a series of experiments and methods to recover the service. We are now working on larger-scale fixes which we'll be able to roll out in the next few minutes. We will post an update immediately upon recovery or once the issue is identified.
We are currently aware of major issues with our APIs and certain degradations with our dashboard. We are trying to diagnose the root cause currently.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: We experienced intermittent API instability between 12:55 and 13:18 ET. The issue is resolved and we will be updating with more information soon.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard Unavailability"
Last update: Our dashboard application was unavailable intermittently between 10:35 and 10:56 ET today. The API was not impacted, so all data and evaluations were processed normally during this period. We are investigating the root cause and adding more monitors to catch this sort of issue faster.
Report: "Production API and Dashboard Latency Increase"
Last update: From 12:05 to 12:19 ET today, additional load on our production database caused a significant increase in processing time for dashboard and API actions. Once the load was targeted and addressed, latency went down immediately and is now back to normal. Our infrastructure team is currently working on addressing the root cause of the problem and making sure we have no recurrences of similar issues.
We are continuing to monitor for any further issues.
We have identified the latency issue and are now monitoring the fix to make sure there are no further degradations.
We're seeing increased latency on certain queries from our core database. We are currently working on identifying the issue and mitigating the latency.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: A sharp increase in database connections caused an increase in latency between 12:30 and 12:33 ET today. The issue was quickly mitigated and latency went back to normal immediately.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: We have resolved the issue and are monitoring all affected services to make sure they are operating at full capacity. We will be following up with a full RCA upon request in the next few days as we work through the detailed contributing factors. We are also continuing to work with AWS to understand what happened to our underlying infrastructure.
We've identified the issue and are working with AWS to resolve issues with our caching layer. We believe a fix should be in place soon, but are waiting for infrastructure to come up to mitigate the problem.
We are seeing a major outage with applications that connect to our caching services. We are actively investigating this issue and trying to restore service as soon as possible.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.