Alloy

Is Alloy Down Right Now? Check whether there is an ongoing outage.

Alloy is currently Operational

Last checked from Alloy's official status page

Historical record of incidents for Alloy

Report: "Increased API and Dashboard Errors"

Last update
identified

We are currently observing widespread internet disruptions affecting Alloy and many other services across the web, including providers such as Equifax, Glean, and others. This appears to be a broader issue impacting infrastructure providers like Google Cloud and Cloudflare. At this time, the issue is not isolated to Alloy, and we are actively monitoring the situation as it evolves. We will continue to provide updates as more information becomes available. Thank you for your patience.

Report: "Degraded Dashboard Performance"

Last update
investigating

We are continuing to investigate this issue.

investigating

We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.

Report: "Degraded SDK performance"

Last update
identified

We have received reports of degradation with the Alloy SDK Veriff plugin. We are currently working on a fix.

Report: "Degraded Dashboard Performance"

Last update
resolved

The issue has been fully resolved, and all functionality has been restored to normal operations. A postmortem will be provided as soon as possible.

identified

We've received reports of performance degradation in the Alloy dashboard, impacting the loading of some evaluations. The incident began at 5:22 PM.

Report: "Increased API Errors"

Last update
resolved

A faulty API deployment caused timeout-related issues. The incident was identified at 11:01 AM and resolved by reverting the deployment at 11:14 AM.

investigating

We’re currently investigating reports of elevated errors to the API. More updates to come shortly.

Report: "Degraded Dashboard Performance"

Last update
resolved

We received reports of performance degradation affecting the Alloy dashboard between 15:34 EDT and 15:50 EDT. The issue has since been resolved. Our team is actively investigating the root cause and will share updates as more information becomes available.

Report: "Increased API Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We’re currently investigating reports of elevated errors to the API. More updates to come shortly.

Report: "QualiFile integration outage"

Last update
resolved

QualiFile failures ceased at 8:30 AM EST. Any applications that failed during this time can be re-run. For additional information, please reach out to your FIS point of contact or open a ticket in the FIS support portal.

identified

An outage with the QualiFile integration has been identified. The FIS technical team is working towards a resolution as quickly as possible. For the latest status of the outage, please reach out to your FIS point of contact or open a ticket in the FIS support portal.

identified

An outage with QualiFile has been identified. We are awaiting resolution from the QualiFile team.

Report: "Socure3 Integration Error Rate Increase"

Last update
resolved

Socure has reported the incident resolved, and Alloy has confirmed the resolution.

identified

The issue has been identified and a fix is being implemented.

investigating

An outage with Socure has been identified. We are awaiting resolution from the Socure team.

Report: "Intermittent latencies with Alloy API are being investigated"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Increased latencies were observed from 12/18/2024 20:48 to 12/19/2024 2:18 EST. We are continuing to investigate the root causes.

investigating

We are continuing to investigate the observed increase in latencies. We are still working to determine the root cause. We will follow up again with an update in 30 minutes.

Report: "SDK Unavailable for a few minutes"

Last update
resolved

The SDK was unavailable for most customers between 10:16am and 10:35am EST on December 12.

Report: "Evaluation Page is Not Loading"

Last update
resolved

A code release at 2:13 PM EST caused our evaluations page to not load. We rolled back the release to resolve the issue at 2:24 PM EST.

Report: "Increased Socure Service Errors"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

Socure is currently experiencing high levels of errors, which are affecting Alloy clients who utilize Socure in their policies. We have reached out to Socure and are working towards a resolution.

Report: "Degraded Dashboard Performance"

Last update
resolved

We experienced a service degradation at 10:34 am EST on the evaluation page of the Alloy dashboard. To restore full capacity, we reverted the deployment, restoring service at 10:43 am EST.

Report: "Issues loading dashboard"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and all evaluation pages should load normally.

investigating

Some customers are having issues loading our dashboard; we are investigating.

Report: "Elevated errors on Transaction Evaluations"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are seeing elevated errors on our Transaction service and are investigating the root cause.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved.

monitoring

All services have been restored and are currently being monitored.

investigating

We are continuing to investigate this issue.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Latency and unavailability for API and Dashboard"

Last update
resolved

We have identified and fully addressed the root cause of the unavailability. The system is fully stable.

monitoring

We have identified and mitigated the source of poor performance. API and Dashboard were unavailable for a total of 8 minutes, between 10:56-10:58 ET and 11:05-11:10 ET.

investigating

We're seeing intermittent latency and unavailability for API and Dashboard. Currently investigating.

Report: "Transactions evaluation latency"

Last update
resolved

Latency on all APIs has returned to normal and the root cause has been identified and its impact mitigated.

monitoring

Transaction Evaluations had elevated latency for about 50 minutes this morning. We have cancelled some background tasks and service is restored.

investigating

We are investigating an issue with transaction evaluations.

Report: "TransUnion Credit is experiencing a complete downtime"

Last update
resolved

This incident has been resolved.

monitoring

TransUnion Credit is experiencing complete downtime. We are reaching out to their support team for a quick resolution.

Report: "Evaluations Dashboard Availability"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.

Report: "Iovation Service Experiencing Errors"

Last update
resolved

The issue with the third-party service, TransUnion device risk (Iovation), has been completely resolved as of 6:00 am UTC (2:00 am EDT). The most recent failure detected at Alloy occurred at 5:20 UTC (1:20 am EDT).

investigating

Our third-party service Iovation is returning errors, so evaluations that use Iovation will be affected. We are reaching out to Iovation to identify and address the issue.

Report: "Degraded SDK performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have received reports of degradation with the Alloy SDK. We are investigating this as a matter of priority.

Report: "Degraded SDK performance in US/EU"

Last update
resolved

On April 1, 2024, customers may have experienced issues with the Alloy SDK between 5:01 PM EST and 5:20 PM EST in the US region, and from 5:01 PM EST through April 2, 2024, 8:01 AM EST in EU regions. Attempts to initialize the Alloy SDK failed at an increased rate with 403 errors, and the front end would have displayed a blank modal. This issue has been fully resolved.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved.

monitoring

We are monitoring for further degraded performance.

identified

The issue has been identified and a fix was deployed.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Short networking issue"

Last update
resolved

During this time there was a brief loss of connectivity to our API and dashboard (about 4 minutes). Initially, we believed this to be related to a routine security networking change, which we rolled back immediately. However, while working with AWS on an RCA, we determined that the service interruption was actually the result of a rarely exposed load balancer bug within the AWS ALB service. A workaround is pending to prevent this bug from surfacing again.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

From 6:01am to 6:12am we experienced a spike in latency following a spike in throughput from a third party vendor. Service is fully restored and we will continue monitoring and follow up to prevent this moving forward.

monitoring

We are continuing to monitor for any further issues.

monitoring

Service has been restored and we will continue monitoring.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Evaluation page is down"

Last update
resolved

We released a change to our dashboard that prevented the evaluation page from displaying, which also affected the journey applications page. The outage lasted from 13:30 to 13:52 EST.

Report: "Database lock resulting in API outage"

Last update
resolved

Between the hours of 22:02 and 22:12 ET (10 minutes), a scheduled database migration that had been tested successfully in lower environments acquired an unintended long lock on a database table, resulting in a backup of queries and increasingly high latencies. The migration was automatically detected and killed, freeing up database performance and returning operations to normal. The postmortem process is underway and multiple new checks for this type of failure are already in process.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Webhook Processing Delays"

Last update
resolved

This incident has been resolved.

monitoring

There is a delay in the webhook service. We have identified the problem, applied a fix, and are waiting for the queue to normalize.

Report: "Production API Error Elevation"

Last update
resolved

Replication delays between our databases caused a period of increased latency for our API and dashboard from 8:35-8:41 AM ET. This resulted in 6 minutes of increased latency on some API endpoints. All data integrity was maintained during this issue.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Intermittent errors across multiple services"

Last update
resolved

AWS has fixed the issue on their end and the intermittent errors on Alloy services have been resolved.

identified

AWS has acknowledged they are experiencing a service outage which is impacting Alloy. Our engineers are working with them on identifying the services affected and the full impact on Alloy.

investigating

We are seeing intermittent errors across multiple services in the Alloy API and dashboard. We are investigating the impact and will continue to update.

Report: "Production API Error Elevation"

Last update
resolved

We had a spike in transaction latency this morning between 7:06 AM ET and 8:37 AM ET. The database queries that were backing up and causing this latency have been identified and corrected.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved.

monitoring

Our API and dashboard are currently functional; we are continuing to monitor as our services come back up.

investigating

We are continuing to investigate this issue. Dashboard access has partially recovered.

investigating

We are continuing to investigate the issue.

investigating

We are continuing to investigate this issue.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Dashboard intermittent errors"

Last update
resolved

We have fully resolved the errors in the Alloy dashboard and restored functionality to normal operations.

investigating

We are currently investigating issues loading evaluations and entities pages on our dashboard.

Report: "Production API Increased Latency"

Last update
resolved

We started to see latency on our production API rising from around 5:30 AM ET until it started to trip specific monitors around 8:06 AM ET this morning. Once we discovered and remediated the issue, latency returned to normal around 8:43 AM. This may have triggered timeouts or occasional 504 errors for some clients, but all API queries can be rerun now successfully.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Dashboard Intermittent Errors"

Last update
resolved

Between 10:28am and 10:54am (EST) we deployed a change to improve our database performance which caused intermittent errors on our dashboard. The dashboard was the only impacted service and no data or API processing was affected. Errors were resolved when we rolled back our change.

Report: "API 206 response increase"

Last update
resolved

Our API experienced an increase in errors for certain workflows across a small number of clients between 14:38 and 15:44 ET today. This resulted in some API requests not being resolved completely. Any API requests that resulted in a 206 HTTP status code during this time can be rerun now and will process correctly.
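Since the note above says that requests returning a 206 status can simply be rerun, a small retry wrapper is a natural way to handle such windows automatically. A minimal sketch; the `send` callable below is a stand-in, as the report does not describe Alloy's client API:

```python
import time

def call_with_retry(send, payload, retries=3, backoff=0.0):
    """Re-issue a request while it returns a partial (206) response.

    `send` is any callable taking a payload and returning
    (status_code, body); any other status ends the loop.
    """
    for attempt in range(retries + 1):
        status, body = send(payload)
        if status != 206:  # complete response (or a hard error): stop
            return status, body
        time.sleep(backoff * (2 ** attempt))  # optional exponential backoff
    return status, body  # still partial after all retries

# Stub standing in for a hypothetical API call: the first response is
# partial (206), the rerun succeeds (200).
responses = iter([(206, {"partial": True}), (200, {"partial": False})])
status, body = call_with_retry(lambda p: next(responses), {"name": "test"})
```

Capping the number of retries matters here: if the incident is still ongoing, the wrapper returns the last partial response instead of looping forever.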

Report: "Production API Error Elevation"

Last update
resolved

We had a period of database instability from 12:35 ET until 12:42 ET (7 minutes) that caused a minor service disruption for our API and database. We have identified and remediated the issue and are continuing to monitor the situation.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved.

monitoring

Between 8:54 ET and 9:01 ET (7 minutes) there was a short period of degraded service to our API. We are monitoring the situation, but it was quickly resolved.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

The API and dashboard have been confirmed to be functioning normally. We have identified the root of the database latency issue, remediated it, and put checks in place to prevent this issue from occurring again. We continue to monitor the situation and will follow up early next week with a postmortem upon request.

investigating

As of 19:21 ET, the API and dashboard have been available but performance is still partially degraded. We are continuing to investigate and monitor.

investigating

We are currently experiencing intermittent API and dashboard unavailability due to database performance degradation. We are actively investigating and will post updates once we know more.

investigating

We are triaging an increase in latency to our APIs because of database latency. Our on-call teams are investigating and remediating now.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved.

monitoring

Latency is back to normal on all endpoints. We're continuing to monitor overall latency.

investigating

We're seeing an increase in latency across certain applications, but no errors. We'll keep this event up to track that increase in latency, but there is nothing to indicate at this time that this is a larger incident.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

Due to a database issue, our API and dashboard experienced errors and were partially unavailable from 11:40 AM ET until 12:02 PM ET (22 minutes). The database issue was promptly resolved and all services began operating as normal after a backlog of queries cleared. We are currently working on increased monitoring of this type of issue and checks to prevent it from happening again.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Database migration latency"

Last update
resolved

At 10:41AM today, a database migration resulted in unexpected latency under load. Alloy’s incident response team was notified immediately by automated alerting, and was able to quickly diagnose the problem. The root cause was resolved within a couple of minutes, but it caused a backlog of requests that resulted in latency, timeouts, and, in some cases, failed requests until 10:54AM.

Report: "Brief dashboard service instability"

Last update
resolved

A deployment failure with our core dashboard application caused a short period of "service unavailable" error messages to return. Full service has now been restored.

Report: "Production API Error Elevation"

Last update
resolved

From 12:52 EDT until 1:57 EDT, urgent maintenance running on our production database caused a significant increase in processing time for our dashboard and API actions. From 1:58 EDT until 2:15 EDT, queries to our API were unresponsive. During this time, we worked to reduce the load on our database and restore service to normal. We are making adjustments to our internal processes to ensure that there are no recurrences of similar issues.

monitoring

We've applied the fix and are monitoring.

investigating

Some clients may be experiencing high latency when sending requests. We've identified the problem and are working on a fix.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

Between 10:34 and 10:37 ET, we rolled out a scaling event to our API to mitigate some of the issues we are seeing with the AWS outage in one of the us-east-1 datacenters we use. This scaling event did not execute correctly due to the ongoing issues, making the situation temporarily worse. We immediately rolled back that change and are continuing to monitor the situation with AWS.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

This incident has been resolved and the indicators that we have been observing on our side have recovered to the extent that we're confident in the current health of the system. The incident's effects lasted from approximately 20:39 - 22:28 ET, though the impact was intermittent during that time and mostly impacted our APIs. The root cause appears to be a cleanup process initiated by our Amazon Web Services Aurora database causing some read processes to slow down resulting in most API queries failing. We were able to restore service by moving these workloads to a different location while the process finished. We are working with AWS and our internal teams to avoid both this specific issue and any related issues in the future.

monitoring

We've moved some of our load off of read replicas, which seems to have mitigated the issues we were seeing so far. We are still monitoring and discussing the root cause. We will continue to monitor this to make sure it is stable before closing this incident.

investigating

We're still investigating; something is causing massive issues with all read replicas of our production database cluster. We've tried a series of experiments and methods to recover the service. We are now working on larger-scale fixes, which we expect to roll out in the next few minutes. We will post an update immediately upon recovery or once the issue is identified.

investigating

We are currently aware of major issues with our APIs and certain degradations with our dashboard. We are trying to diagnose the root cause currently.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

We experienced intermittent API instability between 12:55 and 13:18 ET. The issue is resolved and we will be updating with more information soon.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Dashboard Unavailability"

Last update
resolved

Our dashboard application was unavailable intermittently between 10:35 and 10:56 ET today. The API was not impacted, so all data and evaluations were processed normally during this period. We are investigating the root cause and adding more monitors to catch this sort of issue faster.

Report: "Production API and Dashboard Latency Increase"

Last update
resolved

From 12:05 to 12:19 ET today, additional load on our production database caused a significant increase in processing time for dashboard and API actions. Once the load was targeted and addressed, latency went down immediately and is now back to normal. Our infrastructure team is currently working on addressing the root cause of the problem and making sure there are no recurrences of similar issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have identified the latency issue and are now monitoring the fix to make sure there are no further degradations.

investigating

We're seeing increased latency on certain queries from our core database. We are currently working on identifying the issue and mitigating the latency.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

A sharp increase in database connections caused an increase in latency between 12:30 and 12:33 ET today. The issue was quickly mitigated and latency went back to normal immediately.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Report: "Production API Error Elevation"

Last update
resolved

We have resolved the issue and are monitoring all affected services to make sure they are operating at full capacity. We will be following up with a full RCA upon request in the next few days as we work through the detailed contributing factors. We are also continuing to work with AWS to understand what happened to our underlying infrastructure.

identified

We've identified the issue and are working with AWS to resolve issues with our caching layer. We believe a fix should be in place soon, but are waiting for infrastructure to come up to mitigate the problem.

investigating

We are seeing a major outage with applications that connect to our caching services. We are actively investigating this issue and trying to restore service as soon as possible.

investigating

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.