Historical record of incidents for Alloy
Report: "Increased API and Dashboard Errors"
Last update: We are currently observing widespread internet disruptions affecting Alloy and many other services across the web, including providers such as Equifax, Glean, and others. This appears to be a broader issue impacting infrastructure providers like Google Cloud and Cloudflare. At this time, the issue is not isolated to Alloy, and we are actively monitoring the situation as it evolves. We will continue to provide updates as more information becomes available. Thank you for your patience.
Report: "Degraded Dashboard Performance"
Last update: We are continuing to investigate this issue.
We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.
Report: "Degraded SDK performance"
Last update: We have received reports of degradation with the Alloy SDK Veriff plugin. We are currently working on a fix.
Report: "Degraded Dashboard Performance"
Last update: The issue has been fully resolved, and all functionality has been restored to normal operations. A postmortem will be provided as soon as possible.
We've received reports of performance degradation in the Alloy dashboard, impacting the loading of some evaluations. The incident began at 5:22 PM.
Report: "Degraded Dashboard Performance"
Last updateThe issue has been fully resolved, and all functionality has been restored to normal operations. A postmortem will be provided as soon as possible.
We've received reports of performance degradation in the Alloy dashboard, impacting the loading of some evaluations. The incident began at 5:22 PM.
Report: "Increased API Errors"
Last update: A faulty API deployment caused timeout-related issues. The incident was identified at 11:01 AM and resolved by reverting the deployment at 11:14 AM.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "Increased API Errors"
Last updateA faulty API deployment caused timeout-related issues. The incident was identified at 11:01 AM and resolved by reverting the deployment at 11:14 AM.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "Degraded Dashboard Performance"
Last update: We received reports of performance degradation affecting the Alloy dashboard between 15:34 EDT and 15:50 EDT. The issue has since been resolved. Our team is actively investigating the root cause and will share updates as more information becomes available.
Report: "Increased API Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We’re currently investigating reports of elevated errors to the API. More updates to come shortly.
Report: "QualiFile integration outage"
Last update: QualiFile failures ceased at 8:30 AM EST. Any applications that failed during this time can be re-run. For additional information, please reach out to your FIS point of contact or open a ticket in the FIS support portal.
An outage with the QualiFile integration has been identified. The FIS technical team is working towards a resolution as quickly as possible. For the latest status of the outage, please reach out to your FIS point of contact or open a ticket in the FIS support portal.
An outage with QualiFile has been identified. We are awaiting resolution from the QualiFile team.
Report: "Socure3 Integration Error Rate Increase"
Last update: The incident has been reported as resolved by Socure, and Alloy has confirmed.
The issue has been identified and a fix is being implemented.
An outage with Socure has been identified. We are awaiting resolution from the Socure team.
Report: "Intermittent latencies with Alloy API are being investigated"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Increased latencies were observed from 12/18/2024 20:48 to 12/19/2024 2:18 EST. We are continuing to investigate the root causes.
We are continuing to investigate issues with observed increased latencies. We are still working to identify the root cause. We will follow up again with an update in 30 minutes.
Report: "SDK Unavailable for a few minutes"
Last update: The SDK was unavailable for most customers between 10:16am and 10:35am EST on December 12.
Report: "Evaluation Page is Not Loading"
Last update: A code release at 2:13 PM EST prevented our evaluations page from loading. We rolled back the release to resolve the issue at 2:24 PM EST.
Report: "Increased Socure Service Errors"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
Socure is currently experiencing high levels of errors, which are affecting Alloy clients who utilize Socure in their policies. We have reached out to Socure and are working towards a resolution.
Report: "Degraded Dashboard Performance"
Last update: We experienced a service degradation at 10:34 am EST on the evaluation page of the Alloy dashboard. To restore full capacity, we reverted the deployment, restoring service at 10:43 am EST.
Report: "Issues loading dashboard"
Last update: This incident has been resolved.
A fix has been implemented and all evaluation pages should load normally.
Some customers are having issues loading our dashboard; we are investigating.
Report: "Elevated errors on Transaction Evaluations"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are seeing elevated errors on our Transaction service and are investigating the root cause.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
All services have been restored and are currently being monitored.
We are continuing to investigate this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Latency and unavailability for API and Dashboard"
Last update: We have identified and fully addressed the root cause of the unavailability. The system is fully stable.
We have identified and mitigated the source of poor performance. API and Dashboard were unavailable for a total of 8 minutes between 10:56-10:58 ET and 11:05-11:10 ET.
We're seeing intermittent latency and unavailability for API and Dashboard. Currently investigating.
Report: "Transactions evaluation latency"
Last update: Latency on all APIs has returned to normal and the root cause has been identified and its impact mitigated.
Transaction Evaluations had elevated latency for about 50 minutes this morning. We have cancelled some background tasks and service is restored.
We are investigating an issue with transaction evaluations.
Report: "TransUnion Credit is experiencing a complete downtime"
Last update: This incident has been resolved.
TransUnion Credit is experiencing complete downtime. We are reaching out to their support team to find a quick resolution.
Report: "Evaluations Dashboard Availability"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have received reports of performance degradation with the Alloy dashboard. We are investigating this as a matter of priority.
Report: "Iovation Service Experiencing Errors"
Last update: The issue with the third-party service TransUnion Device Risk (Iovation) has been fully resolved as of 6:00 am UTC (2:00 am EDT). The most recent failure detected at Alloy occurred at 5:20 UTC (1:20 am EDT).
Our third-party service Iovation is returning errors, so evaluations that use Iovation will be affected. We are reaching out to Iovation to identify and address the issue.
Report: "Degraded SDK performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have received reports of degradation with the Alloy SDK. We are investigating this as a matter of priority.
Report: "Degraded SDK performance in US/EU"
Last update: On April 1, 2024, between 5:01 PM EST and 5:20 PM EST for the US region, and from 5:01 PM EST through April 2, 2024, 8:01 AM EST for EU regions, customers in these regions may have experienced issues with the Alloy SDK. Attempts to initialize the Alloy SDK had an increase in failures with 403 errors, and the front end would have displayed a blank modal. This issue has been fully resolved.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
We are monitoring for further degraded performance.
The issue has been identified and a fix was deployed.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Short networking issue"
Last update: During this time there was a brief loss of connectivity to our API and dashboard (about 4 minutes). Initially we believed this to be related to a routine security networking change, which we rolled back immediately. However, working with AWS on an RCA, we determined that the service interruption was actually the result of a rarely exposed load balancer bug within the AWS ALB service. A workaround is pending to prevent this bug from surfacing again.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: From 6:01am to 6:12am we experienced a spike in latency following a spike in throughput from a third-party vendor. Service is fully restored, and we will continue monitoring and follow up to prevent this moving forward.
We are continuing to monitor for any further issues.
Service has been restored and we will continue monitoring
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Evaluation page is down"
Last update: We released a change to our dashboard that caused the evaluation page not to display, which also affected the journey applications page. The outage lasted from 13:30 to 13:52 EST.
Report: "Database lock resulting in API outage"
Last update: Between the hours of 22:02 and 22:12 ET (10 minutes), a scheduled database migration that had been tested successfully in lower environments acquired an unintended long lock on a database table, resulting in a backup of queries and increasingly high latencies. The migration was automatically detected and killed, freeing up database performance and returning operations to normal. The postmortem process is underway and multiple new checks for this type of failure are already in progress.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
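The report above does not specify what the new checks are, but a common guard against a migration taking an unintended long table lock is to set a lock timeout so the migration fails fast instead of backing up production queries. A minimal sketch of that idea, assuming a PostgreSQL database driven from Python with psycopg2; the connection string, table, and column names are hypothetical and not taken from the incident:

    import psycopg2

    # Hypothetical connection details; replace with your own environment.
    conn = psycopg2.connect("dbname=example user=example password=example host=localhost")

    try:
        with conn.cursor() as cur:
            # Abort quickly if the table is busy, rather than queuing behind live traffic.
            cur.execute("SET lock_timeout = '2s'")
            # Also cap total runtime so a slow migration cannot hold locks indefinitely.
            cur.execute("SET statement_timeout = '30s'")
            cur.execute("ALTER TABLE example_table ADD COLUMN example_column integer")
        conn.commit()
    except psycopg2.errors.LockNotAvailable:
        # The lock could not be acquired in time; roll back and retry in a quieter window.
        conn.rollback()
    finally:
        conn.close()

If the timeout trips, the migration aborts and can be retried later instead of holding the lock while queries pile up behind it.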
Report: "Webhook Processing Delays"
Last update: This incident has been resolved.
There is a delay in the webhook service. We have identified the problem, applied a fix, and are waiting for the queue to normalize.
Report: "Production API Error Elevation"
Last update: Replication delays between our databases caused a period of increased latency for our API and dashboard from 8:35-8:41 AM ET. This resulted in 6 minutes of increased latency on some API endpoints. All data integrity was maintained during this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Intermittent errors across multiple services"
Last update: AWS has fixed the issue on their end and the intermittent errors on Alloy services have been resolved.
AWS has acknowledged they are experiencing a service outage which is impacting Alloy. Our engineers are working with them on identifying the services affected and the full impact on Alloy.
We are seeing intermittent errors across multiple services in the Alloy API and dashboard. We are investigating the impact and will continue to update.
Report: "Production API Error Elevation"
Last update: We had a spike in transaction latency this morning between 7:06 AM ET and 8:37 AM ET. The database queries that were backing up and causing this latency have been identified and corrected.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Our API and dashboard are currently functional - we are continuing to monitor as our services come back up.
We are continuing to investigate this issue. Dashboard access has partially recovered.
We are continuing to investigate the issue.
We are continuing to investigate this issue.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard intermittent errors"
Last update: We have fully resolved the errors in the Alloy dashboard and restored functionality to normal operations.
We are currently investigating issues loading the evaluations and entities pages on our dashboard.
Report: "Production API Increased Latency"
Last update: Latency on our production API began rising around 5:30 AM ET and started to trip specific monitors around 8:06 AM ET this morning. Once we discovered and remediated the issue, latency returned to normal around 8:43 AM. This may have triggered timeouts or occasional 504 errors for some clients, but all API queries can now be rerun successfully.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard Intermittent Errors"
Last update: Between 10:28am and 10:54am (EST) we deployed a change to improve our database performance, which caused intermittent errors on our dashboard. The dashboard was the only impacted service and no data or API processing was affected. Errors were resolved when we rolled back our change.
Report: "API 206 response increase"
Last update: Our API experienced an increase in errors for certain workflows across a small number of clients between 14:38 and 15:44 ET today. This resulted in some API requests not being resolved completely. Any API requests that resulted in a 206 HTTP status code during this time can now be rerun and will process correctly.
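The remediation above is simply to rerun any request that came back with a 206 during the incident window. A minimal retry sketch, assuming the Python requests library; the URL, payload, and auth header are hypothetical placeholders, not the actual Alloy endpoint:

    import time
    import requests

    def post_with_206_rerun(url, json_body, headers, attempts=3, backoff_seconds=2):
        """Send a POST and rerun it if the response is an incomplete 206."""
        response = None
        for attempt in range(attempts):
            response = requests.post(url, json=json_body, headers=headers, timeout=30)
            if response.status_code != 206:
                return response  # normal success or a genuine error; stop here
            # Partial result: wait briefly, then rerun the same request.
            time.sleep(backoff_seconds * (attempt + 1))
        return response

    # Hypothetical usage:
    # resp = post_with_206_rerun(
    #     "https://api.example.com/evaluations",
    #     {"name_first": "Jane", "name_last": "Doe"},
    #     {"Authorization": "Bearer <token>"},
    # )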
Report: "Production API Error Elevation"
Last update: We had a period of database instability from 12:35 ET until 12:42 ET (7 minutes) that caused a minor service disruption for our API and dashboard. We have identified and remediated the issue and are continuing to monitor the situation.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Between 8:54 ET and 9:01 ET (7 minutes) there was a short period of degraded service to our API. We are monitoring the situation, but it was quickly resolved.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: The API and dashboard have been confirmed to be functioning normally. We have identified the root cause of the database latency issue, remediated it, and put checks in place to prevent this issue from occurring again. We continue to monitor the situation and will follow up early next week with a postmortem upon request.
As of 19:21 ET, the API and dashboard have been available but performance is still partially degraded. We are continuing to investigate and monitor.
We are currently having intermittent API and dashboard unavailability that is due to database performance degradation. We are actively investigating and will post updates once we know more.
We are triaging an increase in latency to our APIs because of database latency. Our on-call teams are investigating and remediating now.
Report: "Production API Error Elevation"
Last update: This incident has been resolved.
Latency is back to normal on all endpoints. We're continuing to monitor overall latency.
We're seeing an increase in latency across certain applications, but no errors. We'll keep this event up to track that increase in latency, but there is nothing to indicate at this time that this is a larger incident.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: Due to a database issue, our API and dashboard experienced errors and were partially unavailable from 11:40 AM ET until 12:02 PM ET (22 minutes). The database issue was promptly resolved and all services began operating normally after a backlog of queries cleared. We are currently working on increased monitoring of this type of issue and checks to prevent it from happening again.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Database migration latency"
Last update: At 10:41 AM today, a database migration resulted in unexpected latency under load. Alloy’s incident response team was notified immediately by automated alerting, and was able to quickly diagnose the problem. The root cause was resolved within a couple of minutes, but it caused a backlog of requests that resulted in latency, timeouts, and, in some cases, failed requests until 10:54 AM.
Report: "Brief dashboard service instability"
Last update: A deployment failure with our core dashboard application caused "service unavailable" error messages to be returned for a short period. Full service has now been restored.
Report: "Production API Error Elevation"
Last update: From 12:52 EDT until 1:57 EDT, urgent maintenance running on our production database caused a significant increase in processing time for our dashboard and API actions. From 1:58 EDT until 2:15 EDT, queries to our API were unresponsive. During this time, we worked to reduce the load on our database and restore service to normal. We are making adjustments to our internal processes to ensure that we have no recurrences of similar issues.
We've applied the fix and are monitoring.
Some clients may be experiencing high latency when sending requests. We've identified the problem and are working on a fix.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: Between 10:34 and 10:37 ET, we rolled out a scaling event to our API to mitigate some of the issues we are seeing with the AWS outage in one of the us-east-1 datacenters we use. This scaling event did not execute correctly due to the ongoing issues, making the situation temporarily worse. We immediately rolled back that change and are continuing to monitor the situation with AWS.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: This incident has been resolved, and the indicators that we have been observing on our side have recovered to the extent that we're confident in the current health of the system. The incident's effects lasted from approximately 20:39 - 22:28 ET, though the impact was intermittent during that time and mostly affected our APIs. The root cause appears to be a cleanup process initiated by our Amazon Web Services Aurora database that caused some read processes to slow down, resulting in most API queries failing. We were able to restore service by moving these workloads to a different location while the process finished. We are working with AWS and our internal teams to avoid both this specific issue and any related issues in the future.
We've moved some of our load off of read replicas, which seems to have mitigated the issues we were seeing so far. We are still monitoring and discussing the root cause. We will continue to monitor this to make sure it is stable before closing this incident.
We're still investigating - something is causing massive issues with all read replicas of our production database cluster. We've tried a series of experiments and methods to recover the service. We are now working on larger-scale fixes which we'll be able to roll out in the next few minutes. We will post an update immediately upon recovery or once the issue is identified.
We are currently aware of major issues with our APIs and certain degradations with our dashboard. We are trying to diagnose the root cause currently.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: We experienced intermittent API instability between 12:55 and 13:18 ET. The issue is resolved and we will be updating with more information soon.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Dashboard Unavailability"
Last update: Our dashboard application was unavailable intermittently between 10:35 and 10:56 ET today. The API was not impacted, so all data and evaluations were processed normally during this period. We are investigating the root cause and adding more monitors to catch this sort of issue faster.
Report: "Production API and Dashboard Latency Increase"
Last update: From 12:05 to 12:19 ET today, additional load on our production database caused a significant increase in processing time for dashboard and API actions. Once the load was targeted and addressed, latency went down immediately and is now back to normal. Our infrastructure team is currently working on addressing the root cause of the problem and making sure we have no recurrences of similar issues.
We are continuing to monitor for any further issues.
We have identified the latency issue and are now monitoring the fix to make sure there are no further degradations.
We're seeing increased latency on certain queries from our core database. We are currently working on identifying the issue and mitigating the latency.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: A sharp increase in database connections caused an increase in latency between 12:30 and 12:33 ET today. The issue was quickly mitigated and latency went back to normal immediately.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
Report: "Production API Error Elevation"
Last update: We have resolved the issue and are monitoring all affected services to make sure they are operating at full capacity. We will be following up with a full RCA upon request in the next few days as we work through the detailed contributing factors. We are also continuing to work with AWS to understand what happened to our underlying infrastructure.
We've identified the issue and are working with AWS to resolve issues with our caching layer. We believe a fix should be in place soon, but are waiting for infrastructure to come up to mitigate the problem.
We are seeing a major outage with applications that connect to our caching services. We are actively investigating this issue and trying to restore service as soon as possible.
Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.