Historical record of incidents for Castle
Report: "Partial service degradation"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We’re currently experiencing an elevated number of errors returned by our API. Our team has identified the root cause and is actively working on a fix. We apologize for any inconvenience and will provide updates as soon as more information becomes available.
Report: "Partial API Outage"
Last update: This incident has been resolved.
Report: "Slightly elevated number of 5xx responses"
Last update: A fix has been implemented and all issues are resolved.
We are currently experiencing a slightly elevated number of 5xx response errors. Our team has identified the core issue and is actively working on a fix. Please rest assured that no data has been lost.
Report: "Partial service degradation"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Dashboard downtime"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We experienced Dashboard downtime due to a failure in underlying infrastructure changes. Our team is working to resolve the issue and restore full functionality. We apologize for any inconvenience caused and appreciate your patience.
We are currently investigating this issue.
Report: "Service degradation"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified and resolved the root cause of the recent system issues. All affected systems have been restored to normal operation.
We experienced a disruption in our API services due to a failure in underlying infrastructure changes. Our team is currently working to resolve the issue and restore full functionality as soon as possible. We apologize for any inconvenience caused and appreciate your patience.
Report: "Partial unavailability dashboard"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We're experiencing temporary issues with our dashboard's functionality. Our team is currently investigating to determine the root cause. Your data remains secure and intact - this issue affects only accessibility, not data integrity. We're working diligently towards a quick resolution and will provide updates accordingly. Apologies for any inconvenience. For immediate concerns, please reach out to our support team.
Report: "Intermittent 5xx API errors"
Last update: Our team has thoroughly investigated the issue and determined that the root cause is likely underlying networking issues within AWS.
We are currently investigating intermittent 5xx API errors and longer than usual response times.
Report: "Service degradation"
Last update: The APIs are back to full operation and we're assessing any impact on the data during the period of the incident. We will follow up with an analysis.
We've identified what seems to be the issue and have deployed a fix. The service appears to be back to normal, but we'll keep monitoring it and follow up with a confirmation.
We are investigating an issue where 401 and 503 responses are returned for a subset of requests. We're still assessing the scope of the issue and will keep you posted.
Report: "Partial database outage"
Last update: This incident has been resolved.
We are investigating an elevated error rate on our end. We've identified the issue and are working towards fixing it. All of the APIs are operational.
Report: "API downtime"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
API responses are coming through for some requests.
We’re experiencing timeouts in API endpoints. Investigating.
Report: "Data ingestion issue in the Explore tab"
Last update: This incident has been resolved.
We have resolved the core issue. All data should be available in the dashboard. We are going to actively monitor data ingestion.
We are currently observing data ingestion problems caused by an issue in one of our databases. We are actively working on fixing the core issue. There is no data loss in any of the systems.
Report: "API performance degradation"
Last update: This incident has been resolved.
We see improved response times on our API endpoints. We will monitor the situation.
While we work on a permanent fix, we are seeing response times improve.
We have identified the issue. We are working to remediate the root cause.
We are investigating general API slowdown.
Report: "Internal services issue"
Last update: Internal systems are stable.
We have implemented a fix and are monitoring systems.
We have identified the issue and are working to address its root cause.
Report: "Amazon Web Services disruption"
Last update: Amazon Web Services has mitigated the majority of the issues around EC2. We will continue to monitor the AWS issues and update this status if needed.
AWS has updated its status to indicate that the USE1-AZ4 availability zone is the only one affected. We do not see any indicators that Castle services are affected. We will continue monitoring the AWS issue and act to remediate any impact if needed.
We are currently investigating whether the Amazon Web Services us-east-1 service disruption affects Castle services.
Report: "Potential service disruption"
Last update: Most AWS systems are already working, and the few that are not are recovering quickly.
We are keeping this incident open while we wait for AWS systems to recover.
Our hosting provider AWS is currently experiencing issues; however, they are not affecting Castle services at the moment. All Castle services are fully functional, and we're actively monitoring the situation. Please see AWS' status page for more information: https://status.aws.amazon.com
Report: "API slowdown"
Last update: Our APIs have slowed down due to a database issue.
Report: "API Downtime"
Last update: At 2021-09-06 20:04 UTC we experienced an AWS hardware failure with one of our main databases, which led to 7 minutes of downtime impacting our APIs. During this time, the APIs were returning a 500 response code and no data was processed. The database in question is configured to be multi-node with automatic failover, but for unknown reasons the failover didn't happen as expected when the hardware fault occurred. Instead, a full backup had to be recreated, which led to the extended period of downtime. We're currently debugging this with AWS support to make sure we can trust the resiliency of our platform. While the current setup should provide good redundancy, we're simultaneously looking into alternative options to prevent this from happening again.
Report: "Lost events in Castle Dashboard"
Last update: Between 14:43 and 15:31 UTC Castle experienced an infrastructure issue with our message queuing system that caused some customer event data to be lost. While risk scoring and inline responses were functioning normally, the requests sent during the period of the incident will not be visible or searchable in the Castle Dashboard. We're prioritizing efforts to add extra redundancy to our system to prevent this from happening again.
Report: "Service disruption"
Last update: On Sunday, April 4th, 2021, beginning at 13:56 UTC, Castle's `/authenticate` endpoint was unavailable. Our teams promptly responded and service was restored at 14:09 UTC.

We've conducted a full retrospective and root-cause analysis and determined that the original cause of the incident was the hardware failure (as confirmed by AWS Support) of an AWS host instance that contained Castle's managed cache service. This hardware failure caused an accumulation of timeouts, resulting in some app instances being marked unhealthy and automatically restarted in a loop. Although rare, we do expect occasional hardware-level failures, and our system is designed to be resilient to these failures whenever possible. In this case, the accumulated timeouts caused the system to behave in a way we have not seen before.

We have re-prioritized our engineering team to implement '[circuit breaker](https://martinfowler.com/bliki/CircuitBreaker.html)'-style handling around cache look-ups, which will prevent subsequent cache layer failures from impacting synchronous endpoints like `/authenticate` (a rough sketch of this pattern follows this incident's updates below).
System is back to normal. We will follow up with more details about this incident.
API endpoints responding normally again. Queued requests are catching up. Monitoring.
We're experiencing timeouts in API endpoints. Investigating.
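The 'circuit breaker'-style handling described in the retrospective above can be sketched briefly. This is a minimal illustration only, not Castle's implementation; the `CircuitBreaker` class, `cache_get`, `fallback`, and the thresholds below are assumptions for the example. The idea is that after repeated cache timeouts the breaker opens and look-ups skip the cache entirely, so a failed cache host degrades to slower fallback reads instead of timeouts that cascade into health-check failures.

```python
# Minimal circuit-breaker sketch around a cache lookup (illustrative only;
# names and thresholds are assumptions, not Castle's actual code).
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then skips the cache
    entirely until `reset_after` seconds have passed."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: allow a trial call once the cool-down period has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


breaker = CircuitBreaker()


def cached_lookup(cache_get, key, fallback):
    """Read from the cache unless the breaker is open; on error, record the
    failure and fall back so the synchronous request still completes."""
    if not breaker.allow():
        return fallback()
    try:
        value = cache_get(key)
        breaker.record_success()
        return value
    except Exception:  # e.g. a timeout against a failed cache host
        breaker.record_failure()
        return fallback()
```

After the cool-down period the breaker allows a single trial look-up; a success closes it again, while another failure re-opens it for a further cool-down.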
Report: "Service disruption"
Last update: On March 30, 2021, Castle's API became degraded during three distinct windows of time:

* 12:02 UTC - 12:45 UTC
* 12:59 UTC - 13:41 UTC
* 14:48 UTC - 15:25 UTC

During this time, some Castle API calls failed, including calls to our synchronous `authenticate` endpoint. The Castle dashboard was up, but because the API was unavailable it was not rendering data. Service was fully restored as of 15:25 UTC, and some data generated from requests to our asynchronous `track` and `batch` endpoints during the incident was recovered from queues and subsequently processed.

As we communicated to all active customers yesterday, we take this sort of incident very seriously, and want to share some of the factors that led to it. The root cause of the incident was a failure of one of our primary data clusters. This is a multi-node, fault-tolerant commercial solution, and a complete cluster failure is extremely rare. Castle's infrastructure team responded immediately to the incident and found an unbounded memory leak that caused each node to shut down simultaneously. Over the course of the incident, we learned this memory leak was exacerbated by a specific class of background job that had actually begun running a day prior but did not begin leaking memory for some time.

When the incident began, we detected the issue and immediately restarted the cluster. A full 'cold start' of the entire cluster takes around 40 minutes, and this accounts for the first downtime window. After the cluster restarted, our fault-tolerant job scheduling system attempted to run the jobs again, which caused the cluster to require full cold restarts twice more as we worked to clear out the job queue and replicas.

At this time, we believe the reason for the memory leak is a bug in our data cluster provider's software: we have been able to successfully reproduce the issue in a test environment and have a high-priority case open with their support team. In the meantime, we have audited all active background job systems to ensure performance-affecting jobs are temporarily disabled or worked around.

Once again, we apologize for the impact of this interruption. Please feel free to contact us at [support@castle.io](mailto:support@castle.io) if you have any further questions.
Systems are operating normally and we have put mitigation measures in place to ensure the issue does not reoccur. We'll have a full retrospective and root cause teardown of the incident published within the next few days.
API endpoints are responsive again and the system is stabilizing. We're monitoring the situation.
We are seeing degraded performance on API endpoints once more, and are working on restoring functionality as quickly as possible.
Database cluster operating normally and API endpoints are responding. We're continuing to monitor the situation.
We are experiencing issues with our main database cluster which affects all API endpoints. We're currently investigating this issue.