Aptible

Is Aptible Down Right Now? Check whether an outage is currently ongoing.

Aptible is currently Operational

Last checked from Aptible's official status page

Historical record of incidents for Aptible

Report: "Increased error rate"

Last update
investigating

We are investigating an increased error rate in our API, which may be causing failed operations.

Report: "Aptible Documentation Site Unavailable"

Last update
resolved

This incident has been resolved.

investigating

Our online documentation at aptible.com/docs is temporarily unavailable. We are working with our upstream provider to resolve the issue and will update this incident when it is resolved.

Report: "Aptible Documentation Site Unavailable"

Last update
Resolved

This incident has been resolved.

Investigating

Our online documentation at aptible.com/docs is temporarily unavailable. We are working with our upstream provider to resolve the issue and will update this incident when it is resolved.

Report: "Route53 increased propagation delays"

Last update
resolved

Route 53 record propagation appears to have returned to normal.

monitoring

We've noticed that some Operations are failing due to Route53 record changes not propagating within the 10 minute time limit allowed by our platform. Running Apps and Databases are not impacted, but creation or deletion of Databases or Endpoints, as well as scaling services to/from zero containers, may be impacted. We'll continue to monitor the situation and provide updates as we have any additional information to share.
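
For reference, a plain DNS lookup shows whether a record change is visible from your resolver yet; a minimal sketch, assuming a hypothetical Aptible default Endpoint hostname:

$ dig +short app-12345.on-aptible.com
# Returns nothing while the record is not yet resolvable from your resolver;
# returns the record's target(s) once the change has propagated.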

Report: "Route53 increased propagation delays"

Last update
Resolved

Route 53 record propagation appears to have returned to normal.

Monitoring

We've noticed that some Operations are failing due to Route53 record changes not propagating within the 10 minute time limit allowed by our platform. Running App and Databases are not impacted, but creation or deletion of Databases or Endpoints, as well as scaling services to/from zero containers may be impacted.We'll continue to monitor the situation and provide updates as we have any additional information to shre.

Report: "Aptible Documentation Site Unavailable"

Last update
resolved

This incident has been resolved.

investigating

Our online documentation at aptible.com/docs is temporarily unavailable. We are working with our upstream provider to resolve the issue and will update this incident when it is resolved.

Report: "Aptible Documentation Site Unavailable"

Last update
resolved

This incident has been resolved.

investigating

Our online documentation at aptible.com/docs is temporarily unavailable. We are working with our upstream provider to resolve the issue and will update this incident when it is resolved.

Report: "Delayed Operations in eu-central-1"

Last update
resolved

This incident has been resolved.

identified

We are currently experiencing issues with operations being delayed for stacks hosted in eu-central-1. Our Engineering team is currently working to restore normal functionality.

Report: "Delayed Operations"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently experiencing issues with operations being delayed. Our Engineering team is currently investigating.

Report: "Delayed Operations"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and operations are running smoothly again. We are monitoring.

investigating

We are currently experiencing issues with operations being delayed. Our Engineering team is currently investigating.

Report: "App and Database operation failures"

Last update
resolved

This incident has been resolved.

monitoring

We are experiencing intermittent failures in App and Database operations due to issues with an upstream provider. This issue only affects Apps and Databases with endpoints. Retrying the operation may resolve the issue. We are actively monitoring the situation and will provide updates once the problem is fully resolved.

Report: "Operations blocked - Route 53 propagation delays"

Last update
resolved

This incident has been resolved.

monitoring

We are noticing Route 53 record requests succeeding in a normal time frame, and are lifting the operation block at this time. We'll continue to observe running operations to ensure stability.

identified

We've noticed that some Operations are failing due to Route53 record changes not propagating within the 10 minute time limit allowed by our platform. In order to prevent Apps and Databases DNS records from reaching an inconsistent state, we are temporarily blocking Operations. Performance and reachability of existing Apps and Databases are not impacted.

Report: "Database provision errors"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

We've identified an error blocking the creation of new Databases on the platform, and our team is applying a fix. Reachability of your existing databases, and the ability to scale or restart them is not impacted.

Report: "Delayed Operations"

Last update
resolved

This incident has been resolved.

investigating

We are currently experiencing issues with operations being delayed. Our Engineering team is currently investigating—more updates to follow.

Report: "Long load balancer registration times"

Last update
resolved

AWS has indicated that the underlying issue has been resolved, and our monitoring indicates it is safe to run operations again. All inconsistencies impacting customer apps or databases (there were only 4 impacted resources) have been resolved.

identified

We are experiencing longer than usual Route53 change times, and some operations are unable to roll back gracefully. In order to prevent resources from reaching a failed state where the DNS is not properly configured, we are blocking creation of new operations on the platform. We will update soon with additional information.

Report: "Limited Availability Incident in shared-us-west-1"

Last update
resolved

On 2024-10-16, between 00:20 and 02:38 UTC, some customer apps and databases in a single shared stack, shared-us-west-1, experienced an availability incident as a result of a problem encountered with planned maintenance. Service has been restored to those affected apps and databases, and this incident is considered resolved at this time.

Report: "Impacted platform operation in us-east-2"

Last update
resolved

AWS has resolved the underlying issue.

monitoring

We are no longer observing error responses for S3, and have re-allowed operations in us-east-2. We will continue to monitor the situation.

identified

AWS confirmed multiple services are impacted in us-east-2. We are blocking operations in that region until availability stabilizes.

investigating

We are investigating an S3 outage in the us-east-2 region, which is impacting new operations on resources in that region. All apps and databases are running normally, though if your code relies on S3 directly, or 3rd party services that rely on S3, you may see application-level impact.

Report: "Long load balancer registration times"

Last update
resolved

AWS has marked this issue RESOLVED as of 19:19 UTC, and we have not observed any issues in the last hour. The issue has been resolved and all services are operating normally.

identified

The latest update from AWS indicates that operations created around 17:10 through 17:20 UTC were impacted, which matches our internal metrics. AWS has promised another update by 18:00 UTC, and we will continue to monitor the situation until we are satisfied that it is resolved.

identified

We're again seeing degradation and failure to register new load balancer targets in about 10% of running operations.

monitoring

Load balancer registration appears to be working as expected at this time. We will continue to monitor operations until AWS resolves their service degradation notice.

identified

AWS has acknowledged the impact we are seeing and opened an incident:

> We are investigating increased load balancer back-end instance registration times in the us-east-1 Region. (September 26, 2024 at 16:21:43 UTC)

Since 16:05 UTC, Aptible is observing some recovery; about half of endpoint target registrations are succeeding at this time.

identified

This service impact only applies to resources hosted in the `us-east-1` region. Customers may notice operations reaching timeout, but at this point all operations are rolling back successfully to the previous state.

investigating

We are investigating abnormally long registration times for new targets with AWS Load Balancers. This may be causing extended operation times for releases (Deploy, Scale, Restart) for services that have Endpoints.

Report: "Dockerfile based `git-push` deployments issue"

Last update
resolved

This incident has been resolved.

identified

After the recent git server maintenance, a fallout issue was identified that affected `git push` based deployments to existing apps. A fix has been put in place, so we expect further deployments will not be affected. Please contact support if you encounter further issues.

Report: "Git-based Deploy Log Streaming Disruption on Aptible CLI for Dedicated Stacks"

Last update
resolved

For dedicated stacks only, git-based deployments (https://www.aptible.com/docs/core-concepts/apps/deploying-apps/image/deploying-with-git/overview) were not streaming logs about the deployment operation activity as they normally do. The deploy operations were running normally in the background but not streaming live logs to the CLI. This incident impacted git-based deploys from the CLI between June 14th, 5:44 AM UTC, and June 14th, 1:45 PM UTC. Our team has applied a fix, which has resolved the issue. Please contact our Support Team (https://contact.aptible.com/) if you have additional questions.

Report: "Temporary Metrics Unavailability in Aptible Dashboard"

Last update
resolved

We are notifying our users of an issue where some metrics are not available on the Aptible Dashboard (app.aptible.com) for the period between May 5, 2024, 18:54 UTC and May 6, 2024, 22:50 UTC. We want to assure you that this does not affect the functionality of Aptible Metric Drains (https://www.aptible.com/docs/metric-drains). If you have any concerns or require further assistance, please do not hesitate to reach out to our support team (https://contact.aptible.com/).

Report: "Update on CVE-2024-3094: XZ Utils Vulnerability"

Last update
resolved

Aptible is aware of CVE-2024-3094 (https://nvd.nist.gov/vuln/detail/CVE-2024-3094), a critical vulnerability in XZ Utils, specifically affecting versions 5.6.0 and 5.6.1, with a CVSS score of 10, indicating a severe level of risk. This vulnerability results from a supply chain compromise and is present in data compression software widely used across major Linux distributions. The malicious code discovered in the affected versions allows for unauthorized system access, posing a significant security threat. The Aptible platform and services do not utilize the affected software versions and are not impacted. Aptible customers are urged to evaluate dependencies in their Docker Images and other systems and patch as needed urgently to mitigate the risk associated with this vulnerability. Given the scope and severity of the CVE, our security team continues to monitor the situation actively. If you have any concerns or questions, please contact the Aptible Support team (https://www.aptible.com/docs/support).
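
For a quick check of the XZ Utils version inside a running App, a lookup along these lines can help; a minimal sketch, assuming a hypothetical app handle of my-app and that xz is installed in the image:

$ aptible ssh --app my-app xz --version
# 5.6.0 and 5.6.1 are the compromised releases named in CVE-2024-3094;
# other versions are not affected by this particular advisory.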

Report: "Response to Leaky Vessels: Docker and runc container breakout vulnerabilities"

Last update
resolved

We have proactively addressed a recent security vulnerability identified as "Leaky Vessels," a container breakout issue affecting runc versions up to 1.1.11. This vulnerability had the potential to allow unauthorized access to the host OS from containers. Our team has promptly updated our systems, including all instances of runc to the secure version, to ensure the highest level of security for our platform and your services. This update mitigates the risks associated with this vulnerability.

The following CVEs have been addressed on our platform:

- CVE-2024-21626: runc process.cwd & leaked fds container breakout (https://snyk.io/blog/cve-2024-21626-runc-process-cwd-container-breakout/)
- CVE-2024-23651: Buildkit Mount Cache Race (https://snyk.io/blog/cve-2024-23651-docker-buildkit-mount-cache-race/)
- CVE-2024-23653: Buildkit GRPC SecurityMode Privilege Check (https://snyk.io/blog/cve-2024-23653-buildkit-grpc-securitymode-privilege-check/)
- CVE-2024-23652: Buildkit Build-time Container Teardown Arbitrary Delete (https://snyk.io/blog/cve-2024-23652-buildkit-build-time-container-teardown-arbitrary-delete/)

We assure you that our swift actions have kept our systems, and consequently your services, secure and unaffected by this vulnerability. We remain committed to maintaining the highest security standards and will continue to monitor and update our systems to safeguard your data and services. For more detailed information about this topic, you can refer to the Snyk blog post: https://snyk.io/blog/leaky-vessels-docker-runc-container-breakout-vulnerabilities/

Report: "Missing Dashboard Metrics for Small Number of Apps and Databases"

Last update
resolved

This incident has been resolved.

monitoring

For a small number of apps and databases deployed, restarted, or scaled since Friday, Jan 19th 16:00 UTC, metrics were missing from the Aptible Dashboard metrics view. There is no other impact; a fix is rolling out for metrics for those apps and databases, and this incident will be resolved once the fix has been completed.

Report: "Operations Blocked for Shared Stack shared-eu-central-1"

Last update
resolved

This incident has been resolved.

identified

Aptible operations have been temporarily blocked in shared stack shared-eu-central-1 in order to address a stack-specific error. Our team will provide an updated status once operations are unblocked.

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

investigating

We are investigating an EC2 dedicated host failure affecting a small number of dedicated stacks.

Report: "Aptible API Degraded Performance"

Last update
resolved

This incident has been resolved.

monitoring

The Aptible team is aware of intermittent degraded performance in the Aptible API, which led to some users seeing API-related Operation timeouts. Performance has returned to normal levels, and the team continues to monitor to ensure stability.

Report: "Quay.io Registry Issues"

Last update
resolved

Quay is reporting that this incident has been resolved.

monitoring

We have failed over to our secondary registry provider and are monitoring ongoing status.

identified

We have identified an issue with our primary upstream registry provider which is impacting some Aptible Deploy operations. Our team is in the process of failing over to our backup provider and will update this incident when this has been completed.

Report: "CVE-2023-44487 "HTTP/2 Rapid Reset" Response"

Last update
resolved

We are aware of the recently disclosed vulnerability CVE-2023-44487, also known as the "HTTP/2 Rapid Reset Attack," which poses a potential risk of Denial of Service (DoS) attacks on HTTP/2-capable web servers. We are actively monitoring the situation and have conducted in-house tests on our HTTPS Endpoints that utilize AWS Application Load Balancers (ALBs). Currently, there is no evidence suggesting Aptible is vulnerable to this particular security concern. AWS has put in place extra measures to mitigate this vulnerability, ensuring that our services stay secure and fully functional.

More information here:

- AWS: CVE-2023-44487 - HTTP/2 Rapid Reset Attack: https://aws.amazon.com/security/security-bulletins/AWS-2023-011/

On Endpoint Types at Aptible:

- HTTP(S) Endpoints: these use Application Load Balancers (ALBs) and have mitigations in place to address the vulnerability. Some legacy endpoints created before 2018 use legacy Elastic Load Balancers (ELBs), which do not support HTTP/2 and are not vulnerable.
- TLS / TCP Endpoints: if customers are exposing custom HTTP/2-capable web servers behind these Endpoints, we recommend verifying with your web server vendor to determine if you are affected and, if so, promptly installing the latest patches to mitigate this issue.
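
If you run a custom HTTP/2-capable server behind a TLS or TCP Endpoint, one way to confirm whether HTTP/2 is actually negotiated is a curl probe; a minimal sketch, assuming a curl build with HTTP/2 support and a placeholder hostname:

$ curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://app.example.com/
# Prints "2" when HTTP/2 was negotiated, "1.1" otherwise.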

Report: "Host Provisioning Delays in us-east-1"

Last update
resolved

This incident has been resolved.

monitoring

We are again seeing successful deployment of new hosts in the affected single availability zone in the us-east-1 region. We will continue to monitor for an additional period before resolving the incident.

identified

AWS continues to work on recovering from this issue in a single availability zone in the us-east-1 region. Running apps and databases remain unaffected by this failure.

investigating

AWS is experiencing an issue preventing the timely deployment of new hosts in a single availability zone in the us-east-1 region. As a result, some app and database restart, scale, and deployment operations that result in a new host being provisioned may fail and roll back. Running apps and databases are not impacted by this failure.

Report: "EC2 Host Failure - us-east-1"

Last update
resolved

This incident has been resolved.

investigating

We are investigating several EC2 dedicated host failures affecting some customers with apps and databases in us-east-1, related to an AWS incident. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).
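
As a reference for the automatic failover behavior described above, scaling a service to two or more containers distributes them across availability zones; a hedged sketch with the Aptible CLI, assuming a hypothetical service named web and app handle my-app (flags may vary by CLI version):

$ aptible apps:scale web --container-count 2 --app my-app
# With 2+ containers, a single host failure affects only one container while the
# others continue serving from a different availability zone.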

Report: "Delayed Operations"

Last update
resolved

This incident has been resolved.

monitoring

Our team has mitigated this issue, and newly created operations should now succeed. Customers may see long-delayed operations begin to fail. These failed operations will need to be restarted.

identified

Our team has determined the root cause as an internal dependency causing operations to hang. We're currently beginning steps to remediate this issue—more updates to follow.

investigating

We are currently experiencing issues with operations being delayed. Our Engineering team is currently investigating—more updates to follow.

Report: "Metric Drains Interrupted for Some Dedicated Stacks"

Last update
postmortem

# Incident Postmortem: Metric Drains Interrupted for Some Dedicated Stacks

## Executive Summary

On June 20, 2023, our platform experienced a service degradation incident for Metric Drains while rolling out a new feature for Metric Drains. This was due to unexpected side effects of a new internal utility used to deploy the feature. Some of our customers experienced interruptions in their metric drains during this incident. All issues were subsequently addressed, and service has been fully restored.

## Detailed Incident Description

Configuration Change Initiation: The rollout of the change relied on a two-step configuration process to update the software for the metric drain emitter and aggregator components within each dedicated stack. This process was initiated using a new utility that had been successfully deployed in the past but not at the scale required for this rollout.

Utility Timeouts and Delays: During the rollout, the configuration utility started experiencing cascading timeouts as operations queued with increasing delays in executing the configuration changes. While configuration was not yet uniformly updated across the rollout, some customer stacks were left only partially configured for the updated metric drain software.

Customer Impact: A small number of customers who were deploying or scaling services during this period had their metric drains interrupted due to the aforementioned configuration issues.

Resolution: Our team immediately worked on fixing the configuration issues. By 16:24 EDT, we successfully restored the configuration state for the affected customers, and the service was resumed to its regular state.

Follow-up Audit: On the following morning of June 21, a follow-up audit revealed that two additional customers still needed configuration updates for their metric drains. We immediately addressed these issues.

## Root Cause Analysis

The root cause of this issue was a combination of the increased scale of the rollout and the relative novelty of the utility used for the configuration changes. Although this utility had performed successfully under previous workloads, it did not sufficiently scale to handle the increased demand of this particular rollout.

## Lessons Learned and Preventative Measures

Testing Deployment Tools at Scale: Testing new deployment tools and utilities under maximum practical loads is crucial to ensure they can handle expected full-scope workloads without disruption.

Audit Processes: Though our follow-up audit process effectively identified additional affected customers, we will make such audits more timely to catch any lingering issues sooner.

We sincerely apologize for any inconvenience caused to our customers during this incident. We take this issue seriously and are committed to ensuring that such incidents do not occur in the future.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Related to the 6/20 incident, some customers on Dedicated Stacks are seeing a continued disruption to their Metric Drains. We are implementing a fix for those customers/stacks and will follow up via support tickets to affected customers.

Report: "Metric Drains Interrupted for Some Dedicated Stacks"

Last update
resolved

This incident has been resolved.

identified

During a rollout of an updated version of Metric Drains in the past hour, some customers on dedicated stacks experienced an interruption in service for Metric Drains. The issue has been identified and is being addressed, with the issue expected to be resolved for all affected customers within the following 10 minutes.

Report: "AWS Availability (3rd party)"

Last update
resolved

This incident has been resolved.

monitoring

Backup copying has been re-enabled, and we are continuing to monitor for any other impact.

monitoring

We are aware of ongoing issues with AWS related to a failure of the Lambda service. Our assessment is that there is no direct impact to the availability or performance of any Aptible hosted services. One impact we are monitoring is that copying Database backups to a second region is not working at this time. When the incident resolves, the copies will be made automatically. If you are utilizing any AWS services directly in your own AWS account, you may monitor the status of those services at https://health.aws.amazon.com/health/status

Report: "Aptible transactional emails are delayed"

Last update
resolved

This incident has been resolved.

identified

Our email provider has reported delays in processing, resulting in delays in sending queued transactional emails sent by our platform. In particular, this means the following workflows might be delayed:

- Email verifications
- Password resets
- Role invitations

This does not affect any customer apps (unless you happen to be using the same email provider, of course): only transactional emails sent by Aptible itself are affected.

Report: "Long queueing times for Aptible operations"

Last update
resolved

This incident has been resolved.

monitoring

At this time, all issues related to long queueing times should be resolved. We are leaving this incident in "monitoring" status until the underlying AWS issue is either resolved or acknowledged by AWS with an explicit resolution ETA.

monitoring

Operations for resources in us-east-1 are queueing for up to 5 minutes before beginning to execute, as a result of an AWS issue causing certain operations (especially `CopySnapshot` operations) to be extremely delayed. (These operations ordinarily begin to execute within 1 or 2 seconds.) Currently, only resources in us-east-1 are affected. We are making changes to disable some of the affected AWS operations until the underlying AWS issue is resolved. In the meantime, operations may take longer than usual to start, but will eventually begin to execute if no action is taken (and if they are not cancelled).

Report: "Operations paused in ap-southeast-1 region"

Last update
resolved

Calls to the EC2 API are responding normally, and Operations in ap-southeast-1 have been resumed.

identified

The issue has been identified and a fix is being implemented.

investigating

Due to the unavailability of the AWS EC2 API in the ap-southeast-1 region, Aptible Operations have been blocked in that region.

Report: "AWS Outage"

Last update
postmortem

AWS has identified the root cause of the Endpoint unavailability:

> Between 2:26 PM and 3:04 PM PDT (9:26 PM ~ 10:04 PM UTC) we experienced increased packet loss for traffic destined to public endpoints in the US-EAST-1 Region, which affected Internet and public Direct Connect connectivity for endpoints in the US-EAST-1 Region.

This is, unfortunately, essentially the same impact we've seen in two previous incidents, although AWS's description of the cause is slightly different:

October 15th, 2022 (https://status.aptible.com/incidents/grf6gdrrszf9):

> Between 12:20 AM and 11:28 AM PDT, we experienced intermittent failures in Route53 Health Checks impacting Target Health evaluation in US-EAST-1. The issue has been resolved and the service is operating normally.

September 27th, 2021 (only a couple of Endpoints were impacted, so no incident was created):

> On September 27, 2021, between 8:45 AM and 2:09 PM PDT, Route53 experienced increased change propagation times for Health Check edits where unexpected failover to their secondary application load balancer (ALB) occurred despite their primary ALB targets being healthy. The issue has been resolved and the service is operating normally.

While AWS describes these incidents as "increased change propagation times", "intermittent failures", and "increased packet loss", and they apparently do not qualify as incidents to be posted to https://status.aws.amazon.com, the observed impact to our customers is very clear: the impacted Endpoints are totally unreachable for a period. As such, we will permanently implement the "temporary" change we made on October 15th: we will be disabling the Route53 health checks (and the associated custom error page) for all Endpoints, as this has been the root cause of these availability incidents.

As we indicated to customers during the Oct 15th and Nov 3rd incidents, you may restart any App in order to immediately disable the Route53 health check. Any App which has been deployed, restarted, or scaled since October 15th will already have it disabled, and we will make another announcement when we intend to disable it globally on all Apps for which it remains enabled.

resolved

We no longer see any impact, and will continue investigating for an RCA.

monitoring

Based on a random sampling and on reports of affected Endpoints, we are no longer seeing any impact. We will continue to monitor the situation.

identified

We're seeing many Endpoints recover without action being taken, so we're looking into ways to identify Endpoints that remain impacted so that we can efficiently fix them. Restarting known impacted Apps remains the quickest solution that we know of.

identified

We've observed that running `aptible restart --app $handle` can resolve the underlying issue with the ELB, and recommend restarting any of your impacted Apps at this time.
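
For example, restarting an impacted App from the CLI looks like the following; my-app is a placeholder for your own App handle:

$ aptible restart --app my-app
# Per the update above, restarting the App can resolve the underlying ELB issue
# for its Endpoints.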

investigating

We are currently investigating a large number of unreachable ELBs in AWS's us-east-1 region, and are waiting for acknowledgement from AWS while trying to narrow the scope of the failures in order to provide failover/workarounds if possible.

Report: "High vulnerabilities in OpenSSL (CVE-2022-3602 & CVE-2022-3786)"

Last update
resolved

This incident has been resolved.

monitoring

OpenSSL's pre-announcement of CVE-2022-3602 described this issue as CRITICAL, but it has since been downgraded to HIGH [0]. Aptible remains unaffected by this vulnerability. We still recommend every Aptible customer check the OpenSSL versions used in their apps to confirm they're unaffected. Please follow the aforementioned steps to check the version and update OpenSSL accordingly.

Additional context and guidance from OpenSSL: https://www.openssl.org/blog/blog/2022/11/01/email-address-overflows/

[0] https://www.openssl.org/news/secadv/20221101.txt

monitoring

OpenSSL has announced a critical vulnerability [0] for which a patch will be released tomorrow, November 1, 2022, between 13:00 and 17:00 UTC. The nature of the vulnerability has not been disclosed, but based on how it's being handled, Aptible expects it could be a serious vulnerability affecting data confidentiality for those running affected OpenSSL versions (>= 3.0.0, < 3.0.7).

Aptible has reviewed all infrastructure components that we manage and has confirmed that all are unaffected by this vulnerability. These components include:

- Our Managed TLS endpoints
- The TLS endpoints for our REST API services (Auth and Deploy APIs)
- All versions of our managed databases
- Our log forwarding infrastructure
- Our metrics collection infrastructure
- Our SSH and Git server infrastructure

Still, every Aptible customer should check the OpenSSL versions used in their apps to confirm they're unaffected. To do so, run:

$ aptible ssh --app $APP_HANDLE openssl version

If the version is >= 3.0.0, you should plan to upgrade your apps' Docker image(s) tomorrow as soon as OpenSSL 3.0.7 is released. We will continue to update this incident page as more information is revealed about the vulnerability. If the vulnerability is only exploitable for *server-side* OpenSSL functionality, the impact to Aptible customers would be significantly reduced. Only those customers who use plain TCP endpoints [1] with their own OpenSSL for TLS termination would be affected in this scenario.

[0] https://mta.openssl.org/pipermail/openssl-announce/2022-October/000238.html
[1] https://deploy-docs.aptible.com/docs/tcp-endpoints
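
For teams with many Apps, the same check can be looped over a list of handles; a minimal sketch, assuming a hypothetical file app-handles.txt with one handle per line:

$ for APP in $(cat app-handles.txt); do echo "== $APP =="; aptible ssh --app "$APP" openssl version; done
# Any App reporting a version >= 3.0.0 and < 3.0.7 should plan to rebuild its
# image as soon as OpenSSL 3.0.7 is released.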

Report: "Continuation of AWS ELB incident https://status.aptible.com/incidents/b3xvn9tmzfjz"

Last update
resolved

With an acknowledgement from AWS (copied below), we are marking this incident as resolved.

> Between 12:20 AM and 11:28 AM PDT [on October 15th], we experienced intermittent failures in Route53 Health Checks impacting Target Health evaluation in US-EAST-1. The issue has been resolved and the service is operating normally.

monitoring

The issue should be resolved for all endpoints that have been affected. We've also updated the behavior of the platform so that if you see this issue, running `aptible restart` on the affected app will update the configuration of all the app's endpoints to ignore health checks, and should thereby resolve the issue.

monitoring

All impacted endpoints that we have identified have been fixed by disabling their health checks. If you have an Endpoint that seems to be impacted, please email support@aptible.com and include the domain name of the Endpoint in question.

identified

This is a continuation of https://status.aptible.com/incidents/b3xvn9tmzfjz. We have noticed that we can only detect the issue when HTTP requests are made to the Endpoint, and as the incident started in the middle of a weekend night, we were not able to identify Endpoints which were not in use. We're reviewing and remediating additional Endpoints now, and will make an update to the platform to remove health checks entirely if AWS cannot fix the Route53 issue.

Report: "AWS ALB issue"

Last update
resolved

This incident has been resolved.

monitoring

We are disabling health checks on the impacted Endpoints in order to bring those applications back online.

identified

We have identified an AWS issue with Route 53 health checks where some records are failing over to Brickwall, our error server, despite the containers passing health checks. As far as we can tell, this is affecting a small number of Endpoints, unfortunately including dashboard.aptible.com as well.

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "AWS us-east-2 power failure"

Last update
resolved

Everything appears to be operational at this time. Please open a ticket with support if you continue to experience issues in us-east-2.

monitoring

It looks like AWS has recovered most services, and we are continuing to monitor operations in us-east-2 to ensure everything is working properly.

identified

AWS experienced a power failure in a single availability zone in the us-east-2 Region. This affected networking in that AZ, and has also impacted load balancer registration times. We are working to ensure any resources we can identify as impacted are moved to another AZ, but the impact to load balancer registration and an issue creating/updating/deleting Route 53 records are both hampering our ability to mitigate availability issues.

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

investigating

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "Host failure in ca-central-1"

Last update
resolved

This incident has been resolved.

identified

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases in the ca-central-1 region. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

investigating

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "Host provisioning failures"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating an issue that is blocking new hosts from being provisioned. As a result, some app and database restart, scale, and deployment operations that result in a new host being provisioned may fail. Running apps and databases are not impacted by this failure.

Report: "CVE-2022-22965 "Spring4Shell" Response"

Last update
resolved

This incident has been resolved.

monitoring

Recently a series of vulnerabilities in the popular Java framework Spring were found, notably CVE-2022-22965 [0] (dubbed "Spring4Shell") and CVE-2022-22963 [1]. Aptible does not use the Spring framework in any of our internal applications, and has verified that none of our offered services that use Java are vulnerable either. We will continue monitoring the situation. [0] https://tanzu.vmware.com/security/cve-2022-22965 [1] https://tanzu.vmware.com/security/cve-2022-22963

Report: "Operations blocked - Route 53 propagation delays"

Last update
resolved

This incident has been resolved.

monitoring

Operations have been restored. Route 53 response times are still slow, but within acceptable limits.

investigating

We've noticed that some Operations are failing due to Route53 record changes not propagating within the 10 minute time limit allowed by our platform. In order to prevent Apps and Databases DNS records from reaching an inconsistent state, we are temporarily blocking Operations.

Report: "EC2 Host Failure"

Last update
resolved

This incident has been resolved.

identified

We are investigating an EC2 dedicated host failure affecting a small number of apps and databases. Affected apps are currently being restarted on healthy instances (apps scaled to 2 or more are automatically distributed across availability zones and will automatically failover).

Report: "Route53 increased propagation delays"

Last update
resolved

This incident has been resolved.

monitoring

At this time we have only observed an impact when creating or destroying DNS records; changes to existing DNS records have not been impacted, so we are resuming normal operations. Our team will continue to monitor closely, and work to resolve the initially failed operations.

identified

We've noticed that some Operations are failing due to Route53 record changes not propagating within the 10 minute time limit allowed by our platform. In order to prevent Apps and Databases DNS records from reaching an inconsistent state, we are temporarily blocking Operations which will require updating DNS records.