Historical record of incidents for Pagely
Report: "Node.js v24 Upgrades"
Last updateThe scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
The Pagely team will be performing upgrades on all servers to use Node.js v24 LTS as the default system package. This change does not directly impact any web services on the server; however, it may impact any post-deployment steps, custom theme or plugin build steps, or similar operations which utilize the /usr/bin/node binary. If any code requires a specific version of Node.js, Pagely recommends managing local versions in client directories by following the steps in our documentation: https://support.pagely.com/hc/en-us/articles/360019779191-Update-and-Manage-Node-js-Versions-with-n
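If a build or post-deployment step depends on a particular Node.js major version, a small guard script can surface a mismatch early instead of failing mid-build. The sketch below is purely illustrative and is not part of Pagely's tooling; the expected major version and file name are assumptions you would adapt to your own project.

```typescript
// check-node-version.ts - hypothetical pre-build guard; not an official Pagely script.
// Fails fast if the Node.js binary running the build is not the major version the
// project expects, which helps catch changes to /usr/bin/node after system upgrades.

const EXPECTED_MAJOR = 24; // assumption: set this to the version your build requires

// process.version looks like "v24.1.0"; extract the major component.
const actualMajor = Number(process.version.slice(1).split(".")[0]);

if (actualMajor !== EXPECTED_MAJOR) {
  console.error(
    `Expected Node.js v${EXPECTED_MAJOR}.x but found ${process.version}. ` +
    `Consider pinning a local version (for example with the "n" version manager) as described in the Pagely docs.`
  );
  process.exit(1);
}

console.log(`Node.js ${process.version} matches the expected major version.`);
```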
Report: "Atomic - Issue Promoting Domains"
Last updateThis incident has been resolved.
A fix has been deployed and promoting, adding, and deleting domains should now be functioning as intended. We are continuing to monitor for any further issues.
The root cause has been identified and a fix is being deployed.
We are continuing to investigate this issue. This issue may also impact adding/removing domains from an App in Atomic.
We are currently investigating an issue in the Atomic control panel where domains are unable to be promoted to an App's Primary Domain. We will provide additional details shortly.
Report: "Atomic API Issues"
Last update: As we mentioned in our last update, we reverted the changes to our gateway which affected the Atomic API. We have not seen any further issues and will mark this incident as resolved.
We have reverted the changes and our API at this time is working as intended. We will continue to monitor this issue before marking this incident as resolved.
Additional changes are being deployed to the gateway and we expect a resolution to this issue shortly.
We have identified an issue with an endpoint related to a gateway configuration change made earlier today. We expect the fix to be rolled out shortly.
We are continuing to investigate this issue.
We are currently investigating issues with the Atomic API and will have additional updates shortly.
Report: "PressCDN Purge Issue"
Last update: This incident is now resolved. PressCDN purges should now complete successfully. Our team identified issues resulting from the earlier Atomic Control Panel Maintenance and we have taken the appropriate corrective action to restore functionality to API calls.
Our team has identified the root cause of the issue. A fix has been implemented and PressCDN purges should now complete successfully. We are continuing to monitor for any further issues.
We are currently investigating an issue with PressCDN purges returning API failures. We will provide additional details shortly.
Report: "App Provisioning Issue"
Last update: This incident has been resolved. Our team identified a newly provisioned internal server that was unintentionally collecting new app provisioning information. We have corrected the issue and app provisioning is now directed to the proper internal server.
Our team has identified the root of the issue. We have tested new provisioning and each app provision has been successful.
Unfortunately the app provisioning issue persists after an initial fix. This is a new incident to track further progress on the issue.
Report: "App Provisioning Issue"
Last update: This incident is resolved.
We have identified the source of the issue and new app provisioning is now successful. We will monitor the incident and mark it as resolved shortly.
We are continuing to investigate the issue and have identified some potential causes of the app provisioning failures. We will post additional updates as we find out more regarding this issue.
We are currently investigating an issue with app provisioning not completing successfully. We will provide additional details shortly.
Report: "Payment Processor Issues"
Last update: The incident with the payment processor has been resolved! We can confirm that payment processing is operating as expected now.
Our payment processor has reported that the service has been restored. We are monitoring how the service operates on Pagely's end. We will provide an update as we confirm that it operates as expected.
Our payment processor is testing the service before restoring its functionality. We will continue to follow the status of the service and report any updates accordingly.
Our payment processor is continuing to work on the issue. We will continue to monitor and update when we have more information.
Our payment processor has identified the issue and is currently working on restoring services. We will update once we have further information.
Our payment processor is still investigating the issue. We will update once we have further information.
Our payment processor is still investigating the issues.
Our payment processor is still investigating the issue.
Our payment processor is still investigating further. We will update once we have further information.
Our payment processor is investigating this issue further at this time.
We are currently investigating an issue with our payment processor preventing payments and stopping sign-ups.
Report: "Shared database outage for vps-virginia-aurora-20"
Last update: The new database environment is working as expected and no further action is needed at this time. This incident has been resolved.
Our engineering team has restored vps-virginia-aurora-20 to a new database environment and sites are now able to connect to this new environment properly. We are still monitoring this new environment and site availability has been restored.
Our engineering team is actively working with AWS regarding issues with the database vps-virginia-aurora-20.
Our team is continuing to work with AWS to implement a fix at this time.
The issue has been identified with the database and a fix is being implemented.
We were alerted to an issue with the shared database vps-virginia-aurora-20 and are currently investigating the issues at this time.
Report: "Availability issues with Web128/Web129"
Last update: This incident has been resolved.
Our team has resolved the issue with high load on web128 and web129. We will continue to monitor these instances and provide another update shortly.
We are currently investigating availability issues with our shared instances web128 and web129. We will provide further updates as we continue investigation into this incident.
Report: "Intermittent issues with Atomic"
Last update: This incident has been resolved.
Pagely has fixed the Atomic control panel issue and is continuing to monitor performance.
We have identified the root cause for this issue, and are actively working to implement a fix.
We are aware of intermittent issues with Pagely's Atomic control panel. We are investigating.
Report: "Service degradation with Zendesk support"
Last update: This incident has been resolved.
We are investigating an issue with Zendesk. Customers can still create support tickets within Atomic, but live chat and ticket response times may be slower than normal.
Report: "Atomic Control Panel is down"
Last update: This incident has been resolved.
The issue was related to a third-party payment provider having issues with their service.
We are currently investigating issues with Atomic Control Panel being down.
Report: "P20 service degradation"
Last update: The shared-hosting service degradation should now be resolved. If your site is still having issues, please contact Pagely support.
We are continuing to monitor our shared-hosting infrastructure for ongoing issues.
At this time, we believe the issue should be resolved for the majority of shared-hosting customers.
The cause of the shared-hosting issue has been identified, and we are exploring potential fixes.
We are investigating an issue impacting a subset of our shared-hosting customers. Details will be added as they become available.
Report: "Atomic Control Panel - API Errors"
Last update: This incident has been resolved.
We're experiencing an elevated level of API errors within Atomic and are currently looking into the issue.
Report: "Support Ticket Creation Issues for Collaborators"
Last update: This issue has been identified and resolved.
We are currently investigating an issue where a small subset of collaborators cannot submit a ticket through our ticketing system. We will provide further updates as we continue to resolve this issue.
Report: "CI/CD Deployment Issues"
Last update: The incident has been resolved.
We are continuing to deploy the fix, although completion has been delayed. We will post any further updates we have.
We expect the issue to be resolved shortly as we are still finishing up the rollout of the needed updates.
We are continuing to deploy the fix to correct the issue with CI/CD and will post any further updates we have.
The issue with CI/CD deployments has been identified and the team is currently working on deploying the update to our environment.
Pagely engineers are currently investigating issues with CI/CD deployments and we will have further information via this status page once we have identified the issue.
Report: "Service degradation with Zendesk support"
Last update: Zendesk reports that this issue has been resolved.
We are investigating an issue with Zendesk. Customers can still create support tickets within Atomic, but live chat and ticket response times may be slower than normal.
Report: "RDS us-east-2 (Ohio) Availability Issues"
Last update: The issue has now been resolved.
The RDS service affected by the outage is responsive again and sites should no longer have issues connecting to the database. We will continue to monitor and provide a further update once we are comfortable closing this incident.
We're currently investigating an issue with the AWS us-east-2 (Ohio) region causing an outage for the RDS service within that region.
Report: "Shared hosting infrastructure outage"
Last update: No further issues have come up after our fix was implemented. The issue has been resolved.
A fix has been applied and the shared hosting infrastructure is stable at this point. We'll continue to monitor the infrastructure.
We were alerted to issues with our shared hosting infrastructure and are currently investigating the shared hosting platform at this time.
Report: "Images missing from support.pagely.com"
Last update: All images appear to be loading properly at this time with no further issues.
A fix has been made and images are appearing online at this time. We'll continue monitoring to ensure all images are working as intended.
Around 14:50 UTC Pagely engineers identified an issue with our support documentation in which all images fail to load. A fix is being worked on to bring the missing images back online. We will update as progress is made.
Report: "Availability issues within the us-east-1 (Virginia) region"
Last update: We have not observed any new issues over the past couple of hours so we'll go ahead and mark this incident as resolved.
We have not received official word from AWS about the issue so far, but we're no longer seeing any issues affecting our customers' servers, and the number of reported issues on external monitoring sites has since dropped substantially. We're continuing to monitor the status of our hosting infrastructure for now.
We're currently investigating a potential AWS issue within the us-east-1 (Virginia) region. We're seeing server connectivity issues that auto-resolve within a few minutes. A small number of sites may have experienced a brief outage while the network connection to their server was degraded, but we haven't seen these outages last more than a couple of minutes.
Report: "Shared Hosting Maintenance - web124/web125"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
Pagely engineers will be performing maintenance on the following shared web nodes:
- web124
- web125
During maintenance, dynamic/uncached requests to these web nodes will not respond for 1-2 minutes. Within Atomic, you can determine if your apps are affected by this maintenance by referring to the "Server Name" under "Primary Server Info" within your app overview. If the server name listed is not in the list above, then your app is not affected by this maintenance.
Report: "VPS User Provisioning"
Last update: A fix has been implemented and tested successfully. No further issues are noted at this time.
We are currently aware of and reviewing issues with creating, editing, or adding new users or SSH keys. This does not affect any existing users or keys, only new keys and new users at this time. We will post additional updates as we have more information towards a solution.
Report: "Security Alert for customers using CircleCI with Pagely."
Last update: This security alert is now resolved; however, we still urge customers to take the precautions outlined in the steps within this alert. We've also provided the full incident report from CircleCI, which was posted on January 13th and can be read on their blog: https://circleci.com/blog/jan-4-2023-incident-report/
CircleCI recently disclosed a security event on their blog: https://circleci.com/blog/january-4-2023-security-alert/ The nature of the disclosure relates to potential compromise of all secrets stored within a repository on the CircleCI platform. While CircleCI has taken steps since the initial disclosure to automatically rotate what they can for you, there are certain things that rely on you to fully resolve the matter. We urge customers using CircleCI to take the following steps as soon as possible:
- Immediately rotate any and all secrets stored in CircleCI. There's a tool available to fetch all of your secrets from CircleCI. (https://github.com/CircleCI-Public/CircleCI-Env-Inspector)
- Delete and re-create any CI/CD Integrations or Webhook configurations in Atomic if they were used with CircleCI. Full documentation can be found here: https://support.pagely.com/hc/en-us/articles/360050828232-Automatically-Deploying-Your-WordPress-Site-with-CircleCI - after recreating your integrations, you will need to update the integration ID and secret within your pipeline configuration.
- If you are using SSH keys to perform any deployments, please regenerate those as well.
If you have any questions or concerns regarding this event, please do not hesitate to Contact Pagely Support: https://support.pagely.com/hc/en-us/articles/114094215332-Contacting-Support
Report: "Redis 6.2.10 Security Release"
Last update: The scheduled maintenance has been completed.
Pagely will be performing a network-wide upgrade of Redis to v6.2.10 beginning Jan 18 02:00 UTC to address security flaws in previous versions. Anticipated impact involves a flush of Redis cache which may cause a temporary increase in server load and response times until object caches are gradually and automatically repopulated. If you have any questions or concerns regarding this event, please do not hesitate to Contact Pagely Support: https://support.pagely.com/hc/en-us/articles/114094215332-Contacting-Support
Report: "Increased Load/Response Times for New Relic users"
Last update: Impacted hosts have had fixes applied to resolve this issue.
A fix has been implemented and is being deployed to all affected hosts. We will continue to monitor the situation.
We have identified an issue involving higher than normal server CPU usage and response times for customers using New Relic, and we are currently implementing a fix. We will post additional updates here as we have more information.
Report: "us-east-2 (Ohio) Availability Issues"
Last update: After further monitoring, no further issues appear to be happening within the us-east-2 Ohio region. Everything has been resolved at this time.
We've gone ahead and confirmed that all servers appear to be operational from within the us-east-2 Ohio region at this time. We will continue to monitor the region to ensure no other issues remain.
At this time, most of the AWS issues that appeared to be the root cause have been cleared. We are continuing to work on servers within the region that may still be unavailable to ensure they recover fully.
Latest update from AWS: Instance Impairments 11:25 AM PDT We continue to make progress in recovering the remaining EC2 instances and EBS volumes affected by the loss of power in a single Availability Zone in the US-EAST-2 Region. The vast majority of EC2 instances are now healthy, but we continue to work on recovering the remaining EBS volumes affected by the issue. EC2 API error rates and latencies have returned to normal levels. Elastic Load Balancing remains weighted away from the affected Availability Zone. Error rates and latencies for Lambda function invocations have now returned to normal levels. Power has been restored to all affected resources and remains stable. We expect the recovery of EC2 instances and EBS volumes to continue to improve over the next 30 minutes. For customers that need immediate recovery, we recommend failing away from the affected Availability Zone as other Availability Zones are not affected by this issue.
Further update from AWS: 10:49 AM PDT We continue to see recovery of EC2 instances that were affected by the loss of power in a single Availability Zone in the US-EAST-2 Region. At this stage, the vast majority of affected EC2 instances and EBS volumes have returned to a healthy state and we continue to work on the remaining EC2 instances and EBS volumes. Elastic Load Balancing has shifted traffic away from the affected Availability Zone. Single-AZ RDS databases were also affected and will recover as the underlying EC2 instance recovers. Multi-AZ RDS databases would have mitigated impact by failing away from the affected Availability Zone. While the vast majority of Lambda functions continue operating normally, some functions are experiencing invocation failures and latencies, but we expect this to improve over the next 30 minutes. Power has been restored to all affected resources and remains stable. We expect the recovery of EC2 instances and EBS volumes to continue to improve over the next 45 minutes. For customers that need immediate recovery, we recommend failing away from the affected Availability Zone as other Availability Zones are not affected by this issue.
Our team is working on failover options while AWS works to resolve this issue. Latest update from AWS: Instance Impairments 10:25 AM PDT We can confirm that some instances within a single Availability Zone (USE2-AZ1) in the US-EAST-2 Region have experienced a loss of power. The loss of power is affecting part of a single data center within the affected Availability Zone. Power has been restored to the affected facility and at this stage the majority of the affected EC2 instances have recovered. We expect to recover the vast majority of EC2 instances within the next hour. For customers that need immediate recovery, we recommend failing away from the affected Availability Zone as other Availability Zones are not affected by this issue.
We're currently investigating an issue with the AWS us-east-2 (Ohio) region causing an outage for servers within that region.
Report: "Tickets and Chats Outage"
Last update: As of 12:15 PM MST, the issues and errors around tickets and chats from our provider have been resolved.
Latest update: July 28, 2022 11:36 AM: We are beginning to see some improvement in the error rates affecting the US-East region. Our team is monitoring and we will post another update in the next 30 minutes.
Update from provider: July 28, 2022 11:00 AM: Our team continues to investigate elevated error rates and access issues in the US-East region. We will post another update within the next 30 minutes.
Latest updates from the provider: We have confirmed access issues and high error rates in the US-East region. Further updates to come shortly.
We're currently investigating an issue with our Live Chat and Ticketing system in which the systems are intermittently available with our platform provider and working on a resolution in order to bring them fully back online.
Report: "Cloudflare Service Issues"
Last update: This incident has been resolved.
A fix has been implemented by Cloudflare and we are monitoring the results.
Cloudflare is currently investigating issues across their entire network. Pagely customers using Cloudflare may be affected and unable to access their sites.
Report: "Ares Configuration Management Issue in Atomic Dashboard"
Last update: We have tested and verified the issue is now resolved.
A fix has been implemented and we are monitoring the results.
We've identified the issue and are implementing a fix.
The issue also affects the provisioning of new apps - we ask that you please hold off on creating new apps in Atomic until the issue is resolved. If you urgently need to create a new app then please contact our Support.
We are continuing to investigate this issue.
Pagely Engineers are investigating an issue impacting the ability to add HTTP redirects, custom access rules and other ARES configuration operations within the Atomic Dashboard. This does not affect the availability or performance of any existing websites. If you are planning to perform any ARES rule management operations in Atomic, we sincerely apologize for this inconvenience and ask that you hold off on making changes within Atomic until we have the problem resolved.
Report: "Site Management Problems in Atomic Dashboard"
Last update: We have tested and verified the issue is now resolved.
The fix has been deployed and we are currently verifying that it has resolved the issue. This incident will be resolved if all looks good.
We have identified the cause of the reported issues and a fix is on the way out.
Pagely Engineers are investigating an issue impacting the ability to add new sites, manage PHP versions, and perform other site management operations within the Atomic Dashboard. This does not affect the availability or performance of any existing websites. If you are planning to perform any site management operations in Atomic, we sincerely apologize for this inconvenience and ask that you hold off on making changes within Atomic until we have the problem resolved.
Report: "Slack Outage"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
Due to the widespread Slack outage, customers will be unable to access their private Slack support channels to interact with Pagely support. Live chat and Atomic support tickets are still functioning as normal.
Report: "Cloudflare Possible Network Performance Issues in West Coast (USA)"
Last update: This incident has been resolved.
Similar to issues experienced earlier today, Cloudflare is now reporting: "Cloudflare is Observing Possible Network Performance Issues in West Coast (USA)" If you are routing your Pagely traffic through Cloudflare, you may be experiencing Cloudflare error response codes like 520 or 524. Cloudflare has been updating the status of this incident on their end at https://www.cloudflarestatus.com/ Pagely hosting itself is not directly affected.
Report: "Cloudflare network congestion"
Last update: Cloudflare is now reporting "All Systems Operational."
Since approximately 17:29 UTC or before, Cloudflare has been experiencing network congestion. If you are routing your Pagely traffic through Cloudflare, you may be experiencing Cloudflare error response codes like 520 or 524. Cloudflare has been updating the status of this incident on their end at https://www.cloudflarestatus.com/ Pagely hosting itself is not directly affected.
Report: "WordPress 5.8.3 Security Release"
Last update: Upgrades are now complete.
The Pagely team has already begun rolling out this patch for all customers. If you have a version hold request on file, we will patch your site while keeping it on the same major branch version.
Report: "AWS incident causing availability issues for multiple VPS's"
Last update: Update from Pagely: While we will continue monitoring for any issues, this issue is now resolved. --- Update from AWS: [4:22 PM PST] Starting at 4:11 AM PST some EC2 instances and EBS volumes experienced a loss of power in a single data center within a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. Instances in other data centers within the affected Availability Zone, and other Availability Zones within the US-EAST-1 Region were not affected by this event. At 4:55 AM PDT, power was restored to EC2 instances and EBS volumes in the affected data center, which allowed the majority of EC2 instances and EBS volumes to recover. However, due to the nature of the power event, some of the underlying hardware experienced failures, which needed to be resolved by engineers within the facility. Engineers worked to recover the remaining EC2 instances and EBS volumes affected by the issue. By 2:30 PM PST, we recovered the vast majority of EC2 instances and EBS volumes. However, some of the affected EC2 instances and EBS volumes were running on hardware that has been affected by the loss of power and is not recoverable. For customers still waiting for recovery of a specific EC2 instance or EBS volume, we recommend that you relaunch the instance or recreate the volume from a snapshot for full recovery.
Update from Pagely: At this time our engineering team is not observing any issues with our customer's servers, databases, or sites. Our team will continue monitoring for any issues and providing relevant updates as they become available. --- Latest Updates from AWS: [12:03 PM PST] Over the last hour, after addressing many of the underlying hardware failures, we have seen an accelerated rate of recovery for the affected EC2 instances and EBS volumes. We continue to work on addressing the underlying hardware failures that are preventing the remaining EC2 instances and EBS volumes. For customers that continue to have EC2 instance or EBS volume impairments, relaunching affected EC2 instances or recreating affecting EBS volumes within the affected Availability Zone, continues to be a faster path to full recovery. [1:39 PM PST] We continue to make progress in addressing the hardware failures that are delaying recovery of the remaining EC2 instances and EBS volumes. At this stage, if you are still waiting for an EC2 instance or EBS volume to fully recover, we would strongly recommend that you consider relaunching the EC2 instance or recreating the EBS volume from a snapshot. As is often the case with a loss of power, there may be some hardware that is not recoverable, which will prevent us from fully recovering the affected EC2 instances and EBS volumes. We are not quite at that point yet in terms of recovery, but it is unlikely that we will recover all of the small number of remaining EC2 instances and EBS volumes. If you need help in launching new EC2 instances or recreating EBS volumes, please reach out to AWS Support. [3:13 PM PST] Since the last update, we have more than halved the number of affected EC2 instances and EBS volumes and continue to work on the remaining EC2 instances and EBS volumes. The remaining EC2 instances and EBS volumes have all experienced underlying hardware failures due to the nature of the initial power event, which we are working to resolve. We expect to make further progress on this list within the next hour, but some of the remaining EC2 instances and EBS volumes may not be recoverable due to hardware failures. If you have the ability to relaunch an affected EC2 instance or recreate an affected EBS volume from snapshot, we continue to strongly recommend that you take that path.
Latest update from AWS: [11:08 AM PST] We continue to make progress in restoring power and connectivity to the remaining EC2 instances and EBS volumes, although recovery of the remaining instances and volumes is taking longer than expected. We believe this is related to the way in which the data center lost power, which has led to failures in the underlying hardware that we are working to recover. While EC2 instances and EBS volumes that have recovered continue to operate normally within the affected data center, we are working to replace hardware components for the recovery of the remaining EC2 instances and EBS volumes. We have multiple engineers working on the underlying hardware failures and expect to see recovery over the next few hours. As is often the case with a loss of power, there may be some hardware that is not recoverable, and so we continue to recommend that you relaunch your EC2 instance, or recreate you EBS volume from a snapshot, if you are able to do so.
Latest update from AWS: [9:28 AM PST] We continue to make progress in restoring connectivity to the remaining EC2 instances and EBS volumes. In the last hour, we have restored underlying connectivity to the majority of the remaining EC2 instance and EBS volumes, but are now working through full recovery at the host level. The majority of affected AWS services remain in recovery and we have seen recovery for the majority of single-AZ RDS databases that were affected by the event. If you are able to relaunch affected EC2 instances within the affected Availability Zone, that may help to speed up recovery. Note that restarting an instance at this stage will not help as a restart does not change the underlying hardware. We continue to work towards full recovery.
Almost all customer servers should be fully recovered at this time. Some high availability clusters are still running in impaired mode but that doesn't affect the availability of the sites at the moment.
Latest update from AWS: [6:51 AM PST] We have now restored power to all instances and network devices within the affected data center and are seeing recovery for the majority of EC2 instances and EBS volumes within the affected Availability Zone. For the remaining EC2 instances, we are experiencing some network connectivity issues, which is slowing down full recovery. We believe we understand why this is the case and are working on a resolution. Once resolved, we expect to see faster recovery for the remaining EC2 instances and EBS volumes. If you are able to relaunch affected EC2 instances within the affected Availability Zone, that may help to speed up recovery. Note that restarting an instance at this stage will not help as a restart does not change the underlying hardware. We have a small number of affected EBS volumes that are still experiencing degraded IO performance that we are working to recover. The majority of AWS services have also recovered, but services which host endpoints within the customer’s VPCs - such as single-AZ RDS databases, ElasticCache, Redshift, etc. - continue to see some impact as we work towards full recovery.
We are continuing to work on a fix for this issue.
Update from Pagely: We are starting to see signs of recovery and have restored a portion of the affected servers. Some servers and RDS instances are still unavailable so we're working towards restoring those. Latest update from AWS: [5:39 AM PST] We have now restored power to all instances and network devices within the affected data center and are seeing recovery for the majority of EC2 instances and EBS volumes within the affected Availability Zone. Network connectivity within the affected Availability Zone has also returned to normal levels. While all services are starting to see meaningful recovery, services which were hosting endpoints within the affected data center - such as single-AZ RDS databases, ElastiCache, etc. - would have seen impact during the event, but are starting to see recovery now. Given the level of recovery, if you have not yet failed away from the affected Availability Zone, you should be starting to see recovery at this stage.
The issue is also affecting our Atomic dashboard, which is currently unavailable or returning errors intermittently.
Latest update from AWS: [5:18 AM PST] We continue to make progress in restoring power to the affected data center within the affected Availability Zone (USE1-AZ4) in the US-EAST-1 Region. We have now restored power to the majority of instances and networking devices within the affected data center and are starting to see some early signs of recovery. Customers experiencing connectivity or instance availability issues within the affected Availability Zone, should start to see some recovery as power is restored to the affected data center. RunInstances API error rates are returning to normal levels and we are working to recover affected EC2 instances and EBS volumes. While we would expect continued improvement over the coming hour, we would still recommend failing away from the Availability Zone if you are able to do so to mitigate this issue.
Latest update from AWS: [5:01 AM PST] We can confirm a loss of power within a single data center within a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. This is affecting availability and connectivity to EC2 instances that are part of the affected data center within the affected Availability Zone. We are also experiencing elevated RunInstance API error rates for launches within the affected Availability Zone. Connectivity and power to other data centers within the affected Availability Zone, or other Availability Zones within the US-EAST-1 Region are not affected by this issue, but we would recommend failing away from the affected Availability Zone (USE1-AZ4) if you are able to do so. We continue to work to address the issue and restore power within the affected data center.
Update from AWS: [4:35 AM PST] We are investigating increased EC2 launch failures and networking connectivity issues for some instances in a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. Other Availability Zones within the US-EAST-1 Region are not affected by this issue.
An issue with AWS in US-EAST-1 (Virginia) region is currently causing an outage for a large number of servers in the region.
Report: "Atomic Control Panel - API Errors"
Last update: This incident has been resolved.
We're experiencing an elevated level of API errors within Atomic and are currently looking into the issue.
Report: "AWS incident causing availability issues with Atomic"
Last update: This incident has been resolved.
Latest updates from AWS: [2:04 PM PST] We have executed a mitigation which is showing significant recovery in the US-EAST-1 Region. [...] We still do not have an ETA for full recovery at this time. [2:43 PM PST] We have mitigated the underlying issue that caused some network devices in the US-EAST-1 Region to be impaired. [...] We continue to work toward full recovery for all impacted AWS Services and API operations. [...] [3:03 PM PST] Many services have already recovered, however we are working towards full recovery across services. [...] https://status.aws.amazon.com/ --- Update from Pagely: At this time Atomic is mostly stable, however some actions from within the dashboard may not function fully at this time. Pagely engineers are continuing to reinstate full functionality of Atomic still. We will continue our efforts and appreciate all of your patience with this continuing issue.
Latest update from AWS: [11:26 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. [...] The root cause of this issue is an impairment of several network devices in the US-EAST-1 Region. We are pursuing multiple mitigation paths in parallel, and have seen some signs of recovery, but we do not have an ETA for full recovery at this time. [...] https://status.aws.amazon.com/ --- Update from Pagely: Pagely Engineers have been working diligently to reinstate full functionality of Atomic. Unfortunately, the nature of the issue occurring at AWS limits our options for viable workarounds. We will continue our efforts to remediate the issue. The impact of this issue as it relates to Pagely is limited to the Atomic Dashboard. Our data does not indicate any problems with your actual websites or host servers. With that said, we will continue to monitor things using systems residing outside of AWS and remediate any issues that may occur. Thank you for your continued patience while this incident is ongoing.
Customer API and CI/CD should be working but may intermittently return errors.
The outage further affects the Atomic API and CI/CD Integrations.
An issue with AWS in us-east-1 (Virginia) region is currently causing an outage of the Atomic control panel.
Report: "Atomic Login Issues"
Last update: The issue preventing users from logging into the Atomic control panel has been resolved. Feel free to contact us if you are still experiencing any issues.
A fix has been applied to allow users to log in again. We will continue to monitor for further issues. If you continue to experience any issues logging in, don't hesitate to contact our support team.
Pagely Engineers are currently investigating issues with logging into the Atomic control panel. Some users may not be able to log in to their accounts at this time due to these issues. We will update as progress is made to fix the issues.
Report: "Migration to new payment gateway"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
In an ongoing effort to improve our tools and our users' experience, we are currently migrating to a new billing system. During this migration, our support team will be available to assist with any billing information updates. Please submit a ticket (https://support.pagely.com/hc/en-us/articles/114094215332-Contacting-Support) for additional details. We appreciate your patience and apologize for any inconvenience this may cause.
Report: "Facebook Services Down -"
Last update: Facebook is back online and functioning normally at this time. Our monitoring doesn't show any further performance issues in relation to Facebook's outage.
Facebook and services that are part of Facebook have all appeared to come back online. However, some services may still be recovering at this time. You can view the status of some Facebook services at the following: https://status.fb.com/ Customers utilizing some of their services may still experience some issues at this time.
We're currently aware that Facebook services appear to be fully down at this time. As a result of this, sites that may utilize Facebook in any way may be running into timeouts or "Service Unavailable" errors.
Report: "WordPress 5.8.1 Security and Maintenance Release"
Last update: Upgrades are now complete.
The Pagely team has already begun rolling out this patch for all customers. If you have a version hold request on file, we will patch your site while keeping it on the same major branch version.
Report: "Intermittent connectivity issues"
Last update: The issue has now been resolved and our servers are no longer experiencing connectivity issues.
We're continuing to investigate this issue. We're being alerted about intermittent 5-10 minute outages. We're not seeing errors or signs of resource contention on the servers; however, some requests do not reach the servers due to networking issues. A small number of servers in the Ohio hosting region may also be affected.
We are currently investigating what appear to be connectivity/networking issues on AWS's side. This is causing intermittent timeout errors over a wide range of sites and has so far only been affecting servers in the Virginia hosting region.
Report: "Chat System Outage"
Last update: This issue is now resolved and live chat is operational.
We're currently investigating an issue with our Live Chat system with our platform provider and working on a resolution in order to bring it back online.
Report: "Fastly is having an outage"
Last update: Fastly status is reporting the incident as resolved: https://status.fastly.com/ Our monitoring does not show any sites still down due to the outage. As such we are marking this incident as resolved.
Fastly status page now shows "The issue has been identified and a fix is being implemented". Customers that use Fastly can monitor https://status.fastly.com/ for further updates. Our monitoring is not reporting any sites as being down at this time.
Fastly appears to be having an outage. We are unable to load the main page from several locations and some client sites using Fastly are reporting down. Fastly has posted an update on their status page saying that they're currently investigating the issue: https://status.fastly.com/ Customers using Fastly may experience their sites being down at this time. Pagely hosting is unaffected.
Report: "WordPress Core Updates to 5.7.1"
Last update: Upgrades are now complete.
We have approved the release quality of WordPress version 5.7.1 and will be deploying it for all customer websites which don't have a version hold configured. Every customer is always up to date with their respective latest minor security release, and a more consistent core version will ensure better compatibility as well as improve logistics. Completion is expected by the 28th of April.
Report: "Atomic Control Panel - Elevated Response Times"
Last update: This incident has been resolved.
A fix has been applied and we are currently monitoring performance. All services should be operating normally at this time, however please do not hesitate to contact our Support Team if you are still experiencing any problems with Atomic. A resolution will be posted once we conclude things are back to normal.
Pagely Engineers are currently investigating elevated response times for interactions within the Atomic Control Panel. This does not relate to the performance of your actual hosting service. Certain interactions within Atomic, such as loading different sections or performing searches, may be performing inconsistently or returning error messages for exceeding timeouts. We will update this page as progress is made.
Report: "International traffic into the Sydney datacenter network connectivity issues"
Last update: The external internet provider to the Sydney region network has been shifted to recover from this event. This incident has been resolved.
The issue has been identified and a fix is being implemented by our underlying provider for the Sydney region. The network connectivity monitoring alerts have resolved, but we're continuing to keep an eye on things.
Internal monitoring has alerted us to network connectivity issue in the Sydney region, that our Operations team is working to resolve. Some international visitor requests outside of Australia into the Sydney region may be impacted while this is ongoing. Visitor requests within Australia should not be impacted at this time. Apologies on the disruption and a further update will be posted shortly.
Report: "Slack Outage"
Last update: Slack is reporting that this issue is resolved, so customers with private Slack support channels should be able to use them as normal at this time.
At this time, Slack is connecting but may still be slow to respond.
Due to the widespread Slack outage, customers will be unable to access their private Slack support channels to interact with Pagely support. Live chat and Atomic support tickets are still functioning as normal.
Report: "RDS connectivity issue on p20-aurora-2 shared RDS"
Last update: Everything has looked stable since yesterday's database restoration. This incident is now resolved.
At this time your sites are operational again. Pagely Engineers have reinstated your databases to a very near point-in-time recovery. The first signs of problems occurred at 18:18 UTC, and we have restored your databases to a point-in-time backup at 18:00 UTC.
The issue has been identified and a fix is being implemented.
Internal monitoring has alerted to an issue on a shared RDS (p20-aurora-2) that our Operations team is working to resolve. This appears to be related to a bug on the Amazon RDS service. We are currently working to bring up a new RDS cluster from a recent point in time to send traffic to it. Some customer sites, particularly uncached traffic requests, may be impacted while this is ongoing. Apologies on the disruption and a further update will be posted shortly.
Report: "RDS connectivity issue on a virginia-aurora shared RDS."
Last update: After more discussion with the database team at Amazon Web Services, we have a better understanding of the failure that occurred. The root cause was determined to be a rare bug within the Aurora database engine. The bug causes the mysql process to be unable to start up properly when an ALTER statement is interrupted by a DB instance reboot. This is why we were unable to launch new instances or new DB clusters from point-in-time recovery targets after the ALTER was issued, which contributed to the extended downtime you experienced. Amazon was able to correct the problem for the affected system and they have confirmed that this bug will be fixed in an upcoming RDS update. Pagely will apply the patch to all of our Aurora Database Clusters as soon as it becomes available. In the interim, our DevOps team knows about this issue and we feel confident that it will not recur. The conditions for triggering this bug are very specific, which we can account for and avoid rather easily. Our team has adopted new Standard Operating Procedures which take this condition into account when interacting with our database clusters. We appreciate your patience and understanding both during and following this event.
Summary
On Friday December 4, 2020, one of our managed Amazon Aurora Database Clusters, vps-virginia-aurora-3, experienced an extended outage lasting approximately three and a half hours. A relatively small portion of the overall sites hosted in this region reside on this DB cluster; that is to say, there was plenty of spare capacity at the time of the event. Affected sites experienced database connection errors throughout the duration of the event. The recovery point of your data when services were restored is approximately 15-70 minutes prior to the onset of the service interruption.
- Database services for the affected sites were unavailable between 9:15AM PST and 12:30PM PST.
- By 12:30PM PST, service availability was restored to a backup DB cluster with a 9:00AM PST point-in-time.
- By approximately 7:00PM PST, all restored sites were fully migrated to a brand new Aurora RDS DB Cluster and away from the problem system.
More Details
Our investigation is ongoing and we are working closely with the team at Amazon to fully understand the nature of this issue. Although database issues have happened in the past, they are usually resolved within a few minutes, not hours. An event of this nature had not occurred for us before. We need more time to investigate the matter with Amazon before we can say definitively what the cause was. Rest assured, both Pagely and Amazon are interested in finding a root cause so that a similar event cannot occur in the future. We have already had some great discussions on mitigating this type of impact in the future, and we continue to work on determining a root cause. We can tell you that the behavior we observed of this Aurora Cluster was not typical and it also got the attention of the database team at AWS who, independent of Pagely's investigation, noticed the DB cluster was behaving erratically and connected with us to let us know they're applying an emergency fix. Typical actions such as adding a reader instance, performing a failover, or restarting a DB instance were not working for this cluster until steps were taken by AWS to address a problem they were seeing. While the issue was ongoing with the original DB cluster, Pagely Engineers were also launching new DB clusters with varying point-in-time recovery targets. This is a proactive step we will take if we feel the time it could take for a system to recover exceeds the time it may take to launch a new DB cluster with slightly older data. Our goal during these moments is to get sites running again as quickly as possible and with the last-known good set of data. At a certain point in the incident, because things were taking so long, we told you we'd restore from older (less than 24hrs) SQL backups, but we actually were able to get a DB cluster launched with a fairly recent point-in-time recovery target (15-70 minutes old). After this recovery was performed, and with the assistance of AWS, the originally affected system was also brought back to an operational state. This system is currently under evaluation and is not currently powering any of your live sites. Migrations were performed to get all affected sites relocated to a completely different and newly built Aurora DB cluster. With that said, if you think you are missing any data please let us know and we can provide you with a separate SQL dump from the affected system for manual examination. We want to assure you that every step was taken to restore service availability as soon as possible and with the most current possible data set.
Some of these operations take time to complete, even when everything is working correctly. When things are not working correctly, recovery timelines can be impacted further. We have a playbook we follow in these situations and we always try to think a few steps ahead. This typically leads to no or very little noticeable impact to your services, but then there are days like today. We always work to keep events of this severity a rarity, if not a faint memory, most of the time, and we thank you for your understanding as we worked to get things back to normal.
At this time your sites are operational again. Pagely Engineers have reinstated your databases to a very near point-in-time recovery. We did not need to resort to the older SQL backups. The first signs of problems occurred between 16:15 and 17:00 UTC, and we have restored your databases to a point-in-time backup at 16:00 UTC. Further efforts are underway to migrate your databases to a final placement on one of our very newest DB clusters. We will follow up shortly with additional information, including a root cause analysis and issuance of service credits. Thank you.
Pagely Engineers continue to wait for the most recent data sets in our restoration efforts to complete provisioning. At this time, we will begin restoring affected sites from your regular SQL backups. This data is slightly older, but no more than 24 hours old. This is only being done because of the extended time it is taking to recover sites with more current data; we'd like to reinstate site availability as soon as possible. Our team will happily assist in providing more recent data after that is made available by the system.
Pagely Engineers are still working to restore availability of the databases for all affected sites. We sincerely apologize for the extended delay; a full post-mortem will be provided after services are restored. Our team is currently working through a novel failure case that is not fixable by the typical remediation steps we take - such as adding a new replica instance to a DB cluster - and attempts to launch a new cluster based on the latest point in time are also taking an extended period of time to complete. While we are waiting for these contingency measures to finish provisioning, the originally affected database cluster is beginning to show signs of self-recovery. So our team will continue to assess the situation and make a decision soon based on the earliest available resource to restore your applications. Depending on the outcome of that, the data may be very current or slightly (15-30 minutes) behind. We will continue to report on progress as it is made.
The vps-virginia-aurora-3 database cluster has experienced a critical failure. Although data integrity is still okay, we are having trouble getting the DB cluster to start. Pagely Engineers have already initiated the process to launch a new DB cluster with the latest available data set as a point-in-time recovery. Once this resource has finished creating, we will update your application to use the new endpoint.
We are continuing to work on a fix for this issue.
Internal monitoring has alerted to an issue on a shared RDS that our Operations team is working to resolve. Some customer sites, particularly uncached traffic requests, may be impacted while this is ongoing. Apologies on the disruption and a further update will be posted shortly.