Is Platform.sh Down Right Now? Discover if there is an ongoing service outage.

Platform.sh is currently Operational

Last checked Jul 29, 2025 14:26 UTC from Platform.sh's official status page

Historical record of incidents for Platform.sh

Jun 27, 2025

Report: "UI issue in Upsun Console"

Last update 2025-06-27T19:45:03.300Z

investigating2025-06-27T19:45:03.298Z

We’ve identified an issue where the Autoscaling feature is mistakenly visible in the Upsun Console interface. While the UI appears, the feature itself is not yet available and is currently non-functional. Our team is actively working on reverting this unintended change. In the meantime, please note that enabling or configuring Autoscaling will not have any effect. We apologize for the confusion and will provide an update once the rollback is complete. If you have any questions about this issue, please contact Platform.sh Support by logging in to https://console.platform.sh/ and visiting the Support Center.

Jun 16, 2025

Report: "Potential storage performance issues on FR-3"

Last update 2025-06-16T07:45:51.029Z

identified2025-06-16T07:45:51.026Z

Our monitoring has detected some issues with our cloud infrastructure provider which affect all sites hosted on the FR-3 region. Our operations team has been notified and they are investigating the issue with our provider. We will update you as soon as we have further information.

Jun 12, 2025

Report: "Unscheduled Maintenance on ch-1.platform.sh"

Last update 2025-06-12T12:56:00.026Z

resolved2025-06-12T09:00:00.000Z

The CH-1 region required an unscheduled maintenance period due to load concerns between 0900 and 1100 UTC. Our Operations team increased the capacity of the underlying hosts to provide additional capacity to the gateways. Projects hosted on CH-1 may have experienced performance issues with Console WebUI, SSH, and Git Integration access to projects, as well as connection issues with deployment activities during this maintenance.

Report: "Partial Outage on FR-3"

Last update 2025-06-12T10:43:10.566Z

investigating2025-06-12T10:43:10.562Z

We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service.

Report: "Partial Outage on au.platform.sh"

Last update 2025-06-12T05:46:43.359Z

monitoring2025-06-12T05:46:43.350Z

We have detected some timeouts with the API services on the au.platform.sh region.

- API, console, CLI, SSH and deployments are affected
- Live sites are not affected

Our operations team is investigating.

Jun 4, 2025

Report: "Partial Outage on EU-5"

Last update 2025-06-04T11:35:33.046Z

investigating2025-06-04T11:35:33.042Z

Status moved back to investigating due to potential issues.

monitoring2025-06-04T11:21:10.323Z

Alerts have resolved. We are monitoring this situation.

identified2025-06-04T11:06:07.584Z

The issue has been identified and a fix is being implemented.

investigating2025-06-04T11:01:13.811Z

We noticed a partial outage affecting only a few projects. We are currently investigating the issue.

Report: "Partial Outage on fr-3.platform.sh"

Last update 2025-06-04T07:42:03.172Z

postmortem2025-06-04T07:32:44.491Z

# What Happened

Recent maintenance on the fr-3.platform.sh region caused some subsystems to be unresponsive. This resulted in an outage of the project console, API, and subsystems in the fr-3.platform.sh region. Live environments on Grid or Dedicated infrastructure were not affected.

# Customer Impact

From 2025-06-03 20:00 UTC to 2025-06-04 05:26 UTC, some customers may have had trouble accessing the project console, project API, SSH, and submitting deployments and backups.

# What was done to resolve the incident

Our team corrected the internal states of those subsystems in order to make them operational again.
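The impact window in this postmortem crosses midnight, so its length is easy to misread. The following is a minimal sketch of the arithmetic, assuming both timestamps are UTC as stated; the helper name `outage_duration` is hypothetical and not part of any Platform.sh tooling.

```python
from datetime import datetime, timezone


def outage_duration(start: str, end: str) -> str:
    """Return the elapsed time between two 'YYYY-MM-DD HH:MM' UTC timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    # Attach UTC explicitly so the subtraction is unambiguous.
    started = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    ended = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    total_minutes = int((ended - started).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# Impact window quoted in the postmortem above.
print(outage_duration("2025-06-03 20:00", "2025-06-04 05:26"))  # prints "9h 26m"
```

The same helper can be applied to the other impact windows in this record, provided the full date is given for both ends.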

resolved2025-06-04T06:57:21.190Z

This incident has been resolved.

monitoring2025-06-04T05:19:37.743Z

Services have been restored and we are monitoring to ensure stability

identified2025-06-04T04:12:57.300Z

We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. The console and project services (deployments) are currently unavailable for some projects on the region. Production site uptime is NOT affected. We will update you as soon as we have further information.

Jun 3, 2025

Report: "Partial Outage on fr-3.platform.sh"

Last update 2025-06-03T23:12:00.000Z

Identified2025-06-03T23:12:00.000Z

We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. The console and project services (deployments) are currently unavailable for some projects on the region. Production site uptime is NOT affected. We will update you as soon as we have further information.

May 31, 2025

Report: "Scheduled Maintenance – Accounts Service"

Last update 2025-05-31T00:53:00.000Z

Completed2025-05-31T00:53:00.000Z

The scheduled maintenance has been completed.

Update2025-05-31T00:52:00.000Z

Scheduled maintenance is still in progress. We will provide updates as necessary.

In progress2025-05-30T23:00:00.000Z

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Scheduled2025-05-29T05:34:00.000Z

We will be performing scheduled database maintenance on the Accounts service to improve system performance and reliability. During this time, certain account-related functionalities may be temporarily unavailable, including project creation, ticket submission, user management, provisioning, billing, updating billing addresses, payment methods, SSH keys, and profile pictures. All customer environments will remain available, and the project Console will continue to function normally throughout the maintenance window. If you have any questions or concerns, please don't hesitate to reach out via our discord channel by logging into https://discord.gg/PkMc2pVCDV.

May 28, 2025

Report: "Routine Maintenance in ca-1.platform.sh"

Last update 2025-05-28T23:51:00.000Z

Completed2025-05-28T23:51:00.000Z

The scheduled maintenance has been completed.

In progress2025-05-28T23:00:00.000Z

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Scheduled2025-05-23T03:47:00.000Z

Platform.sh has scheduled a maintenance window in the ca-1.platform.sh region. The host servers that run the region will be rebooted and/or upgraded. Downtime during this maintenance is expected only for a small proportion of the region’s projects running on the affected host. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can access our public chat channel using https://chat.platform.sh.

Sincerely,
Platform.sh Support Team

Report: "Partial Outage on EU-5"

Last update 2025-05-28T12:30:59.326Z

postmortem2025-05-28T12:30:31.383Z

A host in the grid failed, and the automatic evacuation of its running containers also failed. The Operations team manually evacuated those containers to other hosts, and the faulty host was replaced with a new one.

resolved2025-05-26T14:26:20.150Z

This incident has been resolved.

monitoring2025-05-26T12:51:26.448Z

A fix has been implemented and we are monitoring the results.

identified2025-05-26T12:38:19.217Z

The issue has been identified and Engineers are working on a fix.

investigating2025-05-26T12:34:01.024Z

We are continuing to investigate this issue.

investigating2025-05-26T12:33:30.224Z

We noticed a few sites on alert due to a host service issue on EU-5. This is for Grid only. Our Engineers are working on it.

Report: "Partial Outage on EU-5"

Last update 2025-05-28T07:30:00.000Z

Postmortem2025-05-28T07:30:00.000Z

Resolved2025-05-26T09:26:00.000Z

This incident has been resolved.

Monitoring2025-05-26T07:51:00.000Z

A fix has been implemented and we are monitoring the results.

Identified2025-05-26T07:38:00.000Z

The issue has been identified and Engineers are working on a fix.

Update2025-05-26T07:34:00.000Z

We are continuing to investigate this issue.

Investigating2025-05-26T07:33:00.000Z

We noticed a few sites on alert due to a host service issue on EU-5. This is for Grid only. Our Engineers are working on it.

May 16, 2025

Report: "Partial Outage on us-3"

Last update 2025-05-16T12:16:09.233Z

postmortem2025-05-16T12:05:03.458Z

## **What happened**

Our monitoring systems detected slow heartbeat responses from several storage nodes. Initial analysis showed network latency or disk I/O contention on specific nodes. This incident did not affect Dedicated infrastructure.

## **Customer Impact**

Customer impact occurred between 08:20 and 09:00 UTC on 2025-05-16. Some live sites in the affected region experienced an outage, and customers were unable to access the console or conduct any deployments.

## **What was done to resolve the incident**

The storage back-end healed itself.

resolved2025-05-16T11:29:10.673Z

This incident has been resolved.

monitoring2025-05-16T10:45:24.649Z

Our systems are now stable and we are continuing to monitor them.

identified2025-05-16T09:22:43.078Z

The issue has been identified and the Engineering team is working on it.

investigating2025-05-16T09:11:06.620Z

We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience downtime or slowness. We will update you as soon as we have further information.

investigating2025-05-16T09:07:05.308Z

We identified a partial outage affecting our US-3 Grid projects.

May 9, 2025

Report: "Partial Outage on ca-1.platform.sh"

Last update 2025-05-09T04:20:06.534Z

resolved2025-05-09T03:45:00.000Z

A hardware issue with our upstream provider caused a single grid host to reboot unexpectedly. As a result, we had to manually fail-over the environments running on that host. We are investigating why the automatic fail-over did not take place and will ensure that the expected fail-over occurs in future. Customers may have experienced outages of up to 26 minutes during this incident.

Apr 17, 2025

Report: "Partial Outage on FR-3.platform.sh"

Last update 2025-04-17T07:56:56.050Z

postmortem2025-04-17T07:53:20.210Z

# What Happened

Our upstream provider for region FR-3.platform.sh had an unexpected network equipment issue during scheduled maintenance.

# Customer Impact

Between 2025-04-17 01:36 UTC and 03:21 UTC, services including project console UI, API, deployments, SSH, backups and customers' environments as well as live ones were unavailable.

# What was done to resolve the incident

We received a recovery report from our upstream provider and have restored the affected services and environments. Services and environments in region FR-3.platform.sh should now be available and accessible. Incident resolved.

resolved2025-04-17T04:13:06.590Z

This incident has been resolved.

monitoring2025-04-17T03:38:54.980Z

Our team has successfully restored the availability of those affected environments following the recovery report from the upstream provider. All FR-3 environments should now be accessible.

identified2025-04-17T02:27:17.303Z

Our upstream provider has confirmed that a network equipment issue is impacting their scheduled maintenance.

identified2025-04-17T02:10:56.995Z

Our upstream provider is reporting unspecified problems with their infrastructure. Our team will be keeping a close eye on the updates from our upstream provider and will take corrective actions as soon as possible.

investigating2025-04-17T01:49:22.866Z

We have detected an issue affecting service on the FR-3 region. Our Operations team has been notified and is currently working to restore service. Environments on affected regions may be unavailable. We will update you as soon as we have further information.

Mar 26, 2025

Report: "Partial outage of Console and Deployment services on FR-3 region"

Last update 2025-03-26T05:25:54.705Z

postmortem2025-03-26T05:23:08.794Z

# What Happened

A networking issue was found in region FR-3.platform.sh, affecting the subsystems responsible for handling console UI, project API, SSH and deployment functions. Customers may have experienced issues while accessing project console, interacting with project API, logging into environments with SSH, submitting deployments and backups.

# Customer Impact

From 2025-03-25 17:58 **UTC** to 2025-03-26 01:43 **UTC**, some customers were unable to access console, environments or submit any deployments.

There was no impact on environments or production sites.

# What was done to resolve the incident

Our team has restored the availability of subsystems responsible for handling console UI, project API and deployments functions. Incident resolved.

resolved2025-03-26T04:23:50.683Z

This incident has been resolved.

monitoring2025-03-26T02:05:15.767Z

A fix has been implemented and we are monitoring the results.

identified2025-03-26T02:04:51.457Z

Our Operations team has identified a networking issue and deployed a correction fix to restore the availability of those related subsystems.

investigating2025-03-26T01:57:41.644Z

We have detected an issue affecting service on the FR-3.platform.sh region. Some customers may not be able to submit deployments or access the console UI and/or project API. Our Operations team has been notified and is actively recovering the availability of those services. This issue does not affect availability of any environments, only the API services for the project. There is no site downtime as a result of this issue.

Mar 23, 2025

Report: "Degraded Performance of Console and Deployment services on AU, AU-2 regions."

Last update 2025-03-23T08:34:20.563Z

postmortem2025-03-23T08:30:07.789Z

# What happened

A recent upgrade to regions AU.platform.sh and AU-2.platform.sh was unable to fully purge the stale network states within our routing infrastructure. As a result, client programs within our subsystems responsible for handling deployments, console and project API ran busy-looping processes that retried invalid TCP connections indefinitely and consumed all the available CPU resources.

This caused the console to freeze, slow loading times when using project API functions, and delays in deployments or backups.

After further investigation, the Operations team deployed an emergency fix to eliminate those busy-looping processes. Our team has stopped the scheduled upgrades to other regions and is still working on a permanent fix.

# Customer impact

From 2025-03-20 12:50 **UTC** to 2025-03-21 09:00 **UTC**, customers may have experienced delays when loading console, triggering deployments or taking backups for their projects in regions AU.platform.sh and AU-2.platform.sh.

From 2025-03-21 07:27 **UTC** to 2025-03-21 07:55 **UTC**, there may have been issues with outgoing connections in AU.platform.sh environments, including live ones, due to our emergency fix deployment.

# What was done to resolve the incident

The Operations team has taken out the busy-looping processes in the subsystems for deployments, console, and project API functions and deployed an emergency correction fix to the affected regions.

Incident has been resolved.
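The failure mode described in this postmortem, clients retrying dead TCP connections in a tight loop until CPU is exhausted, is commonly avoided by bounding the number of attempts and sleeping between them. The sketch below illustrates that general technique (capped exponential backoff with jitter) and is not a description of the fix Platform.sh deployed; the function name `connect_with_backoff` and all of its parameters are hypothetical.

```python
import random
import socket
import time


def connect_with_backoff(host, port, max_attempts=5, base_delay=0.1,
                         max_delay=5.0, connect=socket.create_connection,
                         sleep=time.sleep):
    """Try to open a TCP connection, backing off between failed attempts.

    `connect` and `sleep` are injectable so the retry policy can be
    exercised without real network access or real waiting.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect((host, port), timeout=2.0)
        except OSError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            # Double the delay on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads out retries from many clients.
            sleep(delay * random.uniform(0.5, 1.0))
    # Give up instead of looping forever on a connection that cannot succeed.
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Unlike a busy loop, this gives up after a fixed number of attempts and spends the time between attempts asleep rather than consuming CPU.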

resolved2025-03-23T05:06:04.139Z

This incident has been resolved.

monitoring2025-03-21T09:10:02.040Z

After applying an emergency fix, we can see that the services are now operational for both AU and AU-2 regions. Our Operations team is still actively monitoring the services and implementing a permanent fix for this issue.

identified2025-03-21T08:51:36.000Z

AU is now fully recovered. Our engineers are still implementing a permanent fix for this issue.

identified2025-03-21T06:39:12.107Z

The outgoing connections issues have been corrected after deploying an emergency fix.

identified2025-03-21T06:35:24.551Z

We are continuing to work on a fix for this issue.

identified2025-03-21T06:28:10.997Z

We noticed that some customer environments are failing to make outgoing connections. Our engineers are still actively working on the recovery.

identified2025-03-21T05:23:27.663Z

AU-2 is now fully recovered.

identified2025-03-21T04:55:07.139Z

We have detected an issue affecting service on the AU and AU-2 regions. Our Operations team has been notified and is currently working to resolve the issue. Projects on affected regions may experience delays when loading console, triggering deployments or taking backups. This issue does not affect availability of any environments, only the API services for the project. There is no site downtime as a result of this issue.

Feb 14, 2025

Report: "Outage on EU-5"

Last update 2025-02-14T05:29:24.398Z

postmortem2025-02-14T05:13:13.322Z

WHAT HAPPENED

An incident at upstream provider (AWS) affecting networking resulted in outages in EU-5 region.

CUSTOMER IMPACT

Sites were unavailable from 23:30 UTC Feb 13 to 01:00 UTC Feb 14.

WHAT WAS DONE TO RESOLVE THE INCIDENT

Fixes implemented by upstream provider resolved the incident.

resolved2025-02-14T05:05:01.634Z

The upstream problem has been resolved, and we are not receiving any further alerts. As a result, we are going to mark the incident as resolved.

monitoring2025-02-14T01:00:49.440Z

Affected projects are back online as upstream provider has implemented fixes - we will continue to monitor the situation.

investigating2025-02-13T23:59:41.253Z

We have detected an issue at our upstream provider affecting service on the EU-5 region. This issue affects multiple production sites as well as development environments. Access to your site, project UI as well as Git and SSH access may be affected. We will update you as soon as we have further information.

Dec 13, 2024

Report: "Partial Outage on EU-5 region"

Last update 2024-12-13T07:20:33.391Z

resolved2024-12-13T07:20:33.360Z

We have not seen any further storage alerts in the region and are marking the incident as resolved.

monitoring2024-12-12T00:56:13.191Z

The Operations team has implemented fixes and the storage infrastructure is no longer in a degraded state. We have not received new outage reports; however, we will continue to monitor the situation.

identified2024-12-11T11:28:58.851Z

Our team has added additional storage nodes and is actively monitoring the region.

investigating2024-12-11T05:35:29.538Z

We have detected degraded performance due to updates to storage infrastructure affecting service on the EU-5 region. Our Operations team has been notified and is currently investigating. Projects on the region may experience web request time-outs. We will update you as soon as we have further information.

Report: "HTTP Traffic reporting in console unavailable on Upsun"

Last update 2024-12-13T07:14:09.209Z

resolved2024-12-13T07:14:09.194Z

Our engineers have completed the roll-out of the fix and HTTP Traffic reporting is now working in the Upsun console.

investigating2024-12-02T19:22:10.030Z

Our engineers are continuing to investigate this issue. We believe we have identified a mitigation path, and are working to roll it out at an appropriate time. We will continue to provide updates here as we have new information to offer.

investigating2024-11-28T23:18:15.278Z

We have detected an issue affecting HTTP Traffic reporting in the Upsun console. Our Engineers are investigating and working to resolve this issue. No site services are impacted by this issue; only reporting in the Upsun console is affected. We will update you as soon as we have further information.

Dec 5, 2024

Report: "Console issues when creating variables."

Last update 2024-12-05T11:36:43.692Z

resolved2024-12-05T11:36:43.678Z

Our Team has deployed the fix for the issue and the incident is now resolved.

identified2024-12-05T08:23:08.203Z

The issue has been identified and a fix is being implemented.

investigating2024-12-05T08:22:53.657Z

We are currently seeing an issue with our console when users are trying to create variables. Our Teams are actively investigating the issue. Please use the CLI to add variables for now. https://docs.platform.sh/development/variables/set-variables.html#create-project-variables

Report: "Reports of phishing emails from @internal.platform.sh"

Last update 2024-12-05T09:42:14.198Z

resolved2024-12-05T09:42:14.186Z

This incident has been resolved.

investigating2024-11-22T00:20:44.085Z

We are aware of reports of spam/phishing emails being sent from @internal.platform.sh. Out of an abundance of caution we recommend not opening these emails or clicking on any links. We're investigating the issue with our email provider Sendgrid and are working to resolve this issue. For any questions please contact our support team by creating a ticket. Information on how to do that is in our documentation: https://docs.platform.sh/learn/overview/get-support.html

Nov 29, 2024

Report: "Partial Outage on us-2"

Last update 2024-11-29T13:02:45.479Z

postmortem2024-11-29T12:28:10.746Z

## **What happened**

We detected a degradation of one or more physical storage drives in the us-2 region. As a result, one host was down and containers were unresponsive or performed poorly. After investigation, our engineers evacuated the containers to bring them back online.

## **Customer Impact**

Between 2024-11-20 13:10 UTC and 13:55 UTC, Platform.sh Grid customers' containers had disk read/write issues.

## **What was done to resolve the incident**

Our team quickly evacuated the containers to another host and re-opened / restarted the affected containers and restored availability.

resolved2024-11-20T13:54:27.461Z

This incident has been resolved now.

monitoring2024-11-20T13:29:08.747Z

A fix has been implemented and we are monitoring it.

identified2024-11-20T13:22:07.096Z

The issue has been identified and we are working on a fix.

investigating2024-11-20T13:12:27.468Z

We have detected an issue affecting services on the us-2 region. Our Operations team has been notified and is currently working to restore service. Projects may experience downtime. We will update you as soon as we have further information.

Report: "Partial Outage on FR-3"

Last update 2024-11-29T09:28:13.914Z

postmortem2024-11-29T09:08:22.657Z

## **What happened**

We identified issues on hosts with git containers. This led to a Console and API outage on the FR-3 region. This incident did not affect website availability on Grid or Dedicated infrastructure. After investigation, our engineers identified the host, rebooted it, and made sure all services were up after the reboot.

## **Customer Impact**

Between 2024-11-22 11:00 UTC and 14:00 UTC, customers were unable to access console or conduct any deployments. There was no impact on environments or production sites.

## **What was done to resolve the incident**

Our team quickly found the offending host and rebooted it. Then they re-opened / restarted the affected containers and restored availability. The Console, API and Auth sub-systems outage on platform cloud is now resolved.

resolved2024-11-22T14:00:06.621Z

This incident has been resolved.

monitoring2024-11-22T12:28:11.091Z

A fix has been implemented and we are monitoring the results.

identified2024-11-22T11:33:52.586Z

The issue has been identified and a fix is being implemented.

investigating2024-11-22T11:00:34.840Z

We have detected an issue affecting service on the FR-3 region. Our Operations team has been notified and is currently working to restore service. Users of projects on the affected region may be unable to log in to the console, list environments via the command line, or push changes. We will update you as soon as we have further information.

Report: "Disk issue in eu-5"

Last update 2024-11-29T08:58:24.627Z

postmortem2024-11-29T08:43:07.985Z

## What happened

A few of our customers' containers had disk issues in EU-5. This incident affected read/write operations for the affected containers. After investigation, our Engineers found that one OSD (Ceph storage) had slow ops and hit the IOPS limit. That one OSD was therefore changed to the io2 type to mitigate the issue.

## **Customer Impact**

Between 2024-11-25 09:45 UTC and 10:05 UTC, eu-5 Grid customers' containers had read/write issues.

## **What was done to resolve the incident**

Our team quickly fixed the disk and also re-opened / restarted the affected containers and restored availability. This incident is now resolved.

resolved2024-11-25T13:40:47.498Z

This incident has been resolved

monitoring2024-11-25T11:16:20.047Z

A fix has been implemented and we are monitoring the situation.

identified2024-11-25T10:11:04.430Z

We have identified a slow disk in eu-5 and a fix has been implemented.

investigating2024-11-25T09:53:32.714Z

We are currently investigating a disk issue in eu-5

Nov 14, 2024

Report: "Partial Outage on Dedicated Clusters"

Last update 2024-11-14T02:57:51.564Z

postmortem2024-11-14T02:04:44.572Z

## **What happened**

Underlying services on a small number of projects were found to be in unhealthy state and caused application errors.

## **Customer Impact**

Sites were not accessible for up to 2 hours and 30 minutes.

## **What was done to resolve the incident**

A configuration fix was applied to ensure site services can run without hiccups.

resolved2024-11-13T21:00:00.000Z

Services on some dedicated clusters were found to be in an unhealthy state, causing site outages.

Oct 21, 2024

Report: "Outage on Orange Flexible Engine dedicated hardware"

Last update 2024-10-21T19:43:34.742Z

resolved2024-10-21T19:43:34.720Z

This incident has been resolved.

monitoring2024-10-21T17:21:37.241Z

We're no longer seeing issues with our upstream provider and all affected sites have stabilized. We're continuing to investigate this with our provider, as well as continuing to monitor for any further issues affecting site availability.

investigating2024-10-21T16:13:56.789Z

We're seeing further connectivity issues on affected clusters. We're still working with our provider to investigate and resolve the issue.

investigating2024-10-21T15:55:26.402Z

Affected sites are coming back online. We are still investigating the issue with our provider.

investigating2024-10-21T15:38:53.592Z

Our monitoring has detected issues with our cloud infrastructure provider, which affect all sites hosted on Orange Flexible Engine dedicated hardware. Our operations team has been notified, and they are investigating the issue with our provider. Projects hosted on the underlying provider Orange Flexible Engine may experience connectivity issues to and from the cluster nodes. We will update you as soon as we have further information.

Report: "Degraded performance on Console"

Last update 2024-10-21T06:40:22.170Z

postmortem2024-10-21T06:26:04.601Z

## **What happened**

A configuration error prevented some users from accessing web console.

## **Customer Impact**

Users were not able to access their projects through web console. Access through CLI was not impacted.

## **What was done to resolve the incident**

Configuration fix was applied to ensure console loads for all users.

resolved2024-10-21T06:13:19.322Z

This incident has been resolved.

identified2024-10-21T00:13:04.095Z

The issue has been identified and a fix is being implemented.

investigating2024-10-20T22:24:36.296Z

Some users may experience intermittent unavailability with UI (Console) and CLI. Environment availability is NOT impacted. We will update you as soon as we have further information.

Oct 17, 2024

Report: "User Accounting outage"

Last update 2024-10-17T04:28:14.676Z

postmortem2024-10-17T03:37:38.774Z

#### WHAT HAPPENED

Data was affected during planned disk maintenance of the Accounts system.

#### CUSTOMER IMPACT

No live sites were impacted.

Customers would not have been able to access projects (through console, CLI, SSH) or perform deployments from 2024-10-17 01:49:10 UTC to 2024-10-17 02:50:19 UTC.

Any User Accounting changes made between those times may need to be redone.

#### WHAT WAS DONE TO RESOLVE THE INCIDENT

Data was restored using backup taken before start of maintenance.

resolved2024-10-17T03:37:25.562Z

This incident has been resolved.

monitoring2024-10-17T02:52:50.630Z

A fix has been implemented and we are monitoring the results.

investigating2024-10-17T02:14:25.141Z

We have detected an issue affecting User Accounting; the service may be affected for the next hour. Our Operations team has been notified and is currently working to restore service. Live sites are not affected. We will update you as soon as we have further information.

Oct 16, 2024

Report: "Partial outage on de-2.platform.sh"

Last update 2024-10-16T15:13:33.224Z

postmortem2024-10-16T15:12:35.286Z

## **What happened**

Our rate limiter started throttling connections between the projects hosted in the de-2 region, resulting in connections timing out. This incident only affected multi-app projects and projects using a microservices architecture.

## **Customer Impact**

Sites were intermittently available from 13:40 to 14:52 UTC. This incident did not affect Dedicated clusters.

## **What was done to resolve the incident**

The regional connection limit has been raised.
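To illustrate how a connection limit can start rejecting legitimate traffic, and why raising the limit relieves it, here is a minimal token-bucket sketch. The postmortem does not say which algorithm Platform.sh's rate limiter uses, so the token bucket is an assumption chosen for illustration, and the `TokenBucket` class with its `rate` and `capacity` parameters is hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` connections in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if a new connection may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In this model, a project that opens many internal connections at once (as multi-app or microservices projects tend to) drains the bucket quickly, and further connections are refused until tokens refill. Raising `rate` or `capacity` corresponds to raising the connection limit.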

resolved2024-08-21T20:05:35.803Z

We've seen no further issues and the de-2 region is stable.

monitoring2024-08-21T15:00:43.314Z

Affected sites are recovering. We're continuing to monitor the situation.

identified2024-08-21T14:53:05.518Z

We've identified an issue in our region gateway configuration and a fix is being deployed.

investigating2024-08-21T14:36:50.621Z

We have detected an issue affecting service on the DE-2 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience downtime. We will update you as soon as we have further information.

Report: "Partial Outage on US-3 region"

Last update 2024-10-16T15:04:11.130Z

postmortem2024-10-16T15:03:16.890Z

## **What happened**

Upstream infrastructure provider rebooted one of our regional gateways.

## **Customer Impact**

Sites were intermittently available from 12:12 to 12:21 UTC. This incident did not affect Dedicated clusters.

## **What was done to resolve the incident**

Self-resolved after the gateway went online.

resolved2024-08-31T13:27:47.317Z

This incident has been resolved.

identified2024-08-31T12:42:13.423Z

All of the region alerts have cleared. We are still investigating the root cause.

investigating2024-08-31T12:25:49.810Z

We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may be unreachable. We will update you as soon as we have further information.

Report: "Partial Outage on fr-3.platform.sh"

Last update 2024-10-16T14:56:24.858Z

postmortem2024-10-16T14:54:59.511Z

## **What happened**

Our rate limiter started throttling connections between the region and our API resulting in an inability to load the Console UI and use the CLI in some projects. This incident did not affect live environments on Grid or Dedicated infrastructure.

## **Customer Impact**

Between 2024-09-12 11:32 and 12:19 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites.

## **What was done to resolve the incident**

We have raised the connection limit between the region and API.

resolved2024-09-12T15:18:43.241Z

What happened

Our rate limiter started throttling connections between the region and our API resulting in an inability to load the Console UI and use the CLI for some of the projects. This incident did not affect live environments on Grid or Dedicated infrastructure.

Customer Impact

Between 2024-09-12 07:25 and 10:25 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites.

What was done to resolve the incident

We have raised the connection limit between the region and API.

monitoring2024-09-12T13:26:46.389Z

A fix has been implemented and we are monitoring the results.

identified2024-09-12T11:35:19.961Z

We have detected an issue affecting our console and CLI on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Some projects on the affected region may be unable to access the Console UI, connect via SSH, or use the CLI. We will update you as soon as we have further information.

Report: "Partial Outage on fr-3.platform.sh"

Last update 2024-10-16T14:54:19.919Z

postmortem2024-09-12T10:25:51.314Z

## **What happened**

Our rate limiter started throttling connections between the region and our API, resulting in an inability to load the Console UI and use the CLI in some projects. This incident did not affect live environments on Grid or Dedicated infrastructure.

## **Customer Impact**

Between 2024-09-12 07:25 and 10:25 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites.

## **What was done to resolve the incident**

We have raised the connection limit between the region and API.

resolved2024-09-12T10:23:14.259Z

This incident has been resolved.

investigating2024-09-12T09:51:03.196Z

We have detected an issue affecting our console and CLI on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may be unable to access the Console UI, connect via SSH, or use the CLI. We will update you as soon as we have further information.

Report: "Partial Outage on CA-1"

Last update 2024-10-16T14:51:46.883Z

postmortem2024-10-16T14:43:52.459Z

## **What happened**

A large increase in incoming connections to the CA-1 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters.

## **Customer Impact**

Sites were intermittently available from 15:42 to 16:07 UTC.

## **What was done to resolve the incident**

We have taken steps to block malicious traffic.

resolved2024-09-13T18:16:58.749Z

This incident has been resolved.

monitoring2024-09-13T16:32:32.623Z

A fix has been implemented and we are monitoring the results.

identified2024-09-13T16:23:23.728Z

The issue has been identified and a fix is being implemented.

investigating2024-09-13T16:11:37.000Z

We have detected an issue affecting service on the CA-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience timeouts and slow response times. We will update you as soon as we have further information.

Oct 14, 2024

Report: "Dedicated clusters hosted in Orange FE are alerting"

Last update 2024-10-14T13:39:38.106Z

postmortem2024-10-14T13:38:21.440Z

**What happened** Upstream infrastructure provider encountered network issues that resulted in packet loss. **Customer Impact** Some projects running on dedicated infrastructure suffered degraded performance for around an hour and a half. **What was done to resolve the incident** Upstream provider applied a fix.

resolved2024-10-07T15:07:34.888Z

This incident has been resolved.

monitoring2024-10-07T12:56:48.138Z

Orange FE has implemented a fix and we are monitoring the results. The alerts on our monitoring have cleared.

identified2024-10-07T11:52:29.041Z

We have contacted Orange FE for further investigation.

investigating2024-10-07T11:36:46.988Z

We are investigating timeout alerts on dedicated clusters hosted in Orange FE.

Oct 10, 2024

Report: "Console & API Issue in regions EU-5 and FR-4"

Last update 2024-10-10T06:13:49.081Z

postmortem2024-10-10T06:02:23.489Z

**WHAT HAPPENED** We identified an issue in some build hosts which affected metadata and metrics in affected projects. **CUSTOMER IMPACT** No live sites were impacted unless a deployment encountered an error during this period. Some customers may have been unable to access, or may have encountered performance degradation with, the console, CLI, SSH, metrics and deployments from 2024-09-19 08:33 to 2024-09-23 23:01 UTC. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The underlying issue was corrected and any projects that did not heal automatically were corrected with a recent backup.

resolved2024-09-24T09:36:01.508Z

This incident has been resolved.

monitoring2024-09-23T20:06:12.194Z

We have seen no new instances of issues related to this incident, but we will continue in the monitoring phase for a short time longer.

monitoring2024-09-23T13:52:03.659Z

We are continuing to monitor for any further issues.

monitoring2024-09-22T06:25:26.734Z

Platform.sh teams have resolved the issue regarding the affected projects. Console, CLI and API-related tasks should be working for all the projects in affected regions. We will be monitoring the results closely.

identified2024-09-22T05:40:45.261Z

The recovery work is still in progress for both regions.

identified2024-09-22T04:30:16.374Z

We are getting new issue reports for this incident. Platform.sh teams are now actively working on recovering those affected projects.

monitoring2024-09-20T13:01:43.122Z

identified2024-09-20T10:10:25.705Z

Our teams are still reviewing the issues, fixing affected projects, and working on a permanent fix.

identified2024-09-20T08:18:03.000Z

After further internal review, we have noticed region FR-4 is also affected by this incident. The recovery work is still in progress for both regions.

identified2024-09-20T05:59:19.486Z

We are continuing to work on a fix for this issue.

identified2024-09-20T02:41:20.140Z

Another issue has been identified and a fix is being implemented.

monitoring2024-09-19T20:58:12.561Z

Platform.sh teams have deployed the mitigation for this issue and are monitoring its effects.

identified2024-09-19T19:47:22.453Z

Platform.sh teams are continuing to deploy the mitigation for this issue.

identified2024-09-19T18:45:27.672Z

The issue has been identified and a fix is being implemented.

investigating2024-09-19T16:29:49.357Z

We are investigating an issue affecting the Console & API on some projects in the eu-5 region. Our operations teams are aware of the issue and are taking measures to correct it. Affected projects will not be able to access their project's Console or perform API-related tasks.

Oct 4, 2024

Report: "Outage on Orange Flexible Engine dedicated hardware."

Last update 2024-10-04T01:17:53.465Z

postmortem2024-10-04T00:36:24.628Z

## **What happened**

Upstream infrastructure provider encountered network issues that resulted in packet loss.

## **Customer Impact**

Some projects running on dedicated infrastructure suffered degraded performance for less than an hour.

## **What was done to resolve the incident**

Upstream provider applied a fix.

resolved2024-10-03T23:17:29.501Z

This incident has been resolved.

monitoring2024-10-03T16:23:29.672Z

A fix has been implemented and we are monitoring the results.

identified2024-10-03T12:23:08.260Z

A networking equipment issue has been identified with our upstream provider and a fix is being implemented. There may still be further connectivity issues until the issue has been corrected.

investigating2024-10-03T10:58:07.732Z

Our upstream provider is still investigating the networking issue.

investigating2024-10-03T08:33:47.283Z

Sep 27, 2024

Report: "Outage on fr-4.platform.sh"

Last update 2024-09-27T01:24:57.608Z

postmortem2024-09-27T00:55:42.829Z

## **What happened**

Cloud partner suffered network connectivity issues.

## **Customer Impact**

Access to websites, Console and API was disrupted for some customers from 13:24 UTC to 15:10 UTC on 26 September 2024.

## **What was done to resolve the incident**

Cloud provider implemented mitigation to restore capacity. Communications from cloud partner were actively monitored until confirmation of resolution.

resolved2024-09-27T00:54:46.348Z

Resolution applied by cloud partner has been effective, all systems are fully functional.

identified2024-09-26T18:59:53.845Z

Our cloud partners have released an update that a networking issue has occurred and that mitigation actions have been deployed to restore service to their customers. Platform.sh teams are continuing to monitor this corrective action and will reach out to any remaining impacted customers from this event.

investigating2024-09-26T15:48:53.863Z

Platform.sh teams are continuing to investigate this issue with our cloud partners.

investigating2024-09-26T13:37:43.288Z

We have detected an issue affecting service on the fr-4.platform.sh region. This issue affects multiple production sites as well as development environments. Access to your site, project UI as well as Git and SSH access may be affected. This outage does not affect Dedicated Enterprise Clusters. We will update you as soon as we have further information.

Sep 19, 2024

Report: "Console and CLI issues on FR-4.platform.sh"

Last update 2024-09-19T14:34:50.902Z

postmortem2024-09-19T14:34:49.715Z

**WHAT HAPPENED** Hosts on this region entered a degraded state. **CUSTOMER IMPACT** Some customers experienced intermittent slowness and long response times while using console, CLI, SSH and submitting deployments from 2024-09-18 09:56 to 2024-09-19 14:21 UTC. However, existing live sites were not impacted. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The degraded hosts have been restored, and additional capacity was added to the region. Services such as the console, CLI, SSH, and deployments shouldn't experience slowness anymore.

resolved2024-09-19T05:22:40.002Z

We have seen no new reports related to this incident. Your projects and environments should now function as expected.

monitoring2024-09-19T01:36:46.196Z

A fix was implemented at 2024-09-18 19:04 UTC and we are continuing to monitor the results.

identified2024-09-18T15:05:55.921Z

The slowness is recurring and we are working on a permanent fix.

monitoring2024-09-18T13:41:28.317Z

We have finished implementing the fix and we are monitoring the results.

identified2024-09-18T11:57:24.319Z

We are still working on implementing the fix.

identified2024-09-18T10:50:37.021Z

The issue has been identified and a fix is being implemented. Live sites aren't being affected by this incident.

investigating2024-09-18T10:26:41.480Z

We have detected an issue affecting service on the FR-4 region. Our Operations team has been notified and is currently working to restore service. Affected projects may experience limited access to web console and CLI services, as well as unexpectedly long deployment times and difficulty accessing services via SSH connection. Deployed production sites are not affected at this time. We would recommend suspending deployments to environments on the affected region. We will update you as soon as we have further information.

Report: "Console and CLI issues on FR-4.platform.sh"

Last update 2024-09-19T14:34:29.195Z

postmortem2024-09-19T14:21:05.378Z

resolved2024-09-19T14:20:19.358Z

This incident has been resolved.

monitoring2024-09-19T12:33:17.098Z

A fix has been implemented and we are monitoring the results.

identified2024-09-19T09:31:31.218Z

We continue to experience high CPU usage on build hosts. We are doubling build capacity and rebalancing projects to recover from API/CLI slowness.

investigating2024-09-19T08:53:05.182Z

Aug 13, 2024

Report: "Partial Outage on UK-1"

Last update 2024-08-13T11:08:33.042Z

postmortem2024-08-13T11:05:21.812Z

**What happened** A cloud provider incident ([https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF](https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF)) led to connectivity loss in the London GCP region, making projects on the uk-1 region temporarily inaccessible. **Customer Impact** Between 13:23 and 13:33 UTC on 2024-08-12, some environments were inaccessible due to the GCP incident in the UK-1 region.

resolved2024-08-12T17:17:45.023Z

We've seen no further issues; however, we continue to investigate for the post-mortem report.

monitoring2024-08-12T14:54:48.079Z

There have been no further alerts and we are monitoring the region. We are still investigating for the post-mortem report.

identified2024-08-12T13:59:57.814Z

We are continuing to work on a fix for this issue.

identified2024-08-12T13:50:29.276Z

All alerts have cleared and we are continuing to investigate.

investigating2024-08-12T13:36:31.589Z

We have detected an issue affecting service on the UK-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience downtime. We will update you as soon as we have further information.

Jul 31, 2024

Report: "Partial Outage on US-3 region"

Last update 2024-07-31T08:10:45.133Z

postmortem2024-07-31T07:43:08.698Z

### What happened

A host on the region entered a degraded state. This incident did affect live environments on a single host on our Grid infrastructure.

### Customer Impact

Between 2024-07-31 06:30 and 06:58 UTC, environments on the degraded host were impacted while their projects were moved to another host, which resulted in a site outage for those environments.

### What was done to resolve the incident

The degraded host was recovered, after which services and activities on impacted environments completed successfully and service was restored to impacted sites.

resolved2024-07-31T07:32:13.266Z

This incident has been resolved.

monitoring2024-07-31T07:03:49.454Z

We have isolated the single host causing this issue and projects are now online and responding.

investigating2024-07-31T07:00:42.935Z

We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience web request timeouts. We will update you as soon as we have further information.

Jul 24, 2024

Report: "Partial Outage on CA-1 region"

Last update 2024-07-24T16:52:11.332Z

postmortem2024-07-24T16:50:11.321Z

### What happened

A host on the region entered a degraded state.

### Customer Impact

Environments on the degraded host were impacted in the form of stuck activities (such as a backup activity), which then resulted in a site outage for the environments in question.

### What was done to resolve the incident

The degraded host was recovered, after which the stuck activities on impacted environments completed successfully and service was restored to impacted sites.

resolved2024-07-24T16:43:56.643Z

This incident has been resolved.

monitoring2024-07-24T16:15:04.811Z

A fix has been implemented and we are monitoring the results.

identified2024-07-24T15:57:27.003Z

The issue has been identified and a fix is being implemented.

investigating2024-07-24T15:27:26.217Z

We are continuing to investigate this issue.

investigating2024-07-24T14:56:50.049Z

We are investigating an issue that has resulted in outages for some sites hosted on the CA-1 region. Our Operations team has been notified and is currently working to restore service. We will update you as soon as we have further information.

Jul 4, 2024

Report: "Deployments failing on Dedicated Enterprise Gen 2 clusters"

Last update 2024-07-04T14:44:58.202Z

postmortem2024-07-04T14:37:10.314Z

## **What happened**

A recent code change in the deployment daemon running on Dedicated Enterprise Gen 2 clusters resulted in an inability to finish deployments.

## **Customer Impact**

The issue was first observed at 08:54 UTC and a fix was rolled out at 12:27 UTC.

## **What was done to resolve the incident**

The deployment daemon has been patched.

resolved2024-07-04T14:34:54.096Z

This incident has been resolved.

monitoring2024-07-04T12:47:08.404Z

A fix has been implemented and we are monitoring the results.

identified2024-07-04T12:09:01.863Z

The issue has been identified and a fix is being implemented.

investigating2024-07-04T11:53:39.716Z

We have detected an issue affecting deployments on Dedicated Enterprise Gen 2 clusters. Our Operations team has been notified and is currently working to restore service. Please refrain from deployments until further notice. We will update you as soon as we have further information.

Jun 24, 2024

Report: "API server outage"

Last update 2024-06-24T07:13:47.501Z

postmortem2024-06-24T07:12:03.843Z

## What happened

After recent maintenance work, the API server was not able to communicate with internal database servers.

## Customer Impact

For a period of 40 minutes, customers were unable to use the CLI or console, which included connecting to projects with SSH and other activities like submitting deployments. The availability of your live sites was not affected.

## What was done to resolve the incident

Our team quickly discovered the DB connection issues and made those DB servers available for the API server.

resolved2024-06-24T04:06:57.169Z

The CLI, console and API services should now function as expected. This incident has been resolved.

monitoring2024-06-24T03:50:09.234Z

A fix has been implemented and we are monitoring the results.

investigating2024-06-24T03:42:55.615Z

We have detected an issue affecting CLI, console and API usage. Our engineers are actively investigating this issue.

Jun 18, 2024

Report: "Partial Outage on UK-1"

Last update 2024-06-18T06:57:08.311Z

postmortem2024-06-18T06:54:26.566Z

### What happened

We observed an extreme spike in traffic to our UK-1 region, which affected the availability of some projects.

### Customer Impact

Some projects may have been unavailable for around 30 minutes until mitigation actions were in place.

### What was done to resolve the incident

Our team investigated the issue and implemented mitigations to restore normal service.

resolved2024-06-17T22:30:00.000Z

We recently observed an extreme spike in traffic to our UK-1 region.

Jun 7, 2024

Report: "Partial Outage on FR-4"

Last update 2024-06-07T19:40:55.479Z

postmortem2024-06-07T19:39:12.883Z

## **What happened**

A large increase in incoming connections to the FR-4 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters.

## **Customer Impact**

Sites were intermittently available from 12:00 to 14:00 UTC.

## **What was done to resolve the incident**

We have added extra gateway hosts to distribute the load and taken steps to block traffic.

resolved2024-06-07T19:37:59.789Z

This incident has been resolved.

monitoring2024-06-07T13:56:39.991Z

A fix has been implemented and we are monitoring the results.

identified2024-06-07T12:26:58.625Z

The issue has been identified and a fix is being implemented.

investigating2024-06-07T12:04:21.335Z

We have detected an issue affecting service on the FR-4 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience intermittent connectivity issues or timeouts. We will update you as soon as we have further information.

Jun 6, 2024

Report: "Partial Outage on EU-5"

Last update 2024-06-06T19:28:03.618Z

postmortem2024-06-06T19:13:41.346Z

## **What happened**

A large increase in incoming connections to the EU-5 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters.

## **Customer Impact**

Sites were intermittently available from 14:30 to 15:00 UTC. Some customers may have also had issues connecting to the project management console during that time.

## **What was done to resolve the incident**

We have added extra gateway hosts to distribute the load and taken steps to block traffic.

resolved2024-06-06T18:47:39.519Z

This incident has been resolved.

monitoring2024-06-06T17:28:19.732Z

A fix has been implemented and we are monitoring the results.

identified2024-06-06T16:50:09.556Z

The issue has been identified and a fix is being implemented.

investigating2024-06-06T16:43:41.358Z

We have detected an issue affecting service on the EU-5 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience intermittent connectivity issues or timeouts. This also impacts the project management console for some users. Site availability in other regions is not affected. We will update you as soon as we have further information.

May 25, 2024

Report: "Accounts / Auth systems outage"

Last update 2024-05-25T07:34:57.611Z

postmortem2024-05-25T07:32:44.682Z

### WHAT HAPPENED

Based on reports from our monitoring system, the internal accounts and auth subsystems were not able to give timely responses to certain API calls. Our team identified a lock contention issue within the project information database system. This incident did NOT affect live environments on Grid or Dedicated infrastructure. However, SSH, CLI and console features including deployments may have been temporarily unavailable.

### CUSTOMER IMPACT

Between 2024-05-25 02:28 and 03:56 UTC, some customers may have experienced issues while using SSH, the console or the CLI, or while submitting deployments. This incident did NOT affect live environments on Grid or Dedicated infrastructure.

### WHAT WAS DONE TO RESOLVE THE INCIDENT

Our team restarted the deadlocked processes to make the accounts and auth subsystems available again. The accounts and auth subsystem outages are now resolved. Further investigation on this lock contention issue will be conducted and a fix will be implemented to optimize our subsystems further.

resolved2024-05-25T04:23:48.782Z

This incident has been resolved.

monitoring2024-05-25T04:20:46.604Z

A fix has been implemented and we are monitoring the results.

identified2024-05-25T03:52:46.814Z

We are continuing to work on a fix for this issue.

identified2024-05-25T03:46:33.512Z

The issue has been identified and a fix is being implemented.

investigating2024-05-25T03:23:39.719Z

We are currently investigating this issue. This issue does not impact existing live sites / environments. However, several functions including SSH, console, CLI may not be functioning as expected.

May 20, 2024

Report: "Partial Outage on CA-1 region"

Last update 2024-05-20T10:28:07.339Z

postmortem2024-05-20T10:27:10.384Z

**What Happened** We detected a substantial drop in connections reaching the CA-1 region from 14:29 UTC to 15:39 UTC on May 11th, 2024. **Customer Impact** Customers experienced intermittent availability during this time as connections were dropped, and requests failed to reach the CA-1 origin. **Incident Resolution** We suspect a transient network failure with our upstream provider, or another network layer upstream, was responsible for this incident.

resolved2024-05-11T22:27:07.315Z

This incident has been resolved.

monitoring2024-05-11T18:20:54.158Z

We are continuing to monitor for any further issues.

monitoring2024-05-11T16:31:03.957Z

A fix has been implemented and we are monitoring the results.

identified2024-05-11T15:49:06.213Z

The issue has been identified and a fix is being implemented.

investigating2024-05-11T15:36:50.059Z

We have detected an issue affecting service on the CA-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience unresponsive sites. We will update you as soon as we have further information.

May 14, 2024

Report: "Partial Outage on EU 4"

Last update 2024-05-14T06:47:20.179Z

postmortem2024-05-14T06:45:49.414Z

### What Happened:

We detected a partial outage on the EU-4 region, and our investigation identified a host in the region was not operating normally.

### Customer Impact:

A small number of customers may have experienced an outage on any environment that was residing on the abnormal host. This may have included Production environments.

### Incident Resolution:

Our operations and engineering team isolated the host and manually fixed any environment that did not automatically recover.

resolved2024-05-14T02:58:43.535Z

All affected projects and clusters have been successfully recovered.

monitoring2024-05-13T23:42:31.611Z

A fix has been implemented and we are monitoring the results.

identified2024-05-13T22:35:01.128Z

We are continuing to work on a fix for this issue.

identified2024-05-13T22:13:30.741Z

The issue has been identified and a fix is being implemented.

investigating2024-05-13T21:37:37.858Z

We are continuing to investigate this issue.

investigating2024-05-13T21:07:17.294Z

We have detected an issue affecting service on the EU 4 region. Our Operations team has been notified and is currently working to restore service. We will update you as soon as we have further information.

May 13, 2024

Report: "Partial Outage on FR-1 region"

Last update 2024-05-13T14:34:21.339Z

postmortem2024-05-13T14:34:06.849Z

**What Happened:** On May 11th, at about 14:50 UTC, our upstream provider experienced an internal incident that affected the availability of their Virtual Machines (VMs) and Object Storage Devices (OSDs). **Customer Impact:** This incident affected all container types (production, staging, development) in the FR-1 region, impacting multiple projects. Affected projects experienced container outages, including production containers in some cases. **Incident Resolution:** During the incident, our engineers actively communicated with the upstream provider to ensure a swift resolution. On May 12th, at about 20:30 UTC, following the underlying provider's full recovery, our engineers ensured all production environments were fully operational. On May 13th, at about 07:00 UTC, our engineers conducted a comprehensive review of all affected components to confirm their full functionality and address any outstanding technical issues.

resolved2024-05-13T13:11:39.608Z

This incident has been resolved.

monitoring2024-05-13T09:00:25.897Z

All services are currently operational. Our engineering team is conducting a comprehensive review of all affected components to confirm their full functionality and address any outstanding technical issues.

monitoring2024-05-13T02:32:04.204Z

All affected projects' data and environments have been recovered successfully. Some metrics services may still be unavailable.

monitoring2024-05-12T20:44:53.891Z

Our upstream provider has recovered impacted hosts. We are monitoring the results.

identified2024-05-12T16:23:05.113Z

Most impacted hosts and projects have been fully recovered. The upstream provider is still working on full recovery.

identified2024-05-12T13:51:41.956Z

Our hosts have been recovered and all projects are operational. The upstream provider is still working on full recovery of their data centre.

identified2024-05-12T12:18:02.009Z

The upstream provider has recovered our core hosts and the projects are back online.

identified2024-05-12T12:03:20.169Z

We are continuing to work with our upstream provider on recovery.

identified2024-05-12T00:23:29.435Z

We are continuing to work with our upstream provider on recovery.

identified2024-05-11T18:20:32.987Z

Our upstream provider has encountered an unexpected temperature increase which required the emergency shutdown of the servers impacted by this incident. They are actively working on recovery.

identified2024-05-11T16:38:20.819Z

We are continuing to work on a fix for this issue.

identified2024-05-11T15:40:07.453Z

The issue has been identified and we are working on recovering the affected hosts.

investigating2024-05-11T14:54:51.142Z

We have detected an issue affecting service on the FR-1 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience slowness and unresponsive sites. We will update you as soon as we have further information.

May 6, 2024

Report: "Downtime for some sites on the CA-1 region"

Last update 2024-05-06T13:21:02.878Z

postmortem2024-05-06T13:18:33.426Z

**What Happened**: The incident was triggered when the incoming request gateway on cluster CA-1 depleted its resources, caused by an unforeseen surge in usage. This led to the gateway being unable to process incoming traffic effectively. **Customer Impact**: Approximately 50 projects were affected during a 30-minute period, experiencing intermittent downtime ranging from 5 to 10 minutes. This disruption impacted the operational functionality of these projects, leading to degraded service availability. **Incident Resolution**: To address this issue, we have permanently increased the resource capacity of the incoming request gateway to better handle similar spikes in traffic in the future. Additionally, we have identified the source of the unexpected usage and are implementing measures to prevent a recurrence of this problem.

resolved2024-05-06T12:29:39.107Z

All services are fully operational. No further downtime was registered in the past 2 hours.

monitoring2024-05-06T09:50:08.792Z

During our first investigation round, we have identified that the downtime could have been related to resource constraints on our incoming gateway. We have permanently increased the gateway capacity to prevent similar issues in the future.

investigating2024-05-06T08:01:27.626Z

We have detected an issue intermittently impacting approximately 50 production environments between 07:15 and 07:45 AM UTC. The affected environments experienced downtime ranging from 5 to 10 minutes. We are actively investigating the issue.

May 1, 2024

Report: "API and CLI timeouts"

Last update 2024-05-01T05:21:28.423Z

postmortem2024-05-01T05:19:07.726Z

**WHAT HAPPENED** Users were unable to interact with projects through the web console and CLI due to issues with the authentication backend. This incident did not affect live environments on Grid or Dedicated infrastructure. After investigation, our engineers identified errors in logs which led to the following remedial actions:

* Backend was restarted
* Underlying hosts were replaced
* Additional capacity was added

**CUSTOMER IMPACT** Between 2024-05-01 01:48 UTC and 03:21 UTC, customers were unable to access the console or use the CLI. There was no impact on environments or production sites.

**WHAT WAS DONE TO RESOLVE THE INCIDENT** The authentication backend was restarted and additional capacity was added.

resolved2024-05-01T05:17:41.485Z

This incident has been resolved.

monitoring2024-05-01T03:28:26.830Z

A fix has been implemented and we are monitoring the results.

investigating2024-05-01T02:39:55.335Z

The issue has recurred and our team is investigating.

monitoring2024-05-01T02:16:12.975Z

We identified an issue with our Auth system. Our team has performed some work to restore service availability and we're monitoring the situation.

investigating2024-05-01T01:57:33.297Z

We are investigating a report of outages on our API and CLI.

Apr 25, 2024

Report: "Partial Outage on eu-2.platform.sh"

Last update 2024-04-25T07:56:35.763Z

postmortem2024-04-25T07:52:59.389Z

**WHAT HAPPENED** Environments were in unexpected states during the planned maintenance event for region [eu-2.platform.sh](http://eu-2.platform.sh) ([https://status.platform.sh/incidents/97ls23652mws](https://status.platform.sh/incidents/97ls23652mws)). Affected environments, including live sites, were not available for use during the incident; this also includes functionality such as SSH access and deployments.

Dedicated clusters were **NOT** affected by this incident.

**CUSTOMER IMPACT** Some customers' environments were not available for up to 9 hrs and 12 minutes in total (worst-case estimate for all live and non-live environments). Start: 2024-04-24 21:50 **UTC**. End: 2024-04-25 07:02 **UTC**.

**WHAT WAS DONE TO RESOLVE THE INCIDENT** The unexpected states have been corrected by our engineers and the environments should now function as expected. Our engineering team will also be investigating a potential internal bug responsible for the unexpected states that caused this incident. We will be improving the internal subsystems in order to minimize the negative impact from these planned maintenance events in the future.

resolved2024-04-25T07:33:31.152Z

Our engineers have deployed a fix to correct the issues. All environments in this region should now function as expected.

identified2024-04-25T04:40:56.183Z

All live / default environments should now function as expected.

identified2024-04-25T04:12:31.164Z

We have detected an issue affecting service on the eu-2.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience outages. We will update you as soon as we have further information.

Apr 12, 2024

Report: "Gateways unresponsive on uk-1.platform.sh"

Last update 2024-04-12T12:30:03.226Z

postmortem2024-04-12T12:29:11.050Z

**What Happened**

On March 29, 2024, at 07:23 UTC, a large increase in incoming connections to the [uk-1.platform.sh](http://uk-1.platform.sh) region caused the gateways to become unresponsive. Availability of sites/environments in the region was impacted. This incident did not affect Dedicated clusters.

**Customer Impact**

Grid projects on the [uk-1.platform.sh](http://uk-1.platform.sh) region were affected, resulting in temporary outages and reduced performance for customer sites. Site availability and performance were briefly degraded.

**What Was Done to Resolve the Incident**

To address the issue, our engineers first blocked the majority of the malicious (DDoS) traffic, then made various configuration and software changes in our internal gateway systems to handle these situations better. They continued monitoring the changes to ensure system stability and performance.

resolved2024-04-12T11:26:47.222Z

This incident has been resolved.

monitoring2024-03-29T11:58:59.930Z

A large increase in incoming connections to the uk-1.platform.sh region caused the gateways to become unresponsive. Traffic has been blocked and we're seeing gateway performance returning to normal. We're continuing to monitor the situation and will take additional action if needed.

Apr 4, 2024

Report: "Partial Outage on fr-3.platform.sh"

Last update 2024-04-04T02:35:21.537Z

postmortem2024-04-04T02:27:32.686Z

## What happened

Our upstream provider's planned network maintenance caused an unexpected interruption to our services in region `fr-3.platform.sh`.

## Customer Impact

All environments in region `fr-3.platform.sh` were temporarily unreachable. Some of those environments may have experienced outages of more than 1.5 hours.

## What was done to resolve the incident

Based on internal monitoring alerts, our team migrated all the affected environments to healthy hosts. Our upstream provider also resolved the network issues, so our infrastructure can function as expected.

resolved2024-04-03T23:59:16.083Z

This incident has been resolved.

monitoring2024-04-03T06:52:01.065Z

Our upstream provider is still working on their planned network maintenance events. Given the unexpected interruption to our services, we would like to monitor this region for a prolonged period of time. All environments should now function as expected. Please submit a support ticket if you need further assistance from our team.

monitoring2024-04-03T02:32:25.756Z

Our team has successfully migrated affected environments to working hosts and will monitor the situation with our upstream provider

identified2024-04-03T01:55:24.286Z

Our team is still actively working on migrating the environments away from the affected hosts.

identified2024-04-03T00:56:24.691Z

There is a technical issue stemming from our upstream provider. They are actively investigating and recovering the availability of the service.

identified2024-04-03T00:38:41.968Z

We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience outages. We will update you as soon as we have further information.

Mar 29, 2024

Report: "Gateways unresponsive on uk-1.platform.sh"

Last update 2024-03-29T09:20:56.677Z

postmortem2024-03-29T08:22:18.879Z

## **What happened**

A surge in incoming connections to the uk-1.platform.sh region rendered the gateways unresponsive, affecting the availability of sites and environments within the region. This incident did not affect Dedicated clusters.

## **Customer Impact**

During the incident, which commenced at 07:23 UTC on 2024-03-29, certain sites in the uk-1.platform.sh region experienced unavailability. Sites were restored within 10 minutes, with normal operations resuming by 07:33 UTC.

## **What was done to resolve the incident**

To mitigate the impact, we swiftly implemented measures to block the majority of the malicious (DDoS) traffic targeting the region.

## **Short term mitigations**

Our engineering team is actively analyzing the attack patterns to fortify the resilience of our incoming request gateways, including reviewing rate limit configurations and adding further gateway capacity.

resolved2024-03-29T08:15:13.645Z

This incident has been resolved.

monitoring2024-03-29T07:41:09.772Z

Mar 27, 2024

Report: "Gateways unresponsive on us-4.platform.sh"

Last update 2024-03-27T05:46:46.076Z

postmortem2024-03-27T05:43:42.233Z

## What happened

A large increase in incoming connections to the `us-4.platform.sh` region caused the gateways to become unresponsive. Availability of sites/environments in the region was negatively impacted. This incident did not affect Dedicated clusters.

## Customer impact

Some sites in the `us-4.platform.sh` region were unavailable during the incident (starting at 04:23 **UTC** on 2024-03-27). Most affected sites recovered within 18 minutes (by 04:41 **UTC**).

## What was done to resolve the incident

We have blocked some malicious (DDoS) traffic to the region.

resolved2024-03-27T05:37:15.536Z

Alerts for the region have cleared following filtering of the excess traffic, and the region is again stable.

monitoring2024-03-27T04:45:02.083Z

Traffic has been blocked and we're seeing gateway performance returning to normal. We're continuing to monitor the situation and will take additional action if needed.

identified2024-03-27T04:42:38.721Z

A large increase in incoming connections to the us-4.platform.sh region caused the gateways to become unresponsive again. Our Operations team has been notified and is currently investigating the issue.