Historical record of incidents for Platform.sh
Report: "Unscheduled Maintenance on ch-1.platform.sh"
Last update: The CH-1 region required an unscheduled maintenance period between 09:00 and 11:00 UTC due to load concerns. Our Operations team increased the capacity of the underlying hosts to give the gateways additional headroom. During this maintenance, projects hosted on CH-1 may have experienced performance issues with Console WebUI, SSH, and Git Integration access, as well as connection issues with deployment activities.
Report: "Partial Outage on FR-3"
Last update: We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service.
Report: "Partial Outage on au.platform.sh"
Last update: We have detected some timeouts with the API services on the au.platform.sh region.
- API, console, CLI, SSH and deployments are affected
- Live sites are not affected
Our Operations team is investigating.
Report: "Partial Outage on EU-5"
Last update: Status moved back to investigating due to potential issues.
Alerts have resolved. We are monitoring this situation.
The issue has been identified and a fix is being implemented.
We noticed a partial outage affecting only a few projects. We are currently investigating the issue.
Report: "Partial Outage on fr-3.platform.sh"
Last update:
# What Happened
Recent maintenance on the fr-3.platform.sh region caused some subsystems to be unresponsive. This resulted in an outage of the project console, API, and subsystems in the fr-3.platform.sh region. Live environments on Grid or Dedicated infrastructure were not affected.
# Customer Impact
From 2025-06-03 20:00 UTC to 2025-06-04 05:26 UTC, some customers may have had trouble accessing the project console, project API, and SSH, and submitting deployments and backups.
# What was done to resolve the incident
Our team corrected the internal states of those subsystems in order to make them operational again.
This incident has been resolved.
Services have been restored and we are monitoring to ensure stability
We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. The console and project services (deployments) are currently unavailable for some projects on the region. Production site uptime is NOT affected. We will update you as soon as we have further information.
Report: "Partial Outage on fr-3.platform.sh"
Last update: We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. The console and project services (deployments) are currently unavailable for some projects on the region. Production site uptime is NOT affected. We will update you as soon as we have further information.
Report: "Routine Maintenance in fr-3.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the fr-3.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in de-2.platform.sh"
Last update: We will be performing maintenance in the de-2.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Report: "Routine Maintenance in fr-4.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the fr-4.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in fr-1.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the fr-1.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in eu-5.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the eu-5.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Scheduled Maintenance – Accounts Service"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is still in progress. We will provide updates as necessary.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing scheduled database maintenance on the Accounts service to improve system performance and reliability. During this time, certain account-related functionalities may be temporarily unavailable, including project creation, ticket submission, user management, provisioning, billing, and updates to billing addresses, payment methods, SSH keys, and profile pictures. All customer environments will remain available, and the project Console will continue to function normally throughout the maintenance window. If you have any questions or concerns, please don't hesitate to reach out via our Discord channel at https://discord.gg/PkMc2pVCDV.
Report: "Routine Maintenance in eu-4.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the eu-4.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in au-2.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the au-2.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in ca-1.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Platform.sh has scheduled a maintenance window in the ca-1.platform.sh region. The host servers that run the region will be rebooted and/or upgraded. Downtime during this maintenance is expected only for a small proportion of the region’s projects running on the affected host. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in eu.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the eu.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Partial Outage on EU-5"
Last update: A host in the grid failed, and the automatic evacuation of its running containers also failed. The Operations team manually evacuated those containers to other hosts, and the faulty host was replaced with a new one.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and Engineers are working on a fix.
We are continuing to investigate this issue.
We noticed a few sites on alert due to a host service issue on EU-5. This is for Grid only. Our Engineers are working on it.
Report: "Routine Maintenance in au.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the au.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Partial Outage on EU-5"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and Engineers are working on a fix.
We are continuing to investigate this issue.
We noticed a few sites on alert due to a host service issue on EU-5. This is for Grid only. Our Engineers are working on it.
Report: "Routine Maintenance in us-3.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the us-3.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in ch-1.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the ch-1.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in us-4.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the us-4.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in us-2.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the us-2.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in uk-1.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing routine maintenance in the uk-1.platform.sh region to renew our TLS certificates as part of our standard rotation process. All projects and environments will remain operational during this maintenance; however, you may experience brief interruptions or increased latency at certain points. If you have any questions or concerns about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also reach out to us via our public chat channel at https://chat.platform.sh. Thank you for your understanding as we work to ensure the security and reliability of our services. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in eu-3.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing routine maintenance in the eu-3.platform.sh region to renew our TLS certificates as part of our standard rotation process. All projects and environments will remain operational during this maintenance; however, you may experience brief interruptions or increased latency at certain points. If you have any questions or concerns about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also reach out to us via our public chat channel at https://chat.platform.sh. Thank you for your understanding as we work to ensure the security and reliability of our services. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in eu-2.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the eu-2.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Routine Maintenance in us.platform.sh"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be performing maintenance in the us.platform.sh region to release the latest features, bug fixes, and performance updates. All projects and environments will continue to function during the upgrade; however, at one point each project will be reallocated, resulting in a short interruption. Integrations and webhooks may fail during the maintenance, and you may notice increased deployment times. If you have any questions about this maintenance, please contact Platform.sh Support by logging in to https://accounts.platform.sh and visiting the Support Center. You can also access our public chat channel using https://chat.platform.sh. Sincerely, Platform.sh Support Team
Report: "Partial Outage on us-3"
Last update:
## **What happened**
Our monitoring systems detected slow heartbeat responses from several storage nodes. Initial analysis showed network latency or disk I/O contention on specific nodes. This incident did not affect Dedicated infrastructure.
## **Customer Impact**
Customers were impacted between 08:20 and 09:00 UTC on 2025-05-16. Some live sites in the affected region experienced an outage, and customers were unable to access the console or run deployments.
## **What was done to resolve the incident**
The storage back-end healed itself.
This incident has been resolved.
Our systems are now stable and we are continuing to monitor them.
The issue has been identified and the Engineering team is working on it.
We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects in the affected region may experience downtime or slowness. We will update you as soon as we have further information.
We identified a partial outage affecting our US-3 Grid projects.
Report: "Partial Outage on ca-1.platform.sh"
Last update: A hardware issue with our upstream provider caused a single grid host to reboot unexpectedly. As a result, we had to manually fail over the environments running on that host. We are investigating why the automatic fail-over did not take place and will ensure that the expected fail-over occurs in the future. Customers may have experienced outages of up to 26 minutes during this incident.
Report: "Partial Outage on FR-3.platform.sh"
Last update:
# What Happened
Our upstream provider for region FR-3.platform.sh had an unexpected network equipment issue during scheduled maintenance.
# Customer Impact
Between 2025-04-17 01:36 UTC and 03:21 UTC, services including the project console UI, API, deployments, SSH and backups, as well as customers' environments (including live ones), were unavailable.
# What was done to resolve the incident
We received a recovery report from our upstream provider and have restored the affected services and environments. Services and environments in region FR-3.platform.sh should now be available and accessible. Incident resolved.
This incident has been resolved.
Our team has successfully restored the availability of those affected environments following the recovery report from the upstream provider. All FR-3 environments should now be accessible.
Our upstream provider has confirmed that a network equipment issue is impacting their scheduled maintenance.
Our upstream provider is reporting unspecified problems with their infrastructure. Our team will be keeping a close eye on the updates from our upstream provider and will take corrective actions as soon as possible.
We have detected an issue affecting service on the FR-3 region. Our Operations team has been notified and is currently working to restore service. Environments on affected regions may be unavailable. We will update you as soon as we have further information.
Report: "Partial outage of Console and Deployment services on FR-3 region"
Last update:
# What Happened
A networking issue was found in region FR-3.platform.sh, affecting the subsystems responsible for handling console UI, project API, SSH and deployment functions. Customers may have experienced issues while accessing the project console, interacting with the project API, logging into environments with SSH, and submitting deployments and backups.
# Customer Impact
From 2025-03-25 17:58 **UTC** to 2025-03-26 01:43 **UTC**, some customers were unable to access the console or environments, or to submit any deployments. There was no impact on environments or production sites.
# What was done to resolve the incident
Our team has restored the availability of the subsystems responsible for handling console UI, project API and deployment functions. Incident resolved.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our Operations team has identified a networking issue and deployed a corrective fix to restore the availability of the related subsystems.
We have detected an issue affecting service on the FR-3.platform.sh region. Some customers may not be able to submit deployments or access the console UI and/or project API. Our Operations team has been notified and is actively recovering the availability of those services. This issue does not affect availability of any environments, only the API services for the project. There is no site downtime as a result of this issue.
Report: "Degraded Performance of Console and Deployment services on AU, AU-2 regions."
Last update:
# What happened
A recent upgrade to regions AU.platform.sh and AU-2.platform.sh was unable to fully purge the stale network states within our routing infrastructure. As a result, client programs in the subsystems responsible for handling deployments, console and project API ran busy-looping processes that retried invalid TCP connections indefinitely and consumed all the available CPU resources. This caused the console to freeze, slow loading times when using project API functions, and delays in deployments or backups. After further investigation, the Operations team deployed an emergency fix to eliminate those busy-looping processes. Our team has stopped the scheduled upgrades to other regions and is still working on a permanent fix.
# Customer impact
From 2025-03-20 12:50 **UTC** to 2025-03-21 09:00 **UTC**, customers may have experienced delays when loading the console, triggering deployments or taking backups for their projects in regions AU.platform.sh and AU-2.platform.sh. From 2025-03-21 07:27 **UTC** to 2025-03-21 07:55 **UTC**, there may have been issues with outgoing connections in AU.platform.sh environments, including live ones, due to our emergency fix deployment.
# What was done to resolve the incident
The Operations team removed the busy-looping processes in the subsystems for deployments, console, and project API functions and deployed an emergency corrective fix to the affected regions. Incident has been resolved.
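For readers who want a concrete picture of the failure mode described in this report, the following is a minimal Python sketch of the general alternative to retrying a TCP connection indefinitely: a bounded number of attempts with exponential backoff and jitter. It is not Platform.sh's fix, which the report does not detail. The function name `connect_with_backoff` and all parameters and default values are hypothetical, chosen only for illustration.

```python
import random
import socket
import time


def connect_with_backoff(host, port, max_attempts=5, base_delay=0.5,
                         max_delay=30.0, connect=socket.create_connection,
                         sleep=time.sleep):
    """Open a TCP connection, giving up after max_attempts failures.

    Hypothetical helper: `connect` and `sleep` are injectable so the
    retry logic can be exercised without a real network.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect((host, port), timeout=5)
        except OSError as exc:
            if attempt == max_attempts:
                # Stop instead of looping forever on a dead endpoint.
                raise ConnectionError(
                    f"gave up on {host}:{port} after {attempt} attempts"
                ) from exc
            # Exponential backoff capped at max_delay, with full jitter,
            # so waiting clients sleep rather than spin on the CPU.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

With a cap on attempts and a sleep between them, a client that holds a stale address fails fast and surfaces an error instead of consuming CPU in a tight retry loop.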
This incident has been resolved.
After applying an emergency fix, we can see that the services are now operational for both AU and AU-2 regions. Our Operations team is still actively monitoring the services and implementing a permanent fix for this issue.
AU is now fully recovered. Our engineers are still implementing a permanent fix for this issue.
The outgoing connection issues have been corrected after deploying an emergency fix.
We are continuing to work on a fix for this issue.
We noticed that some customer environments are failing to make outgoing connections. Our engineers are still actively working on the recovery.
AU-2 is now fully recovered.
We have detected an issue affecting service on the AU and AU-2 regions. Our Operations team has been notified and is currently working to resolve the issue. Projects on affected regions may experience delays when loading console, triggering deployments or taking backups. This issue does not affect availability of any environments, only the API services for the project. There is no site downtime as a result of this issue.
Report: "Outage on EU-5"
Last update:
WHAT HAPPENED
An incident at our upstream provider (AWS) affecting networking resulted in outages in the EU-5 region.
CUSTOMER IMPACT
Sites were unavailable from 23:30 UTC Feb 13 to 01:00 UTC Feb 14.
WHAT WAS DONE TO RESOLVE THE INCIDENT
Fixes implemented by the upstream provider resolved the incident.
The upstream problem has been resolved, and we are not receiving any further alerts. As a result, we are going to mark the incident as resolved.
Affected projects are back online as upstream provider has implemented fixes - we will continue to monitor the situation.
We have detected an issue at our upstream provider affecting service on the EU-5 region. This issue affects multiple production sites as well as development environments. Access to your site, project UI as well as Git and SSH access may be affected. We will update you as soon as we have further information.
Report: "Partial Outage on EU-5 region"
Last update: We have not seen any further storage alerts in the region and are marking the incident as resolved.
The Operations team has implemented fixes and the storage infrastructure is no longer in a degraded state. We have not received new outage reports; however, we will continue to monitor the situation.
Our team has added additional storage nodes and is actively monitoring the region.
We have detected degraded performance due to updates to storage infrastructure affecting service on the EU-5 region. Our Operations team has been notified and is currently investigating. Projects on the region may experience web request time-outs. We will update you as soon as we have further information.
Report: "HTTP Traffic reporting in console unavailable on Upsun"
Last update: Our engineers have completed the roll-out of the fix and HTTP Traffic reporting is now working in the Upsun console.
Our engineers are continuing to investigate this issue. We believe we have identified a mitigation path, and are working to roll it out at an appropriate time. We will continue to provide updates here as we have new information to offer.
We have detected an issue affecting HTTP Traffic reporting in the Upsun console. Our engineers are investigating and working to resolve this issue. No site services are impacted; only reporting in the Upsun console is affected. We will update you as soon as we have further information.
Report: "Console issues when creating variables."
Last update: Our team has deployed the fix for the issue and the incident is now resolved.
The issue has been identified and a fix is being implemented.
We are currently seeing an issue with our console when users are trying to create variables. Our Teams are actively investigating the issue. Please use the CLI to add variables for now. https://docs.platform.sh/development/variables/set-variables.html#create-project-variables
Report: "Reports of phishing emails from @internal.platform.sh"
Last update: This incident has been resolved.
We are aware of reports of spam/phishing emails being sent from @internal.platform.sh. Out of an abundance of caution we recommend not opening these emails or clicking on any links. We're investigating the issue with our email provider Sendgrid and are working to resolve this issue. For any questions please contact our support team by creating a ticket. Information on how to do that is in our documentation: https://docs.platform.sh/learn/overview/get-support.html
Report: "Partial Outage on us-2"
Last update:
## **What happened**
We detected a degradation of one or more physical storage drives in the us-2 region. As a result, one host was down and containers were unresponsive or performed poorly. After investigation, our engineers evacuated the containers to bring them back online.
## **Customer Impact**
Between 2024-11-20 13:10 UTC and 13:55 UTC, containers belonging to Platform.sh Grid customers had issues reading from and writing to disk.
## **What was done to resolve the incident**
Our team quickly evacuated the containers to another host, then re-opened and restarted the affected containers to restore availability.
This incident has been resolved now.
A fix has been implemented and we are monitoring it.
The issue has been identified and we are working on a fix.
We have detected an issue affecting services on the us-2 region. Our Operations team has been notified and is currently working to restore service. Projects may experience downtime. We will update you as soon as we have further information.
Report: "Partial Outage on FR-3"
Last update## **What happened** We identified issues on hosts running Git containers. This led to a Console and API outage in the FR-3 region. This incident did not affect website availability on Grid or Dedicated infrastructure. After investigation, our engineers identified the affected host, rebooted it, and verified that all services were up after the reboot. ## **Customer Impact** Between 11:00 and 14:00 UTC on 2024-11-22, customers were unable to access the console or run deployments. There was no impact on environments or production sites. ## **What was done to resolve the incident** Our team found the offending host and rebooted it, then re-opened and restarted the affected containers and restored availability. The Console, API, and Auth subsystem outages are now resolved.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting service on the FR-3 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may be unable to log in to the console, list environments via the command line, or push changes. We will update you as soon as we have further information.
Report: "Disk issue in eu-5"
Last update## What happened Containers belonging to a few of our customers had disk issues in EU-5. This incident affected read/write operations for those containers. Our engineers found that one OSD \(Ceph storage\) had slow ops and had hit its IOPS limit. That one OSD was changed to the io2 type to mitigate the issue. ## **Customer Impact** Between 09:45 and 10:05 UTC on 2024-11-25, containers of Grid customers in eu-5 had read/write issues. ## **What was done to resolve the incident** Our team fixed the disk, re-opened and restarted the affected containers, and restored availability. This incident is now resolved.
This incident has been resolved
A fix has been implemented and we are monitoring the situation.
We have identified a slow disk in eu-5 and a fix has been implemented.
We are currently investigating a disk issue in eu-5
Report: "Partial Outage on Dedicated Clusters"
Last update## **What happened** Underlying services on a small number of projects were found to be in an unhealthy state and caused application errors. ## **Customer Impact** Sites were not accessible for up to 2 hours and 30 minutes. ## **What was done to resolve the incident** A configuration fix was applied to ensure site services can run without interruption.
Services on some Dedicated clusters were found to be in an unhealthy state, causing site outages.
Report: "Outage on Orange Flexible Engine dedicated hardware"
Last updateThis incident has been resolved.
We're no longer seeing issues with our upstream provider and all affected sites have stabilized. We're continuing to investigate this with our provider, as well as continuing to monitor for any further issues affecting site availability.
We're seeing further connectivity issues on affected clusters. We're still working with our provider to investigate and resolve the issue.
Affected sites are coming back online. We are still investigating the issue with our provider.
Our monitoring has detected issues with our cloud infrastructure provider, which affect all sites hosted on Orange Flexible Engine dedicated hardware. Our operations team has been notified, and they are investigating the issue with our provider. Projects hosted on the underlying provider Orange Flexible Engine may experience connectivity issues to and from the cluster nodes. We will update you as soon as we have further information.
Report: "Degraded performance on Console"
Last update## **What happened** A configuration error prevented some users from accessing the web console. ## **Customer Impact** Users were not able to access their projects through the web console. Access through the CLI was not impacted. ## **What was done to resolve the incident** A configuration fix was applied to ensure the console loads for all users.
This incident has been resolved.
The issue has been identified and a fix is being implemented.
Some users may experience intermittent unavailability of the UI (Console) and CLI. Environment availability is NOT impacted. We will update you as soon as we have further information.
Report: "User Accounting outage"
Last update#### WHAT HAPPENED Data was affected during planned disk maintenance of the Accounts system. #### CUSTOMER IMPACT No live sites were impacted. Customers would not have been able to access projects \(through console, CLI, SSH\) or perform deployments from 2024-10-17 01:49:10 UTC to 2024-10-17 02:50:19 UTC. Any User Accounting changes made between those times may need to be redone. #### WHAT WAS DONE TO RESOLVE THE INCIDENT Data was restored using a backup taken before the start of maintenance.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have detected an issue affecting User Accounting, which may be affected for the next hour. Our Operations team has been notified and is currently working to restore service. Live sites are not affected. We will update you as soon as we have further information.
Report: "Partial outage on de-2.platform.sh"
Last update## **What happened** Our rate limiter started throttling connections between projects hosted in the de-2 region, resulting in connections timing out. This incident only affected multi-app projects and projects using a microservices architecture. ## **Customer Impact** Sites were intermittently available from 13:40 to 14:52 UTC. This incident did not affect Dedicated clusters. ## **What was done to resolve the incident** The regional connection limit has been raised.
We've seen no further issues and the de-2 region is stable.
Affected sites are recovering. We're continuing to monitor the situation.
We've identified an issue in our region gateway configuration and a fix is being deployed.
We have detected an issue affecting service on the DE-2 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience downtime. We will update you as soon as we have further information.
Report: "Partial Outage on US-3 region"
Last update## **What happened** Our upstream infrastructure provider rebooted one of our regional gateways. ## **Customer Impact** Sites were intermittently available from 12:12 to 12:21 UTC. This incident did not affect Dedicated clusters. ## **What was done to resolve the incident** The incident resolved itself once the gateway came back online.
This incident has been resolved.
All of the region alerts have cleared. We are still investigating the root cause.
We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may be unreachable. We will update you as soon as we have further information.
Report: "Partial Outage on fr-3.platform.sh"
Last update## **What happened** Our rate limiter started throttling connections between the region and our API resulting in an inability to load the Console UI and use the CLI in some projects. This incident did not affect live environments on Grid or Dedicated infrastructure. ## **Customer Impact** Between 2024-09-12 11:32 and 12:19 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites. ## **What was done to resolve the incident** We have raised the connection limit between the region and API.
What happened: Our rate limiter started throttling connections between the region and our API, resulting in an inability to load the Console UI and use the CLI for some projects. This incident did not affect live environments on Grid or Dedicated infrastructure. Customer Impact: Between 2024-09-12 07:25 and 10:25 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites. What was done to resolve the incident: We have raised the connection limit between the region and API.
A fix has been implemented and we are monitoring the results.
We have detected an issue affecting our console and CLI on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Some projects on the affected region may be unable to access the Console UI, connect via SSH, or use the CLI. We will update you as soon as we have further information.
Report: "Partial Outage on fr-3.platform.sh"
Last update## **What happened** Our rate limiter started throttling connections between the region and our API resulting in an inability to load the Console UI and use the CLI in some projects. This incident did not affect live environments on Grid or Dedicated infrastructure. ## **Customer Impact** Between 2024-09-12 07:25 and 10:25 UTC, some customers were unable to load the Console UI and use the CLI. There was no impact on environments or production sites. ## **What was done to resolve the incident** We have raised the connection limit between the region and API.
This incident has been resolved.
We have detected an issue affecting our console and CLI on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may be unable to access the Console UI, connect via SSH, or use the CLI. We will update you as soon as we have further information.
Report: "Partial Outage on CA-1"
Last update## **What happened** A large increase in incoming connections to the CA-1 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters. ## **Customer Impact** Sites were intermittently available from 15:42 to 16:07 UTC. ## **What was done to resolve the incident** We have taken steps to block malicious traffic.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting service on the CA-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience timeouts and slow response times. We will update you as soon as we have further information.
Report: "Dedicated clusters hosted in Orange FE are alerting"
Last update**What happened** Upstream infrastructure provider encountered network issues that resulted in packet loss. **Customer Impact** Some projects running on dedicated infrastructure suffered degraded performance for around an hour and a half. **What was done to resolve the incident** Upstream provider applied a fix.
This incident has been resolved.
Orange FE has implemented a fix and we are monitoring the results. The alerts on our monitoring have cleared.
We have contacted Orange FE for further investigation.
We are investigating timeout alerts on dedicated clusters hosted in Orange FE.
Report: "Console & API Issue in regions EU-5 and FR-4"
Last update**WHAT HAPPENED** We identified an issue in some build hosts which affected metadata and metrics in affected projects. **CUSTOMER IMPACT** No live sites were impacted, unless a deployment encountered an error during this period. Some customers may have been unable to access, or may have experienced degraded performance with, the console, CLI, SSH, metrics and deployments from 2024-09-19 08:33 to 2024-09-23 23:01 UTC. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The underlying issue was corrected and any projects that did not heal automatically were corrected with a recent backup.
This incident has been resolved.
We have seen no new instances of issues related to this incident, but we will continue in the monitoring phase for a short time longer.
We are continuing to monitor for any further issues.
Platform.sh teams have resolved the issue regarding the affected projects. Console, CLI and API-related tasks should be working for all the projects in affected regions. We will be monitoring the results closely.
The recovery work is still in progress for both regions.
We are getting new issue reports for this incident. Platform.sh teams are now actively working on recovering those affected projects.
Platform.sh teams have resolved the issue regarding the affected projects. Console, CLI and API-related tasks should be working for all the projects in affected regions. We will be monitoring the results closely.
Our teams are still reviewing the issues, fixing affected projects, and working on a permanent fix.
After further internal review, we have noticed region FR-4 is also affected by this incident. The recovery work is still in progress for both regions.
We are continuing to work on a fix for this issue.
Another issue has been identified and a fix is being implemented.
Platform.sh teams have deployed the mitigation for this issue and are monitoring its effects.
Platform.sh teams are continuing to deploy the mitigation for this issue.
The issue has been identified and a fix is being implemented.
We are investigating an issue affecting the Console & API on some projects in the eu-5 region. Our operations teams are aware of the issue and are taking measures to correct it. Affected projects will not be able to access their project's Console or perform API related tasks.
Report: "Outage on Orange Flexible Engine dedicated hardware."
Last update## **What happened** Our upstream infrastructure provider encountered network issues that resulted in packet loss. ## **Customer Impact** Some projects running on dedicated infrastructure suffered degraded performance for less than an hour. ## **What was done to resolve the incident** The upstream provider applied a fix.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
A networking equipment issue has been identified with our upstream provider and a fix is being implemented. There may still be further connectivity issues until the issue has been corrected.
Our upstream provider is still investigating the networking issue.
Our monitoring has detected issues with our cloud infrastructure provider, which affect all sites hosted on Orange Flexible Engine dedicated hardware. Our operations team has been notified, and they are investigating the issue with our provider. Projects hosted on the underlying provider Orange Flexible Engine may experience connectivity issues to and from the cluster nodes. We will update you as soon as we have further information.
Report: "Outage on fr-4.platform.sh"
Last update## **What happened** Our cloud partner suffered network connectivity issues. ## **Customer Impact** Access to websites, Console and API was disrupted for some customers from 13:24 UTC to 15:10 UTC on 26 September 2024. ## **What was done to resolve the incident** The cloud provider implemented a mitigation to restore capacity. Communications from the cloud partner were actively monitored until confirmation of resolution.
Resolution applied by cloud partner has been effective, all systems are fully functional.
Our cloud partners have released an update that a networking issue has occurred and that mitigation actions have been deployed to restore service to their customers. Platform.sh teams are continuing to monitor this corrective action and will reach out to any remaining impacted customers from this event.
Platform.sh teams are continuing to investigate this issue with our cloud partners.
We have detected an issue affecting service on the fr-4.platform.sh region. This issue affects multiple production sites as well as development environments. Access to your site, project UI as well as Git and SSH access may be affected. This outage does not affect Dedicated Enterprise Clusters. We will update you as soon as we have further information.
Report: "Console and CLI issues on FR-4.platform.sh"
Last update**WHAT HAPPENED** Hosts on this region entered a degraded state. **CUSTOMER IMPACT** Some customers experienced intermittent slowness and long response times while using console, CLI, SSH and submitting deployments from 2024-09-18 09:56 to 2024-09-19 14:21 UTC. However, existing live sites were not impacted. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The degraded hosts have been restored, and additional capacity was added to the region. Services such as the console, CLI, SSH, and deployments shouldn't experience slowness anymore.
We have seen no new reports related to this incident. Your projects and environments should now function as expected.
A fix has been implemented at 2024-09-18 19:04 UTC and we are continuing to monitor the results.
The slowness is re-occurring and we are working on a permanent fix.
We have finished implementing the fix and we are monitoring the results.
We are still working on implementing the fix.
The issue has been identified and a fix is being implemented. Live sites aren't being affected by this incident.
We have detected an issue affecting service on the FR-4 region. Our Operations team has been notified and is currently working to restore service. Affected projects may experience limited access to web console and CLI services, as well as unexpectedly long deployment times and difficulty accessing services via SSH connection. Deployed production sites are not affected at this time. We would recommend suspending deployments to environments on the affected region. We will update you as soon as we have further information.
Report: "Console and CLI issues on FR-4.platform.sh"
Last update**WHAT HAPPENED** Hosts on this region entered a degraded state. **CUSTOMER IMPACT** Some customers experienced intermittent slowness and long response times while using console, CLI, SSH and submitting deployments from 2024-09-18 09:56 to 2024-09-19 14:21 UTC. However, existing live sites were not impacted. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The degraded hosts have been restored, and additional capacity was added to the region. Services such as the console, CLI, SSH, and deployments shouldn't experience slowness anymore.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We continue to experience high CPU usage on build hosts. We are doubling build capacity and rebalancing projects to recover from API/CLI slowness.
We have detected an issue affecting service on the FR-4 region. Our Operations team has been notified and is currently working to restore service. Affected projects may experience limited access to web console and CLI services, as well as unexpectedly long deployment times and difficulty accessing services via SSH connection. Deployed production sites are not affected at this time. We would recommend suspending deployments to environments on the affected region. We will update you as soon as we have further information.
Report: "Partial Outage on UK-1"
Last update**What happened** A cloud provider incident \([https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF](https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF)\) led to a loss of connectivity in the London GCP region, making projects on the uk-1 region temporarily inaccessible. **Customer Impact** Between 13:23 and 13:33 UTC on 2024-08-12, some environments were inaccessible due to the GCP incident in the UK-1 region.
We've seen no further issues, however we continue to investigate for the post-mortem report.
There have been no further alerts and we are monitoring the region. We are still investigating for the post-mortem report.
We are continuing to work on a fix for this issue.
All alerts have cleared and we are continuing to investigate.
We have detected an issue affecting service on the UK-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience downtime. We will update you as soon as we have further information.
Report: "Partial Outage on US-3 region"
Last update### What happened A host on the region entered a degraded state. This incident affected live environments on a single host on our Grid infrastructure. ### Customer Impact Between 2024-07-31 06:30 and 06:58 UTC, environments on the degraded host were impacted because their projects were moving hosts, which resulted in a site outage for those environments. ### What was done to resolve the incident The degraded host was recovered, after which services and activities on impacted environments completed successfully and service to impacted sites was restored.
This incident has been resolved.
We have isolated the single host causing this issue and projects are now online and responding.
We have detected an issue affecting service on the US-3 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience web request timeouts. We will update you as soon as we have further information.
Report: "Partial Outage on CA-1 region"
Last update### What happened A host on the region entered a degraded state. ### Customer Impact Environments on the degraded host experienced stuck activities \(such as a backup activity\), which then resulted in a site outage for the environments in question. ### What was done to resolve the incident The degraded host was recovered, after which the stuck activities on impacted environments completed successfully and service to impacted sites was restored.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating an issue that has resulted in outages for some sites hosted on the CA-1 region. Our Operations team has been notified and is currently working to restore service. We will update you as soon as we have further information.
Report: "Deployments failing on Dedicated Enterprise Gen 2 clusters"
Last update## **What happened** A recent code change in the deployment daemon running on Dedicated Enterprise Gen 2 clusters resulted in an inability to finish deployments. ## **Customer Impact** The issue was first observed at 08:54 UTC and a fix was rolled out at 12:27 UTC. ## **What was done to resolve the incident** The deployment daemon has been patched.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting deployments on Dedicated Enterprise Gen 2 clusters. Our Operations team has been notified and is currently working to restore service. Please refrain from deployments until further notice. We will update you as soon as we have further information.
Report: "API server outage"
Last update## What happened After recent maintenance work, the API server was not able to communicate with internal database servers. ## Customer Impact For a period of 40 minutes, customers were unable to use the CLI or console, connect to their projects with SSH, or perform other activities such as submitting deployments. The availability of live sites was not affected. ## What was done to resolve the incident Our team quickly discovered the DB connection issues and made those DB servers available to the API server.
The CLI, console and API services should now function as expected. This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have detected an issue affecting CLI, console and API usage. Our engineers are actively investigating this issue.
Report: "Partial Outage on UK-1"
Last update### What happened We observed an extreme spike in traffic to our UK-1 region, which affected the availability of some projects. ### Customer Impact Some projects may have been unavailable for around 30 minutes until mitigation actions were in place. ### What was done to resolve the incident Our team investigated the issue and implemented mitigations to restore normal service.
We recently observed an extreme spike in traffic to our UK-1 region.
Report: "Partial Outage on FR-4"
Last update## **What happened** A large increase in incoming connections to the FR-4 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters. ## **Customer Impact** Sites were intermittently available from 12:00 to 14:00 UTC. ## **What was done to resolve the incident** We have added extra gateway hosts to distribute the load and taken steps to block traffic.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting service on the FR-4 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience intermittent connectivity issues or timeouts. We will update you as soon as we have further information.
Report: "Partial Outage on EU-5"
Last update## **What happened** A large increase in incoming connections to the EU-5 region caused sites in the region to become intermittently available. This incident did not affect Dedicated clusters. ## **Customer Impact** Sites were intermittently available from 14:30 to 15:00 UTC. Some customers may have also had issues connecting to the project management console during that time. ## **What was done to resolve the incident** We have added extra gateway hosts to distribute the load and taken steps to block traffic.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting service on the EU-5 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience intermittent connectivity issues or timeouts. This also impacts the project management console for some users. Other regions will not have any impacts to site availability. We will update you as soon as we have further information.
Report: "Accounts / Auth systems outage"
Last update### WHAT HAPPENED Based on reports from our monitoring system, the internal accounts and auth subsystems were not able to give timely responses to certain API calls. Our team identified a lock contention issue within the project information database system. This incident did NOT affect live environments on Grid or Dedicated infrastructure. However, SSH, CLI and console features including deployments may have been temporarily unavailable. ### CUSTOMER IMPACT Between 2024-05-25 02:28 and 03:56 UTC, some customers may have experienced issues while using SSH, the console, or the CLI, or while submitting deployments. This incident did NOT affect live environments on Grid or Dedicated infrastructure. ### WHAT WAS DONE TO RESOLVE THE INCIDENT Our team restarted the deadlocked processes to make the accounts and auth subsystems available again. The Accounts and Auth subsystem outages are now resolved. Further investigation of this lock contention issue will be conducted and a fix will be implemented to optimize our subsystems further.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue. This issue does not impact existing live sites / environments. However, several functions including SSH, console, CLI may not be functioning as expected.
Report: "Partial Outage on CA-1 region"
Last update**What Happened** We detected a substantial drop in connections reaching the CA-1 region from 14:29 UTC to 15:39 UTC on May 11th, 2024. **Customer Impact** Customers experienced intermittent availability during this time as connections were dropped and requests failed to reach the CA-1 origin. **Incident Resolution** We suspect a transient network failure with our upstream provider, or another network layer upstream, was responsible for this incident.
This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have detected an issue affecting service on the CA-1 region. Our Operations team has been notified and is currently working to restore service. Projects on the affected region may experience unresponsive sites. We will update you as soon as we have further information.
Report: "Partial Outage on EU 4"
Last update### What Happened: We detected a partial outage on the EU-4 region, and our investigation identified a host in the region was not operating normally. ### Customer Impact: A small number of customers may have experienced an outage on any environment that was residing on the abnormal host. This may have included Production environments. ### Incident Resolution: Our operations and engineering team isolated the host and manually fixed any environment that did not automatically recover.
All affected projects and clusters have been successfully recovered.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We have detected an issue affecting service on the EU 4 region. Our Operations team has been notified and is currently working to restore service. We will update you as soon as we have further information.
Report: "Partial Outage on FR-1 region"
Last update**What Happened:** On May 11th, at about 14:50 UTC, our upstream provider experienced an internal incident that affected the availability of their Virtual Machines \(VMs\) and Object Storage Devices \(OSDs\). **Customer Impact:** This incident affected all container types \(production, staging, development\) in the FR-1 region, impacting multiple projects. Affected projects experienced container outages, including production containers in some cases. **Incident Resolution:** During the incident, our engineers actively communicated with the upstream provider to ensure a swift resolution. On May 12th, at about 20:30 UTC, following the upstream provider's full recovery, our engineers ensured all production environments were fully operational. On May 13th, at about 07:00 UTC, our engineers conducted a comprehensive review of all affected components to confirm their full functionality and address any outstanding technical issues.
This incident has been resolved.
All services are currently operational. Our engineering team is conducting a comprehensive review of all affected components to confirm their full functionality and address any outstanding technical issues.
All affected projects' data and environments have been recovered successfully. Some metrics services may still be unavailable.
Our upstream provider has recovered impacted hosts. We are monitoring the results.
Most impacted hosts and projects have been fully recovered. The upstream provider is still working on full recovery.
Our hosts have been recovered and all projects are operational. The upstream provider is still working on full recovery of their data centre.
The upstream provider has recovered our core hosts and the projects are back online.
We are continuing to work with our upstream provider on recovery.
We are continuing to work with our upstream provider on recovery.
Our upstream provider has encountered an unexpected temperature increase which required the emergency shutdown of the servers impacted by this incident. They are actively working on recovery.
We are continuing to work on a fix for this issue.
The issue has been identified and we are working on recovering the affected hosts.
We have detected an issue affecting service on the FR-1 region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience slowness and unresponsive sites. We will update you as soon as we have further information.
Report: "Downtime for some sites on the CA-1 region"
Last update**What Happened**: The incident was triggered when the incoming request gateway on cluster CA-1 exhausted its resources because of an unforeseen surge in usage. This left the gateway unable to process incoming traffic effectively. **Customer Impact**: Approximately 50 projects were affected during a 30-minute period, experiencing intermittent downtime ranging from 5 to 10 minutes. This disruption impacted the operational functionality of these projects, leading to degraded service availability. **Incident Resolution**: To address this issue, we have permanently increased the resource capacity of the incoming request gateway to better handle similar spikes in traffic in the future. Additionally, we have identified the source of the unexpected usage and are implementing measures to prevent a recurrence of this problem.
All services are fully operational. No further downtime was registered in the past 2 hours.
During our initial investigation, we identified that the downtime could have been related to resource constraints on our incoming gateway. We have permanently increased the gateway capacity to prevent similar issues in the future.
We have detected an issue intermittently impacting approximately 50 production environments between 07:15 and 07:45 UTC. The affected environments experienced downtime ranging from 5 to 10 minutes. We are actively investigating the issue.
Report: "API and CLI timeouts"
Last update**WHAT HAPPENED** Users were unable to interact with projects through the web console and CLI due to issues with the authentication backend. This incident did not affect live environments on Grid or Dedicated infrastructure. After investigation, our engineers identified errors in logs, which led to the following remedial actions: * Backend was restarted * Underlying hosts were replaced * Additional capacity was added **CUSTOMER IMPACT** Between 2024-05-01 01:48 UTC and 03:21 UTC, customers were unable to access the console or use the CLI. There was no impact on environments or production sites. **WHAT WAS DONE TO RESOLVE THE INCIDENT** The authentication backend was restarted and additional capacity was added.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has reoccurred and our team is investigating.
We identified an issue with our Auth system. Our team has performed some work to restore service availability and we're monitoring the situation.
We are investigating a report of outages on our API and CLI
Report: "Partial Outage on eu-2.platform.sh"
Last update**WHAT HAPPENED** Environments were in unexpected states during the planned maintenance event for region [eu-2.platform.sh](http://eu-2.platform.sh) [https://status.platform.sh/incidents/97ls23652mws](https://status.platform.sh/incidents/97ls23652mws) Affected environments, including live sites, were not available for use during the incident. This also applied to functionality such as SSH access and deployments. Dedicated clusters were **NOT** affected by this incident. **CUSTOMER IMPACT** Some customers' environments were not available for up to 9 hours and 12 minutes in total. \(Worst case estimate for all live & non-live environments\) Start: 2024-04-24 21:50 **UTC** End: 2024-04-25 07:02 **UTC** **WHAT WAS DONE TO RESOLVE THE INCIDENT** Our engineers corrected the unexpected states, and the environments should now function as expected. Our engineering team will also investigate a potential internal bug responsible for the unexpected states that caused this incident. We will improve the internal subsystems in order to minimize the negative impact of planned maintenance events in the future.
Our engineers have deployed a fix to correct the issues. All environments in this region should now function as expected.
All live / default environments should now function as expected.
We have detected an issue affecting service on the eu-2.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects on affected regions may experience outages. We will update you as soon as we have further information.
Report: "Gateways unresponsive on uk-1.platform.sh"
Last update**What Happened** On March 29, 2024, at 07:23 UTC, a large increase in incoming connections to the [uk-1.platform.sh](http://uk-1.platform.sh) region caused the gateways to become unresponsive. Availability of sites/environments in the region was impacted. This incident did not affect Dedicated clusters. **Customer Impact** Grid projects on the [uk-1.platform.sh](http://uk-1.platform.sh) region were affected, resulting in temporary outages and reduced performance for customer sites. Site availability and performance were briefly degraded. **What Was Done to Resolve the Incident** To address the issue, our engineers initially blocked the majority of the malicious \(DDoS\) traffic and made various configuration and software changes in our internal gateway systems to handle these situations better. They continued monitoring the changes to ensure system stability and performance.
This incident has been resolved.
A large increase in incoming connections to the uk-1.platform.sh region caused the gateways to become unresponsive. Traffic has been blocked and we're seeing gateway performance returning to normal. We're continuing to monitor the situation and will take additional action if needed.
Report: "Partial Outage on fr-3.platform.sh"
Last update## What happened Our upstream provider's planned network maintenance caused an unexpected interruption to our services in region `fr-3.platform.sh` ## Customer Impact All environments in region `fr-3.platform.sh` were temporarily unreachable, and some of those environments may have experienced outages for more than 1.5 hours. ## What was done to resolve the incident Based on internal monitoring alerts, our team migrated all the affected environments to healthy hosts. Our upstream provider also resolved the network issues, so our infrastructure could then function as expected.
This incident has been resolved.
Our upstream provider is still working on their planned network maintenance events. Given the unexpected interruption to our services, we will continue to monitor this region for a prolonged period. All environments should now function as expected. Please submit a support ticket if you need further assistance from our team.
Our team has successfully migrated affected environments to working hosts and will monitor the situation with our upstream provider.
Our team is still actively working on migrating the environments away from the affected hosts.
There is a technical issue stemming from our upstream provider. They are actively investigating and recovering the availability of the service.
We have detected an issue affecting service on the fr-3.platform.sh region. Our Operations team has been notified and is currently working to restore service. Projects in the affected region may experience outages. We will update you as soon as we have further information.
Report: "Gateways unresponsive on uk-1.platform.sh"
Last update## **What happened** A surge in incoming connections to the uk-1.platform.sh region rendered the gateways unresponsive, affecting the availability of sites and environments within the region. This incident did not affect Dedicated clusters. ## **Customer Impact** During the incident, which commenced at 07:23 UTC on 2024-03-29, certain sites in the uk-1.platform.sh region experienced unavailability. Sites were restored within 10 minutes, with normal operations resuming by 07:33 UTC. ## **What was done to resolve the incident** To mitigate the impact, we swiftly implemented measures to block the majority of the malicious \(DDoS\) traffic targeting the region. ## **Short term mitigations** Our engineering team is actively analyzing the attack patterns to fortify the resilience of our incoming request gateways, including reviewing rate limit configurations and adding further gateway capacity.
This incident has been resolved.
A large increase in incoming connections to the uk-1.platform.sh region caused the gateways to become unresponsive. Traffic has been blocked and we're seeing gateway performance returning to normal. We're continuing to monitor the situation and will take additional action if needed.
Report: "Gateways unresponsive on us-4.platform.sh"
Last update## What happened A large increase in incoming connections to the `us-4.platform.sh` region caused the gateways to become unresponsive. Availability of sites/environments in the region was negatively impacted. This incident did not affect Dedicated clusters. ## Customer impact Some sites in the `us-4.platform.sh` region were unavailable during the incident \(starting at 04:23 **UTC** on 2024-03-27\). Most affected sites recovered within 18 minutes \(by 04:41 **UTC**\). ## What was done to resolve the incident We blocked some malicious \(DDoS\) traffic to the region.
Alerts for the region have cleared following filtering of the excess traffic, and the region is again stable.
Traffic has been blocked and we're seeing gateway performance returning to normal. We're continuing to monitor the situation and will take additional action if needed.
A large increase in incoming connections to the us-4.platform.sh region caused the gateways to become unresponsive again. Our Operations team has been notified and is currently investigating the issue.