Is Statuspage.io Down Right Now? Discover if there is an ongoing service outage.

Statuspage.io is currently Operational

Last checked Jul 30, 2025 6:08 UTC from Statuspage.io's official status page

Historical record of incidents for Statuspage.io

Mar 11, 2025

Report: "statuspage.io domain not resolving in Brazil region for some users"

Last update 2025-03-11T07:52:20.870Z

resolved2025-03-11T07:52:20.847Z

After a thorough investigation, we have determined that the recent connectivity issues experienced for statuspage.io domains in Brazil were isolated to specific Internet Service Providers (ISPs) within Brazil, and that there was no impact on, or issue with, Statuspage’s own infrastructure or DNS configuration. Our team is actively monitoring the situation and doing its best to reach out to the relevant ISPs to ensure that access issues are resolved for all affected users. If connectivity issues persist and you are located in Brazil, we kindly ask that you open a support request with your internet service provider or follow our community post to resolve the issue. https://community.atlassian.com/forums/Statuspage-articles/Mitigation-steps-for-statuspage-io-domain-ISP-Issues-in-Brazil/ba-p/2951025#M274

identified2025-02-25T14:29:15.216Z

We are actively working with ISPs to resolve the current issues and to better understand the recent changes that have contributed to these problems. We also encourage our customers to contact their ISPs, as this appears to be an ISP-specific issue. Our investigation shows that statuspage.io domains can be resolved using the Cloudflare or Google DNS resolvers, 1.1.1.1 and 8.8.8.8 respectively. Please refer to the steps outlined in our community post to access all Statuspage-based domains effectively. https://community.atlassian.com/t5/Statuspage-articles/Mitigation-steps-for-statuspage-io-domain-ISP-Issues-in-Brazil/ba-p/2951025#M274
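The update above says that statuspage.io domains resolve through the public resolvers 1.1.1.1 and 8.8.8.8 even when an ISP's own resolver fails. A minimal sketch of how a reader might check that from an affected network is shown here, using only the Python standard library to send a plain DNS A-record query over UDP to a chosen resolver. The function names and the hostname `example.statuspage.io` are hypothetical and used for illustration only; this is not tooling provided by Statuspage.

```python
import secrets
import socket
import struct

# Public resolvers named in the incident update.
PUBLIC_RESOLVERS = {"Cloudflare": "1.1.1.1", "Google": "8.8.8.8"}


def build_query(hostname: str, query_id: int) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def answer_count(response: bytes, query_id: int) -> int:
    """Return the number of answer records, or raise ValueError if unusable."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    if rid != query_id:
        raise ValueError("response ID does not match the query")
    rcode = flags & 0x000F
    if rcode != 0:
        raise ValueError(f"resolver returned RCODE {rcode}")
    return ancount


def resolves_via(hostname: str, resolver_ip: str, timeout: float = 3.0) -> bool:
    """Return True if resolver_ip answers an A query for hostname."""
    query_id = secrets.randbelow(1 << 16)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(hostname, query_id), (resolver_ip, 53))
            response, _ = sock.recvfrom(512)
        return answer_count(response, query_id) > 0
    except (OSError, ValueError):
        # Timeouts, unreachable networks, and error replies all count as failure.
        return False
```

Calling `resolves_via("example.statuspage.io", ip)` for each address in `PUBLIC_RESOLVERS`, and comparing the result with a lookup through the ISP's default resolver, would indicate whether the failure is specific to the ISP. The sketch only inspects the response header, so it reports whether any answer came back rather than which addresses were returned.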

identified2025-02-20T15:43:10.284Z

Please follow the steps in the link below to mitigate the DNS issues in Brazil for statuspage.io domains: https://community.atlassian.com/t5/Statuspage-articles/Mitigation-steps-for-statuspage-io-domain-ISP-Issues-in-Brazil/ba-p/2951025#M274

identified2025-02-19T09:55:08.647Z

We have determined that the connectivity issue is due to a DNS name resolution problem affecting certain ISPs in Brazil, which are unable to resolve *.statuspage.io domains. This issue does not impact all Brazilian customers and subscribers; it varies depending on the customer's or subscriber's ISP. Customers using custom domains for their status pages are not affected. While your Statuspage remains available, ISP configurations are preventing traffic from resolving the Statuspage domain. We are exploring mitigation strategies, but we advise affected customers to contact their ISP to understand why their ISP cannot resolve *.statuspage.io domains.

investigating2025-02-14T07:07:46.698Z

We are continuing to investigate this issue.

investigating2025-02-13T19:22:01.470Z

We are currently investigating this issue.

Nov 9, 2024

Report: "Elevated Server Errors across Statuspage Services"

Last update 2024-11-09T10:10:01.340Z

resolved2024-11-09T10:10:01.297Z

This incident has been resolved.

monitoring2024-11-09T09:39:38.605Z

A fix has been implemented and we are monitoring the results.

identified2024-11-09T09:28:06.158Z

We are investigating server errors. We have isolated the problem and applied the fix.

Jun 28, 2024

Report: "Errors on the Statuspage JSM plugin"

Last update 2024-06-28T04:37:32.841Z

resolved2024-06-28T04:37:32.823Z

This issue has been resolved.

identified2024-06-28T04:07:03.250Z

The issue has been identified, and we will be fixing it shortly.

investigating2024-06-28T03:09:53.157Z

We are aware of error messages displayed by the Statuspage JSM plugin. The team is currently investigating the issue.

Jun 4, 2024

Report: "Error responses across multiple Cloud products"

Last update 2024-06-04T00:14:00.185Z

resolved2024-06-04T00:14:00.163Z

This incident has been resolved.

monitoring2024-06-03T23:26:21.306Z

We are investigating an issue with error responses for some Cloud customers across multiple products. We have identified the root cause and expect recovery shortly. In the meantime, we have enabled the alternative login option for Statuspage, so that our customers can still log in to their Statuspages.

Mar 13, 2024

Report: "Statuspage API is facing intermittent issues"

Last update 2024-03-13T10:27:01.914Z

resolved2024-03-13T10:27:01.897Z

This issue has been resolved.

monitoring2024-03-13T08:47:10.655Z

A fix has been implemented, and we are monitoring the results. The API has fully recovered.

identified2024-03-13T08:25:32.695Z

The issue has been identified, and a fix is in progress.

investigating2024-03-13T08:00:17.561Z

We are currently investigating this issue.

Feb 5, 2024

Report: "Statuspage product provisioning failing intermittently"

Last update 2024-02-05T21:00:35.722Z

resolved2024-02-05T21:00:35.701Z

We have identified and fixed the issue and now Statuspage provisioning is working for our customers.

investigating2024-02-05T18:36:08.867Z

We are investigating intermittent failures while provisioning Statuspage.

Nov 17, 2023

Report: "Intermittent issues in incidents shown in Status embed frame"

Last update 2023-11-17T17:25:26.079Z

resolved2023-11-17T17:25:26.063Z

There was an intermittent error with the status embed page in which cached responses were incorrectly shown to a few customers. This has now been resolved.

investigating2023-11-17T16:37:26.172Z

We are investigating an intermittent issue, reported by a few customers, in which wrong data is shown in the status embed frame on their sites. We will share updates soon.

Nov 6, 2023

Report: "Intermittent errors while accessing public Statuspages"

Last update 2023-11-06T09:36:00.111Z

postmortem2023-11-06T09:25:25.226Z

### **SUMMARY**

From 06:00 UTC to 07:45 UTC on October 28, 2023, Atlassian customers using Statuspage had intermittent issues with all Statuspage functionality. The event occurred due to a database performance issue during a [scheduled database maintenance](https://metastatuspage.com/incidents/s21b66328h9j). This impacted customers in all regions. The incident was detected within one minute by monitoring the upgrade process and mitigated by rolling back to a known good snapshot which put Statuspage systems into a known good state. The total time to resolution was about one hour and 45 minutes.

### **IMPACT**

The overall impact was between 06:00 UTC and 07:45 UTC October 28, 2023. This incident affected Statuspage customers from all regions and caused intermittent backend errors on all Statuspage activity including viewing pages, adding subscribers, and creating/updating events. We performed a rollback operation during recovery to return to a known good state.

### **ROOT CAUSE**

The issue was caused by database performance issues after a routine database maintenance and upgrade. As a result, our backends returned intermittent errors to several user requests.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We take the utmost care to provide a highly reliable service. We will pursue several preventive measures to ensure that this situation does not occur in the future, including:

* Fixing the cause of the performance issues before future upgrades; and
* Improving our testing process for database upgrades to catch potential performance issues.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support

resolved2023-10-28T08:01:08.533Z

The issue is now resolved and everything is back to a normal working state.

monitoring2023-10-28T07:55:42.978Z

Update: We have fixed the issue and are actively monitoring.

investigating2023-10-28T07:36:24.461Z

We are currently seeing intermittent errors in viewing public Statuspages. We are investigating this problem and will provide updates shortly.

Oct 6, 2023

Report: "Atlassian Account login issues"

Last update 2023-10-06T15:19:32.098Z

postmortem2023-10-06T15:19:08.763Z

### **SUMMARY**

On Sep 13, 2023, between 12:00 PM UTC and 03:30 PM UTC, some Atlassian users were unable to sign in to their accounts and use multiple Atlassian cloud products. The event was triggered by a misconfiguration of rate limits in an internal service which caused a cascading failure in sign-in and signup-related APIs. The incident was quickly detected by multiple automated monitoring systems. The incident was mitigated on Sep 13, 2023, 03:30 PM UTC by the rollback of a feature and additional scaling of services which put Atlassian systems into a known good state. The total time to resolution was about 3 hours and 30 minutes.

### **IMPACT**

The overall impact was between Sep 13, 2023, 12:00 PM UTC and Sep 13, 2023, 03:30 PM UTC on multiple products. The incident caused intermittent service disruption across all regions. Some users were unable to sign in for sessions. Other scenarios that temporarily failed were new user signups, profile retrieval, and password reset. During the incident we had a peak of 90% of requests failing across authentication, user profile retrieval, and password reset use cases.

### **ROOT CAUSE**

The issue was caused by a misconfiguration of a rate limit in an internal core service. As a result, some sign-in requests over the limit received HTTP 429 errors. However, retry behavior for requests caused a multiplication of load which led to higher service degradation. As many internal services depend on each other, the call graph complexity led to a longer time to detect the actual faulty service.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We are continuously improving our system's resiliency. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Audit and improve service rate limits and client retry and backoff behavior.
* Improve scale and load test automation for complex service interactions.
* Audit cross-service dependencies related to sign-in flows and minimize them where possible.

Due to the unavailability of sign-in, some customers were unable to create support tickets. We are making additional process improvements to:

* Enable our unauthenticated support contact form and notify users that it should be used when standard channels are not available.
* Create status page notifications more quickly and ensure that for severe incidents, notifications to all subscribers are enabled.

We apologize to users who were impacted during this incident; we are taking immediate steps to improve the platform’s reliability and availability.

Thanks,
Atlassian Customer Support
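The root cause above describes clients retrying after HTTP 429 responses and thereby multiplying load, and the remedial actions mention improving client retry and backoff behavior. The postmortem does not say what Atlassian implemented; the following is only a generic sketch of one common approach, capped exponential backoff with full jitter and a bounded number of retries. `RateLimited`, `backoff_delays`, and `call_with_backoff` are hypothetical names introduced for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for receiving an HTTP 429 response."""


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one jittered delay per retry.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)),
    so waits grow exponentially but are spread out across clients.
    """
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(request, max_retries=5, sleep=time.sleep, **delay_options):
    """Call request(), retrying on RateLimited at most max_retries times."""
    for delay in backoff_delays(max_retries, **delay_options):
        try:
            return request()
        except RateLimited:
            # Wait before retrying instead of immediately re-sending.
            sleep(delay)
    # Final attempt: if this is also rate limited, the error propagates.
    return request()
```

The retry budget bounds the extra load a single client can add (at most `max_retries + 1` requests per call), and the jitter keeps many clients from retrying in lockstep after a shared failure.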

resolved2023-09-13T19:37:55.763Z

Between 12:45 UTC and 15:30 UTC, we experienced login and signup issues for Atlassian Accounts. The issue has been resolved and the service is operating normally. We will publish a post-incident review with the details of the incident and the actions we are taking to prevent similar problems in the future.

monitoring2023-09-13T17:37:59.380Z

We are no longer seeing occurrences of the Atlassian Accounts login errors; all clients should be able to log in successfully now. We will continue to monitor.

monitoring2023-09-13T16:42:29.502Z

We can see a reduction in the Atlassian Accounts login issues after the mitigation actions were taken. We are still monitoring closely and will continue to provide updates.

monitoring2023-09-13T15:33:47.824Z

We have identified the root cause of the Atlassian Accounts login issues impacting Cloud Customers and have mitigated the problem. We are now monitoring this closely.

investigating2023-09-13T14:16:53.617Z

We are investigating an issue with Atlassian Accounts login that is impacting some Cloud customers. We will provide more details within the next hour.

Sep 18, 2023

Report: "Elevated Server Errors on Public Pages"

Last update 2023-09-18T21:17:22.115Z

resolved2023-09-18T19:19:50.467Z

This incident has been resolved.

monitoring2023-09-18T19:16:51.000Z

We have identified the issue and a fix has been implemented. We have scaled our services to mitigate the issue and are monitoring the results.

investigating2023-09-18T18:50:16.000Z

We are investigating cases of degraded performance for public pages. Pages may be failing to load or loading more slowly than normal.

Aug 30, 2023

Report: "Intermittent errors during login for some customers"

Last update 2023-08-30T06:16:10.510Z

resolved2023-08-30T06:16:10.494Z

This incident has been resolved.

identified2023-08-30T05:03:20.861Z

We have identified issues with our login system. To unblock customer login, we have temporarily enabled alternate login flows for the manage portal. We will continue to monitor the situation and provide further details as we investigate.

Jul 6, 2023

Report: "Intermittent errors during login for some customers"

Last update 2023-07-06T13:26:59.625Z

resolved2023-07-06T13:26:59.604Z

This incident has been resolved.

monitoring2023-07-06T13:14:41.410Z

A fix has been implemented. We are seeing recovery and continuing to monitor the incident.

identified2023-07-06T09:28:37.093Z

investigating2023-07-06T09:18:49.630Z

We are investigating reports of intermittent errors during login for some customers using Statuspage. We will provide more details once we identify the root cause.

Jun 16, 2023

Report: "Partial outage while accessing pages over http protocol"

Last update 2023-06-16T17:31:05.250Z

resolved2023-06-16T17:31:05.238Z

This incident has been resolved.

identified2023-06-16T08:50:57.803Z

We are in the process of rolling out a fix for impacted domains.

identified2023-06-15T08:09:33.312Z

We have identified an unintended issue with redirecting http to https for a tiny cohort of customers on SSL-enabled custom domains. This does not affect the availability of any Statuspage on a custom domain with SSL enabled; these pages remain available via https://.

May 31, 2023

Report: "Some Statuspage users are experiencing difficulties in creating/editing incidents"

Last update 2023-05-31T10:17:40.987Z

resolved2023-05-31T10:17:40.972Z

This incident has been resolved.

identified2023-05-31T08:20:35.477Z

The issue has been identified and a fix is being implemented.

Apr 6, 2023

Report: "Increased error rate while SMS Subscription signups"

Last update 2023-04-06T19:17:02.671Z

resolved2023-04-06T19:17:02.658Z

This incident is resolved.

investigating2023-04-06T18:24:11.522Z

The issue seems intermittent, and we are continuing to investigate.

investigating2023-04-06T15:52:48.990Z

We are currently investigating the issue.

Mar 23, 2023

Report: "Statuspage was unable to accept new signups"

Last update 2023-03-23T05:28:38.295Z

resolved2023-03-22T15:30:00.000Z

Between 9:09 PM PST on March 22nd and 9:46 PM PST on March 23rd, there was an issue preventing new signups for Statuspage. Customers attempting to sign up during that time may have encountered difficulties. We have since identified and resolved the issue, which only affected new signups and did not impact any existing Statuspages or their components. We appreciate your patience and understanding while we addressed this matter.

Mar 20, 2023

Report: "We are noticing some slowness/intermittent errors while loading some public pages."

Last update 2023-03-20T21:06:51.220Z

resolved2023-03-20T21:06:51.202Z

This has been resolved.

monitoring2023-03-20T20:28:23.134Z

Systems seem stable and we are now monitoring.

investigating2023-03-20T19:54:27.072Z

We are noticing some slowness/intermittent errors while loading some public pages. We are investigating the errors.

Feb 27, 2023

Report: "Not able to upload images to the Manage Portal - Statuspage"

Last update 2023-02-27T18:57:21.353Z

resolved2023-02-27T18:57:21.335Z

This incident has been resolved.

investigating2023-02-27T16:06:30.916Z

We are currently experiencing an issue with uploading images on the manage portal.

Feb 21, 2023

Report: "Delayed delivery of Incident notifications via email"

Last update 2023-02-21T22:29:45.665Z

resolved2023-02-21T18:00:00.000Z

We experienced a 15-minute delay in the delivery of incident notifications, which has been resolved. We have identified the cause as an ongoing incident with one of our messaging vendors, and we have migrated all possible notification traffic to our alternate messaging vendors.

Feb 14, 2023

Report: "SSO enabled private pages are facing authentication issues"

Last update 2023-02-14T14:35:22.761Z

resolved2023-02-14T14:35:22.745Z

This incident has been resolved.

monitoring2023-02-14T14:09:55.350Z

A fix has been implemented and we are monitoring the results.

investigating2023-02-14T12:32:32.730Z

We are currently investigating the issue.

Feb 8, 2023

Report: "Manage Portal Outage"

Last update 2023-02-08T19:43:03.516Z

resolved2023-02-08T19:26:46.000Z

The manage portal was out of service between 11:21-11:25 AM PST. The problem has resolved itself and we have returned to normal operations.

Nov 22, 2022

Report: "Abnormal API Timeouts"

Last update 2022-11-22T22:14:34.456Z

resolved2022-11-22T20:30:00.000Z

Our API service experienced an abnormal number of timeouts affecting 0.6% of total traffic due to DNS errors between 12:30-1:10 pm PST.

Nov 9, 2022

Report: "Statuspage notifications are not being sent out"

Last update 2022-11-09T23:59:36.326Z

resolved2022-11-09T23:10:28.000Z

This incident has been resolved. All notifications have been processed. No notifications were lost.

monitoring2022-11-09T22:59:41.933Z

A fix has been implemented and we are monitoring the results.

identified2022-11-09T22:37:14.000Z

The issue has been identified and a fix is being implemented.

Sep 28, 2022

Report: "Elevated Server Errors on Public Pages"

Last update 2022-09-28T01:12:42.656Z

resolved2022-09-28T00:00:00.000Z

Between 5:09 and 5:13 PM PST, public status pages experienced elevated errors due to increased traffic.

Sep 12, 2022

Report: "Billing page inaccessible for some customers"

Last update 2022-09-12T20:49:34.309Z

resolved2022-09-12T20:49:34.296Z

This incident has been resolved.

monitoring2022-09-12T17:03:38.007Z

Billing page access should be restored for all customers. We are continuing to monitor the issue.

identified2022-09-12T16:30:24.337Z

The issue has been identified and a fix is being implemented.

investigating2022-09-12T16:06:26.520Z

A subset of Statuspage customers are unable to access the Billing page within Statuspage. We are currently investigating the issue.

Aug 2, 2022

Report: "Elevated Errors for New Email Subscriptions"

Last update 2022-08-02T05:37:56.644Z

resolved2022-08-02T00:00:00.000Z

Between 4:55pm and 10:10pm PST, users were not able to subscribe via email to statuspages. We have identified the root cause and have resolved the issue.

Jul 21, 2022

Report: "Errors accessing manage portal"

Last update 2022-07-21T17:46:49.606Z

resolved2022-07-21T17:20:00.000Z

A recent deploy was found to contain errors. Our infrastructure has been successfully rolled back to a previous version of the code, and traffic is being served as normal again.

Report: "Elevated Errors for New SMS Subscriptions"

Last update 2022-07-21T16:51:05.454Z

resolved2022-07-21T16:49:53.000Z

We have resolved the issue.

investigating2022-07-21T16:37:48.724Z

We are currently investigating this issue.

Jul 18, 2022

Report: "Intermittent Errors Accessing Public Pages Due to Elevated Traffic"

Last update 2022-07-18T12:31:07.153Z

resolved2022-07-18T11:30:00.000Z

Due to elevated traffic, we experienced intermittent timeouts and errors in serving public pages between 4:24 and 4:25 AM PST. In response, we have implemented updates to our services to mitigate the same issue in the future.

Jun 27, 2022

Report: "Login failures for manage portal"

Last update 2022-06-27T01:31:46.868Z

resolved2022-06-27T01:31:46.853Z

This incident has been resolved.

investigating2022-06-27T00:35:07.000Z

We are still investigating the root cause of the issue. Users who have attempted to log in to manage their Statuspages should receive an email with a login link, which they can use to log in temporarily instead of the normal login flow.

investigating2022-06-27T00:01:01.003Z

We are currently investigating failed logins when users are accessing Statuspage's manage portal.

May 16, 2022

Report: "Incorrect AWS Component Emails"

Last update 2022-05-16T22:31:26.275Z

resolved2022-05-16T22:31:26.258Z

This incident has been resolved.

identified2022-05-16T20:45:46.531Z

We are continuing to work on a fix for the incorrect emails. If you receive any other emails about AWS components, please disregard them.

identified2022-05-16T19:41:22.677Z

We have identified the cause of the emails and are working on a resolution now.

investigating2022-05-16T19:00:58.906Z

A number of incorrect emails were sent out saying AWS third party components will no longer receive updates. These components have not been impacted and we are investigating the root cause of the emails now.

May 14, 2022

Report: "Login failures"

Last update 2022-05-14T20:16:03.818Z

resolved2022-05-14T20:16:03.803Z

This incident has been resolved.

monitoring2022-05-14T19:48:06.637Z

A fix has been implemented and we are monitoring the results.

identified2022-05-14T19:32:04.041Z

We have identified the issue and are working on a resolution now.

investigating2022-05-14T19:16:15.378Z

We are currently investigating failed logins when users are accessing Statuspage.

Apr 29, 2022

Report: "Multiple sites showing down/under maintenance"

Last update 2022-04-29T20:34:10.545Z

postmortem2022-04-29T20:34:01.795Z

Earlier this month, several hundred Atlassian customers were impacted by a site outage. We have published a Post-Incident Review which includes a technical deep dive on what happened, details on how we restored customers’ sites, and the immediate actions we’ve taken to improve our operations and approach to incident management. [https://www.atlassian.com/engineering/post-incident-review-april-2022-outage](https://www.atlassian.com/engineering/post-incident-review-april-2022-outage)

resolved2022-04-17T22:06:00.000Z

We have restored impacted Statuspage customer sites and the service is operating normally. If you need assistance, please reply to your support ticket so that our engineers can work with you. If you have any trouble accessing your support ticket, contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu). Our teams will be working on a detailed Post Incident Report to share publicly by the end of April.

monitoring2022-04-17T19:12:06.368Z

identified2022-04-17T04:19:44.000Z

We have now restored 99% of users impacted by the outage and have reached out to all affected customers. Our teams are available to help customers with any concerns. If you need assistance, please reply to your support ticket so that our engineers can work with you. If you have any trouble accessing your support ticket, contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-16T20:12:25.141Z

We have now restored 85% of users impacted by the outage and will continue to get sites back to customers for validation, over the weekend. As we hand your restored site over to you for validation, please reach out to our teams should you find any issues so that our support engineers can work to get you fully operational. You can contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-16T01:42:32.542Z

We have now restored 78% of users impacted by the outage as we continue to move with more speed and accuracy. Our teams will continue to restore sites through the weekend, and we expect to have all sites restored no later than end of day Tuesday, April 19th PT. As we restore your site and hand it over to you for validation, please reach out to our teams should you find any issues so that our support engineers can work to get you fully restored. You can contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-15T20:13:37.626Z

We have made significant progress over the last 24 hours and have now restored functionality for 62% of users impacted by the outage. We have also doubled the size of the batches we are pushing through the restoration process, which was a result of optimizing automated processes as well as accelerating our restoration speed. Our global engineering teams continue to work 24/7, and we expect to progress quickly through technical restoration of remaining customer sites over the weekend. If you do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-15T03:06:12.147Z

We have now restored functionality for 55% of users impacted by the outage. With automation in full effect, we have significantly increased the pace at which we are conducting technical restoration of affected customer sites, and we have reduced the time required for the validation of restored sites by half. If you are still experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-14T20:14:32.234Z

We have now restored functionality for 53% of users impacted by the outage. As outlined in yesterday’s update, we are restoring affected customers using a three step process:

1. Technical restoration of affected sites
2. Internal validation of restored sites
3. Validating with affected customers before enabling their users

By automating some of our validation steps, we have now reduced time for internal validation of restored sites by half, which allows our support engineers to more quickly engage restored customers for validation and full site handover. If you are still experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

identified2022-04-14T15:53:27.407Z

We have restored functionality for 49% of users impacted by the outage. We are taking a batch-based approach to restoring customers, and to date, this process has been semi-automated. We are beginning to shift towards a more automated process to restore sites. That said, there are still a number of steps required before we hand a site to customers for review and acceptance. We are restoring affected customers in groups of up to 60 at a time, with groups identified by a mix of variables including site size, complexity, edition, tenure, and several other factors.

The full restoration process involves our engineering teams, our customer support teams, and our customers, and has three steps:

1. Technical restoration involving meta-data recovery, data restores across a number of services, and ensuring the data across the different systems is working correctly for product and ecosystem apps
2. Verification of site functionality to ensure the technical restoration has worked as expected
3. Lastly, working directly with the affected customer to enable them to verify their data and functionality before enabling for their users

We have also contacted all customers who are *up next* for step 3 in the site restoration process described above. These customers are aware that they are next in queue through their support ticket and/or via a support engineer. We have proactively reached out to technical contacts and system admins at all impacted customers, and opened support tickets for each of them. However, we learned that some customers have not yet heard from us or engaged with our support team. If you are experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

For more information from our engineering team, please read our update from our CTO, Sri Viswanath: https://www.atlassian.com/engineering/april-2022-outage-update

identified2022-04-12T13:48:53.804Z

The team is moving through the restoration process this week and is accelerating toward recovery. Functionality for 40% of impacted users has been restored.

identified2022-04-11T16:04:52.237Z

A small number of Atlassian customers continue to experience service outages and are unable to access their sites. Our global engineering teams are working 24/7 to make progress on this incident. At this time, we have rebuilt functionality for over 35% of the users who are impacted by the service outage, with no reported data loss. The rebuild stage is particularly complex due to several steps that are required to validate sites and verify data. These steps require extra time, but are critical to ensuring the integrity of rebuilt sites. We apologize for the length and severity of this incident and have taken steps to avoid a recurrence in the future.

identified2022-04-11T12:11:35.979Z

identified2022-04-11T09:01:35.080Z

identified2022-04-11T00:50:54.984Z

A dedicated team continues to work 24/7 to expedite service recovery. Restoration of all customers remains our top priority. We hear and appreciate all the feedback from our valued customers and are taking every necessary step to both restore full service and ensure site integrity as soon as possible.

identified2022-04-10T19:45:29.970Z

We are still working 24/7 to restore service to affected customers. We have restored partial access for some customers and will be continuing to restore access into next week.

identified2022-04-10T13:14:38.748Z

We continue to work 24/7 to restore service to affected customers. We have restored partial access for some customers and will be continuing to restore access into next week.

identified2022-04-10T09:35:20.468Z

Our teams are committed to restoring each customer’s service as soon as possible and are working through the weekend toward recovery.

identified2022-04-10T04:05:19.090Z

Our teams are committed to restoring each customer’s service as soon as possible and are working through the weekend toward recovery.

identified2022-04-09T22:23:45.069Z

The restoration process is underway. At this time we have no new significant updates, but the team continues to work around the clock to bring our customers back online.

identified2022-04-09T18:25:22.267Z

The restoration process is underway. At this time we have no new significant updates, but the team continues to work around the clock to bring our customers back online.

identified2022-04-09T14:33:40.322Z

Our team is working 24/7 to progress through site restoration work. Core functionality has been restored across a number of sites. We are continuously improving the process with the aim of accelerating the restoration process from here.

identified2022-04-09T10:52:14.546Z

identified2022-04-09T01:36:49.498Z

The team is continuing the restoration process through the weekend and working toward recovery. We are continuously improving the process based on customer feedback and applying those learnings as we bring more customers online.

identified2022-04-08T20:28:21.633Z

Work to restore sites is underway and will continue into the weekend. We are taking a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of these site restorations.

identified2022-04-08T17:11:05.222Z

identified2022-04-08T15:30:30.205Z

We have started successfully restoring sites and continue to work on restoration to a wider cohort of customers. We are taking a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of these site restorations.

identified2022-04-08T10:59:47.564Z

identified2022-04-08T07:53:17.069Z

identified2022-04-08T01:27:42.395Z

We continue to work on partial restoration to a cohort of customers. The plan to take a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of this first round of restorations remains the same as in our last update.

identified2022-04-07T21:51:19.518Z

We continue to work on partial restoration to the first cohort of customers. The plan to take a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of this first round of restorations remains the same from our last update.

identified2022-04-07T18:43:40.013Z

We are beginning partial restoration to a cohort of customers. The early stages of this process will be controlled and hands-on, as we work with customers live to get feedback and ensure that restoration is working correctly before we accelerate the process for the next cohort. We will continue to post updates here as we move the process along.

identified2022-04-07T12:26:41.141Z

We are continuing work in the verification stage on a subset of instances. Once reenabled, support will update accounts via opened incident tickets. Restoration of customer sites remains our first priority and we are coordinating with teams globally to ensure that work continues 24/7 until all instances are restored.

identified2022-04-07T09:34:26.382Z

identified2022-04-07T04:19:40.307Z

identified2022-04-06T21:26:05.078Z

We are continuing to work on the resolution of the incidents for some Statuspage, Jira Work Management, Jira Service Management, Confluence, Jira Software, Atlassian Access, Jira Product Discovery, and Opsgenie Cloud customers.

identified2022-04-05T21:11:56.336Z

We have partially reactivated the Statuspages of affected customers. The hosted pages should be up, and the API capabilities have been restored so affected customers can use the API to manage their pages while work is done to restore access to the manage portal. We have defined two processes for resolving the issues impacting some customers. These processes each involve multiple stages of work. We are currently working on one of the processes and we will provide more detail as we progress through resolution.

identified2022-04-05T13:45:01.985Z

The issue has been identified and a fix is being implemented.

Report: "Missing Metrics for a Subset of Status Pages"

Last update 2022-04-29T16:08:46.120Z

resolved2022-04-29T16:08:46.105Z

This incident has been resolved.

monitoring2022-04-28T21:46:31.729Z

We have released a fix for the metrics display problem. Affected metrics displays should have returned to a functioning state. We are still actively engaged with our vendor and are working with their engineering team on fixing the root cause of this issue.

identified2022-04-28T00:50:11.089Z

We are still actively engaged with our vendor while their engineering teams work on the issue. Unfortunately, alternative workarounds did not produce acceptable results. Once again, this is purely a display problem and we are not experiencing any data loss.

identified2022-04-26T01:54:10.563Z

We have identified a metrics display problem for a majority of status pages due to a vendor issue. This is purely a display problem and we are not experiencing any data loss. We are working to return the service to normal operations.

investigating2022-04-25T22:47:43.203Z

We are currently investigating an issue with missing system metrics for a subset of status pages.

Apr 1, 2022

Report: "Delay in System Metrics"

Last update 2022-04-01T00:51:04.691Z

resolved2022-04-01T00:51:04.675Z

This incident has been resolved.

monitoring2022-03-31T23:18:19.595Z

We have identified and resolved all issues with third-parties and we are now monitoring and continuing to process any delayed metrics.

identified2022-03-31T19:18:36.000Z

Based on additional reporting and investigation, we have found that System Metrics is experiencing delays of between 5 and 30 minutes. We are continuing to work with our vendor to return the service to normal operations.

identified2022-03-31T00:05:17.874Z

We have identified the issue and we’re working with our vendor to return the service to normal operations.

investigating2022-03-30T22:07:02.335Z

System Metrics is experiencing a delay of 5 to 10 minutes. We are currently investigating this issue.

Mar 22, 2022

Report: "Issues With Login"

Last update 2022-03-22T15:58:53.337Z

postmortem2022-03-22T15:58:19.111Z

### **SUMMARY**

On March 14, 2022, between 01:05 PM and 01:47 PM UTC, some Atlassian customers were unable to log in to our products, including Trello and Statuspage, and could not access some services, including the ability to create support tickets. The underlying cause was a newly introduced configuration data store that did not scale up properly due to a misconfiguration of autoscaling. The incident was detected by Atlassian's automated monitoring system and mitigated by disabling the use of the new configuration data store, which put our systems into a known good state. The total time to resolution was approximately 42 minutes.

### **IMPACT**

The overall impact was between March 14, 2022, 01:05 PM UTC and March 14, 2022, 01:47 PM UTC across seven products and services. The bug impacted several of the key dependent services, which resulted in an outage for end users, leading to failed logins across the following products and services:

* [**getsupport.atlassian.com**](http://getsupport.atlassian.com)
* [**confluence.atlassian.com**](http://confluence.atlassian.com)
* [**jira.atlassian.com**](http://jira.atlassian.com)
* [**partners-jira.atlassian.com**](http://partners-jira.atlassian.com)
* [**community.atlassian.com**](http://community.atlassian.com)
* [**manage.statuspage.io**](http://manage.statuspage.io)
* [**trello.com**](http://trello.com)
* [**university.atlassian.com**](http://university.atlassian.com)

### **ROOT CAUSE**

The issue was caused by an underlying configuration data store based on AWS DynamoDB failing to scale up. During post-setup fine-tuning, it was identified that the initial values for the read capacity units (RCUs) and write capacity units (WCUs) were over-provisioned. As a result, a decision was made to decrease them; however, the resulting values proved to be insufficient to handle the increased traffic in our system.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We're prioritizing the following improvement actions to avoid repeating this type of incident:

* Fix the configuration so that the new configuration data store dynamically scales up regardless of the size of the incoming traffic.
* Conduct more thorough capacity planning and load testing.
* Improve the resilience of the system by adding fallbacks to our secondary data store.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support
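The root cause above (provisioned capacity lowered while autoscaling was not working) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Atlassian's configuration: the `ProvisionedTable` class, the capacity values, and the traffic numbers are all hypothetical, and real DynamoDB auto scaling reacts to utilization over time rather than instantly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProvisionedTable:
    """Hypothetical model of a table with a fixed read capacity per second."""
    read_capacity_units: int
    autoscaling_enabled: bool
    max_read_capacity_units: int

    def serve(self, requested_reads: int) -> Tuple[int, int]:
        """Return (served, throttled) reads for one second of traffic."""
        if self.autoscaling_enabled and requested_reads > self.read_capacity_units:
            # Simplification: scale up instantly, capped at the configured maximum.
            self.read_capacity_units = min(requested_reads, self.max_read_capacity_units)
        served = min(requested_reads, self.read_capacity_units)
        return served, requested_reads - served


# Hypothetical reads per second during a traffic increase.
traffic = [80, 120, 300, 450]

# Same starting capacity; only the autoscaling setting differs.
static = ProvisionedTable(100, autoscaling_enabled=False, max_read_capacity_units=1000)
scaled = ProvisionedTable(100, autoscaling_enabled=True, max_read_capacity_units=1000)

static_throttled = sum(static.serve(reads)[1] for reads in traffic)
scaled_throttled = sum(scaled.serve(reads)[1] for reads in traffic)

print("throttled without autoscaling:", static_throttled)
print("throttled with autoscaling:", scaled_throttled)
```

In this toy model the table without working autoscaling throttles every read above its reduced capacity, which is the kind of failure that surfaces to end users as failed logins.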

resolved2022-03-14T14:28:08.778Z

This incident has been resolved.

monitoring2022-03-14T14:10:31.268Z

We have identified and mitigated an issue with users logging in to Statuspage starting at 6:05am PST and ending at 6:46am PST and are monitoring the results.

Mar 9, 2022

Report: "Statuspage Cache Invalidation Delayed"

Last update 2022-03-09T05:17:09.705Z

resolved2022-03-09T05:17:09.691Z

This incident has been resolved.

identified2022-03-09T04:23:48.974Z

We are continuing to work on a fix for this issue.

identified2022-03-09T04:08:26.000Z

Content changes to public status pages (e.g. incident creation and updates) were delayed by up to 15 minutes, for changes that were made beginning Mar. 4, 1pm PST. Users may have briefly seen out-of-date content when viewing public status pages during this time. A bug was identified in our cache invalidation layer and a fix is currently being deployed.
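A delay like this is what a cache with a time-based expiry produces when explicit invalidation is skipped. The sketch below is illustrative only: the `TTLCache` class, the `render_page` helper, and the 900-second expiry are assumptions chosen to match the "up to 15 minutes" figure, not details of Statuspage's cache layer.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical page cache: entries expire after ttl_seconds unless invalidated."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            # Entry is too old: drop it so the caller refetches from the origin.
            del self._store[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (value, self.clock())

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)


def render_page(cache: TTLCache, origin: Dict[str, str], key: str) -> str:
    """Serve from cache when possible, otherwise read the origin and cache it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    fresh = origin[key]
    cache.put(key, fresh)
    return fresh


# A fake clock makes the example deterministic.
now = [0.0]
cache = TTLCache(ttl_seconds=900, clock=lambda: now[0])
origin = {"status": "All systems operational"}

first = render_page(cache, origin, "status")

# Content changes, but the invalidation call is skipped (simulating the bug).
origin["status"] = "Investigating an incident"
stale = render_page(cache, origin, "status")

# Once the TTL elapses, the cache falls back to the origin again.
now[0] = 900.0
after_ttl = render_page(cache, origin, "status")

# With a working invalidation, the change is visible immediately.
origin["status"] = "Resolved"
cache.invalidate("status")
after_invalidate = render_page(cache, origin, "status")
```

Under these assumptions, the expiry bounds how long readers can see out-of-date content, which is why a broken invalidation path shows up as a delay rather than permanently stale pages.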

Feb 7, 2022

Report: "Public API experiencing increased errors"

Last update 2022-02-07T19:34:57.689Z

resolved2022-02-07T18:00:00.000Z

Increased load caused API service to experience errors from 10:03 - 10:15 PST.

Jan 26, 2022

Report: "Intermittent errors accessing public pages due to elevated traffic"

Last update 2022-01-26T23:15:35.094Z

resolved2022-01-26T13:57:00.000Z

Due to elevated traffic, we experienced intermittent timeouts and errors in serving public pages between 5:57 and 5:58 AM PST. We have made updates to our services to prevent similar problems from happening.

Jan 11, 2022

Report: "Server errors from the Manage Portal"

Last update 2022-01-11T01:04:59.407Z

resolved2022-01-10T13:30:00.000Z

Between 5:20 and 5:24 AM PST, a slow-performing API query resulted in an elevated number of 500 errors on the manage portal. The root cause of the degraded performance has been identified and a fix has been deployed. The manage portal is now operating normally.

Jan 6, 2022

Report: "Decrease in site availability due to errant database migration"

Last update 2022-01-06T00:11:00.164Z

resolved2022-01-06T00:11:00.134Z

This incident has been resolved.

identified2022-01-05T23:39:58.151Z

A data migration erroneously dropped indexes that were still in use, causing a decrease in availability for a subset of all inbound requests.

Dec 16, 2021

Report: "Elevated server errors"

Last update 2021-12-16T17:05:15.863Z

resolved2021-12-16T16:00:00.000Z

Increased load caused API service to experience errors from 8:15 to 8:45 AM PST.

Report: "Infrastructure Issues - Billing and signup impacted"

Last update 2021-12-16T16:57:47.747Z

postmortem2021-12-16T16:56:37.274Z

### **SUMMARY**

Between December 7, 2021, at 15:54 UTC and December 8, 2021, at 01:55 UTC, Atlassian Cloud services using AWS services in the US-EAST-1 region experienced a failure. This affected customers using Atlassian Access, Bitbucket Cloud, Compass, Confluence Cloud, the Jira family of products, and Trello. Products were unable to operate as expected, resulting in partial or complete degradation of services. The event was triggered by an AWS networking outage in US-EAST-1 affecting multiple AWS services and led to the inability to access AWS APIs and the AWS management console. The incident was first reported by Atlassian Access, whose monitoring detected faults accessing DynamoDB services in the region. Recovery of affected Atlassian services occurred on a service-by-service basis from 2021-12-07 21:50 UTC, when the underlying AWS services also began to recover. Full recovery of Atlassian Cloud services was notified at 2021-12-08 01:55 UTC.

### **IMPACT**

The overall impact occurred between December 7, 2021, at 15:54 UTC and December 8, 2021, at 01:55 UTC. The incident caused partial to complete service disruption of Atlassian Cloud services in the US-EAST-1 region. Product-specific impacts are listed below.

The primary impact for customers of Jira Software, Jira Service Management and Jira Work Management hosted in the US-EAST-1 region was being unable to scale up, which caused slow response times for web requests and delays in background job processing, including webhooks in the AP region. There was significant latency for customers accessing Jira. Some customers experienced service unavailability while the incident took place.

Jira Align experienced an email outage for US customers due to the AWS service outage that affected many of the AWS services, including Simple Email Service. A small percentage of Jira Align emails were not sent due to the AWS incident.

Bitbucket Pipelines was unavailable and steps failed to be executed.

For Jira Automation, tenants' rule executions were delayed since CloudWatch was affected.

Confluence experienced minor impact due to upstream services impacting user management, search, notifications, and media. At the same time, Confluence was impacted by error rates related to the inability to scale up, and GraphQL had higher latencies.

Trello email-to-board and dashcards features experienced degraded performance.

Atlassian Access reported that product transfers from one organization failed intermittently. Admins were not able to update features like IP Allowlist, Audit Logs, Data Residency, Custom Domain Email Notification and Mobile Application Management. Yet, users were able to access and view these features. During the incident, emails to admins experienced a delay. There was a degraded experience when creating and deleting API tokens.

Statuspage was largely unaffected. However, notification workers could not scale up and communications to customers were delayed, though they could be replayed later. The incident also impacted users trying to sign in to manage portals and private pages.

Compass experienced a minor impact on its ability to write to its primary database store. No core features were affected.

Atlassian's customers could have experienced stale data issues in production in US-EAST-1 for ~30s, against an expected 5s at p99, because of delayed token resolution. The provisioning of new cloud tenants was also impacted until the recovery of the services.

### **ROOT CAUSE**

The issue was caused by a problem with several network devices within AWS’s internal network. These devices were receiving more traffic than they were able to process, which led to elevated latency and packet loss. As a result, it affected multiple AWS services which Atlassian's platform relies on, causing service degradation and disruption to the products mentioned above. For more information regarding the root cause, see [Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region](https://aws.amazon.com/message/12721). No relevant Atlassian-driven events in the lead-up have been identified as causing or contributing to this incident.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. We are taking immediate steps to improve the Atlassian platform's resiliency and availability to reduce the impact of such an event in the future. While Atlassian's Cloud services do run in several regions (US EAST and WEST, AP, EU CENTRAL and WEST, among others) and data is replicated across several regions to increase the resilience against outages of this magnitude, we have identified and are taking actions that include improvements to our region failover process. This will minimize the impact of future outages on Atlassian’s Cloud services and provide better support for our customers.

We are prioritizing the following actions to avoid repeating this type of incident:

* Enhance and strengthen our cross-region resiliency and disaster recovery plans, including continuing to practice region failover in production and investigating and implementing better resilience strategies for services (Active/Active or Active/Passive).
* Improve and adopt multi-region architecture for services that do require it.
* Exercise wargaming scenarios that will simulate this outage to assess the customer view of the incident. This will allow us to create further action items to improve our region failover process.

We apologize to customers whose services were impacted during this incident.

Thanks,
Atlassian Customer Support

resolved2021-12-08T01:10:13.606Z

This incident has been resolved.

monitoring2021-12-07T23:33:22.115Z

Sign-in to the manage portal and certain private pages will resume usual authentication through Atlassian Access.

identified2021-12-07T22:24:07.912Z

Sign-in to the manage portal and certain private pages will take place through a link sent via email until the authentication issues have been resolved.

identified2021-12-07T21:55:37.911Z

Notification services have recovered and are operational.

identified2021-12-07T21:50:58.380Z

We're investigating issues affecting notifications. More information will be made available as soon as we can determine the cause and work toward a fix.

identified2021-12-07T21:25:25.580Z

We are continuing to work on a fix for this issue.

identified2021-12-07T20:41:02.166Z

We are continuing to work on a fix for this issue.

identified2021-12-07T19:49:53.255Z

We're investigating issues affecting authentication and sign-in.

identified2021-12-07T18:59:02.000Z

We're investigating issues affecting billing and signup, which may impact signing into the manage portal and private pages. More information will be made available as soon as we can determine the cause and work toward a fix.

Dec 15, 2021

Report: "Infrastructure Issues - Manage Portal"

Last update 2021-12-15T16:48:27.913Z

resolved2021-12-15T15:30:00.000Z

Between 7:25 am and 7:47 am (PST), users may have experienced issues accessing the manage portal. These issues should be resolved now.

Dec 10, 2021

Report: "Errant deploy. Successfully rolled back to previous version."

Last update 2021-12-10T16:29:24.738Z

resolved2021-12-10T15:39:00.000Z

A recent deploy was found to contain errors or significant performance degradations. Our infrastructure has been successfully rolled back to a previous version of the code, and traffic is being served as normal again.

Dec 9, 2021

Report: "Delays in SSL certificate provisioning"

Last update 2021-12-09T21:34:16.402Z

resolved2021-12-09T21:34:16.371Z

This incident has been resolved.

monitoring2021-12-09T21:01:41.355Z

A fix has been implemented and we are monitoring the results.

identified2021-12-09T20:28:22.594Z

The issue has been identified and a fix is being implemented.

investigating2021-12-09T20:19:56.334Z

We are currently investigating this issue.

Report: "Elevated server errors"

Last update 2021-12-09T11:57:24.439Z

resolved2021-12-09T11:00:00.000Z

Increased load caused the manage service to experience errors.

Dec 8, 2021

Report: "Elevated Server Errors for Jira Software Integration."

Last update 2021-12-08T04:30:17.113Z

resolved2021-12-08T04:14:25.000Z

The elevated server errors were due to a faulty certificate deployment in an upstream service. We have fixed the issue and returned the service to normal operations.

investigating2021-12-08T03:58:15.993Z

We are currently investigating this issue.

Dec 6, 2021

Report: "Elevated server errors in us east region"

Last update 2021-12-06T18:59:43.531Z

resolved2021-12-06T18:00:00.000Z

The site experienced a higher-than-normal amount of load, which may have caused pages to be slow or unresponsive.

Nov 18, 2021

Report: "Timeouts accessing manage portal"

Last update 2021-11-18T20:15:25.709Z

resolved2021-11-18T20:15:25.630Z

This incident has been resolved.

monitoring2021-11-18T19:29:15.573Z

A fix has been implemented and we are monitoring the results.

identified2021-11-18T18:00:54.617Z

The issue has been identified and a fix is being implemented.

investigating2021-11-18T17:16:11.189Z

We are currently investigating a small percentage of timeouts when accessing the Manage portal.