Is Opsgenie Down Right Now? Discover if there is an ongoing service outage.

Opsgenie is currently Operational

Last checked Jul 29, 2025 17:51 UTC from Opsgenie's official status page

Historical record of incidents for Opsgenie

Jun 19, 2025

Report: "Degraded performance in Opsgenie and JSM operations"

Last update 2025-06-19T15:21:43.115Z

investigating2025-06-19T15:21:43.111Z

We are investigating degraded performance affecting Jira Service Management, Opsgenie, and Compass Cloud customers. We will provide more details within the next hour.

Jun 17, 2025

Report: "Degraded performance in Opsgenie and JSM operations"

Last update 2025-06-17T12:37:02.554Z

investigating2025-06-17T12:37:02.551Z

We are investigating cases of degraded performance for some Jira Service Management and Opsgenie Cloud customers. We will provide more details within the next hour.

Jun 4, 2025

Report: "Customers may experience delays or failures receiving emails"

Last update 2025-06-04T17:17:40.199Z

investigating2025-06-04T17:17:40.196Z

We were experiencing cases of degraded performance for outgoing emails from Confluence, Jira Work Management, Jira Service Management, Jira, Opsgenie, Trello, Atlassian Bitbucket, Guard, Jira Align, Jira Product Discovery, Atlas, Compass, and Loom Cloud customers. The system is recovering and mail is being processed normally as of 16:45 UTC. We will continue to monitor system performance and will provide more details within the next hour.

May 10, 2025

Report: "Delays observed at JSM and Opsgenie alert search functionality"

Last update 2025-05-10T10:51:53.689Z

resolved2025-05-10T10:51:53.663Z

All search functionality is operational without any latency. Thank you for your patience.

monitoring2025-05-10T09:49:44.448Z

The problem is mitigated, and we are now monitoring closely.

identified2025-05-10T07:08:28.022Z

We identified degraded performance in alert search functionality for some Jira Service Management and Opsgenie Cloud customers, caused by an infrastructure issue at our cloud provider. No impact has been observed on critical alert flows such as notifications. The team has taken action to mitigate the issue and minimize the impact on search functionality.

Apr 4, 2025

Report: "Schedule API are getting timed out"

Last update 2025-04-04T10:52:31.360Z

resolved2025-04-04T10:52:31.337Z

This incident has been resolved.

investigating2025-04-04T10:05:59.948Z

We are investigating timeouts and slowness affecting Alert Schedules for Opsgenie Cloud customers. Requests have been taking more than 30 seconds, and some have timed out. We will provide more details within the next hour.

Mar 5, 2025

Report: "EU OpsGenie API calls having intermittent routing issues"

Last update 2025-03-05T07:24:39.022Z

resolved2025-03-05T07:24:38.999Z

Issues where some API calls configured to use api.opsgenie.com/eu were intermittently returning a 404 error with the message 'no Route matched with those values' should now be resolved. API calls to api.eu.opsgenie.com and api.opsgenie.com (without /eu) were not affected at this time.

investigating2025-03-05T06:34:27.594Z

We are aware of an issue where some API calls configured to use api.opsgenie.com/eu are intermittently returning a 404 error with the message 'no Route matched with those values'. API calls to api.eu.opsgenie.com and api.opsgenie.com (without /eu) are not affected. Our team is investigating this issue.
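
Clients that were pinned to the api.opsgenie.com/eu path could, as one possible workaround, be pointed at the dedicated EU host that the update describes as unaffected. The sketch below shows such a rewrite. The helper name `prefer_eu_host` is hypothetical, and the assumption that the path after `/eu` is identical on api.eu.opsgenie.com is not confirmed by the incident text; only the two hostnames come from the update.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the incident update; the rewrite rule is an assumption.
GLOBAL_HOST = "api.opsgenie.com"
EU_HOST = "api.eu.opsgenie.com"


def prefer_eu_host(url: str) -> str:
    """Rewrite https://api.opsgenie.com/eu/... to https://api.eu.opsgenie.com/...

    URLs that do not use the /eu path prefix on the global host are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != GLOBAL_HOST:
        return url
    path = parts.path
    if path != "/eu" and not path.startswith("/eu/"):
        return url
    # Drop the "/eu" prefix; an empty remainder becomes the root path.
    new_path = path[len("/eu"):] or "/"
    return urlunsplit((parts.scheme, EU_HOST, new_path, parts.query, parts.fragment))
```

For example, `prefer_eu_host("https://api.opsgenie.com/eu/v2/alerts")` returns `https://api.eu.opsgenie.com/v2/alerts`, while a URL without the `/eu` prefix passes through untouched.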

Jan 14, 2025

Report: "Services menu in Opsgenie is not responding"

Last update 2025-01-14T16:47:38.908Z

resolved2025-01-14T16:47:38.894Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

identified2025-01-14T15:55:24.903Z

Our team has identified the issue with the Services page in Opsgenie and is working to fix it.

investigating2025-01-14T15:48:15.102Z

Our engineering team is actively investigating this incident and working to bring the Opsgenie service back up as quickly as possible. Users affected by this incident may notice that Services functionality in the web application is slow or completely unavailable. We will update this page as we have additional information.

Dec 16, 2024

Report: "Elevated 5XX errors in Schedule API at Opsgenie USA region"

Last update 2024-12-16T17:18:41.759Z

resolved2024-12-16T15:30:00.000Z

Our team identified an issue in the Schedule API between 15:30 UTC and 17:00 UTC that caused performance degradation and 5XX error responses. The faulty deployment was quickly reverted in the USA region, and recovery followed rapidly. We are monitoring the system for a full recovery right now. The Schedule API is up and running again without any data loss.

Nov 6, 2024

Report: "Opsgenie Web UI is slow or unavailable in US region"

Last update 2024-11-06T19:59:29.421Z

resolved2024-11-06T19:59:29.407Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

monitoring2024-11-06T19:49:36.261Z

Our engineering team has implemented fixes. We will continue to monitor all systems. Thank you for your patience.

investigating2024-11-06T19:47:02.404Z

We've noticed that the Opsgenie Web UI is responding slowly or is unavailable in the US region. Our engineering team is actively investigating this incident and working to bring Opsgenie back up to speed as quickly as possible. We'll keep you posted with further updates on this page.

Sep 24, 2024

Report: "Reported issues with OEC functionality"

Last update 2024-09-24T20:54:18.469Z

resolved2024-09-24T20:54:18.453Z

We've verified that OEC endpoints are back online.

monitoring2024-09-24T20:49:32.985Z

We reverted the faulty routing configuration change, and OEC endpoints have started receiving traffic again.

identified2024-09-24T20:22:39.203Z

We've identified a recent change that broke some endpoint routing configurations and caused OEC endpoint requests to be directed to the wrong service. We're reverting that change in production at the moment.

investigating2024-09-24T19:56:24.821Z

We are continuing to investigate this issue.

investigating2024-09-24T19:55:17.405Z

We've been notified that a number of OEC clients have been failing to create Jira tickets. We're currently investigating the issue.

Sep 12, 2024

Report: "Users are experiencing reCaptcha errors while signing up"

Last update 2024-09-12T03:28:58.986Z

resolved2024-09-12T03:28:58.968Z

This issue has been resolved.

monitoring2024-09-12T02:40:38.770Z

We have identified the root cause and the issue appears to be resolved.

investigating2024-09-12T02:08:09.228Z

Users attempting to sign up are encountering reCaptcha errors that are preventing a successful signup.

Aug 28, 2024

Report: "Unable to edit Opsgenie rotations in US, EU and Sydney regions in Web Application"

Last update 2024-08-28T15:17:58.909Z

resolved2024-08-28T15:17:58.894Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

identified2024-08-28T15:02:33.137Z

We are continuing to work on a fix for this issue.

identified2024-08-28T15:01:37.697Z

Our team has identified the issue with Opsgenie Web Application / Edit Rotation feature in US, EU and Sydney regions and is working to fix it. Check back soon for another update! Our team is working hard to get the feature up and running again.

Jul 4, 2024

Report: "Some products are hard down"

Last update 2024-07-04T00:36:14.114Z

resolved2024-07-04T00:36:14.100Z

Between 20:08 UTC and 20:31 UTC on July 3, 2024, we experienced downtime for Opsgenie. The issue has been resolved and the service is operating normally.

monitoring2024-07-03T21:45:05.568Z

We have mitigated the problem and continue looking into the root cause. The outage lasted from 8:08pm UTC to 8:31pm UTC on July 3. We are now monitoring closely.

investigating2024-07-03T20:51:01.620Z

We are investigating an issue with <FUNCTIONALITY IMPACTED> that is impacting <SOME/ALL> Atlassian, Atlassian Partners, Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira, Opsgenie, Atlassian Developer, Atlassian (deprecated), Trello, Atlassian Bitbucket, Guard, Jira Align, Jira Product Discovery, Atlas, Atlassian Analytics, and Rovo Cloud customers. We will provide more details within the next hour.

Jun 21, 2024

Report: "Intermittent error accessing content"

Last update 2024-06-21T02:47:33.881Z

resolved2024-06-21T02:47:33.862Z

Between 2024-06-20 22:04 UTC and 2024-06-20 22:28 UTC, some Atlassian Cloud customers experienced intermittent issues accessing the services. The issue has been resolved and the service is operating normally.

monitoring2024-06-21T01:31:35.840Z

We have identified the root cause of the intermittent errors and have mitigated the problem. We are now monitoring closely.

investigating2024-06-21T00:17:58.059Z

We are investigating an intermittent issue with accessing Atlassian Cloud services that is impacting some Atlassian Cloud customers. We will provide more details once we identify the root cause.

Jun 11, 2024

Report: "Error responses across multiple Cloud products"

Last update 2024-06-11T00:57:03.748Z

postmortem2024-06-11T00:56:58.295Z

### Summary

On June 3rd, between 09:43pm and 10:58pm UTC, Atlassian customers using multiple product(s) were unable to access their services. The event was triggered by a change to the infrastructure API Gateway, which is responsible for routing the traffic to the correct application backends. The incident was detected by the automated monitoring system within five minutes and mitigated by correcting a faulty release feature flag, which put Atlassian systems into a known good state. The first communications were published on the Statuspage at 11:11pm UTC. The total time to resolution was about 75 minutes.

### **IMPACT**

The overall impact was between 09:43pm and 10:17pm UTC, with the system initially in a degraded state, followed by a total outage between 10:17pm and 10:58pm UTC.

_The Incident caused service disruption to customers in all regions and affected the following products:_

* Jira Software
* Jira Service Management
* Jira Work Management
* Jira Product Discovery
* Jira Align
* Confluence
* Trello
* Bitbucket
* Opsgenie
* Compass

### **ROOT CAUSE**

A policy used in the infrastructure API gateway was being updated in production via a feature flag. The combination of an erroneous value entered in a feature flag, and a bug in the code resulted in the API Gateway not processing any traffic. This created a total outage, where all users started receiving 5XX errors for most Atlassian products. Once the problem was identified and the feature flag updated to the correct values, all services started seeing recovery immediately.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have several testing and preventative processes in place, this specific issue wasn’t identified because the change did not go through our regular release process and instead was incorrectly applied through a feature flag.

We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Prevent high-risk feature flags from being used in production
* Improve the policy changes testing
* Enforcing longer soak time for policy changes
* Any feature flags should go through progressive rollouts to minimize broad impact
* Review the infrastructure feature flags to ensure they all have appropriate defaults
* Improve our processes and internal tooling to provide faster communications to our customers

We apologize to customers whose services were affected by this incident and are taking immediate steps to address the above gaps.

Thanks,
Atlassian Customer Support
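
The remedial actions above mention progressive rollouts and appropriate defaults for feature flags. As a rough illustration only, the sketch below shows one common way to implement both: a stable hash assigns each unit of traffic to a bucket, and an out-of-range rollout value falls back to a known-good default instead of taking effect. The `Flag` type and the `bucket` and `is_enabled` functions are hypothetical names invented here and do not describe Atlassian's actual tooling.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    """Hypothetical flag definition; not Atlassian's actual tooling."""
    name: str
    rollout_percent: int      # share of traffic that gets the new behaviour
    default: bool = False     # known-good value used when the flag is invalid


def bucket(flag_name: str, unit_id: str) -> int:
    """Map a (flag, unit) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: Flag, unit_id: str) -> bool:
    """Return the flag value for one unit, falling back to the safe default."""
    if not 0 <= flag.rollout_percent <= 100:
        # An erroneous value must not take effect; use the known-good default.
        return flag.default
    return bucket(flag.name, unit_id) < flag.rollout_percent
```

Because the bucket for a given unit never changes, raising `rollout_percent` from 1 to 10 to 100 only ever adds units to the enabled set, which is what makes a staged rollout observable and reversible.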

resolved2024-06-04T00:32:03.962Z

Between 22:18 UTC and 22:56 UTC, we experienced errors for multiple Cloud products. The issue has been resolved and the service is operating normally.

identified2024-06-03T23:11:00.907Z

We are investigating an issue with error responses for some Cloud customers across multiple products. We have identified the root cause and expect recovery shortly.

Apr 30, 2024

Report: "US - Increased delays on alert flow"

Last update 2024-04-30T16:13:13.901Z

resolved2024-04-30T16:13:13.887Z

This incident has been resolved.

monitoring2024-04-30T15:59:15.209Z

Our engineering team has implemented fixes. We will continue to monitor all systems. Thank you for your patience.

identified2024-04-30T12:59:46.415Z

We are continuing to work on a fix for this issue.

identified2024-04-30T12:52:17.751Z

We are observing delays for our alert flow. No alert has been lost and our team is actively working on it to mitigate the delays. We'll keep you posted with further updates.

Apr 3, 2024

Report: "Admin Portal Feature Access Issue"

Last update 2024-04-03T10:23:28.546Z

resolved2024-04-03T10:23:28.532Z

Between 6:30 AM UTC and 9:50 AM UTC, we experienced failures in accessing some features from the Admin Portal. The issue has been resolved and the service is operating normally.

identified2024-04-03T09:52:00.775Z

We are investigating an issue causing failures in accessing some features from the Admin Portal, which is impacting some of our Cloud customers. We have identified the root cause and anticipate recovery shortly.

Feb 29, 2024

Report: "Investigating new product purchasing"

Last update 2024-02-29T03:59:17.930Z

resolved2024-02-29T02:53:17.000Z

Between 28th Feb 2024 23:15 UTC and 29th Feb 2024 00:05 UTC, we experienced an issue with new product purchasing for all products. All newly signed-up products have been successfully provisioned. We have confirmed that the issue has been resolved and the service is operating normally.

investigating2024-02-29T01:27:59.682Z

We are investigating an issue with new product purchasing that is impacting all products. Customers adding new cloud products may have experienced a long waiting page or an error page after attempting to add a product. We have mitigated the root cause and are working to resolve impact for customers who attempted to add a product during the impact period. We will provide more details within the next hour.

Feb 27, 2024

Report: "Opsgenie SAML login at eu region is not working"

Last update 2024-02-27T17:44:30.419Z

resolved2024-02-27T14:58:57.000Z

We fixed the problem and verified that there are no more login issues.

monitoring2024-02-27T14:42:30.039Z

The team has identified the issue causing the signature validation error in EU SAML login. Deployment of the fix has started, and the team is currently monitoring SAML login activity.

investigating2024-02-27T14:25:21.853Z

Opsgenie SAML login is not working in the EU region only, due to a signature verification error on login certificates. Customers who were already logged in have not been affected.

Report: "Service Disruptions Affecting Atlassian Products"

Last update 2024-02-27T05:45:33.885Z

postmortem2024-02-27T05:45:32.461Z

### **Summary**

On February 14, 2024, between 20:05 UTC and 23:03 UTC, Atlassian customers on the following cloud products encountered a service disruption: Access, Atlas, Atlassian Analytics, Bitbucket, Compass, Confluence, Ecosystem apps, Jira Service Management, Jira Software, Jira Work Management, Jira Product Discovery, Opsgenie, StatusPage, and Trello. As part of a security and compliance uplift, we had scheduled the deletion of unused and legacy domain names used for internal service-to-service connections. Active domain names were incorrectly deleted during this event. This impacted all cloud customers across all regions. The issue was identified and resolved through the rollback of the faulty deployment to restore the domain names and Atlassian systems to a stable state. The time to resolution was two hours and 58 minutes.

### **IMPACT**

External customers started reporting issues with Atlassian cloud products at 20:52 UTC. The impact of the failed change led to performance degradation or in some cases, complete service disruption. Symptoms experienced by end-users were unsuccessful page loads and/or failed interactions with our cloud products.

### **ROOT CAUSE**

As part of a security and compliance uplift, we had scheduled the deletion of unused and legacy domain names that were being used for internal service-to-service connections. Active domain names were incorrectly deleted during this operation.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. The detection was delayed because existing testing & monitoring focused on service health rather than the entire system’s availability. To prevent a recurrence of this type of incident, we are implementing the following improvement measures:

* Canary checks to monitor the entire system availability.
* Faster rollback procedures for this type of service impact.
* Stricter change control procedures for infrastructure modifications.
* Migration of all DNS records to centralised management and stricter access controls on modification to DNS records.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support

resolved2024-02-14T23:32:18.761Z

We experienced increased errors on Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Align, Jira Product Discovery, Atlas, Compass, and Atlassian Analytics. The issue has been resolved and the services are operating normally.

monitoring2024-02-14T22:55:47.418Z

We have identified the root cause of the Service Disruptions affecting all Atlassian products and have mitigated the problem. We are now monitoring this closely.

identified2024-02-14T22:31:09.458Z

We have identified the root cause of the increased errors and have mitigated the problem. We continue to work on resolving the issue and monitoring this closely.

investigating2024-02-14T21:57:44.007Z

We are investigating reports of intermittent errors for all Cloud Customers across all Atlassian products. We will provide more details once we identify the root cause.

Jan 29, 2024

Report: "Major outages in heartbeat services pinging via email and email integration in EU region"

Last update 2024-01-29T15:39:48.654Z

resolved2024-01-29T15:29:45.453Z

This incident has been resolved.

monitoring2024-01-29T15:14:53.000Z

The team has reverted the changes and confirmed that the affected heartbeat and email integrations are working again.

identified2024-01-29T15:08:32.000Z

Due to faulty domain configurations, heartbeat updates via email and incoming email services have been rejected since 13:45 UTC. The team is working on the fix. Apart from email updates and email integrations, the heartbeat feature and other integrations are still fully functional.

Dec 13, 2023

Report: "IOS based Alert Notifications delivered as non-critical"

Last update 2023-12-13T14:37:29.987Z

resolved2023-12-13T14:37:29.974Z

This incident has been resolved.

monitoring2023-12-13T14:26:54.098Z

A fix has been implemented and we are monitoring the results.

identified2023-12-13T13:18:52.076Z

The issue has been identified and a fix is being implemented.

investigating2023-12-13T13:18:25.474Z

Since December 12, 2023, 14:46 UTC, we have observed that some alert notifications are not being delivered to iOS-based devices as critical. We are currently investigating the issue.

Dec 10, 2023

Report: "We observe increased error rates due to the cloud provider"

Last update 2023-12-10T03:52:07.268Z

resolved2023-12-10T03:52:07.246Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

monitoring2023-12-10T03:40:57.793Z

We observed increased error rates due to the cloud provider. This caused notification delays and elevated API errors for our customers. All systems are recovering now, and we are monitoring.

Dec 4, 2023

Report: "Egress connectivity timing out"

Last update 2023-12-04T17:15:04.036Z

resolved2023-12-04T17:15:04.016Z

The systems have been stable since the fix and were monitored for a specified duration.

monitoring2023-12-04T16:59:12.429Z

The issue was identified and a fix implemented. We are monitoring currently.

identified2023-12-04T16:54:22.829Z

We are currently investigating an incident that results in outbound connections from Atlassian cloud in us-east-1 intermittently timing out. This affects Jira, Trello, Confluence, and Ecosystem products. The features affected for these products are those that require opening a connection from Atlassian Cloud to public endpoints on the Internet.

identified2023-12-04T16:43:28.955Z

Including Atlassian Developer

identified2023-12-04T16:33:13.636Z

We are currently investigating an incident that results in connection timeouts on the service egress proxy. This affects Jira, JSM, Confluence, Bitbucket, Trello, and Ecosystem products. The features affected for these products are those that require a connection to service egress.

Nov 24, 2023

Report: "Scheduled report functionality is disabled"

Last update 2023-11-24T14:23:24.912Z

resolved2023-11-24T14:23:24.893Z

We're excited to inform you that we've shipped upgrades to our production environment, enabling scheduled reports once again.

What's Changed: To continuously improve and ensure the security of our services, we've implemented additional controls including domain restrictions and a limitation on the number of recipients per email. This is specifically for mitigation purposes. From now on, users will start receiving emails for the reports they've scheduled for themselves, and they will also have the ability to create new tasks.

Impact: Please note that any existing scheduled jobs with external recipients will no longer be editable. However, users can delete these and create new jobs using their email IDs.

Thank you for your patience during these changes. We want to assure you that future updates and communications will be shared promptly to keep you informed. We appreciate your understanding and continued support.

monitoring2023-11-24T08:32:21.868Z

The changes have been shipped to production, and scheduled reports are enabled now.

identified2023-11-23T15:05:44.355Z

We have implemented additional controls and are introducing domain restrictions as well as limitations to the number of recipients for mitigation purposes. Scheduled reports will be enabled for all customers on November 24th, PST. Users will begin receiving emails for the reports they have scheduled for themselves and they will also have the ability to create new tasks. We will share more updates when the changes are fully implemented.

identified2023-11-18T18:53:58.184Z

The root cause has been identified and the spam activity is mitigated. The team is working on adding more controls to prevent further spam activity. The scheduled report feature will remain disabled until the additional controls are implemented. However, the reporting service is fully available and reports can be downloaded manually via the reporting page.

identified2023-11-18T15:52:17.199Z

The cause of the issue is identified, and the team is working on the fix.

investigating2023-11-18T11:41:39.338Z

Scheduled report functionality is disabled because we suspect possible spam activity. Only reports with a custom schedule are disabled; periodic emails are not impacted. The team is investigating the issue and will provide more updates.

Sep 22, 2023

Report: "Atlassian Account login issues"

Last update 2023-09-22T20:04:36.935Z

postmortem2023-09-22T00:47:27.943Z

### **SUMMARY**

On Sep 13, 2023, between 12:00 PM UTC and 03:30 PM UTC, some Atlassian users were unable to sign in to their accounts and use multiple Atlassian cloud products. The event was triggered by a misconfiguration of rate limits in an internal service which caused a cascading failure in sign-in and signup-related APIs. The incident was quickly detected by multiple automated monitoring systems. The incident was mitigated on Sep 13, 2023, 03:30 PM UTC by the rollback of a feature and additional scaling of services which put Atlassian systems into a known good state. The total time to resolution was about 3 hours & 30 minutes.

### **IMPACT**

The overall impact was between Sep 13, 2023, 12:00 PM UTC and Sep 13, 2023, 03:30 PM UTC on multiple products. The Incident caused intermittent service disruption across all regions. Some users were unable to sign in for sessions. Other scenarios that temporarily failed were new user signups, profile retrieval, and password reset. During the incident we had a peak of 90% requests failing across authentication, user profile retrieval, and password reset use cases.

### **ROOT CAUSE**

The issue was caused due to a misconfiguration of a rate limit in an internal core service. As a result, some sign-in requests over the limit received HTTP 429 errors. However, retry behavior for requests caused a multiplication of load which led to higher service degradation. As many internal services depend on each other, the call graph complexity led to a longer time to detect the actual faulty service.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We are continuously improving our system's resiliency. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Audit and improve service rate limits and client retry and backoff behavior.
* Improve scale and load test automation for complex service interactions.
* Audit cross-service dependencies and minimize them where possible related to sign-in flows.

Due to the unavailability of sign-in, some customers were unable to create support tickets. We are making additional process improvements to:

* Enable our unauthenticated support contact form and notify users that it should be used when standard channels are not available.
* Create status page notifications more quickly and ensure that for severe incidents, notifications to all subscribers are enabled.

We apologize to users who were impacted during this incident; we are taking immediate steps to improve the platform’s reliability and availability.

Thanks,
Atlassian Customer Support
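
The root cause above describes retries on HTTP 429 multiplying load. A common client-side mitigation, shown here only as a generic sketch and not as Atlassian's implementation, is capped exponential backoff with full jitter and a hard limit on attempts. The function name `call_with_backoff` and all parameter names and defaults are assumptions chosen for illustration; `send` stands in for any request function that returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Tuple[int, str]:
    """Call send() and retry only on HTTP 429, at most max_attempts times in total.

    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2 ** (n - 1)), so retries from many clients
    spread out instead of arriving in synchronized waves.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(rng() * ceiling)
        status, body = send()
    return status, body
```

The `sleep` and `rng` parameters exist so the behaviour can be exercised in tests without real waiting; production callers would leave them at their defaults.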

resolved2023-09-13T19:32:02.061Z

Between 12:45 UTC and 15:30 UTC, we experienced login and signup issues for Atlassian Accounts. The issue has been resolved and the service is operating normally. We will publish a post-incident review with the details of the incident and the actions we are taking to prevent similar problems in the future.

monitoring2023-09-13T17:36:16.571Z

We are no longer seeing occurrences of the Atlassian Accounts login errors, all clients should be able to successfully login now. We will continue to monitor.

monitoring2023-09-13T16:30:47.696Z

We can see a reduction in the Atlassian Accounts login issues after the mitigation actions were taken. We are still monitoring closely and will continue to provide updates.

monitoring2023-09-13T15:21:44.326Z

We have identified the root cause of the Atlassian Accounts login issues impacting Cloud Customers and have mitigated the problem. We are now monitoring this closely.

investigating2023-09-13T14:08:51.854Z

We are investigating an issue with Atlassian Accounts login that is impacting some Cloud customers. We will provide more details within the next hour.

Sep 18, 2023

Report: "Multiple product logins"

Last update 2023-09-18T01:00:18.758Z

postmortem2023-09-18T01:00:16.529Z

### **SUMMARY**

On August 30, 2023, between 4:07 and 5:30 UTC, some customers were unable to login to Atlassian's Cloud products using [id.atlassian.com](http://id.atlassian.com). Logged-in users were also unable to switch accounts, change passwords, or log out. Users with existing sessions were not impacted. Between 5:32 and 6:00 UTC, traffic was incrementally restored to a previous build, mitigating the impact for users. The total time to resolution was one hour and 53 minutes.

### **IMPACT**

Users were not able to login using Atlassian's shared account management system ([id.atlassian.com](http://id.atlassian.com)). This affected users who were trying to login to the following products: Jira, Confluence, Trello, Opsgenie, mobile apps and ecosystem apps. Aside from the inability to login, there was no impact on other Atlassian products or features.

### **ROOT CAUSE**

Multiple Set-Cookie headers were unintentionally modified so that only the last Set-Cookie header remained in the response to user's browsers. The issue was caused by a change to Network Extensions within the Edge Network. As a result, users that needed a new session could not login. Upon login, the users were redirected to login again and no session was created for them.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue was not detected in Atlassian's staging environment. End-to-end tests did not cover the use case of multiple Set-Cookie headers in the single response and therefore this bug went unnoticed.

We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Automated tests to be put in place to validate that cookies are not being removed from responses.
* Configuration of networking extensions will be guaranteed to be identical in staging and production to ensure errors are picked up earlier.

Furthermore, we typically deploy our changes progressively by cloud region to avoid broad impact, but in this case, the change was not deemed risky and was deployed to all regions. To minimize the impact of breaking changes to our environments, we will implement additional preventative measures:

* Changes to network extensions in the future will use progressive rollouts.
* With staging being properly utilized, errors similar to this one will not be deployed to any production environments.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support
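
The root cause above is that only the last of several Set-Cookie headers survived the edge layer. A check of the kind the remedial actions call for could, as a minimal sketch, compare the cookie names sent by the origin with those that reach the client. The function names `cookie_names` and `dropped_cookies`, the header-pair representation, and the sample cookie names are all hypothetical; the postmortem does not describe how Atlassian's tests are actually written.

```python
from typing import Iterable, Set, Tuple

Header = Tuple[str, str]  # (header name, header value), one entry per header line


def cookie_names(headers: Iterable[Header]) -> Set[str]:
    """Collect the cookie name from every Set-Cookie header line."""
    names = set()
    for name, value in headers:
        if name.lower() == "set-cookie":
            # The cookie pair is everything before the first ";" attribute.
            first_pair = value.split(";", 1)[0]
            names.add(first_pair.split("=", 1)[0].strip())
    return names


def dropped_cookies(origin: Iterable[Header], edge: Iterable[Header]) -> Set[str]:
    """Return cookie names set by the origin that are missing after the edge."""
    return cookie_names(origin) - cookie_names(edge)


# Hypothetical example: the origin sets two cookies, the edge keeps only the last.
origin_headers = [
    ("Set-Cookie", "session=abc123; Path=/; HttpOnly"),
    ("Set-Cookie", "csrf=xyz789; Path=/"),
]
edge_headers = [("Set-Cookie", "csrf=xyz789; Path=/")]
missing = dropped_cookies(origin_headers, edge_headers)
```

In this example `missing` contains `session`, the cookie a login flow would need; an end-to-end test could fail whenever the set is non-empty.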

resolved2023-08-30T06:17:15.168Z

Between 4:30AM UTC and 6:00AM UTC, we experienced issues for users attempting to login for Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Jira Product Discovery, Compass, and Atlassian Analytics. The issue has been resolved and the service is operating normally.

investigating2023-08-30T05:19:00.607Z

We are investigating reports of intermittent errors for login to Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Jira Product Discovery, Compass, and Atlassian Analytics Cloud customers. We will provide more details once we identify the root cause.

Sep 10, 2023

Report: "Opsgenie has experienced delay on Android Notifications"

Last update 2023-09-10T19:17:28.479Z

resolved2023-09-10T19:17:28.464Z

This incident has been resolved.

monitoring2023-09-10T19:15:14.520Z

Android notifications are no longer delayed. All Android notification delivery times have returned to normal.

identified2023-09-10T18:37:42.916Z

We are seeing delays with Android notifications. We have identified the cause and are currently working on mitigating this issue.

Aug 28, 2023

Report: "We observe degraded performance on incident timeline functionality"

Last update 2023-08-28T19:30:09.832Z

resolved2023-08-28T19:30:09.816Z

The issue has been resolved completely and all functionality is fully working now. Customers experienced some latency for entries added to the incident timeline during the incident. However, there was no data loss and all messages were processed successfully.

monitoring2023-08-28T19:10:44.388Z

The rollback completed successfully and we observed the remediation. The functionality is fully working now, and we are closely monitoring the system.

identified2023-08-28T19:03:45.904Z

A misconfiguration caused the incident. We have identified the root cause and are reverting the change.

investigating2023-08-28T17:57:51.387Z

We observe degraded performance on incident timeline functionality. We are investigating the issue and we will provide more details within the next hour.

Aug 2, 2023

Report: "Sign-ups, Product Activation, and Billing not working"

Last update 2023-08-02T15:59:20.030Z

resolved2023-08-02T15:59:20.015Z

We mitigated the issue with Sign-ups, Product Activation, and Billing, and the systems are back to BAU, and all functionality is restored.

monitoring2023-08-02T14:25:40.213Z

We have identified the root cause of the issue with Sign-ups, Product Activation, and Billing and have mitigated the problem. We are now monitoring closely.

investigating2023-08-02T13:41:13.465Z

We are investigating an issue with Sign-ups, Product Activation, and Billing that is impacting all of our Cloud Customers. We will provide more details within the next hour.

Jul 14, 2023

Report: "Performance issues and outages with Cloud products"

Last update 2023-07-14T05:07:00.318Z

postmortem2023-07-14T05:06:57.815Z

### **SUMMARY**

We understand the importance of providing reliable and consistent service to our valued customers. On July 6, 2023, from 03:52 to 15:11 UTC, we experienced an issue with an upgraded version of a third-party tool that functions as our internal artifact management system. Despite our monitoring system identifying the incident within two minutes, this issue led to the degradation of the scaling capabilities of our internal hosting platform, resulting in service degradation or outages for customers of Atlassian cloud. In response to this situation, we are taking immediate measures to enhance the stability of our system and prevent similar issues from re-occurring.

### **IMPACT**

This incident affected multiple regions and products due to the diminished scaling capabilities of our internal hosting platform. In most products and offerings, customers faced reduced functionality, slower response times, and limited access to specific features.

### **ROOT CAUSE**

The root cause of the incident was the introduction of new functionality in a third-party tool that functions as our internal artifact management system. It led to an unexpected increase in the load on the primary database of the artifact system. Upon identifying and localizing the problem, we promptly adjusted the system configuration to regain stability.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

Over the next months, we will enact a temporary freeze on non-critical upgrades of the artifact management system, and we will focus our efforts on three high-priority initiatives:

1. **Enhancing system scaling:** We prioritized work ensuring that downtime in a critical infrastructure component does not affect the scaling of other components. We expect to complete this initiative within the next two months.
2. **Reducing interdependencies:** We are working to mitigate the risk of potential cascading failures by ensuring that significant system components are able to operate independently in the case of issues. Initiatives 1 and 2 are already in progress but have been given priority to be completed as soon as possible.
3. **Strengthening testing procedures:** Alongside these initiatives, we are addressing the need for even more stringent testing procedures than we already have in place to prevent potential issues in future updates.

We are committed to collaborating closely with our technology partners to ensure the most optimal experience for our customers. We apologize for any inconvenience caused by this incident and appreciate your understanding. Our team is dedicated to continually improving our systems and processes to provide you with the exceptional service you deserve. Thank you for your continued support and trust in us.

Sincerely,
Atlassian Customer Support

resolved2023-07-06T15:37:19.011Z

We experienced performance issues and outages for several Atlassian Cloud Products. The issue has been resolved and the service is operating normally.

monitoring2023-07-06T13:55:08.117Z

We have identified the root cause of an issue with an internal infrastructure component that has been impacting multiple Cloud products, including Jira Software, Jira Service Management and Confluence, and their customers. This issue has led to a performance impact and, in some cases, outages. We have implemented a fix to resolve the issue and recovery is in progress.

identified2023-07-06T11:18:50.967Z

We are investigating an issue with an internal infrastructure component that is impacting multiple Cloud products, including Jira Software, Bitbucket, Jira Service Management and Confluence, and customers. These issues include performance impact and, in some cases, outages. Users may experience slow loading and uploading of attachments, login issues or inability for new customers to sign up. We have identified the root cause and are actively working on the service recovery.

Jul 6, 2023

Report: "Intermittent errors during login for some customers"

Last update 2023-07-06T13:32:52.831Z

resolved2023-07-06T13:32:52.818Z

Between 07:31 UTC and 12:32 UTC, we experienced errors during login for Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, Compass, and Atlassian Analytics. The issue has been resolved and the service is operating normally.

monitoring2023-07-06T12:56:28.179Z

We have identified the root cause of the errors during login and have mitigated the problem. We are now monitoring closely.

identified2023-07-06T11:00:01.240Z

We are investigating reports of errors during login that are impacting some Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, Compass, and Atlassian Analytics customers. We have identified the root cause and expect recovery shortly.

investigating2023-07-06T09:39:46.542Z

We are investigating reports of errors during login that are impacting some Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, and Atlassian Analytics Cloud customers. We will provide more details within the next hour.

investigating2023-07-06T09:34:25.188Z

investigating2023-07-06T09:03:45.906Z

We are investigating reports of errors during login that are impacting some Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, and Atlassian Analytics Cloud customers. We will provide more details within the next hour.

Jun 26, 2023

Report: "Increased delays on all flows in EU region"

Last update 2023-06-26T12:36:55.308Z

resolved2023-06-26T12:36:55.289Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

monitoring2023-06-26T12:33:41.902Z

Our engineering team has implemented fixes. We are monitoring all systems. Thank you for your patience.

identified2023-06-26T11:49:51.515Z

Due to extreme load, we are experiencing a problem in the cache layer. Our team is actively addressing the problem and working on implementing a fix as quickly as possible. We appreciate your patience.

identified2023-06-26T09:44:19.405Z

We detected delays in the login flow. We have identified the problem and are working on a fix.

identified2023-06-26T07:42:21.711Z

We are continuing to work on a fix for the issue.

identified2023-06-26T06:30:26.937Z

The issue has been identified and a fix is being implemented.

investigating2023-06-26T06:30:18.626Z

The team has identified the cause of the problem and is actively working to mitigate the delays.

investigating2023-06-26T06:25:58.879Z

Our platform is experiencing some delays for all systems in the EU region. The team is actively working to mitigate the delays. We'll keep you posted with further updates.

Jun 6, 2023

Report: "Partial Outage in Schedule API"

Last update 2023-06-06T17:32:27.783Z

resolved2023-06-06T17:24:57.000Z

Our engineering team has fixed the problem and we are closely monitoring the platform. Thank you for your patience.

investigating2023-06-06T17:06:08.314Z

We are investigating an issue with Schedule API that is impacting some Opsgenie US Cloud customers. We will provide more details within the next hour.

Apr 26, 2023

Report: "Delays in Android push notifications"

Last update 2023-04-26T06:43:45.828Z

postmortem2023-04-26T06:41:54.892Z

### **SUMMARY**

On April 4, 2023, between 13:32 and 14:50 UTC, Atlassian customers using Opsgenie faced significant delays while receiving Android push notifications. This was caused by an incident in a third-party messaging service, which is responsible for Android push notification delivery. This in turn affected our systems. The incident was immediately detected by our monitoring tools, our on-call engineers were paged, and at 14:50 UTC our systems recovered successfully. The total time to resolution was about 80 minutes.

### **IMPACT**

The overall impact was on April 4, 2023, between 13:32 and 14:50 UTC, in Opsgenie. The incident resulted only in delays in Android push notifications. These notifications were delivered successfully after the FCM service was restored, and no data loss occurred.

### **ROOT CAUSE**

The issue was caused by an incident in a third-party messaging service, which is responsible for delivering push notifications to Android devices.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. The impact was immediately caught by our monitoring tools, and the responsible team immediately started analysis of the incident. We value transparency with our customers and will continue to notify you and take any necessary actions promptly during an incident.

In order to handle degradation or outage of messaging channels, Opsgenie recommends that users configure multiple channels of message delivery, including push notifications, mobile SMS, phone calls, and email.

In order to improve our response for the future, we will also be analyzing whether we can employ autoscaling solutions for our systems in case of an outage or high load related to one notification channel.

We apologize to customers whose services were impacted during this incident.

Thanks,
Atlassian Customer Support
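The recommendation in this postmortem to configure multiple delivery channels can be pictured with a small Python sketch. It is only an illustration of the fallback idea, not Opsgenie's notification logic: `notify_with_fallback`, the channel names, and the sender callables are hypothetical, and each sender is assumed to return `True` only when delivery is confirmed (so a delayed channel would need to surface as a failure, for example through a timeout).

```python
from typing import Callable, Dict, List, Optional


def notify_with_fallback(
    message: str,
    channels: List[str],
    senders: Dict[str, Callable[[str], bool]],
) -> Optional[str]:
    """Try each channel in order; return the first one that accepts the message."""
    for channel in channels:
        try:
            if senders[channel](message):
                return channel
        except Exception:
            # Treat a channel error like a failed delivery and move on.
            continue
    return None


attempts: List[str] = []


def failing_push(message: str) -> bool:
    attempts.append("push")
    return False  # simulate an outage at the push provider


def working_sms(message: str) -> bool:
    attempts.append("sms")
    return True


used = notify_with_fallback(
    "Alert: disk usage above 90%",
    ["push", "sms", "email"],
    {"push": failing_push, "sms": working_sms, "email": working_sms},
)
```

Here the push channel is tried first and fails, so the alert is delivered over SMS and the email channel is never needed.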

resolved2023-04-04T18:16:20.620Z

This incident on Firebase has been resolved. Android push notifications are operational.

monitoring2023-04-04T18:13:24.115Z

We are continuing to monitor for any further issues.

monitoring2023-04-04T15:23:15.192Z

Android push notification delivery is fully operational for now, but we are still monitoring the Firebase outage (https://status.firebase.google.com/incidents/9ZPv9faHLen8bzLVSaft).

monitoring2023-04-04T14:34:51.537Z

The issue has been identified as caused by an error on Firebase (https://status.firebase.google.com/incidents/9ZPv9faHLen8bzLVSaft). We will continue to monitor the situation and send an update within the next hour.

investigating2023-04-04T14:30:34.000Z

We are investigating an issue with our Android push notifications that is impacting some of our notifications for Android. We will provide more details within the next hour.

Apr 13, 2023

Report: "Increased delays on Jira Cloud and Jira Service Management Cloud integrations while creating/updating Opsgenie alerts in US region"

Last update 2023-04-13T07:16:59.401Z

postmortem2023-04-13T07:15:31.990Z

### **SUMMARY**

On April 3, 2023, from 1:15 pm UTC to 5:20 pm UTC, Atlassian customers using the Opsgenie product to integrate with a separate Jira Service Management Cloud instance faced significant delays while creating and updating alerts from Jira Cloud and Jira Service Management Cloud integrations in the US region. The issue was reported by our customers and also detected via internal monitoring tools. The reason for the incident was that one of the Opsgenie integration components could not scale to the high volume of requests from Jira. This caused delays in creating alerts or Jira issues by up to 30 minutes. The incident was mitigated by scaling the integration component, which put Atlassian systems into a known good state. The total time to resolution was about four hours and 30 minutes.

### **IMPACT**

The overall impact was on April 3, 2023, from 1:15 pm UTC to 5:20 pm UTC. The incident caused degradation to customers hosted in the US region only. This caused delays of up to 30 minutes in creating Opsgenie alerts from Jira issues for customers who have the Jira to Opsgenie integration enabled.

### **ROOT CAUSE**

The issue was caused by a sudden spike in the volume of messages due to bulk actions. Handling such a spike requires scaling up the instances manually. Our proactive monitoring prevents delays by alerting early enough to allow manual scaling. A misconfiguration of the alert threshold and escalation policy in our monitoring system prevented us from scaling up instances in time.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Improving auto-scaling for integration components to take care of sudden spikes in the volume of incoming messages for creating alerts via integration
* Adding additional monitoring mechanisms to raise an alarm when volume thresholds are breached

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the product’s performance and availability.

Thanks,
Atlassian Customer Support
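The volume-threshold alarm described in this postmortem can be sketched in a few lines of Python. This is an assumption-laden illustration, not the monitoring configuration Atlassian uses: the `VolumeThresholds` class, the `classify_volume` function, and the numeric thresholds are hypothetical. The sketch also validates the thresholds on construction, since the root cause here was a misconfigured threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeThresholds:
    """Hypothetical per-minute message volume thresholds."""

    warning: int
    critical: int

    def __post_init__(self) -> None:
        # Reject configurations that could never escalate correctly.
        if not 0 < self.warning < self.critical:
            raise ValueError("expected 0 < warning < critical")


def classify_volume(
    messages_per_minute: int, thresholds: VolumeThresholds
) -> Optional[str]:
    """Return the alarm level for the observed volume, or None if it is normal."""
    if messages_per_minute >= thresholds.critical:
        return "critical"
    if messages_per_minute >= thresholds.warning:
        return "warning"
    return None


thresholds = VolumeThresholds(warning=5_000, critical=20_000)
levels = [classify_volume(v, thresholds) for v in (1_200, 7_500, 42_000)]
```

With these assumed values, a volume of 1,200 messages per minute raises no alarm, 7,500 raises a warning, and 42,000 raises a critical alarm that could feed an escalation policy.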

resolved2023-04-03T17:34:11.456Z

We observed some delays while creating/updating alerts from Jira Cloud and Jira Service Management Cloud integrations in US region. The problem is resolved now.

Apr 6, 2023

Report: "Partial outage in ICC sessions due to network problem"

Last update 2023-04-06T15:07:51.256Z

resolved2023-04-06T15:07:51.236Z

Our engineers have been closely monitoring the platform and are declaring this incident resolved. Thank you for your patience.

identified2023-04-06T14:45:17.073Z

The problem is related to a misconfiguration on our network causing ICC sessions to fail. We are working on a fix and adjusting our network. Incident flows continue to work without a problem and there is no data loss. We appreciate your patience as our teams continue to investigate the service interruption.

Mar 4, 2023

Report: "Opsgenie On Call analytics dashboards are not showing data"

Last update 2023-03-04T18:22:25.496Z

resolved2023-03-04T18:22:25.476Z

For the 3 impacted dashboards, we have made data accessible for up to 1 year. For date ranges longer than 1 year, please reach out to customer support. We will continue to monitor the system.

identified2023-03-03T13:27:23.195Z

The team is still working to increase the data accessibility for the 3 impacted dashboards. We will keep you posted on further updates.

identified2023-03-02T12:19:18.907Z

While we continue to increase the data accessibility for the 3 impacted dashboards, customers can continue to use them for the last 6 months of data. The rest of the reporting dashboards are working as expected. We will keep you posted on further updates.

identified2023-03-01T07:04:31.238Z

While we continue to fix the data, we have observed latency in data shown for a few of the other analytics dashboards. We will continue to work to fix the rest of the data and will update you once we have more information.

identified2023-02-28T15:14:23.663Z

Data is fixed for the last six months, and the impacted dashboards can be generated with the last six months of data. We will continue to work on fixing the rest of the data and will update once we have more information.

identified2023-02-27T19:37:40.885Z

The job triggered to fix the rest of the data is taking more time than expected. We are actively monitoring it and will update once we have more information.

identified2023-02-27T10:39:11.157Z

The issue has been identified and a fix is being implemented.

investigating2023-02-27T10:37:47.393Z

We are continuing to investigate this issue.

investigating2023-02-27T10:28:41.000Z

We have identified the root cause and rolled out a fix for the last month of data. All data generated in the last month should be accessible. We have triggered the fix for the rest of the data and will update once the data is fully accessible.

investigating2023-02-27T08:49:05.533Z

Opsgenie "On Call Reports", "On Call Time Analytics" & "Total On Call Time per User" Analytics dashboards are not showing data. We are working to identify the root cause and we'll keep you posted with further updates.

Jan 17, 2023

Report: ""_incomingData" and "_actionSource" fields are missing in Opsgenie Debug logs"

Last update 2023-01-17T06:18:30.249Z

resolved2023-01-17T06:18:30.235Z

This incident has been resolved.

monitoring2023-01-13T14:59:52.989Z

"_incomingData" and "_actionSource" fields are visible in debug logs generated after January 13, 2023 1:55:39 PM UTC

identified2023-01-13T13:07:19.616Z

We have identified the problem and are working on a solution to fix the debug logs.

investigating2023-01-13T11:52:51.289Z

We have identified that "_incomingData" and "_actionSource" fields are missing in Opsgenie Debug Logs due to internal issues.

Dec 2, 2022

Report: "Observing delays in incoming webhook integration processing in EU region"

Last update 2022-12-02T19:51:20.194Z

resolved2022-12-02T19:51:20.184Z

This incident has been resolved.

monitoring2022-12-02T19:23:06.544Z

A fix has been implemented and we are monitoring the results.

investigating2022-12-02T19:09:08.702Z

We are currently investigating this issue.

Dec 1, 2022

Report: "Opsgenie analytics dashboards was not accessible"

Last update 2022-12-01T08:22:03.647Z

resolved2022-11-29T20:30:00.000Z

Opsgenie analytics dashboards were not available between 2022-11-29 18:15 (UTC) and 2022-11-29 19:30 (UTC). We observed a spike in traffic patterns which caused degradation in one of our services. After detecting the problem, our engineering team worked towards getting the service back up as quickly as possible. The analytics dashboards started working partially from 2022-11-29 19:30 (UTC) and were fully functional by 2022-11-29 20:10 (UTC).

Nov 12, 2022

Report: "Android users of Jira, Confluence and Opsgenie app with Compromised device check feature turned on is getting locked out of their app"

Last update 2022-11-12T00:01:30.380Z

resolved2022-11-12T00:01:30.365Z

Between 2022-11-06 and 2022-11-11, 18:45 EST, we experienced an issue where Android users of the Jira, Confluence, and Opsgenie apps with the Compromised device check feature turned on were getting locked out of their apps for Confluence, Jira Software, and Opsgenie. The issue has been resolved and the service is operating normally. If a customer is locked out post-login and there is no retry option, we request that the user either clear the app data or reinstall the app.

identified2022-11-11T21:03:28.740Z

We are investigating an issue where Android users of the Jira, Confluence and Opsgenie apps with the Compromised device check feature turned on are getting locked out of their app. Note that this is only affecting the Android mobile app for customers who have turned on the Compromised device check feature via admin.atlassian.com.

identified2022-11-11T21:02:41.582Z

We are investigating reports of intermittent errors for <SOME/ALL> Confluence, Jira Software, and Opsgenie Cloud customers. We will provide more details once we identify the root cause.

Nov 5, 2022

Report: "Logs are delayed on search and download API in EU region"

Last update 2022-11-05T22:36:23.514Z

resolved2022-11-04T15:28:23.034Z

The incident has been resolved. Logs are available without any delay.

monitoring2022-11-04T15:24:31.231Z

A fix has been implemented and the log delay is decreasing rapidly. We are monitoring the progress. Alert logs and all other functionality are working as expected.

identified2022-11-04T14:29:15.960Z

We have identified the root cause of the problem. Our engineers are working on fixing the problem now.

investigating2022-11-04T14:28:53.349Z

We are continuing to investigate this issue.

investigating2022-11-04T14:23:01.083Z

We are continuing to investigate this issue.

investigating2022-11-04T14:22:44.000Z

Logs are delayed on the log search page and the download API in EU Region. We are currently investigating the issue.

Oct 21, 2022

Report: "Elevated error rate in all Opsgenie services"

Last update 2022-10-21T10:17:16.810Z

resolved2022-10-21T09:00:00.000Z

Between 2022-10-21 09:01 UTC and 2022-10-21 09:12 UTC, in the US region, we saw an elevated error rate in our infrastructure due to a faulty deployment. We have deployed a fix to mitigate the issue and have verified that all services have recovered without data loss. Thanks to the quick reaction of our engineering team, the issue has been resolved and the service is operating normally.

Oct 20, 2022

Report: "Schedule overrides were not editable/deletable"

Last update 2022-10-20T17:58:53.783Z

resolved2022-10-20T14:00:00.000Z

Between 2022-10-20 14:23 (UTC) and 2022-10-20 17:39 (UTC), due to latency in the Opsgenie system, schedule overrides could not be edited or deleted, and some of our users received errors. After detecting the problem, our engineering team worked towards getting the Opsgenie service back up as quickly as possible.

Oct 18, 2022

Report: "Opsgenie reporting & Analytics are not accessible"

Last update 2022-10-18T12:07:34.710Z

resolved2022-10-18T12:07:34.697Z

Reporting & analytics is completely operational now.

monitoring2022-10-18T12:07:10.920Z

We are continuing to monitor for any further issues.

monitoring2022-10-18T11:44:07.517Z

Customers were not able to access Opsgenie reporting & Analytics between 9:36 AM UTC and 11:40 AM UTC. We identified the root cause to be one of the recent deployments and we have reverted the change to fix the issue. We are now monitoring the current state and validating it with support ticket owners.

identified2022-10-18T10:47:24.768Z

Customers have not been able to access Opsgenie reporting & Analytics since 9:36 AM UTC. We have identified the root cause and are rolling out the fix.

Sep 21, 2022

Report: "Delays in notification service"

Last update 2022-09-21T08:42:46.267Z

postmortem2022-09-21T08:37:33.567Z

### **SUMMARY**

On Sep 14, 2022, between 03:36 PM and 04:26 PM UTC, Atlassian customers using the Opsgenie product received delayed notifications for up to 50 minutes. The event was triggered by a code change that upgrades a common framework. The changes included in this framework update impacted customers in both the US and EU regions. The incident was detected by the on-call developer and mitigated by reverting the latest changes, which put Opsgenie systems into a known good state. The total time to resolution was around 50 minutes.

### **IMPACT**

The overall impact was between Sep 14, 2022, 03:36 PM UTC, and Sep 14, 2022, 04:26 PM UTC on Opsgenie products. The service disruption was limited to US and EU region customers who did not receive their notifications immediately, but instead experienced notification delays of up to 50 minutes. In total, ~132K notifications in the US region and ~23.6K notifications in the EU region were sent with delays. Less than 0.6% of active customers were affected.

### **ROOT CAUSE**

The issue was caused by an Atlassian-initiated change to upgrade a common framework. While the majority of the intended changes had been tested successfully, there were some accompanying changes with the framework upgrade that caused the notification service to stop processing new notification requests. Instead, these notifications remained in the queues until the deployment was reverted, resulting in notification delays for customers of up to 50 minutes.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* We are improving the testing and deployment processes we follow after framework updates.
* We are implementing new monitoring to reduce the detection and response time even further.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support

resolved2022-09-14T16:53:16.000Z

This incident has been resolved.

monitoring2022-09-14T16:24:34.000Z

A fix has been implemented and we are monitoring the results.

identified2022-09-14T16:08:11.000Z

We have identified the problem and are working on it. We expect the notification service to return to its normal state shortly.

investigating2022-09-14T16:00:34.000Z

We are seeing delays with outbound notifications. We have identified the cause and are currently working on mitigation of this issue.

Sep 7, 2022

Report: "Incident Page cannot be reachable by some of accounts"

Last update 2022-09-07T18:05:35.251Z

resolved2022-09-07T18:05:35.237Z

The Incident Page is now completely operational.

monitoring2022-09-07T17:53:51.461Z

We reverted the recent deployment. We also validated that the error response codes disappeared immediately after the fix. We are now monitoring the current state and validating it with some of the support ticket owners.

identified2022-09-07T17:46:07.762Z

In a recent deployment, we changed how user permissions are handled in a specific component of the Incident page. That change causes a problem while loading the page. We are working on a fix and expect to close the incident very soon.

investigating2022-09-07T17:25:01.825Z

We are currently investigating an issue with the Incident page. When you try to open the Incident page, the Opsgenie web application redirects you to the login page. We are currently investigating the root cause.

Sep 2, 2022

Report: "Opsgenie Reporting service down"

Last update 2022-09-02T03:39:39.588Z

resolved2022-09-02T03:37:25.035Z

This incident has been resolved.

monitoring2022-09-02T03:35:10.000Z

We have released a fix for the report display problem. Affected report displays and download should have returned to a functioning state. We are still actively working with engineering team on fixing the root cause of this issue.

monitoring2022-09-02T03:17:43.000Z

investigating2022-09-02T03:17:10.632Z

We are continuing to investigate this issue.

investigating2022-09-02T01:46:53.469Z

We are continuing to investigate.

investigating2022-09-02T00:11:12.810Z

We are currently investigating it.

Aug 3, 2022

Report: "Errors navigating products, logging in, and logging out"

Last update 2022-08-03T19:57:32.280Z

resolved2022-08-03T19:57:32.270Z

Between 03/Aug/22 15:20 UTC and 03/Aug/22 17:47 UTC, some customers experienced errors using Atlassian products, including errors while logging in or being forcibly logged out. The root cause was a DNS service deployment in the US East region, which caused widespread DNS lookup errors for a variety of Atlassian services including authentication services. We have rolled back the change to mitigate the issue and have verified that the authentication services have recovered. The issue has been resolved and Atlassian services are operating normally.

Aug 2, 2022

Report: "Intermittent errors across multiple products in eu-central"

Last update 2022-08-02T22:59:31.096Z

postmortem2022-08-02T22:59:27.880Z

### **SUMMARY**

On July 19, 2022, between 05:40 and 07:10 UTC, Atlassian customers in the EU region using Jira, Confluence and Opsgenie experienced problems loading pages through the web UI. The incident was automatically detected at 05.14 by one of Atlassian’s automated monitoring systems. The main disruption was resolved within 16 minutes, with the full recovery taking an additional 74 minutes.

### **IMPACT**

Between July 19, 2022, 05:40 UTC and July 19, 2022, 07:10 UTC, Jira, Confluence and Opsgenie users saw some web pages fail to load. During the 16-minute period from 06:40 UTC to 06:56 UTC, customers were unable to access the Jira, Confluence and Opsgenie web UI because the Atlassian Proxy (the ingress point for service requests) was unable to service most requests.

### **ROOT CAUSE**

The issue was caused by an AWS-initiated change that impacted Elastic Block Store (EBS) volume performance to such an extent that new instance creation, and therefore auto scaling, was blocked. As a result, the products above, as well as essential internal Atlassian services, could not auto scale to the increasing incoming service requests as the EU region came online. Once the AWS change had been rolled back, most Atlassian services recovered. Some internal services required manual scaling as a result of unhealthy nodes preventing scaling initiation, which prolonged complete recovery.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity and we apologize to customers whose services were impacted during this incident. We see two main avenues to increase our resiliency during an incident where AWS auto scaling is blocked:

* Implement step scaling: Simple scaling works well in most cases. In this case, due to nodes becoming unhealthy, simple scaling stops responding to scaling alarms, and therefore the service can become “stuck” and will not recover once scaling is possible again. We are exploring the use of step scaling, as this will allow scaling even in the case of instances becoming unhealthy.
* Implement improved alarming to identify “stuck” scaling to reduce the TTR when scaling is available again.

We are taking these immediate steps to improve the platform’s resiliency.

Thanks,
Atlassian
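The step scaling idea referred to in this postmortem can be illustrated with a short Python sketch. It is loosely modeled on the general concept of step adjustments (the size of the scaling action depends on how far the metric exceeds the alarm threshold) and does not call any AWS API. The function `step_adjustment`, the CPU threshold, and the step table are hypothetical, and the sketch does not model health checks or the "stuck" behavior described above.

```python
from typing import List, Optional, Tuple

# Each step is (lower_bound, upper_bound, instances_to_add). Bounds are
# offsets above the alarm threshold; an upper bound of None means "no limit".
Step = Tuple[float, Optional[float], int]


def step_adjustment(metric: float, threshold: float, steps: List[Step]) -> int:
    """Return how many instances to add for the observed metric value."""
    breach = metric - threshold
    if breach < 0:
        return 0  # the alarm is not breached, so no scaling is needed
    for lower, upper, instances_to_add in steps:
        # Lower bound is inclusive, upper bound is exclusive.
        if breach >= lower and (upper is None or breach < upper):
            return instances_to_add
    return 0


# Hypothetical policy: alarm at 60% CPU, larger breaches add more instances.
cpu_steps: List[Step] = [(0, 10, 1), (10, 20, 2), (20, None, 4)]
additions = [step_adjustment(cpu, 60.0, cpu_steps) for cpu in (50.0, 65.0, 75.0, 95.0)]
```

Under this assumed policy, 50% CPU adds nothing, 65% adds one instance, 75% adds two, and 95% adds four, so a large breach triggers a proportionally larger response than a small one.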

resolved2022-07-19T08:52:04.717Z

Between 07:00 UTC and 07:45 UTC, we experienced degraded functionality for some features in Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, and Atlassian Developer. The issue has been resolved and the service is operating normally.

monitoring2022-07-19T08:43:04.815Z

Multiple Atlassian Cloud products and addons were unavailable to customers in some EU regions. The issue has been resolved and we are monitoring for further impact.

Jul 26, 2022

Report: "Opsgenie Analytics is slow or unavailable"

Last update 2022-07-26T02:21:46.113Z

resolved2022-07-26T02:21:46.096Z

The incident has been resolved

identified2022-07-26T01:39:40.841Z

We have identified the issue and are working on resolving it.

investigating2022-07-26T01:03:23.761Z

We've noticed that Reporting and Analytics is responding slowly. The issue is related to emailing of reports and csv downloads. Our web interface for reporting continues to work as expected. Our engineering team is actively investigating this incident and working to bring Opsgenie back up to speed as quickly as possible. We'll keep you posted with further updates on this page.

Jul 19, 2022

Report: "Web and Mobile Application are slow or unavailable"

Last update 2022-07-19T08:27:33.061Z

resolved2022-07-19T08:27:33.045Z

The problem has been resolved and the services are operating normally. Opsgenie faced partial outages due to a minor update by the cloud provider, and the team worked with the cloud provider team to solve the incident in time. Only 15% of total requests and 4.1% of customers were affected by the incident. We will take the necessary actions to prevent a similar incident.

monitoring2022-07-19T07:30:12.922Z

The fix has been deployed and we are seeing rapid recovery. We are monitoring the system for a full recovery.

identified2022-07-19T06:38:20.120Z

Our team has identified that the Web and Mobile Applications are responding slowly or are unavailable in the Frankfurt region only.

identified2022-07-19T06:31:33.437Z

We are continuing to work on a fix for this issue.

identified2022-07-19T06:30:44.379Z

Our team has identified the issue and is working on a fix. Next update in 1 hour or with a resolution of the incident.

investigating2022-07-19T06:01:01.000Z

We've noticed that the Web and Mobile Applications are responding slowly or are unavailable in the Frankfurt and North Virginia regions. This is not affecting our APIs. Our engineering team is actively investigating this incident and working to bring Opsgenie back up to speed as quickly as possible. We'll keep you posted with further updates on this page.