Trello

Is Trello Down Right Now? Check whether an outage is currently ongoing.

Trello is currently Operational

Last checked from Trello's official status page

Historical record of incidents for Trello

Report: "Customers may experience delays or failures receiving emails"

Last update
investigating

We were experiencing degraded performance for outgoing emails for Confluence, Jira Work Management, Jira Service Management, Jira, Opsgenie, Trello, Atlassian Bitbucket, Guard, Jira Align, Jira Product Discovery, Atlas, Compass, and Loom Cloud customers. The system is recovering and mail is being processed normally as of 16:45 UTC. We will continue to monitor system performance and will provide more details within the next hour.

Report: "The search bar functionality in Trello is not working properly"

Last update
resolved

On June 2nd, Trello's search bar functionality was not working correctly. The issue has now been resolved, and the service operates normally for all affected customers.

monitoring

The issues causing Trello’s search bar functionality to malfunction have been resolved, and services are now operating normally for all affected customers. We will monitor it closely to ensure stability.

identified

We are currently experiencing an issue with Trello, as the search bar functionality is not working properly. We have identified the issue and are in the process of mitigating the impact. We'll keep you posted with further updates.

investigating

We are currently experiencing an issue with Trello, as the search bar functionality is not working properly. Our team is diligently working to restore services as quickly as possible. We will keep you updated with further developments.

Report: "The search bar functionality in Trello is not working properly"

Last update
investigating

We are currently experiencing an issue with Trello, as the search bar functionality is not working properly. Our team is diligently working to restore services as quickly as possible. We will keep you updated with further developments.

Report: "Trello was temporarily inaccessible"

Last update
postmortem

### **SUMMARY**

On May 15, 2025, between 13:55 and 14:18 UTC, Atlassian customers using the Trello product experienced errors or slow loading times when attempting to view their cards and boards. The event was triggered by a database plan cache expiring and high resource usage caused by subsequent database query planning operations. The particular database shard that was impacted held data that was required for every card load. The incident was detected within two minutes by the automated monitoring system and mitigated by increasing resources available to the affected database shard, which put Atlassian systems into a known good state. The total time to resolution was about 23 minutes.

### **IMPACT**

The overall impact was between May 15, 2025, 13:55 UTC and May 15, 2025, 14:18 UTC on the Trello product. The incident caused service disruption for all Trello customers.

### **ROOT CAUSE**

The issue was caused by a query plan expiring from the database cache, which caused incoming queries to go through a replanning operation. These queries had multiple plans that could satisfy them, and depending on the size of the query, one plan might be significantly more efficient than another. This caused the query planner to perform many more replanning operations than usual, which briefly consumed all of the CPU on the server. Once the CPU was saturated, the planning operations themselves began taking too long and therefore triggered constant replanning in an effort to find more efficient options. This self-reinforcing feedback loop could not be broken without intervention.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn't identified because it would only occur under very distinct conditions, including the amount of load and the order of database queries. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Review our capacity planning thresholds and ensure that all shards have sufficient overhead to handle unexpected load.
* Improve query planner performance by:
  * Implementing hinting for known problematic query shapes to circumvent the query planner.
  * Investigating long-term generalized solutions to prevent query planner thrashing.

Furthermore, we are prioritizing the following additional measures to reduce the impact of any future incidents:

* Analyze and reduce single points of failure for loading Trello boards and cards.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
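The remediation above calls for hinting known problematic query shapes so they bypass the planner entirely. As a rough sketch of that idea using PyMongo against a MongoDB-style document store (this postmortem does not name the database, but a later Trello postmortem on this page links MongoDB documentation), with hypothetical collection and index names rather than Trello's actual schema:

```python
# Sketch of the query-hinting remediation: pin a known-good index so the
# server skips plan selection (and thus replanning) for this query shape.
# Collection and index names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cards = client["example_db"]["cards"]

# Without a hint, an expired plan cache entry forces replanning, which is
# the CPU-consuming loop described in the postmortem. With a hint, the
# named index is used unconditionally.
cursor = cards.find({"board_id": "abc123", "closed": False}).hint(
    "board_id_1_closed_1"  # hypothetical index name
)

for card in cursor:
    print(card["_id"])
```

The trade-off is that a hinted query no longer benefits from the planner choosing a better index if data distribution changes, which is presumably why the postmortem pairs hinting with a longer-term fix for planner thrashing.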

resolved

A fix has been implemented, and the issue is now resolved.

identified

We are aware that Trello was temporarily inaccessible for all customers. The system has already been recovered, and we are monitoring the situation.

Report: "Trello is slow or unavailable"

Last update
postmortem

### **SUMMARY**

On April 25, 2025, between 18:18 and 18:33 UTC, Atlassian customers using Trello may have experienced service interruptions. The event was triggered by temporarily reduced capacity following a rollback deployment, with insufficient nodes to handle the load. Automated monitoring systems detected the incident within one minute and mitigated it by scaling up the deployment, which put Atlassian systems into a known-good state. The total time to resolution was about 15 minutes.

### **IMPACT**

The overall impact was on April 25, 2025, between 18:18 and 18:33 UTC, on Trello. The incident caused service disruption to Trello users, resulting in reduced functionality, slower response times, and errors when performing key actions such as loading boards and cards.

### **ROOT CAUSE**

The root cause of the incident was a failure to scale our nodes to optimal capacity, caused by a release rollback. If an issue is found during a deployment, we can roll back to a previous release. In this case, a rollback was executed to a previous release that had already undergone a scaling-down process. As a result, not enough compute nodes were available to handle the high traffic.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity and strive to avoid incidents like these. We are prioritizing the following efforts as next steps:

* Improvements to rollback tooling, including UX upgrades and pre-scaling.
* Conducting updated incident response training focused on rollback tooling and best practices.

We apologize to customers whose services were impacted during this incident; we are taking steps designed to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
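The first remedial item mentions pre-scaling as part of rollback tooling. A minimal sketch of that idea, assuming a Kubernetes-style deployment (Atlassian's actual tooling is not described; all names here are hypothetical): before shifting traffic to the rollback target, bring its replica count up to match the live release.

```python
# Sketch: pre-scale a rollback target to match live capacity before
# cutting traffic over. Deployment names and namespace are hypothetical;
# this illustrates the pre-scaling idea, not Atlassian's rollback tooling.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

live = apps.read_namespaced_deployment("trello-web", "prod")
target = apps.read_namespaced_deployment("trello-web-prev", "prod")

if (target.spec.replicas or 0) < live.spec.replicas:
    # Match the live replica count so the rollback does not land on a
    # scaled-down release, the gap described in the root cause above.
    apps.patch_namespaced_deployment_scale(
        "trello-web-prev", "prod",
        {"spec": {"replicas": live.spec.replicas}},
    )
    # A real tool would then wait for ready replicas before cutover.
```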

resolved

Between 18:18 and 18:33 UTC, we experienced an outage for Trello. The issue has been resolved and the service is operating normally. Our teams are investigating and will publish the root cause as soon as it is available.

monitoring

Trello was slow or unavailable. Our engineering team is actively investigating this incident to determine the root cause. Users affected by this incident may have noticed that Trello was slow or completely unavailable in both the web and mobile apps. Trello operations have recovered. We will update this page as we have additional information.

Report: "Trello is slow or unavailable for some users"

Last update
postmortem

### **SUMMARY**

On May 5, 2025, between 2:08 p.m. and 4:29 p.m. UTC, some Atlassian customers using Trello were unable to view their boards or cards. The event was triggered by an unexpected error encountered by our infrastructure management tools, which resulted in an incorrect DNS configuration being deployed to a portion of our database. The incident was detected within four minutes by automated monitoring systems and mitigated by identifying the faulty portion of the database and performing a failover, which put Atlassian systems into a known good state. The total time to resolution was about two hours and 21 minutes.

### **IMPACT**

The overall impact was on the Trello product on May 5, 2025, between 2:08 p.m. and 4:29 p.m. UTC. The incident caused service disruption to Trello customers whose accounts and boards contained or referenced data on the affected shard of our database. Additionally, some Trello customers would have experienced a service disruption due to our use of load-shedding tools during the incident to strategically block portions of our traffic to aid in recovery.

### **ROOT CAUSE**

The day before the incident, on May 4, our infrastructure management tooling encountered an unexpected error when attempting to fetch the networking metadata on a particular host. This led the host, which was a member of our database cluster, to incorrectly apply the default operating system DNS configuration. This DNS configuration was not able to resolve internal domains, which led to a partial failure state of the node. The database continued to function normally and there was no immediate customer impact, but in the background this incorrect DNS configuration led to a slow buildup of database sessions. These database sessions are usually short-lived and automatically expire when no longer needed, but the DNS misconfiguration prevented this automatic expiration. The database sessions eventually grew to the default maximum on this particular shard. At that point, the shard was unable to generate new sessions, which are required for all basic operations, and the Trello product began experiencing elevated error rates.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn't identified due to the isolated nature of the database session resource and monitoring gaps around this resource and around DNS resolution. We are prioritizing the following improvement actions designed to avoid repeating this type of incident:

* Update our infrastructure management tool to use a safe fall-back DNS configuration in the case of unexpected errors.
* Expand existing DNS monitoring to include the resolution of internal domains.
* Expand existing database session count monitoring to include all database node types.

Furthermore, we are prioritizing the following additional measures to reduce the duration of any future incidents:

* Evaluate our incident response process to identify actions that can be streamlined for quicker resolution.

We apologize to customers whose services were impacted during this incident; we are taking steps designed to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
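One remedial item above is expanding DNS monitoring to cover internal domains. A hedged sketch of what such a probe might look like (the domain names are hypothetical): a periodic check that internal names still resolve would have surfaced the partial failure state a day before customer impact.

```python
# Sketch: a monitoring probe that verifies internal domains actually
# resolve, catching the silent DNS misconfiguration described above.
# The domain list is hypothetical.
import socket

INTERNAL_DOMAINS = [
    "db-shard-01.internal.example.com",
    "auth.internal.example.com",
]

def check_internal_dns() -> list[str]:
    """Return the internal domains that fail to resolve."""
    failures = []
    for domain in INTERNAL_DOMAINS:
        try:
            socket.getaddrinfo(domain, None)
        except socket.gaierror:
            failures.append(domain)
    return failures

if failures := check_internal_dns():
    print(f"ALERT: internal DNS resolution failing for {failures}")
```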

resolved

On May 5th, 2025, we identified a service degradation for Trello. Trello is now back online, and no further impact has been observed.

monitoring

We have identified and mitigated the issue causing Trello to be slow or unavailable for some users. We expect API traffic to return to normal within the next 30 minutes. We are now monitoring closely. We will update within the next 30 minutes.

investigating

We've noticed that Trello is slow or unavailable for some users. This will be present in both the web and mobile apps. Our engineering team is actively investigating this incident and working to bring Trello back up to speed as quickly as possible. We'll keep you posted with further updates on this page.

Report: "Trello was temporarily inaccessible"

Last update
identified

We are aware that Trello was temporarily inaccessible for all customers. The system has already been recovered, and we are monitoring the situation.

Report: "Trello is slow or unavailable for some users"

Last update
investigating

We've noticed that Trello is slow or unavailable for some users. This will be present in both the web and mobile apps. Our engineering team is actively investigating this incident and working to bring Trello back up to speed as quickly as possible. We'll keep you posted with further updates on this page.

Report: "Trello is slow or unavailable"

Last update
monitoring

Trello was slow or unavailable. Our engineering team is actively investigating this incident to determine the root cause. Users affected by this incident may have noticed that Trello was slow or completely unavailable in both the web and mobile apps. Trello operations have recovered. We will update this page as we have additional information.

Report: "Degraded performance in Trello"

Last update
resolved

Following a period of monitoring, we've confirmed that the issue with degraded performance in Trello has been resolved. For any further issues please contact our Support team.

monitoring

Trello is back up and running and we're not seeing any more performance issues, but we'll continue to monitor. If you're still experiencing any performance issues, please reach out to us through our support channels for further assistance.

investigating

We're still looking into issues with degraded performance in Trello, and our team is working to get Trello back up and running as quickly as possible. We'll post more updates on this shortly.

investigating

We're continuing to investigate issues causing degraded performance in Trello for some Trello customers. We'll post more updates on this shortly.

monitoring

Our team has implemented a fix and we're seeing performance improve. We'll continue to monitor this to make sure everything is working as normal.

investigating

We're currently investigating an issue that's causing degraded performance in Trello for some customers, and our team is working to get Trello back up and running as quickly as possible.

Report: "Trello Realtime Updates are degraded"

Last update
resolved

We've identified this issue and released a fix so realtime updates are back up and running in Trello. We appreciate your patience and understanding while we worked on this.

investigating

We're currently investigating an issue causing degraded performance for Trello's realtime updates for some customers. Our team is actively working to bring Trello back up to speed as quickly as possible.

Report: "Trello is slow"

Last update
resolved

The changes we applied to our infrastructure have worked well, and we're no longer noticing issues that would result in constant disconnections or degraded services for Trello. We appreciate your understanding while we worked on this.

identified

We’ve already identified the issue and are working on a possible fix in our infrastructure for all connection problems some users have been experiencing. We’ll share a new update soon!

investigating

We're continuing to investigate this issue with connections to Trello being dropped. Our engineers are working to bring Trello back up to speed as quickly as possible.

investigating

We are investigating an issue related to connections being dropped. Some Trello users are experiencing slower updates and disconnection messages. Our engineering team is actively working to bring Trello back up to speed as quickly as possible.

Report: "Trello Realtime Updates are degraded"

Last update
resolved

Between 15:34 UTC and 19:12 UTC, we experienced degraded performance for Trello's realtime update functionality. We have mitigated the problem and are investigating the root cause. The issue has been resolved and the service is operating normally.

monitoring

Between 15:34 UTC and 19:12 UTC, we experienced degraded performance for Trello's realtime update functionality. We have mitigated the problem and are investigating the root cause. We are now monitoring closely.

investigating

We are continuing our investigation into the realtime update issues impacting all Trello Cloud customers. We will provide more details within the next hour.

investigating

We are still investigating realtime update errors that are impacting most Trello Cloud customers. We will provide more details within the next hour.

investigating

We are investigating cases of degraded performance for some Trello Cloud customers. We will provide more details within the next hour.

Report: "The Trello Desktop app is re-directing users to the home page"

Last update
resolved

Following a period of monitoring, we have confirmed that the issue affecting the macOS and Windows Desktop Apps is resolved. Please update to the latest version of the app to receive the fix. For any further issues, please contact our Support team. Thank you for your patience.

monitoring

Our engineering team has successfully released a fix for both the macOS and Windows Desktop Apps. To benefit from this fix, users are required to update to the latest version of the app. If you continue to experience any issues, particularly with redirection, please reach out to us through our Support channels for further assistance. Thank you for your patience and understanding as we worked to resolve this issue.

identified

We're still working on a fix for the Trello Desktop app. We'll provide a new update soon.

identified

We're still working on a fix for the Trello Desktop app. We'll provide a new update soon.

identified

We've identified an issue that's causing users to be re-directed to the Home page when using the Trello Desktop app. We'll provide a new update soon.

Report: "Users aren't able to create new repeats or run existing ones when using the Card Repeater Power-Up"

Last update
resolved

A fix was released for the Card Repeater Power-Up, and both existing and new repeats are working as intended.

identified

Our recent change hasn’t fixed the existing issue with the Card Repeater Power-Up. We’ll keep investigating the problem, and a new update will be provided soon.

monitoring

A fix has been implemented and we're currently monitoring it.

identified

We've identified the cause of the problem with repeats, and we're now working on a fix.

investigating

We’re currently investigating reports that users can’t create new repeats while using the Card Repeater and that existing repeats aren’t being executed.

Report: "Boards are demonstrating slow performance for Workspace Guests"

Last update
resolved

On October 15th, we identified that some boards were performing slowly for Workspace Guests. A fix has been released to all affected users, and the slow performance issue is resolved. No further impact has been observed.

monitoring

On October 15th, we identified that some boards were performing slowly for Workspace Guests. We have taken action to mitigate this issue and we will continue to monitor for the next 30 minutes before marking the incident as resolved.

identified

We've identified the cause of the slow performance when loading boards and opening cards for Workspace Guests, and we're now working on a fix. We will provide a new update here soon.

investigating

We’re continuing to investigating issues with boards that are experiencing slow performance when loading them and opening cards for Workspace Guests, and we will provide updates here soon

investigating

We are continuing to investigate issues with some large boards that are experiencing poor performance when loading them and opening cards, and will provide updates as we learn more.

investigating

We are continuing to investigate issues with some large boards that are experiencing poor performance when loading them and opening cards, and will provide updates as we learn more.

investigating

We are continuing to investigate issues with some large boards that are experiencing poor performance when loading them and opening cards, and will provide updates as we learn more.

investigating

We are investigating issues with some large boards that are experiencing poor performance when loading them and opening cards, and we will provide updates here soon.

Report: "Email to Board feature isn't working as expected for some customers"

Last update
resolved

On September 25th, we identified a temporary outage with the Email-to-Board feature. The affected feature is now back online, and no further impact has been observed.

monitoring

On September 25th, we identified that our Email-to-Board feature was failing for some users. We have taken action to mitigate this issue, and we will continue to monitor for the next 30 minutes before marking the incident as resolved.

identified

We've identified the cause of the Email-to-Board feature not working correctly, and we're now working on a fix. We will provide a new update here soon.

investigating

We are continuing to investigate issues with the Email-to-Board feature not working for some customers and will provide updates as we learn more.

investigating

We are investigating issues with the Email-to-Board feature not working for some customers and will provide updates here soon.

Report: "Users are experiencing reCaptcha errors while signing up"

Last update
resolved

This issue has been resolved.

monitoring

We have identified the root cause and the issue appears to be resolved.

investigating

Users attempting to sign up are encountering reCaptcha errors that are preventing a successful signup.

Report: "Issue with search and moving cards for some users"

Last update
resolved

We previously identified the issue with moving cards and searching for cards in Trello. The issue has been resolved and Trello is operating normally.

identified

We’ve identified the issue causing problems with moving and searching for cards on Trello. We’re currently working on implementing a fix, and a new update will be shared soon.

investigating

We are investigating reports of intermittent errors for some Trello customers when moving cards between boards or searching for cards. We will provide more details once we identify the root cause.

investigating

We are investigating reports of intermittent errors for some Trello customers when moving cards between boards or searching for cards. We will provide more details once we identify the root cause.

Report: "Some users may experience delays in receiving email notifications"

Last update
resolved

Between 12:00am on 9th July and 08:00am on 10th July, we experienced email deliverability issues for some recipient domains for Confluence, Jira Work Management, Jira Service Management, Jira, Trello, Atlassian Bitbucket, and Jira Product Discovery. The issue has been resolved and future emails will flow normally.

identified

We continue to work on resolving the email notification issues for Confluence, Jira Work Management, Jira Service Management, Jira, Trello, Atlassian Bitbucket, and Jira Product Discovery. We have identified the root cause.

investigating

We are investigating reports of intermittent errors whilst sending Email Notifications for Confluence, Jira Work Management, Jira Service Management, Jira, Trello, and Jira Product Discovery Cloud customers. We will provide more details once we identify the root cause.

Report: "Some products are hard down"

Last update
resolved

Between 20:08 UTC and 20:31 UTC on 03-07-2024, we experienced downtime for Trello. The issue has been resolved and the service is operating normally.

monitoring

We have mitigated the problem and continue looking into the root cause. The outage lasted from 8:08pm UTC to 8:31pm UTC on 03/07. We are now monitoring closely.

investigating

We are investigating an issue with <FUNCTIONALITY IMPACTED> that is impacting <SOME/ALL> Atlassian, Atlassian Partners, Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira, Opsgenie, Atlassian Developer, Atlassian (deprecated), Trello, Atlassian Bitbucket, Guard, Jira Align, Jira Product Discovery, Atlas, Atlassian Analytics, and Rovo Cloud customers. We will provide more details within the next hour.

Report: "Issues with the Jira Power-Up"

Last update
resolved

The issues with the Jira Power-Up have been resolved!

identified

Users of the Jira Power-Up for Trello may be experiencing errors when installing and using the Power-Up on their boards. The team is working to resolve this issue as quickly as possible.

Report: "Intermittent error accessing content"

Last update
resolved

Between 22:04 UTC and 22:28 UTC on 2024-06-20, some Atlassian Cloud customers experienced intermittent issues accessing services. The issue has been resolved and the service is operating normally.

monitoring

We have identified the root cause of the intermittent errors and have mitigated the problem. We are now monitoring closely.

investigating

We are investigating an intermittent issue with accessing Atlassian Cloud services that is impacting some Atlassian Cloud customers. We will provide more details once we identify the root cause.

Report: "Error responses across multiple Cloud products"

Last update
postmortem

### Summary

On June 3rd, between 09:43pm and 10:58pm UTC, Atlassian customers using multiple products were unable to access their services. The event was triggered by a change to the infrastructure API Gateway, which is responsible for routing traffic to the correct application backends. The incident was detected by the automated monitoring system within five minutes and mitigated by correcting a faulty release feature flag, which put Atlassian systems into a known good state. The first communications were published on the Statuspage at 11:11pm UTC. The total time to resolution was about 75 minutes.

### **IMPACT**

The overall impact was between 09:43pm and 10:17pm UTC, with the system initially in a degraded state, followed by a total outage between 10:17pm and 10:58pm UTC. _The incident caused service disruption to customers in all regions and affected the following products:_

* Jira Software
* Jira Service Management
* Jira Work Management
* Jira Product Discovery
* Jira Align
* Confluence
* Trello
* Bitbucket
* Opsgenie
* Compass

### **ROOT CAUSE**

A policy used in the infrastructure API gateway was being updated in production via a feature flag. The combination of an erroneous value entered in a feature flag and a bug in the code resulted in the API Gateway not processing any traffic. This created a total outage, where all users started receiving 5XX errors for most Atlassian products. Once the problem was identified and the feature flag updated to the correct values, all services started seeing recovery immediately.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. While we have several testing and preventative processes in place, this specific issue wasn't identified because the change did not go through our regular release process and instead was incorrectly applied through a feature flag. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Prevent high-risk feature flags from being used in production
* Improve testing of policy changes
* Enforce longer soak time for policy changes
* Require progressive rollouts for any feature flags to minimize broad impact
* Review the infrastructure feature flags to ensure they all have appropriate defaults
* Improve our processes and internal tooling to provide faster communications to our customers

We apologize to customers whose services were affected by this incident and are taking immediate steps to address the above gaps.

Thanks,
Atlassian Customer Support
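One remedial theme above is ensuring feature flags have appropriate defaults. A minimal sketch of defensive flag evaluation, with hypothetical policy names (this illustrates the general pattern, not Atlassian's gateway code): validate the flag value and fall back to a known-good default, rather than letting an erroneous value stop traffic.

```python
# Sketch: defensive feature-flag evaluation with a safe default, so a
# malformed flag value degrades to known-good behavior instead of taking
# the gateway down. Policy names are hypothetical.
import logging

log = logging.getLogger(__name__)

VALID_POLICIES = {"allow_all", "tenant_scoped", "deny_unknown"}
SAFE_DEFAULT = "tenant_scoped"

def resolve_routing_policy(flag_value: str | None) -> str:
    if flag_value in VALID_POLICIES:
        return flag_value
    # An erroneous flag value is logged and replaced, not propagated.
    log.warning("invalid routing policy flag %r; using %s",
                flag_value, SAFE_DEFAULT)
    return SAFE_DEFAULT
```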

resolved

Between 22:18 and 22:56 UTC, we experienced errors for multiple Cloud products. The issue has been resolved and the service is operating normally.

identified

We are investigating an issue with error responses for some Cloud customers across multiple products. We have identified the root cause and expect recovery shortly.

Report: "Card Repeater Power-Up failing to repeat"

Last update
resolved

Our engineering team has resolved this issue, and the Card Repeater Power-Up is working again. Cards that missed their scheduled repeat will run at their next scheduled time.

investigating

Our engineering team has identified an incident affecting the Card Repeater Power-Up that caused issues while repeating cards, and we're currently investigating.

Report: "Managing workspace members is failing for some workspace administrators"

Last update
resolved

A fix has been deployed. Thank you for your patience.

identified

We have identified the root cause and are working towards releasing a fix.

investigating

We are currently investigating this issue.

Report: "Admin Portal Feature Access Issue"

Last update
resolved

Between 6:30 AM and 9:50 AM UTC, we experienced failures in accessing some features from the Admin Portal. The issue has been resolved and the service is operating normally.

identified

We are investigating an issue causing failures in accessing some features from the Admin Portal, which is impacting some of our Cloud customers. We have identified the root cause and anticipate recovery shortly.

Report: "Some paginated queries in Forge hosted storage kept repeating the last page"

Last update
resolved

We have identified and resolved a problem with Forge hosted storage, where some paginated queries kept repeating the last page. The incident was detected by our internal monitoring and was resolved quickly after detection by reverting the deployment. Activating changes recently made to the query cursors for paginated queries introduced a bug that impacted some apps. A small number of requests were impacted over a 16-minute window while the incident lasted.

Timeline:

- 25/Mar/24 10:08 p.m. UTC - Impact started, when the changes were deployed to production
- 25/Mar/24 10:09 p.m. UTC - Incident was detected
- 25/Mar/24 10:24 p.m. UTC - Incident was resolved and impact ended

The impact of this incident has been completely mitigated, and our monitoring tools confirm that query operations are back to the pre-incident behaviour. We have also resolved the underlying bug and deployed the fix to production, completely eliminating the cause of this incident. We apologise for any inconvenience this may have caused to our customers and our developer community.
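The bug described here, a cursor that keeps returning the last page, is a classic pagination failure mode. A minimal, generic sketch of a client-side guard (`fetch_page` is a hypothetical stand-in, not the Forge storage API): stop when the cursor is absent or repeats.

```python
# Sketch: a pagination loop guarded against the failure mode above, where
# a buggy cursor keeps returning the last page. fetch_page is a
# hypothetical callable wrapping a paginated storage query.
def fetch_all(fetch_page):
    results, cursor, seen = [], None, set()
    while True:
        page = fetch_page(cursor=cursor)
        results.extend(page["items"])
        cursor = page.get("next_cursor")
        # Terminate on a missing cursor (normal end) or a repeated cursor
        # (the infinite-loop bug this incident describes).
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
    return results
```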

Report: "Investigating new product purchasing"

Last update
resolved

Between 23:15 UTC on 28th Feb 2024 and 00:05 UTC on 29th Feb 2024, we experienced an issue with new product purchasing for all products. All newly signed-up products have been successfully provisioned, and we have confirmed the issue is resolved and the service is operating normally.

investigating

We are investigating an issue with new product purchasing that is impacting all products. Customers adding new cloud products may have experienced a long waiting page or an error page after attempting to add a product. We have mitigated the root cause and are working to resolve the impact for customers who attempted to add a product during the impact period. We will provide more details within the next hour.

Report: "Service Disruptions Affecting Atlassian Products"

Last update
postmortem

### **Summary**

On February 14, 2024, between 20:05 UTC and 23:03 UTC, Atlassian customers on the following cloud products encountered a service disruption: Access, Atlas, Atlassian Analytics, Bitbucket, Compass, Confluence, Ecosystem apps, Jira Service Management, Jira Software, Jira Work Management, Jira Product Discovery, Opsgenie, Statuspage, and Trello. As part of a security and compliance uplift, we had scheduled the deletion of unused and legacy domain names used for internal service-to-service connections. Active domain names were incorrectly deleted during this event. This impacted all cloud customers across all regions. The issue was identified and resolved through the rollback of the faulty deployment to restore the domain names and Atlassian systems to a stable state. The time to resolution was two hours and 58 minutes.

### **IMPACT**

External customers started reporting issues with Atlassian cloud products at 20:52 UTC. The impact of the failed change led to performance degradation or, in some cases, complete service disruption. Symptoms experienced by end-users were unsuccessful page loads and/or failed interactions with our cloud products.

### **ROOT CAUSE**

As part of a security and compliance uplift, we had scheduled the deletion of unused and legacy domain names that were being used for internal service-to-service connections. Active domain names were incorrectly deleted during this operation.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. The detection was delayed because existing testing and monitoring focused on service health rather than the entire system's availability. To prevent a recurrence of this type of incident, we are implementing the following improvement measures:

* Canary checks to monitor the entire system's availability.
* Faster rollback procedures for this type of service impact.
* Stricter change control procedures for infrastructure modifications.
* Migration of all DNS records to centralised management and stricter access controls on modification to DNS records.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
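The first remedial item is canary checks on whole-system availability rather than per-service health. A minimal sketch of that distinction, with hypothetical URLs: probe user-visible endpoints end to end, so a broken internal dependency (such as a deleted DNS record) still fails the check even when each service reports itself healthy.

```python
# Sketch: a whole-system canary that exercises user-visible endpoints
# rather than individual service health checks, the gap named above.
# URLs are hypothetical.
import urllib.request

CANARY_URLS = [
    "https://trello.example.com/1/health/board-load",
    "https://jira.example.com/status",
]

def run_canary() -> bool:
    ok = True
    for url in CANARY_URLS:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                ok &= (resp.status == 200)
        except OSError:
            # DNS failures, timeouts, and connection resets all count
            # as system-level unavailability.
            ok = False
    return ok
```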

resolved

We experienced increased errors on Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Align, Jira Product Discovery, Atlas, Compass, and Atlassian Analytics. The issue has been resolved and the services are operating normally.

monitoring

We have identified the root cause of the Service Disruptions affecting all Atlassian products and have mitigated the problem. We are now monitoring this closely.

identified

We have identified the root cause of the increased errors and have mitigated the problem. We continue to work on resolving the issue and monitoring this closely.

investigating

We are investigating reports of intermittent errors for all Cloud Customers across all Atlassian products. We will provide more details once we identify the root cause.

Report: "Trello is slow or unavailable"

Last update
postmortem

### Summary

On Feb. 13, 2024, between 8:00 AM and 11:34 AM UTC, Trello experienced severely degraded performance, appearing as a full or partial outage to Atlassian customers. The event was triggered by a buildup of long-running queries against our database, leading to slowed API response times and causing Trello to be degraded or unavailable for users. The root cause of the incident was identified as a compression change in our database deployed approximately 11 hours earlier during a low-traffic period. As European customers came online, traffic started increasing, resulting in a buildup of queries and the subsequent incident. The incident was detected by our monitoring system at 8:07 AM UTC and was mitigated by reverting the compression change and restarting components of our database system. The total time to resolution was 3 hours and 34 minutes.

### **IMPACT**

The overall impact was between 8:00 AM and 11:34 AM UTC on Feb. 13, 2024. The incident caused Trello to be fully or partially unavailable for customers using or attempting to access the site during this period.

### **ROOT CAUSE**

The issue was caused by a compression change in our database, which resulted in a buildup of queries in the system. This buildup then caused API response times to increase to critical levels. During the incident, many users received HTTP 429 errors as the system began rate-limiting in an attempt to recover. Users that did not receive errors experienced API response times 10-100x slower than our standard response times.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity. We are prioritizing the following actions to avoid repeating this incident and reduce time to resolution:

* Improve our process for releasing incremental configuration changes, which would have allowed the team to identify the root cause before a peak load period and prevent similar incidents.
* Adjust the priority level of alerts related to this class of incident to improve the signal-to-noise ratio and drive faster time to resolution.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve Trello's performance and availability.

Thanks,
Atlassian Customer Support
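The trigger here was a buildup of long-running queries. For a MongoDB-style database (which another Trello postmortem on this page references; this one does not name the database), a watchdog over the `$currentOp` aggregation stage is one way to surface that buildup early. The threshold below is illustrative.

```python
# Sketch: surface long-running database operations before they pile up,
# using MongoDB's $currentOp aggregation stage. The threshold is
# illustrative, not a recommendation.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
SLOW_SECS = 30

slow_ops = client.admin.aggregate([
    {"$currentOp": {"allUsers": True}},
    {"$match": {"secs_running": {"$gte": SLOW_SECS}}},
])
for op in slow_ops:
    # opid can be handed to killOp if an operator decides to shed load.
    print(op.get("opid"), op.get("secs_running"), op.get("ns"))
```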

resolved

Trello is now available. Thank you for your patience.

monitoring

We've identified an issue that was causing slowness in Trello and put a fix in place. We're seeing things improving for all our customers and we'll keep monitoring this latest fix for now. Once again, we appreciate your understanding!

investigating

Our team is still looking into the issue causing Trello to be slow or unavailable. Thank you for your understanding, we're working to have Trello back up and running as quickly as possible.

investigating

We're still investigating the root cause of this problem, and a new update will be shared soon! Thanks for your patience!

investigating

Our team is still investigating the issue that's causing Trello to be slow or unavailable, and working to bring Trello back up to speed as quickly as possible. Thanks for your patience and understanding!

investigating

We've noticed that Trello is responding slowly. This will be present in both the web and mobile apps. Our engineering team is actively investigating this incident and working to bring Trello back up to speed as quickly as possible. We'll keep you posted with further updates on this page.

Report: "Atlassian Intelligence functionality completely down"

Last update
resolved

Between 13:40 and 15:30 UTC, we experienced an outage in all Atlassian Intelligence features for Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas. The issue has been resolved and the service is operating normally.

identified

We continue to work on resolving the outage affecting Atlassian Intelligence related features for Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas. We have identified the root cause and expect recovery shortly.

Report: "Trello is unavailable"

Last update
postmortem

### **SUMMARY**

On Nov 30 2023, between 14:04 and 16:57 UTC, Atlassian customers using Trello experienced errors when accessing and interacting with the application. This incident impacted Trello users on the iOS and Android mobile apps as well as those using the Trello web app. The event was triggered by the release of a code change that eventually overloaded a critical part of the Trello database. The incident was detected immediately by our automated monitoring systems and was mitigated by disabling the relevant code change.

The issue was extended by the failure of a secondary service whose recovery caused an increase in load on the same critical part of the Trello database, which created a self-reinforcing feedback loop. This secondary service recovery involved reestablishing over a million connections, with each connection attempt adding load to the same part of the Trello database. We attempted to aid the service recovery by intentionally blocking some of the inbound Trello traffic to reduce load on the database and by increasing the capacity of the Trello database to better handle the high load. Over time the connections were all successfully reestablished, which returned Trello to a known good state. The total time to resolution was just under 3 hours.

### **IMPACT**

The overall impact was between Nov 30 2023, 14:04 UTC and Nov 30 2023, 16:57 UTC on the Trello product. The incident caused service disruption to all Trello customers. Our metrics show there were elevated API response times and increased error rates through the entire incident period, which indicates that most users were unable to load Trello at all or easily interact with the application in any way. The particular database collection that was overloaded was one that is necessary for the Trello service to make authorization decisions, which meant that all requests were impacted.

### **ROOT CAUSE**

The issue was caused by a series of changes intended to standardize Trello's approach to authorizing requests, which had the unintended side effect of modifying a database query from a [targeted operation](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-targeted) to a [broadcast operation](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast). Broadcast operations are more resource-intensive as they must be sent to _all_ database servers to be satisfied. These broadcast operations eventually overloaded some of the Trello database servers as Trello approached its daily peak usage period on Nov 30 2023.

1. The first change of this type was deployed over a period of seven days at the end of August and changed the authorization type used by our websocket service. This meant that newly established websocket connections required this new broadcast query. At any given moment, we have a great deal of _established_ websocket connections, but the usual rate of _new_ websocket connections is relatively low. Therefore, our monitoring systems only detected a slight increase in resource usage and flagged this change as a low-priority performance regression. We acknowledged the regression and created a task to identify and reduce the resource demands of these new queries.
2. The second change of this type was deployed over the course of a few days before being fully rolled out on Nov 29, 2023, the day before this incident. This change caused the Trello application server to use the new broadcast query while authorizing standard web browser traffic, which is the vast majority of our traffic. The change was fully deployed at 19:34 UTC on Nov 29, which was during a low-traffic period.

The next day, as the application approached its daily peak traffic period, our monitoring on the database servers indicated they were overloaded. When these database nodes were overloaded, users' HTTP requests received very slow responses or HTTP 504 errors. As we activated our load-shedding strategies, some users received HTTP 429 errors.

The incident's length can be attributed to a secondary failure where our websocket servers experienced a rapid increase in memory usage, leading to processes crashing with OutOfMemoryErrors. As new servers came online and the websockets attempted to reconnect, they once again generated the broadcast queries on the Trello database servers. These broadcast queries continued to put load on the database, which meant the Trello API continued to have high latency, thus perpetuating the feedback loop. We are working to determine the root cause of the OutOfMemoryErrors.

We also determined after the incident that because the Trello application server makes its load-shedding decision after performing the authorization step, the overloaded database servers were still being queried before a request was rejected. We are working to improve our load-shedding strategies post-incident.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We know that outages impact your productivity, and we are continually working to improve our testing and preventative processes to prevent similar outages in the future. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Increase the capacity of our database (completed during the incident). This action is the most critical and is aimed at preventing a recurrence of this particular incident and recovering gracefully if the websocket service were to fail again.
* Refactor the new authorization approach to avoid [broadcast operations](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast).
* Add pre-deployment tests to avoid releasing unnecessary broadcast operations.
* Determine the root cause of the secondary failure of the websocket service.

Furthermore, we deploy our changes only after thorough review and automated testing, and we deploy them progressively using feature flags to avoid broad impact. To minimize the impact of breaking changes to our environments, we will implement additional preventative measures:

* Ensure that our load-shedding strategies fail fast.
* Add monitoring to observe [broadcast operations](https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#std-label-sharding-mongos-broadcast) in all our environments.

We apologize to customers who were impacted during this incident; we are taking immediate steps to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
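To make the targeted-versus-broadcast distinction concrete, here is a minimal PyMongo sketch against a hypothetical sharded collection (the collection name and shard key are illustrative, not Trello's schema): a filter that includes the shard key is routed to a single shard, while a filter that omits it must be sent to every shard.

```python
# Sketch: targeted vs. broadcast queries in a sharded MongoDB cluster,
# the distinction at the heart of this postmortem. Collection name and
# shard key are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
members = client["example_db"]["board_members"]  # sharded on board_id

# Targeted: the shard key is in the filter, so one shard answers.
members.find_one({"board_id": "abc123", "user_id": "u42"})

# Broadcast: the shard key is absent, so every shard must be queried.
# This is the query shape that overloaded the cluster in this incident.
members.find_one({"user_id": "u42"})
```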

resolved

The fix we released was successful, and all the issues our users were experiencing with Trello have been resolved. Thank you for your patience and understanding!

monitoring

We've implemented a new fix to the issue affecting our database, and we're now seeing signs of recovery for all users on Trello. We'll keep monitoring this latest fix for now. Once again, we appreciate your understanding!

identified

Our engineering team has identified an issue with Trello's database, and we're now working on implementing a fix to restore Trello's availability to all users. We appreciate everyone's understanding!

investigating

Our teams are still investigating the issue affecting Trello's availability, and a new update will be provided soon. We appreciate your understanding!

investigating

Our team is still investigating the issue that's affecting Trello's availability, and we'll provide a new update soon. Thanks for your patience and understanding!

investigating

We're still investigating the root cause of this problem, and a new update will be shared soon! Thanks for your patience!

investigating

We're currently investigating an issue that's causing Trello to be slow or unavailable to our users. A new update will be shared soon.

Report: "Intermittent issues viewing and loading attachments"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Some users may see intermittent issues viewing and loading attachments. We are investigating the cause - for now, refreshing your browser may fix the issue. We will update this page with more information as it is discovered.

Report: "Egress connectivity timing out"

Last update
resolved

The systems have remained stable after the fix and a period of monitoring.

monitoring

The issue was identified and a fix implemented. We are currently monitoring.

identified

We are currently investigating an incident that results in outbound connections from Atlassian cloud in us-east-1 intermittently timing out. This affects Jira, Trello, Confluence, and Ecosystem products. The features affected for these products are those that require opening a connection from Atlassian Cloud to public endpoints on the Internet.

identified

This incident also affects Atlassian Developer.

identified

We are currently investigating an incident that results in connection timeouts on the service egress proxy. This affects Jira, JSM, Confluence, Bitbucket, Trello, and Ecosystem products. The features affected for these products are those that require a connection to the service egress proxy.

Report: "Forge Function Invocations outage impacting Smartlinks"

Last update
resolved

Forge Invocations had an 8-minute outage between 03:05:13 UTC and 03:13:27 UTC on 2023-11-29, resulting in Smart Links failing. The service has since recovered.

Report: "Degraded experience for some of Trello's users"

Last update
resolved

This incident has been resolved. If you're still seeing issues, please reach out at https://trello.com/contact/

monitoring

Our Engineers have rolled out a fix and have seen services recovering. We're now monitoring it.

identified

Our Engineers have identified the issue and are working on a fix. We have found the following services to be affected:

- Inability to create new Automation commands
- Card changes not saving
- Some file uploads failing

investigating

We are currently investigating an issue affecting some of Trello's services. We have identified that some users may experience trouble with the following services:

- Automation commands not running
- Card changes not saving

Report: "Trello is down"

Last update
resolved

We want to let you know that the changes we made on our side fixed all the connection issues our users were experiencing with Trello, and all services are functional right now. We appreciate your patience and understanding.

monitoring

The changes we released have fixed the connection issues our users were experiencing when connecting to Trello. We're now monitoring it.

identified

We've identified the root cause of the issue, and we made some adjustments, which should restore Trello's functionality soon.

identified

We are continuing to work on a fix for this issue.

identified

We've identified an issue on Trello that's preventing multiple users from connecting to our servers. We'll provide a new update soon.

Report: "Degraded experience for Atlassian Intelligence features"

Last update
resolved

Between 23:51 UTC on 19th Oct 2023 and 03:30 UTC on 20th Oct 2023, we experienced a service degradation of Atlassian Intelligence capabilities in Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas. The issue has been resolved and the service is operating normally.

monitoring

OpenAI has mitigated the problem and we are currently not seeing any errors with respect to Atlassian Intelligence capabilities. We are now monitoring closely.

identified

The OpenAI team has applied a fix and we are seeing a reduction in the failure rate for Atlassian Intelligence capabilities in Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas. We continue to closely monitor our systems and are coordinating with OpenAI for a faster recovery of Atlassian Intelligence capabilities.

identified

We continue to work on resolving the failures in Atlassian Intelligence capabilities for Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas. The cause of the incident is an increased API failure rate from OpenAI; we are in touch with them to understand the time to recover.

identified

We are investigating reports of intermittent errors in Atlassian Intelligence capabilities for some Confluence, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, and Atlas Cloud customers. We will provide more details once we identify the root cause.

Report: "Degraded performance in Cloud"

Last update
postmortem

### **SUMMARY**

Between September 18 2023 5:47 PM UTC and September 19 2023 04:15 AM UTC, customers using Confluence Cloud, Jira Software, Jira Service Management, Jira Work Management, Jira Product Discovery and Trello products with services hosted in the AWS us-east-1 and us-west-2 regions experienced slow performance and/or page load failures as a result of an AWS issue that began on September 18 2023, 4:00 PM UTC. This was triggered by an underlying networking fault in our cloud provider AWS, which affected multiple AWS services in their us-west-2 and us-east-1 regions used by Atlassian. The incident was detected within one minute by our monitoring systems. Recovery of affected Atlassian services occurred on a product-by-product basis, with full recovery for all products completed by September 19 2023 04:15 AM UTC.

### **IMPACT**

Product impact varied based on which regions and availability zones services are using, with services hosted in us-west-2 being affected more than services hosted in us-east-1. Product-specific impacts are listed below:

* **Jira Software -** A number of Jira nodes were affected with highly elevated error rates due to Jira databases in us-east-1 and us-west-2 being impacted. The impact varied, with some Jira nodes being unusable whilst others were in a usable but degraded state.
* **Jira Service Management -** Some users hosted in us-east-1 and us-west-2 experienced problems when creating issues through the Help Center, viewing issues, transitioning issues, posting comments, and using queues.
* **Jira Work Management -** Users based in us-west-2 experienced minor service degradation.
* **Jira Product Discovery -** Users experienced some issues when loading insights.
* **Confluence Cloud -** Impact was limited to customers hosted in the us-west-2 region. During this time, users attempting to load Confluence pages experienced sporadic product degradation, including brief periods where Confluence was inaccessible, complete and partial page load failures, page timeouts, and increased request latency.
* **Trello -** Users had minimal service degradation; only 0.1% of Trello users had automation rules impacted.

### **ROOT CAUSE**

The root cause was an issue with the subsystem responsible for network mapping propagation within the Amazon Virtual Private Cloud in the us-east-1 (use1-az1) and us-west-2 (usw2-az1 and usw2-az2) regions, which impacted network connectivity for multiple AWS services that Atlassian products rely upon. There was a delay between the AWS incident and Atlassian being affected because existing compute instances and resources were not affected by the issue. However, any changes to networking state - such as scaling up with additional compute nodes - experienced delays in the propagation of network mappings. This led to network connectivity issues until these network mappings had been fully propagated. Other AWS services that create or modify networking resources also saw impact as a result of this issue. No relevant Atlassian-driven events in the lead-up have been identified as causing or contributing to this incident.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

To avoid repeating this type of incident, we are prioritizing documenting and evaluating ways to improve Availability Zone failure resiliency.

Thanks,
Atlassian Customer Support

resolved

Between 10:47 UTC on 09/18 and 04:15 UTC on 09/19, we experienced degraded performance for some Confluence, Jira Work Management, Jira Service Management, Jira Software, Trello, Atlassian Access, and Jira Product Discovery customers. The issue has been resolved and the service is operating normally.

monitoring

Services are confirmed to be stable. We are performing final validation checks before confirming the incident's resolution.

monitoring

We have identified the root cause of the degraded performance and have mitigated the problem. We are monitoring this closely.

investigating

We are investigating cases of degraded performance for some Confluence, Jira Work Management, Jira Service Management, Jira Software, Trello, and Jira Product Discovery Cloud customers. We will provide more details within the next hour.

Report: "Trello UI components not working as expected"

Last update
resolved

We can confirm that the fix we released resolved the problems our users were experiencing with Trello's UI, so we're closing this incident. Thanks for your patience and understanding.

monitoring

We've identified the root cause of the problem that was affecting Trello's UI, and we've now released a new fix to it. We'll keep monitoring it from our side.

investigating

We're investigating some reports from our users that Trello's UI isn't working as expected. We'll provide a new update soon!

Report: "Atlassian Account login issues"

Last update
postmortem

### **SUMMARY**

On Sep 13, 2023, between 12:00 PM UTC and 03:30 PM UTC, some Atlassian users were unable to sign in to their accounts and use multiple Atlassian cloud products. The event was triggered by a misconfiguration of rate limits in an internal service, which caused a cascading failure in sign-in and signup-related APIs. The incident was quickly detected by multiple automated monitoring systems. The incident was mitigated on Sep 13, 2023, 03:30 PM UTC by the rollback of a feature and additional scaling of services, which put Atlassian systems into a known good state. The total time to resolution was about 3 hours and 30 minutes.

### **IMPACT**

The overall impact was between Sep 13, 2023, 12:00 PM UTC and Sep 13, 2023, 03:30 PM UTC on multiple products. The incident caused intermittent service disruption across all regions. Some users were unable to sign in to new sessions. Other scenarios that temporarily failed were new user signups, profile retrieval, and password reset. During the incident, we had a peak of 90% of requests failing across authentication, user profile retrieval, and password reset use cases.

### **ROOT CAUSE**

The issue was caused by a misconfiguration of a rate limit in an internal core service. As a result, some sign-in requests over the limit received HTTP 429 errors. However, retry behavior for requests caused a multiplication of load, which led to higher service degradation. As many internal services depend on each other, the call graph complexity led to a longer time to detect the actual faulty service.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

We are continuously improving our system's resiliency. We are prioritizing the following improvement actions to avoid repeating this type of incident:

* Audit and improve service rate limits and client retry and backoff behavior.
* Improve scale and load test automation for complex service interactions.
* Audit cross-service dependencies related to sign-in flows and minimize them where possible.

Due to the unavailability of sign-in, some customers were unable to create support tickets. We are making additional process improvements to:

* Enable our unauthenticated support contact form and notify users that it should be used when standard channels are not available.
* Create status page notifications more quickly and ensure that for severe incidents, notifications to all subscribers are enabled.

We apologize to users who were impacted during this incident; we are taking immediate steps to improve the platform's reliability and availability.

Thanks,
Atlassian Customer Support
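The root cause notes that naive retries multiplied load once requests began receiving HTTP 429s. The standard counter, which the first remedial item alludes to, is exponential backoff with jitter. A minimal sketch (`request_fn` is a hypothetical callable returning an HTTP status code):

```python
# Sketch: client retry with exponential backoff and jitter, the usual
# counter to the retry-driven load multiplication described above.
# request_fn is a hypothetical callable returning an HTTP status code.
import random
import time

def call_with_backoff(request_fn, max_attempts=5):
    status = None
    for attempt in range(max_attempts):
        status = request_fn()
        if status != 429:
            return status
        # Back off exponentially, with jitter so many clients' retries
        # do not align into synchronized waves of load.
        time.sleep(random.uniform(0, 2 ** attempt))
    return status
```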

resolved

Between 12:45 UTC and 15:30 UTC, we experienced login and signup issues for Atlassian Accounts. The issue has been resolved and the service is operating normally. We will publish a post-incident review with the details of the incident and the actions we are taking to prevent similar problems in the future.

monitoring

We are no longer seeing occurrences of the Atlassian Accounts login errors; all clients should be able to log in successfully now. We will continue to monitor.

monitoring

We can see a reduction in the Atlassian Accounts login issues after the mitigation actions were taken. We are still monitoring closely and will continue to provide updates.

monitoring

We have identified the root cause of the Atlassian Accounts login issues impacting Cloud Customers and have mitigated the problem. We are now monitoring this closely.

investigating

We are investigating an issue with Atlassian Accounts login that is impacting some Cloud customers. We will provide more details within the next hour.

Report: "Degraded performance in Cloud"

Last update
resolved

The issues with Trello have been mitigated, and this incident is now marked as resolved.

monitoring

An outage with a cloud provider is impacting multiple Atlassian Cloud products including Access, Confluence, Trello, and Jira Products. We will provide more details within the next hour.

Report: "Multiple product logins"

Last update
postmortem

### **SUMMARY** On August 30, 2023, between 4:07 and 5:30 UTC, some customers were unable to log in to Atlassian's Cloud products using [id.atlassian.com](http://id.atlassian.com). Logged-in users were also unable to switch accounts, change passwords, or log out. Users with existing sessions were not impacted. Between 5:32 and 6:00 UTC, traffic was incrementally restored to a previous build, mitigating the impact for users. The total time to resolution was one hour and 53 minutes. ### **IMPACT** Users were not able to log in using Atlassian's shared account management system ([id.atlassian.com](http://id.atlassian.com)). This affected users who were trying to log in to the following products: Jira, Confluence, Trello, Opsgenie, mobile apps, and ecosystem apps. Aside from the inability to log in, there was no impact on other Atlassian products or features. ### **ROOT CAUSE** Multiple Set-Cookie headers were unintentionally modified so that only the last Set-Cookie header remained in the response to users' browsers. The issue was caused by a change to Network Extensions within the Edge Network. As a result, users who needed a new session could not log in: upon login, they were redirected to log in again, and no session was created for them. ### **REMEDIAL ACTIONS PLAN & NEXT STEPS** We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue was not detected in Atlassian's staging environment: end-to-end tests did not cover the case of multiple Set-Cookie headers in a single response, and therefore this bug went unnoticed. We are prioritizing the following improvement actions to avoid repeating this type of incident: * Put automated tests in place to validate that cookies are not being removed from responses. * Guarantee that the configuration of networking extensions is identical in staging and production so that errors are picked up earlier. Furthermore, we typically deploy our changes progressively by cloud region to avoid broad impact, but in this case the change was not deemed risky and was deployed to all regions. To minimize the impact of breaking changes to our environments, we will implement additional preventative measures: * Future changes to network extensions will use progressive rollouts. * With staging properly utilized, errors similar to this one will not reach any production environment. We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability. Thanks, Atlassian Customer Support
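
Set-Cookie is the one response header that cannot safely be folded into a single comma-joined value, because cookie attributes such as Expires contain commas themselves; an edge layer that normalizes headers this way will silently drop all but one cookie, exactly as described above. A minimal regression-test sketch in the spirit of the first remediation item, using only the Python standard library (the host and cookie names are hypothetical):

```python
# Sketch: assert that every Set-Cookie header survives the edge layer.
# Host and cookie names below are hypothetical placeholders.
import http.client

def fetch_set_cookie_headers(host: str, path: str = "/") -> list[str]:
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # getheader("Set-Cookie") would comma-join multiple values;
        # msg.get_all() preserves each Set-Cookie header separately.
        return resp.msg.get_all("Set-Cookie") or []
    finally:
        conn.close()

def test_all_session_cookies_survive() -> None:
    cookies = fetch_set_cookie_headers("id.example.com", "/login")
    names = {c.split("=", 1)[0] for c in cookies}
    assert {"session", "csrf_token"} <= names, f"cookies dropped: {names}"
```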

resolved

Between 4:30 AM UTC and 6:00 AM UTC, we experienced issues for users attempting to log in to Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Jira Product Discovery, Compass, and Atlassian Analytics. The issue has been resolved and the service is operating normally.

investigating

We are investigating reports of intermittent login errors affecting some Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Jira Product Discovery, Compass, and Atlassian Analytics Cloud customers. We will provide more details once we identify the root cause.

Report: "Infrastructure outage affecting attachments and Bitbucket Pipelines in all regions"

Last update
resolved

Between 04:00 UTC and 05:00 UTC, we experienced a partial outage for Bitbucket Pipelines, issues viewing in-product attachments, and degraded performance across all products. The issue has been resolved and all services are operating normally.

monitoring

We have identified the root cause, initiated mitigation actions, and are seeing a reduction in errors when scaling up in response to customer traffic. We are still monitoring closely and will continue to provide updates.

identified

A major infrastructure outage is affecting our ability to scale up in response to customer traffic. We are pre-emptively disabling any downscaling to minimize future impact. This is primarily affecting Bitbucket Pipelines, and attachment storage for all products. All regions are affected. We will provide more details in the next hour.

Report: "Intermittent errors during login and Workspace not showing up for some customers"

Last update
postmortem

### **SUMMARY** On Aug 4, 2023, Trello users encountered issues accessing their workspaces. This was caused by a processing error during user deletion events involving two users who shared a workspace. The error resulted in unintended workspaces being marked as deleted. The issue was identified, the deletion process halted, and data restoration initiated. The solution involved marking workspaces as undeleted and implementing a code fix to prevent similar issues in the future. ### **IMPACT** The overall impact occurred on August 4th, 2023, spanning from the afternoon to the early evening UTC. All Trello workspaces created before July 2021 were inaccessible during the incident; this amounted to 39% of active workspaces. ### **ROOT CAUSE** The event was triggered by a race condition that occurred while responding to user deletion events. When the last user in a workspace is deleted, the system automatically marks the workspace as deleted. In this case, two users sharing a workspace were deleted simultaneously, causing a race condition. The race condition triggered a code path that generated a query which was not targeted to an individual workspace, but instead systematically marked all workspaces (including unrelated ones) as deleted in our database. ### **REMEDIAL ACTIONS PLAN & NEXT STEPS** We know that outages affect your productivity, and we are committed to preventing incidents like these from occurring. We have already implemented code changes to prevent the specific condition that caused the incident. We are prioritizing the following improvement actions to avoid repeating this type of incident: * Implement a monitoring system for the following metrics in order to improve anomaly detection: CPU usage, inbound and outbound network traffic, memory usage, and disk usage. * Add anomaly detection to monitor the number of soft deletes and set up alerting for it. We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability. Thanks, Atlassian Customer Support
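
As a minimal sketch of the kind of defensive fix described above (pymongo-style calls; the collection and field names are illustrative, not Trello's real schema), the "mark the workspace deleted when its last member is removed" step can be made safe under concurrent deletions by always scoping the soft delete to one explicit workspace id and checking the membership precondition atomically in the same update:

```python
# Illustrative sketch, not Trello's actual code: a soft delete that is
# always targeted to a single workspace and race-safe under concurrent
# member deletions.
from pymongo import MongoClient

workspaces = MongoClient()["trello"]["workspaces"]

def on_last_member_removed(workspace_id: str) -> bool:
    if not workspace_id:
        # Never let an untargeted soft-delete query reach the database.
        raise ValueError("soft delete requires an explicit workspace id")
    result = workspaces.update_one(
        # Atomic compare-and-set: this workspace only, and only if it
        # truly has no members left and is not already deleted.
        {"_id": workspace_id, "member_count": 0, "deleted": False},
        {"$set": {"deleted": True}},
    )
    return result.modified_count == 1  # True iff we soft-deleted it
```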

resolved

Our team has identified the root cause and restored access to previously unavailable workspaces. We will share more in our post-incident review, which will be published as part of this incident report.

monitoring

Our team has identified the underlying cause of the login difficulties and missing Workspaces in Trello and has restored access to all affected Workspaces. We will continue to monitor the situation closely from our end.

identified

We have discovered the cause of workspaces being unavailable, and are working to restore access as soon as possible. We will continue to update our Statuspage with the latest information as it becomes available.

investigating

We are currently investigating issues with Trello workspaces being unavailable and are working to restore service as quickly as possible. We will continue to update our Statuspage with the latest information as it becomes available.

investigating

We are investigating reports of intermittent errors during login and Workspaces not appearing for some customers using Trello. We will provide more details once we identify the root cause.

Report: "Sign-ups, Product Activation, and Billing not working"

Last update
resolved

We have mitigated the issue with Sign-ups, Product Activation, and Billing; the systems are back to business as usual and all functionality is restored.

monitoring

We have identified the root cause of the issue preventing Sign-ups, Product Activation, and Billing from working and have mitigated the problem. We are now monitoring closely.

investigating

We are investigating an issue with Sign-ups, Product Activation, and Billing that is impacting all of our Cloud Customers. We will provide more details within the next hour.

Report: "Real-time updates are not functioning as expected"

Last update
resolved

This incident has been resolved.

identified

Real-time updates are now working for the majority of users, and new connections are also working. Some users may still be affected. We are continuing to implement a fix.

identified

The issue has been identified and a fix is being implemented. Real-time updates are working for the majority of users, but some may still be impacted.

investigating

We have identified an incident affecting real-time updates. For now, real-time updates are not functioning; changes can be seen by refreshing your browser.

Report: "Performance issues and outages with Cloud products"

Last update
postmortem

### **SUMMARY** We understand the importance of providing reliable and consistent service to our valued customers. On July 6, 2023, from 03:52 to 15:11 UTC, we experienced an issue with an upgraded version of a third-party tool that functions as our internal artifact management system. Although our monitoring system identified the incident within two minutes, the issue degraded the scaling capabilities of our internal hosting platform, resulting in service degradation or outages for Atlassian cloud customers. In response, we are taking immediate measures to enhance the stability of our system and prevent similar issues from recurring. ### **IMPACT** This incident affected multiple regions and products due to the diminished scaling capabilities of our internal hosting platform. In most products and offerings, customers faced reduced functionality, slower response times, and limited access to specific features. ### **ROOT CAUSE** The root cause of the incident was the introduction of new functionality in a third-party tool that functions as our internal artifact management system. It led to an unexpected increase in the load on the primary database of the artifact system. Upon identifying and localizing the problem, we promptly adjusted the system configuration to regain stability. ### **REMEDIAL ACTIONS PLAN & NEXT STEPS** Over the next months, we will enact a temporary freeze on non-critical upgrades of the artifact management system and focus our efforts on three high-priority initiatives: 1. **Enhancing system scaling:** We are prioritizing work to ensure that downtime in a critical infrastructure component does not affect the scaling of other components. We expect to complete this initiative within the next two months. 2. **Reducing interdependencies:** We are working to mitigate the risk of cascading failures by ensuring that significant system components can operate independently when issues arise. Initiatives 1 and 2 are already in progress and have been given priority to be completed as soon as possible. 3. **Strengthening testing procedures:** Alongside these initiatives, we are introducing even more stringent testing procedures than we already have in place to prevent potential issues in future updates. We are committed to collaborating closely with our technology partners to ensure the most optimal experience for our customers. We apologize for any inconvenience caused by this incident and appreciate your understanding. Our team is dedicated to continually improving our systems and processes to provide you with the exceptional service you deserve. Thank you for your continued support and trust in us. Sincerely, Atlassian Customer Support

resolved

We experienced performance issues and outages for several Atlassian Cloud Products. The issue has been resolved and the service is operating normally.

monitoring

We have identified the root cause of an issue with an internal infrastructure component that has been impacting customers of multiple Cloud products, including Jira Software, Jira Service Management, and Confluence. This issue had led to a performance impact and, in some cases, outages. We have implemented a fix to resolve the issue and recovery is in progress.

identified

We are investigating an issue with an internal infrastructure component that is impacting customers of multiple Cloud products, including Jira Software, Bitbucket, Jira Service Management, and Confluence. The impact includes degraded performance and, in some cases, outages. Users may experience slow loading and uploading of attachments, login issues, or an inability for new customers to sign up. We have identified the root cause and are actively working on service recovery.

Report: "Intermittent errors during login for some customers"

Last update
resolved

Between 07:31 UTC and 12:32 UTC, we experienced errors during login for Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, Compass, and Atlassian Analytics. The issue has been resolved and the service is operating normally.

monitoring

We have identified the root cause of the errors during login and have mitigated the problem. We are now monitoring closely.

identified

We are investigating reports of login errors impacting some Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, Compass, and Atlassian Analytics customers. We have identified the root cause and expect recovery shortly.

investigating

We are investigating reports of login errors impacting some Atlassian Support, Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, and Atlassian Analytics Cloud customers. We will provide more details within the next hour.

investigating

We are investigating reports of login errors impacting some Confluence, Jira Work Management, Jira Service Management, Jira Software, Opsgenie, Trello, Atlassian Bitbucket, Atlassian Access, Jira Product Discovery, and Atlassian Analytics Cloud customers. We will provide more details within the next hour.

investigating

We are investigating reports of intermittent login errors for some Confluence, Jira Work Management, Jira Service Management, Jira Software, Trello, Atlassian Bitbucket, Atlassian Access, and Jira Product Discovery Cloud customers. We will provide more details once we identify the root cause.

Report: "Some features are experiencing degraded performance"

Last update
postmortem

### **SUMMARY** On June 13, 2023, from 6:49 PM UTC to June 14, 2023, 02:20 AM UTC, Atlassian customers using Jira Software, Jira Service Management, Jira Work Management, Confluence, and Trello with services hosted in the AWS us-east-1 region were impacted by Automation rule degradation. The event was triggered by increased error rates and latencies for AWS Lambda function invocations in the us-east-1 region. Some other AWS services also experienced increased error rates and latencies as a result of the degraded Lambda invocations. The incident was automatically detected by multiple monitoring systems within 6 minutes, paging on-call teams. Recovery of the affected AWS Lambda service began after 116 minutes, at 8:45 PM UTC on June 13th. Full recovery of all AWS services occurred at 10:37 PM UTC on June 13th, after the backlog of asynchronous Lambda events had been processed. Some Jira tenants with large event backlogs experienced delays in running schedule-based rule reruns. Full recovery of all Atlassian Cloud services was confirmed at June 14, 2023, 02:20 AM UTC. ### **IMPACT** The overall impact was between June 13, 2023, 06:49 PM UTC and June 14, 2023, 02:20 AM UTC. Product-specific impacts are listed below. * Jira Software, Jira Service Management, Jira Work Management - Automation rules were not executed for 2 hours between Jun 13, 06:49 PM UTC and Jun 13, 08:45 PM UTC. Jira automation events generated during this period could not be rerun. When AWS Lambda recovered, delays were still experienced in our schedule-based and event-based rules for some larger tenants due to a large backlog of events. Full recovery was at June 14, 2023, 02:20 AM UTC. * Confluence - Automation rules were not executed for 2 hours between Jun 13, 06:49 PM UTC and Jun 13, 08:45 PM UTC. On AWS service restoration, Confluence automation recovered, and Confluence automation events generated during this period were rerun and processed. Full recovery was at June 14, 2023, 12:41 AM UTC. * Jira Product Discovery - Automation rules were not executed for 2 hours between Jun 13, 06:49 PM UTC and Jun 13, 08:45 PM UTC. Jira automation events generated during this period could not be rerun. Sending feedback/filing a support ticket from the application did not work. * Trello - Email-to-board delays, card cover image upload failures, attachment preview generation failures, board background upload failures, custom sticker image upload failures, and custom emoji upload failures. Trello automation was unaffected. Full recovery was at June 13, 2023, 10:08 PM UTC. The service disruption lasted 7 hours and 31 minutes, between June 13, 2023, 06:49 PM UTC and June 14, 2023, 02:20 AM UTC, and affected customers with services hosted in the us-east-1 region. ### **ROOT CAUSE** Atlassian uses Amazon Web Services (AWS) as a cloud service provider. The root cause was an issue with a subsystem responsible for capacity management for AWS Lambda in the us-east-1 region, which also impacted 104 AWS services. This affected Automation rules because the service is hosted exclusively in this region. No relevant Atlassian-driven events in the lead-up have been identified as causing or contributing to this incident. ### **REMEDIAL ACTIONS PLAN & NEXT STEPS** We are prioritizing the following improvement actions to avoid repeating this type of incident: * Increase the reliability of message delivery and recoverability from Jira to the Automation platform to improve recovery times. * Create a plan for multi-region impact mitigation for Automation. We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability. Thanks, Atlassian Customer Support

resolved

From 6:49 PM UTC to 8:45 PM UTC, Trello experienced issues with image uploads that affected card covers, attachment previews, board backgrounds, and custom sticker images. Additionally, we observed degradation in our email-to-board and search features. The services are now operating as expected, and we consider this incident resolved.

monitoring

AWS services have recovered from the outage. This issue affected attachment previews, email-to-board features, search, and card covers; all Trello services are now back online. For now, we'll keep monitoring from our end.

identified

We are continuing to work with our third-party partner on a fix. Attachment previews, email-to-board features, search, and card covers are intermittently affected.

identified

We have confirmed an issue with AWS services is causing the current issues. We are working with AWS to get the service restored back to normal as soon as possible.

identified

The Email-to-Board feature, along with the other previously mentioned features, has been affected by a technical problem. Our team is currently collaborating with our third-party partner to identify and implement an effective solution.

identified

We have identified an issue that may cause problems for users while adding new card covers, viewing attachment previews, or using the search function. The issue lies with one of our third-party partners, and we are currently working on resolving it.

Report: "All Trello users may experience problems receiving emails from Trello"

Last update
resolved

After our third-party email service provider released their fix, all the problems with receiving emails from Trello were resolved.

monitoring

We received an update from our third-party email service provider, who has already implemented a fix. We're monitoring from our side to ensure all emails are successfully delivered to our users, and we will provide more updates as more information becomes available.

identified

Our third-party email service provider has identified the problem on their side, and they're now working on a fix to be implemented. We will quickly provide more updates as more information becomes available.

identified

We have discovered an issue with our third-party email service provider and are currently awaiting their analysis to determine the root cause of the problem, which is preventing our users from receiving emails. We will provide more updates as more information becomes available.

investigating

We're investigating an issue affecting the delivery of all emails from Trello except marketing emails. Our team is still investigating, and we'll share an update soon.

Report: "Trello login and signup attempts are intermittently failing"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Users may see issues logging in or signing up for Trello. Our engineering team is actively investigating this incident and working to resolve as soon as possible. We will update this page as we have additional information.

Report: "Trello realtime updates are slow"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor.

investigating

Real-time updates have fully recovered. We are continuing to investigate.

investigating

Trello real-time updates were slow for a brief period but are recovering now. We'll keep you posted with further updates on this page.