Historical record of incidents for Octopus Deploy
Report: "octopus.com/downloads degraded"
Last update: We have fixed the issue with the navigation bar on octopus.com/downloads
We have shipped a change to fix the navigation bar issue and are monitoring to ensure it works in all regions.
We have identified the issue with the top navigation bar loading and are working on a fix.
octopus.com/downloads top navigation bar is failing to load correctly. We are investigating the cause. Links to downloads from this website are still correct. Please reload the page if you cannot see the content.
Report: "octopus.com/downloads degraded"
Last updateWe have fixed the issue with the navigation bar on octopus.com/downloads
We have shipped a change to fix the navigation bar issue and are monitoring to ensure it works in all regions.
We have identified the issue with the top navigation bar loading and are working on a fix.
octopus.com/downloads top navigation bar is failing to load correctly. We are investigating the cause. Links to downloads from this website are still correct. Please reload the page if you cannot see the content.
Report: "ECS Update steps failing to execute"
Last update: This incident has been resolved.
We have uploaded a new version of the ECS Update step. Affected customers can manually force a re-acquisition of the updated packages via the following steps:
1. Go to Tasks.
2. Click Show Advanced filters.
3. Check "Include System Tasks".
4. Select "Acquire new steps and targets" from the Task Type filter.
5. Choose the most recently run task.
6. Select "Re-run" from the context menu (3 dots).
We are aware of an issue where a recent update has caused ECS Update steps to fail deployment.
Report: "Performance issues identified with Octopus Cloud"
Last update: Azure has advised us that this issue has been mitigated. Octopus Cloud instances in the West Europe region were affected on 5 March from 6 AM to 5 PM UTC. Azure will be providing us with a Root Cause Analysis, after which we can share more information.
We continue to see issues with our upstream provider affecting instances within the WestEU region. We are awaiting further information from Azure.
Octopus Deploy is aware of performance issues for some customers on Octopus Cloud. Our engineers are investigating and we will provide an update soon.
Report: "ADO 'Use Octopus CLI Tool' step Certificate Issue"
Last update: We've had no further reports of this issue following the certificate update, and customers have indicated that they are back up and deploying. If you encounter any further issues, please email support@octopus.com
We have updated the certificate chain on our host, which has resolved the verification issues. We are continuing to monitor the outcome of this change.
We're investigating an issue with v5 and earlier versions of the 'Use Octopus CLI Tool' step on build servers, where customers are seeing the error 'unable to verify the first certificate'. This is currently affecting customer deployments because the deprecated version of the OctoCLI cannot be installed. The v6 versions of our ADO extension tasks do not require the OctoCLI, so upgrading to our v6 steps will resolve this issue. Find out more information about the migration to the new step version here: https://octopus.com/blog/azure-devops-octopus-v6
Report: "Some Octopus Cloud customers affected by missing variable substitution on Helm template sources"
Last update:

# Report and learnings: Missing variable substitution on Helm template sources

###### Author: Kevin Tchang

# Summary

The Octopus Server `2025.1.6849` update introduced a bug in the _Deploy a Helm Chart_ step, preventing variable substitution from working correctly in certain Helm template value sources. This led to failures in Helm deployments for Cloud customers that had previously succeeded without issue.

# Background

[Calamari](https://octopus.com/docs/octopus-rest-api/calamari) is a deployment tool used by Octopus to execute deployment tasks on target machines, such as extracting packages and running scripts. It supports the deployment of many built-in Octopus steps, including the _Deploy a Helm Chart_ step, which allows users to deploy Helm charts to Kubernetes clusters.

When deploying Helm charts, Octopus allows users to pass values into the Helm release via **Helm Template Value Sources** (Helm TVS). These values can come from various sources, including charts, packages, Git repositories, key-values, or inline YAML. The Helm TVS are sent to Calamari during deployment to configure the Helm chart correctly.

**Octopus variable substitution** allows dynamic replacement of placeholder variables within deployment configurations. This enables users to store sensitive data or environment-specific values as variables in Octopus and substitute them during deployment.

## Incident timeline

_(All dates and times below are shown in UTC)_

##### 23/1/2025 – 18:49 (5:49 AEDT)
We received the first reports of Helm deployments failing due to missing variable substitutions.

##### 23/1/2025 – 23:39 (10:39 AEDT)
Our support team escalated the issue to our engineering teams.

##### 24/1/2025 – 1:35 (12:35 AEDT)
Our internal incident response process was initiated.

##### 24/1/2025 – 8:58 (19:58 AEDT)
The fix for the bug was merged in `2025.1.7389`, and our Status Page was updated to _Monitoring_. However, due to the timing of a recent Octopus Server upgrade to .NET 9 and concerns about stability, the fix was not immediately rolled out to all Cloud customers. In the meantime, our support team provided assistance with manual upgrades.

##### 27/1/2025 – 22:21 (9:21 AEDT)
After the long weekend (due to a public holiday), the Status Page was updated to _Resolved_ once the fix was confirmed by customers.

## Technical details

Helm TVS are transmitted to Calamari as strings within a JSON array structure. Upon receiving these, Calamari parses the JSON array into the respective TVS types (such as package, chart, inline YAML, etc.), which are then used by the `helm upgrade` command.

The bug was inadvertently introduced during a recent change aimed at eliminating the need for escaping quotes when using Octopus variable values in Helm TVS. Prior to this change, variable values with unescaped quotes caused parsing errors because the quotes were misinterpreted when processed from the JSON array structure. The change modified the process so that variable substitution occurs after the JSON array is parsed into its respective TVS types, rather than before. However, this substitution was initially applied only to the **inline YAML TVS type**, and not to the other four TVS types, which could also contain variables to substitute. This resulted in a regression, and the fix involved applying variable substitution to all TVS types and ensuring it was handled correctly wherever a variable could appear.

## Remediation and next steps

At Octopus, ensuring deployment reliability is a top priority. Following this incident, we conducted a comprehensive review to identify areas for improvement. Given the flexibility and complexity of our product features, designing a robust testing process for all scenarios can be challenging. However, we’re taking proactive steps to improve how we test variable evaluation features.
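To make the class of fix described above concrete, here is a minimal, hypothetical C# sketch. The type and method names (`HelmTemplateValueSource`, `InlineYamlSource`, `TemplateValueSourceProcessor`, and so on) are invented for illustration and are not Octopus or Calamari code; the point is only that substitution runs after parsing and is applied to every source type, not just inline YAML.

```csharp
using System;
using System.Collections.Generic;

// Tiny demo: every parsed source type gets its variables substituted, not only inline YAML.
Func<string, string> substitute = s => s.Replace("#{Tenant}", "acme");
HelmTemplateValueSource parsed = new GitRepositorySource("https://example.com/values.git", "values/#{Tenant}.yaml");
Console.WriteLine(TemplateValueSourceProcessor.Substitute(parsed, substitute));

// Illustrative stand-ins for the five Helm template value source (TVS) types named in the report.
abstract record HelmTemplateValueSource;
record ChartSource(string ValuesPath) : HelmTemplateValueSource;
record PackageSource(string PackageId, string ValuesPath) : HelmTemplateValueSource;
record GitRepositorySource(string Uri, string ValuesPath) : HelmTemplateValueSource;
record KeyValuesSource(Dictionary<string, string> Values) : HelmTemplateValueSource;
record InlineYamlSource(string Yaml) : HelmTemplateValueSource;

static class TemplateValueSourceProcessor
{
    // Substitution runs AFTER the JSON array has been parsed into typed sources,
    // and is applied to every field that can contain an Octopus variable.
    public static HelmTemplateValueSource Substitute(
        HelmTemplateValueSource source,
        Func<string, string> substituteVariables) =>
        source switch
        {
            ChartSource c => c with { ValuesPath = substituteVariables(c.ValuesPath) },
            PackageSource p => p with
            {
                PackageId = substituteVariables(p.PackageId),
                ValuesPath = substituteVariables(p.ValuesPath)
            },
            GitRepositorySource g => g with
            {
                Uri = substituteVariables(g.Uri),
                ValuesPath = substituteVariables(g.ValuesPath)
            },
            KeyValuesSource kv => new KeyValuesSource(Substituted(kv.Values, substituteVariables)),
            InlineYamlSource y => y with { Yaml = substituteVariables(y.Yaml) },
            _ => source
        };

    static Dictionary<string, string> Substituted(
        Dictionary<string, string> values, Func<string, string> substitute)
    {
        var result = new Dictionary<string, string>();
        foreach (var (key, value) in values)
            result[key] = substitute(value);
        return result;
    }
}
```

The regression described in the report corresponds to handling only the inline YAML arm of a switch like this one.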
A fix for this issue has been released and will be rolling out to Octopus Cloud customers over the next couple of days. If you are affected by this issue, please get in touch with support@octopus.com
We have fixed the issue, but due to timing and other factors we will not be rolling this out immediately. If you are affected by this issue, please get in touch with support@octopus.com, who can manually update your instance.
We have received reports of, and identified, an issue with variable substitution in Helm template sources. We are tracking the fix here: https://github.com/OctopusDeploy/Issues/issues/9224
Report: "Some Octopus Cloud customers affected by parsing issue with ConfigMap on Kubernetes deployments"
Last update:

# Report and learnings: Configure and apply Kubernetes resources step error parsing configmap.yml

###### Author: Kevin Tchang

## Summary

In Octopus Server `2025.1.5751`, a bug caused the deployment of Kubernetes config maps containing multi-line variables, when created through the _Configure and apply Kubernetes resources_ step (the built-in Kubernetes step for deploying containers), to fail. Config maps created using the dedicated Kubernetes config map step, as well as those generated with Raw YAML or Helm steps, were unaffected. This issue impacted Cloud customers, who experienced failures in deployments that had previously been successful.

The bug was a regression caused by a change supporting manifest reporting for Kubernetes deployment steps, part of an upcoming feature. This change mistakenly caused line breaks in multi-line Octopus variable values to not be properly escaped when substituted into the config map's key-value pairs. The problem became apparent when customers had PEM certificates or JSON blobs that needed to be inserted into the config map. These were replaced verbatim in Calamari, leading to YAML formatting issues due to unescaped line breaks.

## Background

The [_Configure and apply Kubernetes resources_](https://octopus.com/docs/kubernetes/steps/kubernetes-resources) step deploys a combination of Kubernetes Deployment, Service, and Ingress resources. It also allows the optional configuration and deployment of an associated Kubernetes ConfigMap and Secret for reference by the Deployment.

To support Rolling Update and Blue/Green deployment strategies, ConfigMap and Secret resources must have unique names for each Deployment version. These resources are assigned [computed names](https://octopus.com/docs/kubernetes/steps/kubernetes-resources?q=configmap#configmap-and-secret), which, by default, combine the resource name with the Octopus deployment ID, and are determined only at deployment time.

## Incident timeline

_(All dates and times below are shown in UTC)_

##### **22/1/2025 – 7:31 (18:31 AEDT)**
We began receiving customer reports of an increase in failing Kubernetes deployments. These failures were observed across various projects, with similar errors related to parsing config maps. Our support team worked with our customers to troubleshoot the reasons for the failures.

##### **22/1/2025 – 10:58 (21:58 AEDT)**
Our support team escalated the issue to our engineering teams.

##### **22/1/2025 – 15:51 (2:51 AEDT)**
Our internal incident response process was initiated.

##### **22/1/2025 – 21:45 (8:45 AEDT)**
Our engineers logged on and began to identify the cause of the incident.

##### **23/1/2025 – 1:42 (12:42 AEDT)**
The fix for the bug was merged, and our Status Page was updated to _Identified_.

##### **23/1/2025 – 2:47 (13:47 AEDT)**
Our Status Page was updated to _Monitoring_ as we began the process to expedite the release `2025.1.7128` of the fix to our affected Cloud customers.

##### **23/1/2025 – 7:49 (18:49 AEDT)**
Status Page updated to _Resolved_.

## Technical details

Before the change to support manifest reporting, the Kubernetes container deployment step created associated Kubernetes config maps (and secrets) using the `kubectl create` command with the `--from-file` flag, where each config map key-value pair was sent to Calamari as an individual file. This process was updated to use the more standard `kubectl apply -f` method, where Octopus now sends a single YAML manifest to Calamari representing the config map.

The YAML is generated from a config map resource that we build as an in-memory C# object. The bug was introduced when the argument for the config map object used raw, unevaluated Octopus variable values. The issue wasn't identified during testing because the deployment step involves two stages of variable substitution: the first on Octopus Server, and the second inside Calamari during deployment. The two substitution passes are necessary to support the use of computed names, ensuring that each deployment version has its own unique resources.

The change didn't account for multi-line strings as potential variables, causing newline characters to not be properly escaped before serialization. This issue occurred because encoding needs to happen on Octopus Server before the object is serialized into YAML; the second substitution in Calamari is applied directly to the YAML file. The bug was a regression, and the fix involved evaluating the values before serialization to ensure newline characters were handled correctly.

## Remediation and next steps

At Octopus, we take deployment reliability very seriously. After this incident, we conducted a thorough review to identify areas where we can improve our processes, in light of the lessons learned. We've identified a complex and unconventional area of the code (specifically script-based Kubernetes deployments) that requires further attention. Given the distinctive challenges these deployments present, we are committed to enhancing this area with additional tests to ensure better reliability.
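As a rough sketch of the ordering issue described above (this is not the actual Octopus Server or Calamari code; the `evaluateVariables` delegate, the sample names, and the use of YamlDotNet as the serializer are all assumptions for illustration): evaluating multi-line values before building and serializing the in-memory object lets the YAML serializer escape or block-format the newlines, whereas pasting raw multi-line text into an already-serialized manifest breaks the YAML.

```csharp
using System;
using System.Collections.Generic;
using YamlDotNet.Serialization; // assumed YAML library; any serializer that escapes strings works

// Demo: a multi-line value (e.g. a PEM certificate) stays valid because it is evaluated
// BEFORE serialization, so the serializer handles the newlines for us.
var variables = new Dictionary<string, string>
{
    ["#{TlsCert}"] = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----",
};

string Evaluate(string raw)
{
    foreach (var (name, value) in variables)
        raw = raw.Replace(name, value);
    return raw;
}

var manifest = ConfigMapBuilder.BuildManifest(
    name: "app-config-deployments-123", // computed name, only known at deployment time
    rawData: new Dictionary<string, string> { ["tls.crt"] = "#{TlsCert}" },
    evaluateVariables: Evaluate);

Console.WriteLine(manifest);

static class ConfigMapBuilder
{
    // Build the config map as an in-memory object from *evaluated* values, then serialize.
    public static string BuildManifest(
        string name,
        Dictionary<string, string> rawData,
        Func<string, string> evaluateVariables)
    {
        var data = new Dictionary<string, string>();
        foreach (var (key, rawValue) in rawData)
            data[key] = evaluateVariables(rawValue); // evaluate BEFORE serialization

        var configMap = new Dictionary<string, object>
        {
            ["apiVersion"] = "v1",
            ["kind"] = "ConfigMap",
            ["metadata"] = new Dictionary<string, object> { ["name"] = name },
            ["data"] = data,
        };

        return new SerializerBuilder().Build().Serialize(configMap);
    }
}
```

Substituting multi-line values into the YAML string after serialization, rather than into the object before it, is what left the newlines unescaped in the incident above.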
We are currently rolling out the fix to Cloud customers. Instances will be upgraded in their next maintenance window. If you are affected by this issue and want to expedite the upgrade, please contact support@octopus.com
A fix has been implemented and will be rolled out to affected cloud customers in their next maintenance window.
We have identified an issue with a version of Octopus Server recently released to cloud customers. A recent change has caused config maps with multi-line variables, created via the Kubernetes containers deployment step, to fail. Config maps created by the dedicated Kubernetes config map step are not affected, nor are config maps created by Raw YAML or Helm steps. We are in the process of fixing this and will update the public issue with the fixed version: https://github.com/OctopusDeploy/Issues/issues/9221.
We are currently investigating the issue
Report: "Intermittent service disruption for a small number of customers"
Last update: Two separate recent changes were identified as causes of the excess memory use. These changes have been rolled back, and we have confirmed through monitoring that memory use has returned to normal on the affected instances. No further crashes or terminations have been observed.
Internal monitoring notified us that a small number of Octopus Cloud instances were terminated for excess memory use up to 12 hours after a recent software upgrade. These instances were automatically restarted and resumed functioning correctly. At this stage we are investigating to determine the cause of the excess memory use. If you believe you are affected by this issue, please contact support@octopus.com
Report: "Config as Code projects might be affected by Github incident"
Last updateThe Github incident has been resolved, we have confirmed that Config-as-Code project loading is working correctly
For customers using Github to store Config As Code projects, an ongoing incident (https://www.githubstatus.com/incidents/qd96yfgvmcf9) with Github is affecting the loading of those projects. There is no impact to projects that are using database storage.
Report: "Octopus Cloud service may be unavailable for some customers"
Last update: All affected Instances are now up and healthy.
Root cause has been identified and mitigated. All Instances are now running. Further updates will be provided shortly.
14 Instances are down at this time. We've identified the issue and are working to restore services as a matter of priority.
We are currently investigating the issue.
Report: "Package Acquisition step sometimes fails with exception"
Last update: The fix has been released to Cloud - if you are still encountering issues, please ask the support team to upgrade your instance to 2025.1.3365
The fix is now rolling out to Cloud. If this issue is affecting you, please reach out to the support team to ask them to upgrade your instance to the latest version.
We've identified the underlying code change that caused this issue to start appearing. We have a fix for it that will be rolling out to Cloud customers as quickly as possible. Self-hosted customers will not be affected.
We are continuing to actively work on resolving this issue. Our first potential mitigation didn't work as expected. We have a second potential mitigation that is performing better in testing. We will update here if/when we ship it.
We are continuing to actively work on this issue. We have identified the issue and are testing a potential mitigation/fix. We are also continuing to investigate the underlying causes of the issue to build confidence our mitigation will work as expected.
Some Octopus Cloud customers are having intermittent issues with package acquisition steps throwing exceptions. We are currently investigating.
Report: "Octopus Cloud Trial Signup - Service Disruption"
Last update: This issue has now been resolved and all systems are working as expected
The cause of the issue has been identified and engineers are working on a fix
The cause has been identified and engineers are working on a fix
We are currently experiencing an issue with Octopus Cloud Trial signups. New Trial instances are not being provisioned as expected. Our team is actively investigating the root cause and working to resolve the issue as quickly as possible. We will provide updates here as soon as more information becomes available. Users attempting to sign up for new Octopus Cloud Trials are unable to provision new trial instances at this time.
Report: "Octopus ID and a subset of Octopus.com unavailable"
Last update: Between 12:15 PM and 12:25 PM AEST, a brief outage affected Octopus ID and a subset of Octopus.com services, including the Profile, Control Centre v1, Blogs, and License Purchasing. During this time, customers were unable to log in to cloud instances using Octopus ID and access certain areas of the site. The issue was caused by an unintended outage during a system upgrade. The issue was identified and resolved promptly, restoring all services within 10 minutes. All impacted systems are fully operational, and no further issues are expected.
Report: "Unable to sign into Octopus ID"
Last update: This incident has been resolved.
A recent configuration change to the Octopus ID Web Application Firewall (WAF) caused some customers to be unable to sign into their Octopus Cloud instance using Octopus ID. The change has been rolled back and customers should be able to sign into Octopus ID and their Octopus Cloud instances again. We are monitoring, but if you are having issues, please get in touch at support@octopus.com
Report: "Octopus.com/signin Unavaliable"
Last update: octopus.com/signin has been up for the last hour; we will continue to monitor it throughout the day. Customers that notified us have confirmed they can now log on to their Octopus Cloud instances.
octopus.com/signin should be available again and we are monitoring at the moment.
We're investigating an issue with Octopus.com. This also affects all customers attempting to sign in to their Octopus Cloud instances.
Report: "Octopus ID and a subset of Octopus.com URLS unavailable"
Last update: Services have been operational and stable since recovery 6 days ago. The incident was caused by an abnormally high frequency of requests, resulting in degraded service. We have relieved the bottleneck and are investigating why the compensating controls we had in place didn't compensate as expected.
Services are currently operational. Our team is continuing to investigate the root cause.
Users are reporting 502 and 504 timeouts when attempting to log into their Octopus.com accounts, limiting access to Octopus Cloud and Control Center. A small subset of Octopus.com URLs are also impacted: Octopus.com/blog and Octopus.com/start.
Report: "Connectivity issues with Octopus.com"
Last update: Azure has resolved the underlying issue causing these connectivity problems.
Azure is continuing the global rollout of their mitigation measures. We continue to monitor this situation, and await the completion of this rollout by Azure engineers.
Azure has reported improved service availability. We will continue to monitor this situation.
Due to an upstream issue with Azure, some users are reporting connectivity issues to Octopus.com, billing.octopus.com, and potentially other Octopus-hosted services.
Report: "Windows2022 Dynamic Workers crashing and failing to lease"
Last update: This incident has been resolved.
Incident has been mitigated and Windows Dynamic Workers are operating normally.
Our upstream provider, CrowdStrike, is currently experiencing an issue which is affecting our ability to lease Windows2022 Dynamic Workers. We will resolve this incident once we have received confirmation from our upstream provider that the issue has been resolved.
Report: "Octopus Cloud - intermittent deployment failures when using Git Credentials - MultipleActiveResultSets error"
Last update: Octopus Cloud customers have reported intermittent failures when deploying using Git Credentials. Deployments fail, showing an error message similar to "There is already an open DataReader associated with this connection which must be closed first. The connection does not support MultipleActiveResultSets". We have identified this problem was introduced by a recent code change, affecting Octopus Deploy versions 2024.3.2940 or higher. All known instances of this issue have been resolved.
Customers have reported intermittent failures when deploying using Git Credentials. Deployments fail, showing an error message similar to "There is already an open DataReader associated with this connection which must be closed first. The connection does not support MultipleActiveResultSets". We have identified this problem was introduced by a recent code change, affecting Octopus Deploy versions 2024.3.2940 or higher. Our engineering team is investigating further to determine a fix. Potential Workaround: Retry the deployment. The problem is intermittent in nature.
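For readers unfamiliar with this error message, the standalone C# sketch below (not Octopus code; the connection string and table names are placeholders) reproduces the general failure mode: issuing a second command on a SqlConnection while a DataReader is still open throws unless MultipleActiveResultSets (MARS) is enabled on the connection string.

```csharp
using System;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient on older stacks

// Placeholder connection string; note that MARS is NOT enabled here.
const string connectionString =
    "Server=localhost;Database=Example;Integrated Security=true;TrustServerCertificate=true";

using var connection = new SqlConnection(connectionString);
connection.Open();

using var firstCommand = new SqlCommand("SELECT Id FROM SomeTable", connection);
using var reader = firstCommand.ExecuteReader();

while (reader.Read())
{
    // Running a second query on the SAME connection while 'reader' is still open throws:
    // "There is already an open DataReader associated with this connection which must be closed first."
    // Adding "MultipleActiveResultSets=True" to the connection string, or closing the first
    // reader before issuing the second query, avoids the error.
    using var secondCommand = new SqlCommand("SELECT COUNT(*) FROM OtherTable", connection);
    var count = (int)secondCommand.ExecuteScalar();
    Console.WriteLine($"{reader.GetInt32(0)}: {count}");
}
```

Enabling MARS, or ensuring the first reader is closed before the connection is reused, are the two standard ways out of this class of error.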
Report: "Dynamic Worker Leases failing in East AU"
Last update: This incident has been resolved.
We have resumed provisioning Dynamic Workers in East AU (from Australia Southeast). We are continuing to monitor for any degradation of service.
Our service provider, Microsoft Azure, has confirmed that the underlying issue has been resolved. We are continuing to monitor for any degradation of service.
We are continuing to see periods of service degradation in the Australia East region. New Dynamic Workers will be provisioned in the Australia Southeast region until we have confirmation from our service provider that the issue has been resolved.
Dynamic Workers are now provisioning successfully in East AU. We are continuing to monitor for any degradation of service.
Our upstream provider, Microsoft Azure, is currently experiencing an issue which is affecting our ability to provision worker virtual machines for customers in East AU. We are working with Azure to resolve the issue. If you experience this problem, deployments utilizing dynamic workers will fail with an error specifying that Octopus Deploy could not obtain a dynamic worker lease. We will resolve this incident once we have received confirmation from our upstream provider that the issue has been resolved.
Octopus Cloud is experiencing issues leasing Dynamic Workers in Australia East. The issue is currently under investigation and this incident will be updated when further details are known.
Report: "Octopus.com/signin unavailable"
Last update: octopus.com/signin has been up and available for the last hour; we'll continue to monitor it, but all is operational again.
octopus.com/signin should be available again and we are monitoring at the moment.
We're investigating an issue with Octopus.com. This also affects all customers attempting to sign in to their Octopus Cloud instances.
Report: "Dynamic Worker Leases failing in WestEU"
Last update: This incident has been resolved.
Dynamic Workers are now provisioning successfully. We are continuing to monitor for any degradation of service.
We continue to work with Azure to investigate the issue. We are also investigating a potential workaround. Another update will be provided within the next hour.
Our upstream provider, Microsoft Azure, is currently experiencing an issue which is affecting our ability to provision worker virtual machines for a subset of customers in WestEU. We are working with Azure to resolve the issue. If you experience this problem, deployments utilising dynamic workers will fail with an error specifying that Octopus Deploy could not obtain a dynamic worker lease. We will provide an update in an hour, at approximately 20:00 UTC. We will resolve this incident once we have received confirmation from our upstream provider that the issue has been resolved.
Report: "New signups to Octopus Cloud are unavailable"
Last update: Signups are available again. We are sorry for any inconvenience.
We have identified an issue preventing new signups from completing. Engineers are currently investigating.
Report: "AWS deployments failing when using service role for EC2 instance"
Last update: This incident has been resolved.
We have implemented a fix in v2024.1.6809 which will be rolling out to cloud customers over the next two days. If you need the fix in your cloud instance sooner, please contact https://octopus.com/support. We will continue to monitor for any further issues.
We have found a likely cause of this issue and are working on a fix. In the meantime, please refer to the Github issue for more updates - https://github.com/OctopusDeploy/Issues/issues/8551
We're investigating AWS deployment failures for customers who have upgraded to 2024.1.6245 and are using the "AWS service role for an EC2 instance" option, which throws the error "An account identifier was not found". For more details of the issue and a possible workaround, see the GitHub issue https://github.com/OctopusDeploy/Issues/issues/8551
Report: "API requests for projects associated with a lifecycle not responding"
Last update: This issue was resolved in 2023.4.8185 and 2024.1.5399.
A fix for this issue has been released in version 2023.4.8185 which is now available for download, and 2024.1.5399 which will be rolling out to cloud customers early in the new year. If you need the fix in your cloud instance sooner, please contact https://octopus.com/support. We will continue to monitor for any further issues.
We are currently investigating an issue where API requests for the projects associated with a lifecycle may hang indefinitely, never responding. We believe we have identified the cause and are working on rolling out a version containing a fix. For more details see this GitHub issue: https://github.com/OctopusDeploy/Issues/issues/8533
Report: "Self-hosted customers experiencing high CPU usage after upgrading to 2023.4.8xxx"
Last update: This issue was resolved in 2023.4.8166.
With the release of 2023.4.8166, we believe we have addressed the memory and CPU usage spikes associated with the variables endpoints. We continue to monitor to ensure these endpoints operate smoothly. For more details see this GitHub issue: https://github.com/OctopusDeploy/Issues/issues/8529
We have implemented a fix in v2023.4.8166. Should you encounter any issues after upgrading, please do not hesitate to reach out to our support team.
We are actively continuing our investigation into the reported issue. Thank you for your ongoing patience as we work towards a resolution.
We are currently investigating this reported issue as a high priority.
Report: "Dynamic Worker Leases failing in WestEU"
Last update: Azure has advised that this issue has been resolved. A preliminary root cause has been published here: https://azure.status.microsoft/en-us/status/history/
Dynamic Workers are now provisioning successfully. We are continuing to monitor for any degradation of service.
We continue to see issues with our upstream provider. We are investigating a workaround. Another update will be provided within the next hour.
Our upstream provider, Microsoft Azure, is currently experiencing an issue which is affecting our ability to provision worker virtual machines for a subset of customers in WestEU. See https://azure.status.microsoft/en-gb/status for more details. If you experience this problem, deployments utilising dynamic workers will fail with an error specifying that Octopus Deploy could not obtain a dynamic worker lease. We will provide an update in an hour, at approximately 07:30 UTC. We will resolve this incident once we have received confirmation from our upstream provider that the issue has been resolved.
Report: "Cloud instances may be missing task log lines or show incorrect status for deployment steps"
Last update: This issue has been resolved and we have not seen any further errors.
A fix has been implemented on all Octopus Cloud instances and we are monitoring for any further errors.
We are currently rolling out a fix for this issue. We will provide an update once all Octopus Cloud instances have the fix applied.
We have identified the cause of the issue and are testing mitigations. We are working on this issue as a priority.
We have identified a workaround. The issue occurs when a deployment's task log is viewed while the deployment is running. If users navigate away from viewing a running deployment, all task log lines should be successfully saved. This will also avoid the issue where deployment steps may show the wrong status. We are working with our upstream provider to resolve this issue.
We have identified an issue where some Octopus Cloud instances have been missing log lines from their task logs since 20 November 2023. Deployment steps that encounter this problem could also show the wrong status (e.g. showing as still running) even though the step has completed successfully. The deployment task itself will have the correct status. We are investigating this issue as a priority.
Report: "Cloud instances unable to communicate with listening and polling tentacles"
Last update: All Octopus Cloud customers have been upgraded to fixed software versions, resolving this incident. Self-hosted customers were not affected.
We have identified the customers that were most likely affected and prioritized applying the fix to these customers on the 9th and 10th of December. The rollout continues to all remaining customers, and we anticipate it will be complete by the 12th of December.
A fix has been created and is being rolled out to affected customers. This issue affects instances of Tentacle that use SHA-1 (an old, unsupported hashing algorithm), and prevents them from connecting to Octopus Cloud. This was a side effect of the Octopus Server .NET 8 upgrade and the resulting change to the underlying OS from Debian 11/OpenSSL 1.x to Debian 12/OpenSSL 3.x. If you continue to be affected by this issue, please try the workaround detailed at https://github.com/OctopusDeploy/Issues/issues/8523, or contact support at https://octopus.com/support.
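If you want to check whether a certificate is SHA-1 signed before trying the workaround, a small generic .NET check along these lines works (the file path is a placeholder and this is not an Octopus tool; it simply inspects the certificate's signature algorithm):

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Placeholder path: point this at an exported copy of the certificate you want to inspect.
var certificate = new X509Certificate2(@"C:\temp\tentacle.cer");

Console.WriteLine($"Subject:             {certificate.Subject}");
Console.WriteLine($"Signature algorithm: {certificate.SignatureAlgorithm.FriendlyName}");

// The incident above traced the failures to the move to Debian 12/OpenSSL 3.x,
// which no longer accepts these SHA-1 signatures.
if (certificate.SignatureAlgorithm.FriendlyName?.Contains("sha1", StringComparison.OrdinalIgnoreCase) == true)
{
    Console.WriteLine("This certificate is SHA-1 signed and is likely affected; see the workaround in the linked issue.");
}
```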
The underlying cause has been identified. The issue is currently isolated to cloud customers. A fix is actively being worked on.
We have identified a suspected root cause and workaround. We suspect this is limited to certificates using a SHA1 algorithm but are investigating further. If affected please try the workaround detailed in this issue: https://github.com/OctopusDeploy/Issues/issues/8523
Since the morning of Friday 8th December, cloud customers have been reporting issues communicating with listening and polling tentacles. We are investigating this as a priority incident and will work on a resolution as soon as possible.
Report: "Manual intervention steps on Configuration-as-code projects may throw errors"
Last update: This incident has been resolved. A public incident report will be made available to affected customers. If you would like a copy, please get in touch with us at support@octopus.com.
We are now in the process of rolling out a version containing a fix to resolve the issue. This will be available to affected customers and we will be monitoring for any further issues.
We have identified the root cause of this issue and are working on a hotfix build.
We are investigating reports of Octopus Cloud users on 2024.1.746 having issues in the following scenario:
* Configuration-as-code projects with a Manual Intervention step, with the "Responsible team" set to a System-level team.

More details, including a possible workaround, can be found in the following GitHub issue: https://github.com/OctopusDeploy/Issues/issues/8473
Report: "library.octopus.com is inaccessible"
Last update: library.octopus.com has been restored and Community Step Template Update sync is working correctly again.
We are currently investigating this issue. Due to the nature of how Community Library Templates work, this outage will not block deployments. Customers will still be able to deploy and utilise steps already cached on their instance via the regular Community Library Template Sync process, if they have this feature enabled. Customers who have the feature enabled may see the Sync task failing, but their cached templates will not be affected.
Report: "Investigating bug reports related to step execution logic"
Last update:

# Incident Report - Deployments run more than once in High Availability (HA) clusters

# Summary

Octopus Server 2023.3 contained a bug causing self-hosted Octopus Server High Availability (HA) clusters to run some deployments more than once, often concurrently. This resulted in incorrect task statuses and confusing task logs.

The bug was caused by a feature-flagged change to our internal TaskQueue. That change removed a database write lock that stops multiple Octopus Server nodes in a High Availability (HA) configuration from picking up the same task. The write lock removal was accidentally left un-flagged. Without the write lock in place, multiple Octopus Server nodes could execute the same task concurrently. This primarily presented as incorrect or out-of-order logging of tasks in a deployment. The issue could affect any self-hosted customers running HA mode on 2023.3 releases below 2023.3.13026.

Once we received a report that the issue was impacting the correctness of task execution and not only its display, we escalated immediately and resolved the issue as quickly as possible. We know how critical deployments are for our customers, and we take the trust they have in us to execute those deployments correctly very seriously. We apologise to our customers for not meeting our own standards of correct deployment execution.

## Timings

* Time to detection: 14 days (from GA of 2023.3 to first report)
* Time to incident declaration:
  * 3 days (from initial report of log ordering issue)
  * 27 minutes (from first report of incorrect execution)
* **Time to resolution: 27 hours 25 minutes**

# What happened?

From Monday 18 September 2023, we received customer reports that task statuses, outputs and ordering were displaying incorrectly. Our Support team worked with our customers to troubleshoot common reasons for incorrect task display, and escalated to our Engineering team when they couldn't resolve the issue. On Thursday 21 September 2023, a customer reported that tasks were executing out of order. On Friday 22 September 2023, we identified that a change to our task queue, which caused the same task to execute on multiple Octopus Server nodes in HA mode, had been released in 2023.3. We fixed the issue immediately and contacted affected customers.

We received reports from four affected customers, and identified a total of 24 customers who were using the impacted versions. We have contacted all 24 customers.

(All times in AEST)

# Technical details of the problem

Octopus Server can either be run as a managed instance in Octopus Cloud, or hosted by our customers on their platform of choice. Octopus Cloud gets changes continuously, and for self-hosted customers, Octopus Server has major releases [four times a year](https://octopus.com/blog/long-term-support#introducing-the-octopus-server-lts-program), with each release rolling up all the changes from the last three months. Some complex or early access features will only target the next major version and not be backported to previous supported LTS versions.

[Octopus Server High Availability](https://octopus.com/docs/administration/high-availability) (HA) mode is only used by self-hosted customers. In HA, multiple nodes of Octopus Server run concurrently and distribute tasks between them. Octopus Server uses the task queue persisted in the shared database to manage task execution across nodes.

Octopus Deploy has been working on a fix for an issue where deployments would "hang", getting stuck in a `Cancelling` state and not progressing. Under the hood, deployments and other work are represented as a `ServerTask`, and they are added to a `TaskQueue`. The first iteration of a fix changed how the database handled conflicting updates to the `ServerTask` entity, and required flow-on changes to the `TaskQueue`. It was added to the 2023.3 release behind a feature flag which defaulted to off. One of the changes was removing a write lock that Octopus Server nodes used to indicate they were executing a specific `ServerTask` on the queue. The write lock removal should have been behind the feature flag, but was mistakenly shipped as a universal change.

The Pull Request containing the problem was merged in June 2023 and has since been running in CI environments and on the Cloud platform. The issue didn't show up in those environments because they don't use HA mode, and only HA mode has multiple Octopus Server nodes contending to execute tasks. When 2023.3 was released in September, the problem started appearing, and only for self-hosted customers.

The fix was to put the write lock back in place on the `TaskQueue`. Replacing the lock was a small change that was quick to test and ship. The work to reduce hung deployments isn't used in Production environments yet, so there was no concern about interactions between the fix and the feature flag.

# Remediation and next steps

We have removed all affected releases from public availability. The fixed version of 2023.3 is available on our [downloads page](https://octopus.com/downloads). We have also reached out to all potentially affected self-hosted customers.

Our next step will be running an incident review to understand where our processes allowed us to ship a critical bug. We have identified that we need to improve our automated testing of HA and our process around how we manage changes to those tests, and will be addressing these as a priority.
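A minimal sketch of the kind of database-backed "write lock" the report describes (the table and column names here are hypothetical, and this is not the actual TaskQueue implementation): each HA node claims a queued task with a conditional UPDATE, and only the node whose update affects a row goes on to run the task.

```csharp
using Microsoft.Data.SqlClient;

static class TaskQueue
{
    // Atomically claim a queued task for this node. If another node has already claimed it,
    // the WHERE clause matches no rows and this node skips the task, so the same
    // ServerTask cannot run on two HA nodes at once.
    public static bool TryClaimTask(SqlConnection connection, string taskId, string nodeName)
    {
        const string claimSql = @"
            UPDATE ServerTask
               SET State = 'Executing',
                   ExecutingNode = @nodeName
             WHERE Id = @taskId
               AND State = 'Queued'
               AND ExecutingNode IS NULL";

        using var command = new SqlCommand(claimSql, connection);
        command.Parameters.AddWithValue("@taskId", taskId);
        command.Parameters.AddWithValue("@nodeName", nodeName);

        return command.ExecuteNonQuery() == 1; // exactly one node wins the claim
    }
}
```

In the incident above, shipping the removal of this kind of guard outside its feature flag is what allowed more than one node to pick up the same task.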
We have published a fix: https://octopus.com/downloads/2023.3.13026. A public incident report will be shared with affected customers. If you would like a copy, please get in touch with us at support@octopus.com.
We've identified a very likely root cause. We made some changes to our task queues that should have been behind a feature flag, but a change to remove a write lock on the task queue table was accidentally left un-flagged. This means that multiple nodes could pick up the same task accidentally. This confirms that the incident will only affect self-hosted customers using High-Availability mode. We're working on a fix now and should have it available later on today. There are two potential workarounds, although we know that they are not good ones. Moving to a single node instead of HA will be safe as it removes task queue contention. You could also drain all of the nodes, then turn one of them on at a time. Allow each node to pick up some tasks, then drain it, and turn on the next. This approach would be extremely manual and we don't recommend it.
We continue to investigate isolated reports from a limited number of self-hosted customers on Octopus Server 2023.3 of this bug: https://github.com/OctopusDeploy/Issues/issues/8356. Out of an abundance of caution, we have temporarily removed the 2023.3 release from our downloads page while we continue to investigate.
We are currently investigating reports from a limited number of self-hosted customers on Octopus Server 2023.3 of this bug: https://github.com/OctopusDeploy/Issues/issues/8356. We will update that bug as our investigation progresses.
Report: "Octopus cloud AU region outage"
Last update:

# Octopus Cloud Australia East outage - report and learnings

Between 10:48am UTC (8:48pm AEST) and 6:55pm UTC on Wednesday, August 30, 2023 (4:55am AEST on Thursday, August 31, 2023), Octopus Cloud customers in the Australia East region experienced an outage of their Cloud instance. Additionally, between 12:17pm UTC (10:17pm AEST) and 3:19pm UTC (1:19am AEST + 1d), the remaining customers in the Australia East region whose instances were up would have been unable to perform deployments that used Dynamic Workers. This disruption was caused by a cooling issue in one of Microsoft Azure's Australia East datacenters.

## **Key timings**

| **Event** | **Time period** |
| --- | --- |
| Time to detection | 31 mins |
| Time to incident declaration | 40 mins |
| Time to resolution | 8 hrs 7 mins |

## **Incident timeline**

_(All dates and times below are shown in UTC)_

### Wednesday, August 30, 2023

_**10:48 (20:48 AEST)**_ 50% of Cloud Instances in Australia East went down.

_**11:19 (21:19 AEST)**_ A support engineer acknowledged an automated alert and began investigating.

_**11:31 (21:31 AEST)**_ Our internal incident response process was initiated.

_**11:55 (21:55 AEST)**_ Status Page updated: An incident was declared.

_**12:17 (22:17 AEST)**_ Dynamic Workers in Australia East went down.

_**15:06 (01:06 AEST + 1d)**_ Status Page updated: We are still monitoring.

_**15:19 (01:19 AEST + 1d)**_ Dynamic Workers in Australia East came online.

_**17:54 (03:54 AEST + 1d)**_ Service was restored to 97% of Cloud Instances in Australia East.

_**18:11 (04:11 AEST + 1d)**_ On-call engineer commenced remediation efforts on the remaining instances that were not online.

_**18:55 (04:55 AEST + 1d)**_ All Cloud Instances up.

_**19:00 (05:00 AEST + 1d)**_ Status Page updated: All cloud instances are back online.

_**21:01 (07:01 AEST + 1d)**_ Status Page updated: Incident resolved.

## **Technical details**

As designed, our services automatically came back online as Microsoft's Azure services were restored. A handful of Cloud Instances required manual intervention; this was expected, as these instances were undergoing scheduled maintenance until they were interrupted by the outage.

### Microsoft Azure's technical details

> Starting at approximately 08:30 UTC on 30 August 2023, a utility power surge in the Australia East region tripped a subset of the cooling units offline in one datacenter, within one of the Availability Zones. While working to restore cooling, temperatures in the datacenter increased so we proactively powered down a small subset of selected compute and storage scale units, to avoid damage to hardware.

Source: [https://azure.status.microsoft/en-us/status/history/](https://azure.status.microsoft/en-us/status/history/) (Incident Tracking ID: VVTQ-J98), retrieved on Thursday, August 31, 2023.

## **Remediation**

Octopus takes service availability seriously. Despite the difficulty with upstream cloud provider outages, we fully review and remediate any outages that occur. We do this so that we're continuously improving and maintaining the best possible service we can. We are aiming to reduce the time between a Cloud Instance going down and a human being notified, and to reduce the time to publish a Status Page notification to better inform our customers.

## **Conclusion**

We deeply value the trust you place in our services, and we understand the importance of maintaining that trust. The recent service disruption was a significant event for us, and it has highlighted areas where we can enhance our processes. We are taking active steps to improve our notification and response mechanisms, ensuring that you are informed promptly and accurately. We appreciate your patience and are committed to delivering the consistent and reliable service you expect from us.
This incident has been resolved.
Our upstream provider has mitigated this incident. All cloud instances are back online. Our team will continue monitoring the situation for any issues from the upstream outage.
Our upstream provider has not yet provided an ETA for resolution for the AU region outage affecting a number of Octopus Cloud customers. We are still monitoring the situation and will continue to provide periodic updates.
We are aware of an outage affecting our Australian-hosted Octopus Cloud customers. Unfortunately, this outage is with our provider in this region. We will continue to monitor the situation and update the status page as more information becomes available.
Report: "Unable to view pages after upgrade"
Last update: Upgrading to 2023.3.10333 resolves the issue. Cloud instances will be upgraded over the coming days. If you wish to be upgraded sooner, please contact our support team.
We are currently investigating. If you are affected, clearing the browser cache and reloading the page should resolve the issue. More details can be found here: https://github.com/OctopusDeploy/Issues/issues/8277
We are currently investigating. If you are affected, clearing the browser cache should resolve the issue.
Report: "Octopus Cloud Connectivity Issue"
Last update: This incident has been resolved.
We're investigating a network connectivity issue with Octopus Cloud. This may cause issues:
- when accessing Octopus Cloud instances, including API requests
- for polling tentacles over standard ports (443)
Report: "Error viewing Projects after upgrading to 2023.2.X"
Last update: Incident resolved. The fix is available in the GA release 2023.2.12331.
A fix has been implemented and is being rolled out to customers.
We have identified an issue with viewing projects, with an error "Object reference not set to an instance of an object". An issue has been raised and a workaround is available. More details can be found here: https://github.com/OctopusDeploy/Issues/issues/8200. Please use the workaround until a patch is released. Contact support@octopus.com with any questions and we'll be happy to help.
Report: "SQL Timeouts and Errors Impacting Cloud Customers in the West US Region"
Last update: This incident has been resolved.
This issue looks to have stemmed from an upstream issue with Azure. All impacted instances have recovered, and we will continue to monitor this situation.
We are currently investigating some instances experiencing SQL timeouts. This looks constrained to the US West region.
Report: "AWS authentication issue for customers using the "Run an AWS CLI Script" step"
Last update: We are marking this issue as resolved. A fix has rolled out to customers.
A fix is being deployed to our Cloud instances during their maintenance windows. If you require the fix immediately, please contact support@octopus.com to arrange this.
We have identified an issue with AWS credentials not being passed while using the AWS CLI Script step. An issue has been raised and a workaround is available; more details can be found here: https://github.com/OctopusDeploy/Issues/issues/8177. Please use the workaround until a patch is released. Contact support@octopus.com with any questions and we'll be happy to help.
We are currently investigating an AWS authentication issue affecting Cloud customers using the "Run an AWS CLI Script" step
Report: "Incorrect Resource Usage values displayed in Control Center for Octopus Cloud instances"
Last update: This incident has been resolved.
We have identified and are in the process of fixing an issue where the Resource Usage values displayed for Octopus Cloud instances in Control Center are incorrect.
Report: "Issue preventing login with Microsoft/AzureAD on Octopus.com and Octopus Cloud"
Last update: The fix previously applied caused a mismatch to occur with AzureAD credentials for some users. Steps to resolve:
* Log in with username/password via the "forgot password" function
* Go to the Profile page: https://octopus.com/profile
* "Remove" the "Organization account", then re-add it.

Please reach out to support@octopus.com for further assistance if needed.
Following the resolution of the previous AzureAD sign-in issue, we have had reports of some users receiving the following error when attempting to sign in: "Server Error - We're sorry, an unexpected error occurred whilst processing this request. Please try again later or contact support". Our engineers are investigating, and we will provide updates. Workaround: If you use this mechanism to log in, you can fall back to a username (your email) and a new password. You can follow the "forgot password" mechanism to set up a new password.
Report: "Issue identified with Octopus.com and Octopus Cloud Microsoft/AzureAD sign in."
Last update: We have re-enabled Azure AD login and verified the system is operating as expected. If you used Azure AD to sign into Octopus ID, you may have been logged out as part of this resolution. You should be able to sign in again smoothly, but if you are having issues or have questions, please reach out to our support team.
As promised, we have re-enabled the AzureAD login and are close to marking this as resolved. Do not hesitate to contact us via support if you need help / have questions/hit issues. We'll monitor for a touch longer to ensure there aren't issues as we're all heading into the weekend. Thank you all for your patience!
We're ready to move this into "identified" as we had some extended investigation to complete. If all goes well, we can hopefully mark this resolved approx 24 hours from now. If not, we'll keep our wonderful customers updated. Once again, we thank you for your patience and for using the username and password workaround if impacted by our work on the AzureAD/Microsoft auth flow.
Thanks for your patience. We're moving forward with changes, but they are taking time as it involves 2+ internal teams coordinating changes. Please continue using the workaround, and rest assured we will restore the Microsoft/Azure login mechanism as soon as possible.
Octopus Deploy is aware of an issue with logging into Octopus.com and Octopus Cloud instances for customers who use Microsoft accounts and AzureAD sign-in. Our engineers are investigating and we will provide updates. Workaround: If you use this mechanism to log in, you can fall back to a username (your email) and a new password. You can follow the "forgot password" mechanism to set up a new password.
Report: "Intermittent errors in West Europe"
Last update:

# Dynamic Worker Outage in West Europe - report and learnings

From 3:03am UTC our Octopus Cloud infrastructure in West Europe was unable to provision new Dynamic Workers. Customers were impacted between 5:15am and 6:51am UTC on Thursday, March 23, 2023. Twenty-three Octopus Cloud customers in West Europe were affected during this time period and could not lease Dynamic Workers to run deployments and runbooks. _We're sorry, and we're taking steps to minimize the occurrence and impacts of similar events in the future._

## Key timings

## Background

Octopus Cloud uses [Dynamic Workers](https://octopus.com/docs/infrastructure/workers/dynamic-worker-pools) to execute workloads. During this incident, Dynamic Workers were unavailable for 23 customers, who were therefore unable to execute any of their Deployments and Runbooks that relied on Dynamic Workers.

## Incident timeline

_(All dates and times below are shown in UTC)_

### Thursday, March 23, 2023

02:41 One of our upstream dependencies, Azure Resource Manager (ARM), started returning 503 responses ([Incident Tracking ID: RNQ2-NC8](https://azure.status.microsoft/en-us/status/history/))

**03:03 The first Dynamic Worker provisioning failure occurred. At this time, our pre-provisioned pool of Dynamic Workers continued to operate and serve all customer workloads**

04:01 Internal monitoring alerted us about anomalous provisioning failures

04:13 We initiated our incident response process

04:14 We confirmed a sharp rise in 503 responses from ARM

04:17 We disabled automated internal infrastructure functions to limit the number of customers impacted by this issue

**04:31 Alerted customers to the incident via [status.octopus.com](http://status.octopus.com)**

04:38 We created a ticket with Azure (Sev A)

**05:15 Our pooled resources were exhausted, leading to the first customer impact**

05:39 As a potential mitigation, we decided to start provisioning additional infrastructure in an alternate region within Europe

06:04 Azure confirmed the outage

06:51 We observed that Dynamic Workers were beginning to recover

**06:59 Alerted customers that the incident was mitigated via [status.octopus.com](http://status.octopus.com)**

07:10 Azure incident resolved

07:10 We confirmed alternate infrastructure was available for failover if the issue recurred

## Technical details

Dynamic Workers make heavy use of ARM to provision Workers for customer workloads. An outage with ARM meant that we could not provision new Workers in the West Europe region. We maintain a pre-provisioned pool of Workers, but they were depleted after around two and a half hours.

## Remediation and next steps

We have identified improvements to our alerting to reduce the time it takes for us to detect similar incidents. We're prioritizing these improvements using our Risk Treatment Policy. Currently, we rely heavily on single-region availability in Azure. We are evaluating our options to diversify the regions that we use, to mitigate regional availability issues.
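As a rough illustration of the pooling behaviour described in the report (the interface and type names here are invented for the sketch and are not the Dynamic Worker codebase): leases are served from a pre-provisioned pool first, and on-demand provisioning through the cloud provider mainly tops the pool back up, which is why customers were unaffected until the pool drained roughly two and a half hours into the ARM outage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Invented types for illustration only.
record Worker(Guid Id);

interface IWorkerProvisioner
{
    Task<Worker> ProvisionAsync(); // calls the cloud provider (e.g. ARM) and may fail during an outage
}

class DynamicWorkerPool
{
    private readonly ConcurrentQueue<Worker> pooledWorkers = new();
    private readonly IWorkerProvisioner provisioner;

    public DynamicWorkerPool(IWorkerProvisioner provisioner) => this.provisioner = provisioner;

    public async Task<Worker> LeaseAsync()
    {
        // Serve leases from the pre-provisioned pool; customers only see failures
        // once the pool is exhausted AND on-demand provisioning is down.
        if (pooledWorkers.TryDequeue(out var worker))
        {
            _ = ReplenishAsync(); // top the pool back up in the background
            return worker;
        }

        return await provisioner.ProvisionAsync(); // last resort once the pool is empty
    }

    private async Task ReplenishAsync()
    {
        try
        {
            pooledWorkers.Enqueue(await provisioner.ProvisionAsync());
        }
        catch (Exception)
        {
            // Provisioning outage (e.g. ARM 503s): the pool slowly drains instead of leases failing immediately.
        }
    }
}
```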
Azure has advised that this issue has been resolved. A preliminary root cause has been published here: https://azure.status.microsoft/en-us/status/history/ (03/23/2023 - Azure Resource Manager Operations Failures - Mitigated, Tracking ID: RNQ2-NC8)
Dynamic workers are now provisioning successfully. We are continuing to monitor for any degradation of service.
Azure are aware of this issue and are actively investigating. See the Azure status page for ongoing updates: https://azure.status.microsoft/en-us/status
We are experiencing issues provisioning dynamic workers in West Europe. This may affect deployments or runbooks relying on dynamic workers. We are working with Azure to have this operational as soon as possible. If you have urgent tasks relying on dynamic workers please contact support@octopus.com.
We are investigating an issue with our cloud vendor that may affect customers in the West Europe region.
Report: "Octopus.com sign-in, docs, blogs and downloads unavailable"
Last update:

Between 8:00am and 10:30am UTC, February 9, 2023, sections of [octopus.com](http://octopus.com) intermittently returned 503 responses. The affected routes were /signin, /blogs, and /docs.

## Background

Octopus Deploy recently migrated our DNS management to a new provider to centralize our infrastructure. During the migration, we set the web application firewall (WAF) in front of [octopus.com](http://octopus.com) to detection mode. At the same time, we tuned the ruleset to prevent false positives from blocking legitimate customer access to Octopus systems.

## Key timings

## Timeline

_(All dates and times below are shown in UTC.)_

### Thursday, February 9, 2023

_08:05_ Our automated systems detected decreased availability in sections of the [octopus.com](http://octopus.com) website.

_08:35_ Engineers on call were notified.

_08:56_ Status Page updated: An incident was declared.

_10:30_ We updated the WAF to block malicious traffic.

_10:48_ Status Page updated: Incident status changed to `Monitoring`.

_12:24_ Status Page updated: Incident status changed to `Resolved`.

## What happened?

An attacker ran a fuzzing application across our public-facing website during the time the WAF was in "detection" mode. This caused excessive load that would normally have been prevented by the WAF, in turn reducing availability of [octopus.com](http://octopus.com). Engineers mitigated the outage by applying a cut-down implementation of the WAF that protected the website from single-origin attacks.

## Remediation and next steps

Since this incident, we've completed the migration to our new DNS provider, and the WAF is fully enabled. During our incident review process, we identified and corrected gaps in our defense to reduce the time from detection to mitigation. We identified the internal oversight in risk management that led to this situation: by mitigating one risk, we became susceptible to another risk. We have since updated our project risk assessment process to include more formal internal reviews of our planned changes to core systems.

## Conclusion

Octopus Deploy takes service availability seriously. In the past month, we've had multiple incidents affecting sign-in infrastructure, which is below our desired standard. We apologize for the disruption to our customers and are working to reduce the likelihood and severity of future disruptions.
This incident has been resolved.
We have applied a mitigation that will improve the availability of the affected URLs and are monitoring its effects.
We are aware of issues affecting parts of Octopus.com including /signin, /docs, /blog, and /downloads. Engineers are investigating.
Report: "Authentication errors when attempting to sign into https://billing.octopus.com"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating issues signing into https://billing.octopus.com and requests returning HTTP 500s.
Report: "Octopus.com pages (signin/docs/downloads etc.) are returning 404 errors for some users"
Last update: **Postmortem** - [Read details](https://status.octopus.com/incidents/1zrm6sb0ngc4).
This incident has been resolved.
Our upstream provider has applied a fix that should resolve this issue. We are continuing to monitor this on our side. If you are continuing to see 404 errors navigating to parts of Octopus.com (sign-in, documentation, blog etc.), please contact support@octopus.com
Mitigations didn't hold. We're back to investigating new options. Signin / docs and blog are still impacted.
**Advice**: go to www.octopus.com (your browser cache will be in your way), and try incognito/private browsing. We are 302 redirecting from octopus.com to www.octopus.com and are seeing signs of improved availability for the impacted systems (/signin, /docs, /blog). Northcentral US may still experience a partial outage depending on how your traffic is routed. Most regions should be healthy or just experience some degraded performance. The only impact to Octopus Cloud is to sign-in capabilities.
We have identified a cause of this incident and are working to remediate.
We have escalated with our Cloud provider and are working with them to resolve. Further updates will be posted when they are available.
We are continuing to investigate and have escalated internally to specialist engineers for additional support. Further updates will be posted when they are available.
We currently believe that this is only affecting certain geographical locations and their related CDN endpoints. We are continuing to investigate and have engaged our CDN provider for further assistance/investigation. Further updates will be posted when they are available.
We are continuing to investigate this issue
We are currently investigating this issue.
Report: "Octopus Cloud and Octopus.com log in issues"
Last update: We have published an Incident Report to our blog - read it [here](https://www.octopus.com/blog/cloud-connectivity-disruption-report-learnings).
Resolved, no longer seeing impacts to customers.
Azure has resolved their major network outage.
Azure status is reporting widespread networking impact and issues. We are impacted by Azure's availability now.
We are continuing to investigate this issue.
We have detected DNS and Azure issues, which are impacting our ability to monitor systems, and impacting customers using Octopus systems.
Report: "Octopus Control Center (planned outage)"
Last update: This has been resolved.
To facilitate DNS changes, Octopus Control Center will be partially or completely unavailable for up to 30 minutes. We apologise for any inconvenience this may cause.
Report: "Some Cloud Instances in West Europe are showing the "Undergoing Maintenance" page"
Last update: The affected Cloud Instances are now operational.
Some Cloud Instances in the West Europe region are showing the "Octopus Server is Undergoing Maintenance" page. We are currently investigating this issue.
Report: "Brief interruption in the West US 2 region"
Last update: Some instances in the West US 2 region experienced a brief database connectivity issue between 05:37 and 05:43 UTC. Individual Octopus Cloud instances experienced issues for ~60 seconds. We are investigating the root cause.
Report: "Intermittent Dynamic Worker leasing failures"
Last update: This incident has been resolved.
A fix has been implemented for a vendor issue and we are monitoring the results.
We are currently investigating an issue affecting provisioning of new Dynamic Workers. Leasing of new Dynamic Workers may intermittently fail in all regions.
Report: "billing.octopus.com control center access control section is unavailable"
Last update: Control Center functionality is back to normal.
Engineers are investigating this partial outage. All customers with monthly or annual cloud subscriptions are impacted. The impact is isolated to the ability to manage your access control list. Customers cannot add or remove access grants until this is resolved. Access to Octopus Deploy cloud instances is not impacted.