Cloud 66

Is Cloud 66 Down Right Now? Check if there is a current outage ongoing.

Cloud 66 is currently Operational

Last checked from Cloud 66's official status page

Historical record of incidents for Cloud 66

Report: "Google login issues"

Last update
investigating

Google authentication is having issues, causing login and access problems for our systems and customers. We are investigating the impact and mitigation strategies.

Report: "Kubernetes Cluster Scale/Creation"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are aware of problems when scaling or creating Kubernetes clusters. This appears to be caused by new rate limits imposed by Docker Hub. The team is working on a solution.
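For readers diagnosing similar pull failures on their own clusters, the following is a minimal sketch, not Cloud 66's tooling. It assumes Docker Hub's documented `ratelimit-limit` and `ratelimit-remaining` response headers, whose values look like `100;w=21600` (a pull count and a window in seconds). The helper names `parse_ratelimit_header` and `is_near_limit` and the 10% threshold are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple, Optional


class RateLimit(NamedTuple):
    count: int                     # number of pulls (total limit or remaining)
    window_seconds: Optional[int]  # length of the quota window, if reported


def parse_ratelimit_header(value: str) -> RateLimit:
    """Parse a header value such as '100;w=21600' into a RateLimit.

    Assumption: the value is '<count>' optionally followed by ';w=<seconds>'.
    """
    parts = [p.strip() for p in value.split(";")]
    count = int(parts[0])
    window = None
    for part in parts[1:]:
        key, _, raw = part.partition("=")
        if key.strip() == "w" and raw.strip().isdigit():
            window = int(raw)
    return RateLimit(count, window)


def is_near_limit(remaining: RateLimit, limit: RateLimit,
                  threshold: float = 0.1) -> bool:
    """Return True when the remaining share of the quota is at or below threshold."""
    if limit.count <= 0:
        return True
    return remaining.count / limit.count <= threshold
```

In practice the two header values would come from an authenticated request to the registry; that request is left out here so the sketch stays self-contained.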

Report: "Stuck Deployments"

Last update
resolved

This incident has now been resolved.

monitoring

We have identified and fixed the issue causing deployments to be stuck. It was caused by a database failure at our cloud provider, which we are investigating and monitoring.

investigating

We are currently investigating a number of stuck deployments.

Report: "CustomConfig git access"

Last update
resolved

The CustomConfig git backend migration has been completed.

investigating

Applications can be deployed, but the CustomConfig backend is currently unavailable.

investigating

We are aware of issues with our CustomConfig git repositories. This issue impacts direct git access to the CustomConfig git repository. Direct access to CustomConfig pages via the web dashboard and API is not affected. We are investigating the root cause of the problem.

Report: "Production Issues"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are investigating issues affecting some production systems.

Report: "Github API Outage"

Last update
resolved

This incident has been resolved.

monitoring

GitHub is experiencing some API outages; deployments don't currently appear to be affected. We are monitoring GitHub's status page (https://www.githubstatus.com/).

Report: "Multiple Google Cloud services in the europe-west9 region are impacted"

Last update
resolved

Maestro installations are working as expected again.

monitoring

We are waiting for Google to resolve the outage.

investigating

Due to a major outage in Google Cloud (https://status.cloud.google.com/regional/europe), Kubernetes installations may fail on servers located in the same region, regardless of the cloud provider, because the Kubernetes images themselves are hosted on Google Cloud.

investigating

Water intrusion in a data center in europe-west9 has caused a multi-cluster failure and has led to a shutdown of multiple zones. We expect general unavailability of the europe-west9 region. There is currently no ETA for recovery of operations in the europe-west9 region, but the outage is expected to be extended. Customers who are impacted are advised to fail over to other regions.

Report: "Github Not Listing Repositories"

Last update
resolved

Issue confirmed to be fixed by GitHub.

identified

The issue has been identified and a fix is being implemented, which should resolve the issue for most users. If the error persists, please use the following workaround: 1. Remove the GitHub installation. 2. Manually configure SSH key access to GitHub; please see the link below. https://help.cloud66.com/rails/how-to-guides/common-tools/access-your-code#manually-configuring-github-access

investigating

GitHub is currently having an issue with listing repositories when using the Cloud 66 GitHub app. As a workaround while this issue is ongoing: 1. Remove the GitHub installation. 2. Manually configure SSH key access to GitHub; please see the link below. https://help.cloud66.com/rails/how-to-guides/common-tools/access-your-code#manually-configuring-github-access

Report: "Subset of Buildgrid Image pushes slower than normal"

Last update
resolved

This incident has been resolved.

monitoring

The team has completed the first part of the mitigation strategy for intermittent slow Buildgrid pushes. We are now monitoring ongoing performance.

identified

Our engineering team is continuing to look into mitigations for this issue.

identified

The team is aware that some Buildgrid pushes are taking longer than normal and is working on a mitigation strategy.

Report: "Production Agent Reporting Outage"

Last update
postmortem

At approximately 07:00 UTC we started to perform a minor quality-of-life update to a backend system. The update had unintended consequences for an older system component that hadn’t been changed in a while. The component handles agent server communications, and it ended up disabled after the update. Although the same component was unaffected during testing in dev/staging and passed UAT, it turned out that a necessarily different configuration around rewrites in production caused the issue. The nature of the update meant that rolling back wasn’t straightforward, so the team resolved to fix the incompatibility in place, and as such was unable to stop the erroneous “server down” notifications that then went out. The issue was resolved approximately 1 hour later, at 08:00 UTC, after which servers would have started to appear as “online” again. Following this outage, the difference in configuration has been added to our operations processes so that this will not occur again. Our apologies for the inconvenience and concern that this may have caused!

resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

Report: "Github outage"

Last update
resolved

This incident has been resolved.

investigating

GitHub is having an outage at this time. This may adversely affect Deployments.

Report: "Azure reporting DNS problems (affecting Ubuntu 18.04)"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor this incident.

identified

Azure recommends that affected customers reboot their servers to obtain an updated DHCP lease, which resolves this issue.

identified

From Azure (https://status.azure.com/en-gb/status): Starting at approximately 06:00 UTC on 30 Aug 2022, a number of customers running Ubuntu 18.04 (bionic) VMs that were recently upgraded to systemd version 237-3ubuntu10.54 reported experiencing DNS errors when trying to access their resources. Reports of this issue are confined to this single Ubuntu version.
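To check whether a given server is running the systemd build named in the Azure notice, a minimal sketch such as the one below may help. It is an illustration only: the function names are hypothetical, the exact-match comparison assumes that only version 237-3ubuntu10.54 is affected (as the notice states), and the version lookup assumes a Debian/Ubuntu host where `dpkg-query` is available.

```python
import subprocess

# Version named in the Azure notice; treated here as the only affected build.
AFFECTED_SYSTEMD_VERSION = "237-3ubuntu10.54"


def is_affected(installed_version: str) -> bool:
    """Return True if the installed systemd version matches the affected one."""
    return installed_version.strip() == AFFECTED_SYSTEMD_VERSION


def installed_systemd_version() -> str:
    """Ask dpkg for the installed systemd package version (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Version}", "systemd"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On an affected server, `is_affected(installed_systemd_version())` would return `True`, in which case the reboot recommended by Azure applies.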