Historical record of incidents for CloudAMQP
Report: "Backend slow/unresponsive"
Last update: We are currently experiencing issues with our backend services handling account and server creation. This does not affect any running customer servers but might delay provisioning new servers. This is due to an ongoing Heroku service disruption (affecting Heroku's dyno connectivity, Dashboard, and API). Read more about the incident here: https://status.salesforce.com/generalmessages/10001540?locale=en-US
Report: "Delayed delivery of logs to Datadog"
Last update: Logs shipped to Datadog are currently delayed.
Report: "Connectivity issues with Azure westeurope"
Last update: Servers in West Europe may experience connection issues due to virtual machine issues at the provider. We will monitor and update when we know more.
Report: "RabbitMQ log integration maintenance (logstream shovels)"
Last update: Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will migrate our RabbitMQ log system to a new log cluster. This should not affect log collection or log integrations. However, you might see log messages related to queues and shovels named "logstream" or "logstream2" as we migrate to the new cluster using a blue/green deployment strategy.
Report: "LavinMQ shared server horse.lmq.cloudamqp.com is not running"
Last update: We identified the cause of the crash and found a solution to prevent further crashes. The cluster is now online.
The shared server horse.lmq.cloudamqp.com is unable to start. We are investigating the cause.
Report: "Issues with Azure region North Europe"
Last update: This incident has been resolved.
Servers in North Europe may experience connection issues due to virtual machine issues at the provider. We will monitor and update when we know more.
Report: "Issues with Webhook integration"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Issues with Azure region Norway East"
Last update: This incident has been resolved.
We are seeing instances coming back online.
Update from the provider: "We are aware of this issue and are actively investigating. An update will be provided as events warrant."
Servers in Norway East may experience connection issues due to underlying issues at the provider. We are monitoring and will update when we know more.
Report: "Issues with AWS region eu-north-1"
Last update: This incident has been resolved.
AWS mitigation efforts are working as expected. Affected CloudAMQP clusters experienced some network partitions, which are now resolved. We are monitoring the results as well as updates from AWS.
Servers in eu-north-1 may experience connection issues due to underlying issues at the provider. We are monitoring and will update when we know more.
Report: "Connection issues for compute-engine::asia-southeast regions"
Last update: We can now see connections from our applications to these regions.
We have noticed our internal apps having difficulty connecting to the following regions: compute-engine::asia-southeast1 and compute-engine::asia-southeast2. This appears to impact metrics and alarms.
Report: "Shared server 'Wildboar' under heavy load"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Shared server 'seal' under high load"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Azure Southcentral US region connectivity issues"
Last update: All clusters in Azure South Central US are operational.
Most of our managed clusters are back online now.
We are currently observing connectivity issues for several clusters in Azure South Central US. We are investigating.
Report: "Shared server toucan rebooted"
Last update: This incident has been resolved.
Shared server Toucan ran into an unexpected issue and had to be rebooted. Connections to the server during this time would have been dropped.
Report: "Backend slow/unresponsive"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing issues with our backend services handling account and server creation. This does not affect any running customer servers but might delay provisioning new servers.
Report: "CloudWatch and Datadog metrics missing for some clusters"
Last update: Between Sep 12 07:51:57 and Sep 12 16:01:05, some RabbitMQ clusters were not reporting metrics data to AWS CloudWatch and Datadog. The cause was that the "cluster_utilization" value had been introduced but was not calculated the same way in all RabbitMQ versions, so a small number of clusters had their metrics ingestion discarded because of faulty values.
Report: "Email delays"
Last update: This incident has been resolved.
Emails seem to be operational again.
Our email provider has reported degraded performance and slowness in email processing. There may be delays in responses.
Report: "Shared server 'whale' under high load"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and the underlying hardware has been updated (instance reboot).
We are currently investigating this issue.
Report: "Email delays"
Last update: This incident has been resolved by a third party.
Our email provider has advised that emails are delayed and are not being received in a timely manner.
Report: "Shared server 'goose' under high load"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We need to upgrade the underlying hardware. The cluster will be rebooted.
Report: "Azure Central US connectivity issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We have received the following update from Azure: "Impact Statement: Starting at 21:56 UTC on 18 Jul 2024, you have been identified as a customer using Virtual Machines in Central US who may experience connection failures when trying to access some Virtual Machines hosted in the region. These Virtual Machines may have also restarted unexpectedly." We are continuing to monitor.
We are currently observing connectivity issues for several clusters in Azure Central US. We are investigating.
Report: "New Relic metrics integration"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Partial outage on control plane"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're looking into an issue where SSO requests to the management interface are not working. All servers are still running.
Report: "Shared server woodpecker unavailable"
Last update: The server is back up and the load has improved.
We're upgrading RabbitMQ and Erlang to fix the root cause of the service degradation.
The shared server woodpecker is currently unavailable. We are investigating.
Report: "Issues creating new plans and plan changes"
Last update: A deployment introduced a bug in the RabbitMQ config which affected new plans and plan changes. The issue was solved by 11:10 PM and only one cluster seems to have been affected.
Report: "Port 15672 inaccessible for VPC"
Last update: This incident has been resolved.
We are monitoring. Please notify our Support Team if you are encountering further issues.
The issue has been identified and a fix is being implemented.
Customers have reported that port 15672 is inaccessible, particularly related to VPC. Please note, access is still available through HTTPS 443.
Report: "Shared server 'sparrow' under high load"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are upgrading the underlying hardware. The instance will be rebooted.
We are currently investigating this issue.
Report: "Shared server 'gull' under high load"
Last update: This incident has been resolved.
The issue has been identified and a reboot of the VM is needed.
We are currently investigating this issue.
Report: "Issues creating new plans and plan changes"
Last update: This incident has been resolved.
We are seeing intermittent issues creating new VMs, and updating plans takes a long time. We are currently investigating the issue. Please contact support at support@cloudamqp.com if you need help.
Report: "Datadog metrics integration unavailable"
Last update: The incident has been resolved by Datadog.
Datadog is experiencing issues with their submission APIs; see https://status.datadoghq.com. We're monitoring the situation.
Report: "Create cluster and update plans on Azure unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are seeing intermittent issues creating and updating plans in Azure. We are currently investigating the issue. Please contact support at support@cloudamqp.com if you need help.
Report: "Metrics delivery outage"
Last update: This incident has been resolved. The root cause was a failed API key rotation. We have updated our internal routines to ensure this does not happen again.
A fix has been implemented and we are monitoring the results.
We're currently experiencing issues with metrics delivery. This affects all integrations.
Report: "Backend Maintenance"
Last update: The maintenance is finished.
We are currently doing maintenance on our backend services handling account and server creation. This does not affect any running customer servers but might delay provisioning new servers.
Report: "Instances not visible in CloudAMQP console"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Shared server 'fly' under high load"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Azure West Europe connectivity issues"
Last update: This incident has been resolved.
All Azure VMs are now online. We're continuing to monitor the situation.
We're seeing some VMs coming back online; a few are still offline.
Azure is experiencing issues in region West Europe. We are monitoring the situation.
Report: "Some RabbitMQ metrics missing in the last 3h"
Last update: Some RabbitMQ-specific metrics (such as message rates) were not forwarded in the last 3h. CPU, memory, disk, network and other metrics were still being sent.
Report: "Some RabbitMQ metrics missing in the last 12h"
Last update: This incident has been resolved.
Some RabbitMQ-specific metrics (such as message rates) were not forwarded in the last 12h. CPU, memory, disk, network and other metrics were still being sent.
Report: "Missing metrics for ~30min"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our systems that fetch metrics from each cluster failed to do so for about 30 minutes. We have rolled back to a working release and are working on resolving the issue. All metrics are being delivered as normal while we investigate.
Report: "Azure Australia East connectivity issues"
Last update: This incident has been resolved.
Azure is experiencing issues in region Australia East. We are monitoring the situation.
Report: "Log Integration"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Log integrations are impacted while we resolve an issue with our log processor. The logs can still be found at YOUR_SERVER_NAME.rmq.cloudamqp.com/logs.
Report: "DigitalOcean Connection Issue"
Last update: This incident has been resolved.
DigitalOcean has experienced network issues across multiple data centers (AMS3, FRA1, NYC3, and LON1), affecting some clusters. This can cause connectivity issues with servers.
Report: "High load on shared server "hawk""
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "High load on shared server "rattlesnake""
Last update: This incident has been resolved.
The issue has been identified and a reboot of the server has been initiated.
We are currently investigating this issue.
Report: "Degraded monitoring of some GCE regions"
Last update: This incident has been resolved.
We're currently having connection issues between our systems and some GCE regions. This affects our monitoring of servers in the affected regions. Currently affected regions: australia-southeast1, asia-east1, and asia-east2.
Report: "Issue with shared server "stingray""
Last update: This incident has been resolved.
We're performing an emergency update on one shared server.
Report: "Metrics unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating an issue where metrics are unavailable for customers. Integrations and alarms are working as usual.
Report: "Increased latency in Azure West Europe"
Last update: This incident has been resolved.
Some of our customers are experiencing increased latency in Azure West Europe. For more info see https://azure.status.microsoft/en-us/status
Report: "Missing data points in metric graphs"
Last update: Due to an issue in our time series database, no data points were written between 03:12 and 05:57 (UTC). A fix has been deployed to ensure this issue does not recur.
Report: "Issue with shared server "seal""
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're performing an emergency update on one shared server.
Report: "Delay in support emails"
Last update: This incident has been resolved.
The situation seems to be better now.
We seem to have a delay in outgoing emails. We will be available in the chat in the meantime.
Report: "Missing graphs metrics"
Last update: Due to a temporary failure in our time series database, there is a 1-hour gap between 9:30 and 10:30 AM (UTC) on 2023-04-21. This did not affect metrics integrations or alarms.
Report: "Problems delivering emails"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue that's causing delays in our outgoing emails.
Report: "Metrics collection interrupted"
Last update: This incident has been resolved.
We are having issues communicating with Heroku's API.