Grafana Cloud

Is Grafana Cloud Down Right Now? Check whether there is an ongoing outage.

Grafana Cloud is currently Operational

Last checked from Grafana Cloud's official status page

Historical record of incidents for Grafana Cloud

Report: "GCP - Major Incident Affecting Multiple Grafana Cloud Components"

Last update
investigating

As of 17:55 UTC, we were alerted to a significant outage involving one of our cloud providers. Users experiencing this issue may be unable to query or ingest data for metrics, logs, traces, and profiles. Users may also be unable to create new Hosted Grafana instances. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Loki - Slow Queries"

Last update
investigating

As of 20:23 UTC, we became aware of an issue with Loki query speeds. Users experiencing this issue may see slow log queries. Write performance should not be affected. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Instances on the "Slow" Release Channel Receiving Unexpected Errors."

Last update
resolved

At approximately 12:00 UTC, a feature toggle was rolled out that negatively impacted instances on the slow release channel. Users on this release channel began to receive an "AlertStatesDataLayer" error. A workaround was quickly identified and applied for users who reported the issue. The feature toggle in question was fully reverted by 18:00 UTC.

Report: "Private Datasource Connect - New Agents Failing To Get SSH Certificates Signed"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

monitoring

As of 13:34 UTC, our engineering team became aware of an issue with PDC Agents failing to get new SSH certificates signed. Existing PDC Agents will continue working until their connection is terminated. Engineering has released a fix, and as of 14:58 UTC, customers should no longer experience issues with new PDC Agent SSH certificates. We will continue to monitor for recurrence and provide updates accordingly.

Report: "Scheduled maintenance caused a temporary issue with logging in into Grafana Cloud stacks."

Last update
resolved

Due to scheduled maintenance (https://status.grafana.com/incidents/rz7nt6cs4prb), some users were unable to log in to their Grafana Cloud stacks. The issue affected only users who:
- had no session already open in the Grafana Cloud stack;
- or were located geographically close to Europe while their stack is closer to the US (or vice versa).
The issue was caused by an incorrect configuration introduced by the maintenance, which was fixed shortly after being discovered. Login is now fully operational and stable.

Report: "grafana.com Portal Scheduled Maintenance."

Last update
completed

The scheduled maintenance has been completed.

in progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

scheduled

We will perform an internal scheduled migration of the grafana.com portal, which will expire user sessions. The migration will be performed region by region: the European region first, followed by the US region. During the maintenance, users who are actively browsing the portal will be logged out twice and asked to authenticate again.

Report: "Elevated latency in prod-us-east-0 cluster."

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented, and we are monitoring the results.

investigating

We're observing elevated latency in the prod-us-east-0 cluster for the Hosted Metrics service. We're currently investigating the issue.

Report: "Push errors and elevated latency in prod-us-central-0 cluster."

Last update
resolved

This incident has been resolved.

investigating

A fix has been implemented, and we are monitoring the results.

investigating

We're observing push errors and elevated latency in the prod-us-central-0 cluster, affecting write path performance. We're currently investigating the issue and will post updates accordingly.

Report: "Some Pages in the User Portal Showing Errors."

Last update
resolved

This incident has been resolved.

investigating

Our team is aware of an issue that is causing some pages in the User Portal to error out. We are currently working on deploying a fix.

Report: "Issues with Hosted Grafana in some Regions."

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented, and we are monitoring the results.

investigating

Our Engineering Teams are continuing to investigate this and are actively working towards a resolution for this issue.

investigating

We are currently investigating an issue occurring in the prod-eu-west-2 and prod-us-central-0 regions. Some users may be experiencing availability issues in these regions. Our team is actively working towards a resolution for this issue.

Report: "Grafana Cloud: Migration Assistant - degraded performance"

Last update
resolved

We have implemented a fix and verified it with a load test.

monitoring

A fix is scheduled for release in an upcoming Grafana update, expected in late May or early June.

investigating

We are continuing to investigate this issue.

investigating

The team is aware of issues with migrating large numbers of resources and is working on a fix.

Report: "Grafana Cloud k6: service outage"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are experiencing a database issue and are investigating its cause.

Report: "SLO Performance Degradation"

Last update
resolved

Engineering has released a fix, and as of 18:45 UTC, customers should no longer experience issues with creating, editing, updating, or deleting SLOs. We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

investigating

As of 17:55 UTC, we became aware of a performance issue affecting our Cloud App Platform. Users experiencing this issue may be unable to create, edit, update, or delete SLOs. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Load test metrics not being ingested during execution"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Traceql metric queries not returning recently ingested data in us-east-2 region."

Last update
resolved

Starting May 9, 2025 at 12:00 UTC, Hosted Traces experienced a partial read path outage in the us-east-2 region: TraceQL metrics queries were not returning recently ingested data. After investigation, a fix was deployed, and the incident was resolved on May 14 at 10:00 UTC. There may be data gaps, which will eventually be filled; we have confirmed that no data was lost.

Report: "Some Hosted Grafana instances in prod-ap-south-1 experiencing degraded performance"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved.

monitoring

Engineering has released a fix and as of 1:40am UTC, customers should no longer experience degraded performance in prod-ap-south-1. We will continue to monitor for recurrence and provide updates accordingly.

investigating

Beginning at approximately 22:30 UTC, we identified some Hosted Grafana instances in prod-ap-south-1 experiencing degraded performance, appearing either unavailable or slow. We are currently investigating this issue.

Report: "Private Datasource Connect (PDC) Agent - Partial Outage"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

monitoring

As of 20:32 UTC, customers should no longer experience any issues with the Private Datasource Connect (PDC) Agent. We will continue to monitor for recurrence and provide updates accordingly.

identified

As of 18:52 UTC, we were alerted to an issue with the Private Datasource Connect (PDC) Agent. For the duration of the incident, users may have encountered a "Connection Status Not Available" error for any PDC Agent configured in their environment. Additionally, users may have experienced 504 Gateway Timeout errors and/or dashboards that query data from PDC Agents failing to load data.

Report: "Synthetic monitoring: browser synthetics elevated failure rates in Oregon public probe"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been deployed, and browser checks have been running normally as of 21:25 UTC. We are continuing to monitor.

investigating

We are investigating an issue with the Oregon probe where browser synthetics have delayed execution and higher failure rates. Other synthetic check types (protocol-level and k6 scripted) are currently unaffected.

Report: "Instances Inaccessible in prod-us-central-0"

Last update
resolved

Engineering has applied a fix and as of 5:30 AM UTC, customers should no longer experience an issue with accessing their instance. At this time, we are considering this issue resolved.

investigating

We were alerted to an issue with accessing instances hosted in the prod-us-central-0 cluster. This issue began at approximately 01:00 UTC. Users experiencing this issue may encounter a "connection error" page when attempting to log in to their instance. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Slowdown on Loki Write Path in prod-eu-north-0 Region"

Last update
resolved

The incident has been resolved.

identified

The Loki service experienced a brief slowdown in the write path. There is no impact to customer data. We are actively working on mitigation.

Report: "Brief Slowdown on Loki Write Path in prod-eu-north-0 Region"

Last update
resolved

The incident has been resolved.

monitoring

Loki experienced a brief slowdown in the write path starting at 06:10 UTC. There was no impact to customer data. The situation has been mitigated, and we're monitoring the service.

Report: "Some Instances in prod-us-east-0 Experienced Issues Accessing Their Instance."

Last update
resolved

At approximately 14:20 UTC we observed an issue that caused some deployments in prod-us-east-0 to experience long load times, or issues accessing their instance. The issue was resolved shortly before 15:00 UTC.

Report: "API offline in GCP Singapore (prod-ap-southeast-0) and GCP Brazil (prod-sa-east-0)"

Last update
resolved

The API came back online at 19:14 UTC in GCP Singapore and at 19:21 UTC in GCP Brazil. While the API was unavailable, probes were unaffected: synthetic monitoring checks in all probes continued to run and publish data during this incident. The incident was caused by earlier database maintenance in these regions.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

The synthetics API is currently offline in both GCP Singapore (prod-ap-southeast-0) and GCP Brazil (prod-sa-east-0). We are investigating the issue.

Report: "Cloud Migration Assistance Disabled in Some Regions"

Last update
resolved

This incident has been resolved.

investigating

The Cloud Migration Assistant feature is currently disabled in the following regions due to an issue discovered by our engineers:
- AWS UAE
- AWS Indonesia
- GCP Saudi Arabia
We are working on a fix for this issue, which we expect in the coming days. Updates will be provided here as they become available to our teams.

Report: "Logs Drilldown Broken for Some Users"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented, and we are monitoring the results.

investigating

We are currently investigating an issue that is causing an "App Not Found" error for some users when attempting to access logs drilldown.

Report: "Email notifications issue"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved.

monitoring

The issue has been mitigated, and a root cause analysis is currently underway.

identified

The issue has been mitigated, and a root cause analysis is currently underway.

investigating

Some users may not receive alert email notifications or other email service messages. We are working to resolve the issue as soon as possible.

Report: "Graphite proxy ingestion failing - prod-ap-northeast-0"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue affecting Graphite proxy ingestion. Our engineering team is actively investigating and will provide updates as more information becomes available.

Report: "AWS Firehose Logs Down in Prod-US-East-2"

Last update
resolved

This incident has been resolved.

investigating

Our team is currently investigating an issue affecting AWS Logs with Firehose traffic in the us-east-2 region. Affected users receive the error message "Unable to connect to the destination endpoint. Contact the owner of the endpoint to resolve this issue."

Report: "Cloudwatch datasource query issues on cluster GPC US central"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved.

monitoring

Our engineering team is applying fixes for the issue and monitoring the behavior.

identified

Since approximately 09:40 UTC today, we have been observing connectivity errors for CloudWatch datasources on the US Central cluster. This affects data queries and can affect alerts based on CloudWatch datasources.

Report: "Synthetic Monitoring: connectivity issue between probes and GCP Saudi Arabia (prod-me-central-0)"

Last update
resolved

As of 20:29 UTC, the incident is resolved: all probes have reconnected to the API in GCP Saudi Arabia (prod-me-central-0).

investigating

We continue to monitor connectivity issues between probes and GCP Saudi Arabia. As of 19:15 UTC, all public probes are unable to connect to GCP Saudi Arabia to get changes to synthetic monitoring. Probes continue to send metrics and logs to GCP Saudi Arabia, with the exception of Paris.

investigating

Starting at 16:55 UTC, the Paris probe is not receiving updates from or publishing data to the GCP Saudi Arabia Grafana Cloud region. We are investigating the issue.

Report: "Prod-EU-West-2 Connection Issues"

Last update
resolved

At approximately 16:00 UTC some customers in the prod-eu-west-2 region began experiencing connectivity issues with their environment. The issue was remediated by our engineers ~30 minutes later.

Report: "Error Creating Teams"

Last update
resolved

The fix has been rolled out and at this time, we are considering this resolved.

identified

The issue has been identified, and we are in the process of rolling out a fix.

identified

We are currently investigating an issue where some customers are unable to create a new Team in Grafana Cloud.

Report: "Issue with Multiple Components in prod-eu-west-2 and prod-eu-west-4"

Last update
resolved

This incident has been resolved.

monitoring

As of 1:15am UTC, we were alerted to an issue with the following components in the prod-eu-west-2 and prod-eu-west-4 clusters. Users experiencing these issues may encounter:
- Authentication failing
- Alerting: partial degradation of rule evaluation and configuration syncs
- Loki: errors on the read and write paths
- SLO plugin down, unable to access page
Engineering is actively engaged and has noted that the issue is stabilizing. We are continuing to monitor this.

Report: "Degraded Performance of Adaptive Metrics in GCP US Central"

Last update
resolved

This incident has been resolved.

investigating

Since 06:50 UTC today, we have been experiencing degraded performance for Adaptive Metrics recommendations in the GCP US Central cluster. As a result, rule recommendations may be outdated.

Report: "Log ingestion rate limit issue in AWS Germany (logs-prod-012)"

Last update
resolved

Between 2025-04-07 13:00 UTC and 2025-04-08 14:00 UTC, some customers in AWS Germany (logs-prod-012) had their ingestion rate limits incorrectly lowered, causing them to hit rate limits when writing logs. The correct rate limits have been reinstated, and we can confirm that affected customers are no longer being rate limited.

Report: "Multiple components of Grafana down"

Last update
resolved

This incident has been resolved.

investigating

We were also alerted that this incident is affecting Mimir in the prod-us-east-0 cluster.

investigating

We are continuing to investigate this issue.

investigating

As of 5:30am UTC, we were alerted to an issue with the following components. Users experiencing these issues may encounter:
- Authentication failing
- Alerting: unable to evaluate rules
- Loki: errors on the read and write paths
- SLO plugin down in the prod-us-east-0 cluster, unable to access page
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Intermittent 500 Errors for Log Queries"

Last update
resolved

Some Loki queries returned a 500 response code between 06:45 and 07:15 UTC.

Report: "Hosted Grafana Instance Creation Failures in ap-southeast-1 (Singapore)"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and engineering is working on a fix.

investigating

We are currently investigating an issue affecting Hosted Grafana infrastructure in the ap-southeast-1 (Singapore) region. Some users may experience failures when provisioning new resources or instances. At this time, existing instances appear to be unaffected. Our engineering team is actively investigating and will provide updates as more information becomes available.

Report: "Grafana Cloud: SAML UI Role mapping"

Last update
resolved

Engineering has rolled out a fix for this issue. At this point, we are considering the incident resolved.

identified

Engineering has identified additional issues that need to be addressed to fully resolve the SAML UI role mapping behavior for Hosted Grafana users. At this time, users may still encounter problems when attempting to apply changes to SAML role mappings. A fix has been identified and is currently being implemented. We’ll share further updates as they become available.

Report: "Instances on "Fast" experiencing issues querying the Tempo datasources"

Last update
resolved

The fix has been rolled out on our "fast" release channel. At this point, we are considering the incident resolved.

identified

The issue has been identified, and a fix is being rolled out.

investigating

We are currently investigating an issue that has caused instances on the "fast" release channel to experience errors when querying the Tempo datasource.

Report: "Grafana Cloud: SAML UI Role Mapping"

Last update
resolved

Engineering has released a fix, and as of 06:00 UTC, customers should once again be able to change SAML role mappings. At this time, we are considering this issue resolved. No further updates.

identified

Engineering has identified the issue and is currently implementing remediation. We will continue to provide updates as more information is shared.

investigating

As of 18:00 UTC, we became aware of an issue with the Grafana Cloud SAML UI. Users experiencing this issue may be unable to save role mapping changes. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Hosts unreachable for instances in prod-sa-east-1"

Last update
resolved

From 6:23 UTC to 6:39 UTC, we became aware of an issue with instances in the prod-sa-east-1 cluster. Users experiencing this issue may have encountered failing requests for instances in this cluster. This has stabilised, and customers should no longer experience this issue.

Report: "Not possible to create stacks on prod-us-west-0"

Last update
resolved

The stack creation process is now functioning properly in the prod-us-west-0 region.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are experiencing issues with stack creation in the prod-us-west-0 region. Adding and removing plugins is also impacted.

Report: "Some Grafana Instances Taking Longer to Initialize"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented, and we are monitoring the results.

identified

The issue has been identified, and we are working on a fix.

Report: "prod-us-central-3 prod-us-central-4 clusters are unavailable"

Last update
resolved

This incident has been resolved.

monitoring

Starting at 18:15 UTC, a recent networking configuration change caused the prod-us-central-3 and prod-us-central-4 clusters to become unavailable. As of 19:04 UTC, a fix has been deployed and we are monitoring the results.

Report: "Tests failures on k6"

Last update
resolved

This incident has been resolved.

investigating

Some users may see their tests fail to run and be aborted by the system. We are investigating this issue.

Report: "Some Grafana Instances Taking Longer to Initialize"

Last update
resolved

This incident has been resolved.

identified

We have identified the issue and are mitigating.

investigating

We are currently investigating an issue that is causing some customers to experience longer-than-expected initialization times.

Report: "Issue with Grafana Access in prod-ap-south-1 and prod-ap-northeast-0"

Last update
resolved

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved.

monitoring

We have observed marked improvement with Grafana access. Customers should no longer experience this issue. We will continue to monitor and provide updates.

investigating

We have discovered that this is also affecting instances in the prod-us-east-0 cluster. Our Engineering team is still investigating and assessing the issue. We will provide further updates accordingly.

investigating

As of 3am UTC, we were alerted to an issue with accessing Grafana for instances in the prod-ap-south-1 and prod-ap-northeast-0 clusters. Users experiencing this issue may encounter an error message when attempting to log in to Grafana. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Report: "Degraded performance when querying metrics on AWS Germany"

Last update
resolved

On March 12, between 13:30 UTC and 13:50 UTC, we experienced degraded performance when querying metrics on the AWS Germany cluster. During that period, metrics could not be displayed on dashboards and alert queries returned errors. The issue has been resolved.

Report: "Delay in aggregated series through adaptive metrics"

Last update
resolved

All series are now caught up and the incident is resolved.

identified

We are continuing to monitor the fix for this issue.

identified

Aggregated series through adaptive metrics have been delayed since ~15:50 UTC in prod-us-central-0/cortex-prod-04. No data has been lost; it is only delayed for querying. We've put a fix in place, and the series are catching up now. We're currently monitoring further.

Report: "Write path down in prod-us-east-2-prometheus-prod-56"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating an issue with the write path in prod-us-east-2-prometheus-prod-56.

Report: "Read errors in prod-eu-west-0"

Last update
resolved

Many ingesters were evicted from nodes in cortex-prod-01 at once, causing a read path outage. Once the ingesters were rescheduled, the read path recovered. The errors lasted about 10 minutes.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

As of 3:35pm UTC, we were alerted to a Mimir read path outage in prod-01-eu-west-0. Users experiencing this issue may have encountered timeouts on Prometheus/metrics queries. Service has recovered, but we are monitoring.

Report: "Longer than expected load times in multiple AWS regions"

Last update
resolved

This incident has been resolved.

monitoring

Our engineering team has taken temporary measures to alleviate the impact on customers. We will continue to provide updates as more information becomes available.

identified

Our engineering team is still investigating the root cause of the issue and has taken temporary measures to alleviate the impact on customers. We will continue to provide updates as more information becomes available.

identified

We are seeing this issue occur in multiple AWS regions. Our engineering team is currently engaged in remediating this issue.

monitoring

We are continuing to monitor for further issues. The issue is scoped to AWS and has only been significantly observed in the prod-us-east-0 and prod-eu-west-2 regions. We have also retroactively changed the title of this incident to accurately reflect its scope.

monitoring

Our Engineering team has identified that https://status.grafana.com/incidents/q6k82s6s7sj0 is related to this issue. We are continuing to monitor our services and will provide further updates accordingly.

monitoring

Engineering has released a fix, and customers should no longer be experiencing this issue. We will continue to monitor for recurrence and provide updates accordingly.

identified

Engineering has identified the issue and is currently exploring remediation options. At this time, users may experience instances taking 15+ minutes to come off the "Grafana is Loading" page. We will continue to provide updates as more information is shared.

Report: "Degraded performance due to overloaded internal queues"

Last update
resolved

Our internal queues were affected by a scheduled data migration. For around three hours, asynchronous scheduled tasks were affected and their processing was delayed, but none were cancelled or lost. In particular, scheduled tests may not have run at their intended times during that period.