Is Grafana Cloud Down Right Now? Discover if there is an ongoing service outage.

Grafana Cloud is currently Operational

Last checked Jul 29, 2025 17:42 UTC from Grafana Cloud's official status page

Historical record of incidents for Grafana Cloud

Jul 28, 2025

Report: "Widespread Outages in Prod-EU-West-0"

Last update 2025-07-28T17:52:48.868Z

Investigating 2025-07-28T17:52:48.866Z

We are currently experiencing widespread outages in prod-eu-west-0. Our team is actively investigating the issue and we will provide more updates as they become available.

Report: "Grafana Cloud Traces: some trace search results unavailable in prod-ap-southeast-2 region."

Last update 2025-07-28T14:03:00.997Z

Resolved 2025-07-14T12:00:00.000Z

Due to a configuration change, one cell in the prod-ap-southeast-2 region was misconfigured, causing some trace search results not to be returned as expected. The issue was fixed on July 28th at around 12:20 UTC, and all search results are now returned as expected. No data was lost.

Jul 23, 2025

Report: "IRM (OnCall and Incident) plugin failing to load in eu-west-2 region."

Last update 2025-07-23T09:08:30.897Z

Investigating 2025-07-23T09:08:30.894Z

We are currently experiencing an issue with the IRM plugin in the eu-west-2 region. The plugin is unavailable and fails to load. We are working on resolving the problem.

Jul 22, 2025

Report: "Some Users Unable to Connect New Datasources with PDC"

Last update 2025-07-22T22:12:55.999Z

Investigating 2025-07-22T22:12:55.996Z

We are currently investigating an issue that is preventing users from connecting new datasources with PDC in the us-east-0 region.

Jul 17, 2025

Report: "New Pods Failing to Start for a Small Subset of Instances"

Last update 2025-07-17T21:22:23.718Z

Resolved 2025-07-17T14:00:00.000Z

At approximately 13:50 UTC, a new deployment pointed pinned instances to a non-existent plugin version. As a result, new pods for those instances failed to start. Because older pods were still running, users should not have experienced a disruption.

Jul 15, 2025

Report: "IRM (OnCall + Incident) plugin not available"

Last update 2025-07-15T12:10:21.370Z

Investigating 2025-07-15T12:10:21.366Z

After the latest plugin update, we identified an issue with the IRM plugin that made it unavailable across Hosted Grafana instances. We are actively working to restore the service.

Report: "Degraded performance on Email notifications across all regions"

Last update 2025-07-15T11:20:46.478Z

Resolved 2025-07-15T09:30:00.000Z

Hosted Grafana experienced degraded performance of the email notification service today, 15 July, from 11:45 UTC to 13:00 UTC. This affected the sending and receiving of emails used by features such as alerting and reporting. Service was restored shortly after the issue was identified.

Jul 10, 2025

Report: "Most Prometheus Queries Failing in Prod-US-East-0"

Last update 2025-07-10T13:52:07.374Z

Investigating 2025-07-10T13:52:07.371Z

We are currently experiencing an issue causing most Prometheus queries to fail in the prod-us-east-0 region. Our team is investigating this issue.

Jul 4, 2025

Report: "Grafana fast release channel error"

Last update 2025-07-04T12:44:10.519Z

Investigating 2025-07-04T12:44:10.514Z

We have identified an issue in which customers using Grafana on our fast release channel may encounter a bug in time series panels that causes their browser tab to crash. Our engineering teams are investigating the issue, and we will provide more updates shortly.

Jul 3, 2025

Report: "Hosted Grafana Outage (prod-us-west-0)"

Last update 2025-07-03T22:17:26.092Z

Investigating 2025-07-03T22:17:26.089Z

As of 21:27 UTC, we identified a DNS-related issue affecting the prod-us-west-0 cluster. This may result in a full outage of Hosted Grafana services for impacted users. Our engineering team is actively investigating and working to resolve the issue. We will share updates as they become available.

Report: "Read and write path outage in Hosted Logs ap-south-1 region cells."

Last update 2025-07-03T09:23:53.276Z

Investigating 2025-07-03T09:23:53.274Z

We encountered an issue with cells in the ap-south-1 Hosted Logs region. Between 8:55 and 9:07 UTC, this region experienced a complete outage of the read and write paths. It has since fully recovered, and services are fully operational again. We are currently investigating the root cause.

Report: "cortex-prod-05 cell partial read-path outage"

Last update 2025-07-03T09:18:29.062Z

Investigating 2025-07-03T09:18:29.059Z

We are currently experiencing issues with the cortex-prod-05 cell, which is suffering a partial read-path outage. Engineering remains actively engaged in the investigation, and we will continue to provide updates as more information becomes available.

Jun 30, 2025

Report: "Degraded Performance in Mimir Ingestion Path"

Last update 2025-06-30T18:27:43.756Z

Monitoring 2025-06-30T18:27:43.746Z

At 16:15 UTC our team became aware of an issue causing errors when accessing Adaptive Metrics in the prod-us-east-0 region. At approximately 17:15 UTC our engineers rolled out a fix and we are currently monitoring the results.

Jun 27, 2025

Report: "Read & Write Outage in Prod-GB-South-1"

Last update 2025-06-27T15:41:47.968Z

Resolved 2025-06-27T15:41:47.960Z

We observed a brief read & write outage in the prod-gb-south-1 region. This lasted from approximately 11:51-12:14 UTC.

Jun 22, 2025

Report: "Synthetic Monitoring: Spain public probe failing intermittently."

Last update 2025-06-22T09:46:35.271Z

Resolved 2025-06-21T18:00:00.000Z

The Spain public probe experienced intermittent failures from June 21st 20:00 UTC to June 22nd 8:40 UTC. Synthetic Monitoring checks using the Spain public probe failed intermittently during that window. The issue is now resolved, and the probe is stable.

Jun 20, 2025

Report: "Degraded Write Performance on Tempo Metrics Generator"

Last update 2025-06-20T16:26:53.059Z

Investigating 2025-06-20T16:26:53.055Z

As of 15:03 UTC, we identified an issue causing degraded write performance in the Tempo Metrics Generator. Affected users may experience dropped spans in remote-written metrics. Our engineering team is actively investigating, and we will provide updates as more information becomes available.

Report: "High query latency for hosted Prometheus datasources"

Last update 2025-06-20T15:48:59.213Z

Investigating 2025-06-20T15:48:59.210Z

From 13:23 UTC until 13:36 UTC, some hosted Prometheus datasources experienced high query latency, which could have triggered false alarms or caused timeouts in the UI when querying. The root cause is being investigated.

Jun 19, 2025

Report: "CloudWatch/Athena Integrations - Partial Outage"

Last update 2025-06-19T17:50:35.054Z

Investigating 2025-06-19T17:50:35.052Z

As of approximately 15:00 UTC, we became aware of an issue affecting our CloudWatch and Athena datasources. Users may experience failures when querying logs or metrics from these datasources and/or see "No Data" within dashboards. Our engineering team is actively investigating and working to resolve the issue. We will share more details and provide updates as they become available.

Jun 18, 2025

Report: "Slow user queries exceed threshold"

Last update 2025-06-18T17:54:07.674Z

Resolved 2025-06-18T17:54:07.665Z

There was a read outage impacting Loki tenants in the prod-us-east-0 cluster. This issue has been resolved.

Report: "Metrics Drilldown Issues"

Last update 2025-06-18T14:16:29.324Z

Investigating 2025-06-18T14:16:29.321Z

We are currently investigating an issue with Metrics Drilldown. Customers may encounter an "App not found" error during this time.

Report: "Logs query rate metric unavailable"

Last update 2025-06-18T11:45:45.617Z

Resolved 2025-06-18T08:00:00.000Z

Between 9:40 and 10:55 AM UTC, the Cloud Logs service briefly had an issue providing data for query rate metrics only. You may see gaps in the query rates panel of the billing dashboards for that period. The situation has been mitigated; we apologize for the inconvenience.

Jun 17, 2025

Report: "Brief Write Latency in prod-us-west-0 Loki Cell"

Last update 2025-06-17T18:52:12.470Z

Resolved 2025-06-17T18:00:00.000Z

From 18:15 to 18:25 UTC, our prod-us-west-0 Loki cell experienced a period of degraded write performance. The issue resolved quickly without requiring manual intervention, and the system has remained stable since.

Jun 16, 2025

Report: "Ingestion errors for Traces on cluster AWS Germany (prod-eu-west-2)"

Last update 2025-06-16T09:37:06.708Z

Resolved 2025-06-13T13:30:00.000Z

The Tempo service on the EU West cluster experienced a traffic increase over the weekend, which caused an elevated error rate in Tempo's write path (ingestion). Our engineering team identified the root cause and implemented measures to mitigate and resolve the problem. Trace ingestion problems may have occurred from 15:30 UTC on June 13th until 19:30 UTC on June 15th.

Jun 12, 2025

Report: "GCP - Major Incident Affecting Multiple Grafana Cloud Components"

Last update 2025-06-12T18:36:36.026Z

Investigating 2025-06-12T18:36:36.019Z

As of 17:55 UTC, we were alerted to a significant outage involving one of our cloud providers. Affected users may be unable to query or ingest data for metrics, logs, traces, and profiles, and may also be unable to create new Hosted Grafana instances. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Jun 11, 2025

Report: "Loki - Slow Queries"

Last update 2025-06-11T20:32:16.015Z

Investigating 2025-06-11T20:32:16.011Z

As of 20:23 UTC, we became aware of an issue with Loki query speeds. Users experiencing this issue may encounter slow query speeds for logs. Write performance should not be affected. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Jun 4, 2025

Report: "Instances on the "Slow" Release Channel Receiving Unexpected Errors."

Last update 2025-06-04T22:00:14.986Z

Resolved 2025-06-04T12:00:00.000Z

At approximately 12:00 UTC a feature toggle was rolled out which negatively impacted instances on the slow release channel. Users on this release channel began to receive an "AlertStatesDataLayer" error. A workaround was quickly identified and applied to reporting users. The feature toggle in question was fully reverted by 18:00 UTC.

Jun 3, 2025

Report: "Private Datasource Connect - New Agents Failing To Get SSH Certificates Signed"

Last update 2025-06-03T16:12:44.157Z

Resolved 2025-06-03T16:12:44.138Z

We have observed a sustained period of recovery and now consider this issue resolved. No further updates will be provided.

Monitoring 2025-06-03T15:23:12.259Z

As of 13:34 UTC, our engineering team became aware of an issue with PDC Agents failing to get new SSH certificates signed. Existing PDC Agents will continue working until their connection is terminated. Engineering has released a fix, and as of 14:58 UTC, customers should no longer experience issues with new PDC Agent SSH certificates. We will continue to monitor for recurrence and provide updates accordingly.

Report: "Scheduled maintenance caused a temporary issue with logging in to Grafana Cloud stacks."

Last update 2025-06-03T11:49:58.577Z

Resolved 2025-06-03T08:00:00.000Z

Due to scheduled maintenance (https://status.grafana.com/incidents/rz7nt6cs4prb), some users were unable to log in to their Grafana Cloud stacks. The issue affected only users who:

- had no session already open in the Grafana Cloud stack; or
- were geographically close to Europe while their stack is closer to the US (or vice versa).

The issue was caused by an incorrect configuration introduced during the maintenance, which was fixed shortly after it was discovered. Login is now fully operational and stable.

Report: "grafana.com Portal Scheduled Maintenance."

Last update 2025-06-03T05:00:00.000Z

Completed 2025-06-03T05:00:00.000Z

The scheduled maintenance has been completed.

In progress 2025-06-03T03:00:00.000Z

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Scheduled 2025-05-28T07:26:00.000Z

We will perform an internal scheduled migration of the grafana.com portal, which will expire user sessions. The migration will be done region by region: the European region will be migrated first, followed by the US region. During the maintenance, users actively browsing the portal will be logged out twice and asked to authenticate again.

Jun 2, 2025

Report: "Elevated latency in prod-us-east-0 cluster."

Last update 2025-06-02T17:58:43.767Z

Resolved 2025-06-02T17:58:43.747Z

This incident has been resolved.

Monitoring 2025-06-02T15:48:37.576Z

A fix has been implemented, and we are monitoring the results.

Investigating 2025-06-02T14:56:06.312Z

We are observing elevated latency in the prod-us-east-0 cluster for the Hosted Metrics service and are currently investigating the issue.

Report: "Push errors and elevated latency in prod-us-central-0 cluster."

Last update 2025-06-02T17:58:27.489Z

Resolved 2025-06-02T17:58:27.469Z

This incident has been resolved.

Investigating 2025-06-02T15:48:59.168Z

A fix has been implemented, and we are monitoring the results.

Investigating 2025-06-02T14:20:48.454Z

We are observing push errors and elevated latency in the prod-us-central-0 cluster affecting write path performance. We are currently investigating the issue and will post updates accordingly.

May 29, 2025

Report: "Some Pages in the User Portal Showing Errors."

Last update 2025-05-29T17:11:48.379Z

Resolved 2025-05-29T17:11:48.362Z

This incident has been resolved.

Investigating 2025-05-29T14:44:41.619Z

Our team is aware of an issue that is causing some pages in the User Portal to error out. We are currently working on deploying a fix.

May 27, 2025

Report: "Issues with Hosted Grafana in some Regions."

Last update 2025-05-27T19:47:40.671Z

Resolved 2025-05-27T19:47:40.655Z

This incident has been resolved.

Monitoring 2025-05-27T18:46:44.034Z

A fix has been implemented, and we are monitoring the results.

Investigating 2025-05-27T15:31:55.448Z

Our Engineering Teams are continuing to investigate this and are actively working towards a resolution for this issue.

Investigating 2025-05-27T13:47:14.836Z

We are currently investigating an issue occurring in the prod-eu-west-2 and prod-us-central-0 regions. Some users may be experiencing availability issues in these regions. Our team is actively working towards a resolution for this issue.

May 26, 2025

Report: "Grafana Cloud: Migration Assistant - degraded performance"

Last update 2025-05-26T02:50:59.864Z

Resolved 2025-05-26T02:50:59.849Z

We have implemented a fix and verified it with a load test.

Monitoring 2025-05-20T21:07:21.852Z

A fix is scheduled for release in an upcoming Grafana update, expected in late May or early June.

Investigating 2025-05-15T13:46:30.601Z

We are continuing to investigate this issue.

Investigating 2025-05-15T13:46:15.315Z

The team is aware of issues with migrating large numbers of resources and is working on a fix.

May 22, 2025

Report: "Grafana Cloud k6: service outage"

Last update 2025-05-22T16:39:37.972Z

Resolved 2025-05-22T16:39:37.956Z

This incident has been resolved.

Investigating 2025-05-22T16:38:41.567Z

We are continuing to investigate this issue.

Investigating 2025-05-22T16:29:05.939Z

We are continuing to investigate this issue.

Investigating 2025-05-22T16:12:24.780Z

We are experiencing a database issue and are investigating its cause.

May 19, 2025

Report: "SLO Performance Degradation"

Last update 2025-05-19T19:17:04.235Z

Resolved 2025-05-19T19:17:04.217Z

Engineering has released a fix, and as of 18:45 UTC, customers should no longer experience issues with creating, editing, updating, or deleting SLOs. We have observed a sustained period of recovery and now consider this issue resolved. No further updates will be provided.

Investigating 2025-05-19T18:41:20.486Z

As of 17:55 UTC, we became aware of a performance issue affecting our Cloud App Platform. Affected users may be unable to create, edit, update, or delete SLOs. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

May 16, 2025

Report: "Load test metrics not being ingested during execution"

Last update 2025-05-16T18:09:05.250Z

Resolved 2025-05-16T18:09:05.232Z

This incident has been resolved.

Monitoring 2025-05-16T18:00:29.091Z

We are continuing to monitor for any further issues.

Monitoring 2025-05-16T17:59:42.372Z

A fix has been implemented and we are monitoring the results.

Investigating 2025-05-16T17:14:39.211Z

We are currently investigating this issue.

May 14, 2025

Report: "TraceQL metric queries not returning recently ingested data in us-east-2 region."

Last update 2025-05-14T10:31:00.047Z

Resolved 2025-05-09T12:00:00.000Z

Starting May 9th 2025 at 12:00 UTC, Hosted Traces experienced a partial read path outage in the us-east-2 region: TraceQL metric queries were not returning recently ingested data. After an investigation, a fix was deployed and the incident was resolved on May 14th at 10:00 UTC. Some data gaps may remain until they are eventually filled; we have confirmed that no data was lost.

Report: "Some Hosted Grafana instances in prod-ap-south-1 experiencing degraded performance"

Last update 2025-05-14T03:57:14.426Z

Resolved 2025-05-14T03:57:14.408Z

We have observed a sustained period of recovery and now consider this issue resolved.

Monitoring 2025-05-14T01:40:55.804Z

Engineering has released a fix, and as of 1:40 AM UTC, customers should no longer experience degraded performance in prod-ap-south-1. We will continue to monitor for recurrence and provide updates accordingly.

Investigating 2025-05-14T00:12:56.940Z

Beginning at approximately 22:30 UTC, we identified some Hosted Grafana instances in prod-ap-south-1 experiencing degraded performance, presenting as either unavailable or slow. We are currently investigating this issue.

May 12, 2025

Report: "Private Datasource Connect (PDC) Agent - Partial Outage"

Last update 2025-05-12T21:35:48.458Z

Resolved 2025-05-12T21:35:48.436Z

We have observed a sustained period of recovery and now consider this issue resolved. No further updates will be provided.

Monitoring 2025-05-12T21:06:58.109Z

As of 20:32 UTC, customers should no longer experience any issues with the Private Datasource Connect (PDC) Agent. We will continue to monitor for recurrence and provide updates accordingly.

Identified 2025-05-12T21:06:44.839Z

As of 18:52 UTC, we were alerted to an issue with the Private Datasource Connect (PDC) Agent. Affected users may have encountered a "Connection Status Not Available" error for any PDC Agent configured in their environment for the duration of the incident. Additionally, users may have experienced 504 Gateway Timeout errors and/or dashboards that query data from PDC Agents failing to load data.

May 6, 2025

Report: "Synthetic monitoring: browser synthetics elevated failure rates in Oregon public probe"

Last update 2025-05-06T21:34:14.835Z

Resolved 2025-05-06T21:34:14.814Z

This incident has been resolved.

Monitoring 2025-05-06T21:32:43.048Z

A fix has been deployed, and browser checks have been running normally since 21:25 UTC. We are continuing to monitor.

Investigating 2025-05-06T20:51:04.732Z

We are investigating an issue with the Oregon probe where browser synthetics have delayed execution and higher failure rates. Other synthetic check types (protocol-level and k6 scripted) are currently unaffected.

May 5, 2025

Report: "Instances Inaccessible in prod-us-central-0"

Last update 2025-05-05T05:29:51.274Z

Resolved 2025-05-05T05:29:51.255Z

Engineering has applied a fix and as of 5:30 AM UTC, customers should no longer experience an issue with accessing their instance. At this time, we are considering this issue resolved.

Investigating 2025-05-05T05:07:09.003Z

We were alerted to an issue with accessing instances hosted in the prod-us-central-0 cluster. The issue started at approximately 01:00 UTC. Affected users may encounter a "connection error" page when attempting to log in to their instance. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

May 2, 2025

Report: "Slowdown on Loki Write Path in prod-eu-north-0 Region"

Last update 2025-05-02T14:02:13.084Z

Resolved 2025-05-02T14:02:13.065Z

The incident has been resolved.

Identified 2025-05-02T13:07:32.502Z

The Loki service experienced a brief slowdown in the write path. There is no impact to customer data. We are actively working on mitigation.

Report: "Brief Slowdown on Loki Write Path in prod-eu-north-0 Region"

Last update 2025-05-02T09:21:54.400Z

Resolved 2025-05-02T09:21:54.386Z

The incident has been resolved.

Monitoring 2025-05-02T08:38:32.164Z

Loki experienced a brief slowdown in the write path starting at 6:10 UTC. There was no impact to customer data. The situation has been mitigated, and we are monitoring the service.

Apr 29, 2025

Report: "Some Instances in prod-us-east-0 Experienced Issues Accessing Their Instance."

Last update 2025-04-29T15:09:23.635Z

Resolved 2025-04-29T15:09:23.623Z

At approximately 14:20 UTC we observed an issue that caused some deployments in prod-us-east-0 to experience long load times, or issues accessing their instance. The issue was resolved shortly before 15:00 UTC.

Apr 27, 2025

Report: "API offline in GCP Singapore (prod-ap-southeast-0) and GCP Brazil (prod-sa-east-0)"

Last update 2025-04-27T20:28:56.060Z

Resolved 2025-04-27T20:28:56.043Z

API came online again at 19:14 UTC in GCP Singapore and 19:21 UTC in GCP Brazil. While the API was unavailable, probes were unaffected. Synthetic monitoring checks in all probes continued to run and publish data during this incident. The incident was caused by earlier database maintenance in these regions.

Monitoring 2025-04-27T19:52:19.954Z

A fix has been implemented and we are monitoring the results.

Investigating 2025-04-27T18:48:05.949Z

The synthetics API is currently offline in both GCP Singapore (prod-ap-southeast-0) and GCP Brazil (prod-sa-east-0). We are investigating the issue.

Apr 25, 2025

Report: "Cloud Migration Assistant Disabled in Some Regions"

Last update 2025-04-25T17:51:57.406Z

Resolved 2025-04-25T17:51:57.391Z

This incident has been resolved.

Investigating 2025-04-22T16:09:35.836Z

The Cloud Migration Assistant feature is currently disabled in the following regions due to an issue discovered by our engineers: AWS UAE, AWS Indonesia, and GCP Saudi Arabia. We are working on a fix for this issue in the coming days. Updates will be provided here as they become available to our teams.

Apr 22, 2025

Report: "Logs Drilldown Broken for Some Users"

Last update 2025-04-22T22:36:35.418Z

Resolved 2025-04-22T22:36:35.404Z

This incident has been resolved.

Monitoring 2025-04-22T21:35:04.941Z

A fix has been implemented, and we are monitoring the results.

Investigating 2025-04-22T20:26:26.242Z

We are currently investigating an issue that is causing an "App Not Found" error for some users when attempting to access logs drilldown.

Apr 17, 2025

Report: "Email notifications issue"

Last update 2025-04-17T06:39:13.950Z

Resolved 2025-04-17T06:39:13.934Z

We have observed a sustained period of recovery and now consider this issue resolved.

Monitoring 2025-04-17T01:00:56.055Z

The issue has been mitigated, and a root cause analysis is currently underway.

Identified 2025-04-16T15:23:51.928Z

The issue has been mitigated, and a root cause analysis is currently underway.

Investigating 2025-04-16T13:10:45.997Z

Some users may not receive alert email notifications or other email service messages. We are working to resolve the issue as soon as possible.

Apr 15, 2025

Report: "Graphite proxy ingestion failing - prod-ap-northeast-0"

Last update 2025-04-15T09:27:20.421Z

Resolved 2025-04-15T09:27:20.405Z

This incident has been resolved.

Monitoring 2025-04-15T09:10:06.096Z

A fix has been implemented and we are monitoring the results.

Investigating 2025-04-15T08:58:55.264Z

We are currently investigating an issue affecting Graphite proxy ingestion. Our engineering team is actively investigating and will provide updates as more information becomes available.

Apr 11, 2025

Report: "AWS Firehose Logs Down in Prod-US-East-2"

Last update 2025-04-11T19:55:52.417Z

Resolved 2025-04-11T19:55:52.398Z

This incident has been resolved.

Investigating 2025-04-11T18:45:17.642Z

Our team is currently investigating an issue affecting AWS Logs traffic sent via Firehose in the us-east-2 region. Affected users receive an "Unable to connect to the destination endpoint. Contact the owner of the endpoint to resolve this issue." error message.

Report: "CloudWatch datasource query issues on cluster GCP US Central"

Last update 2025-04-11T08:03:44.454Z

Resolved 2025-04-11T08:03:44.434Z

We have observed a sustained period of recovery and now consider this issue resolved.

Monitoring 2025-04-09T08:40:12.249Z

Our engineering team is applying fixes for the issue and monitoring the behavior.

Identified 2025-04-08T14:52:26.836Z

Since approximately 9:40 UTC today, we have been observing connectivity errors on CloudWatch datasources in the US Central cluster. This affects data queries and can affect alerts based on CloudWatch datasources.

Apr 10, 2025

Report: "Synthetic Monitoring: connectivity issue between probes and GCP Saudi Arabia (prod-me-central-0)"

Last update 2025-04-10T20:47:58.958Z

Resolved 2025-04-10T20:47:24.000Z

As of 20:29 UTC, the incident is resolved: all probes have reconnected to the API in GCP Saudi Arabia (prod-me-central-0).

Investigating 2025-04-10T19:26:29.221Z

We continue to monitor connectivity issues between probes and GCP Saudi Arabia. As of 19:15 UTC, all public probes are unable to connect to GCP Saudi Arabia to receive synthetic monitoring changes. Probes continue to send metrics and logs to GCP Saudi Arabia, with the exception of Paris.

Investigating 2025-04-10T17:30:46.463Z

Starting at 16:55 UTC, the Paris probe is not receiving updates from or publishing data to the GCP Saudi Arabia Grafana Cloud region. We are investigating the issue.

Report: "Prod-EU-West-2 Connection Issues"

Last update 2025-04-10T16:44:35.168Z

Resolved 2025-04-10T16:44:35.158Z

At approximately 16:00 UTC some customers in the prod-eu-west-2 region began experiencing connectivity issues with their environment. The issue was remediated by our engineers ~30 minutes later.

Report: "Error Creating Teams"

Last update 2025-04-10T08:02:29.328Z

Resolved 2025-04-10T08:02:29.311Z

The fix has been rolled out and at this time, we are considering this resolved.

Identified 2025-04-09T13:54:07.373Z

The issue has been identified, and we are in the process of rolling out a fix.

Identified 2025-04-09T12:48:01.783Z

We are currently investigating an issue where some customers are unable to create a new Team in Grafana Cloud.

Report: "Issue with Multiple Components in prod-eu-west-2 and prod-eu-west-4"

Last update 2025-04-10T02:06:43.878Z

Resolved 2025-04-10T02:06:43.857Z

This incident has been resolved.

Monitoring 2025-04-10T01:27:41.920Z

As of 1:15 AM UTC, we were alerted to an issue with the following components in the prod-eu-west-2 and prod-eu-west-4 clusters. Users affected by these issues may encounter:

- Authentication: failing
- Alerting: partial degradation of rule evaluation as well as configuration syncs
- Loki: errors on the read and write paths
- SLO plugin: down, unable to access page

Engineering is actively engaged and has noted that the issue is stabilizing. We are continuing to monitor this.

Apr 8, 2025

Report: "Degraded Performance of Adaptive Metrics in GCP US Central"

Last update 2025-04-08T17:19:08.463Z

Resolved 2025-04-08T17:19:08.449Z

This incident has been resolved.

Investigating 2025-04-08T14:01:19.572Z

Since 06:50 UTC today, we have been experiencing degraded performance of Adaptive Metrics recommendations in the GCP US Central cluster. As a result, rule recommendations may be outdated.

Report: "Log ingestion rate limit issue in AWS Germany (logs-prod-012)"

Last update 2025-04-08T13:09:25.842Z

Resolved 2025-04-07T13:00:00.000Z

Between 2025-04-07 13:00 UTC and 2025-04-08 14:00 UTC, some customers in AWS Germany (logs-prod-012) had their ingestion rate limits incorrectly lowered, causing them to hit rate limits when writing their logs. The correct rate limits have been reinstated, and we can confirm that affected customers are no longer being rate limited.

Report: "Multiple components of Grafana down"

Last update 2025-04-08T07:20:47.415Z

resolved2025-04-08T07:20:47.370Z

This incident has been resolved.

investigating2025-04-08T06:06:56.000Z

We have also been alerted that this incident is affecting Mimir in the prod-us-east-0 cluster.

investigating2025-04-08T05:46:46.559Z

We are continuing to investigate this issue.

investigating2025-04-08T05:42:01.467Z

At 5:30am UTC, we were alerted to an issue affecting the following components. Affected users may encounter:
- Authentication failing
- Alerting: unable to evaluate rules
- Loki: errors on the read and write paths
- SLO plugin down in the prod-us-east-0 cluster, with the page inaccessible

Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Apr 7, 2025

Report: "Intermittent 500 Errors for Log Queries"

Last update 2025-04-07T09:15:11.538Z

resolved2025-04-07T06:45:00.000Z

Some Loki queries returned HTTP 500 responses between 06:45 and 07:15 UTC.

Apr 6, 2025

Report: "Hosted Grafana Instance Creation Failures in ap-southeast-1 (Singapore)"

Last update 2025-04-06T17:45:07.995Z

resolved2025-04-06T17:45:07.974Z

This incident has been resolved.

monitoring2025-04-06T15:15:18.756Z

A fix has been implemented and we are monitoring the results.

identified2025-04-06T15:08:08.819Z

The issue has been identified and engineering is working on a fix.

investigating2025-04-06T14:42:14.000Z

We are currently investigating an issue affecting Hosted Grafana infrastructure in the ap-southeast-1 (Singapore) region. Some users may experience failures when provisioning new resources or instances. At this time, existing instances appear to be unaffected. Our engineering team is actively investigating and will provide updates as more information becomes available.

Apr 4, 2025

Report: "Grafana Cloud: SAML UI Role mapping"

Last update 2025-04-04T01:27:28.009Z

resolved2025-04-04T01:27:27.986Z

Engineering has rolled out a fix for this issue. At this point, we are considering the incident resolved.

identified2025-04-02T17:08:27.879Z

Engineering has identified additional issues that need to be addressed to fully resolve the SAML UI role mapping behavior for Hosted Grafana users. At this time, users may still encounter problems when attempting to apply changes to SAML role mappings. A fix has been identified and is currently being implemented. We’ll share further updates as they become available.

Report: "Instances on "Fast" experiencing issues querying the Tempo datasources"

Last update 2025-04-04T01:25:06.750Z

resolved2025-04-04T01:25:06.726Z

The fix has been rolled out on our "fast" release channel. At this point, we are considering the incident resolved.

identified2025-04-03T19:56:05.043Z

The issue has been identified, and a fix is being rolled out.

investigating2025-04-03T15:18:12.008Z

We are currently investigating an issue causing instances on the "fast" release channel to encounter errors when querying the Tempo datasource.

Apr 2, 2025

Report: "Grafana Cloud: SAML UI Role Mapping"

Last update 2025-04-02T06:32:52.065Z

resolved2025-04-02T06:32:52.048Z

Engineering has released a fix, and as of 06:00 UTC customers should once again be able to change SAML role mappings. At this time, we are considering this issue resolved. No further updates.

identified2025-04-01T22:45:11.515Z

Engineering has identified the issue and is currently implementing remediation. We will continue to provide updates as more information is shared.

investigating2025-04-01T18:14:38.000Z

As of 18:00 UTC, we became aware of an issue with the Grafana Cloud SAML UI. Affected users may be unable to save role mapping changes. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Mar 28, 2025

Report: "Hosts unreachable for instances in prod-sa-east-1"

Last update 2025-03-28T07:59:17.322Z

resolved2025-03-28T07:59:17.313Z

From 6:23 UTC to 6:39 UTC, we became aware of an issue with instances in the prod-sa-east-1 cluster. Affected users may have seen requests to instances in this cluster fail. This has stabilised, and customers should no longer experience this issue.

Mar 26, 2025

Report: "Not possible to create stacks on prod-us-west-0"

Last update 2025-03-26T14:03:31.164Z

resolved2025-03-26T14:03:31.150Z

The stack creation process is now functioning properly in the prod-us-west-0 region.

monitoring2025-03-26T13:53:06.431Z

A fix has been implemented and we are monitoring the results.

investigating2025-03-26T13:39:07.774Z

We are experiencing issues with stack creation in the prod-us-west-0 region. Adding and removing plugins is also impacted.

Mar 20, 2025

Report: "Some Grafana Instances Taking Longer to Initialize"

Last update 2025-03-20T13:33:23.427Z

resolved2025-03-20T13:33:23.407Z

This incident has been resolved.

monitoring2025-03-18T19:56:32.990Z

We are continuing to monitor for any further issues.

monitoring2025-03-18T14:27:59.890Z

A fix has been implemented, and we are monitoring the results.

identified2025-03-17T18:49:59.874Z

The issue has been identified, and we are working on a fix.

Mar 18, 2025

Report: "prod-us-central-3 prod-us-central-4 clusters are unavailable"

Last update 2025-03-18T19:55:35.921Z

resolved2025-03-18T19:55:35.906Z

This incident has been resolved.

monitoring2025-03-18T19:07:20.000Z

Starting at 18:15 UTC, a recent networking configuration change caused the prod-us-central-3 and prod-us-central-4 clusters to become unavailable. As of 19:04 UTC, a fix has been deployed and we are monitoring the results.

Mar 17, 2025

Report: "Test failures on k6"

Last update 2025-03-17T17:56:32.279Z

resolved2025-03-17T17:56:32.263Z

This incident has been resolved.

investigating2025-03-17T16:12:11.509Z

Some users may find that their tests do not run and are aborted by the system. We are investigating this issue.

Report: "Some Grafana Instances Taking Longer to Initialize"

Last update 2025-03-17T17:32:50.635Z

resolved2025-03-17T17:32:50.616Z

This incident has been resolved.

identified2025-03-17T16:53:53.187Z

We have identified the issue and are mitigating.

investigating2025-03-17T16:28:09.778Z

We are currently investigating an issue that is causing some customers to experience longer-than-expected initialization times.

Mar 13, 2025

Report: "Issue with Grafana Access in prod-ap-south-1 and prod-ap-northeast-0"

Last update 2025-03-13T05:59:32.268Z

resolved2025-03-13T05:59:32.250Z

We have observed a sustained period of recovery. At this time, we are considering this issue resolved.

monitoring2025-03-13T04:34:52.530Z

We have observed marked improvement with Grafana access. Customers should no longer experience this issue. We will continue to monitor and provide updates.

investigating2025-03-13T03:58:08.145Z

We have discovered that this is also affecting instances in the prod-us-east-0 cluster. Our Engineering team is still investigating and assessing the issue. We will provide further updates accordingly.

investigating2025-03-13T03:19:18.361Z

As of 3am UTC, we were alerted to an issue with accessing Grafana for instances in the prod-ap-south-1 and prod-ap-northeast-0 clusters. Users experiencing this issue may encounter an error message when attempting to log in to Grafana. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

Mar 12, 2025

Report: "Degraded performance when querying metrics on AWS Germany"

Last update 2025-03-12T15:46:29.213Z

resolved2025-03-12T12:30:00.000Z

On March 12, between 13:30 UTC and 13:50 UTC, we experienced degraded performance when querying metrics on the AWS Germany cluster. During that period, metrics may not have been visible on dashboards, and alert queries returned errors. The issue has been resolved.

Mar 11, 2025

Report: "Delay in aggregated series through adaptive metrics"

Last update 2025-03-11T21:06:21.549Z

resolved2025-03-11T21:00:13.000Z

All series are now caught up and the incident is resolved.

identified2025-03-11T20:52:24.000Z

We are continuing to monitor the fix for this issue.

identified2025-03-11T15:50:57.000Z

Aggregated series through adaptive metrics have been delayed since ~15:50 UTC in prod-us-central-0/cortex-prod-04. No data has been lost; it is only delayed for querying. We've put a fix in place, and the series are catching up now. We're continuing to monitor.

Mar 7, 2025

Report: "Write path down in prod-us-east-2-prometheus-prod-56"

Last update 2025-03-07T02:15:51.799Z

resolved2025-03-07T02:15:51.782Z

This incident has been resolved.

investigating2025-03-06T22:44:21.701Z

We are continuing to investigate this issue.

investigating2025-03-06T22:41:16.000Z

We are currently investigating an issue with the write path in prod-us-east-2-prometheus-prod-56.

Mar 6, 2025

Report: "Read errors in prod-eu-west-0"

Last update 2025-03-06T16:49:19.669Z

resolved2025-03-06T16:49:19.633Z

Many ingesters were evicted from nodes in cortex-prod-01 at once causing a read path outage. Once the ingesters were rescheduled the read path recovered. The errors lasted about 10 minutes.

monitoring2025-03-06T16:17:45.812Z

A fix has been implemented and we are monitoring the results.

investigating2025-03-06T16:16:14.914Z

At 3:35pm UTC, we were alerted to a Mimir read path outage in prod-01-eu-west-0. Affected users may have encountered timeouts on Prometheus/metrics queries. Service has recovered, but we are continuing to monitor.

Mar 5, 2025

Report: "Longer than expected load times in multiple AWS regions"

Last update 2025-03-05T23:27:37.276Z

resolved2025-03-05T23:27:37.253Z

This incident has been resolved.

monitoring2025-03-05T16:02:52.000Z

Our engineering team has taken temporary measures to alleviate the impact on customers. We will continue to provide updates as more information becomes available.

identified2025-03-05T12:45:28.000Z

Our engineering team is still investigating the root cause of the issue and has taken temporary measures to alleviate the impact on customers. We will continue to provide updates as more information becomes available.

identified2025-02-26T21:35:01.347Z

We are seeing this issue occur in multiple AWS regions. Our engineering team is currently engaged in remediating this issue.

monitoring2025-02-26T15:39:00.000Z

We are continuing to monitor for further issues. The issue is scoped to AWS and is only significantly observed in the prod-us-east-0 and prod-eu-west-2 regions. We have also retroactively changed the title of this status to accurately reflect the scope.

monitoring2025-02-24T22:22:37.627Z

Our Engineering team has identified that https://status.grafana.com/incidents/q6k82s6s7sj0 is related to this issue. We are continuing to monitor our services and will provide further updates accordingly.

monitoring2025-02-24T19:59:01.712Z

Engineering has released a fix, and customers should no longer be experiencing this issue. We will continue to monitor for recurrence and provide updates accordingly.

identified2025-02-24T18:26:34.141Z

Engineering has identified the issue and is currently exploring remediation options. At this time, users may experience instances taking 15+ minutes to come off the "Grafana is Loading" page. We will continue to provide updates as more information is shared.

Feb 28, 2025

Report: "Degraded performance due to overloaded internal queues"

Last update 2025-02-28T13:10:19.933Z

resolved2025-02-28T13:09:56.569Z

Our internal queues were affected by a scheduled data migration. For around three hours, the processing of asynchronous scheduled tasks was delayed, but none were cancelled or lost. In particular, scheduled tests may not have run at their intended times during that period.