Historical record of incidents for Datadog Govcloud
Report: "Delayed monitor notifications"
Last update: We have confirmed recovery from delays in Monitors Notifications involving monitors which rely on distribution metrics as of 17:49 UTC.
We have deployed a fix for delays in Monitors Notifications involving monitors which rely on distribution metrics, which began at 15:20 UTC, and we are monitoring the fix.
We are still investigating delays in Monitors Notifications involving monitors which rely on distribution metrics, which began at 15:20 UTC.
Report: "Delayed monitor notifications"
Last update: We are investigating delays in Monitors Notifications involving monitors which rely on distribution metrics, which began at 15:20 UTC.
Report: "Delayed [CLOUD PROVIDER: AWS/GCP/...] Events"
Last update: This incident has been resolved.
We are recovering from increased latency processing monitors from cloud providers. Some users may receive delayed alerts from cloud integration monitors that activated after 13:45 UTC.
Report: "Delayed [CLOUD PROVIDER: AWS/GCP/...] Events"
Last update: We are recovering from increased latency processing monitors from cloud providers. Some users may receive delayed alerts from cloud integration monitors that activated after 13:45 UTC.
Report: "APM Traces not linked to RUM Events"
Last update: This incident has been resolved.
We are investigating an issue preventing RUM events from being linked to APM traces. As a result of this, users will be unable to see APM traces within their RUM events.
Report: "APM Traces not linked to RUM Events"
Last update: We are investigating an issue preventing RUM events from being linked to APM traces. As a result of this, users will be unable to see APM traces within their RUM events.
Report: "Delayed Logs"
Last update: This incident has been resolved.
All affected products are now fully recovered; we are monitoring to confirm that they remain healthy.
We are seeing signs of recovery for all customers and products as our mitigation proceeds. Most customers should now have all current data for all products; we are finishing mitigation for a small percentage of data. Next update in 60 minutes at most.
We are continuing to work on rolling out a fix for this issue, and still expect resolution by 10:45 pm Eastern time. We will update again within the next 60 minutes.
We are continuing to work on rolling out a fix for this issue, and expect resolution no later than 10:45 pm Eastern time. We will update again within the next 60 minutes.
We are continuing to work on rolling out a fix for this issue, but do not yet have an ETA for resolution. We will update again within the next 30 minutes.
We have identified the issue impacting these products and are implementing a fix. We will post another update within 30 minutes.
We are investigating increased latency processing some data in Logs, Trace Analytics, Synthetics, and RUM. As a result of this issue, some users may see delays or gaps for data in their queries in these products. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Delayed Metrics"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Delayed Monitors Notifications"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating delays in Monitors Notifications, which began at 15:07.
Report: "Degraded Web Application Performance"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We have identified the underlying issue and are continuing to work on a fix. Degraded web application performance is primarily observed in customers with low network bandwidth.
We have identified the underlying issue and are working on a fix.
We are investigating degraded performance with the web application.
Report: "Elevated Error Rates for Log Management"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are actively investigating elevated error rates for Log Management, Events, and RUM. As a result of this issue, some users may experience errors querying logs, events or RUM sessions, and may see missing context in monitor alerts that should contain logs or events.
Report: "Public Website unavailable"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We are currently investigating this issue.
Report: "Delayed Monitors Notifications"
Last update: This incident has been resolved.
The issue causing delays in Monitor Notifications for monitors with metric queries over time windows longer than one hour has been largely resolved. The impact is now limited to a small percentage of monitors affecting a subset of customers. We are closely monitoring the situation and will provide a final update once the issue is fully resolved.
We have deployed a fix and are currently monitoring the results. Further updates will be provided as we continue to make progress toward full recovery.
We are investigating delays in Monitors Notifications for metric queries with time windows longer than 1 hour, which began at 7:35 PM UTC.
Report: "Metric monitor evaluations and Log Archives degraded"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are experiencing issues processing cloud integrations, which is resulting in delayed aws.* integration metrics and issues with Log Archives. We have disabled notifications relying on these metrics. We are investigating the issue and will provide additional information as it becomes available.
Report: "Web Application / API Issues"
Last update: This incident has been resolved.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application, or pages will be missing certain elements. Please note that data processing and alerts are not affected by this incident.
Report: "Delayed Metrics"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Issues with Dashboards"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating an issue with loading dashboards. As a result, dashboards may be unreachable.
Report: "Metrics Issues"
Last update: The incident has been resolved.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Delayed Distribution Metrics"
Last update: Monitors and live data are no longer affected and all live data is shown without delays. Unfortunately we are not able to recover data for distribution metrics with percentiles disabled from 19:00-19:40 UTC and those metrics will display gaps. You can see your affected metrics here: https://app.ddog-gov.com/metric/summary?facet.percentiles=-enabled&filter=dist.
Remediation for this issue is still in progress. Monitors are no longer affected, and live data are no longer delayed, but some users might continue to see gaps in historical data on metrics and dashboards until remediation is complete.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing Distribution Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Delays in Metrics"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are identifying a delay in metrics processing. This impacts dashboards and metrics-based monitors.
Report: "Web Application Not Loading"
Last update: This incident has been resolved.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.
Report: "Degraded Web Application Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating degraded performance with the web application.
Report: "Logs query results returning unexpected results for some queries and monitors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
A fix has been rolled out and we are currently monitoring to confirm full resolution.
We have successfully tested a fix for this issue and are currently deploying it to resolve this incident.
We're still working on a fix for historical data impacted by this incident.
We're still working on a fix for historical data impacted by this incident.
We have identified an issue for querying on logs timestamped between 2023-10-02 16:40 UTC and 2023-10-03 20:40 UTC. A fix for this issue is currently being tested. All queries on logs with timestamps from outside of this timeframe are working as expected.
Report: "Delays in processing log monitors"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We are investigating delays in Monitors Notifications for log monitors, which began at 16:18 UTC.
Report: "Delayed Metrics"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Login Errors"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We are continuing to investigate this issue.
We are investigating user login issues with the web application. Please note that data processing and alerts are not affected by this incident.
Report: "Issues with Loading Dashboards"
Last update: This issue is now resolved. Our investigation revealed that only a small subset of customers were impacted by this incident.
We've partially mitigated the issue, and Logs pages are now fully available. Errors loading Dashboards persist while we continue to work towards full resolution.
We've identified an issue causing failures on a subset of pages on the web application, including Dashboards and Logs Explorer. Data ingestion and alerting are unaffected.
We are investigating an issue with loading dashboards. As a result, dashboard pages may return errors.
Report: "Degraded Web Application Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating degraded performance with the web application.
Report: "Delayed Monitors Notifications"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We are continuing to investigate this issue.
We are continuing to investigate the issue.
We are investigating delays in Monitors Notifications, which began at 21:17.
Report: "Elevated Error Rates for Metrics Queries"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the service is fully operational.
We are actively investigating elevated error rates for Metrics Queries. As a result of this issue, some users may see errors with metrics graphs on the web application or API.
Report: "[SSO] Login Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating user login issues with the web application [via SSO]. The issue causes the "Login with SAML" button to not appear for some users. While we work on a fix, users may contact support@datadoghq.com to get the correct link to log in with SAML.
Report: "Synthetics UI Errors"
Last update: This incident has been resolved.
We've identified the issue and are working on a fix. Synthetics tests and monitors are continuing to operate without impact.
We are investigating errors that occur when using the Synthetics UI. Synthetics tests and monitoring are continuing to run without issue.
Report: "Delayed Monitors Notifications"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating delays in Monitors Notifications, which began at 22:40 UTC.
Report: "Data processing delays"
Last update: This incident has been resolved.
We are investigating increased latency processing data. As a result of this issue, some users may see delays or gaps in the impacted products.
Report: "Live Process Monitors"
Last update: This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We are investigating increased latency processing Processes data. As a result of this issue, some users may see delays or gaps for data based on Process Monitoring.
Report: "GovCloud Crawler Delayed Data"
Last update: We observed AWS API issues in us-gov-west-1 for the RDS and ElastiCache services. As a result, there was a delay in displaying these metrics. We have confirmed that the data has been backfilled and that no data was lost due to this incident.
Report: "Delayed Cloud Provider Metrics"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We are investigating increased latency processing all Cloud Provider metrics. We have identified the issue and are working on a fix. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Report: "Delayed Monitors Notifications and cloud provider metrics"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are investigating delays in Monitors Notifications, which began at 12:55 PM UTC, as well as increased latency processing cloud provider events.
Report: "Issues with loading Infrastructure List and Map"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are investigating issues with loading the infrastructure lists and maps. Please note that data processing and alerts are not affected by this incident.
Report: "Web Application Not Loading"
Last update: This incident has been resolved.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.
Report: "Web Application Not Loading"
Last update: Live data and monitors have now recovered. At this point, customers might still see gaps in metrics data between 1:25 and 3:25 EDT; we will be following up with specific affected customers through our usual support channels.
Our provider has resolved the underlying network issue. We are now scaling up our systems to handle the backlog.
Network connectivity issues in the region are continuing to cause issues loading the application, delaying data and alerts in the Datadog Govcloud region. Our provider has acknowledged the issue and is working to resolve it. We have also begun our own mitigations.
Network connectivity issues in the region are still causing issues loading the application, delaying data and alerts in the Datadog Govcloud region. Our provider has acknowledged the issue and is working to resolve it.
We are continuing to investigate this issue with our provider. Network connectivity issues in the region are still causing issues loading the application, delaying data and alerts in the Datadog Govcloud region.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application.
Report: "Web Application is currently unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Metrics Processing Delays"
Last update: This incident has been resolved.
All metric-based monitors are fully restored, and we are continuing to restore log-based monitors as we work through relevant backlogs.
All crawler-based metrics are fully restored and current.
Realtime data should be available for all customers and we are actively working to restore delayed historical data. Most metric-based and log-based monitors are restored and we are working to restore the remainder shortly.
The web application is responding without errors. We are still working to restore timely processing of tracing, logs, metrics and alerts.
We have identified a performance issue affecting Metrics, Logs, APM and alerting. We are working to restore access.
We’re investigating increased metric latencies and API errors. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metrics Monitors.
Report: "Certain types of monitors are not being evaluated"
Last update: Monitors of type APM, Logs and RUM did not evaluate between 15:30 and 16:49 UTC. We’ve identified the problem and implemented a fix; this incident is now resolved.
Report: "Metrics Processing Delayed"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results. We will update the status when we have confirmed that error rates have returned to normal.
We have identified the underlying issue causing metric delays and are working on rolling out a fix.
We are investigating increased latency in Metrics processing. As a result of the latency, metrics on graphs may be delayed. To prevent spurious alerts, we have temporarily disabled notifications for Metric monitors. Please note that all data will be backfilled once services are operational.
Report: "Elevated error rates on the web application"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating an issue with an elevated error rate on the web application. It's important to note that monitoring data is properly processed and that no data is lost.
Report: "Elevated Web Application and API Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
We're experiencing an elevated level of errors in synthetics intake.