Datadog US3

Datadog US3 is currently Operational

Last checked from Datadog US3's official status page

Historical record of incidents for Datadog US3

Report: "Delayed Monitors Notifications"

Last update
identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Monitors Notifications, which began at 15:36 UTC.

Report: "Delayed Monitors Notifications"

Last update
investigating

We are investigating delays in Monitors Notifications, and submission errors for customers using private link, which began at 08:37 UTC.

Report: "Delayed Metrics"

Last update
Resolved

This incident has been resolved.

Monitoring

A fix has been implemented and we are monitoring the results. Full recovery for all impacted metrics is estimated to take up to 60 minutes. We will provide an update if recovery happens sooner.

Identified

The issue has been identified and a fix is being implemented.

Investigating

We are investigating increased latency processing Metrics impacting metrics generated from APM (traces), Logs, Synthetics, RUM, Containers, Integrations, DBM, Estimated Usage Metrics, and Distribution Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Metrics"

Last update
investigating

We are investigating increased latency processing Metrics impacting metrics generated from APM (traces), Logs, Synthetics, RUM, Containers, Integrations, DBM, and Distribution Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Distribution Monitors Evaluation, which began at 5:30pm UTC. Monitors for other types of metrics are evaluating as usual.

Report: "Delayed Monitors Notifications"

Last update
Investigating

We are investigating delays in Monitors Evaluation, which began at 5:30pm UTC.

Report: "Metrics delayed"

Last update
resolved

This incident has been resolved.

monitoring

All metrics data during the impacted window is available. We will begin re-enabling monitors with an evaluation window greater than 60 minutes. Monitors with an evaluation window of less than 60 minutes continue to be evaluated.

monitoring

We have identified the issues and are backfilling data. Monitors with an alert window of one hour or less have been restored, and live metrics data is available.

identified

We are continuing to work on a fix for this issue.

identified

For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data between 11:25 - 13:00 UTC time range are delayed. Metric queries and metrics monitors evaluating data after 13:00 UTC are correct and working as expected.

investigating

All metric monitor notifications have been delayed starting at 14:57 UTC. We are working on identifying the issue.

identified

We are continuing to work on a fix for this issue.

identified

For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.

monitoring

For the period May 2, 2025, 11:25 - 13:00 UTC, metrics are delayed. We are backfilling the data for that time period and anticipate no data loss. Metric monitors that include data in that time range are delayed. Metrics after 13:00 UTC are correct, and metric monitors that only consider that timeframe are working properly.

investigating

We are continuing to investigate this issue.

investigating

We’re investigating increased metric latencies. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metric Monitors.

Report: "Metrics delayed"

Last update
Investigating

We’re investigating increased metric latencies. Graphs may be delayed. To avoid spurious alerts, we’ve temporarily disabled alerts for Metric Monitors.

Report: "Login Issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating user login issues related to reCAPTCHA for customers using password login. If you experience an issue with reCAPTCHA, refreshing the page can often mitigate the issue. Please note that data processing and alerts are not affected by this incident.

Report: "Login Issues"

Last update
Resolved

This incident has been resolved.

Monitoring

A fix has been implemented and we are monitoring the results.

Update

We are continuing to work on a fix for this issue.

Identified

The issue has been identified and a fix is being implemented.

Investigating

We are investigating user login issues related to reCAPTCHA for customers using password login. If you experience an issue with reCAPTCHA, refreshing the page can often mitigate the issue. Please note that data processing and alerts are not affected by this incident.

Report: "Delayed Log Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating delays in Log Monitors Notifications, which began at 1:40 PM UTC.

Report: "Delayed Metrics Based Monitor Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in metrics based Monitor Notifications, which began at 20:20 UTC.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are continuing to work on a fix. Degraded web application performance is primarily observed in customers with low network bandwidth.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating degraded performance with the web application.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating degraded performance with the web application.

Report: "Web UI features may be hidden"

Last update
resolved

This incident has been resolved. Please refresh your Datadog web page to resolve the issue completely.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue that is causing certain features to be hidden from our UI. There is no data loss or monitoring impact.

Report: "Metric monitor evaluations delayed for aws.* metrics"

Last update
resolved

This incident has been resolved.

monitoring

Monitor evaluations for aws.* metrics are no longer affected. We will continue monitoring the recovery.

identified

The issue has been identified and a fix is being implemented.

investigating

We are experiencing issues with processing cloud integrations, which is resulting in delayed aws.* integration metrics. We have disabled notifications relying on these metrics. We are investigating the issue and will provide additional information as it becomes available.

Report: "CI Visibility - Page Load issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have identified an issue that prevents most Software Delivery pages from loading. Intelligent Test Runner, Quality Gates, and GitHub PR comments are also affected.

Report: "APM - degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating an issue with executing trace queries. The team is working on a fix.

Report: "Web Application Not Loading"

Last update
resolved

This incident has been resolved.

monitoring

A workaround has been implemented and we are monitoring the results.

investigating

We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.

Report: "Issues with data ingesting and alerting"

Last update
resolved

This incident has been resolved.

monitoring

Remediation efforts continue. RUM and Application Vulnerability Management are operational again. Cloud SIEM and Cloud Security Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

monitoring

Remediation efforts continue. Profiling is operational again. RUM, Cloud SIEM, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

monitoring

We continue to deploy fixes and are monitoring the results. We will provide another update once the issue is fully resolved.

monitoring

We continue to deploy fixes and are monitoring the results. RUM, Profiling, Cloud SIEM, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

monitoring

We continue to deploy fixes and are monitoring the results. We will provide another update once the issue is fully resolved.

monitoring

APM is operational at this time, and alerting based on APM data has also resumed. RUM, Profiling, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

monitoring

We have deployed a fix and are monitoring the results. Certain data types (Logs, NPM, and Synthetics) are operational again, and alerting from those types has also resumed. APM, RUM, Profiling, Cloud Security Management, and Application Vulnerability Management, as well as alerting off these data types, continue to be impacted. We will provide another update once the issue is fully resolved.

investigating

We are investigating an issue with ingesting data which began around 20:40 UTC. As a result, data from Log Management, APM, Synthetics, Profiling, RUM, CSM, and NPM is delayed. Additionally, monitors derived from this data are delayed.

Report: "We are investigating user login issues with the web application"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are continuing to investigate this issue.

investigating

We are investigating user login issues affecting email login to the web application. Please note that data processing and alerts are not affected by this incident.

Report: "Delayed events for Logs, Synthetics and Container"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating increased latency processing Logs, Synthetics Test Results and Container updates. As a result of this issue, some users may see delays or gaps for data in their logs queries and synthetics tests status. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Elevated Error Rates for Error Tracking"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results.

investigating

We are investigating increased errors in Error Tracking processing. As a result of this issue, some users may experience gaps in Error Tracking updates and alerts.

Report: "Delayed Monitors Notifications"

Last update
resolved

All monitor notifications have caught up.

investigating

We are continuing to investigate this issue.

investigating

We are investigating delays in Monitors Notifications, which began at 7:37 UTC.

Report: "Elevated Error Rates for Monitors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented. Monitors have recovered and live data is now available. Backfilling is ongoing for historical data.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are actively investigating elevated error rates for Monitors. As a result of this issue, some users may also experience issues with CI Visibility, NDM, NPM, Profiling, RUM, and Synthetics.

investigating

We are actively investigating elevated error rates for Monitors. Metrics monitors are not affected.

Report: "Delays in Processes Monitors Evaluation (Other Monitor types unaffected)"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Processes Monitors Evaluation, which began at 15:15 UTC. Note that this only affects monitors based on the Processes product. All other monitor types are unaffected.

Report: "User Login Issues and Delayed Synthetics/APM/Sketch Metrics"

Last update
resolved

This incident has been resolved.

monitoring

The majority of metrics processing delays have been resolved and we are actively monitoring the recovery for the remaining metrics.

identified

We are continuing to work on recovery. We will provide another update at 9:45 pm EST.

identified

We are continuing to work on recovery. We will provide another update at 9:15 pm EST.

identified

We are continuing to work on recovery. We will provide another update at 8:45 pm EST.

identified

We have identified the issue and are working towards recovery. We will provide another update at 8:15 pm EST.

investigating

We are continuing to investigate this issue.

investigating

We are investigating increased latency processing synthetics metrics, APM metrics, and sketch metrics. As a result, notifications for these monitor types may be delayed.

Report: "Elevated Errors for API Key Validation"

Last update
resolved

From 12:45 to 1:15 PM US EST, Datadog’s endpoint to validate Datadog API keys was unavailable. During this window, Datadog Agents would be unable to validate their API key. In all cases, Agents would continue to send data. Some Agents running in Kubernetes may be marked unhealthy until restarted. Newly started Agents would fail to start. Build jobs using our CI Visibility product would be missing custom tags and measures.

Report: "Elevated Error Rates for Metrics Queries"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are actively investigating elevated error rates for Metrics Queries. As a result of this issue, some users may see errors with metrics graphs on the web application or API. Metric monitor evaluations are also delayed as a consequence.

Report: "Elevated Error Rates for Metric Monitors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are actively investigating elevated error rates for Metric Monitors. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed monitor evaluation and query failures across multiple products"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We experienced an increased rate of query failure across multiple products including Logs Management, APM, RUM, Synthetics, CI Visibility, Error Tracking, Audit Logs, Database Monitoring and NPM. This resulted in delayed monitor evaluation and notification for a subset of monitors. We are monitoring recovery. Metric Monitors and queries are unaffected by this incident.

investigating

We are currently investigating this issue.

Report: "[SSO] Login Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating user login issues with the web application. Please note that data processing and alerts are not affected by this incident.

Report: "Data delays and web errors"

Last update
resolved

This incident has been resolved.

monitoring

All components have recovered now. We are replaying a few failed notifications.

identified

We are continuing to work on a fix for this issue.

identified

We have identified an issue which caused temporarily elevated error rates on our web application (500 error pages) and increased latency processing monitoring data. As a result of this issue, customers might still see some delayed metrics, as well as delays in monitors relying on this data. We are currently working on a fix and the data is being backfilled. We will provide another update once the service is fully operational again.

Report: "Delays in Logs ingestion"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and notifications will be caught up once the service is operational again.

investigating

We are investigating delays in Logs Ingestion which will delay log data and log-based monitors notifications. This began at 21:56 UTC.

Report: "Elevated Access Denied errors"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix. Users should be able to access the Datadog web application at this time, but may still see occasional errors.

investigating

We are investigating elevated error rates on our web application. As a result, some users might be getting Access Denied errors when loading the web application. Please note that data processing and alerts are not affected by this incident.

Report: "Elevated Error Rates for Log Queries and Monitors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been rolled out and we are currently monitoring to confirm full resolution.

monitoring

We have successfully tested a fix for this issue and are currently deploying it to resolve this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved. At this time, newly ingested data is properly queryable, and monitors targeting Logs sent from 2023-10-03 20:40 UTC onwards are valid. Queries targeting logs between 2023-10-02 11:40 UTC and 2023-10-03 20:40 UTC may return erroneous data. We are evaluating a fix that will restore query correctness for this time-window.
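The time window in this update can be checked against a log query's time range. The following is a minimal sketch, not part of Datadog's tooling: the constant names and the helper `overlaps_affected_window` are hypothetical, and the only data taken from the update is the window from 2023-10-02 11:40 UTC to 2023-10-03 20:40 UTC. It assumes half-open query ranges, so a query starting exactly at 20:40 UTC on 2023-10-03 counts as unaffected.

```python
from datetime import datetime, timezone

# Affected window as stated in the status update (UTC).
# The names below are illustrative, not part of any Datadog API.
AFFECTED_START = datetime(2023, 10, 2, 11, 40, tzinfo=timezone.utc)
AFFECTED_END = datetime(2023, 10, 3, 20, 40, tzinfo=timezone.utc)


def overlaps_affected_window(query_start: datetime, query_end: datetime) -> bool:
    """Return True if the half-open range [query_start, query_end)
    intersects the affected window [AFFECTED_START, AFFECTED_END)."""
    return query_start < AFFECTED_END and query_end > AFFECTED_START


if __name__ == "__main__":
    # A query over the first hour of 2023-10-03 falls inside the window.
    start = datetime(2023, 10, 3, 0, 0, tzinfo=timezone.utc)
    end = datetime(2023, 10, 3, 1, 0, tzinfo=timezone.utc)
    print(overlaps_affected_window(start, end))
```

Both arguments must be timezone-aware; comparing a naive `datetime` with the aware constants raises a `TypeError` in Python.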

identified

We have identified the underlying issue and are working on a fix.

investigating

We are continuing to investigate these issues, and will provide an update as soon as possible.

investigating

We are actively investigating issues with Log Queries returning unexpected results. As a result of this issue, some users may experience issues querying logs on the web application or API, and with Log-based Monitors and Log-based Metrics.

Report: "Delayed Metrics"

Last update
resolved

This incident has been resolved.

identified

Metrics are no longer delayed for any customers.

identified

We are continuing to work on a fix for this issue.

identified

We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Azure Native Integration Metrics"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We’re actively investigating increased latencies for collecting Azure Native Integration metrics due to third-party errors. As a result, there might be delays in graphs displaying these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Synthetic Browser Test Results"

Last update
resolved

We have scaled up the underlying system and we no longer observe latency in synthetic browser test results.

investigating

We have identified an issue that resulted in increased latency in executing Synthetics browser tests. As a result of this issue, some users may experience delays in receiving test results and notifications.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

We have identified the underlying issue, and are recovering. We are monitoring the recovery and will provide another update once the issue is fully resolved.

investigating

We are continuing to investigate this issue.

investigating

We are investigating loading issues on our web application and delays in ingesting metrics data and evaluating monitors on this data, which began at 18:51 UTC.

investigating

We are investigating degraded performance with the web application.

Report: "Some monitor notifications are delayed"

Last update
resolved

This incident has been resolved.

monitoring

We are investigating delays in Monitors Notifications, which began at 3:43 AM UTC. This only impacts monitors which rely on APM trace distribution metrics. We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

Report: "Delayed Azure Native Integration Metrics"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We’re actively investigating increased latencies for collecting Azure Native Integration metrics due to third-party errors. As a result, there might be delays in graphs displaying these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Metrics"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified an issue causing increased latency processing Metrics and are working on a fix. As a result of this issue, some users may see delays or gaps for metrics on graphs. To avoid spurious alerts, we’ve temporarily disabled “no data” alerts for Metric Monitors.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved. Delays may have been observed for a subset of Distribution Metrics Monitor notifications between 22:30 and 00:56 UTC.

investigating

We are investigating delays in Monitors Notifications for distribution metrics, which began at 22:30 UTC.

Report: "Delayed Synthetics tests results"

Last update
resolved

Backfill is finished. This incident has been resolved.

monitoring

All services are fully operational and processing live data. We have started to backfill Synthetics test results and will provide another update once the backfills are finished.

monitoring

We have deployed a fix and we are monitoring the results. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again. We will provide another update once the issue is fully resolved.

identified

We have identified an issue that resulted in increased latency in processing Synthetics test results and are working on a fix. As a result of this issue, some users may see delays with test results and in notifications based on this test data.

Report: "Web application performance degraded"

Last update
resolved

This incident has been resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating loading issues on our web application. As a result, some users might be getting errors or degraded performance when loading the web application, specifically on dashboards.

Report: "Web Application Not Loading"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified. Synthetic tests may be temporarily running with an outdated configuration and new Synthetic tests may not start immediately.

investigating

We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Monitors Notifications, which began at 21:24 UTC.

Report: "Backfilling historical data for March 8, 2023 incident"

Last update
resolved

We have finished backfilling data across all products: all data received during the incident that had been successfully buffered but unprocessed is now fully accessible on the platform. Due to the nature of this outage, you may see some residual gaps in the data we received within the first few hours after the start of the incident. We truly appreciate your patience and understanding during this incident.

monitoring

We have completed backfill of data for the following products:
* Database Monitoring
* Serverless Monitoring

We are now in the process of validating and verifying data across all customers in those products. For other products, we are actively working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

We have also completed backfilling data for the following products: RUM. We are now in the process of validating and verifying data across all customers in those products. For other products, we are actively working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

We have completed backfill of data for the following products:
* APM traces and services
* Logs
* Network Performance Monitoring
* Network Device Monitoring
* Profiling
* CI Visibility

We are now in the process of validating and verifying data across all customers in those products. For other products, we are actively working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

All Datadog services are now available and able to receive, query, and report on live data. Monitors continue to be evaluated correctly since live data has been restored. Some customers may still observe gaps in historical data for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

All Datadog services are now available and able to receive, query, and report on live data. Monitors continue to be evaluated correctly since live data has been restored. Some customers may still observe gaps in historical data for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

Monitors continue to be evaluated correctly since live data has been restored. Unless noted otherwise, all Datadog services are now available and able to receive and query live data. Some customers may still observe gaps in historical data for certain products for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

APM Traces and Error Tracking are operational. We will continue to monitor progress towards recovering the remaining services. Unless noted otherwise, all Datadog services are now available and able to receive, query, and report on live data. Some customers may still observe gaps in historical data for certain products for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

Unless noted otherwise, all Datadog services are now available and able to receive, query, and report on live data. Some customers may still observe gaps in historical data for certain products for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

monitoring

Unless noted otherwise, all Datadog services are now available and able to receive, query, and report on live data. Some customers may still observe gaps in historical data for certain products for parts of the last 24 hours. We are now working on backfilling data and will provide updates every 2 - 3 hours until the backfill effort is complete and the incident is fully resolved.

identified

APM Traces and Error Tracking are operational. We will continue to monitor progress towards recovering the remaining services.

identified

Security Monitoring is operational. SLOs are operational. Cloud Integrations are operational. Profiling recent data is available for queries. We will continue to monitor progress towards recovering the remaining services.

identified

RUM is fully operational. We will continue to monitor progress towards recovering the remaining services.

identified

Logs Management is operational; live data and alerting are back to normal. External Archives and Log Forwarding are still delayed. Metrics are fully operational. Serverless monitoring is operational. We will continue to monitor progress towards recovering the remaining services.

identified

Network Device Monitoring is fully operational. Metrics generated from Logs are now available. We will continue to monitor progress towards recovering the remaining services.

identified

We're in the process of enabling metric alerts for some customers for time windows less than 1 hour. Network Performance Monitoring is fully operational. Event Management is fully operational. Error Tracking is partially available. We will continue to monitor progress towards recovering the remaining services.

identified

The Synthetics product is fully operational. We're seeing partial recovery for Serverless Monitoring, as well as metrics from our cloud provider integrations. We will continue to monitor progress towards recovering the remaining services.

identified

Monitors for Logs and Service Checks are operational. Database Monitoring is operational. We will continue to monitor progress towards recovering the remaining services.

identified

Live data is now available for Logs, and CI Visibility is fully operational. We're seeing partial recovery for Watchdog. We will continue to monitor progress towards recovering the remaining services. Data ingestion and monitor notifications remain delayed across non-metric data types.

identified

We are continuing to work on a fix for this issue.

identified

Live Search on last 15 mins for APM Traces is recovered. We will continue to monitor progress towards recovering the remaining services. Data ingestion and monitor notifications remain delayed across non-metric data types.

identified

We're seeing partial recovery across several products including Security Monitoring, CI Visibility and Network Performance Monitoring. These products may have gaps in data and partial limitations based on data available to monitors. We will continue to monitor progress towards recovering the remaining services. Data ingestion and monitor notifications remain delayed across non-metric data types.

identified

We're seeing partial recovery across several products including SLOs and Logs. These products may have gaps in data and partial limitations based on data available to monitors. We will continue to monitor progress towards recovering the remaining services. Data ingestion and monitor notifications remain delayed across non-metric data types.

identified

Processes and their respective monitors, as well as Metrics, are operational in US3. There may be gaps in historical metric data. We continue to make progress towards recovering the remaining services. Data ingestion and monitor notifications remain delayed across non-metric data types.

identified

We are continuing to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

At 06:00 UTC on March 8th, 2023, the Datadog platform started experiencing widespread issues across multiple products and regions. The web application was unavailable or intermittently loading, and data ingestion and monitor evaluation were delayed.

We will share a more detailed analysis post-recovery, but at a very high level:
A system update on a number of hosts controlling our compute clusters caused a subset of these hosts to lose network connectivity.
As a result, a number of the corresponding clusters entered unhealthy states and caused failures in a number of the internal services, datastores and applications hosted on these clusters.

Our current status is:
We identified and mitigated the initial issue, and rebuilt our clusters.
We have also recovered a number of our applications and services, including our web portals.
We are now working on recovering and catching up the rest of our data systems for metrics, traces and logs across the regions that are still affected (see region-specific status pages).
The recovery work is currently constrained by the number and large scale of the systems involved.

What to expect next:
We are focusing on bringing back live data for all customers and all products before catching up on any historical data we may have stored during the outage.
We expect live data recovery in a matter of hours (not minutes, and not days).
We will continue to issue regular updates as the situation unfolds.

We understand how critical Datadog is to your business. We sincerely apologize for the inconvenience, and we are working hard to resolve this issue.

identified

We are continuing to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are continuing to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are continuing to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are continuing to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We continue to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We continue to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We continue to make progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are still working on the identified issue and are making continued progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are still working on the identified issue and are making continued progress towards recovering all services. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are still working on the identified issue and are making continued progress towards recovery. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are still working on the identified issue and are making continued progress towards recovery. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are still working on the identified issue and are making continued progress towards recovery. Data ingestion and monitor notifications remain delayed across all data types.

identified

We have identified the issue, and are making continued progress towards recovery. Data ingestion and monitor notifications remain delayed across all data types.

identified

We are continuing to work on mitigating and investigating the issue causing delayed data ingestion across all data types. Monitor notifications are delayed, and you may observe delayed data throughout the app. Additionally, the web application continues to have elevated error rates.

investigating

We are continuing to work on mitigating and investigating the issue causing delayed data ingestion across all data types. Monitor notifications are delayed, and you may observe delayed data throughout the app. Additionally, the web application continues to have elevated error rates.

investigating

We are continuing to work on mitigating and investigating the issue causing delayed data ingestion across all data types. Monitor notifications are delayed, and you may observe delayed data throughout the app. Additionally, the web application continues to have elevated error rates.

investigating

We are continuing to investigate this issue.

investigating

We are still investigating issues causing delayed data ingestion across all data types. Monitor notifications may be delayed, and you may observe delayed data throughout the web app.

investigating

We are still investigating issues causing delayed data ingestion across all data types. Monitor notifications may be delayed, and you may observe delayed data throughout the web app.

investigating

We are investigating issues causing delayed data ingestion across all data types. As a result monitor notifications may be delayed, and you may observe delayed data throughout the web app.

investigating

We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application.

Report: "GCP metrics delayed"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue with our metrics collection from Google Cloud Platform. Metrics collected from the Google Cloud Platform may be delayed.

Report: "Delayed Events"

Last update
resolved

This incident has been resolved. Remaining data are being processed.

monitoring

We are continuing to monitor for any further issues. Backfilling is still in progress.

monitoring

A fix has been implemented and we are monitoring the results. Recent data are being processed normally; older data impacted by the incident are currently being backfilled.

identified

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate the issue. Notifications are back to normal for all users, except for the ones sent to Microsoft Teams.

investigating

We are investigating delays in Monitors Notifications that impact a subset of customers. The delays began at 07:10 UTC on January 25th, 2023.

Report: "[SSO] Login Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating user login issues with the web application via SSO. The issue causes the "Login with SAML" button to not appear for some users. While we work on a fix, users may contact support@datadoghq.com to get the correct link to log in with SAML.

Report: "Issue processing cloud integration data."

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are experiencing issues processing cloud integrations, which is resulting in delayed integration metrics and delays in processing X-Ray traces. We have disabled notifications relying on these metrics. We are investigating the issue and will provide additional information as it becomes available.

Report: "Delayed Metrics"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Monitors Notifications and Events"

Last update
resolved

This incident has been resolved.

investigating

Delays in Monitor notifications have been resolved as of 03:15 UTC. We continue to investigate delays with Events together with our cloud service provider.

investigating

We are continuing to investigate this issue with our cloud service provider.

investigating

We continue to investigate delays in Monitors Notifications and Events, and we have raised the issue with our cloud service provider for further investigation.

investigating

We are investigating delays in Monitors Notifications and Events, which began at 00:33 UTC.

Report: "Composite monitors evaluations are failing"

Last update
resolved

This incident has been resolved. Composite monitors have been evaluated again since 8:44am GMT.

identified

Between 7:27am GMT and 7:57am GMT, half of the composite monitors were not evaluated for all customers. Since 7:57am GMT, none of them have been evaluated. Other types of monitors are not affected. We have identified the issue and a fix is being implemented.