Datadog US1

Is Datadog US1 down right now? Check whether an outage is ongoing.

Datadog US1 is currently Operational

Last checked from Datadog US1's official status page

Historical record of incidents for Datadog US1

Report: "Delayed processing of APM Trace Metrics"

Last update
investigating

We are investigating delayed processing of APM Trace metrics starting around 21:40 UTC. Dashboards and monitors relying on these metrics are affected.

Report: "Elevated error rates in queries across multiple products"

Last update
investigating

We are actively investigating issues querying data affecting multiple products. As a result of this issue, there might be errors when trying to load data from queries on different pages of the web application or through the API.

Report: "Monitors - Delayed Evaluation"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Distribution Monitors Evaluation, which began at 17:30 UTC. Monitors for other types of metrics are evaluating as usual.

Report: "Delayed Traces and Spans in APM"

Last update
resolved

The incident is now resolved. APM trace ingestion and all downstream systems, including monitors, have fully recovered and are up to date.

monitoring

We are monitoring a fix for increased processing latency in APM Metrics. APM data in live view is current, but distributed tracing metrics are delayed by 20 minutes. Monitors sourced from this data are impacted until the data becomes current.

investigating

As a result of the issue, we are monitoring delays in Monitors Evaluation.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating increased latency processing Traces and Spans in APM. As a result of this issue, some users may see missing or delayed Traces and Spans starting at 18:33 UTC.

Report: "Delayed Traces and Spans in APM"

Last update
investigating

We are investigating increased latency processing Traces and Spans in APM. As a result of this issue, some users may see missing or delayed Traces and Spans starting at 18:45 UTC.

Report: "Delayed AWS Metrics and Events"

Last update
resolved

This incident has been resolved.

identified

A fix has been implemented and recovery is in progress. To prevent spurious alerts, monitors on AWS Metrics and Events remain disabled until recovery is complete.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating increased latency processing AWS metrics and events. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics and events. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed AWS Metrics and Events"

Last update
investigating

We are investigating increased latency processing AWS metrics and events. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics and events. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Monitors - Delayed Evaluation"

Last update
resolved

This incident has been resolved.

investigating

The incident has fully recovered. The service is now fully operational.

investigating

We are investigating delays in Monitors Evaluation, which began at 12:45 UTC.

Report: "Delayed processing of APM Trace Metrics"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delayed processing of APM Trace metrics starting around 07:00 UTC. Dashboards and monitors relying on these metrics are affected.

Report: "Login Issues"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating user login issues related to reCAPTCHA for customers using password login. If you experience an issue with reCAPTCHA, refreshing the page can often mitigate the issue. Please note that data processing and alerts are not affected by this incident.

Report: "Delayed Processing for a Subset of Metrics"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost: data is being backfilled and will be available once the service is operational again.

identified

We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost: data is being backfilled and will be available once the service is operational again.

identified

We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.

identified

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.

investigating

We are investigating increased latency processing Trace Metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs and statistics on Service Catalog.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are continuing to work on a fix. Degraded web application performance is primarily observed in customers with low network bandwidth.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating degraded performance with the web application.

Report: "Increased delay processing events"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor the progress of processing the backlog in Events. The majority of the backlog has been processed. Event Monitor evaluation remains delayed while we finish processing the backlog.

identified

We've implemented a fix, and are currently working through the backlog of delayed Events. Event Monitor evaluation remains delayed while we work through the backlog. All other monitor types have recovered and are currently evaluating.

identified

We have identified the issue causing delayed ingestion of Events. Alerting evaluation continues to be delayed for Event Monitors, Process Monitors, and Cloud Network monitors. All other monitor types have recovered and are currently evaluating.

investigating

We are continuing to investigate this issue.

investigating

We are investigating increased latency processing Events. As a result of this issue, some users may see delays in the event stream or for event queries on dashboards, and event alert evaluation is delayed. This issue also caused a delay in the processing of alerts across other products. We've implemented a fix for this, and are monitoring the recovery of the alert evaluation pipeline. As a result, a subset of alerts may be delayed while the system recovers.

Report: "APM connections retrying"

Last update
resolved

This incident has been resolved.

monitoring

We have mitigated the cause of transient agent submission errors for APM, and customers should no longer observe these errors. The Datadog Agent automatically retried the failed submissions, which succeeded on retry; this incident did not result in any data loss.

identified

The issue has been identified and a fix is being implemented.

investigating

Some US1 customers are experiencing degraded performance for APM. Customers may see transient errors, but these should resolve with an automatic retry from the Datadog Agent.

Report: "Delayed APM Distribution Metrics, Data Streams Monitoring Metrics & Monitor Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Data Streams Monitoring metrics and associated monitor notifications based on these metrics have recovered.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating increased latency in processing APM Distribution Metrics and Data Streams Monitoring Metrics, as well as monitor notifications based on these metrics, which began at 17:47 UTC. As a result of this issue, some users may see delays or gaps for these metrics on graphs, including APM pages, as well as delayed monitor notifications.

Report: "Delayed APM data ingestion"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and systems are recovering.

investigating

We are investigating increased ingestion latency of APM data.

Report: "Monitors - Delayed Evaluation for Distribution Metric Monitors"

Last update
resolved

This incident has been resolved.

monitoring

We have rolled out a fix and all distribution monitors are up to date. We are continuing to monitor the customer experience and expect to resolve this incident in the next 30 minutes.

identified

We are in the process of rolling out a fix that will bring all distribution monitors up to date. We will update again when the issue is resolved.

identified

The root cause has been identified. We are working on a fix so that distribution metric monitor evaluations are up to date.

investigating

We are investigating delays in monitor evaluations for monitors based on distribution metrics, starting at 15:35 UTC. This is causing a delay in notifications.

investigating

We are investigating delays in Distribution Metric Monitors Evaluation, which began at 15:35 UTC.

Report: "Monitors - Delayed Evaluation"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating delays in Events-based Monitor Evaluation, which began at 16:00 UTC.

Report: "Delayed Distribution Metrics"

Last update
resolved

This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are investigating increased latency processing Distribution Metrics. As a result, some users may see delays or gaps for distribution metrics on graphs, including APM pages. Monitors based on this data may also be delayed. We have identified the problem and are actively working to resolve the issue.

Report: "Delayed distribution metrics & monitor notifications"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating delays in distribution metrics, and in notifications for monitors based on these metrics, which began at 17:40 UTC.

Report: "Delayed Distribution Metrics"

Last update
resolved

This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and remediation steps are underway.

investigating

We are investigating increased latency processing Distribution Metrics. As a result of this issue, some users may see delays or gaps for distribution metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on distribution metrics.

Report: "[SSO] Login Errors"

Last update
resolved

This incident has been resolved. If you continue to see issues, please contact Datadog technical support.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

We have identified the issue and are implementing a fix.

identified

We are investigating user login issues with the web application when using Okta SSO.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating delays in Monitors Notifications for distribution metrics, which began at 20:00 UTC.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are investigating degraded performance with the web application related to metrics-based widgets.

Report: "Web UI features may be hidden"

Last update
resolved

This incident has been resolved. Please refresh your Datadog web page to resolve the issue completely.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue that is causing certain features to be hidden from our UI. There is no data loss or monitoring impact.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating delays in Monitors Notifications, which began at 06:05 ET.

Report: "Delayed Metrics Monitor Evaluations"

Last update
resolved

This incident has been resolved.

investigating

Monitors with long intervals may still be delayed but the service is recovered.

investigating

We have identified the issue and deployed a fix; we are monitoring the recovery.

investigating

We are investigating increased delays in metric-based monitors for some customers.

Report: "Delayed Monitors Evaluations"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

investigating

We are investigating delayed evaluation of a subset of metric monitors. Customers may experience delayed or missing monitor notifications as a result.

Report: "APM - degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating an issue in executing trace queries. The team is working on a fix.

Report: "CI Visibility - Page Load issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have identified an issue that prevents most Software Delivery pages from loading. Intelligent Test Runner, Quality Gates, and GitHub PR comments are also affected.

Report: "Application Security Management - Issue Updating Configurations"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating an issue in updating configurations in the product. The team is working on a fix.

Report: "Partial outage on components of RUM product"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have identified an issue that affects the use of Sankey and Cohorts Analysis in the RUM product. The team is working on a fix.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

identified

We identified a delay in Monitor Notifications between 13:52 UTC and 14:05 UTC. The issue has been resolved, but we continue to monitor the situation.

investigating

We identified a delay in Monitor Notifications between 13:52 UTC and 14:05 UTC. The issue has been resolved, but we continue to monitor the situation.

Report: "Delayed AWS, GCP, Azure, and SaaS Integration Metrics"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

investigating

We are investigating increased latency processing some AWS, GCP, Azure and SaaS Integration Metrics. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

We are finalizing our recovery and at this time expect customers should see no further impact. We will continue to monitor for issues.

identified

We are seeing continuing improvements and recovering as quickly as possible while maintaining system stability. Distribution metrics remain delayed and associated monitor evaluations are currently skipped. Point metrics and associated monitors are fully recovered.

identified

We are seeing continuing improvements. Distribution metrics remain delayed and associated monitor evaluations are currently skipped.

identified

We are seeing continuing improvements. Distribution metrics remain delayed and associated monitor evaluations are currently skipped.

investigating

We are seeing improvements in metrics processing. Distribution metrics remain delayed and associated monitor evaluations are currently skipped.

investigating

We are investigating issues in metrics processing that are affecting monitor evaluation, dashboards, and other products.

investigating

We are continuing to investigate this issue.

investigating

We are investigating delays in Monitors Notifications, which began at 14:40 UTC.

Report: "We are investigating user login issues with the web application"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating login issues for users who sign in to the web application by email. Please note that data processing and alerts are not affected by this incident.

Report: "Metrics historical data failed queries"

Last update
resolved

Our metrics system has recovered and all historical metrics are now queryable.

monitoring

The system continues to recover. Data is available, but some results may be slow or incomplete until full recovery. Our teams continue to monitor the incident.

monitoring

A fix has been implemented. While the system continues to recover, data will be available but some results may be slow or incomplete until full recovery is complete.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating failing queries for historical metrics data, affecting time frames more than one day in the past. Queries for recent data are not affected by this incident.

Report: "Partial outage of metrics query"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Elevated Error Rates for Metrics Submission"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating elevated error rates for Metrics Submission APIs. As a result of this issue, submitting new metric data through the API might fail temporarily. Please note that the Datadog Agent and Client Libraries will buffer data or retry to avoid data loss.
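For custom code that submits data directly rather than through the Agent, the retry behavior described above can be approximated with a generic retry loop. The following is a minimal sketch only, using exponential backoff with jitter; `submit_with_retry`, `send`, and all parameters are hypothetical names chosen for illustration and are not part of the Datadog Agent or any Datadog client library.

```python
import random
import time


def submit_with_retry(send, payload, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it succeeds or max_attempts is reached.

    `send` is a hypothetical callable that raises an exception on a
    transient failure (for example, an HTTP 5xx from a submission API).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff (base_delay, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. A production version would likely catch only errors known to be transient rather than every exception.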

Report: "Elevated Errors for API Key Validation"

Last update
resolved

From 12:45 PM to 1:15 PM US EST, Datadog’s endpoint to validate Datadog API keys was unavailable. During this window, Datadog Agents would be unable to validate their API key. In all cases, Agents would continue to send data. Some Agents running in Kubernetes may be marked unhealthy until restarted. Newly started Agents would fail to start. Build jobs using our CI Visibility product would be missing custom tags and measures.

Report: "Google SSO login issues for web application"

Last update
resolved

This incident has been resolved.

identified

We are continuing to monitor progress. We will post further updates when we have them.

identified

We are seeing signs of recovery and are continuing to monitor progress. We will post further updates when we have them.

investigating

We are investigating user login issues with the web application via Google SSO. Users switching orgs might also be affected. Please note that data processing and alerts are not affected by this incident.

Report: "Elevated Error Rates for Metrics Submission"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring recovery. Metric monitor evaluations still might be delayed; we will post an update when this recovers.

identified

We are still investigating elevated error rates for Metrics Submission APIs and delays processing metrics monitors.

investigating

We are investigating elevated error rates for Metrics Submission APIs. As a result of this issue, submitting new metric data through the API might fail temporarily. Please note that the Datadog Agent and Client Libraries will buffer data or retry to avoid data loss.

Report: "Logs Status Elevated Error Rates"

Last update
resolved

This incident has been resolved. As a result of this incident, logs from AWS Lambda (specifically, those tagged with source:lambda) were incorrectly categorized as errors from 18:30 UTC to 20:50 UTC on 2024-01-22. All logs after this date are being processed as normal.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified the underlying issues for elevated error rates for Log Management. As a result of this issue, some users may see incorrect statuses for logs from AWS Lambda.

Report: "Delayed Infrastructure Updates"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update in 30 minutes once the service is fully operational.

investigating

We are investigating increased latency processing host updates. As a result of this issue, some users may see delays in host activity status updates on the infrastructure list.

Report: "Elevated error rate for metrics and delayed metric monitors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are actively investigating elevated errors and slow queries for metrics data. As a result of this issue, some users may see errors when trying to load data on dashboards and metrics monitors evaluation may be delayed.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating delays in the processing of Metrics and corresponding Monitor Notifications, which began at 22:30 UTC.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

investigating

We are continuing to investigate the issue.

investigating

We are investigating delays in Monitors Notifications, which began at 15:16 UTC.

Report: "Degraded Web Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and we are working on a fix. Note that this only affects the Monitors, SLOs, and Incident Management web APIs.

investigating

We are investigating degraded performance with the web application.

Report: "Elevated Error Rates for Log Queries and Monitors"

Last update
resolved

This incident has been resolved.

monitoring

Fix rollout has now been completed.

monitoring

The fix rollout is currently ongoing. Once completed we will confirm resolution.

monitoring

The fix rollout is currently ongoing. Once completed we will confirm resolution.

monitoring

We have successfully tested a fix for this issue and are currently deploying it to resolve this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We're still working on a fix for historical data impacted by this incident.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved. At this time, newly ingested data is properly queryable, and monitors targeting Logs sent from 2023-10-03 20:40 UTC onwards are valid. Queries targeting logs between 2023-10-02 11:40 UTC and 2023-10-03 20:40 UTC may return erroneous data. We are evaluating a fix that will restore query correctness for this time-window.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are continuing to investigate these issues, and will provide an update as soon as possible.

investigating

We are actively investigating issues with Log Queries returning unexpected results. As a result of this issue, some users may experience issues querying logs on the web application or API, and with Log-based Monitors and Log-based Metrics.

Report: "Delayed Metric Monitor Notifications"

Last update
resolved

This incident has been resolved.

identified

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and notifications will be caught up once the service is operational again.

investigating

We are investigating delays in Metrics Monitors Notifications, which began at 02:35 UTC.

Report: "Monitors Notifications Delayed"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We are aware of delays in Metric Monitors Notifications, which began at 20:55 UTC. We have identified the underlying issue and are working on a fix.

Report: "Delayed Processing for a Subset of Metrics"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate the issue. To prevent spurious alerts, we have temporarily disabled affected monitors based on this data.

investigating

We are investigating increased latency processing a subset of metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs.

Report: "Delayed monitor notifications & metrics graphing issues"

Last update
resolved

This incident has been resolved.

monitoring

Users may still be experiencing some issues with graphs not loading in the web application. We will provide another update once the issue is fully resolved.

monitoring

Users may still be experiencing some issues with graphs not loading in the web application. We will provide another update once the issue is fully resolved.

monitoring

Users may still be experiencing some issues with graphs not loading in the web application. We will provide another update once the issue is fully resolved.

monitoring

Issues with monitor notifications have been resolved. Users may still be experiencing some issues with graphs not loading in the web application. We will provide another update once the issue is fully resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

investigating

We are continuing to investigate the issue.

investigating

We are continuing to investigate the issue.

investigating

We are investigating delays in Monitors Notifications for monitors which rely on distribution metrics, which began at 17:58 UTC. Users may also experience some issues with graphs not loading in the web application. Please note that data ingest is not affected by this incident.

Report: "Delayed Monitors Notifications"

Last update
resolved

This incident has been resolved.

monitoring

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identified

We have identified the underlying issue and are working on a fix.

investigating

We are investigating delays in Monitors Notifications, which began at 13:53 UTC.