Historical record of incidents for Librato
Report: "Some alerts are delayed"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Hourly summarizations are delayed."
Last update: This incident has been resolved.
We are continuing to monitor for any further issues. Next update in 3 hours.
A fix has been implemented and we are monitoring the results. Next update in 6 hours.
Report: "Increased error rate on Web and API"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Major outage from 18:03 to 18:40 UTC"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
We are continuing to monitor for any further issues.
We are currently investigating this issue.
Report: "Increased error rates and delayed alerts for composite metrics"
Last update: This incident has been resolved.
Starting at 09:00 UTC on Monday, Composite Metrics had increased error rates, and alerts on Composite Metrics were delayed.
Report: "Increased error rates from 02:38 - 02:42 UTC"
Last update: The API and other services had increased error rates for several minutes.
Report: "SSL error connecting to api.heroku.com"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Alert or Screenshot integrations with Slack are not delivered"
Last update: We are currently investigating this issue.
Report: "Alert processing errors"
Last update: From 01:33 to 02:17 UTC, alerts and metric ingestion were disrupted. Metrics sent by some agents may have gaps during that time, and Absent Alerts may have fired incorrectly.
We are currently investigating this issue.
Report: "Delayed historical summaries when viewing spans of time 3 days long or longer"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Delayed metrics"
Last update: This incident has been resolved.
We continue to work on the metric processing pipeline.
We continue to process the backlog. Most metrics and alerts should be back to normal behavior.
We continue to process the backlog. Most metrics and alerts should be back to normal behavior.
The issue has been identified and a fix is being implemented.
Report: "Increased Error rates"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Delays in processing metrics"
Last update: This incident has been resolved.
The backlog has been processed and Alerts and Service Side Aggregated Metrics have returned to normal performance.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Metric ingestion is delayed"
Last update: This incident has been resolved.
We are working on the backlog of ingested metrics.
The issue has been identified and a fix is being implemented.
Report: "Service Side Metric Aggregation is delayed"
Last update: This incident affected: Service-Side Aggregated Metrics.
Report: "Delayed metrics from AWS Cloudwatch"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "The marketing site is down. Users can log in to Librato using https://metrics.librato.com/sign_in."
Last update: This incident has been resolved.
The marketing site is down. Users can log in to Librato using https://metrics.librato.com/sign_in.
Report: "Increased Errors on https://my.appoptics.com/"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "The home page is not accessible, use https://metrics.librato.com/users/sign_in"
Last update: This incident has been resolved.
The provider we use for the home page had an incident (https://status.heroku.com/incidents/2402) that now appears to be resolved.
We are currently investigating this issue.
Report: "Alerts, and metrics were delayed from 12:10PM UTC - 2:20PM UTC"
Last update: This incident has been resolved.
Traces, alerts, and metrics have returned to normal operation.
Report: "Metrics and Alerts processing is delayed"
Last update: This incident has been resolved.
Alert processing has returned to normal.
We are working on restoring full functionality to alerts.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Delayed alerts for small number of metrics (23:30 UTC on 22Apr2021 to 21:15 UTC 23Apr2021)"
Last update: This incident has been resolved.
A small number of alerts were delayed starting at 23:30 UTC on 22Apr2021. As of 21:15 UTC 23Apr2021, all delayed alerts have been delivered and alert processing has returned to normal.
Report: "Increased Error rates when viewing data in a 3 day window or longer"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "CloudWatch Message Ingestion Issue"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are investigating an issue with delayed metric ingestion for our CloudWatch integration. We will update here when more details are available.
Report: "A Fraction of Alerts Delayed"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate the root cause of the issue. Alerts continue to be delayed.
We are continuing to investigate the root cause of the issue. Alerts continue to be delayed.
Alerts are delayed. We are investigating the root cause of the issue.
Alerts may be delayed. We are investigating the root cause of the issue.
Report: "Heroku outage affecting Librato"
Last update: This incident has been resolved.
This issue will also cause lower throughput of logs from Heroku.
An issue with Heroku is causing the site www.librato.com to be inaccessible. We are investigating.
Report: "Heroku outage affecting Librato"
Last update: This incident has been resolved.
An issue with Heroku is causing the site www.librato.com to be inaccessible. We are investigating.
Report: "Heroku Log Metrics disruption"
Last update: This incident has been resolved.
We are now receiving Heroku logs at the expected volume. Heroku customers may see a gap in log coverage. The Heroku incident is still open so we'll continue to monitor.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "database issue"
Last update: This incident has been resolved.
Normal operations have resumed and we continue to monitor the situation.
We are continuing to investigate this issue.
There is an identified issue with the database that is affecting performance. We are currently investigating. Please check back for updates.
Report: "spurious absent alerts and delayed alerts"
Last update: This incident has been resolved.
A database performance issue was identified and we have resolved it. We are working on scaling the database to avoid this problem in the future.
We are aware of an issue causing spurious absent alerts and delayed alerts. We are currently investigating this issue.
Report: "false alerts being generated"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "delayed AWS metrics"
Last update: This incident has been resolved.
We've recovered from the delay in CloudWatch metrics processing. You should see up-to-date metrics in AppOptics/Librato. Please contact support if you need further help.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Delays in processing metrics and metric alerts."
Last update: This incident has been resolved.
Any spurious absent alerts should now be resolved and we continue to monitor the recovery.
Any spurious absent alerts should now be resolved and we continue to monitor the recovery.
We are investigating an Alerts service issue that is causing spurious absent alerts.
We are currently investigating this issue.
Report: "Issues with metric ingestion from Heroku"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Alerts processing delayed"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Delayed Alerts"
Last update: Resolved. Absent alerts did not fire from 22:57 to 23:23 UTC.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Site issue"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are aware of an issue and are investigating.
Report: "CloudWatch metrics delayed for us-west-2"
Last update: CloudWatch metrics for us-west-2 are no longer delayed.
CloudWatch metrics from us-west-2 have been delayed since 17:45 UTC due to an AWS API outage. Metrics will be backfilled when the API is accessible again.
Report: "Delayed Alerts"
Last update: Between 00:30 UTC and 01:45 UTC, processing issues in the alerting pipeline may have caused some alerts to be triggered incorrectly and/or delayed.
Report: "Investigating issues with page loading"
Last update: This incident has been resolved.
We are continuing to investigate the issue. Composite alerts may be delayed.
We are continuing to investigate the issue. Composite alerts may be delayed.
Some customers may be experiencing increased API latency or error rate at this time. We're continuing to work on this issue.
We are currently investigating issues with page loading.
Report: "API Latency"
Last update: This incident has been resolved.
We have noticed a higher than normal latency when using the API. We are currently investigating the root cause.
Report: "API Latency"
Last update: API latency has returned to normal.
API Latency appears to be returning to normal.
We have noticed a higher than normal latency when using the API. We are currently investigating the root cause.
Report: "Delayed collectd metrics"
Last update: This incident has been resolved.
We are currently experiencing some delays in processing collectd metrics. Some users may experience delays in seeing metrics that have been posted via collectd.
Report: "Intermittent Heroku data ingestion"
Last update: Heroku data ingestion was impacted by intermittent log parsing failures between 18:25 and 18:45 UTC; some measurements may be missing or partially aggregated during this time. Service has been fully restored as of 18:45 UTC.
Report: "API calls failing"
Last update: At 22:38 UTC we experienced a database failover, which briefly caused some API calls to fail. Within a few minutes the writer recovered and all API calls resumed functioning as normal. Reads of some data could have been delayed as replicas caught up. All effects appear to have resolved as of 22:55 UTC.
Report: "Increased error count"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Metrics Processing Delays"
Last update: This incident has been resolved.
Metrics processing is currently delayed.
Report: "TLS 1.0 Deprecation"
Last update: TLSv1 support has been disabled. All public endpoints will require TLSv1.1 or greater. If you have any questions or need assistance in confirming support for your browsers or clients, contact us at support@librato.com.
The TLS configuration has been reverted. We will be disabling support for TLS 1.0 on June 25th. The vast majority of HTTPS traffic already comes from clients that support TLS 1.2, but a small percentage of users may be affected by this deprecation. On June 25th we will disable TLS 1.0 support for Librato, which may result in your browser or client no longer being able to interact with the website or the API. To avoid that, please make sure you support TLS 1.2. TLS 1.1 is supported but not recommended.
The following minimum browser versions support TLS 1.2:
- Microsoft Internet Explorer 11
- Microsoft Edge (all)
- Firefox 27
- Chrome 30
- Safari 7.0 (OS X 10.9)
- Mobile Safari 5.1 (iOS 5.1)
- Opera 17
Any modern collection agents or libraries that use SSL are likely to be supported. You can check the TLS version on an instance by running the following curl command from the instance in question:
curl --silent "https://www.howsmyssl.com/a/check" | grep -o -e '"tls_version":"[a-zA-Z0-9\ \.]*"'
The output contains a 'tls_version' field to look for. If you have any questions or need assistance in confirming support for your browsers or clients, contact us at support@librato.com.
The new TLS configuration is now in place.
We are starting a brownout for the TLS 1.0 deprecation, lasting approximately 1 hour and starting at 12pm Pacific. If you are receiving high rates of HTTP status 500s during this period, the agents forwarding messages may be using TLS 1.0. We will be completing the deprecation process on June 25th. If you have any questions or need assistance in confirming support for your browsers or clients, contact us at support@librato.com.
We will be disabling support for TLS 1.0 on June 25th. The vast majority of HTTPS traffic already comes from clients that support TLS 1.2, but a small percentage of users may be affected by this deprecation. On June 25th we will disable TLS 1.0 support for Librato, which may result in your browser or client no longer being able to interact with the website or the API. To avoid that, please make sure you support TLS 1.2. TLS 1.1 is supported but not recommended. There will be a one hour test on June 18th at 12pm PST.
The following minimum browser versions support TLS 1.2:
- Microsoft Internet Explorer 11
- Microsoft Edge (all)
- Firefox 27
- Chrome 30
- Safari 7.0 (OS X 10.9)
- Mobile Safari 5.1 (iOS 5.1)
- Opera 17
Any modern collection agents or libraries that use SSL are likely to be supported. You can check the TLS version on an instance by running the following curl command from the instance in question:
curl --silent "https://www.howsmyssl.com/a/check" | grep -o -e '"tls_version":"[a-zA-Z0-9\ \.]*"'
The output contains a 'tls_version' field to look for. If you have any questions or need assistance in confirming support for your browsers or clients, contact us at support@librato.com.
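For clients that run Python rather than curl, the same check can be sketched as follows. This is an illustrative sketch, not an official Librato tool: the function names (`parse_tls_version`, `meets_minimum`, `check_client_tls`) are hypothetical, and it assumes the 'tls_version' field holds a string of the form "TLS 1.2", which is consistent with the grep pattern in the notice above but not confirmed by it.

```python
import json
import urllib.request

# URL taken from the curl command in the notice above.
CHECK_URL = "https://www.howsmyssl.com/a/check"


def parse_tls_version(body: str) -> str:
    """Return the 'tls_version' field from the JSON response body."""
    return json.loads(body)["tls_version"]


def meets_minimum(version: str, minimum: str = "TLS 1.1") -> bool:
    """Compare version strings such as 'TLS 1.2'.

    Assumption: the value looks like 'TLS <major>.<minor>'.
    The default minimum of TLS 1.1 mirrors the notice above.
    """
    def as_tuple(value: str) -> tuple:
        return tuple(int(part) for part in value.split()[-1].split("."))

    return as_tuple(version) >= as_tuple(minimum)


def check_client_tls(url: str = CHECK_URL) -> str:
    """Fetch the check URL and report the TLS version this client negotiated."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_tls_version(response.read().decode("utf-8"))


# Example usage (requires network access from the instance in question):
#   version = check_client_tls()
#   print(version, "supported" if meets_minimum(version) else "too old")
```

The parsing and comparison are kept separate from the network call so they can be exercised without contacting the check service.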
Report: "Composite Alerts - False Positives"
Last update: Between 17:44 and 17:55 UTC, a brief delay in measurement processing may have caused alerts that use composites to trigger without cause. The situation quickly resolved itself as the delayed service resumed normal operation.
Report: "Increased API errors"
Last update: This incident has been resolved.
We are currently investigating this issue.