Historical record of incidents for Scout
Report: "Partial Outage - Delay in Metrics"
Last update: A subset of customers experienced a delay in metrics. All metrics have caught up at this point.
Report: "Partial Outage - Delay in Metrics"
Last updateA subset of customers experienced a delay in metrics. All metrics have caught up at this point.
Report: "Ingest delays"
Last update: This incident has been resolved.
We are seeing a spike in metrics, causing delays.
Report: "Ingest delays"
Last updateThis incident has been resolved.
We are seeing a spike in metrics, causing delays
Report: "Ingest delays"
Last update: This incident has been resolved.
Zookeeper corruption has been rooted out. Things appear healthier and are catching up in all cases.
We have not yet reached full resolution.
Throughput has improved, although the behavior of individual partitions remains a problem and is still causing delays in some cases.
It has been a long day with kafka. We continue to experience instability, causing lag and dropped payloads.
An alternate approach has been applied; we are watching.
The initial fix was unsuccessful; certain accounts are now substantially delayed.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Ingest delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Metric ingestion is being delayed for a set of users. We are working towards resolution.
Report: "Ingest delays and performance degradation"
Last update: This incident has been resolved.
Ingest is recovering. Some accounts will require additional backfill of data, which we are working on.
We have identified the issue and are working on fixes.
We are continuing to investigate this issue.
Ingested records are taking longer than usual to process. In some cases, this is affecting alerting.
Report: "Ingest Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Ingest Delays"
Last update: This incident has been resolved.
Delays are recovering; we are monitoring.
The issue has been identified and a fix is being implemented.
Ingested data is suffering delays in processing. Our team is working on a fix.
Report: "Ingest delays and dashboard degredation"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Data Ingest Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Ingest delays"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
Report: "Ingest delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Ingest Issues"
Last update: This incident has been resolved.
We have implemented a fix and are monitoring the issue.
We are currently investigating this issue.
Report: "Ingest Delays and UI Degradation"
Last update: This incident has been resolved.
Ingested metrics are being processed and lag is recovering.
Data ingest is lagging. Some users are experiencing UI errors.
Report: "Web UI Unavailable"
Last update: This incident has been resolved.
The UI is operational and metric processing is recovering. Dashboards will be caught up shortly.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Page Load Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
We are investigating page load errors in the UI for some customers.
Report: "Delay in metrics ingestion"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating a delay in metrics ingestion.
Report: "Delay in Metrics Ingestion, Slow Page Loads"
Last update: We have resolved the underlying issue. Metric ingestion is caught up for all customers and the UI is operating as normal.
We are currently investigating a delay in metrics ingestion and slow UI page loads for some customers.
Report: "Agent payloads being rejected, missing dashboard data"
Last update: This incident has been resolved.
We have implemented the fix and the kafka cluster is operating normally. Agent checkin payloads are being ingested and processed again as of 3:41PM MT. Data from 2:40-3:30PM MT will not backfill to charts.
We have encountered an issue with our Kafka cluster preventing agent payloads from being recorded into kafka for storage and processing. You will not see current data in your dashboards until the issue is resolved. We are working on deploying the fix now.
Report: "Metrics Ingestion Lag"
Last update: Metric ingestion has caught up and all charts are now current.
We have identified the issue and the metrics will begin filling in on application charts within a few minutes.
We are currently experiencing metric ingestion lag. You may not see the most recent metrics in your charts. We are investigating the issue.
Report: "Metrics ingestion lag, some dashboards not loading"
Last update: All apps are reachable via the UI and the metrics backlog has been processed.
All apps should now be accessible. We are processing the metrics ingestion backlog and you will see your chart metrics fill in soon.
We are continuing to work on a fix for this issue.
We have identified an issue with our database which is causing time series metrics ingestion lag for all customers. In addition, some customers may not be able to load their app in the UI. We are working to fix the issue as soon as possible.
Report: "Load balancer change caused metric gap"
Last update: A change to our elastic load balancer, which accepts the metrics payloads from agents, left it in a nonfunctional state from 3:30PM MT to 3:35PM MT on 6-3-2021. The issue has been resolved. You may have missing metrics in charts for this time period.
Report: "Metric ingestion delay/dashboards unavailable"
Last update: All services are back to normal.
We are continuing to investigate this issue.
We are currently experiencing an issue with one of our time series databases. Dashboards may not be available or metrics delayed. Metrics are currently being buffered and will be filled in once the issue is resolved.
Report: "Delayed Metrics"
Last update: Ingestion is back to normal.
We are currently investigating this issue.
Report: "Database communication issue"
Last update: Everything is caught up, and looking good.
The database is back up, and the buffered data is being ingested. You'll see charts catch up over the next few minutes.
One of our database backends has disconnected from our frontend. Investigating. All incoming data is buffered and will be replayed once we're back up.
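Several incidents in this record describe the same buffer-and-replay pattern: while a database backend is unreachable, incoming checkin data is held in a buffer and then replayed into the database once it recovers. The sketch below is only an illustration of that general pattern, not Scout's actual pipeline; the backend client and its write() method are hypothetical placeholders, and a real system would buffer to durable storage rather than memory.

```python
import queue
import time


class BufferAndReplayWriter:
    """Buffer incoming points while the backend is down; replay them when it recovers."""

    def __init__(self, backend):
        self.backend = backend        # hypothetical client exposing write(point)
        self.buffer = queue.Queue()   # a real pipeline would use durable storage here

    def write(self, point):
        try:
            self.backend.write(point)
        except ConnectionError:
            # Backend unreachable: hold the point rather than dropping it.
            self.buffer.put(point)

    def replay(self):
        """Drain buffered points into the backend once it is healthy again."""
        while not self.buffer.empty():
            point = self.buffer.get()
            try:
                self.backend.write(point)
            except ConnectionError:
                # Still down: requeue and back off before retrying.
                self.buffer.put(point)
                time.sleep(1)
```

The "charts will catch up over the next few minutes" language in these updates corresponds to the replay step draining the buffer.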
Report: "Delay in ingestion"
Last update: All buffered data has been ingested and is available.
We have fixed a communication issue between our web servers and one of our database servers, and are catching up data now.
Currently investigating a delay in new data being ingested.
Report: "Ingestion Issue"
Last update: On 2020/05/31 we experienced a short network outage that prevented our zookeeper and kafka nodes from reaching each other. When connectivity was restored, there was a problem with stale zookeeper data which prevented the kafka brokers from initiating a proper leader election for topic partitions. This also prevented kafka producers from being able to produce to a majority of partitions. A manual leader election was attempted, but failed to correct the issue. We began a rolling restart of our entire kafka cluster, which ultimately resolved the issue. Later versions of Kafka have better handling around this particular failure, and we anticipate moving to a recent version to prevent entering this failure mode again.
Ingestion for all customers has been operating normally since 10:20AM MT. Some customers will be missing some or all data from 8:50AM to 10:20AM MT. We will follow up with more information about the cause of the outage.
We have restarted several of our Kafka servers in ingestion, and ingestion appears to be recovering. Data should begin appearing on your dashboard again.
We are investigating an issue in ingestion of agent data.
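The postmortem for this incident describes partitions being left without a leader after stale Zookeeper data blocked a proper leader election, which in turn kept producers from writing to a majority of partitions. That failure mode is visible in cluster metadata. The following is a minimal sketch, assuming the confluent-kafka Python client and a placeholder bootstrap address (this is not Scout's tooling), that flags partitions with no leader or a shrunken in-sync replica set:

```python
from confluent_kafka.admin import AdminClient

# Placeholder bootstrap address; point this at a real broker.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Cluster metadata includes per-partition leader and ISR information.
metadata = admin.list_topics(timeout=10)

for topic_name, topic in metadata.topics.items():
    for partition_id, partition in topic.partitions.items():
        if partition.leader == -1:
            # No leader elected: producers cannot write to this partition.
            print(f"{topic_name}[{partition_id}]: no leader")
        elif len(partition.isrs) < len(partition.replicas):
            # Under-replicated: some replicas have fallen out of sync.
            print(f"{topic_name}[{partition_id}]: ISR {len(partition.isrs)}/{len(partition.replicas)}")
```

Partitions reported with no leader correspond to the "unable to produce to a majority of partitions" symptom described in the postmortem.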
Report: "Delayed Ingestion for Some Customers"
Last update: This incident has been resolved.
We've isolated the issue to a single account. We're in contact with that customer and have restarted ingestion for all other accounts.
We've identified a handful of incoming messages that have slowed our ingestion processing, causing it to fall behind. This has tripped circuit breakers in other parts of our app. All data is stored, but ingestion as a whole is paused.
One of our ingestion servers is falling a little behind, so ingestion for any customers on that server will be delayed. All data is safe and is being processed.
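The updates above mention circuit breakers tripping when messages from a single account slowed ingestion, and the resolution was to isolate that account. As a generic, hedged sketch of that isolation pattern (class name and thresholds are made up, not Scout's implementation), a per-account circuit breaker can pause processing for an account whose payloads keep failing while every other account continues:

```python
import time


class AccountCircuitBreaker:
    """Open the circuit for an account after repeated failed or slow payloads."""

    def __init__(self, failure_threshold=5, reset_after_seconds=300):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.failures = {}    # account_id -> consecutive failure count
        self.opened_at = {}   # account_id -> time the circuit opened

    def allow(self, account_id):
        opened = self.opened_at.get(account_id)
        if opened is None:
            return True
        # Half-open after the cool-down: let one payload through to probe recovery.
        return time.monotonic() - opened >= self.reset_after_seconds

    def record_success(self, account_id):
        self.failures.pop(account_id, None)
        self.opened_at.pop(account_id, None)

    def record_failure(self, account_id):
        count = self.failures.get(account_id, 0) + 1
        self.failures[account_id] = count
        if count >= self.failure_threshold:
            self.opened_at[account_id] = time.monotonic()
```

Payloads for a tripped account can be parked rather than dropped (so all data is still stored) while ingestion for every other account keeps flowing.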
Report: "Database Connectivity Issues"
Last update: Replaying data is complete.
The connection issue has been resolved, data will begin backfilling and be fully up to date in a few minutes.
We're seeing timeouts for one of our time series databases.
Report: "Ingestion Lag"
Last update: Metric ingestion was paused at 20:13, restarted at 22:30 UTC, and all app metrics are caught up and stable as of 2019-09-18 00:00 UTC. Operations are back to normal.
We are recreating some database indexes, which has forced us to fully pause ingestion. Once the indexes are rebuilt, metrics will fill in to current while we continue to fix the root cause of the ingestion lag.
We are experiencing some ingestion lag. We have identified the issue and we are working on processing the backlog. Your charts will continue to catch up as we process the backlog.
Report: "Time series database requires restart"
Last update: All buffered checkin data has been ingested, and all components are back online.
The database has been restarted, and checkins are catching up now.
The server hosting one of our time series databases required a restart. It is currently booting and will be back in service in a few minutes.
Report: "Investigating Database Issues"
Last update: All buffered checkin data has been ingested, and all components are back online.
Everything is back up, and buffered checkins are flowing back into the system. Data should be caught up with current in a few minutes.
We've identified the cause of the issue and have fixed it. We are bringing the database back online.
We appear to have degraded write behavior on our main Postgres database. We are investigating.
Report: "UI Timeouts"
Last update: We've killed a rogue process that was tying up our database, and all pages are responding.
We are continuing to investigate this issue.
Some users are seeing the UI time out when it loads.
Report: "Time series database issues"
Last update: The backlog of ingestion data has cleared, and everything is up and running.
The database server has recovered and is ingesting again. Buffered data is being ingested, and will fill in as it catches up.
We are experiencing an issue with one of our time series databases. Some of our customers will experience UI timeouts and ingestion lag.
Report: "Database Connection Issues"
Last update: All chart metrics are now completely caught up. The root cause of the incident was attempted table partitioning during a database vacuum, which caused a lock on a critical table and cascaded to impact the rest of the application. We'll be adjusting our vacuum and partitioning schedules to avoid this lock again.
We've identified and fixed the database connection issue. We are currently loading the backlog of data that was held during the incident. Data will be appearing in the UI shortly.
We appear to be using more than the expected number of database connections, causing failures on our Web UI. Ingestion is backed up, but the incoming data is safe and collected.
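The root cause noted above was a lock taken on a critical table during partitioning work in a database vacuum, which then cascaded to the rest of the application. As an illustrative sketch only (assuming a Postgres database, as mentioned in an earlier incident, with psycopg2 and a placeholder connection string, not Scout's tooling), blocked and blocking sessions can be surfaced from pg_stat_activity and pg_blocking_pids():

```python
import psycopg2

# Placeholder connection string; point at the database under investigation.
conn = psycopg2.connect("dbname=app host=localhost user=postgres")

BLOCKED_SESSIONS = """
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
"""

with conn, conn.cursor() as cur:
    cur.execute(BLOCKED_SESSIONS)
    for pid, blocked_by, wait_event_type, state, query in cur.fetchall():
        # Each row is a session waiting on a lock held by the pids in blocked_by.
        print(f"pid {pid} blocked by {blocked_by} ({wait_event_type}, {state}): {query}")
```

A long-lived lock from a partitioning or vacuum job shows up here as a small set of blocking pids with many sessions queued behind them.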
Report: "Server Monitoring install packages are temporarily unavailable"
Last update: The package repos are back and operating normally.
Installation of scoutd (`yum install scoutd` or `apt-get install scoutd`) will fail. We are working on restoring access. This only affects Server Monitoring, not APM.
Report: "Network connectivity issues"
Last update: Network connectivity is restored. There will be a 7-minute drop in charts corresponding to the outage.
http://status.railsmachine.com/incidents/31cpsbzq5p97
Report: "[Server Monitoring] Incorrect alert routing/Alerts not being sent out"
Last update:
## Server Monitoring 12/31/2016 Postmortem
At 5:35PM MDT, our database table storing alerts hit the auto-increment limit for its primary key datatype. As a result, new alerts were either not created as they should have been, or in some cases, created and associated with the wrong account. Since the alerts table is huge, modifying it in place was not an option. We began a sequence of altering the table on a MySQL read-only instance, switching multi-master to the secondary, and modifying the primary database. Shortly thereafter, we temporarily disabled notifications for all accounts to minimize the impact of the alterations. By 8:37PM MDT, alterations were complete. Unfortunately, a glitch in the multi-master switchover process resulted in a 7-minute outage from 8:58PM-09:07PM MDT. The glitch was the result of a duplicate `mmm_mond` process running, which repeatedly killed MySQL's replication thread and caused database instability.
### What We Have Done to Ensure This Does Not Happen Again
1. We have added monitoring and alerting on MySQL Multi-master's `mmm_mond` process, to ensure that only one process is running at a time.
2. We have audited all tables in our database to ensure that no other tables are close to exceeding their primary key auto-increment limit. While none are currently close, there are two tables at 50% of their limit, so we will be migrating these tables proactively during an upcoming scheduled maintenance window.
We have corrected the underlying database issue causing the incorrectly routed alerts. Alerts should be back to normal for all accounts.
Alerts are not being routed correctly. We have identified the problem, and while the fix is implemented, alerts have been disabled for all accounts.
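Item 2 of the postmortem above describes auditing every table's remaining auto-increment headroom. A hedged sketch of one way to run that kind of audit from MySQL's information_schema follows, assuming PyMySQL, INT/BIGINT auto-increment keys, and placeholder connection details (this is not Scout's actual audit):

```python
import pymysql

# Maximum values for common auto-increment column types.
LIMITS = {
    "int": 2**31 - 1,
    "int unsigned": 2**32 - 1,
    "bigint": 2**63 - 1,
    "bigint unsigned": 2**64 - 1,
}

QUERY = """
SELECT t.table_name, c.column_type, t.auto_increment
FROM information_schema.tables t
JOIN information_schema.columns c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.extra LIKE '%%auto_increment%%'
WHERE t.table_schema = %s
  AND t.auto_increment IS NOT NULL;
"""

# Placeholder connection details.
conn = pymysql.connect(host="localhost", user="audit", password="secret", database="app")

with conn.cursor() as cur:
    cur.execute(QUERY, ("app",))
    for table_name, column_type, next_id in cur.fetchall():
        base = "bigint" if column_type.startswith("bigint") else "int"
        key = f"{base} unsigned" if "unsigned" in column_type else base
        used = next_id / LIMITS[key]
        # Flag tables that have consumed a large share of their key space.
        if used >= 0.5:
            print(f"{table_name} ({column_type}): {used:.0%} of auto-increment range used")
```

The 0.5 threshold mirrors the postmortem's note about two tables sitting at 50% of their limit.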
Report: "Brief downtime (database fix)"
Last update: From 8:58PM-09:07PM MDT 2016-12-31, scoutapp.com was unavailable during a database alteration. Data was not collected during this time.
Report: "Server Monitoring: brief metric ingestion outage while swapping database writer role"
Last update: Scout Server Monitoring had a brief ingestion outage from 4:27PM to 4:31PM MDT while swapping a database writer role.
Report: "Somewhat degraded performance while datacenter upgrades switches"
Last update: This incident has been resolved.
You may encounter: data occasionally delayed by ~2 minutes; an occasional error attempting to view a chart or alert. If you experience this, just refresh the page you are looking at.
Report: "AWS Networking Issue"
Last update: AWS resolved their network issue and we should be back to normal.
We are continuing to investigate the issue with our AWS servers, and will update when we have discovered a solution.
We are currently experiencing an issue with our us-west-1 AWS servers. This could cause some degradation in performance.
Report: "Network instability - Server Monitoring outage"
Last update: This incident has been resolved.
Server Monitoring is back online. We apologize for the outage, and will post a post-mortem tomorrow.
We are re-syncing a database that was corrupted during the power outage. Stay tuned ...
We've regained access to most of our machines via SSH and are working on bringing services back up.
From http://status.railsmachine.com/incidents/blqbh5wmfcrl: "At approximately 5:40 EST, we experienced a temporary utility interruption at the data center. This temporary utility interruption caused an unknown error in our UPS which resulted in a power outage to your environment. zColo Operations are diligently working to restore power to your environment. Additional updates will be provided when available."
"Preliminary reports indicate a power outage. We are continuing to investigate and are working to get things back online now." from http://status.railsmachine.com/incidents/blqbh5wmfcrl
http://status.railsmachine.com/incidents/blqbh5wmfcrl
Report: "Ingestion lag for metrics"
Last update: All charts are up to date. We will follow up with a post mortem.
The ingestion pipeline is catching up.
Our ingestion pipeline handling metrics from the agent is backed up and we are investigating the cause. Charts for your apps will not have up-to-date metrics until the issue is resolved.
Report: "UI unavailable"
Last update: Charts should be caught back up - all systems back to normal.
The UI is available again. There is a 20 minute lag in data. We've begun replaying data ingestion to fill in the gap.
InfluxDB hung while removing a significant amount of data from a timeseries database. We're restarting InfluxDB, which should take around 30 minutes. Data ingestion is continuing - charts will be a bit behind as we replay checkins to Influx after it comes back online.
The Scout UI is currently unavailable. We're investigating an issue with our backend timeseries storage.
Report: "Metric Ingestion Lag"
Last update: Metric ingestion for all customers is now caught up and operating normally.
An RDS instance failover triggered the lag. We've restarted ingestion and charts are filling in with data.
We're investigating a delay in the display of fresh data on charts.
Report: "Data Ingestion Delay"
Last update: We're back to normal. No data was lost during the ingestion delay.
The delay was triggered by a spike in Influx query times. The delay is decreasing rapidly. We're monitoring to ensure things return to normal.
We're seeing a delay in metric ingestion and are investigating.
Report: "Time Series Database Issue"
Last update: Metric ingestion has caught back up.
Our systems are replaying buffered data collected during the outage and ingesting these into our database.
The time-series database is restarting and should be operational in a few minutes, after which buffered data from the downtime will be replayed into it.
The backend time-series database appears to be having issues. All incoming data is being buffered and will be ingested into the system, but the site is currently inaccessible.
Report: "502 errors accessing scoutapp.com"
Last update: This incident has been resolved.
Data ingestion has caught back up.
The site is now available. We're replaying data that wasn't ingested over the downtime.
We are continuing to work on a fix for this issue.
We had a bad deploy and are investigating 502 errors. Data ingestion has not been impacted (data has not been lost).
Report: "Metrics ingestion lag"
Last update: This incident has been resolved.
Our relational database needed tuning. Most customers' charts are current - for those customers who still have some lag, it should resolve within the hour.
Some apps are having lag on their metrics charts. We are investigating.
Report: "504 errors accessing scoutapp.com"
Last update: This incident has been resolved.
The UI should be available again.
We identified a lock on a table and have cleared the lock. We're continuing to investigate.
We're seeing some 504 errors accessing scoutapp.com and investigating.