SpeedCurve

Is SpeedCurve Down Right Now? Check if there is an ongoing outage.

SpeedCurve is currently Operational

Last checked from SpeedCurve's official status page

Historical record of incidents for SpeedCurve

Report: "RUM Data Outage"

Last update
resolved

Fastly have fixed the streaming log service and we're now seeing normal log volumes and RUM page views coming through. We think it's unlikely that Fastly will be able to recover the missing logs, so there will be about 2h 30m of missing RUM page views.

identified

We use Fastly to collect RUM beacons and write logs for ingestion into SpeedCurve RUM. However, Fastly is currently having issues with its streaming log service, and we've seen a major drop in logs being sent to us for ingestion into SpeedCurve RUM. You will see a large drop in RUM page views in your dashboards from 12th Sept 03:10am UTC. You can follow the Fastly incident here: https://www.fastlystatus.com/incident/376914
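Purely as an illustration (this is not SpeedCurve's actual pipeline or monitoring; the function name, threshold, and numbers below are invented), a drop in streaming-log volume like the one described above could be flagged on the ingestion side roughly like this:

```typescript
// Hypothetical sketch: compare the log volume received in the latest interval
// against a baseline (e.g. a trailing average) and flag a large drop, which is
// how an upstream CDN logging outage would show up as missing RUM page views.
function logVolumeLooksHealthy(
  logsLastInterval: number,    // log lines received in the most recent interval
  baselinePerInterval: number  // expected volume for that interval
): boolean {
  const DROP_THRESHOLD = 0.5; // alert if volume falls below 50% of baseline
  return logsLastInterval >= baselinePerInterval * DROP_THRESHOLD;
}

// Example: 12,000 lines against a 100,000-line baseline would trigger an alert.
if (!logVolumeLooksHealthy(12_000, 100_000)) {
  console.warn("RUM log volume is well below baseline -- possible upstream (CDN) logging outage");
}
```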

Report: "RUM performance issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Slow performance of RUM queries and slow RUM data ingestion over the past 12 hours. Gaps in RUM data are possible in some charts.

Report: "RUM performance issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Still experiencing slow performance of RUM queries.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Some notes with specified site_id posted via API were not saved"

Last update
resolved

Some notes posted via the 'Add a note' API endpoint (POST https://api.speedcurve.com/v1/notes) with the site_id parameter were not saved, or were saved without the link to the specified site. The issue was introduced after a release on December 4 (UTC) and is now fully resolved.
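For reference, a minimal sketch of posting a note to that endpoint. Only the URL and the site_id parameter come from the update above; the auth scheme, the note field name, and the content type are assumptions and should be checked against the SpeedCurve API documentation.

```typescript
// Hypothetical sketch of calling POST /v1/notes with a site_id.
const API_KEY = process.env.SPEEDCURVE_API_KEY ?? "";

async function addNote(siteId: number, note: string): Promise<void> {
  const body = new URLSearchParams({
    site_id: String(siteId), // links the note to a specific site (per the incident above)
    note,                    // assumed field name for the note text -- verify in the API docs
  });

  const res = await fetch("https://api.speedcurve.com/v1/notes", {
    method: "POST",
    headers: {
      // Assumed auth scheme: HTTP Basic with the API key as the username.
      Authorization: "Basic " + Buffer.from(`${API_KEY}:x`).toString("base64"),
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body,
  });

  if (!res.ok) {
    throw new Error(`Note was not saved: HTTP ${res.status}`);
  }
}
```

Checking afterwards that the note actually appears against the intended site in the dashboard is a reasonable safeguard against regressions like the one described in this incident.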

Report: "RUM Data Processing Delays - Reporting Tools Affected"

Last update
resolved

The issue has been resolved and all RUM data is now available.

monitoring

RUM cluster is back to normal and we are monitoring its performance.

identified

Our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost and the system should be caught up shortly.

Report: "RUM Data Processing Delays"

Last update
resolved

The RUM pipeline has returned to normal performance. All data has been uploaded and is available in charts.

investigating

Processing of RUM page views is currently delayed. Data is being collected, but data ingestion into our data store is currently slow.

Report: "Chrome Canary tests failing"

Last update
resolved

This incident has been resolved.

identified

Chrome Canary version 111.0 has caused an issue that is preventing our test agents from running any tests in this browser. We are in the process of reverting this browser to version 110.0 and will not be automatically updating it until the issue is resolved.

Report: "Synthetic dashboard degraded performance and timeouts"

Last update
resolved

This incident has been resolved.

identified

Some synthetic dashboards are experiencing slow performance and in some cases are not loading at all due to timeouts. We are working on a fix, but in the meantime we suggest viewing a smaller date range in your dashboard to prevent timeouts.

Report: "Synthetic Firefox tests failing"

Last update
resolved

An automated software update at approximately 14:00 on 13 October 2022 (UTC) caused Firefox tests to begin failing. This was initially fixed at 08:00 on 19 October 2022 (UTC), but a regression caused the issue to reappear. As of 21:00 on 23 October 2022 (UTC), the issue is considered resolved.

Report: "RUM dashboards delayed"

Last update
resolved

RUM page view processing is back to normal now. The query cache will be refreshed shortly, and charts will return to normal.

investigating

We are continuing to investigate this issue.

investigating

Processing of RUM page views is currently delayed. Data is being collected, but data ingestion into our data store is delayed. The team are investigating.

Report: "Email sending delayed"

Last update
resolved

SpeedCurve emails have been delayed over the last day. We discovered a backlog of queued emails that have now all been sent.

Report: "Issues with synthetic scheduled testing"

Last update
resolved

It turns out we were hitting maximum network limits in AWS for some of our services. We've transitioned to new instances with higher network limits, which should resolve any issues.

investigating

We're seeing stability issues with our synthetic testing, which is causing delays to scheduled tests. The team are investigating.

Report: "Synthetic testing paused for maintenance"

Last update
resolved

Synthetic testing was paused for just under an hour while we moved servers.

investigating

Synthetic testing and deploys are paused for maintenance. Any scheduled tests will be added when the service resumes. Estimated time is 1 hour.

Report: "Synthetic testing paused for disk replacement"

Last update
resolved

The disk has been replaced and synthetic scheduled tests are now running normally. Any scheduled tests skipped in the last few hours will now be run.

monitoring

We've had an SSD disk failure on our WPT server. The server will be down for an hour while it's replaced. You can continue to view completed tests, and any scheduled tests will be completed once the server is back.

Report: "Global CDN Disruption"

Last update
resolved

Fastly has resolved its global CDN issues. We had an approximately 75% drop in LUX page views during this incident.

monitoring

We are continuing to monitor for any further issues.

monitoring

Fastly, which we use for CDN services, is having global issues at the moment, and that is having a knock-on effect on SpeedCurve. You can follow the Fastly incident here: https://status.fastly.com/incidents/vpk0ssybt3bj

Report: "LUX Export Endpoint Errors"

Last update
resolved

We are no longer seeing errors on the LUX export endpoint.

monitoring

We have identified a release that appears to have caused the increased error rates. We have rolled back this release and are continuing to monitor the LUX export endpoint.

investigating

The LUX export endpoint (/v1/lux/export) is currently experiencing a high error rate. We are looking into the issue now.
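Since the endpoint path is given above, here is a hedged sketch of calling it with retries so that a transient spike in server errors, like the one in this incident, degrades gracefully on the client side. The query parameters and auth scheme are assumptions; consult the SpeedCurve API docs for the real ones.

```typescript
// Hypothetical sketch: GET /v1/lux/export with exponential backoff on 5xx errors.
const API_KEY = process.env.SPEEDCURVE_API_KEY ?? "";

async function fetchLuxExport(params: Record<string, string>, maxRetries = 4): Promise<unknown> {
  const url = "https://api.speedcurve.com/v1/lux/export?" + new URLSearchParams(params);

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, {
      headers: {
        // Assumed auth scheme: HTTP Basic with the API key as the username.
        Authorization: "Basic " + Buffer.from(`${API_KEY}:x`).toString("base64"),
      },
    });

    if (res.ok) return res.json();

    // Retry only on server-side errors, backing off exponentially between attempts.
    if (res.status >= 500 && attempt < maxRetries) {
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
      continue;
    }
    throw new Error(`LUX export failed: HTTP ${res.status}`);
  }
  throw new Error("LUX export failed after retries");
}
```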

Report: "Degraded dashboard performance"

Last update
resolved

This incident has been resolved.

identified

We've identified the cause of the issue. We introduced a bit of latency with our last infrastructure change that was not anticipated as we continue to migrate to a new platform. Our solution is to push forward with moving the frontend to our new platform. The team is working aggressively to complete this migration. We are moving with urgency, while continuing to be cautious. We are completing our testing and tentatively plan to migrate the frontend early next week. Getting the production environment stable for our entire customer base is our highest (and only) priority at the moment. We will continue to provide updates to our status page as we have them.

identified

The SpeedCurve API is fully operational again. We are continuing to investigate degraded performance across Synthetic and LUX dashboards.

identified

We have identified an issue which may be impacting performance for some users. We are in the process of implementing a solution.

Report: "Scheduled synthetic tests currently paused"

Last update
resolved

This incident has been resolved.

monitoring

Our WebPageTest server is back online and normal scheduled tests and deploys are running again. Over the next few hours we will run any missed scheduled tests from earlier today.

identified

The RAID array on our main WebPageTest server has failed and is currently being replaced by the team at LiquidWeb. We expect the server to be back up in an hour or so. Deploys and scheduled tests will resume once the server is back online.

investigating

We are investigating issues with our WebPageTest server and have paused scheduled synthetic tests while we determine the cause. Deploys may run slowly but should still continue.

Report: "Unexpected changes in metrics"

Last update
resolved

This incident has been resolved.

monitoring

After discussing the issue with Amazon EC2 engineers, we have concluded that these changes are permanent.

investigating

You might have noticed some unexpected changes in your metrics around 9-10 December. We have not made any changes to our service. We believe the issue is due to changes in Amazon EC2. At this stage we believe the changes are permanent, however we're still in the process of investigating. If you have any questions, send them to us at support@speedcurve.com.

Report: "On demand test failures"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

Tests that are run via the SpeedCurve API or 'Test Now' feature are not being triggered. Scheduled tests are working as expected. This issue is currently under investigation.

Report: "Synthetic tests delayed"

Last update
resolved

Synthetic tests are all running on schedule again.

investigating

Scheduled Synthetic tests are running up to an hour late at the moment. We're investigating and will have them back on time shortly.

Report: "Chrome 80 update affecting CPU metrics"

Last update
resolved

This incident has been resolved.

monitoring

We have identified a fix for Chrome 80 and will test it over the next few days. In the meantime our test agents will continue to run Chrome 78.

monitoring

Due to a change in Chrome 80, our test agents have been unable to accurately measure some CPU metrics including First CPU Idle and Time To Interactive. As a result, data collected over the last 48 hours may have lower FCI and TTI values than expected. We have temporarily reverted our test agents to Chrome 78 while we work on a solution.

Report: "Degraded performance for some pages"

Last update
postmortem

On 5 September 2019, we became aware of increased CPU usage on our test agents across all SpeedCurve regions. Unfortunately, the increased CPU usage affected metrics for almost all of the tests that were run between 3-11 September 2019. CPU-based metrics like TTI & scripting time were the most heavily affected, but in many cases time-based metrics like start render & speedindex were also affected.

![](https://img.speedcurve.com/blog/2019-09-04-postmortem-timeline.png?max-w=1000)

We know that dramatic changes in metrics like this can be frustrating, especially when you aren't sure what caused the change. We now know that the root cause of this incident was an update to the Linux kernel on the servers that run our test agents. The unusually long duration of the incident was due to a combination of insufficient monitoring, a complex tech stack, and a slow debugging feedback loop.

# What happened

Let's dive straight into a timeline of events. All times are in UTC.

#### 2 Sep 21:00

An update to the Linux kernel was installed on our test agents. All of our test agents run Ubuntu 18.04 LTS and are configured to run a software update when they first boot, as well as every 24 hours. For this reason, there would have been a mixture of "good" and "bad" test agents for several hours after this point.

#### 3 Sep 13:06

Our internal monitoring alerted us to an increase in CPU metrics. At this point there was still a mixture of "good" and "bad" test agents, so the data that triggered the alert appeared to be caused by some anomalies rather than a genuine issue. For this reason, the alert was ignored on the assumption that a genuine issue would trigger subsequent alerts.

![](https://img.speedcurve.com/blog/2019-09-04-postmortem-first-alert.png?max-w=1000)

#### 4 Sep 02:00

Our internal monitoring alerted us to another increase in CPU metrics, this time for a third party script (Google Analytics). This prompted a short investigation, but it was believed that the alert was caused by a change in Google Analytics rather than an issue with the test agents.

![](https://img.speedcurve.com/blog/2019-09-04-postmortem-second-alert.png?max-w=1000)

#### 5 Sep 02:28

Our internal monitoring alerted us again to an increase in CPU metrics. This time the alert was seen and taken more seriously, because it appeared to be more widespread than a single third party. Investigation into the issue began in earnest at this point.

#### 5 Sep 03:35

We received the first report from a SpeedCurve user about degraded performance.

#### 5 Sep 03:38

More members of the SpeedCurve team joined the discussion to speculate about possible causes. The tech stack for our test agents has several layers:

* The SpeedCurve application, which orchestrates the testing
* WebPageTest, which farms testing jobs out to individual test agents
* The test agent software, which controls the web browsers and extracts performance data
* The web browsers
* Linux
* Amazon EC2

Our goal at this point was to rule out as many layers as possible so that we could focus the investigation.

#### 5 Sep 07:16

More internal monitoring alerted us to the fact that this issue was much more widespread than we initially thought. We began to speculate that there could be an EC2 issue, but this was ruled out as the issue appeared to be spread across multiple regions.

#### 6 Sep 00:56

By this point we had ruled out all layers except for Linux and EC2. We believed the most likely cause was a software package upgrade, and began a binary search to identify the package.

#### 6 Sep 05:00

No further investigation was performed over the weekend.

#### 8 Sep 20:28

After some false positives identifying the software package, we switched to a much more thorough debugging method. This involved upgrading software packages one by one, rebooting the server, and creating AMI snapshots at every step of the way.

#### 9 Sep 21:20

After a flood of support tickets from SpeedCurve users, we agreed that this issue was widespread enough to justify creating an incident on our status page.

#### 10 Sep 03:06

We identified an update to the Linux kernel as the root cause. This was unexpected, and started some heavy discussions around whether automated software updates were appropriate for our test agents.

#### 10 Sep 20:30

The SpeedCurve team agreed to roll back the Linux kernel to a known-good version and disable automatic software updates.

#### 11 Sep 05:52

We began preparing patched test agent images for all of our test regions.

#### 11 Sep 09:35

All regions except for London had been switched to the patched test agents. The London region seemed to be experiencing issues and we were unable to copy images to it.

#### 11 Sep 10:25

The London region was switched to the patched test agents. The incident was marked as resolved on our status page.

# What didn't go well

This was SpeedCurve's most widespread and longest-running incident. There are many reasons for this, but the biggest are as follows:

1. While we have full control over changes to the SpeedCurve application and WebPageTest, there are several layers of our tech stack that we have less control over. Even though we exclusively use stable and LTS (long-term support) software update channels, we are still at the mercy of software vendors to ensure no breaking changes are introduced. Clearly the use of stable and LTS channels is not enough to prevent issues like this from occurring.
2. Our internal monitoring produced some unexpected results, and we ignored the first two alerts. For this reason, it took us around 48 hours to realise the severity of this incident.
3. We are familiar with breaking changes being introduced in web browsers, but this was the first incident where we had to dig all the way down to the operating system level. Our existing debugging processes were not sufficient to deal with this incident, and it took much longer than anticipated to find the root cause. We also had no way to revert to a known-good OS configuration, since our existing rollback scenarios only accounted for issues higher up the stack.

# How we intend to prevent this from happening again

The major change we're making after this incident is switching from automated software updates to periodic, curated updates. This has a few benefits for us (and for our users):

1. We can perform updates on our own test agents before rolling them out to all of our regions. This allows us to check for potential issues in a timely manner, and also gives us the opportunity to report bugs to software vendors before they impact SpeedCurve test results.
2. We can take a snapshot after each update has been approved and rolled out. Since test agents are essentially frozen after each update, we have a reliable history of agent images that we can roll back to in the case of an incident like this.
3. In the case that an update will have a noticeable impact on SpeedCurve test results, we can give our users plenty of notice.

On top of this, we will also continue to improve our internal monitoring.

# Conclusion

This was a frustrating incident for SpeedCurve users and for the SpeedCurve team. We're really sorry for the inconvenience that it caused. On the bright side, we learned a lot and we're looking forward to improving our processes so that incidents like this don't happen again. Thanks so much for helping us to improve SpeedCurve!
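The timeline above mentions a binary search over the installed package updates to find the one that introduced the regression. As an illustration only (the function names and the manual `showsCpuRegression` check are hypothetical, not SpeedCurve's actual tooling), a bisection over an ordered list of updates could look like this:

```typescript
// Illustrative bisection: `updates` is the ordered list of package updates applied to
// an agent, and `showsCpuRegression(count)` stands in for testing an agent image that
// has only the first `count` updates applied.
async function findFirstBadUpdate(
  updates: string[],
  showsCpuRegression: (count: number) => Promise<boolean>
): Promise<string | null> {
  if (updates.length === 0) return null;

  let lo = 0;              // applying the first `lo` updates is known to be good
  let hi = updates.length; // applying the first `hi` updates is known (assumed) to be bad

  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (await showsCpuRegression(mid)) {
      hi = mid; // regression already present with `mid` updates applied
    } else {
      lo = mid; // still healthy with `mid` updates applied
    }
  }
  return updates[hi - 1]; // the update that introduced the regression
}
```

As the timeline notes, noisy CPU measurements produced false positives, which breaks the clean "good then bad" assumption bisection relies on; that is why the team fell back to applying updates one by one and snapshotting an AMI at each step.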

resolved

All test regions have been patched and performance should return to expected levels. We will follow up with a full write-up of this issue soon.

monitoring

A fix has been rolled out to most testing regions and we are continuing to monitor the situation.

identified

We have identified that a change in the Linux kernel is responsible for noticeable CPU overhead on our servers. We're working to resolve this as soon as possible.

investigating

We've noticed that some pages are experiencing degraded performance metrics since 3 September 2019. We are actively investigating the cause of this issue.

Report: "First Meaningful Paint Issues"

Last update
resolved

Resolving this for now. It's now a known issue that first meaningful paint is not always correct in the Chrome trace files. First meaningful paint is still considered under development, with no "standardized definition" by the Google and Chrome teams.

monitoring

We are continuing to monitor for any further issues.

monitoring

The FMP reported via chromeUserTiming has had some funky changes in Chrome 75 and we've been working with the WebPageTest team to find out what's going on and improve the parsing of Chrome user timing events. It looks like Chrome has started reporting FMP for multiple frames, not just the main frame, and this has thrown the metric out on some pages. We've pushed an update to use the first FMP event we find rather than the last FMP event on the page, which should fix the issue with FMP events that appear late in the page load. We're also seeing other tests with no FMP reported or an FMP of 0, which is nonsensical. FMP is still regarded as a work-in-progress by the Chrome team, so we're not sure what their appetite is for fixing issues like this in the Chrome trace.

We strongly recommend using hero rendering times over FMP as we find them a much better representation of the "meaningful" content a user is actually seeing as the page renders. For example, FMP doesn't currently take into account when any images render. Google acknowledges that FMP doesn't have a "standardized definition" yet and recommends using user timing marks on hero elements instead.
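A minimal sketch of that recommendation, assuming a page with a single hero image; the element selector and mark name below are examples only.

```typescript
// Record a User Timing mark when the hero image has loaded, instead of relying on FMP.
// Tools that read the performance timeline (RUM and synthetic) can then report this mark.
const heroImage = document.querySelector<HTMLImageElement>("#hero-image");

heroImage?.addEventListener("load", () => {
  // "load" is a reasonable proxy for the hero image being available to render;
  // the mark name is arbitrary and just needs to be consistent across pages.
  performance.mark("hero-image-loaded");
});
```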

identified

We've identified an issue in Chrome 75 where First Meaningful Paint is not being reported for some URLs.

Report: "Firefox not running tests"

Last update
resolved

The WebPageTest team has now updated the test agent codebase and Firefox is working again.

investigating

There are currently issues with synthetic testing using the Firefox browser. A change in Firefox is causing WebPageTest to error when trying to start a test. We're working with the WebPageTest team to identify and resolve the issue.

Report: "This is an example incident"

Last update
resolved

Empathize with those affected and let them know everything is operating as normal.

monitoring

Let your users know once a fix is in place, and keep communication clear and precise.

identified

As you continue to work through the incident, update your customers frequently.

investigating

When your product or service isn’t functioning as expected, let your customers know by creating an incident. Communicate early, even if you don’t know exactly what’s going on.

Report: "API outage"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.