Honeybadger.io

Is Honeybadger.io Down Right Now? Check whether there is an ongoing outage.

Honeybadger.io is currently Operational

Last checked from Honeybadger.io's official status page

Historical record of incidents for Honeybadger.io

Report: "Delay in stats and check-in processing"

Last update
resolved

We had an interruption in processing of check-in alerts and in-app stats from about 12:30 to about 13:30 UTC. This may have resulted in a failure or a delay in reporting check-ins that went missing during that period.

Report: "Data Processing Delays"

Last update
resolved

The backlog has been processed and all systems are running normally.

monitoring

We have identified and corrected an issue in our data processing pipeline that may have caused delays for some customers. No data has been lost and the system should be caught up shortly.

Report: "App slowness and timeouts"

Last update
resolved

This incident has been resolved.

monitoring

We have identified the cause of the slowdown and performance is back to normal.

investigating

We are looking into the cause of slowness and timeouts with the web app.

Report: "Insights backlog issue"

Last update
resolved

The backlog has been fully processed, so all Insights events should be ingested and available.

monitoring

We have pushed a fix and are waiting for the backlog to be processed. As a reminder, this only affects Insights events; error processing is unaffected.

identified

We have identified a backlog in Insights event ingestion. We are working on a fix.

Report: "Check-in monitoring"

Last update
resolved

We discovered and addressed a configuration issue that caused the delay in processing. Processing has returned to normal.

investigating

We are seeing an increased backlog in our check-in processing, which is causing false alarms to be reported. We are investigating the cause of the backlog.

Report: "Increased response times and timeouts"

Last update
resolved

This incident has been resolved.

monitoring

We've resolved an issue with query contention, and web application service has been restored. Our API pipeline was unaffected by this incident. Some charts on the reports tab may be outdated until we update project counts. We'll continue to monitor the situation. Sorry for the inconvenience!

identified

We've identified an issue with aggregate database queries and are working on a fix.

investigating

We are continuing to investigate this issue.

investigating

We are currently experiencing elevated response times and timeouts from our web application.

Report: "Web app is unavailable"

Last update
resolved

This incident has been resolved.

identified

We are working on restoring service.

Report: "Delay in Logplex processing"

Last update
resolved

We discovered this morning that our Logplex pipeline, which handles Heroku platform errors, had an interruption in processing, which caused platform errors not to be recorded. This has been corrected, the backlog is being processed, and additional monitoring has been added to avoid this issue in the future.

Report: "Uptime check reports are delayed"

Last update
resolved

Lambda is back in business, and the backlog of events triggered during the outage is being processed.

identified

Search indexing and notice timeline charts are also impacted by the failures with Lambda. We're continuing to queue updates for processing once Lambda resumes normal operations.

identified

Issues with AWS Lambda are causing our uptime check reports to be delayed. Reports are queued up and will be delivered as AWS services recover.

Report: "Unhealthy API server"

Last update
resolved

Starting around 3:30pm PST, we had an incident where an unhealthy API server was in rotation in our load balancer. During that time, requests routed to that server responded with a 502, which could have resulted in clients dropping notices. At around 5:30pm PST we removed the unhealthy server from rotation. 7:22pm PST edit: Reworded this message to clarify that notices are not queued on the client, but dropped when the server responds in an error state.

Report: "Issues processing backlog"

Last update
resolved

We identified that our workers had an issue deploying. We deployed a fix and upped our capacity to deal with the current backlog. Everything is back to normal now. It turns out this issue was related to GitHub rotating their public RSA SSH host key: https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/

identified

We have identified the issue and are working towards clearing the backlog.

investigating

We are currently having issues processing our notice backlog. We are looking into the issue.

Report: "Web app unavailable"

Last update
resolved

We had an incident (from 1:00 to 1:27 PST) where our apps were unresponsive. They have since recovered.

Report: "API issues"

Last update
resolved

Our primary Redis instance was having a bad time. This has been fixed, and now we're back to normal.

investigating

We are seeing increased error rates for our API.

Report: "Email delivery is being delayed"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

Our email provider is having problems delivering emails, so our outbound email deliveries are being delayed.

Report: "Web app unavailable"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Uptime checks are delayed"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Network disruption"

Last update
resolved

We had a network disruption that lasted approximately one minute. The cause has been identified and resolved.

Report: "Brief web app downtime"

Last update
resolved

We had some brief downtime while a deploy was completing—everything is back to normal now. 👍

Report: "Status pages down"

Last update
resolved

We had a faulty deploy at 11 am yesterday that caused us to stop serving customer status pages. We weren’t immediately aware of the problem due to insufficient monitoring of this recently-launched feature. We have restored the status pages, identified and resolved the problem that caused the faulty deploy, and added more monitoring.

Report: "Check-in failures"

Last update
resolved

We had a period of about 30 minutes where our check-in and sourcemap API endpoints were not accepting requests. This resulted in some check-in missing alerts that should not have been sent, and in failures to upload sourcemaps. This issue has been resolved.

Report: "Delayed search indexing"

Last update
resolved

It looks like we're back to normal. AWS finally updated their status page. ;) https://phd.aws.amazon.com/phd/home?region=us-east-1#/account/dashboard/open-issues?eventID=arn:aws:health:us-east-1::event/MULTIPLE_SERVICES/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE_SVALH_1646859683&eventTab=details

investigating

We are continuing to investigate this issue.

investigating

We're investigating an issue related to slow search indexing—search results may not reflect reality until we get this resolved (or AWS does). Will keep y'all updated here. Sorry for the inconvenience!

Report: "Uptime checks delayed"

Last update
resolved

We're all good now, thanks folks!

monitoring

We've taken steps to resolve the immediate issues; uptime checks should be back to normal. We'll update this incident as we learn more, but hopefully will have this wrapped up soon. :) As always, please reach out to support@honeybadger.io if you're having trouble.

investigating

Uptime checks and some other periodic jobs are not running on time. Uptime/outage alerts will be delayed until this issue is resolved, as well as some stats in the UI.

Report: "Slack delivery errors"

Last update
resolved

This incident has been resolved.

monitoring

Slack inbound webhooks are failing, so some error notifications are unable to be delivered.

Report: "Delays in processing"

Last update
resolved

It looks like we're coming out of the woods. The backlogs are gone, EC2 instances are booting once again, and processing volume is returning to normal.

monitoring

We are continuing to be impacted by problems in us-east-1. We are working on our plan to bring up another region.

monitoring

Various services in AWS us-east-1 are having problems, which is causing delays in error ingestion.

Report: "Web app errors and check-in alerts"

Last update
resolved

We had a bad code deploy that caused a few instances of errors with the web app and erroneous alerts for check-ins. Sorry for the inconvenience!

Report: "Bug in error grouping"

Last update
resolved

We accidentally broke grouping for some errors for a period of about 8 hours today, ending right before this update was published. No errors were lost, but you may have seen what looked like duplicate errors, since we failed to group together some errors that should have been grouped. Sorry for the inconvenience!

Report: "Web app unavailable"

Last update
resolved

This incident has been resolved.

identified

The web app is unavailable. Error collection is not impacted.

Report: "Current-hour stats are missing"

Last update
resolved

This incident has been resolved.

investigating

Counts of error notifications received in the past hour are currently showing 0 in the UI. This does not impact error collection and notification. We are investigating the cause of the issue.

Report: "Problems with search"

Last update
resolved

Everything is back to normal, and the backfill is complete.

monitoring

Our cluster has resumed normal operations. We are now indexing data as it arrives, and we are starting the backfill process to index the data we didn't index since this incident began.

identified

The issue has been identified and a fix is being implemented.

investigating

Our search cluster is having problems — searches and some charts are not currently available.

Report: "Delays in data appearing in search results"

Last update
resolved

This incident has been resolved.

investigating

We are currently experiencing a delay in the data that is being returned in search results, which includes the list of notices displayed on the error detail page. Error ingestion and search indexing are not affected.

Report: "API response times"

Last update
resolved

Processing is back to normal, and we're keeping our fingers crossed that AWS autoscaling is working once again.

identified

We are continuing to work on dealing with the fallout from the issues in us-east-1. We are processing inbound error notifications, but with some delay.

identified

We are continuing to work on a fix for this issue.

identified

AWS is having issues (https://status.aws.amazon.com) which are impacting our API, causing slowness, retries, and failures.

Report: "Network timeouts"

Last update
resolved

This incident has been resolved.

investigating

We are seeing long response times due to apparent network issues.

Report: "Delays in data appearing in search results"

Last update
resolved

This incident has been resolved.

investigating

We are currently experiencing a delay in the data that is being returned in search results, which includes the list of notices displayed on the error detail page. Error ingestion and search indexing are not affected.

Report: "API downtime"

Last update
resolved

We had an automation failure that caused a few minutes of downtime for our ingestion API. We are reworking our automation to avoid this problem in the future.

Report: "Delays in data appearing in search results"

Last update
resolved

This incident has been resolved.

investigating

We are currently experiencing a delay in the data that is being returned in search results, which includes the list of notices displayed on the error detail page. Error ingestion and search indexing are not affected.

Report: "API and Web App May be Unavailable to Some Users"

Last update
resolved

We have switched all affected systems to new certificates and everything is back to normal.

monitoring

We are in the process of issuing new certificates from a different CA, to work around Comodo's issue.

monitoring

The unavailability is being caused by an upstream SSL problem. Comodo (our CA) is being flagged as "untrusted" by clients. See this tweet for an example: https://twitter.com/aitorpazos/status/1266703889786691584 We are currently monitoring the situation, as well as investigating other solutions.

investigating

We have experienced a drop in external traffic which leads us to believe that some customers may not currently be able to access our service. At this time we suspect that the issue is caused by network problems outside of our own systems, and we are currently investigating to confirm this.

Report: "Sales and docs sites are down"

Last update
resolved

Netlify fixed it :)

monitoring

Netlify is having redirect loop problems, causing our static sites to go down. Everything in our app and API is fine.

Report: "Partial API outage"

Last update
postmortem

This report details the impacts of our outage of August 26th, the cause of the outage, and steps we have taken and will be taking to prevent a similar kind of outage in the future. Before getting into the details, I want to apologize to everyone who was impacted by this outage – we have worked hard to build a resilient system, and it's really disappointing when we let you down. On to the deets…

### What happened?

A little after 7PM Pacific Time I received an alert from PagerDuty letting me know that one of our ingestion API endpoints was not responding to external monitoring. Reviewing our dashboards revealed that our primary Redis cluster was about to run out of available memory. This cluster stores the Sidekiq queue that we use for processing the payloads of the errors being reported to our API, and it typically rests at 3% memory utilization. Our internal dashboard did show some outliers among our customers for inbound traffic, but a few minutes of research down that path did not lead to a cause for the memory consumption. Running down our list of things to check in case of emergency led me to find that our database server was overloaded with slow-running queries. This caused our Sidekiq jobs to take much longer than usual (30-50x as long), which caused a backlog large enough to consume all the memory.

Having our main Redis cluster effectively unusable resulted in the following problems:

* Our API endpoints were unable to receive error reports, source map uploads, deployment notifications, and check-in reports
* Uptime checks were delayed
* Some check-ins were switched to the down state because the check-in reports couldn't be received
* Some error notifications were lost as we had to juggle Redis instances (more on that below)

### When was it fixed?

Not long after 9PM our API was responsive again. By 10PM our backlog of error payloads was fully processed, and we were back to normal. We would have been back in business sooner, but a few things tripped us up:

* As soon as it was clear we were going to run out of memory on our Redis cluster (which is hosted by AWS ElastiCache), and that I wouldn't be able to quickly free up some memory, I started a resize of the existing cluster. When it became clear that would not be quick (it ended up taking approximately two hours), I spun up a new, larger ElastiCache cluster. When it became clear _that_ would not be quick either, I spun up an EC2 instance in our VPC to host Redis temporarily.
* Unfortunately, though we use Ansible for automating all our EC2 provisioning, we did not have a playbook for quickly spinning up a Redis server. When I set up the first server manually, I didn't provision enough disk space on the instance to store the Redis snapshot as the backlog grew (I didn't have the fix in place for the slow queries yet).
* When I spotted that problem with the Redis instance, I spun up another one with a large-enough disk.
* We also didn't have an automated way to update the 4 locations where our app, api, and worker instances were configured with the location of the Redis server, so with each of the two changes to the Redis server location I had to run some Ansible commands to update configurations and bounce services.

Once that was all settled, though, and traffic was once again flowing into our new self-hosted Redis instance, I was able to turn my attention to the cause of the problem – the slow queries. It turns out that one query was the cause of the slowness. This query loaded previously-uploaded sourcemaps to be applied to JavaScript errors as they were being processed. Since the problem was so localized, I was able to get the database back to a good place by temporarily suspending sourcemap processing.

### What's the remediation plan?

As you can imagine, there are a number of things we can do to help avoid or minimize this kind of situation in the future:

1. Review that database table and/or the query to see how we can get past the tipping point we encountered that turned a 4-10ms query into a 400ms query for certain customers (in process)
2. Persist payloads to S3 sooner in our Sidekiq jobs so we can minimize memory pressure on Redis
3. Increase the size of our Redis cluster (already done)
4. Create an Ansible playbook to quickly provision a new Redis instance in case of emergency (done)
5. Centralize the 4 app configurations for the URL of the Redis cluster and create an Ansible playbook that can quickly update those configurations (in process)

We'll continue to improve our systems and processes to deliver the most reliable service we can. We truly appreciate that you've chosen us for your monitoring needs, and we are always eager to show our appreciation by working hard for you. As always, if you have any questions or comments, please reach out to us at [support@honeybadger.io](mailto:support@honeybadger.io).
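
Remediation item 2 above describes offloading full payloads to S3 so that only small references accumulate in the Redis-backed queue, sometimes called the claim-check pattern. The sketch below is a minimal, generic illustration of that idea in Python, not Honeybadger's actual implementation (their pipeline uses Sidekiq in Ruby); the bucket name, queue key, and helper functions are hypothetical.

```python
import json
import uuid

import boto3
import redis

s3 = boto3.client("s3")
queue = redis.Redis(host="localhost", port=6379)

BUCKET = "example-error-payloads"  # hypothetical bucket name
QUEUE_KEY = "error-payload-jobs"   # hypothetical Redis list used as a work queue


def enqueue_error_payload(payload: dict) -> str:
    """Store the full payload in S3 and push only a small reference onto Redis,
    so a processing backlog grows in S3 rather than in Redis memory."""
    object_key = f"payloads/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=object_key, Body=json.dumps(payload))
    queue.lpush(QUEUE_KEY, json.dumps({"s3_key": object_key}))
    return object_key


def process_next_payload() -> None:
    """Worker side: pop a reference, fetch the full payload from S3, process it,
    then delete the object."""
    raw = queue.rpop(QUEUE_KEY)
    if raw is None:
        return  # queue is empty
    job = json.loads(raw)
    obj = s3.get_object(Bucket=BUCKET, Key=job["s3_key"])
    payload = json.loads(obj["Body"].read())
    # ... apply sourcemaps, group the error, send notifications, etc. ...
    s3.delete_object(Bucket=BUCKET, Key=job["s3_key"])
```

Because each queue entry is only a short JSON reference, a downstream slowdown grows the number of S3 objects instead of exhausting the queue's memory.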

resolved

The backlog has been cleared, and our Redis cluster is happy once again. We'll be looking at ways we can better handle this scenario in the future.

identified

We have a temporary fix in place for the impacted Redis cluster, and now we are working on the backlog.

identified

Our main Redis cluster is having issues, and we are attempting to work around them.

investigating

We are currently investigating this issue.

Report: "Pipeline processing delays"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

Report: "Sourcemap upload issues"

Last update
resolved

We have determined that some of the sourcemap traffic was being incorrectly routed to a staging server, which was encountering errors. This routing problem has been fixed.

investigating

We have received reports of problems with uploads of sourcemaps to our API, and we are looking into the issue.

Report: "Bogus check-in alerts"

Last update
resolved

This incident has been resolved.

monitoring

Starting around 6:30 AM UTC on March 28th, we encountered a burst of Lambda invocation errors, which caused a backlog in one of our queues. As a result, some uptime checks were not recorded in time to avoid alert conditions, and bogus alerts were sent out. The backlog was cleared within a couple of hours. We're looking at changes we can make to our job queues to prevent this kind of scenario from occurring again.

Report: "Intermittent problem displaying errors in web app"

Last update
resolved

We just identified and resolved an issue where displaying some errors in the web UI was failing due to a problem with a 3rd party service. No other systems were impacted.

Report: "Intermittent sourcemap upload issue"

Last update
resolved

Our API was experiencing intermittent failures with sourcemap uploads from about 7:30 AM to 10:30 AM PDT. This was caused by internal DNS resolution failures. The problem has been resolved.

Report: "Intermittent sourcemap and deployment reporting problems"

Last update
resolved

This morning we had an issue with one of our web servers not being able to receive sourcemap uploads and deployment notifications, which resulted in intermittent failures from our API. This has been resolved.

Report: "Slowdown in sourcemap processing"

Last update
resolved

This incident has been resolved.

investigating

We are experiencing delays in sourcemap processing. This may result in some error backtraces not being enriched properly, but otherwise error report ingestion is not affected.

Report: "Heroku log drains experiencing intermittent connectivity problems"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Timeouts and Processing Delays"

Last update
resolved

It looks like the S3 problems are resolved, so we are back to normal.

monitoring

We are experiencing timeouts in the UI and processing delays due to increased error rates from S3.

Report: "Error processing delays"

Last update
resolved

We are now back to normal operations.

monitoring

We've identified and routed around the cause of the slowdown. We're working through our backlog of delayed notices.

investigating

At around 9:30 pm PST we started experiencing delays processing error notifications. We're investigating the cause of this.

Report: "Intermittent search issues"

Last update
resolved

This incident has been resolved.

investigating

We're experiencing intermittent search problems affecting some of our users.

Report: "Logplex failure"

Last update
resolved

From 07:23 UTC to 12:28 UTC our Heroku Logplex endpoint was down. This resulted in Heroku platform errors not being collected or alerted on during this window. The problem was caused by a misconfigured AMI that, when booted, would not pass the ELB health check. The outage lasted as long as it did because our monitoring was not configured correctly to wake us up for failures with this endpoint. The AMI problem has been corrected, and additional monitoring will be added.