Heron Data

Is Heron Data Down Right Now? Check whether an outage is currently ongoing.

Heron Data is currently Operational

Last checked from Heron Data's official status page

Historical record of incidents for Heron Data

Report: "GCP outage"

Last update
investigating

Read requests are succeeding, but POST requests are failing.

investigating

We are continuing to investigate this issue.

investigating

We are experiencing issues with one of our cloud providers, GCP. Namely, we are seeing degradation in Cloud Storage, Cloud Run, and AI services.

Report: "File Parsing Down"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Server slowness"

Last update
resolved

The incident has been resolved.

monitoring

Systems are stable.

investigating

We are currently investigating an issue where Heron async processing is experiencing slowness.

Report: "Bank statement PDF parsing performance degraded"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "API instability"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Email server outage"

Last update
resolved

This incident has been resolved.

monitoring

Our email server provider, Postmark, experienced an issue with processing queued messages. This meant that submissions forwarded to Heron Data were not being received by the Heron system. No data was lost in this process, and the provider's system has mostly recovered: all incoming messages are being processed, and most historical messages have now been processed. We are monitoring to ensure complete recovery: https://status.postmarkapp.com/notices/zno1dlxjdjmblc0d-service-issue-outbound-sending-and-inbound-processing-messages-are-being-accepted-and-queued

Report: "High rate of timeouts"

Last update
resolved

The API has been stable and working since the fix. This has now been marked as resolved.

monitoring

A fix was implemented on 30/11/2023 at 19:30 UTC and our monitoring systems have no reports of further timeouts. We will continue monitoring throughout the day.

investigating

We are experiencing an increase in timeouts and are investigating the issue. We have taken intermediate steps to mitigate this and will update you as and when we have more information.

Report: "API instability"

Last update
resolved

This incident has been resolved. We experienced a high number of API requests that were using a high amount of CPU, overwhelming our infrastructure. We shipped a change that reduced the CPU usage, and have seen normal CPU usage for the last 8 hours.

monitoring

We have implemented mitigation measures to address the higher than usual error rate with our API. We will continue to closely monitor the situation. Thank you for your understanding.

identified

We have identified the cause of the error rate issue and are actively working on resolving it. Thank you for your patience as we work towards a solution.

investigating

We are currently experiencing a higher than usual error rate with our API. Our team is actively investigating and working on resolving the issue. Updates to follow soon. We apologize for any inconvenience caused. Thank you for your patience.

Report: "Database outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified that a related database table is impacted, and are implementing a fix.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The database migration is complete and async processing is back online.

identified

We are running a backfill on the table in question, which we believe will take ~6 hours, so we will provide another update around then.

identified

We have remediated part of the issue. The outage is now limited to async processing and any route that involves fetching transaction categories (e.g., delete & get transactions, and end user endpoints like /summary, /profit_and_loss, and various reports). A fix for the remainder is underway.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue where we have reached the maximum number of IDs for a table in our database. We have to schedule some emergency downtime to resolve the issue.
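The trigger in this incident (an integer ID column reaching its maximum value) is a classic capacity failure. Below is a minimal sketch of the arithmetic involved, assuming a signed 32-bit ID column; the table name and insert rate mentioned in the comments are hypothetical, not Heron's actual schema.

```python
# Hypothetical illustration of the failure mode above: a table whose
# primary key is a signed 32-bit integer (e.g. a Postgres SERIAL column)
# can only hold ids up to 2**31 - 1 before inserts start failing.

INT32_MAX = 2**31 - 1  # 2147483647

def ids_remaining(current_max_id: int, limit: int = INT32_MAX) -> int:
    """Number of inserts left before the id column overflows."""
    return max(limit - current_max_id, 0)

def days_until_exhaustion(current_max_id: int, inserts_per_day: int) -> float:
    """Rough runway estimate, assuming a constant insert rate."""
    return ids_remaining(current_max_id) / inserts_per_day

# The usual remediation is widening the column to 64 bits, e.g. (table
# name hypothetical):
#   ALTER TABLE transactions ALTER COLUMN id TYPE BIGINT;
# which rewrites the table and therefore typically needs scheduled
# downtime, consistent with the emergency downtime described above.
```

With 1,000,000 IDs left and 100,000 inserts a day, `days_until_exhaustion` gives 10 days of runway.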

Report: "Elevated API Errors"

Last update
postmortem

We apologise for the increased 500 errors and timeouts experienced in the last 10 hours. We introduced a new index to our database which was partially built and resulted in an “invalid” state. This resulted in long running queries which blocked subsequent reads. We dropped the index, as well as blocking database processes, and are now recovered.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.
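The postmortem above mentions an index that was partially built and left in an "invalid" state. Assuming PostgreSQL (the report does not name the database), this is exactly what a failed CREATE INDEX CONCURRENTLY leaves behind; a minimal sketch of the diagnostic query and the drop remediation the postmortem describes:

```python
# Assumed-PostgreSQL sketch: find indexes left invalid by a failed
# CREATE INDEX CONCURRENTLY. Invalid indexes are ignored by the planner
# for reads but still add write overhead, and the half-built state can
# coincide with long-running queries that block subsequent reads.

FIND_INVALID_INDEXES = """
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;
"""

def drop_index_sql(index_name: str) -> str:
    """Remediation used in the incident: drop the invalid index.
    CONCURRENTLY avoids holding a lock that blocks other queries."""
    return f'DROP INDEX CONCURRENTLY IF EXISTS "{index_name}";'
```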

Report: "PDF parsing issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are experiencing issues with our PDF parsing capabilities. The issue has been identified and the fix is in progress.

Report: "API outage"

Last update
resolved

We experienced an API outage (<5 mins) caused by a prolonged database migration. All systems are back to normal.

Report: "Increased API latency"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

Report: "Increased API latency"

Last update
resolved

Response times have been restored.

investigating

We are currently investigating this issue.

Report: "Increased API response times"

Last update
resolved

We noticed intermittent increased latency yesterday afternoon due to unprecedented levels of traffic. While debugging the root cause and how to handle this, we increased our compute resources in multiple areas to handle the traffic. We have now identified the root cause and solved the issue. Response times are back to normal but we will continue to monitor to ensure the fix worked.

Report: "Major outage: Load balancer 502s"

Last update
resolved

From 00:10-00:19 UTC (9 minutes), we had a major outage in our API when we experienced a high number of 502 errors that rendered our backend API unusable. This was caused by issues in our load balancer, triggered by unusually high amounts of traffic and a lack of available pods to serve it. We are in the process of adding better scalability to our backend to handle such spikes in traffic moving forward, including better horizontal scaling metrics.

Report: "API downtime"

Last update
resolved

From 8:30-8:35 UTC we had brief API downtime from a planned database migration. We're fully operational again.

Report: "SSL Certificate Renewal Error"

Last update
resolved

Our automated SSL renewal process failed. This meant our API was down between 10:08 and 10:49AM BST. We provisioned a new SSL certificate manually, restored automatic SSL certificate renewals and added monitoring to prevent this from happening again.
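The monitoring added after this incident could be as simple as periodically checking the certificate's expiry over TLS. A minimal sketch using Python's standard library; the host name and alert threshold in the comments are illustrative only:

```python
# Sketch of certificate-expiry monitoring: fetch a host's leaf
# certificate and compute how many days remain before it expires.
import socket
import ssl
from datetime import datetime, timezone

def cert_days_remaining(not_after: str) -> float:
    """Days until expiry, given the 'notAfter' string as returned by
    ssl.SSLSocket.getpeercert(), e.g. 'Jun  1 12:00:00 2026 GMT'."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - datetime.now(timezone.utc)).total_seconds() / 86400

def fetch_not_after(host: str, port: int = 443) -> str:
    """Fetch the 'notAfter' field of a host's certificate over TLS."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

# An alerting job would call fetch_not_after("app.herondata.io") on a
# schedule and page when cert_days_remaining drops below, say, 14 days.
```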

Report: "Degraded API Performance"

Last update
resolved

Some users might have experienced intermittent 500 response codes between 7PM and 8PM BST. This was caused by a monitoring tool we added to better understand our usage and latency, which unfortunately interfered with external API calls that some of our endpoints rely on. Once we identified the issue, we reverted the monitoring changes and all systems returned to normal. Moving forward, we will deploy and babysit all monitoring changes (in addition to code deploys and core infra changes) in staging before releasing to production.

Report: "Degraded API performance"

Last update
resolved

As part of releasing new ML models to improve our merchant extraction service, we hit scaling issues when exposed to higher than normal throughput. All the relevant checks were made in testing, local, and staging environments, but unfortunately nothing could reproduce production-level traffic. We have reverted the release, and all systems are fully functional again.

Report: "Increased API and webhook response times"

Last update
resolved

Our REST API responses experienced increased latency, peaking at 60s. Some customers using our async enrichment flow will have seen webhooks take longer to arrive than usual. All systems are back to normal now.

Report: "Database CPU upgrade"

Last update
resolved

We ran an upgrade of our production database to handle increased traffic which resulted in about 1 minute of intermittent downtime for our API users. The API performance is back to normal.

Report: "Google Cloud Load Balancers"

Last update
resolved

Google Cloud has resolved this issue. Our API is fully functional again.

investigating

Google Cloud Load Balancers are experiencing issues, so Google is unable to route any traffic to our API. Any calls to our API will currently result in a 404 error. This is affecting other well-known sites such as spotify.com and etsy.com.

Report: "API downtime"

Last update
resolved

In an attempt to optimise our network traffic routing, we provisioned some config changes to our load balancer. The change caused our Kubernetes routing to become unhealthy, blocking traffic coming into our app.herondata.io domain. The change was tested in staging environments and worked correctly, meaning the failure only surfaced under production-level traffic. We quickly reverted the config to stabilise the API, which is now operating normally.

Report: "API downtime"

Last update
resolved

During a database migration in production, an overly aggressive lock took over a table we use for authentication, so we could not serve any requests to customers between 16:20-16:50 UTC. We've now resolved the issue and are fully operational.

Report: "API downtime"

Last update
resolved

From 9:11 to 9:20 UTC we experienced timeouts and failed requests on our `POST` and `GET` transactions endpoints. This was due to shipping product improvements which included a database upgrade. We're fully operational again.