Flatfile

Is Flatfile Down Right Now? Check if there is a current outage ongoing.

Flatfile is currently Operational

Last checked from Flatfile's official status page

Historical record of incidents for Flatfile

Report: "Excel Files Not Extracting"

Last update
identified

We are seeing an issue where Excel files are not being extracted. We have identified a cause and are working on a fix. CSV files are still working.

Report: "Uploads Are Unresponsive"

Last update
investigating

We are currently investigating this issue.

Report: "Intermittent Bad Gateway Errors"

Last update
monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating an issue where some users are seeing bad gateway errors.

Report: "Files failing to upload"

Last update
monitoring

The fix has been fully rolled out, and we are currently monitoring the platform.

identified

The issue has been identified and a fix has been rolled out.

investigating

We are currently investigating reports of file upload failures in our UK platform region.

Report: "Unable to access platform.flatfile.com"

Last update
postmortem

# Incident Overview

**Nature of Incident:** Incorrect Application Frontend Container Deployed to Platform Frontend

**Services Affected:** Flatfile Platform Dashboard and Spaces

## Details of the Incident

At approximately 4:26pm MDT on June 3, a tag collision caused an image for an unreleased product to be deployed in place of the Platform frontend container. As a result, frontend assets for the wrong application were served in place of the Flatfile dashboard. Eight minutes later the engineering team was alerted to the incident and manually deployed the correct image to resolve the issue.

## Impact Assessment

The incident affected all users of the Platform frontend applications for approximately 20 minutes. The API was unaffected.

## Root Cause

The root cause was a collision in tagging on images within the Flatfile image registry.

## Resolution

Flatfile engineering manually deployed a known good image to the frontend service to immediately restore service. Following this, the image registry was patched to prevent future incidents.

## Security and Data Integrity

Please be assured that this incident did not compromise the security or integrity of your data. Our commitment to data protection remains a top priority.
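
To illustrate the kind of tag collision described in this postmortem, the following is a minimal sketch of a guard that refuses to let an existing tag be repointed at a different image. The postmortem does not say how the registry was patched, so this is only one possible approach. `TagIndex`, the tag names, and the digests are all hypothetical.

```typescript
// In-memory stand-in for a registry's tag index (hypothetical; real registries differ).
class TagIndex {
  private readonly digests = new Map<string, string>();

  // Records the tag and returns true, or returns false if the tag
  // already points at a different image digest (a collision).
  push(tag: string, digest: string): boolean {
    const existing = this.digests.get(tag);
    if (existing !== undefined && existing !== digest) {
      return false;
    }
    this.digests.set(tag, digest);
    return true;
  }

  // Returns the digest a tag currently points at, if any.
  resolve(tag: string): string | undefined {
    return this.digests.get(tag);
  }
}

const index = new TagIndex();
const firstPush = index.push("frontend:release-42", "sha256:aaa");
// A second product reusing the same tag with a different image is rejected.
const collidingPush = index.push("frontend:release-42", "sha256:bbb");
const deployed = index.resolve("frontend:release-42");
```

With a guard like this, the colliding push fails loudly at publish time instead of silently changing what the tag deploys.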

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue causing users to not be able to access platform.flatfile.com.

Report: "Platform Outage"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Spaces Failing to Load"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Slowness in Loading Spaces"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "File Uploads Failing"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue that is causing some file uploads to fail.

Report: "Identity Service Degradation"

Last update
resolved

Logins to the US, EU, and UK environments are operational again! All dashboards should now be accessible.

identified

We're watching our service provider for updates.

identified

We are continuing to work on a fix for this issue.

identified

We're watching our service provider for updates.

investigating

Our identity provider is experiencing some instability, which is making the dashboard login inaccessible at the moment. We are monitoring and will update once this is back up again.

Report: "AU Region Frontend Application Outage"

Last update
resolved

A frontend application configuration variable change was deployed prior to deploying the code change. The configuration has been patched and the code change deployed.

Report: "Intermittent 504 Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified the issue and are rolling out a fix.

investigating

We are currently investigating customer reports of 504 errors on the platform.

Report: "Spaces failing to load"

Last update
resolved

This incident has been resolved. The root cause was a misconfiguration introduced when releasing an update to our spaces configuration, which caused spaces to fail to load properly. The spaces configuration has now been fixed and rolled out.

monitoring

A fix has been implemented and we are seeing reports of access to spaces being restored.

identified

The issue has been identified and a fix is being rolled out.

investigating

We are currently investigating reports of customers having trouble loading embedded spaces.

Report: "Spaces Failing to Load"

Last update
resolved

This incident has been resolved. The root cause is still being determined.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

Spaces are not opening for some users.

Report: "Spaces failing to load"

Last update
resolved

Cache invalidation conflicts for our configuration service led to a degradation of the Spaces application in some edge locations.

Report: "Error Creating and Loading Spaces"

Last update
resolved

Earlier today we deployed a feature that relied on a new environment variable. Although the relevant files were updated with the correct variable, one of them failed to deploy properly, so the variable was not found after deployment. This caused the errors that led to spaces not loading. The change has been rolled back.
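
A missing-variable failure like this one can be surfaced at startup rather than at request time. The sketch below shows one way to do that and is not a description of Flatfile's deployment tooling. The variable names and the `missingVariables` and `assertConfigured` helpers are hypothetical, since the report does not name the variable involved.

```typescript
// All names here are illustrative; the incident report does not name the variable.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are absent or blank.
function missingVariables(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws before the application starts serving traffic if anything is missing.
function assertConfigured(env: Env, required: readonly string[]): void {
  const missing = missingVariables(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}

const requiredNames = ["API_URL", "NEW_FEATURE_ENDPOINT"];
// Simulates a deployment where the new variable never made it into the environment.
const deployedEnv: Env = { API_URL: "https://api.example.test" };
const missing = missingVariables(deployedEnv, requiredNames);
```

Calling `assertConfigured(deployedEnv, requiredNames)` during startup would make the deployment fail its health check instead of serving errors to users.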

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are seeing errors creating spaces and are investigating.

Report: "EU Application Routing Incident"

Last update
resolved

This incident has been resolved.

monitoring

An update to our networking has been deployed and we are monitoring the solution.

identified

The issue has been identified and a fix is being implemented.

Report: "Intermittent 503 Errors"

Last update
postmortem

# **Introduction**

On Apr 2, 2025 a service degradation caused requests for static assets to fail intermittently. These included requests for HTML, JS, CSS and other assets, resulting in failed delivery of frontend applications for several short bursts of time.

# **Incident Details**

* **Date Reported**: April 2, 2025
* **Issue Summary**: Delivery of frontend application assets degraded

# **Impact Assessment**

The incident resulted in degraded delivery of static assets used in the frontend applications, manifesting in the following:

1. Intermittent errors loading spaces
2. Missing assets in applications
3. NGINX error pages being viewed instead of Spaces

The incident did not affect usage of the API or browser clients that had already cached the static asset files.

# **Root Cause**

Our cloud hosting provider terminated several EC2 instances in our Kubernetes fleet over several hours the morning of April 2. The NGINX proxy that delivers static assets was forced to recreate on another node, resulting in several seconds of failed requests for assets. This occurred several times in succession.

# **Resolution & Fix**

1. **Immediate Remediation**
   * Flatfile infrastructure engineers scaled NGINX resources across the fleet to avoid downtime during disruptions
2. **Recovery Strategy**
   * We implemented a new routing and retry strategy combined with affinity rules to prevent scheduling on ephemeral resources

# **Follow-Up Actions**

* **Monitoring Enhancement**: While monitoring for this type of issue exists and alerts triggered correctly, enhancements could be made to escalate alerts and prompt faster response times.
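
The postmortem mentions a retry strategy without describing it. As a rough illustration only, a retry with capped exponential backoff might look like the sketch below. The helper names, the delay values, and the injected `sleep` function are assumptions made for the example and do not reflect Flatfile's actual routing configuration.

```typescript
// Builds a capped exponential backoff schedule, e.g. base 100 ms doubling up to a cap.
function backoffDelaysMs(retries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// Runs the operation once, then once more after each delay until it succeeds.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withRetry<T>(
  operation: () => Promise<T>,
  delaysMs: readonly number[],
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  try {
    return await operation();
  } catch (error) {
    lastError = error;
  }
  for (const delay of delaysMs) {
    await sleep(delay);
    try {
      return await operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

const delays = backoffDelaysMs(4, 100, 500);

// Hypothetical asset request that fails twice (as during a proxy restart), then succeeds.
let calls = 0;
const flakyAssetRequest = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) {
    throw new Error("503 Service Unavailable");
  }
  return "asset";
};
const noSleep = async (_ms: number): Promise<void> => {};
const outcome = withRetry(flakyAssetRequest, delays, noSleep);
```

A few seconds of proxy unavailability, as described in the root cause, would fall inside a schedule like this, so clients would see a short delay rather than an error page.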

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating an issue where some users are seeing intermittent 503 errors.

Report: "Caching Round-trip Discrepancy"

Last update
postmortem

# **Introduction**

A recent update introduced a change to how sheet counts are processed. As part of this update, we began including a timestamp (countsComputedAt) to indicate when the counts were calculated. Shortly after being deployed to production, several endpoints (such as get workbook, update records, patch job) started returning 400 errors with the message recordCount.updated_at.getTime is not a function.

# **Incident Details**

**Date Reported**: April 24, 2025

**What Went Wrong:** Data from workbooks is cached for performance. When this cached data is retrieved, it goes through a process that converts date objects into plain text (specifically, ISO8601-formatted strings). This conversion removes the special capabilities of a date object, such as calculating the time, causing our system to break when it tried to use those capabilities. This error did not appear in our testing or development environments because caching is disabled there, so the date information remained intact and behaved as expected.

**Why It Wasn’t Caught Earlier:**

**Development and test environments differ from production**: Since caching is turned off during development and testing, we didn’t see the issue where the date value was converted into a plain string.

**Incomplete type checking**: The part of our system responsible for managing cached data didn’t reflect that the cached date values had changed from their original form, so the issue wasn’t flagged by our automated checks.

**Next Steps:** We’re updating our development environment to more closely mirror production so that we can catch this type of issue earlier in the future. Additionally, we’re improving our type handling and tests around cached data to ensure these transformations are properly accounted for going forward.

**Resolution & Fix**

1. Production was rolled back while the issue was investigated.
2. The workbook presenter logic was updated to handle the different types resulting from caching behavior.
3. The local and test environments were updated to perform type conversion consistently with production, even when caching is disabled.
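
The failure mode described above can be reproduced in a few lines. The sketch below assumes the cache serializes values as JSON, which matches the ISO8601 conversion described but is not confirmed by the postmortem. The `RecordCount` shape, `cacheRoundTrip`, and `toDate` are hypothetical names; only `updated_at` appears in the reported error message.

```typescript
// Hypothetical shape; only `updated_at` appears in the reported error message.
interface RecordCount {
  total: number;
  updated_at: Date | string;
}

// Simulates a cache that stores values as JSON text (an assumption about the cache).
function cacheRoundTrip<T>(value: T): unknown {
  return JSON.parse(JSON.stringify(value));
}

// Accepts either a Date or an ISO 8601 string and always returns a valid Date.
function toDate(value: Date | string): Date {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new TypeError(`Invalid date value: ${String(value)}`);
  }
  return date;
}

const fresh: RecordCount = { total: 3, updated_at: new Date("2025-04-24T12:00:00.000Z") };
const cached = cacheRoundTrip(fresh) as RecordCount;

// After the round trip the field is a plain string, so calling `.getTime()` on it would throw.
const cachedIsString = typeof cached.updated_at === "string";

// Normalizing at the cache boundary makes the cached and uncached paths behave the same.
const freshMillis = toDate(fresh.updated_at).getTime();
const cachedMillis = toDate(cached.updated_at).getTime();
```

Typing the cached field as `Date | string` forces every consumer through a helper such as `toDate`, which is one way to address the incomplete type checking noted above.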

resolved

A recent update introduced a change to how sheet counts are processed. As part of this update, we began including a timestamp (countsComputedAt) to indicate when the counts were calculated. Shortly after being deployed to production, several endpoints (such as get workbook, update records, patch job) started returning 400 errors with the message recordCount.updated_at.getTime is not a function.

Report: "Spaces Not Loading"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating an issue where some users are unable to load spaces from the dashboard.

Report: "Dashboard Inaccessible"

Last update
resolved

While updating the way we do releases, a misconfiguration caused a drop in network traffic to our services. We identified the change and reverted it to resolve the issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are seeing a 503 Service Temporarily Unavailable error when trying to reach our dashboard. We are currently investigating this.

Report: "Intermittent 502s"

Last update
resolved

Flatfile experienced an issue maintaining database connections through a proxy. Some connections from the app to the database hung up unexpectedly, degrading service and returning a 502 error to the client. The connection issue with the proxy was resolved at 11:47am MDT.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing some intermittent 502 errors when loading Portals and performing some actions inside spaces. Our team is investigating this currently.

Report: "Login and space load errors on UK region"

Last update
resolved

An update deployed to the UK regional server broke some internal routing. Once the routing issue was identified, our team deployed an update to correct this behavior.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: ""Something went wrong" errors"

Last update
postmortem

# **Introduction**

On March 14, 2025, our team identified an issue where certain workbooks were failing to open and/or update. These failures were caused by a database incident involving one of our ephemeral database servers. This document outlines the incident details, the identified root cause, the steps taken to resolve the issue, and the long-term remediation plan.

# **Incident Details**

* **Date Reported**: March 14, 2025
* **Issue Summary**: One of Flatfile’s ephemeral database instances entered an abnormal state. Workbooks mounted to this database instance failed to open and/or be updated.

# **Impact Assessment**

The incident resulted in degraded service performance for users with workbooks on the Quickstore 3 database. Specifically, users experienced:

1. Intermittent unavailability of existing workbooks stored on the affected database
2. Issues loading sheets in newly created spaces that attempted to access data from the affected database

The incident did not affect the creation of new workbooks, as these would be directed to functioning database instances. Only workbooks that were already stored on the Quickstore 3 instance were impacted, leading to a compromised user experience for a subset of users.

# **Root Cause**

Initial investigations determined that the Quickstore 3 database had entered an abnormal state. The database writer node became unresponsive, preventing both read and write operations from completing successfully. While the exact trigger for this state is still under investigation, monitoring data suggests that the database instance may have experienced resource exhaustion or an internal failure that was not automatically resolved by the database management system.

# **Resolution & Fix**

1. **Immediate Remediation**
   * A backup of the affected database instance was completed to secure all data.
   * A new database instance was brought online to attempt to maintain service availability.
   * A new reader node was spun up while planning to remove the problematic node from service.
2. **Recovery Strategy**
   * After evaluating options, Flatfile launched a new database cluster using the backup at the same time that the reader node was coming online in case the additional reader node was unable to make the database healthy again.

# **Follow-Up Actions**

* **Monitoring Enhancement**: While monitoring for this type of issue exists and alerts triggered correctly, enhancements could be made to escalate alerts and prompt faster response times.
* **Root Cause Investigation**: Continue the investigation into database monitoring data to determine what initially caused the Quickstore 3 database to enter the problematic state.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are seeing some intermittent "something went wrong" errors when trying to load sheets for some users. We are currently investigating this.

Report: "Intermittent Jobs Failures"

Last update
resolved

As a result of a traffic spike, the database connection pool was overwhelmed between 10:30 and 11:30am Eastern. We will be updating our rate limiting algorithm to address this type of traffic.
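
The report does not say which rate limiting algorithm is in use or how it will change. Purely as an illustration of how a limiter can shield a connection pool from a burst, here is a minimal token bucket sketch. The `TokenBucket` class and its parameters are hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
// Minimal token bucket: allows bursts up to `capacity`, then a steady `refillPerSecond`.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, startMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = startMs;
  }

  // Returns true if the request may proceed at time `nowMs`, false if it should be rejected.
  tryAcquire(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs / 1000) * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Burst of three requests at t=0 against a bucket of two, then one more a second later.
const bucket = new TokenBucket(2, 1, 0);
const results = [
  bucket.tryAcquire(0),
  bucket.tryAcquire(0),
  bucket.tryAcquire(0),
  bucket.tryAcquire(1000),
];
```

Requests rejected by the limiter never reach the database, so the connection pool only sees traffic at a rate it was sized for.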

Report: "Intermittent job failures"

Last update
resolved

As a result of a traffic spike, the database connection pool was overwhelmed. We will be updating our rate limiting algorithm to account for this particular type of traffic.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing these errors start to recur and are investigating now.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing some jobs intermittently fail, like file extractions. We are investigating now.

Report: "Portal Instability"

Last update
resolved

We rolled back a change affecting jobs; this incident is resolved.

investigating

We're seeing errors related to jobs and are currently investigating this issue.

Report: "Portal Service Degradation"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Intermittent 400 errors on some operations"

Last update
resolved

This incident has been resolved. We were seeing intermittent connectivity issues to blob storage, and deployed a hotfix to restore a stable connection.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Intermittent Login Errors"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

Some users are experiencing trouble logging into the Flatfile Dashboard. We are currently investigating this issue.

Report: "Database Connection Errors on EU Region"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

Report: "Users are unable to Import Files"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are seeing an issue where users are unable to upload files to a space.

Report: "Login Issues"

Last update
resolved

We have rolled out a fix and systems are operational.

identified

We have identified the issue and are rolling out a fix.

investigating

We are continuing to investigate this issue.

investigating

We are investigating an issue where some users are having issues logging into the platform and launching spaces.

Report: "Intermittent Errors on EU Regional Server"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Errors on Platform API"

Last update
resolved

This incident has been resolved. We tuned our database for this region to prevent long running queries.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing errors when hitting the Platform API for our EU region. We are currently investigating.

Report: "Errors on Platform API"

Last update
resolved

Earlier today, we experienced issues with our app due to hitting memory constraints. We have addressed the issue by scaling our app up, which has resolved the memory constraints.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing errors when hitting the Platform API for our EU region. We are currently investigating.

Report: "Slowness loading parts of the dashboard and spaces"

Last update
resolved

We recently observed increased network traffic that caused some areas of slowness. Our team identified the issue and implemented measures to optimize performance. We’re actively monitoring the system to ensure everything continues running smoothly.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Authentication Services Outage"

Last update
resolved

This incident has been resolved.

investigating

We are investigating an issue with our authentication services in the AUS region.

Report: "Intermittent Failure to load v2 Portal"

Last update
resolved

Earlier today, we experienced issues with our cache database due to hitting a memory limit. We have addressed the issue by freeing up additional memory on our database cluster.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing some intermittent failures to load the v2 Portal, and are investigating this issue now.

Report: "Slowness, errors creating workbooks"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing some /counts requests and other API calls take a long time to resolve, and we are currently investigating this issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing some slowness and/or errors when creating workbooks as part of new space creation. We are currently investigating this issue.

Report: "Slowness, errors creating workbooks"

Last update
resolved

Earlier today, we saw failed indexing on new workbooks and found a recent change that initiated this behavior. We rolled back the change that caused these failures and will ensure it is addressed.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We're seeing the slowness when creating workbooks return and are investigating why this is happening.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are seeing some slowness and/or errors when creating workbooks as part of new space creation. We are currently investigating this issue.

Report: "Flatfile Not Accessible"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring for any additional issues.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating an issue where some users may not be able to access the Flatfile app.

Report: "Slowness loading sheets, extracting files"

Last update
resolved

A testing scenario caused a dramatic traffic spike, which temporarily decreased performance on API endpoints. Spaces that were created within this window may need to be recreated.

investigating

We are continuing to investigate this issue.

investigating

We are seeing slowness when loading and working in spaces. We're currently investigating the issue and will provide a status update once we have more information.

Report: "CSV Extraction Failure/Slow"

Last update
resolved

Queue backpressure from high network traffic degraded the performance of some asynchronous processes, such as file extraction.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Spaces not loading"

Last update
resolved

This incident has been resolved.

identified

We are currently investigating an issue preventing spaces from loading. We have identified the issue and are working on a fix.

Report: "events.flatfile.com Errors"

Last update
resolved

We've restored the internal service, so you should no longer see these errors in your developer console.

investigating

You may see some errors in your network console indicating that the events.flatfile.com domain cannot be reached or is returning a bad gateway error. This is an internal service that we are working to restore, and should not affect your ability to use your Spaces or the Dashboard in the meantime.

Report: "Slow Performance in Spaces"

Last update
resolved

This is resolved now; we are seeing files extract and spaces create in a timely manner.

investigating

We are currently seeing an issue with some behavior in spaces taking a while to run, like file extraction and some actions. We are currently investigating this.

Report: "Some spaces taking a while to be created"

Last update
resolved

We saw a queuing issue with our events that impacted event-related behavior like space creation and file extraction. We've cleared out the affected queue and are seeing performance back to normal.

investigating

We are seeing some space:configure events taking a while to be read by server-side listeners, causing slowness when creating a space that looks like the space is not initially created. We are currently investigating this issue.

Report: "Authentication Errors on Legacy Platform"

Last update
resolved

We've identified the source of the errors, and have restored service to the dashboard and Portal services.

investigating

We are seeing errors when accessing the Legacy Dashboard, and some Portals. We are currently investigating this issue.

Report: "Degraded Performance"

Last update
resolved

This incident has been resolved.

monitoring

The issue has been resolved and we are monitoring for any additional issues.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We're currently investigating an issue where some users are unable to access platform.flatfile.com. We're working to get this up and running as quickly as possible.

Report: "Degraded performance on custom jobs"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating an issue in Spaces where some types of custom jobs are delayed.

Report: "Spaces Loading Slower Than Expected"

Last update
resolved

This issue has now been resolved and spaces are loading as expected.

identified

The issue has been identified and a patch is currently deploying.

investigating

We are currently investigating an issue that is causing customers using the spaces UI or latest Portal SDK to see spaces loading slower than they should be.

Report: "Partial Degradation of Mapping Page"

Last update
resolved

For approximately 2 hours this afternoon, starting around 1:30 EDT, we saw a partial degradation of our mapping feature that caused the page to occasionally not load. We've rolled back the code that caused this degradation and are ensuring it's been permanently corrected.