Historical record of incidents for Nanonets
Report: "GCP Global Outage – No Impact to Our Systems"
Last update: We are continuing to monitor for any further issues.
There is a major GCP global outage ongoing at the moment. We have ensured that all our critical dependencies have been redirected to fallbacks, and we are currently not impacted. GCP incident: https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW We are closely monitoring the situation and will take any necessary actions if it evolves. Thanks for your continued support and vigilance. We'll keep you posted with any updates.
Report: "Degraded async file processing for few users on https://eu-open.nanonets.com"
Last update: This incident has been resolved.
Between 1 PM and 7 PM IST on June 2nd, very few users in our EU-Open region (https://eu-open.nanonets.com) experienced delayed file processing via the Async Prediction API due to a critical infrastructure issue that caused a backlog. This issue was limited to the EU-Open region, with all other regions operating normally. We have identified and resolved the root cause, scaled up the system, and are working to clear the backlog as quickly as possible.
Report: "Degraded async file processing for few users on https://eu-open.nanonets.com"
Last updateThis incident has been resolved.
Between 1 PM and 7 PM IST on June 2nd, very few users in our EU-Open region (https://eu-open.nanonets.com) experienced delayed file processing via the Async Prediction API due to a critical infrastructure issue that caused a backlog. This issue was limited to the EU-Open region, with all other regions operating normally. We have identified and resolved the root cause, scaled up the system, and are working to clear the backlog as quickly as possible.
Report: "Temporary disruption in file operations on app.nanonets.com"
Last update: On May 23rd, 2025, between 11:45 AM UTC and 12:10 PM UTC, we experienced a brief disruption affecting file information retrieval and related actions such as verification. The issue was promptly identified and resolved. All services are now fully operational, and we have implemented measures to prevent a recurrence. We apologize for the inconvenience caused and appreciate your patience.
Report: "Temporary disruption in file operations on app.nanonets.com"
Last updateOn May 23rd, 2025, between 11:45 AM UTC and 12:10 PM UTC, we experienced a brief disruption affecting file information retrieval and related actions such as verification. The issue was promptly identified and resolved. All services are now fully operational, and we have implemented measures to prevent a recurrence. We apologize for the inconvenience caused and appreciate your patience.
Report: "File processing degradation for instant learning models on US region"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Post-Processing Service Degradation and Export Failures on app.nanonets.com"
Last update: On May 7th 2025, between 13:05 UTC and 14:15 UTC, we experienced temporary failures in our post-processing (PP) service, leading to some export failures. The root cause was a sudden spike in traffic that overwhelmed the PP service, exposing a bottleneck in request handling. This has since been identified and addressed with a patch to improve traffic resilience. We regret the disruption caused and apologize for the inconvenience.
Report: "File processing degradation for instant learning models"
Last update: From 5:30 AM - 5:45 AM PDT, instant learning models may have experienced degraded performance, with higher latency and intermittent failures. This was caused by high load on one of our secondary databases. The issue has since been resolved.
Report: "Delayed file processing for all models using async api in app.nanonets.com"
Last update: Between 02:05 PM UTC and 02:42 PM UTC, some files experienced delayed processing on app.nanonets.com due to a sudden spike in load, which triggered system upscaling. This temporarily caused a backlog. We've identified the bottleneck and cleared the queue. All systems are now operational.
Report: "Delayed file processing for all models using async api"
Last update: On April 11th, 2025, between 11:35 AM UTC and 12:45 PM UTC, users experienced delays in file processing across all models on the async prediction API of [app.nanonets.com](http://app.nanonets.com). There was no impact on the EU and IN regions. The issue was caused by an internal service dependency that became bottlenecked, leading to a backlog in the processing queue. The root cause was promptly identified, and a fix was deployed to restore normal processing. We sincerely apologize for the inconvenience caused and are taking steps to prevent similar occurrences in the future.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Intermittent prediction failures for instant learning models on app.nanonets.com"
Last update: We observed intermittent prediction failures for instant learning models on app.nanonets.com between 11:10 AM and 11:45 AM IST on 2nd April 2025 due to a sudden spike in system load. Our team quickly identified the issue and scaled the system beyond the threshold to stabilize performance. We've also implemented additional checks to proactively handle such scenarios in the future. We sincerely apologize for the inconvenience caused.
Report: "Instant learning model prediction failures in app.nanonets.com"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "app.nanonets.com is slow"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "High Response Time on Instant Learning Models in EU Region"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Intermittent 5xx errors on app.nanonets.com"
Last update: This incident has been resolved.
We observed intermittent 5xx errors on app.nanonets.com between 8:26 PM IST and 8:46 PM IST. The issue has been identified and fixed, and we are actively monitoring the system to ensure stability.
Report: "Platform downtime issue for app.nanonets.com"
Last update: App is down in US region.
Investigating - We are currently investigating this issue. (10:38 AM UTC)
Monitoring - A fix has been implemented and we are monitoring the results. (10:43 AM UTC)
Report: "High Response Times for Non Instant learning models in US region"
Last update: On 21st Feb 2025, from 6:00 PM IST to 8:30 PM IST, non-instant learning models on [app.nanonets.com](http://app.nanonets.com) experienced high response times and timeouts due to extreme load on our system. Although auto-scaling was triggered on time, one of our services choked under the increased throughput. Our engineers quickly identified the bottleneck and increased the service's throughput, normalizing response times. We apologize for the inconvenience and are implementing measures to enhance system resilience and better handle future traffic spikes.
This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Intermitent API Errors"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Elevated response times for instant learning models"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Delayed File Processing For async Models and Data actions/approval run issues for sync models"
Last update: On 6th-Feb-2025, multi-page files uploaded through the Async API experienced delayed processing due to an issue at our end. This occurred between 5:07 PM IST and 6:20 PM IST, affecting app.nanonets.com, eu.nanonets.com, and eu-open.nanonets.com. During this period, file processing delays may have led to increased response times, and data actions and the approval flow did not run for files uploaded in sync mode. Resolution: The issue was identified and resolved, ensuring that all new file uploads are now processing correctly. Additionally, all impacted files uploaded during the affected window have been successfully processed. To prevent such occurrences in the future, we are implementing stronger regression checks before releases and reinforcing our monitoring systems. We sincerely apologize for the inconvenience caused and appreciate your patience and understanding.
Report: "Failures in Instant learning model on app.nanonets.com"
Last update: ### **Incident Summary:** Between 16:30 UTC and 22:30 UTC on 4th Feb, users experienced high response times and some failures when using Instant Learning models on [app.nanonets.com](https://app.nanonets.com). This was caused by unusually high latency in our secondary database when processing certain requests. Our other models on [app.nanonets.com](http://app.nanonets.com) are working fine, and our other regions (EU, EU-Open, IN) are operational without any issues. ### **Resolution:** Our engineering team identified the issue and implemented a fix to restore normal response times. We are planning to schedule a maintenance upgrade for our secondary database to prevent similar incidents in the future. We apologize for the inconvenience and appreciate your patience.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "App is down in US region"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "High Load Causing 500 Errors & Latency on app.nanonets.com"
Last update: Between 3:47 PM and 4:02 PM IST, intermittent 500 errors and high response times were observed for models on app.nanonets.com due to high system load. We have scaled our infrastructure to handle the increased demand and we are implementing proactive measures to manage such loads more effectively in the future.
Report: "Table extraction errors for instant learning models"
Last update: **Incident Summary:** On January 22, 2025, between 10:30 AM IST and 12:30 PM IST, users encountered failures in extracting table data from files uploaded to the Instant Learning Models. All other APIs and services remained fully operational during this period. **Resolution:** This issue specifically affected documents uploaded to the Instant Learning Models within the stated timeframe. Our engineering team promptly identified the root cause and implemented a resolution, fully restoring the service's functionality. The platform is now operating normally. We sincerely apologize for the inconvenience caused by this incident and regret the disruption it may have caused to our users. Ensuring the high availability, reliability and performance of our services remains our highest priority.
We're experiencing errors with table-data extraction on instant learning models
Report: "File prediction errors for instant learning models in US region"
Last update: **Incident Summary:** Between **03:50 PM IST and 04:08 PM IST** on January 21, 2025, users of `app.nanonets.com` experienced failures in processing requests related to Instant Learning Models. This issue was isolated to our secondary database, which encountered an operational issue, causing request failures. All other APIs and services remained unaffected. **Resolution:** The above issue impacted reads for Instant Learning Models. Our engineering team promptly resolved the read replica issue, restoring full service functionality. The platform is now operating as expected. We sincerely apologize for the inconvenience caused and deeply regret the disruption to our users. Ensuring high availability and reliability of our services is our top priority.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "App is down in EU region."
Last update: **Incident Summary:** On 13-Jan-2025 at 07:42 PM IST, an incident occurred where our application became temporarily unavailable. This affected the following services: * [https://eu.nanonets.com](https://eu.nanonets.com) * [https://eu-open.nanonets.com](https://eu-open.nanonets.com) The outage was caused by an accidental deletion of an authorization token required for database authentication. This token is essential for ensuring secure and seamless communication between the application and the database. **Resolution:** A new authorization token was generated and securely updated in the system. Services were restarted, and normal functionality was verified. We will introduce stricter permissions to ensure only authorized personnel can manage critical tokens. We sincerely apologize for the inconvenience caused by this incident. We acknowledge the importance of our services to your operations and deeply regret this unintentional disruption. Please be assured that we are taking all necessary steps to prevent such incidents in the future. **Timeline:** 13-Jan-2025 07:42 PM IST to 13-Jan-2025 07:54 PM IST
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Only https://eu.nanonets.com and https://eu-open.nanonets.com are affected. https://app.nanonets.com is working as expected.
Report: "app.nanonets not working"
Last update: This incident has been resolved.
Root Cause: Our primary databases experienced an unexpected outage during a planned update aimed at improving service performance. Resolution: The databases have been fully restored and are functioning normally. To ensure a smoother process in the future, we will schedule similar updates with advance notice and enhanced safeguards, including comprehensive backups.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Instant learning models in US region experienced delays in file processing."
Last update: Root Cause: One of our internal systems, responsible for handling a critical component of our processing pipeline, experienced an unexpected failure. This created back-pressure on our upstream systems, resulting in delays and partial failures in file processing. Resolution: The issue was addressed by restoring the affected instance. Moving forward, we will put strong fallback mechanisms in place to address this. Timeline: Jan 3rd, 5:30 AM IST to 6:25 AM IST
Report: "IN region Outage"
Last update: Our application experienced downtime in the IN region (https://in.nanonets.com) due to an issue with Redis: a large number of connections from the API server was mistaken by Redis for a security issue. We restarted Redis and the API servers to restore the connections. Timeline: 12:18 PM IST - 12:27 PM IST
Report: "Issue with Instant Learning Models"
Last update: We use multiple services to power our instant learning models. One of them is OpenAI, and on 11th Dec at 15:17 PST OpenAI went down ([https://status.openai.com/](https://status.openai.com/)). This caused upstream pressure on our inference systems, leading to long wait times and failures.
This incident has been resolved.
There is downtime in a downstream service (OpenAI, https://status.openai.com/) leading to degraded performance on our extraction results
There is downtime in a downstream service (OpenAI, https://status.openai.com/) leading to degraded performance on our extraction results
Report: "Instant learning model prediction issues in EU and EU-Open Regions."
Last update: **Incident Summary**: Instant learning models in EU and EU-Open regions experienced failures, resulting in request timeouts and service disruptions during the incident. **Root Cause**: A dependency conflict occurred when an external library update introduced incompatibility with a critical system component, leading to failure in processing requests. **Resolution:** The issue was addressed by pinning the affected library versions to known stable releases, ensuring compatibility across components. Moving forward, we have strengthened our dependency management processes to include proactive monitoring and testing of third-party updates to prevent similar issues.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Service Degradation on app.nanonets.com"
Last update: We sincerely apologize for the inconvenience caused by today's service disruption between **14:20 IST and 17:00 IST**. A feature deployment caused unexpected pressure on our database, leading to request timeouts and elevated 5xx errors. We have since rolled back the feature and restored normal operations. To prevent such issues in the future, we are strengthening our testing procedures and monitoring systems. Thank you for your understanding and continued support. If you have any further questions or concerns, please contact our support team.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified the issue and are working on a fix.
We are currently investigating this issue.
Report: "EU server downtime"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Downtime for EU instance"
Last update: **Incident Summary:** One of our database nodes went down, resulting in our APIs at [https://eu.nanonets.com](https://eu.nanonets.com) returning 5xx responses intermittently. This issue was quickly identified, and we implemented a rapid fix by restarting the affected node. Only the EU region was impacted; all other regions are working as expected. **Root Cause:** The database node experienced a high load, reaching its capacity limits, which led to the node becoming unavailable and causing disruptions in API service. **Resolution:** The node was immediately restarted to restore service, and we have scheduled a capacity upgrade on **November 17th** during a planned maintenance window to prevent future occurrences. We apologize for any inconvenience caused and appreciate your understanding. **Timeline:** 15:35 IST - 15:50 IST
This incident has been resolved.
We are currently investigating this issue.
Report: "File processing issue with Instant Learning and Zero Training models"
Last update: At around 10:46 UTC on 7th Nov, one of our queueing systems experienced heavy load, which led to requests getting queued and frequently timing out for Instant Learning and Zero Training models. We were alerted to it and quickly scaled it up, and by 11:15 UTC the backlog was cleared and the incident was resolved. We are adding additional alerting to this queueing system to make sure that we can catch these types of issues well before the queue backs up.
This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Intermittent Image Rendering Issue"
Last update: Our application is currently experiencing intermittent image rendering issues due to disruptions at our third-party image provider, ImageKit. While the API and other functionalities remain unaffected, images may occasionally fail to load for end-users and you may experience app slowness. We are closely monitoring the situation and are in contact with ImageKit to ensure a swift resolution. You can view their real-time status updates here: https://imagekitio.statuspage.io/
Report: "IN region Outage"
Last update: Our application experienced downtime in the IN region (https://in.nanonets.com) due to a database outage caused by an unexpected spike in load. The increased load resulted in the database going down, affecting application availability. We have increased the capacity and restarted the affected node to restore service. Timeline: 06:20 PM IST - 06:28 PM IST
Report: "User facing auto logged out issues"
Last update: **Incident Summary**: On 22nd Oct 2024, around 3:43 PM (IST), some users started facing random logout events. **Root Cause**: A critical downstream service failed to scale during the high-load phase. This resulted in some 5xx errors, causing users to be logged out. **Resolution**: We have provisioned enough resources to handle the load and are planning to set up auto-scaling for it. **Timeline:** Oct 22nd 3:43 PM IST - Oct 22nd 5:15 PM IST
This incident has been resolved.
We are currently investigating this issue.
Report: "Intermittent file upload failures"
Last update: **Incident Summary**: On 07-Oct-2024 at 11:30 PM IST, a new feature was deployed, which inadvertently caused a spike in our RDS usage. This led to some APIs timing out, causing 5xx errors, and resulting in random file upload failures for affected users. **Root Cause**: The root cause of the issue was a feature deployed by one of our developers that had an unintended effect on the database. The feature introduced inefficiencies in database queries, leading to a significant increase in RDS usage. The increased load caused API requests to exceed their time limits, resulting in timeouts and random file upload failures. **Resolution**: As soon as the issue was identified, we reverted the change and restored normal operations. The system returned to a stable state shortly after the rollback. We are now implementing additional safeguards and testing procedures to prevent similar incidents in the future, ensuring we do not experience such unanticipated impacts again. **Timeline:** Oct 7th 11:50 PM IST - Oct 8th 12:30 AM IST
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Instant learning models upload was failing"
Last update: Incident Summary: An enhancement to our instant learning model inadvertently caused issues, leading to upload failures or files being stuck in a queued state. This affected files uploaded between 03:55 PM IST - 04:15 PM IST. Root Cause: The issue was traced back to a recent enhancement for document-level processing in our instant learning model that caused unexpected behaviour. Resolution: We have identified and reverted the problematic enhancement. Additionally, all affected files from the specified timeframe will be retried. Moving forward, we will implement stricter controls in our deployment process and enhance our instrumentation to detect similar issues earlier.
Report: "Blank predictions on Eu- open"
Last update: This incident has been resolved. Interval: 12:06 PM - 2:45 PM IST
The issue has been identified and a fix is being implemented.
Report: "Partial Downtime"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Instant Learning Model Enhancement Impact on File Processing"
Last update: Incident Summary: An enhancement to our instant learning model inadvertently caused issues in some customer models, leading to file processing failures or files being stuck in a queued state. This affected files uploaded between 06:00 PM IST - 09:15 PM IST. Root Cause: The issue was traced back to a recent enhancement in our instant learning model that caused unexpected behaviour in a subset of customer models. This impact was not detected early due to the random nature of the issue affecting only a small number of users. Resolution: We have identified and reverted the problematic enhancement. Additionally, all affected files from the specified timeframe have been retried. Moving forward, we will implement stricter controls in our deployment process and enhance our instrumentation to detect similar issues earlier. We apologize for the inconvenience caused and are taking steps to prevent such incidents in the future.
Report: "Downtime for EU instance"
Last update: Issue: Intermittent failures when updating fields and exporting files, and errors while accessing models. Interval: 12:22 PM - 1:48 PM IST. RCA: We saw partial failures in the DB calls originating from one specific node. Our team terminated this node as an immediate fix, and we are adding instrumentation to avoid this in the long term.
Report: "IN region nginx server down"
Last update: We experienced intermittent downtime on our Nginx server in the IN region. EU and US regions are not affected. The affected Nginx instance has been restarted and is now fully operational and serving requests. Our team is actively investigating the root cause of this issue so that we can make a long-term fix for this. We apologize for any inconvenience this may have caused. Duration: 11:34 AM IST - 11:41 AM IST
Report: "Downtime for EU-open instance"
Last update: Issue: Instant learning models were down on the EU-Open instance. Time interval: 4:34 PM - 4:45 PM IST. RCA: Nodes were getting rescheduled and terminated, which resulted in the EU-Open downtime.
Report: "IN region slow response times"
Last update: High load on our load balancers led to slow response times for 15 minutes between 7:45 AM and 8:20 AM.
Report: "Failures in email imports for some customers"
Last update: Due to a recent config change, we saw failures in processing for some of the customers using the email import feature. The issue has been identified and the fix is deployed.
Report: "Platform downtime issue"
Last update: **Timeline** On 15th July 07:08 - 07:18 (UTC) ### Incident Summary * New users experienced login issues * File uploads were failing ### Root Cause The incident was caused by the failure of one of the database nodes. The system's architecture requires both database nodes to be functional to handle new user logins and file uploads. With one node down, the system could not accept more connections, leading to login failures for new users and failed file uploads. ### Incident Resolution The issue was resolved by restarting the downed database instance. Once the database node was back online, the system resumed normal operations, allowing new users to log in and file uploads to be processed successfully.
This incident has been resolved.
The issue has been identified and a fix is being implemented.
Report: "Files upload affected"
Last update: **Timeline**: Jul 9th 06:52 AM UTC - 09:50 AM UTC **Incident Summary:** Users experienced issues with rate limits, causing most users to be blocked from uploading further files. **Root Cause:** This issue was caused by a faulty deployment, which inadvertently affected the rate limits of different users. **Resolution:** We have identified and rolled back the faulty deployment to restore normal functionality. Additionally, we are implementing stricter measures and more thorough testing protocols to ensure that such mistakes do not occur in the future. We sincerely apologize for the inconvenience caused by this incident. Please be assured that we are taking all necessary steps to prevent such issues from happening again. We appreciate your patience and understanding as we work to improve our processes and deliver the best possible experience to our users.
File uploads affected for multiple users
Report: "Elevated response times for instant learning models"
Last update: **Incident Summary:** Users experienced elevated response times for instant learning models due to a disruption in our processing system. **Root Cause:** One of our GPU nodes was down, which significantly affected file processing times and led to slower response times for our users using instant learning models. **Resolution:** We promptly identified the machine and removed it from our pool, which restored normal processing times. As a long-term fix, we are implementing a robust mechanism to ensure that any node or machine going down will not impact file processing times. This will include automatic detection and removal of faulty nodes from our pool and redistribution of the workload to healthy nodes. We sincerely apologize for the inconvenience this incident may have caused. We understand the importance of reliable and fast service, and we are taking the necessary steps to prevent such issues from occurring in the future. We appreciate your patience and understanding.
This incident has been resolved.
We are currently investigating this issue.
Report: "Platform downtime issue for IN instance"
Last update: Issue: Platform downtime for the IN instance. Interval: 3:13 PM - 3:18 PM IST
Report: "Downtime for EU-open instance"
Last update: Issue: Downtime for the EU-Open instance. Interval: 4:05 PM - 4:10 PM IST. RCA: Nodes were getting rescheduled and terminated, which resulted in the EU-Open downtime.