Napta

Is Napta Down Right Now? Check whether an outage is currently ongoing.

Napta is currently Operational

Last checked from Napta's official status page

Historical record of incidents for Napta

Report: "Service outage"

Last update
postmortem

We would like to share an overview of a service interruption that occurred on May 6th, 2025, affecting access to the Napta application. Below is a timeline of events, a simplified explanation of the cause related to a database update, and the actions we are taking.

### **Timeline of Events**

* **1:46 PM CET:** Users began experiencing issues accessing the Napta application.
* **1:54 PM CET:** Our monitoring systems triggered "too many 5xx" alarms, indicating a high volume of errors.
* **2:00 PM CET:** The issue was identified and a fix was launched.
* **2:06 PM CET:** Access to the application was fully restored.

### **Root Cause**

The interruption occurred during a planned update to our platform that included changes to the structure of our application's database (a database schema migration). An automated process that applies these database changes ran in an unexpected order relative to the application's code update. This temporary mismatch meant the application's software was looking for data in a database structure that was not yet fully in place, preventing it from operating correctly.

### **Action Plan**

To prevent similar incidents related to database updates, we are reinforcing the synchronization between database updates and application code updates in our deployment processes to ensure they occur in the correct order (a sketch of this ordering is shown below), and we are reviewing the thresholds and sensitivity of our alarms to provide earlier warnings of service disruptions.

### **Closing Remarks**

We sincerely apologize for the disruption this service interruption caused. We understand the importance of a reliable platform and are committed to continuously improving our systems and processes to minimize the risk of future incidents. If you have any further questions, please don’t hesitate to contact our support team.
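To illustrate the ordering described in the action plan, here is a minimal sketch of a deployment wrapper that only rolls out new application code once the schema migration has completed. It is illustrative only: the migration command (an Alembic invocation) and the deploy script are hypothetical placeholders, not Napta's actual tooling.

```python
import subprocess
import sys


def run_step(description: str, command: list[str]) -> None:
    """Run one deployment step and abort the rollout if it fails."""
    print(f"==> {description}")
    result = subprocess.run(command)
    if result.returncode != 0:
        print(f"Step failed: {description}; aborting rollout.", file=sys.stderr)
        sys.exit(result.returncode)


if __name__ == "__main__":
    # 1. Apply the database schema changes first and wait for them to finish.
    #    (hypothetical command; substitute your own migration tool)
    run_step("apply database migrations", ["alembic", "upgrade", "head"])

    # 2. Only once the schema is in place, roll out the new application code.
    #    (hypothetical script standing in for the orchestrator's deploy call)
    run_step("deploy application code", ["./deploy_application.sh", "--wait"])
```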

resolved

This incident has been resolved.

monitoring

The issue is mitigated and we're now monitoring it.

identified

We've identified the cause of the problem and are working on it.

investigating

We are currently experiencing a service outage on app.napta.io; our team is looking into this.

Report: "Degraded performance"

Last update
postmortem

On March 14th, 2025, Napta experienced an incident affecting platform performance. We would like to share a detailed timeline of events, the root cause, and the actions taken to prevent similar issues in the future.

### **Timeline of Events**

**2:40 PM CET** Our monitoring systems detected a sudden increase in response times across the platform.

**3:10 PM CET** The root cause was identified and mitigated. Response times returned to normal and the platform became fully responsive again.

### **Root Cause**

The issue was caused by a process that unexpectedly acquired a global lock on a specific client database for an undetermined amount of time. This lock prevented other transactions from executing, triggering timeouts and creating bottlenecks across the system for all our clients.

### **Action Plan**

To prevent similar incidents in the future, we introduced stricter timeout configurations to automatically terminate blocking database operations after a short period of time, preventing prolonged lock situations (see the sketch below).

### **Closing Remarks**

We sincerely apologize for the inconvenience this incident may have caused. We remain committed to maintaining a reliable and high-performing platform, and we continue to improve our monitoring and safeguards accordingly. If you have any further questions, please don’t hesitate to contact our support team.
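For illustration, the snippet below shows one way such stricter timeouts can be applied at the session level on PostgreSQL (the PgBouncer incident further down suggests PostgreSQL is the underlying database, but this is an assumption). The connection string, the timeout values, and the choice of settings are placeholders for the sketch, not Napta's actual configuration.

```python
import psycopg2

# Hypothetical connection string; illustrative timeout values only.
DSN = "dbname=app user=app host=db.internal"

conn = psycopg2.connect(
    DSN,
    # libpq's `options` parameter sets per-session limits at connection time:
    # - statement_timeout: cancel any query running longer than 5 s
    # - lock_timeout: give up after 2 s if a lock cannot be acquired
    # - idle_in_transaction_session_timeout: terminate sessions that hold a
    #   transaction open without doing work for more than 10 s
    options=(
        "-c statement_timeout=5000 "
        "-c lock_timeout=2000 "
        "-c idle_in_transaction_session_timeout=10000"
    ),
)
```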

resolved

We have identified the root cause of the issue and successfully resolved it. A postmortem report will be published soon. We sincerely apologize for any inconvenience this may have caused.

investigating

We are currently experiencing degraded performance on app.napta.io; our team is looking into this.

Report: "Degraded performance"

Last update
postmortem

On November 29th, 2024, Napta experienced an incident affecting its users. We would like to share a detailed timeline of events, the root cause, and the actions taken to prevent similar issues in the future.

### **Timeline of Events**

**9:00 AM - 9:20 AM CET** Application slowdown due to server allocation failures from the cloud provider (likely due to Black Friday demand). A client reported being unable to connect to the platform, but after servers were successfully acquired, the issue seemed resolved. It was later determined that the client’s connection issue was not directly linked to the server allocation failure (see below).

**1:26 PM CET** Our monitoring dashboards detected sporadic 504 Gateway Timeout errors affecting a small percentage of requests. Initial investigations were launched.

**1:38 PM CET** To address the issue, an instance refresh was performed and completed by 1:51 PM CET. However, the errors persisted, and further investigations were conducted.

**2:43 PM CET** All backend tasks began failing health checks and were restarted by our container orchestration system. This resulted in a brief period of downtime, during which the application was unavailable.

**2:53 PM CET** New backend tasks were successfully deployed, and the application was restored. No additional 504 errors were observed. Monitoring and investigations continued.

**3:55 PM CET** A few clients reported that Napta was stuck on an infinite loading screen. Upon investigation, we discovered that backend-to-database connections for these specific clients were stuck. Simultaneously, another asynchronous service handling a high number of tasks was creating excessive database connections, which were also stuck. Terminating the database connections of the asynchronous service resolved the backend issue temporarily. To mitigate the situation, we halted the asynchronous service.

**4:36 PM CET** The root cause was identified and fixed. The issue stemmed from a regression in the asynchronous service, which caused database connections not to close properly and to become randomly stuck. Because database connections were shared between the backend and the asynchronous service via PgBouncer, the backend was impacted by the issue in the asynchronous service. The fix was deployed successfully, restoring full functionality.

### **Action Plan**

To prevent similar incidents in the future, we have implemented the following improvements:

**Enhanced Monitoring and Alerts:** Additional monitoring and alerting are being implemented for the asynchronous service to detect anomalies earlier (see the sketch below).

**PgBouncer Configuration Review:** We reviewed PgBouncer’s configuration to ensure that stuck connections can be cleared after a timeout.

### **Closing Remarks**

We sincerely apologize for the inconvenience caused by this incident and thank our clients for their patience and understanding. Ensuring the stability and reliability of Napta is our top priority, and we are committed to learning from this incident to provide an even better experience moving forward. If you have any further questions, please don’t hesitate to contact our support team.

The Napta Team
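To make the "stuck connection" failure mode concrete, the check below looks for sessions that have sat idle in a transaction for too long, which is the kind of anomaly the enhanced monitoring is meant to surface. It is a hypothetical Python check against PostgreSQL's `pg_stat_activity` view, not Napta's monitoring code; the connection string and the 60-second threshold are assumptions.

```python
import psycopg2

DSN = "dbname=app user=monitor host=db.internal"  # hypothetical connection string

# Sessions holding a transaction open without doing any work for longer than
# the threshold are the typical symptom of connections that were never
# closed properly.
QUERY = """
SELECT pid, usename, application_name, now() - state_change AS stuck_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '60 seconds'
ORDER BY stuck_for DESC;
"""


def find_stuck_connections():
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchall()


if __name__ == "__main__":
    for pid, user, app, stuck_for in find_stuck_connections():
        # In a real setup this would feed an alerting system instead of stdout.
        print(f"pid={pid} app={app} user={user} stuck for {stuck_for}")
```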

resolved

This incident has been resolved. Sorry for the inconvenience; we will communicate on the root cause soon.

monitoring

We are still monitoring.

monitoring

The issue is mitigated and we're now monitoring it.

investigating

We are currently experiencing degraded performance on app.napta.io; our team is looking into this.

Report: "Degraded performance"

Last update
postmortem

On December 14th, 2023, Napta experienced degraded performance on its platform for approximately 27 minutes, impacting customers between 7:59 AM and 8:26 AM UTC. Below is a detailed summary of the incident, its root cause, and the actions we have taken to address it.

### **Incident Summary**

Between **7:59 AM** and **8:26 AM UTC**, Napta users experienced degraded performance, primarily due to slowdowns caused by high demand during the morning peak. The root cause was traced to a failure in our automated cluster configuration change jobs, which were blocked due to an issue on GitLab’s infrastructure. This issue prevented critical adjustments to our production cluster, which are normally executed automatically. The incident was detected at **8:07 AM UTC** by our monitoring system, which raised an alarm for exceeding critical response time thresholds. Manual intervention was required to resolve the issue, and service was fully restored by **8:26 AM UTC**.

### **Timeline**

**December 13th, 2023**

* **9:52 PM UTC**: GitLab identified an issue on their platform that prevented GitLab jobs from running correctly.

**December 14th, 2023**

* **6:02 AM UTC**: Our automated cluster configuration change job failed due to the ongoing GitLab issue.
* **6:15 AM UTC**: A secondary backup job, designed to execute if the first job fails, also encountered the same issue due to GitLab’s infrastructure problem.
* **7:59 AM UTC**: Platform slowdowns began to occur as a result of increased traffic during the morning peak, combined with the absence of the necessary cluster configuration updates.
* **8:07 AM UTC**: Our monitoring system raised an alert for exceeded critical response times. Investigations commenced immediately.
* **8:24 AM UTC**: Manual intervention restored the correct cluster configuration.
* **8:26 AM UTC**: All platform performance issues were resolved.

### **Actions to Prevent Future Incidents**

**Add Job Redundancy** We have implemented additional redundancy in our GitLab jobs to ensure that critical cluster configuration updates can be executed through alternative paths if primary jobs fail.

**Enhance Alerting** New alarms were introduced to detect and notify the team earlier about failed configuration jobs, enabling faster response times and minimizing potential impact (see the sketch below).

**Implement Autoscaling Improvements** We will optimize our autoscaling configurations to better handle traffic surges during high-demand periods, reducing the risk of degraded performance even in cases of delayed configuration updates.

### **Closing Remarks**

We sincerely apologize for the inconvenience this incident caused to our customers. At Napta, ensuring a seamless experience for our users is our top priority, and we are committed to learning from this incident to improve the reliability of our platform. If you have any further questions or concerns, please do not hesitate to contact our support team.

**The Napta Team**
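As a sketch of the "Enhance Alerting" item, the check below queries GitLab's pipelines API for recently failed pipelines on the configuration project and raises an alert if any are found. The project ID, the token handling, the six-hour window, and the notification step are placeholders; this is an illustration of the idea, not Napta's actual job.

```python
import datetime
import os

import requests

GITLAB_URL = "https://gitlab.com"   # or a self-hosted GitLab instance
PROJECT_ID = "12345"                # hypothetical configuration project
TOKEN = os.environ["GITLAB_TOKEN"]  # read-only API token


def failed_config_pipelines(since_hours: int = 6) -> list:
    """Return pipelines on the configuration project that failed recently."""
    since = (
        datetime.datetime.now(datetime.timezone.utc)
        - datetime.timedelta(hours=since_hours)
    ).isoformat()
    resp = requests.get(
        f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipelines",
        params={"status": "failed", "updated_after": since},
        headers={"PRIVATE-TOKEN": TOKEN},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    failed = failed_config_pipelines()
    if failed:
        # Replace with a real notification channel (pager, chat, ...).
        print(f"ALERT: {len(failed)} configuration pipeline(s) failed recently")
```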

resolved

Our monitoring shows that users are no longer experiencing issues. We will mark this incident as resolved, thank you for your patience.

monitoring

The issue is mitigated and we're now monitoring it.

identified

We've identified the cause of the problem and are working on it.

investigating

We are currently experiencing degraded performance on app.napta.io; our team is looking into this.

Report: "Degraded performance"

Last update
postmortem

* At **9:04 AM CET**, an alert was received concerning the response time of our backend tasks. After analysis, we discovered that the daily scaling of our infrastructure had not run, so there were not enough backend tasks to support the load.
* At **9:34 AM CET**, the number of backend tasks was back to its nominal value.
* At **9:41 AM CET**, an alert was received concerning a high CPU level on one of our database servers. At **9:50 AM CET**, we determined that it was due to the unwanted activation of a feature shipped to production on Friday, June 9th at 1:00 PM CET. A fix was developed and, in parallel with its deployment, a corrective action was performed on the affected client database in order to lower the impact of the issue.
* At **10:15 AM CET**, the CPU of the affected database server was back to its nominal value, and both incidents were resolved.

Following these incidents, the following actions were implemented:

* Set up an alert monitoring in real time the number of active backend tasks (a sketch of such a check is shown below).
* Integrate alerts into our day-to-day tools to improve our reactivity.
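A sketch of the real-time task-count alert mentioned above is given here. It assumes the backend runs on AWS ECS (the postmortems mention a container orchestration system, but the exact platform is not stated); the cluster and service names, the threshold, and the notification step are placeholders.

```python
import boto3

# Hypothetical names; the real cluster and service are not public.
CLUSTER = "production"
SERVICE = "backend"
MIN_RUNNING_TASKS = 4  # illustrative threshold


def check_backend_tasks() -> None:
    ecs = boto3.client("ecs")
    service = ecs.describe_services(cluster=CLUSTER, services=[SERVICE])["services"][0]
    running = service["runningCount"]
    desired = service["desiredCount"]
    if running < MIN_RUNNING_TASKS or running < desired:
        # Replace the print with a page or chat notification in a real setup.
        print(f"ALERT: only {running}/{desired} backend tasks are running")


if __name__ == "__main__":
    check_backend_tasks()
```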

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are still seeing degraded performance for some of our clients.

monitoring

The issue is mitigated and we're now monitoring it.

identified

We've identified the cause of the problem and are working on it.

investigating

We are currently experiencing degraded performance on app.napta.io; our team is looking into this.

Report: "Degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

The issue is mitigated and we're now monitoring it.

investigating

We are currently experiencing degraded performance on app.napta.io; our team is looking into this.

Report: "Some requests are failing"

Last update
postmortem

* At 17:47 CET, an alert was received concerning a timeout in the execution of some of our database requests.
* At 17:54 CET, the incident was resolved.

The root cause came from the deployment of our software version and the related modification of the database schema. During this modification process, a query was stuck for 30 seconds (the statement_timeout setting of our databases), and the action was retried 4 times. We were not able to find the reason why the query was stuck, and the deployment on our staging environment did not encounter this issue (a sketch of how such waits can be bounded during a migration is shown below).

Following this incident, the following action was decided:

* We will no longer use the helper function that sends this specific database query; we have other options to achieve the same goal.
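As an illustration (not Napta's actual tooling), the snippet below wraps a schema change in a tight `lock_timeout` so that a blocked DDL statement fails fast and can be retried deliberately, instead of hanging until `statement_timeout` expires and being retried blindly. The connection string, the timeout value, and the example `ALTER TABLE` are assumptions for the sketch.

```python
import psycopg2

DSN = "dbname=app user=migrator host=db.internal"  # hypothetical connection string

# Example DDL only; any schema change that needs a strong table lock applies.
DDL = "ALTER TABLE projects ADD COLUMN archived boolean DEFAULT false;"


def run_migration_step() -> None:
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            # Fail after 2 seconds if the lock cannot be acquired, rather than
            # blocking for the full 30-second statement_timeout.
            cur.execute("SET lock_timeout = '2s'")
            cur.execute(DDL)


if __name__ == "__main__":
    try:
        run_migration_step()
    except psycopg2.errors.LockNotAvailable:
        # Surface the contention instead of retrying the query blindly.
        print("Migration step blocked by another session; retry during a quieter window.")
```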

resolved

The update of the production database caused some requests to fail from 3:46 PM UTC to 3:54 PM UTC.