Sumsub

Is Sumsub Down Right Now? Check if there is a current outage ongoing.

Sumsub is currently Operational

Last checked from Sumsub's official status page

Historical record of incidents for Sumsub

Report: "Dashboard visual issues"

Last update
resolved

We are happy to confirm the Dashboard appearance is back to normal. Should you still experience any issues with the Dashboard visual representation or have any additional questions, do not hesitate to contact Sumsub Support!

investigating

Some of our clients have reported visual issues with our Dashboard (cockpit.sumsub.com). The Dashboard page might appear incompletely loaded and contain artifacts and glitches. The issue affects only the visual appearance of the Dashboard; there is no impact on the verification process or any other part of the service. Our technical teams are investigating the problem and will provide an update as soon as possible. We apologize for the inconvenience!

Report: "Temporary SMS delivery issues"

Last update
postmortem

## **Incident Timings**

**Start time:** 23 Feb 07:00 UTC

**End time:** 24 Feb 14:00 UTC

## **Incident Summary**

Some of our clients reported a decline in SMS delivery rates. After we switched to the fallback SMS sending method, delivery remained stable for some time but began declining again after several hours. As a result, a small portion of applicants were unable to complete verification because they did not receive SMS notifications. Once the underlying issue was identified and resolved, our team ensured that no further action was required from clients. Affected applicants can complete the verification process by simply retrying the SMS step.

## **Root Cause**

A technical issue produced a significant volume of untracked SMS messages. These messages were sent in an unusual way that bypassed both our system’s and the SMS provider’s rate-limiting mechanisms. This resulted in an unusually high spike in message traffic, which caused messages to queue up, with some timing out and not being delivered.

## **Action Plan**

As a result of this issue, our Team has identified several key areas for improvement:

1. **Enhance Monitoring & Visibility:** Improve alerting for SMS delivery
2. **Prevent Recurrence:** Enhance rate limiting and throttling
3. **Collaborate with SMS providers:** Coordinate with our SMS service providers to improve anomaly detection mechanisms on their side and introduce additional fallback options

## **Conclusion**

If you are still experiencing any unusual SMS delivery rate fluctuations, or if you have any questions or concerns, please contact our Support Team.
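The postmortem above names rate limiting and throttling as the control that the untracked messages bypassed. As an illustration only, a token bucket is one common way to cap a send rate; the sketch below is not Sumsub's or any SMS provider's implementation, and the class name, parameters, and injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Hypothetical limiter: about `rate` sends per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens that can accumulate
        self.clock = clock               # injectable so the limiter can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True and consume one token if a send is permitted right now."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A limiter like this only helps if every send path goes through it, which is the gap the root cause describes: messages that never call `allow()` are not counted at all.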

resolved

After several hours of extensive monitoring and testing, the issue is confirmed to be resolved. Our Team has conducted an additional series of checks and tests to ensure that delivery remains stable. If you have any further issues or questions, please contact Sumsub Support!

monitoring

We are happy to report that SMS delivery issues have been resolved and we are now actively monitoring the system to ensure stability. Thank you for your patience. Please reach out to Sumsub Support if you experience any further issues.

investigating

We are currently experiencing temporary SMS delivery issues affecting some users. Our team is actively investigating the issue and working with SMS service providers to resolve the issue as quickly as possible. We apologize for any inconvenience this may cause and appreciate your patience. Updates will be provided as soon as more information is available. Thank you for your understanding.

Report: "Temporary verification disruptions"

Last update
postmortem

## **Incident Timings**

**Start time:** 19 Jan 16:30 UTC

**End time:** 19 Jan 17:44 UTC

## **Incident Summary**

As a result of an unprecedented surge in traffic, our team identified an issue affecting one of the services in our application infrastructure. This incident caused verifications to be held for approximately 50 minutes. All verifications pending during this time were automatically queued to be processed later. After identifying and fixing the underlying issue, our team ensured all queued operations were successfully completed without any impact on users or any need for additional action from our clients.

## **Root Cause**

The incident was primarily caused by temporary downtime in an isolated service cluster, compounded by failover complications. Under peak load, excessive resource usage led to node instability. The failover mechanism did not operate as intended, delaying the restoration of normal functionality.

## **Action Plan**

Our team has planned several improvements to the above-mentioned components of our infrastructure:

1. **Service Optimization:** Make the service less resource-demanding, thereby reducing the probability of outages
2. **Failover Enhancement:** Reconfigure failover mechanisms to increase redundancy
3. **Alerting Optimization:** Improve alerting to ensure timely detection of potential issues

## **Conclusion**

We sincerely apologize for any inconvenience this incident may have caused. Ensuring the reliability and stability of our systems is our highest priority, and we are committed to learning from this event. The changes and improvements outlined above will strengthen our infrastructure and reduce the likelihood of similar incidents in the future. Thank you for your understanding and continued trust in our team and product. If you have any questions or concerns, please don’t hesitate to contact our Support team.

resolved

We would like to inform you that the issue is now fully resolved: verification time is back to normal and all delayed users have been processed. Our Team has conducted an additional series of checks and tests to ensure the services remain stable. We confirm the problem has been fully mitigated as of now. An unprecedented burst of traffic hit our systems in a very short timeframe, causing an extreme spike in load and leading to the issue. We have implemented measures to handle increased demand more effectively, and our team will continue monitoring to ensure stable service going forward.

monitoring

We are pleased to report that the internal service responsible for verification processing has been successfully restored. Our team is now focused on addressing any remaining consequences of the incident, such as clearing backlogs and reconciling delayed verifications, and on monitoring system stability.

investigating

One of our internal services, responsible for verifications, has experienced a failure. As a result, some users may encounter delayed verifications. Verification processes may experience significant delays, which may affect user onboarding, transaction approvals, or other verification-dependent workflows. The service downtime may lead to extended waiting periods for end-users or automated processes requiring verification. The engineering team is actively investigating the root cause of the service failure. Temporary workarounds are being explored to reduce user impact while the service is being restored.

Report: "Incident Report: Temporary Elevation of API Errors During Planned Infrastructure Change (09:05–09:40 UTC)"

Last update
resolved

09:05 to 09:40 UTC (35 minutes) During this period, a small number of 5XX errors may have been observed by some of our clients accessing our API endpoints. These errors were caused by an issue encountered during a planned infrastructure change. We are pleased to inform you that the service is operating normally, and no further action is required. If you experience any lingering issues or have concerns following this incident, please don't hesitate to contact Sumsub Support for assistance.

Report: "Temporary API Disruption on July 11th, 2024"

Last update
resolved

We experienced a brief disruption in our API services on July 11th, 2024, between 10:34 and 10:40 (UTC), during which some requests to api.sumsub.com returned 5XX errors. Our team swiftly addressed the issue, and the platform was fully operational and stable after six minutes. We are conducting a thorough investigation to understand the root cause and implement measures to prevent future occurrences. If you continue to experience any issues, please contact Sumsub Support for assistance. We apologize for any inconvenience this may have caused and appreciate your understanding and cooperation.
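Short 5XX windows like the one described above are commonly absorbed on the client side by retrying with exponential backoff. The following is a minimal sketch under stated assumptions, not guidance from Sumsub: `send` is a hypothetical zero-argument callable that performs one request and returns the HTTP status code, and the attempt count and delays are illustrative values.

```python
import random
import time


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns a non-5XX status or attempts run out.

    `send` is a hypothetical callable returning an integer HTTP status code.
    Returns the last status code observed.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status < 500:
            return status  # success or a client error: retrying will not help
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter to avoid synchronized retries.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status
```

Retries of this kind are only safe for requests that are idempotent or otherwise safe to repeat; whether a given endpoint qualifies has to be checked against its documentation.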

Report: "Possible Request Timeouts Notice"

Last update
postmortem

Following an extensive investigation conducted by the Sumsub Team, the following conclusions have been reached:

* The incident only impacted the performance of requests to [api.sumsub.com](http://api.sumsub.com/) and caused no downtime.
* The issue was caused by a network stack failure in one of the database cluster nodes.
* The affected node has been phased out from production for further diagnostics.

As of now, the performance of [api.sumsub.com](http://api.sumsub.com/) remains optimal. The problematic node has been replaced. Our team will incorporate lessons learned from this incident into our next review of redundancy protocols.

resolved

We are happy to announce that the issue is considered fully resolved. Our Team has conducted an additional series of checks and tests to ensure the issue is fixed. We confirm that service performance is back to normal and there is no impact on service operation. Thank you for your patience!

monitoring

We are pleased to inform you that the problem was successfully mitigated. You can expect no further impact on our services' performance. Our team is continuing to monitor the situation and to investigate the issue further. We will continue to monitor the system closely for the next 30 minutes to ensure stability and optimal performance. Should there be any further issues, we will take immediate action and provide you with an update.

investigating

We're currently investigating possible timeouts of requests to our services. Our team is actively working to identify the root cause and resolve the issue with the highest priority. We apologize for any inconvenience this may cause and are currently doing our best to resolve this as quickly and safely as possible. We will continue to monitor the situation and provide another update in 30 minutes, or sooner if there are any significant changes.

Report: "Potential Performance Impact Notice Due to Data Center Technical Difficulties"

Last update
resolved

We are happy to announce that all issues related to the recent technical difficulties at our Data Center are now fully resolved. We have conducted thorough checks and confirm that service performance remains perfect and all affected cases have been cleared. Thank you for your patience!

monitoring

We are pleased to inform you that the technical issues affecting our Data Center have been successfully resolved. Our services are now fully operational without any performance impacts. Our team is actively working on clearing any cases that were affected during the disruption. We will continue to monitor the system closely to ensure stability and optimal performance. Should there be any further issues, we will take immediate action and provide you with an update.

investigating

We are still actively investigating the issue and will provide another update in 30 minutes, or sooner if there are any significant changes.

investigating

One of our Data Centers is currently experiencing technical difficulties. The services remain operational, with a possible impact on performance that might result in a longer verification process. We apologize for any inconvenience this may cause and are currently doing our best to resolve this as quickly and safely as possible. We will continue to monitor the situation and provide another update in 30 minutes, or sooner if there are any significant changes.

Report: "API Access Issue for a part of clients on growing plans (self-service)"

Last update
resolved

Timeframe: 2:00 PM - 5:00 PM UTC
Issue: Some self-service clients may have encountered a 402 HTTP error due to a billing service update.
Current Status: Resolved. All services are functioning normally.
Client Action: None required.

Report: "Elevated API requests errors"

Last update
resolved

This incident has been resolved.

investigating

The issue partially affects Sumsub clients. We are continuing to investigate this issue. Cloudflare has assigned critical priority to this issue. The related Cloudflare status can be tracked here: https://www.cloudflarestatus.com/

investigating

We are continuing to investigate this issue. Related Cloudflare status can be tracked here https://www.cloudflarestatus.com/

investigating

Starting from 5:15 UTC, some clients, primarily from Southeast Asia, may experience connection timeouts on their requests to our API. The issue is on the side of our infrastructure partner, Cloudflare. Together with Cloudflare, we are investigating the problem and will provide details soon.

Report: "Technical Issue with document uploading availability"

Last update
resolved

Date: 2023-05-12
Duration: 11:00 UTC - 11:09 UTC (9 minutes)
Impact: Document uploading and fetching from the dashboard were not working.

Incident Summary

On May 12, 2023, our system experienced a disruption in document uploading and downloading functionality on the dashboard. The issue persisted for 9 minutes, during which time requests failed with HTTP 500 errors. Users were unable to upload documents or access them from the dashboard. Our team promptly investigated the issue and resolved it. We apologize for any inconvenience this incident may have caused, and we remain committed to providing reliable and high-quality service to our users. We will continue to monitor the situation and implement necessary measures to prevent similar issues in the future.

Timeline

11:00 UTC - Received alerts from the monitoring system and initiated an investigation.
11:02 UTC - Root cause identified.
11:07 UTC - Root cause addressed and fixed.
11:09 UTC - Service restored and returned to normal.

Report: "MSDK crashing while opening ID photo screen"

Last update
postmortem

A minor bug caused by a back-end update. The issue was detected only on old versions of MSDK (1.18.*).

Impact: Several dozen users who were using an outdated version of the mobile SDK.

resolved

This incident has been resolved.

investigating

We are currently investigating the issue.

Report: "Technical Issue with dashboard availability"

Last update
postmortem

Around 15:50 UTC we received the first alerts on increased system load in our platform. Our automatic scaling system attempted to mitigate the problem by increasing the number of backend instances. During this time a significant number of requests were still going through, but our system was showing an extreme delay in performing any of these actions. This prompted our Engineering team to open an incident report and dive into a full-scale investigation. As a result, we found that IO was the root cause. It is important to clarify that our backend relies heavily on a distributed file system provided by AWS. We opened a case with AWS while working around the clock on a plan to make our system responsive again, without knowing that the root cause had started on the Amazon side.

Here are some of the actions:

1. We replaced the file system with another one with more aggressive settings. That action showed improvement, but unfortunately did not give us the expected results. This forced us to make some changes on the backend to prevent any performance degradation while working without a distributed file system at all.
2. Around 21:25 UTC - A fix was rolled out, and we confirmed that the changes made in the system were working with the expected performance.
3. Around 00:15 UTC - AWS acknowledged that there were elevated latencies for the file system and started an investigation on their side.
4. Around 02:00 UTC - AWS identified the issue’s root cause and confirmed they were working on a fix.

There have been no further updates from AWS on the case yet, and this incident has not yet been reflected on the global AWS status page.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

The service has encountered a technical problem with access for the majority of clients. No images can be uploaded for applicants. Our team is investigating the incident.

investigating

We are currently investigating this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

The service has encountered a technical problem with access for the majority of clients. No images can be uploaded for applicants. Our team is investigating the incident.

Report: "Technical issue with AML service"

Last update
resolved

This incident has been resolved.

identified

Our service has encountered a technical problem preventing us from getting a response from the external source responsible for providing AML matches. In some cases, there may be an increase in verification time. Our team, together with the AML provider, is actively investigating the incident, and we will update you with more details as we have them.

Report: "S3 incident"

Last update
resolved

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "S3 incident"

Last update
resolved

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "Data Processing Delays - Reporting Tools Affected"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified on our side. KYC checks are taking significantly longer than usual. LiveChat widgets are also affected. Customer support is available only via e-mail or TG chat.

investigating

We are continuing to investigate this issue.

investigating

Applicant checks are currently taking significantly longer than usual. No data has been lost and the system should catch up shortly.

Report: "Cloudflare incident"

Last update
resolved

This incident has been resolved.

investigating

Cloudflare incident

Report: "Access issue to WebSDK styles and texts services."

Last update
resolved

Lasted for 15 minutes, resolved

Report: "Increased latency"

Last update
resolved

This incident has been resolved.

investigating

We are investigating increased latency and a small percentage of traffic being dropped.

Report: "Latency Issues"

Last update
resolved

This incident has been resolved.

identified

Cloudflare has implemented a fix for this issue and is currently monitoring the results. https://www.cloudflarestatus.com