CyberSmart

Is CyberSmart Down Right Now? Check if there is a current outage ongoing.

CyberSmart is currently Operational

Last checked from CyberSmart's official status page

Historical record of incidents for CyberSmart

Report: "We are experiencing API stability issue - Ongoing Improvements"

Last update
resolved

Performance has stabilised across the CyberSmart platforms.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are currently experiencing API stability issues in CyberSmart Active Protect due to the recent rollout of our enhanced vulnerability detection system. The new system requires more infrastructure resources, which has led to performance degradation.

Current Status
- API Stability: Increased load on our cloud infrastructure and database is causing slowdowns and occasional instability.
- Performance Monitoring: We’ve identified that complex queries on large data sets are contributing to the issue.

Impact
You may experience slow loading or errors in our web product, particularly in the Vulnerable Software module.

Actions Taken
- Scaled up infrastructure to handle increased demand.
- Ongoing monitoring of performance and stability.
- API optimisations underway to improve efficiency and throughput.

We appreciate your patience as we work to stabilise the API this week. Our goal remains to provide high availability, performance, and reliability.

Report: "AWS infrastructure issue"

Last update
resolved

Our infrastructure provider has restored all services.

monitoring

AWS has reported that systems are being restored, and our application is back online. We are now monitoring to ensure systems remain operational. You may still experience intermittent outages.

investigating

We are currently investigating an issue with our infrastructure provider. We have identified the issue and are working with AWS to resolve it.

Report: "Outage Incident - 20/11/2020 - Web App"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

Our main database is having issues relating to known bugs from our cloud provider. Users will not be able to use our web platform while this defect is debugged and a solution is put in place. We are working with our cloud provider to resolve the problem.

Report: "Outage Incident - 24/08/2020 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 28 mins
- Dashboard users were not able to log in and use the platform; they faced an error message on any request.

Timeline (GMT)
- 09:25: Issue reported
- 09:53: Issue resolved

Root Cause
A spike in traffic caused a set number of connections to be refused by our web servers.

Resolution and recovery
Our DevOps & Infrastructure team are working on implementing scaling services to allow for spikes in traffic.
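The total outage time follows directly from the two timeline entries. As a minimal sketch (the helper name `outage_duration` is hypothetical, and it assumes both timestamps fall on the same day), the arithmetic can be checked like this:

```python
from datetime import datetime, timedelta


def outage_duration(start: str, end: str) -> timedelta:
    """Return the time between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timeline entries from this report (GMT).
duration = outage_duration("09:25", "09:53")
print(f"{duration.total_seconds() / 60:.0f} minutes")
```

With the times reported above, this gives the 28 minutes stated in the summary.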

Report: "Outage Incident - 10/06/2020 - Application (Mobile/ Desktop Agents)"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue causes all Applications to fail to report to our Web Platform. The issue has been identified and a fix is underway. Our Platform admin access is unaffected.

Report: "Outage Incident - 18/05/2019 - Web App"

Last update
resolved

This incident has been resolved.

investigating

This issue is now resolved.

investigating

Issue Summary
- Total Outage time: ~1 hour
- Dashboard users were not able to log in and use the platform; they faced an error message on any request.

Timeline (GMT)
- 09:10: Issue reported
- 10:00: Issue found
- 10:15: Issue resolved

Root Cause
Caused by service upgrades that had taken place on Sunday evening.

Resolution and recovery
Our DevOps & Infrastructure team are working on building more in-depth processes for system changes.

Report: "Outage Incident - 14/11/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: ~3 hours
- All users and applications were not able to access our Web Platform.

Timeline (GMT)
- 18:16: Issue reported
- 10:57: CyberSmart platform back online

Root Cause
A new version release caused major errors due to a database migration error.

Resolution and recovery
Using cloud services and migration fixes, we were able to run the migration on our larger database tables.

Corrective and Preventative Measures
We are reviewing our current designs and scaling plans, and will be assessing our use of, and future changes to, our database architecture and services.

Report: "Outage Incident - 15/11/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: ~17 hours
- Admin users were not able to view their organisation's application results from within our portal.

Timeline (GMT)
- 10:13: Issue reported
- 10:30: Issue found
- 10:52: Patch release launched and issue confirmed fixed

Root Cause
Caused by a code-level error triggered by a change to our database schema (migrations).

Resolution and recovery
Our Quality Assurance team is working on improving our regression testing for releases. We aim to release fast and incrementally so we can get the latest features in use by our customers.

Corrective and Preventative Measures
Quality Assurance review of regression testing.

Report: "Outage Incident - 11/10/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 2m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 20:49 to 20:51.

Timeline (GMT)
- 20:49: Issue began
- 20:50: Issue resolved by automatic restart
- 20:51: CyberSmart platform back online

Root Cause
We are currently upgrading our platform to cater for user scaling and are experiencing memory issues with the legacy components of the platform.

Resolution and recovery
We understand the problems and are currently designing and launching a new backend architecture for scale and failover. This is currently in the testing phase and will go live before 01/12/19.

Corrective and Preventative Measures
N/A

Report: "Outage Incident - 15/10/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 1m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 4:37 to 4:38.

Timeline (GMT)
- 4:37: Issue began
- 4:37: Issue resolved by automatic restart
- 4:38: CyberSmart platform back online

Root Cause
We are currently upgrading our platform to cater for user scaling and are experiencing memory issues with the legacy components of the platform.

Resolution and recovery
We understand the problems and are currently designing and launching a new backend architecture for scale and failover. This is currently in the testing phase and will go live before 01/12/19.

Corrective and Preventative Measures
N/A

Report: "Outage Incident - 19/10/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 2m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 4:01 to 4:03.

Timeline (GMT)
- 4:01: Issue began
- 4:02: Issue resolved by automatic restart
- 4:03: CyberSmart platform back online

Root Cause
We are currently upgrading our platform to cater for user scaling and are experiencing memory issues with the legacy components of the platform.

Resolution and recovery
We understand the problems and are currently designing and launching a new backend architecture for scale and failover. This is currently in the testing phase and will go live before 01/12/19.

Corrective and Preventative Measures
N/A

Report: "Outage Incident - 21/10/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 1m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 1:19 to 1:20.

Timeline (GMT)
- 1:19: Issue began
- 1:19: Issue resolved by automatic restart
- 1:20: CyberSmart platform back online

Root Cause
We are currently upgrading our platform to cater for user scaling and are experiencing memory issues with the legacy components of the platform.

Resolution and recovery
We understand the problems and are currently designing and launching a new backend architecture for scale and failover. This is currently in the testing phase and will go live before 01/12/19.

Corrective and Preventative Measures
N/A

Report: "Outage Incident - 21/10/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 2m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 1:21 to 1:23.

Timeline (GMT)
- 1:21: Issue began
- 1:22: Issue resolved by automatic restart
- 1:23: CyberSmart platform back online

Root Cause
We are currently upgrading our platform to cater for user scaling and are experiencing memory issues with the legacy components of the platform.

Resolution and recovery
We understand the problems and are currently designing and launching a new backend architecture for scale and failover. This is currently in the testing phase and will go live before 01/12/19.

Corrective and Preventative Measures
N/A

Report: "Outage Incident - 23/07/19 - Web App Impact"

Last update
resolved

Issue Summary
- Total Outage time: ~2.5 hours
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- All customer and application HTTP requests to the platform resulted in 502 errors.
- A third party hosting/services company (Amazon Web Services), with which we host a number of key infrastructure components, experienced an outage.

Timeline (GMT)
- 16:33: Issue began
- 16:50: Staff were notified of the issue
- 19:00: Issue resolved (by external service provider)
- 19:03: CyberSmart platform back online

Root Cause
Amazon AWS had issues with a few of their platform infrastructure services, including degraded performance for EBS volumes within the “EU-WEST-2” Region, which is a key part of the RDS component CyberSmart uses for data storage.

Resolution and recovery
N/A

Corrective and Preventative Measures
We have planned a work-stream for improved failover within CyberSmart, including using PaaS services distributed over different geographical regions. This will allow automatic corrective measures to keep our services online when a given region has issues.
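The region failover idea described in the preventative measures can be illustrated with a small sketch. This is not CyberSmart's implementation: the function `pick_region`, the health-check callback, and the choice of `eu-west-1` as a secondary region are all assumptions made for illustration only.

```python
from typing import Callable, Optional, Sequence


def pick_region(regions: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region whose health check passes, or None."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # Treat a failing health check as an unhealthy region.
            continue
    return None


# Hypothetical example: the primary region is degraded, the secondary is fine.
status = {"eu-west-2": False, "eu-west-1": True}
active = pick_region(["eu-west-2", "eu-west-1"], lambda r: status[r])
print(active)
```

In practice the health check would probe a real endpoint, and the ordering of regions would encode the preferred primary and fallback.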

Report: "Outage Incident - 02/07/2019 - Web App"

Last update
resolved

Issue Summary
- Total Outage time: 24m
- All users were unable to access the CyberSmart Web platform due to a 3rd party component failure.
- Platform was down for all users from 14:45 GMT until 15:09 GMT.
- All customer and application HTTP requests to the platform resulted in 502 errors.
- A third party hosting/services company (CloudFlare), with which we have a number of key infrastructure components, experienced an outage.

Timeline (GMT)
- 14:45: Issue began
- 14:48: Staff were notified of the issue
- 15:09: Issue resolved (by external service provider)
- 15:09: CyberSmart platform back online

Root Cause
A number of services used to host our product, provided by an external service provider, had technical issues and suffered a full outage across their platform.

Resolution and recovery
N/A

Corrective and Preventative Measures
We will be exploring contingency plans for future outages with these third party services.

Report: "Outage Incident - 03/05/2019 - Web App"

Last update
resolved

Issue Summary
- Total outage time: 4h 38m
- All users were unable to access the CyberSmart web dashboard. The CyberSmart applications continued to function, but their requests were not handled.
- The web dashboard was down for all users from 13:48 until 18:26.
- All customer requests to the platform resulted in 502 errors.
- A new feature took an unforeseen amount of time to complete, causing our REST API’s CPU usage to run at critically high levels.

Timeline (GMT)
- 13:48: Issue began
- 13:49: Staff were notified of the issue
- 16:44: Problem found
- 18:23: Fix pushed to Production
- 18:26: Service restored

Root Cause
Our API server handles requests from deployed applications on Desktop and Mobile (in testing) devices. These send an update HTTP POST request containing configuration changes every 15 minutes. Our new feature, which checks application vulnerabilities on a user’s local computer, does a ‘lookup’ against our database to check for any new vulnerabilities. This query was taking up to 30 seconds to return (for over 100k requests), which put a huge load on the CPU of the REST server as it continued to try to process new requests, and eventually caused the server to return 502 errors.

Resolution and recovery
We have disabled the new feature that was causing the downtime, and the web dashboard is back online and working normally. The feature was not visible to users, so no loss of service will occur.

Corrective and Preventative Measures
- We have planned for these REST API endpoints to be evaluated and refactored for increased efficiency, specifically for database lookups. We aim to have the feature back online and visible in our next public release.
- We are dedicating time in the coming weeks to upgrading our monitoring systems to alert us to the location of future problems and aid us in debugging and testing, in turn allowing us to achieve a quicker resolution time when required.
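The load described in the root cause can be estimated with Little's law (requests in flight = arrival rate × time per request). The sketch below is a rough back-of-the-envelope calculation, not a measurement from the incident. It assumes the "over 100k requests" figure means roughly 100,000 updates arriving per 15-minute interval and that every request waited the full 30 seconds; the function name and the 0.1-second comparison value are hypothetical.

```python
def requests_in_flight(requests_per_interval: int,
                       interval_seconds: float,
                       seconds_per_request: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    arrival_rate = requests_per_interval / interval_seconds  # requests per second
    return arrival_rate * seconds_per_request


# Assumed figures (see lead-in): 100,000 updates every 15 minutes, 30 s per lookup.
slow = requests_in_flight(100_000, 15 * 60, 30.0)
# Hypothetical comparison: the same traffic if each lookup took 0.1 s.
fast = requests_in_flight(100_000, 15 * 60, 0.1)

print(f"~{slow:.0f} requests in flight at 30 s per lookup")
print(f"~{fast:.0f} requests in flight at 0.1 s per lookup")
```

Under these assumptions, thousands of requests would be open at once with the slow lookup, compared with roughly a dozen with a fast one, which is consistent with the server exhausting its CPU and returning 502 errors.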