Historical record of incidents for Northpass
Report: "Application Outage"
Last update: Between 14:22 and 14:46 EDT today, our entire application was unavailable due to unexpected memory exhaustion in our backend layer.
Root Cause: A high-volume enrolment process caused backend cache errors, leading to an "Out of Memory" condition.
Preventative Measures:
- Script Throttling: We'll introduce rate-limiting for large-scale data loads.
- Capacity Expansion: Backend cache memory limits were increased.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
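The rate-limiting measure named in this report can be illustrated with a token bucket. This is a minimal sketch, not Northpass's implementation: the class name `TokenBucket`, its parameters, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical limiter: about `rate` operations per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens that can accumulate
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def try_acquire(self, n=1):
        """Take `n` tokens if available; return False instead of blocking."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A bulk enrolment script could call `try_acquire()` before each batch and pause when it returns `False`, so a large load cannot outrun the backend cache.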
Report: "Login Issues for users Using SSO auth"
Last update: A recent deployment of new functionality introduced a bug in learner enrollments that affected some of our customers. To avoid a recurrence, we have enhanced the tests covering sign-in and enrollments to help prevent similar errors.
This issue has been resolved. Thank you for your patience and understanding.
A fix has been implemented and we are monitoring the results. Thank you for your patience.
We are continuing to work on the fix for this issue. Thank you for your patience and understanding.
The issue has been identified by the Engineering team and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Performance degradation"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue with performance.
Report: "Course Viewer issue"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue with the Course Viewer that is preventing learners from using parts of it.
Report: "Login Issues for Customers Using SSO"
Last update:
Time Detected: 09:30
Status: Resolved
Time Resolved: 10:45
Summary: The login issues affecting customers using Single Sign-On (SSO) have been resolved. The issue was traced to abnormal behaviour in a firewall rule following a recent component update.
Resolution Steps: The engineering team identified the cause and reverted the changes made by the component update. We will continue monitoring to ensure stable access for all customers. Thank you for your patience and understanding.
Report: "Northpass App - Azure: Application outage due to database maintenance"
Last update:
Incident Duration: 04:30 - 04:36 (6 minutes)
Incident Description: On 05.10.2024, from 04:30 to 04:36, our application experienced an outage due to a scheduled database maintenance operation by our cloud provider. During this maintenance window, database connections were temporarily unavailable, which made the application inaccessible for a brief period.
Impact:
- All users were unable to access the application during the downtime.
- No data loss occurred, and full service was restored once the maintenance was completed.
We apologize for any inconvenience caused and appreciate your understanding. If you have any further questions, please don't hesitate to reach out to our support team.
Report: "Performance Degradation"
Last update:
Incident Start: 14:33 EDT
Incident End: 16:08 EDT
Impact: During the incident period, customers experienced increased latency. This impacted UI responsiveness, causing inconvenience to our users.
Cause: The root cause of the incident was identified as abnormal behavior with rate limiting. This anomaly disrupted the normal operation of our network infrastructure, resulting in the issues.
Resolution: Our engineering team quickly identified the problem and applied a fix. Once the fix was applied, the problem was resolved.
Next Steps: We are conducting a thorough review of our update procedures and tests to prevent similar incidents in the future.
Report: "Increased error rate and timeouts"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue with increased error rate and timeouts.
Report: "Performance degradation due to Azure outage"
Last update: This incident has been resolved.
Azure team status: We have implemented networking configuration changes and have performed failovers to alternate networking paths to provide relief. Monitoring telemetry shows improvement in service availability, and we are continuing to monitor to ensure full recovery.
We are currently facing increased latency due to a global Azure outage.
Report: "Admin navigation panel outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating an issue where the navigation fails to load in the admin panel only. Learners are not affected.
Report: "Application Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Application Outage"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are experiencing a very high error rate in our AWS environment, and the application is returning errors. We are investigating.
Report: "Application Outage"
Last update: This incident has been resolved. Please reach out to support@northpass.com with any questions.
We've opened up every aspect of the application now. We are monitoring to see if anything pops up again.
The issue has been identified and isolated. The application should be online for almost all use cases except for one very specific item. We are working to get that last part resolved. We expect all typical usage to work properly now.
We are experiencing a very high error rate in our Azure environment, and the application is returning errors. We are investigating.
Report: "Performance degradation due to traffic spike"
Last update:
Status: Resolved
Incident Start: 11:13 AM EDT
Incident End: 11:33 AM EDT
Duration: 20 minutes
Impact: During this incident, customers experienced increased latency, timeouts, and errors while accessing our services. This issue significantly affected user experience, hindering the ability to perform operations reliably and efficiently within our platform.
Cause: The incident was caused by an unexpected surge in requests, which exceeded the anticipated thresholds. This sudden increase in demand throttled our application's performance and scaling capabilities. As a result, our services struggled to process incoming requests effectively, leading to the observed issues.
Resolution: Our engineering team promptly responded by increasing the number of application replicas.
Next Steps: To prevent a recurrence of this incident, we are reviewing and adjusting our scaling strategies to ensure they can handle sudden spikes in demand more gracefully. Additionally, we are improving our monitoring systems to detect and respond to unusual patterns of activity more quickly. Our aim is to enhance the resilience and reliability of our services, minimising any impact on your experience.
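The resolution above scaled out by adding application replicas. As a rough sketch of how a replica count might be derived from load, the function below applies a simple proportional rule (similar in spirit to the rule used by common autoscalers). The function name, the load units, and the bounds are assumptions; the report does not describe the actual scaling policy.

```python
import math


def desired_replicas(current, observed_load, target_load,
                     min_replicas=1, max_replicas=50):
    """Hypothetical proportional rule: scale replicas by observed / target load."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Round up so capacity is never planned below the observed demand.
    wanted = math.ceil(current * observed_load / target_load)
    # Clamp to the configured bounds (both bounds are assumed values).
    return max(min_replicas, min(max_replicas, wanted))
```

For example, four replicas each seeing 150 requests per second against a target of 100 would call for six replicas under this rule.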
Report: "Performance degradation due to network problems"
Last update:
Incident Start: 05:01 EDT
Incident End: 05:19 EDT
Duration: 18 minutes
Impact: During the incident period, customers experienced increased network latency, leading to timeouts and failed requests. This impacted the responsiveness and availability of our services, causing inconvenience to our users.
Cause: The root cause of the incident was identified as abnormal behavior of a system network component during a routine update procedure. This anomaly disrupted the normal operation of our network infrastructure, resulting in the issues.
Resolution: Our engineering team quickly identified the problem. Once the update was completed and the components were restarted, the problem was resolved.
Next Steps: We are conducting a thorough review of our update procedures and tests to prevent similar incidents in the future.
Report: "Media Library and Quiz Management Pages Aren't Loading"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We are aware of an issue with the media library and quiz management pages not loading. Learners can still access all media, and quizzes are working as intended. However, you will not be able to access these pages in the meantime. The issue has been identified, and a fix is being rolled out shortly.
Report: "Issues Processing Events"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing an issue that is causing events to be delayed. Learners are still able to access the application and take courses. Subsequent actions such as webhooks and communications will be impacted. We know what the problem is and are working on a solution. All events will be processed once this has been resolved.
Report: "Application Outage"
Last update: This incident has been resolved.
We are aware and have identified an issue that is causing the application to sometimes fail. We are actively working to resolve the issue and expect to have this resolved shortly.
Report: "CDN Issues"
Last update: This incident has been resolved.
Our sub-processor has implemented a fix and is now monitoring the results. We can see that content is loading as expected again. We will also continue to monitor the situation.
Our sub-processor has identified the issue and is now working on a resolution.
We are still waiting on our sub-processor to identify and fix the issue. Courses still work, but course images, academy logos, and some other content will not load.
One of our sub-processors is experiencing issues, and it is impacting our ability to serve content. We have contacted them and are awaiting a resolution.
Report: "Application Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing an application outage. Our team is currently investigating the issue.
Report: "Application Performance and Timeouts"
Last update: This incident has been resolved.
The issue has been identified, and a change has been implemented. We are monitoring the situation.
We are investigating issues with sluggish performance and timeouts that may cause pages to fail.
Report: "Application Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing an application outage. Our team is currently fixing the issue.
Report: "Learner Login Issues"
Last update: This incident has been resolved.
We have cleared the backlog of events and see improvements across all areas of the application. We will be monitoring to ensure everything is stable.
The team has made progress and sees improvements, but there are still some impacted areas. We appreciate your patience and apologize for this issue.
We have identified the issue. The team is working on a resolution. More updates to follow.
We are investigating why some learners are having issues logging into their academies.
Report: "Event Processing Delays"
Last update: This incident has been resolved.
We have identified the issue. This issue is highly localized and most customers will not notice degraded performance.
We are currently investigating event processing delays. The application is operational, but you might notice increased lag times for webhooks and other background processes.
Report: "Image Loading Issue"
Last update: This incident has been resolved.
Our provider has identified the root cause of the issue and is testing solutions.
We have observed an issue with our CDN provider that's causing images to fail to load. This appears to be random and sometimes the images load properly. We will continue to monitor the situation and work with our provider.
Report: "Application Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified, and a resolution is being put in place.
We are currently experiencing an issue that is impacting background processes. You may see a delay in background administrative tasks such as bulk actions, group enrollments, and other areas.
Report: "Application Outage"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
The Northpass application experienced an outage. The issue was identified and has been fixed. We are closely monitoring the situation.
Report: "Analytics Dashboards failing to update"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue. We will update you on our findings shortly.
We have identified that only Looker-based dashboards are impacted. We are still looking into this issue, and will update on our status shortly.
We have received reports that some analytics dashboards are failing to update. We are currently investigating this issue, and will update you on our findings shortly.
Report: "High Error rates"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Hello, we are currently investigating the high error rates occurring on our application. We will update you shortly.
Report: "Webhook Delays"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.
Report: "Failed learner_completed_events webhooks"
Last update: All affected learner_created webhooks have been re-transmitted successfully.
We've discovered that the learner_created_events webhooks were silently failing. The underlying cause has been addressed, and all webhooks are operating as intended. For all missed webhooks, the affected data is being re-transmitted to customer systems. We are continuing to monitor the situation.
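Silent webhook failures like the one described above are usually avoided by recording every delivery that does not succeed, so it can be re-transmitted later. The sketch below shows one way to do that; `deliver_all`, the `send` callable, and the attempt limit are hypothetical and do not reflect Northpass's eventing system.

```python
def deliver_all(events, send, max_attempts=3):
    """Try each event up to `max_attempts` times.

    Returns a list of (event, error) pairs for events that never succeeded,
    so failures are surfaced to the caller instead of being swallowed.
    """
    failed = []
    for event in events:
        last_error = None
        for _ in range(max_attempts):
            try:
                send(event)          # `send` is a caller-supplied delivery function
                break                # delivered; move on to the next event
            except Exception as exc:
                last_error = exc     # remember why this attempt failed
        else:
            # The loop ended without a successful send: record it for re-transmission.
            failed.append((event, repr(last_error)))
    return failed
```

A production version would normally wait between attempts (for example with exponential backoff) and persist the failed list; both are omitted here to keep the sketch short.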
Report: "Webhook Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.
Report: "Webhook Delays"
Last update: This incident has been resolved.
Our operations team has identified the issue and has made adjustments that appear to be improving the situation. The events are flowing through, but we expect it will be some time before the delays are fully cleared.
We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.
Report: "Webhooks are not being sent to certain customers"
Last update: The issue with webhooks has been resolved.
A fix has been implemented. We are currently monitoring the situation.
The issue has been identified. We are implementing a fix to resolve it.
We are currently investigating an issue that is causing some customers not to receive webhooks.
Report: "Application Performance Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating application performance issues.
Report: "Webhooks not sending"
Last update: Webhooks are fully operational.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating an issue with sending course completed webhooks.
Report: "Application Outage"
Last update: This incident has been resolved.
Our infrastructure was able to auto-heal through the issue, but we are still monitoring the situation. We are operational at the moment.
The Northpass application is currently offline. Our engineers are actively investigating the issue.
Report: "Application Performance Issues"
Last update: The incident has been resolved.
We addressed the problem and are monitoring the situation.
The issue has been identified and a fix is being implemented.
Northpass is currently experiencing performance issues. You may notice the application working slower than usual.
Report: "Publishing Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue with Course Publishing that incorrectly shows the status of a published course. This only affects the administrator's experience. Learners will see the most recent published content.
Report: "Issues with Document Activities and Preview"
Last update: This incident has been resolved.
A fix has been implemented and we are now monitoring the results.
We are aware of an issue that is impacting document-based activities and preview functionality. We know what the issue is and will have a solution in place shortly.
Report: "Increased Error Rate"
Last update: We have not seen errors for quite some time. This is resolved.
We observed an increased error rate for about 20 minutes. The errors occurred due to failures connecting to our CDN, Amazon CloudFront. You may have noticed issues with SCORM-based activities. The error rate is now within an acceptable threshold and we are actively monitoring the situation.
Report: "Academy Errors"
Last update: Academy sites experienced errors that resulted in a "Whoops, sorry about that." error message. The issue was due to a feature flag being prematurely turned on for all academies. The issue is resolved.
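One common way to reduce the impact of a flag being enabled too early is a percentage-based rollout, where each academy is deterministically assigned to a bucket. The report does not say how Northpass's flags work, so the sketch below is purely illustrative; `flag_enabled`, the flag name, and the academy identifiers are hypothetical.

```python
import hashlib


def flag_enabled(flag, academy_id, rollout_percent):
    """Hypothetical check: enable `flag` for roughly `rollout_percent`% of academies."""
    # Hash flag and academy together so each flag gets its own stable bucketing.
    digest = hashlib.sha256(f"{flag}:{academy_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable value in 0..99
    return bucket < rollout_percent
```

Because the bucket is stable, raising `rollout_percent` from 10 to 50 only adds academies; any academy already enabled stays enabled.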
Report: "Application Outage"
Last update: At 1:37 PM EDT, there was an outage of the Northpass application. Our Operations team was notified and restored services by 1:57 PM EDT. We are investigating the root cause of the issue and will remediate accordingly.
Report: "Degraded App Performance"
Last update: The application's performance is stable.
The health of our cluster is improving and performance is improving as well. We will continue to monitor the situation.
Our application is experiencing degraded performance and we are actively looking into the issue.
Report: "Application Outage"
Last update: This incident has been resolved.
We've identified the issue and rectified it. We are now monitoring the application.
We are currently investigating this issue.
Report: "Unexpected Application Slowdown"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We're currently experiencing an unexpected application slowdown. We're looking into possible solutions and will update our systems as soon as possible.
Report: "Northpass platform not available for some users"
Last update: This incident is now resolved.
The issue has been identified and we are implementing the fix now. The issue is only affecting a subset of users.
We are aware that the Northpass platform is not reachable. We are investigating the issue.
Report: "Learning content inaccessible"
Last update:
# Current Status
Operational
# Summary
Yesterday morning (EST), we deployed a platform enhancement meant to improve security and the process of customizing a school's look and feel. Specifically, we wanted to ensure customers have easy access to the latest jQuery libraries without having to directly reference the library in their templates. However, this caused issues for customers using our Learning Experience v2 (LXv2) who referenced jQuery 2.x in their school’s customization code.
# Impact
Any customer utilizing our LXv2 who referenced jQuery 2.x in their school’s customization code experienced an issue where learners were unable to access course content.
# Root Cause
Prior to version 3.x, the jQuery library did not include one of the methods that our LXv2 relies on. This resulted in our LXv2 throwing a JS error, which in turn blocked any learners from accessing course content.
# Trigger
We regularly and frequently deploy to our production instance. This error was introduced in the deployment that occurred at 12:20 UTC or 8:20am EST.
# Resolution
We reverted our latest deployment to return the application to stability at 14:50 UTC or 10:50am EST. We also plan to put additional logging and monitoring in place to detect these types of issues earlier.
# Corrective Actions
We’re putting additional logging and monitoring solutions in place to automatically detect JavaScript issues like this ahead of time. As we’ve reverted our deployment to resolve this issue, we will refrain from deploying these jQuery-specific changes for the time being (until this is closely monitored).
Course content was inaccessible to all learners for certain Northpass customers. The issue lasted for about 2 hours starting at 8:30AM EST.
Report: "Major outage: Northpass application not reachable"
Last update: This incident has been resolved.
We are aware that the Northpass application is not available. We are currently investigating this issue.
Report: "Intermittent issues with accessing the Northpass Application"
Last update: This incident has been resolved.
Northpass users may have been affected by an issue that made the Northpass application unavailable. The issue has been resolved and we are now working to identify its cause.