Northpass

Is Northpass Down Right Now? Check whether an outage is ongoing.

Northpass is currently Operational

Last checked from Northpass's official status page

Historical record of incidents for Northpass

Report: "Application Outage"

Last update
resolved

Between 14:22 and 14:46 EDT today, our entire application was unavailable due to unexpected memory exhaustion in our backend layer.

Root Cause: A high-volume enrolment process caused backend cache errors, leading to an "Out of Memory" condition.

Preventative Measures:
- Script Throttling: We'll introduce rate-limiting for large-scale data loads.
- Capacity Expansion: Backend cache memory limits were increased.
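The rate-limiting mentioned under Script Throttling could take many forms; the report does not say which one Northpass chose. As a minimal sketch, assuming a token-bucket limiter placed in front of a bulk enrolment load, the idea looks roughly like this. The names `TokenBucket` and `throttled_batches` are hypothetical and do not come from the report.

```python
import time


class TokenBucket:
    """Allow roughly `rate` operations per second, with bursts up to `capacity`.

    Hypothetical helper for illustration; not Northpass's implementation.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def throttled_batches(records, batch_size, bucket, sleep=time.sleep):
    """Yield `records` in batches, waiting for a token before each batch."""
    for start in range(0, len(records), batch_size):
        while not bucket.try_acquire():
            sleep(0.01)  # back off briefly until the bucket refills
        yield records[start:start + batch_size]
```

A bulk job would iterate over `throttled_batches(rows, 500, TokenBucket(rate=2, capacity=2))`, so that no more than about two batches per second reach the backend cache. The batch size and rate here are placeholders, not figures from the incident.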

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Login Issues for users Using SSO auth"

Last update
postmortem

A recent deployment of new functionality introduced a bug in learner enrollments that affected some of our customers. To avoid a recurrence, we have enhanced the tests covering sign-in and enrollments so that similar errors are caught before release.

resolved

This issue has been resolved. Thank you for your patience and understanding.

monitoring

A fix has been implemented and we are monitoring the results. Thank you for your patience.

identified

We are continuing to work on the fix for this issue. Thank you for your patience and understanding.

identified

The issue has been identified by the Engineering team and a fix is being implemented...

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Performance degradation"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue with performance.

Report: "Course Viewer issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue with the Course Viewer that is preventing learners from using parts of it.

Report: "Login Issues for Customers Using SSO"

Last update
resolved

Time Detected: 09:30
Status: Resolved
Time Resolved: 10:45

Summary: The login issues affecting customers using Single Sign-On (SSO) have been resolved. The issue was traced to abnormal behaviour in a firewall rule following a recent component update.

Resolution Steps: The engineering team identified the cause and reverted the component update. We will continue monitoring to ensure stable access for all customers. Thank you for your patience and understanding.

Report: "Northpass App - Azure: Application outage due to database maintenance"

Last update
resolved

Incident Duration: 04:30 - 04:36 (6 minutes)

Incident Description: On 05.10.2024, from 04:30 to 04:36, our application experienced an outage due to a scheduled database maintenance operation by our cloud provider. During this maintenance window, database connections were temporarily unavailable, which caused the application to be inaccessible for a brief period.

Impact:
- All users were unable to access the application during the downtime.
- No data loss occurred, and full service was restored once the maintenance was completed.

We apologize for any inconvenience caused and appreciate your understanding. If you have any further questions, please don't hesitate to reach out to our support team.

Report: "Performance Degradation"

Last update
resolved

Incident Start: 14:33 EDT
Incident End: 16:08 EDT

Impact: During the incident period, customers experienced increased latency. This impacted UI responsiveness, causing inconvenience to our users.

Cause: The root cause of the incident was identified as abnormal behavior with rate limiting. This anomaly disrupted the normal operation of our network infrastructure, resulting in the issues.

Resolution: Our engineering team quickly identified the problem and applied a fix. Once the fix was in place, the problem was resolved.

Next Steps: We are conducting a thorough review of our update procedures and tests to prevent similar incidents in the future.

Report: "Increased error rate and timeouts"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue with increased error rates and timeouts.

Report: "Performance degradation due to Azure outage"

Last update
resolved

This incident has been resolved.

monitoring

Azure team status: We have implemented networking configuration changes and have performed failovers to alternate networking paths to provide relief. Monitoring telemetry shows improvement in service availability, and we are continuing to monitor to ensure full recovery.

identified

We are currently facing increased latency due to a global Azure outage.

Report: "Admin navigation panel outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating an issue where the navigation fails to load in the admin panel only. Learners are not affected.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are experiencing a very high error rate in our AWS environment, and the application is returning errors. We are investigating.

Report: "Application Outage"

Last update
resolved

This incident has been resolved. Please reach out to support@northpass.com with any questions.

monitoring

We've opened up every aspect of the application now. We are monitoring to see if anything pops up again.

identified

The issue has been identified and isolated. The application should be online for almost all use cases except for one very specific item. We are working to get that last part resolved. We expect all typical usage to work properly now.

investigating

We are experiencing a very high error rate in our Azure environment, and the application is returning errors. We are investigating.

Report: "Performance degradation due to traffic spike"

Last update
resolved

Status: Resolved
Incident Start: 11:13 AM EDT
Incident End: 11:33 AM EDT
Duration: 20 minutes

Impact: During this incident, customers experienced increased latency, timeouts, and errors while accessing our services. This issue significantly affected user experience, hindering the ability to perform operations reliably and efficiently within our platform.

Cause: The incident was caused by an unexpected surge in requests, which exceeded the anticipated thresholds. This sudden increase in demand throttled our application's performance and scaling capabilities. As a result, our services struggled to process incoming requests effectively, leading to the observed issues.

Resolution: Our engineering team promptly responded by increasing the number of application replicas.

Next Steps: To prevent a recurrence of this incident, we are reviewing and adjusting our scaling strategies to ensure they can handle sudden spikes in demand more gracefully. Additionally, we are improving our monitoring systems to detect and respond to unusual patterns of activity more quickly. Our aim is to enhance the resilience and reliability of our services, minimising any impact on your experience.
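The report says the fix was to increase the number of application replicas, but it does not name the orchestrator or the scaling rule. As an illustrative sketch only, a common proportional rule (the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler) sizes the replica count from the ratio of observed to target load per replica. The function name, parameters, and bounds below are assumptions.

```python
import math


def desired_replicas(current_replicas, observed_per_replica, target_per_replica,
                     min_replicas=1, max_replicas=50):
    """Proportional scaling: ceil(current * observed / target), clamped to bounds.

    All parameters are hypothetical; the incident report gives no real figures.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(current_replicas * observed_per_replica / target_per_replica)
    # Clamp so a metric glitch cannot scale to zero or without limit.
    return max(min_replicas, min(max_replicas, raw))
```

For example, with 4 replicas each seeing 150 requests per second against a target of 100, `desired_replicas(4, 150, 100)` returns 6. A surge that outpaces how quickly new replicas start would still cause latency in the interval before they are ready, which is consistent with the scaling review described under Next Steps.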

Report: "Performance degradation due to network problems"

Last update
resolved

Incident Start: 05:01 EDT
Incident End: 05:19 EDT
Duration: 18 minutes

Impact: During the incident period, customers experienced increased network latency, leading to timeouts and failures for requests. This impacted the responsiveness and availability of our services, causing inconvenience to our users.

Cause: The root cause of the incident was identified as abnormal behavior of a system network component during a routine update procedure. This anomaly disrupted the normal operation of our network infrastructure, resulting in the issues.

Resolution: Our engineering team quickly identified the problem. Once the update was completed and the components were restarted, the problem was resolved.

Next Steps: We are conducting a thorough review of our update procedures and tests to prevent similar incidents in the future.

Report: "Media Library and Quiz Management Pages Aren't Loading"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

We are aware of an issue with the media library and quiz management pages not loading. Learners can still access all media, and quizzes are working as intended. However, you will not be able to access these pages in the meantime. The issue has been identified, and a fix is being rolled out shortly.

Report: "Issues Processing Events"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are currently experiencing an issue that is causing events to be delayed. Learners are still able to access the application and take courses. Subsequent actions such as webhooks and communications will be impacted. We know what the problem is and are working on a solution. All events will be processed once this has been resolved.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

identified

We have identified an issue that is causing the application to fail intermittently. We are actively working on it and expect to have it resolved shortly.

Report: "CDN Issues"

Last update
resolved

This incident has been resolved.

monitoring

Our sub-processor has implemented a fix and is now monitoring the results. We can see that content is loading as expected again. We will also continue to monitor the situation.

identified

Our sub-processor has identified the issue and is now working on a resolution.

identified

We are still waiting on our sub-processor to identify and fix the issue. Courses still work, but course images, academy logos, and some other content will not load.

identified

One of our sub-processors is experiencing issues, and it is impacting our ability to serve content. We have contacted them and are awaiting a resolution.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently experiencing an application outage. Our team is currently investigating the issue.

Report: "Application Performance and Timeouts"

Last update
resolved

This incident has been resolved.

monitoring

The issue has been identified, and a change has been implemented. We are monitoring the situation.

investigating

We are investigating issues with sluggish performance and timeouts that may cause pages to fail.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are currently experiencing an application outage. Our team is currently fixing the issue.

Report: "Learner Login Issues"

Last update
resolved

This incident has been resolved.

monitoring

We have cleared the backlog of events and see improvements across all areas of the application. We will be monitoring to ensure everything is stable.

identified

The team has made progress and sees improvements, but there are still some impacted areas. We appreciate your patience and apologize for this issue.

identified

We have identified the issue. The team is working on a resolution. More updates to follow.

investigating

We are investigating why some learners are having issues logging into their academies.

Report: "Event Processing Delays"

Last update
resolved

This incident has been resolved.

identified

We have identified the issue. This issue is highly localized and most customers will not notice degraded performance.

investigating

We are currently investigating event processing delays. The application is operational, but you might notice increased lag times for webhooks and other background processes.

Report: "Image Loading Issue"

Last update
resolved

This incident has been resolved.

monitoring

Our provider has identified the root cause of the issue and is testing solutions.

monitoring

We have observed an issue with our CDN provider that's causing images to fail to load. This appears to be random and sometimes the images load properly. We will continue to monitor the situation and work with our provider.

Report: "Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified, and a resolution is being put in place.

investigating

We are currently experiencing an issue that is impacting background processes. You may see a delay in background administrative tasks such as bulk actions, group enrollments, and other areas.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

The Northpass application experienced an outage. The issue was identified and has been fixed. We are closely monitoring the situation.

Report: "Analytics Dashboards failing to update"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue. We will update you on our findings shortly.

investigating

We have identified that only Looker-based dashboards are impacted. We are still looking into this issue, and will update on our status shortly.

investigating

We have received reports that some analytics dashboards are failing to update. We are currently investigating this issue, and will update you on our findings shortly.

Report: "High Error rates"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating the high error rates occurring on our application. We will update you shortly.

Report: "Webhook Delays"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.

Report: "Failed learner_completed_events webhooks"

Last update
resolved

All affected learner_created webhooks have been re-transmitted successfully.

monitoring

We've discovered that the learner_created_events webhooks were silently failing. The underlying cause has been addressed, and all webhooks are operating as intended. For all missed webhooks, the affected data is being re-transmitted to customer systems. We are continuing to monitor the situation.
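The update above describes webhooks that failed silently and were later re-transmitted. The report does not explain how either was done. As a rough sketch under that caveat, one way to keep failures visible is to record every delivery outcome and replay only the failed events. `DeliveryLog`, `deliver`, `replay_failed`, and the `send` callable are hypothetical names; `send` is assumed to return an HTTP status code or raise on a transport error.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryLog:
    """Hypothetical record of webhook outcomes, so no failure goes unnoticed."""
    delivered: list = field(default_factory=list)
    failed: list = field(default_factory=list)  # (event, reason) pairs


def deliver(events, send, log):
    """Send each event and record failures instead of swallowing them."""
    for event in events:
        try:
            status = send(event)
        except Exception as exc:  # transport error: keep the event for replay
            log.failed.append((event, repr(exc)))
            continue
        if 200 <= status < 300:
            log.delivered.append(event)
        else:
            log.failed.append((event, f"HTTP {status}"))


def replay_failed(send, log):
    """Re-transmit every previously failed event once."""
    pending = [event for event, _reason in log.failed]
    log.failed.clear()
    deliver(pending, send, log)
```

An alert on a non-empty `log.failed` is what would turn a silent failure into a visible one. A production system would also need persistence and idempotency keys so that replays do not create duplicates on the receiving side.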

Report: "Webhook Delays"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.

Report: "Webhook Delays"

Last update
resolved

This incident has been resolved.

monitoring

Our operations team has identified the issue and has made adjustments that appear to be improving the situation. The events are flowing through, but we expect it will be some time before the delays are fully cleared.

investigating

We are investigating increased lag times in our eventing system. This is impacting webhooks and other services that rely on these events. Learners are not impacted.

Report: "Webhooks are not being sent to certain customers"

Last update
resolved

The issue with webhooks has been resolved.

monitoring

A fix has been implemented. We are currently monitoring the situation.

identified

The issue has been identified. We are implementing a fix to resolve it.

investigating

We are currently investigating an issue that is causing some customers not to receive webhooks.

Report: "Application Performance Issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating application performance issues.

Report: "Webhooks not sending"

Last update
resolved

Webhooks are fully operational.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating an issue with course completed webhooks sending.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

Our infrastructure was able to auto-heal through the issue, but we are still monitoring the situation. We are operational at the moment.

investigating

The Northpass application is currently offline. Our engineers are actively investigating the issue.

Report: "Application Performance Issues"

Last update
resolved

The incident has been resolved.

monitoring

We addressed the problem and are monitoring the situation.

identified

The issue has been identified and a fix is being implemented.

investigating

Northpass is currently experiencing performance issues. You may notice the application working slower than usual.

Report: "Publishing Issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating an issue with Course Publishing that incorrectly shows the status of a published course. This only affects the administrator's experience. Learners will see the most recent published content.

Report: "Issues with Document Activities and Preview"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are now monitoring the results.

identified

We are aware of an issue that is impacting document-based activities and preview functionality. We know what the issue is and will have a solution in place shortly.

Report: "Increased Error Rate"

Last update
resolved

We have not seen errors for quite some time. This is resolved.

monitoring

We observed an increased error rate for about 20 minutes. The errors occurred due to failures connecting to our CDN, Amazon CloudFront. You may have noticed issues with SCORM based activities. The error rate is now within an acceptable threshold and we are actively monitoring the situation.

Report: "Academy Errors"

Last update
resolved

Academy sites experienced errors that resulted in a "Whoops, sorry about that." error message. The issue was due to a feature flag being prematurely turned on for all academies. The issue is resolved.

Report: "Application Outage"

Last update
resolved

At 1:37 PM EDT, there was an outage of the Northpass application. Our Operations team was notified and restored services by 1:57 PM EDT. We are investigating the root cause of the issue and will remediate accordingly.

Report: "Degraded App Performance"

Last update
resolved

The application's performance is stable.

monitoring

The health of our cluster is improving and performance is improving as well. We will continue to monitor the situation.

investigating

Our application is experiencing degraded performance and we are actively looking into the issue.

Report: "Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

We've identified the issue and rectified it. We are now monitoring the application.

investigating

We are currently investigating this issue.

Report: "Unexpected Application Slowdown"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We're currently experiencing an unexpected application slowdown. We're looking into possible solutions and will update our systems as soon as possible.

Report: "Northpass platform not available for some users"

Last update
resolved

This incident is now resolved.

identified

The issue has been identified and we are implementing the fix now. The issue is only affecting a subset of users.

investigating

We are aware that the Northpass platform is not reachable. We are investigating the issue.

Report: "Learning content inaccessible"

Last update
postmortem

# Current Status

Operational

# Summary

Yesterday morning (EST), we deployed a platform enhancement meant to improve security and the process of customizing a school's look and feel. Specifically, we wanted to ensure customers have easy access to the latest jQuery libraries without having to directly reference the library in their templates. However, this caused issues for customers using our Learning Experience v2 (LXv2) who referenced jQuery 2.x in their school’s customization code.

# Impact

Any customer utilizing our LXv2 who referenced jQuery 2.x in their school’s customization code experienced an issue where learners were unable to access course content.

# Root Cause

Prior to version 3.x, the jQuery library did not include one of the methods that our LXv2 relies on. This resulted in our LXv2 throwing a JS error, which in turn blocked any learners from accessing course content.

# Trigger

We regularly and frequently deploy to our production instance, so this error was introduced in our deployment that occurred at 12:20 UTC or 8:20am EST.

# Resolution

We reverted our latest deployment to return the application to stability at 14:50 UTC or 10:50am EST. We also plan to put additional logging and monitoring in place to detect these types of issues earlier.

# Corrective Actions

We’re putting additional logging and monitoring solutions in place to automatically detect JavaScript issues like this ahead of time. As we’ve reverted our deployment to resolve this issue, we will refrain from deploying these jQuery-specific changes for the time being (until this is closely monitored).
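The root cause in this postmortem is customization code that pinned jQuery 2.x while LXv2 needed a method only available from 3.x. The postmortem does not name the method or describe how templates reference the library. Purely as a sketch, a pre-deployment check could scan customization templates for jQuery references with a major version below 3. The URL patterns matched below and the function names are assumptions, not details from the postmortem.

```python
import re

# Matches references such as "jquery-2.2.4.min.js" or "jquery/3.6.0/".
# Assumed patterns: real templates may reference jQuery in other ways.
JQUERY_REF = re.compile(r"jquery[-.@/]v?(\d+)\.(\d+)(?:\.(\d+))?", re.IGNORECASE)


def jquery_majors(template_source):
    """Return the sorted set of jQuery major versions referenced in a template."""
    return sorted({int(m.group(1)) for m in JQUERY_REF.finditer(template_source)})


def needs_review(template_source, minimum_major=3):
    """True if the template pins a jQuery major version below `minimum_major`."""
    return any(major < minimum_major for major in jquery_majors(template_source))
```

Running `needs_review` over each school's customization code before a rollout would flag the academies most likely to break. It only detects explicit version pins, so it complements the runtime JavaScript error monitoring listed under Corrective Actions and does not replace it.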

resolved

Course content was inaccessible to all learners for certain Northpass customers. The issue lasted for about 2 hours starting at 8:30AM EST.

Report: "Major outage: Northpass application not reachable"

Last update
resolved

This incident has been resolved.

investigating

We are aware that the Northpass application is not available. We are currently investigating this issue.

Report: "Intermittent issues with accessing the Northpass Application"

Last update
resolved

This incident has been resolved.

investigating

Northpass users may have been affected by an issue that made the Northpass application unavailable. The issue has been resolved and we are now working on identifying its cause.