Historical record of incidents for Civitas Learning
Report: "Delayed Data Freshness"
Last update: On Tuesday, April 22nd, we began to notice our monitors indicating a number of workflow failures. Initially our logs pointed to DNS failures with our cloud provider. Upon troubleshooting the root cause, we identified an unhealthy node in one of our clusters that was still accepting traffic but unable to properly process data since it was in a failure state. Remediation was to manually remove the unhealthy node from the cluster, at which point the issue was resolved.
A fix has been implemented and we are monitoring the results. Data freshness is starting to come back within the 24-48 hour threshold.
We are continuing to work on a fix for this issue.
Some additional issues have been identified and we are working to resolve them as quickly as possible.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being developed.
Report: "Degraded Performance in Degree Map"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: ""Send to Shopping Cart Unsuccessful" Error message"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Students are receiving an error when attempting to send more than two courses to the cart when registering. We're currently investigating the issue.
Report: "Degraded Admin Login Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Workflows timing out"
Last update: Workflows are performing as expected. The problem was identified as a fault with a load balancer. The issue has been resolved and we've added additional safety measures to capture issues of this type in the future.
A fix has been implemented and we are monitoring the results. We will provide another update tomorrow after workflows complete their normal processing.
Fixes put in place have resulted in successful workflows for the majority of the extracts and builds. There are a few builds that are still in process or failed, and we are continuing our troubleshooting efforts to ensure all builds are successful. We will provide an update this afternoon on the status of the few builds that have not succeeded.
We are continuing to work on a fix for this issue.
We have partially resolved the issue and are continuing to work towards full resolution. We will provide another update once we have more details.
Our team continues to work on a fix. We will provide more updates when they become available.
We have identified an issue and are currently working on a fix. We will provide another update as soon as we have more details.
We are continuing to investigate this issue and will provide another update soon.
We are currently investigating an issue with workflows timing out. We will provide an update as soon as we have more details.
Report: "Calendar Appointments Not Synching in Advising"
Last update: Our team found a bug in a module that prevented files from creating events to an external calendar. The bug has been permanently fixed and this incident resolved.
We are currently investigating an issue with calendar appointments not synching in Advising.
Report: "Advising Degraded Performance"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "College Scheduler experiencing intermittent slowness"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
Report: "Workflow Updates"
Last update: This incident is resolved and we expect workflows to run at the next scheduled time.
We are continuing to investigate this issue.
We are aware of and currently investigating an issue affecting workflows that update Civitas applications.
Report: "Login Issues for Academic Planning"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Site Access Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Workflow extracts with Canvas are experiencing issues"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented. The issue should resolve overnight and we expect normal operation tomorrow.
We are continuing to investigate this issue.
We are currently investigating an issue with Canvas extracts not updating. We will provide another update as soon as more details are available.
Report: "Intermittent Site Access Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results. We are seeing successful logins at this point.
We are currently investigating an error with College Scheduler site access. Some users are receiving an intermittent 504 error: "The request could not be satisfied" and some users are experiencing overall site slowness.
Report: "Intermittent issue receiving Inspire emails"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix was identified and applied to resolve the issue. We're currently waiting for confirmation from those affected that the issue is resolved. We will provide another update once we receive confirmation.
We are currently investigating an issue with Inspire Emails. Some institutions are receiving the following error: Delivery has failed to these recipients or groups: noreply-inspire@civitaslearning.com (noreply-inspire@civitaslearning.com) There's a problem with the recipient's mailbox. Please try resending your message. If the problem continues, please contact your email admin. Notifications are still available within the application.
Report: "Workflow failures"
Last update: We monitored workflow activity for several days and all reported issues in this status page update are back to normal.
All workflow issues have been resolved. There are a couple of workflows that did not complete this morning and should complete with the normal run tomorrow morning. We will continue to monitor for any additional issues. We will send another update tomorrow.
We are continuing to monitor for any further issues.
A cursory check of the workflows this morning shows the majority successfully completed. We are still monitoring a few that are making progress and should complete later this morning. We will send another update as soon as we know the status of the running workflows.
We've identified a configuration change that should allow the workflows to successfully complete. Those changes are being implemented now and we will monitor the workflows closely over the next few days. We will provide another update tomorrow.
We are currently investigating an issue with AWS that is causing workflows to time out. Due to this issue, you may not see updated data in your Civitas applications. We are working diligently with AWS to resolve this quickly, and will provide you with regular updates here on the Status Page. At this time, you do not need to open a case or do anything on your end.
Report: "Unable to update Log Outreach"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating an issue where institutions are unable to fill out the log outreach form. We have received confirmation that outreaches are being sent; however, the form used to log details cannot be updated.
Report: "Intermittent Authentication Issues"
Last update: The third party Auth0 has resolved their issues and all systems are showing as operational again.
Auth0 is currently experiencing intermittent issues which prevent login to applications. Please stand by as we get more details from Auth0.
Report: "Authentication Issues"
Last update: This incident is resolved.
The third-party Auth0 reported all issues are resolved and systems are back to normal. We will monitor the Auth0 site for the remainder of the day before closing this out.
Auth0 is currently experiencing issues that are preventing login to applications. Please stand by as we get more details from Auth0.
Report: "College Scheduler is Down - 504 Gateway Timeout"
Last update: This incident is resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing issues with OAuth logging in to College Scheduler. We are working to quickly resolve the issue and will send another update as soon as more details are available.
Report: "College Scheduler is Down"
Last update: We are no longer seeing any issues and are marking this incident as resolved.
We've identified the issue and resolved it. We are currently monitoring the servers and will notify you if any additional issues arise.
We are currently investigating an issue with College Scheduler. We are aware that customers are getting a "503 - Service Unavailable..." error when logging in or navigating. We will provide another update on the status page as soon as we have more details.
Report: "Civitas Workflow Issues"
Last update: A solution has been identified and is being applied now. All workflows will process tonight and data will be refreshed. Have a great weekend!
We are aware of an issue with AWS that is causing workflows to fail, resulting in stale data. We have identified the issue and are currently working with AWS to resolve it. There is no need to open a ticket at this time. We will provide daily updates here on the status page. We apologize for the inconvenience and hope to have this resolved soon.
Report: "Authentication Issues"
Last update: Auth0 is reporting all systems are fully functional. We apologize for any inconvenience this may have caused.
Auth0 has implemented a fix and will be monitoring their systems. We will monitor Auth0 and report any issues if they arise.
Our vendor Auth0 is having issues which are affecting our ability to connect. Once they resolve the issue on their end, we will be back up and running. We will provide you with updates as more information is available.
We are currently investigating an infrastructure issue that may affect all applications. We will provide you with an update as soon as more details are available.
Report: "Sporadic slowness with Inspire Advising Notes"
Last update: All systems are operating at normal capacity now. We will continue monitoring the systems and do not expect any additional concerns related to this incident. We apologize for any inconvenience this may have caused.
We identified a resource issue and added a new node. All systems are back up and we will monitor the system for slowness. We will provide another update tomorrow.
We are currently investigating an issue with sporadic slowness when accessing and downloading Advising Notes. We will provide updates as more information is available.
Report: "Inspire Intermittent Outage"
Last update: We identified a database query that was slowing down the system on Wednesday. That query was tuned and performance returned to normal that day. We monitored system performance on Thursday and did not encounter any issues. We apologize for any inconvenience you experienced on Wednesday and do not expect any additional performance issues.
All systems appear to be running at 100%. We will continue monitoring overnight and provide an update tomorrow morning.
The issue has been identified and service is now about 80% restored. We will continue to work on this until service is fully restored. We will provide you with another update as more information is available.
We are investigating an intermittent slowness issue with Inspire when logging in and accessing certain pages. We will provide a status update as soon as possible.
Report: "Inspire Outage"
Last update: This incident has been resolved. We experienced a spike in database usage and upgraded our database instances to handle the increased load. We apologize for any problems this may have caused.
The issue has been identified and we are working on a fix.
We are currently investigating an issue with Inspire where users are unable to log in. We will provide a status update as soon as possible.
Report: "Increase 500 Errors on CollegeScheduler.com Preventing Login"
Last update: This issue has been fully resolved and the HTTP 500 Timeouts experienced previously (between approximately 8:00 am and 12:19 pm Pacific time) on login should no longer be occurring. We are continuing to closely monitor application health and to do a full post-mortem of the incident internally.
We have restored service to College Scheduler after a major outage on the AWS infrastructure took down a key system. Still, we are continuing to monitor the infrastructure as there remain reports of issues on the AWS Cloud. Systems are currently operating normally and we will be working to further analyze the impact of this outage to determine if any part of it could have been avoided.
We are continuing to investigate the issue and making efforts to recover.
We are currently investigating the impacts of a broader AWS outage on College Scheduler.
Report: "Increase Load Times/Errors When Logging In"
Last update: **What Happened**
At approximately 7:25am Pacific time on Nov 2, 2020, the College Scheduler application experienced a spike in traffic correlated to the opening of Spring 2021 Registration. The application handled the traffic well until a sharp increase in volume within a 20-30 second span resulted in timeouts and, subsequently, HTTP errors (specifically 502 and 504 errors) being displayed to student and staff users. This made College Scheduler slow to respond and, soon after, completely unavailable. Unfortunately, this lasted for approximately 6.5 hours as we worked to bring the system back online.
**Why Did It Fail and Why Did It Last So Long**
The service responsible for authenticating users into the College Scheduler application became unresponsive during this event. Newly added service instances (identical virtual machines) were immediately deemed unhealthy by automated health checks, blocking our efforts to remedy the problem. Meanwhile, whereas a spike in login activity typically subsides as valid login “tokens” are issued and then reused in subsequent requests, this event resulted in a sustained level of invalid (expired tokens) login activity that compounded the number of requests our authentication servers were handling. No amount of effort to add resources could cope with the volume of queued requests that needed to be processed.
**What Existing Process or Safeguards Failed**
College Scheduler is heavily load-tested using real world data and an exact replica of production. This system is designed for burst loads of traffic, which we encounter every day during registration periods and throughout the year (e.g. during freshman orientation events). In one example of such a burst, between 6:58am and 7:00am the application experienced an almost immediate 400% increase in traffic.
With nearly 300 customers, it would be very challenging to track anticipated registration / enrollment dates for each institution and schedule around these bursts. As such, our approach is to replicate these burst registration conditions using our loadtest process in a controlled environment and scale for these events automatically. These tests have led to dozens of learnings and resolved many issues previously encountered during normal use of College Scheduler. Still, our load tests are historically modeled on the assumption that login requests succeed and that subsequent activity is properly authenticated. The Nov 2 event, however, resulted in a large number of _failed_ login events, in very quick succession. This illustrated that the application, in this instance, was susceptible to prolonged wait times, which resulted in rapid thread exhaustion. This was compounded throughout the day as inbound traffic to College Scheduler remained heavy.
**Changes We Made to Fix**
Once it was clear that the scaling procedures in place were not effective, we put the application into maintenance mode to try and stem the tide of incoming requests to the service. The maintenance page was in place for 14 minutes, which allowed enough time for previously allocated service instances to become healthy. However, after the maintenance page was lifted, the application became unstable again within 18 minutes. At this time, a change was made to expand resources to approximately 7 times our highest anticipated level of capacity. After this service allocation, the system stabilized and remained healthy while traffic levels subsided.
**What We Are Doing to Correct the Root Cause**
The College Scheduler system is being enhanced to more appropriately scale to the extreme levels of student traffic we see in registration periods. Specifically, our system is being tuned to ensure a more rapid scaling speed combined with more proactive pre-allocation of resources leading up to registration season.
We have already replicated the Nov 2 issue in a controlled environment and are working through our autoscaling improvements now. In the interim, our infrastructure has remained purposefully over-allocated to greatly exceed the capacity needed to respond to these spikes as registration season continues. We deeply regret the downtime that occurred on Nov 2 and have been working diligently to understand the issue deeply and to further understand why our load testing and production systems were not tuned for this event. Rest assured we are striving to be better and to make sure that you can trust Civitas Learning with your next registration period.
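As a rough illustration of why prolonged wait times can exhaust a thread pool even when the request rate stays flat, the sketch below applies Little's law (average requests in flight = arrival rate × average time each request spends in the system). This is only a back-of-the-envelope model, not the College Scheduler implementation. The function names, request rate, latencies, and thread count are all hypothetical values chosen for illustration.

```python
def in_flight_requests(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's law: mean concurrent requests = arrival rate x mean time in system."""
    return arrival_rate_per_s * avg_latency_s


def pool_is_exhausted(arrival_rate_per_s: float,
                      avg_latency_s: float,
                      worker_threads: int) -> bool:
    """True when the average demand meets or exceeds the available worker threads."""
    return in_flight_requests(arrival_rate_per_s, avg_latency_s) >= worker_threads


# Hypothetical numbers, not taken from the incident:
RATE = 200        # login requests per second
THREADS = 400     # worker threads across all authentication instances

# Healthy case: each login completes quickly, so few threads are busy at once.
healthy = in_flight_requests(RATE, 0.25)

# Degraded case: the same request rate, but each failed login waits 10 s
# before timing out, so demand far exceeds the pool.
degraded = in_flight_requests(RATE, 10.0)

print(healthy, pool_is_exhausted(RATE, 0.25, THREADS))
print(degraded, pool_is_exhausted(RATE, 10.0, THREADS))
```

Under these assumed numbers the request rate never changes, yet the longer wait per request alone pushes demand well past the pool size, which is consistent with the pattern described above where adding instances could not keep up with queued requests.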
Systems are fully operational. We continue to analyze the event as well as monitor for any further degradation of performance or increased error rates.
We continue to see degraded system response times and intermittent 502 and 504 responses. The system is scaling to handle the additional traffic in an attempt to reduce response times. Latency and error rates remain above acceptable thresholds and we will continue to modify the infrastructure until these numbers improve.
We are continuing to monitor for any further issues.
The application remains unstable and we are continuing to monitor for any further issues.
The application is back up and all sites are loading. We are continuing to monitor for issues as more users come back into the application. We have implemented more detailed logging to further enable our diagnosis of the root cause for the earlier outage. A full post mortem of the incident will come later today or tomorrow after full analysis.
A fix is in place for the current issue and we are actively bringing the application out of maintenance.
We are continuing to investigate this issue.
We continue to diagnose and make changes to resolve this issue. The system is recovering, but we are still seeing a high volume of timeouts. We will update with more information as soon as we have it.
We are experiencing an increase in load times coinciding with gateway time-out errors on the College Scheduler system. The team is aware of and actively working to resolve the issue.
Report: "Degraded performance for some applications"
Last update: Earlier today we notified you that we were experiencing a delay with outbound emails sent from IFA, IFF, and Illume based on an issue related to our third-party email vendor. This issue is now resolved. All outbound emails were queued and are automatically being sent. We apologize for any inconvenience you experienced during this time.
We are continuing to investigate this issue.
We are experiencing a delay in outbound emails sent this morning from IFA, IFF and Illume based on an issue related to our third-party email vendor. We contacted their support team and they are working on the issue. While this is not impacting all emails, some emails sent between 10:45 pm CT last night and now have been queued and will be sent automatically once this issue is resolved. We apologize for any inconvenience and will reach back out today with an update.
Report: "Degree Map: Slow Loading"
Last update: We have confirmed that the issue is resolved and normal performance has been restored.
At 7:00 PM on August 26th the fix was in place. The capacity was increased on that service. We began monitoring.
At 06:15 PM Central on August 26th an issue was discovered.
Our engineering team is still investigating. At present, the issue seems to be related to a large volume of interactions with our database. The team is dedicated to resolving the issue and we are trying a number of different techniques to work around it. We sincerely apologize for the delays.
We are currently experiencing slow loading times for Degree Map. Our engineers are investigating the problem.
Report: "Increase Wait Times and Errors on Login"
Last update: This incident has been resolved.
Between 12:23pm and 12:41pm Pacific Daylight Time (PDT), College Scheduler experienced an increased error rate on a key piece of infrastructure responsible for logging users into the application. A critical service quickly became low on memory and caused increased wait times for the login process, ultimately resulting in timeouts and HTTP 500 errors. This issue was caught as it happened and system capacities were quickly increased to support the increased load, after which errors began to clear and the health of the system was restored within 18 minutes. We continue to monitor this and all systems during this period of increased registration activity for all of our customers.
Report: "Authentication System Errors Preventing Login to College Scheduler"
Last update: At approximately 9:29am Pacific time, a change to the College Scheduler infrastructure resulted in system errors during the student login process. This was immediately identified by system alarms and a rollback of the change was initiated. This rollback was successful and the system health was restored by 9:37am Pacific time. All systems are performing normally and our incident response process will be initiating a post-mortem later today.
Report: "Partial Disruption of Service"
Last update: This has now been resolved.
Auth0 has reported that they are monitoring the status. It looks like the problem has been resolved.
We are currently experiencing a partial disruption of service because of an outage with one of our authentication vendors. You may be experiencing this issue if you are using Auth0 for authentication.
Report: "Degree Map: Degree Audit Refresh intermittent errors."
Last update: This incident has been resolved.
We are continuing to investigate this issue.
Currently we are experiencing intermittent issues with refreshing the degree audit information within Degree Map. We are investigating the issue.
Report: "Progress Calculations in Explore Degrees"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We're investigating reports of issues with missing progress calculations in the Explore Degrees section of Degree Map.
Report: "College Scheduler - Partial Outage"
Last update: This incident has been resolved.
Issues with customers unable to access College Scheduler have been resolved but we are continuing to monitor the situation and make necessary changes as applicable as registration activity ramps up.
Issues with customers unable to access College Scheduler have been resolved but we are continuing to monitor the situation as registration activity ramps up.
We are experiencing issues with customers being unable to access College Scheduler or receiving timeouts. We are aware of the issue and are actively working to address the issue.
Report: "College Scheduler Login Service Degradation"
Last update: Between approximately 8:30am and 8:40am Pacific time, a key login service began to experience latency and eventually began rejecting requests, causing problems logging into the College Scheduler application. This was caused by an under-resourced authentication instance, which is used during the login process. The under-resourcing issue has been corrected and College Scheduler is now fully operational.
Report: "Degree Map student profiles failing to load"
Last update: Degree Map is undergoing technical improvements to provide a better planning experience. Alerts for prerequisites will temporarily be removed while we complete this work. Please reach out to your advisor if you have questions or concerns.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We've identified the root cause of loading errors in Degree Map. We're preparing a fix now.
We are experiencing sporadic errors in loading profile pages for students in Degree Map.
Report: "Degree Map student profile page load errors."
Last update: The load is working for most sites but users may experience some slowness.
We have identified the problem and are monitoring the issue. The system should be in recovery.
We are experiencing errors in loading profile pages for students in Degree Map.
Report: "Certificate Update"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
We've put a fix in place and we are pushing it out to all the sites.
We are currently experiencing an issue with an expired certificate. Our operations team is investigating the issue and working on a fix.
Report: "Intermittent authentication errors"
Last update: This incident has been resolved. We are continuing to monitor systems.
We are continuing to monitor for any further issues.
Some institutions that use SSO to access our products will experience difficulties logging in. We will continue to monitor and report back once the problem has been solved.
Report: "College Scheduler is experiencing intermittent performance degradation"
Last update: The issue has been resolved.
We have verified that the problem has been corrected and we are going to continue to monitor.
We have identified the issue and are rolling out the fix. It may take a short time for the fix to alleviate the issues in the application.
Currently College Scheduler is experiencing intermittent performance degradation. We are investigating the issue and will let you know when the status changes.
Report: "Partial outage for Degree Map"
Last update: The issue has been resolved.
We are continuing to monitor for any further issues.
We have tested the sites and the issue seems to be resolved. Please contact support if you continue to experience the issue.
We are continuing to work on a fix for this issue.
We've identified the root cause and are working to push out the fix. The issue should be resolved momentarily.
There is an issue with our infrastructure that is preventing us from finding students in the Degree Map application. Our engineering team is looking into the issue.
Report: "AdviseStream is considering a Production outage"
Last update: We have identified and resolved the cause of the production outage. We will continue to monitor the situation.
We are currently investigating the nature of the production outage affecting AdviseStream partners. We do not yet know the root cause of the outage.
Report: "Single Sign On issues."
Last update: We have confirmed that the issue has been resolved and that LTI SSO is now working.
We are continuing to monitor for any further issues.
We have just pushed out a change that seems to be having positive results. We will continue monitoring.
An update to the code has been developed and we will be pushing the change into the system shortly.
We've identified the root cause of the issue and are working on developing the solution.
We are currently experiencing issues with our Single Sign On for Inspire for Faculty users. We are in the process of investigating and as of yet no root cause has been identified.
Report: "Internal System is down"
Last update: One of our internal servers was down this morning for 38 min between 10:15 AM Central and 10:48 AM Central. This may have affected your ability to log in to the affected applications.
Report: "Degraded performance"
Last update: An internal DNS service failed, causing clients to fall back on secondary DNS servers. This prevented clients from caching DNS requests and thus added overhead to service calls. The primary DNS service has been restored. We're continuing to monitor the health and status of products and services.
We are currently experiencing degraded performance across our product suite. We are currently investigating the problems. Please continue to check here for updates.
Report: "Intermittent product authentication failures"
Last update: The underlying issues have been resolved and service has been restored. Our third party authentication provider experienced issues in their service that took longer to resolve than expected.
We are currently experiencing intermittent failures for product authentication. You may experience delays or restrictions on logging in. We are working to resolve the issue and will update this page as progress is made.
Report: "Temporary Unstable Performance"
Last update: The AWS services are back to normal so the intermittent failures in the Civitas Apps should now have recovered.
We've taken steps to re-route services to alleviate the problems on all but the Degree Map products.
The underlying Amazon Web Services (AWS) service used to serve Civitas application static assets (CSS, JS, HTML) is currently experiencing latency. Because of this, the apps (Illume, Inspire, Degree Map, etc.) may exhibit intermittent degraded performance. The team is aware and we're currently working on a fix.