Thought Industries

Thought Industries is currently Operational

Last checked from Thought Industries's official status page

Historical record of incidents for Thought Industries

Report: "Content Issues"

Last update
identified

We are continuing to work on a fix for this issue.

identified

We have detected an issue with course content and test behavior; a fix is being worked on and will be deployed shortly.

Report: "Intermittent DNS Issue"

Last update
resolved

Between 10:00 AM and 4:30 PM PST, an intermittent internal DNS issue caused elevated error rates and increased load times across the platform. The issue has been resolved, and we will closely monitor the platform to ensure no further functionality is affected.

Report: "Service Disruption (US)"

Last update
resolved

Between 11:19 and 11:31 AM EDT, we experienced an intermittent service disruption. We are investigating the cause of the incident.

Report: "Service Disruption / DDoS (US)"

Last update
resolved

Between 6:02 and 6:12 AM EST, an ongoing external vulnerability scan / DDoS attack impacted the US platform and caused intermittent availability issues during that 10-minute window. Our internal team and automated infrastructure resolved the issue quickly, and the infrastructure team will continue to review and improve our systems to minimize service disruptions.

Report: "Possible Login Issues with SSO"

Last update
resolved

A recent release included a fix for metadata not being reflected on custom domains, but that fix caused cross-domain issues with Single Sign-On (SSO), resulting in login problems for a small subset of clients using SSO. Our team reverted the fix, and the issue is now resolved.

Report: "Elevated load times"

Last update
postmortem

Between 8:15 AM EDT and 11:00 AM EDT, the platform experienced significantly elevated response times in both the EU and US regions. The root cause was determined to be a routine security upgrade of an external dependency, which led to high CPU usage on our application servers. Although the platform auto-scaled in response to the increased load, response times did not improve as expected. Reverting the dependency upgrade immediately returned response times to expected levels.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating elevated load times in the US.

Report: "Rustici SCORM Outage (US)"

Last update
postmortem

Between 5:45 AM PDT and 7:45 AM PDT, the SCORM Rustici service experienced a minor increase in error rates. This was followed by a more severe outage between 7:45 AM PDT and 8:45 AM PDT, after which service was restored. The root cause was determined to be a misconfiguration in the internal load balancer, which routed general Rustici traffic to a single node and degraded performance once traffic exceeded a critical threshold. The infrastructure team applied a fix when the outage was resolved and confirmed that traffic is now correctly routed to all available nodes.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are investigating high load times for Rustici SCORM launches in the US.

Report: "Looker Instability (US)"

Last update
resolved

This issue has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Looker functionality has been intermittently unavailable for short periods of time and we are investigating the cause.

Report: "Reporting Outage (US)"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Reporting Delay (EU)"

Last update
resolved

The EU region experienced reporting delays due to a faulty pipeline worker. The infrastructure team has resolved the issue and is re-triggering reporting table builds to ensure all reporting is up-to-date.

Report: "Helium Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

A vendor has identified an internal bug in their workers, and we are working with them to resolve the issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Reporting Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating increased error rates in Looker-based reporting.

Report: "Rustici SCORM Outage"

Last update
resolved

Between 4:25 PM EDT and 5:15 PM EDT, the Rustici SCORM service experienced increased error rates, disrupting service for certain customers. The underlying issue has been diagnosed and a fix has been applied. While no further issues are expected, the platform team is closely monitoring the Rustici application to ensure continued service availability.

Report: "AWS Firehose Outage"

Last update
postmortem

Between 2024-07-30 21:45 UTC and 2024-07-31 04:10 UTC, AWS experienced an internal outage that impacted several delivery streams used for activity-related reporting on the Thought Industries platform. Despite the failure, posted data was successfully stored in S3 for later ingestion and was successfully re-processed once the outage ended, so there should be no lasting inaccuracies in reporting as a result of this incident.

resolved

AWS has resolved the outage. Accuracy of some activity-based reporting will be impacted for the active period of the outage.

monitoring

AWS is experiencing increased error rates on their Firehose service which impacts some specific activity tracking & reporting on the platform. We will be carefully monitoring the situation until resolution. See AWS's status page for details: https://health.aws.amazon.com/health/status

Report: "US Rustici Outage"

Last update
resolved

Between 6:30 AM PST and 8:00 AM PST, the US Rustici cluster experienced increased error rates, impacting the launch and progress of Rustici-hosted SCORM courses for certain users on the learner platform. Because the issue was intermittent, the initial alarms could not be validated, and the issue persisted until it was confirmed. Service was restored shortly thereafter.

Report: "EU Redshift"

Last update
resolved

On June 15, 2024, at 4:00 AM UTC, following an automated AWS update on the production EU reporting database cluster, the ETL experienced intermittent errors that delayed reporting results and/or made them inaccurate until the issue was fully resolved on June 17, 2024, at 9:00 AM UTC. No further impact should be present now that the issue is resolved, and the infrastructure team is focused on implementing and deploying fixes to ensure this issue does not recur.

Report: "EU Reporting"

Last update
resolved

On June 11, 2024, at 11:35 AM UTC, the EU reporting database experienced intermittent issues with data ingress, causing varying delays in up-to-date reporting for EU customers until 3:25 PM UTC the same day, when data flow was confirmed as restored. The root cause was determined to be a faulty state in the internal jobs system, which caused ETL jobs to hang without completing or exiting properly and blocked future job runs. No data inconsistencies should be present now that the issue is resolved, and our infrastructure team is implementing several solutions to improve data availability and to detect and escalate future issues more accurately.

Report: "We are investigating platform instability in the US region."

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "US Instability"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor as availability is restored.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are investigating platform instability in the US region.

Report: "Rustici Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified an outage on the Rustici platform and are working on restoring service.

Report: "Rustici Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have identified an outage on the Rustici platform and are working on restoring service.

Report: "Reporting Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Search Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We have identified an issue in our search functionality and are investigating the cause.

Report: "EU Region Read-Only Mode"

Last update
resolved

This incident has been resolved.

monitoring

The EU-region platform entered read-only mode during scheduled zero-downtime maintenance. We have detected and resolved the issue and are now monitoring.

Report: "Looker Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We've identified an outage affecting Looker-based reporting and are working to resolve the issue.

Report: "Rustici Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

We have identified an outage on the Rustici platform and are working on restoring service.

Report: "US Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "AWS Lambda Outage"

Last update
resolved

This incident has been resolved.

monitoring

Amazon Web Services is experiencing a region-wide outage that is impacting new certificate generation. AWS has already identified the issue, and we are awaiting resolution. Note: Certificates may still be granted, but viewing certificate PDFs is impacted for newly granted certificates and on sites with 'Always Regenerate Certificates' enabled.

Report: "Increased Latency"

Last update
resolved

This incident has been resolved.

investigating

We have detected increased latency for a subset of customers and are investigating the cause.

Report: "Rustici Cloudfront Intermittent Outage"

Last update
resolved

Between May 30, 2023 and May 31, 2023, we experienced intermittent errors because the application was unable to connect to third-party services, specifically Rustici. We were able to bypass the limitations and restore access. The services will continue to be monitored for any additional edge cases.

Report: "Identified issue with Assessment Engine [US]"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified an issue with our assessment engine that may prevent learners on a single cluster within the US region from completing assessments. We are urgently working to resolve the issue.

Report: "System Responsiveness Being Investigated"

Last update
resolved

The site slowness and degraded application responsiveness have been resolved. An RCA will be made available upon request within 7 business days. If you need this RCA, please submit a support ticket to request it.

investigating

We have identified an issue that is slowing the application's responsiveness. We are urgently working to resolve the issue.

Report: "504 Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "503 Server Errors"

Last update
resolved

This incident has been resolved. A full RCA will be available upon request within the next 7 days. Thank you for your patience, and we apologize for any inconvenience caused.

investigating

We're investigating reports of 503 Server Errors across all sites in the US and will provide updates as soon as we have them.

Report: "[US] Issue with All Sites Being Investigated"

Last update
resolved

We have resolved the 503 Server Errors and can confirm that full functionality has been restored. A full RCA will be available upon request within the next 7 days. Thank you for your patience, and we apologize for any inconvenience caused.

monitoring

Another fix has been implemented for the 503 Server Errors, and we're monitoring the results.

investigating

We are still seeing issues with US sites returning 503 errors. We are continuing to work on a fix for this issue.

monitoring

A fix has been implemented for the 503 errors, and we're monitoring the results. Full resolution is expected shortly.

identified

We have identified the issue and are working to resolve it as soon as possible.

investigating

We are continuing to investigate this issue.

investigating

We're investigating reports of 503 Server Errors across all sites in the US and will provide updates as soon as we have them.

Report: "SCORM Rustici Down for US Instances"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our engineering team has identified an issue with SCORM Rustici: it is currently not working on US instances. The team is actively investigating and working to resolve this as soon as possible.

Report: "Salesforce & BI Connector Sync Issues & Ecommerce Purchase Reporting Issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our engineering and development team has confirmed that all Salesforce and BI syncs have stopped running and is actively investigating. Additionally, eCommerce reporting is not currently populating. This affects only US-based instances. We apologize for the inconvenience and will keep you informed as our team works to resolve this as soon as possible.

Report: "Intermittent 500 / Not Found Errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

Our engineering and development team is seeing intermittent 500 / Not Found errors within the Thought Industries platform. The team is investigating and working to resolve them as soon as possible.

Report: "Issues with Logging In to Sites on US Platform"

Last update
resolved

On Monday, February 27th, the infrastructure team identified an issue with logging into sites for US clients, which has been resolved. The issue was caused by a maintenance release, which was rolled back. Timeline:
- US platform issues identified: 12:47 PM EST
- US platform issues resolved: 2:07 PM EST

Report: "US Platform Outage Identified & Resolved"

Last update
resolved

On Tuesday, 12/27, the infrastructure team identified a platform-wide outage for US clients, which was quickly resolved. Timeline:
- US platform outage identified: 7:35 PM EST
- US platform outage resolved: 7:52 PM EST

Report: "Issue with Manual Login"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We're investigating reports of issues with manual logins on all US instances and will provide updates as soon as we have them.

Report: "SCORM Tenant Error"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we're currently monitoring the results.

identified

We have identified the issue causing errors with SCORM files and are working to resolve it as soon as possible. We apologize for any inconvenience this may cause.

Report: "Isolated Database Cluster Issue / System Outage"

Last update
resolved

Summary
From February 5th through February 14th, an isolated database cluster experienced periodically increased query times, resulting in intermittent outages for the instances it hosted. A separate, half-hour platform-wide outage occurred during instance migration procedures on the 14th. Service was fully restored on February 14th, when the affected instance was isolated from other customers.

Timeline

Original Cluster
- 2022-02-05: 99.97% availability; first related outage event.
- 2022-02-06: 99.99% availability.
- 2022-02-07: 99.98% availability; further outage events begin.
- 2022-02-08: 99.86% availability.
- 2022-02-09: 99.78% availability.
- 2022-02-10: 99.94% availability.
- 2022-02-11: 99.86% availability.
- 2022-02-12: 99.92% availability.
- 2022-02-13: 99.71% availability; instances are migrated to a new cluster.
- 2022-02-14: 99.99% availability; brief outage; resolution achieved for this cluster.

New Cluster (created 2022-02-13)
- 2022-02-13: 99.99% availability; cluster is created and populated.
- 2022-02-14: 99.72% availability; an outage follows a migrated instance, and that instance is isolated.
- 2022-02-15: 99.99% availability; resolution achieved for this cluster.

Root Cause
The root cause of the primary outage was an individual instance's unique usage of the platform, which produced slow and blocking queries against the database cluster. These problematic queries triggered cascading slowdowns and caused outages for instances hosted on the isolated database. The secondary outage was caused by an incorrect migration configuration.

Action Items / Response
Thought Industries is dedicated to providing a reliable and available platform, and we are deeply aware of how an outage can affect our customers and their clients. We are determined to prevent this outage from recurring and have implemented the following action items:
- Affected database tables have been further optimized.
- The impacted instances have been distributed across several isolated clusters, and the primarily impacted instance now runs on its own isolated cluster. Further redistribution is planned to improve how instances are distributed. Migration procedures have been adjusted to prevent misconfiguration.
- Several projects are actively underway to improve platform performance and stability.

We apologize for any inconvenience caused by this outage. We will continue to closely monitor affected schools to confirm that resolution has been achieved, and we will keep working around the clock to ensure that the Thought Industries platform is reliable and available for everyone.

Report: "503 Errors"

Last update
resolved

Thought Industries experienced a brief period of 503 errors that impacted our customers' ability to access their sites. Our Infrastructure team released a fix, and the issue is now considered resolved. More information will be made available over the coming days.

Report: "Intermittent Site Slowness"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating slowness on one of our servers that may be affecting a small subset of customers. We apologize for any inconvenience this may cause and will continue to provide updates as soon as we have them.

Report: "Rustici (SCORM)"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We're investigating an issue with Rustici (SCORM) that has impacted some customers' ability to view SCORM content and will provide updates as soon as we have them.

Report: "Wistia outage impacting video uploads"

Last update
resolved

We have resolved the issue that was impacting video uploads and can confirm full functionality has now been restored.

monitoring

We've identified the source of this issue as an error with our video hosting provider Wistia. We are monitoring their status page (https://status.wistia.com/) and we hope to have restored functionality soon.

Report: "Reporting: Issues with loading Reports"

Last update
resolved

This incident has been resolved.

identified

Here is an update from Amazon Web Services (AWS) regarding our previously reported issue with reporting data not loading.

AWS Update: "We continue to work towards full recovery of Redshift clusters in the USE1-AZ4 Availability Zone. Complete recovery will likely be reliant on the full recovery for the EC2 / EBS issue being tracked on the Service Health Dashboard located here: https://status.aws.amazon.com/"

We will continue to provide updates as soon as we have them.

monitoring

We have received reports of an issue with reporting data not loading. This is the result of a current issue with our upstream provider, Amazon Web Services, and we are monitoring for an update. We will provide updates as soon as we have them.

Report: "Intermittent Site Slowness"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented for the issue causing site slowness, and we're monitoring the results.

investigating

We're investigating reports of site slowness and will provide updates as soon as we have them.

Report: "Intermittent Site Slowness"

Last update
resolved

We have resolved the issue regarding site slowness and can confirm full functionality has now been restored.

investigating

We're investigating reports of site slowness affecting a small number of our customers and will provide updates as soon as we have them.

Report: "Intermittent Site Slowness"

Last update
resolved

We have resolved the issue regarding site slowness and can confirm full functionality has now been restored.

monitoring

A fix has been implemented and we are now monitoring the results.

investigating

We're investigating reports of site slowness affecting a small number of our customers and will provide updates as soon as we have them.