Limble

Is Limble Down Right Now? Check if there is a current outage ongoing.

Limble is currently Operational

Last checked from Limble's official status page

Historical record of incidents for Limble

Report: "Some customers are reporting issues re-assigning tasks and other functionality"

Last update
investigating

We are currently investigating this issue.

Report: "Customers are reporting issues with load times"

Last update
resolved

This incident has been resolved.

monitoring

All functionality has been restored. We are continuing to monitor.

monitoring

We are continuing to investigate and monitor issues with the following:
- Creating and managing assets
- Managing Vendors
- Purchase order management
- Some graph widgets are failing to load

monitoring

A partial fix has been deployed. We are continuing to investigate issues with creating assets, reassigning tasks, and managing vendors.

monitoring

A partial fix has been deployed. We are continuing to investigate issues with creating assets and managing vendors.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Some customers reporting intermittent issues using the Limble application"

Last update
resolved

This incident has been resolved.

investigating

Some customers are intermittently experiencing slowness, errors, or pages failing to load on the Limble application

Report: "Loading Issues in Limble"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being applied.

investigating

We are continuing to investigate this issue.

investigating

We are investigating reports of loading issues across the app, slow connections and white screens.

Report: "Intermittent issues with new PMs and Cycle Counts launching at scheduled times"

Last update
postmortem

**Date:** March 12, 2025
**Status:** Resolved
**Impacted Region(s) or Services:** Limble CMMS Web Application
**Note:** All times are in Mountain Daylight Time (MDT)

## Summary

On March 12 at approximately 5 AM we identified that our backend service responsible for generating scheduled tasks had failed to initialize. Further investigation found that the scheduler had been offline since 5:30 PM the previous evening because a container image had unexpectedly expired. After initiating our incident response plan, engineers confirmed the cause to be an expired container image. Manual steps were taken to update the container image and restart the scheduled task. Further steps were taken to implement detailed monitoring to alert if another image expires in the future.

## Impact

Scheduled PMs and Cycle Counts experienced a temporary delay of approximately 14 hours, from 5:30 PM on March 11th to 9:00 AM on March 12th. The service was then successfully restored, and all scheduled tasks resumed normal operation.

## Root Cause

The scheduler container image failed to propagate during a deployment. Consequently, the scheduler attempted to run with an expired image, which resulted in a service startup failure.

## Resolution and Improvements

Engineers were quickly able to identify the root cause and its relation to the missing container image. A rollback of the deployment was initiated, which resolved the issue. To prevent future occurrences, the following improvements have been implemented:

* **Enhanced Monitoring:** We have deployed detailed monitoring systems specifically designed to detect and alert us to any future container image expirations, ensuring proactive intervention.

## Timeline of Events

* 3/12/2025 at 4:59 AM: Customer reports that scheduled items did not send as expected
* 7:30 AM: Incident is declared and engineering team is alerted
* 8:15 AM: Container image was updated
* 8:45 AM: Process was manually re-run
* 9:15 AM: All processes re-ran successfully

## Key Points

* No loss of customer data
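
To illustrate the kind of check the "Enhanced Monitoring" item describes, here is a minimal Python sketch that flags container images that have expired or will expire soon. It is not Limble's implementation: the image names, the expiry timestamps, the `images_needing_attention` helper, and the 7-day warning window are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def images_needing_attention(expirations, now, warn_within=timedelta(days=7)):
    """Return (image, status) pairs for images that are expired or expire soon.

    `expirations` maps a hypothetical image name to its expiry time.
    Images are reported in alphabetical order so alerts are stable.
    """
    findings = []
    for image, expires_at in sorted(expirations.items()):
        if expires_at <= now:
            findings.append((image, "expired"))
        elif expires_at - now <= warn_within:
            findings.append((image, "expiring_soon"))
    return findings


if __name__ == "__main__":
    # Illustrative data only; real expiry times would come from a registry.
    now = datetime(2025, 3, 11, 17, 30, tzinfo=timezone.utc)
    sample = {
        "scheduler": now - timedelta(minutes=1),
        "web": now + timedelta(days=3),
        "api": now + timedelta(days=30),
    }
    for image, status in images_needing_attention(sample, now):
        print(f"{image}: {status}")
```

Run on a schedule, a check like this would raise an alert before a service tries to start from an expired image rather than after a customer report.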

resolved

The incident has been resolved. A post-mortem will follow.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

Some customers are experiencing issues with new PMs and Cycle Counts launching at their scheduled times.

Report: "Some users are reporting issues with Logging on"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Service Outage"

Last update
resolved

This incident has been resolved.

identified

We have resolved the issue, and will continue monitoring.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Issues with creating Tasks, Parts, Assets"

Last update
postmortem

**Date:** November 12, 2024
**Status:** Resolved

## Summary

On November 12, 2024, the application experienced a rolling service disruption that intermittently impacted our customers when attempting to create new items in our application, such as tasks, assets, and parts. The incident was caused by a database migration deployed into production, which overloaded our databases and caused delays in propagating requests. Immediate action was taken to terminate the migration and restore service to customers. To prevent a recurrence, new deploy procedures and monitors are being implemented.

## Impact

For a period of approximately 3 hours, customers intermittently encountered delays or failures when attempting to create items, such as tasks, assets, and parts in our application. In some cases, the action appeared to fail, but the items were successfully created, appearing in the application after some delay.

## Root Cause

The incident was caused by a database migration which overloaded our production database. Overloading of the database directly led to an increase in ‘replication lag’. When this metric exceeded 1 second, our applications’ workflows began failing or timing out.

## Resolution and Improvements

Once discovered, the offending database migration was immediately terminated, restoring service to all customers. Next, that migration was corrected by our Engineers, thoroughly tested using improved protocols, and re-executed without a recurrence of service disruption.

Additionally, the following improvements will be implemented:

* Monitoring and Alerting Improvements
* Stricter requirements in testing of all migrations using a near-production sandbox database
* High-risk migrations will be executed during planned maintenance windows

### Timeline of Events

* 11:29 AM MST: Database migration initiated.
* 12:03 PM MST: Customers begin reporting disruptions.
* 12:30 PM MST: Investigation and communication initiated.
* 12:57 PM MST: Incident is declared.
* 2:15 PM MST: Root cause and solution identified.
* 2:22 PM MST: Solution implemented and verified in production.
* 2:37 PM MST: Incident resolved following further monitoring.

### Key Points

* No loss of our customers' historical data.
* Not all customers were impacted at the same time. This was a rolling disruption.
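
The 1-second replication-lag threshold in the root cause can be turned into a simple guard. The Python sketch below applies a migration in batches and stops as soon as measured lag exceeds the limit. This is an illustration under stated assumptions, not the procedure Limble adopted: `run_batches`, `apply_batch`, and `read_lag_seconds` are hypothetical names, and how lag is actually measured depends on the database in use.

```python
LAG_LIMIT_SECONDS = 1.0  # threshold named in the postmortem's root cause


def run_batches(batches, apply_batch, read_lag_seconds, lag_limit=LAG_LIMIT_SECONDS):
    """Apply batches in order, stopping once replication lag exceeds the limit.

    `apply_batch` and `read_lag_seconds` are caller-supplied callables
    (hypothetical here). Returns the number of batches applied.
    """
    applied = 0
    for batch in batches:
        # Check lag before each batch so an overloaded replica halts the run.
        if read_lag_seconds() > lag_limit:
            break
        apply_batch(batch)
        applied += 1
    return applied


if __name__ == "__main__":
    # Simulated lag readings; the third reading crosses the 1-second limit.
    readings = iter([0.1, 0.4, 1.5, 0.2])
    done = []
    count = run_batches([1, 2, 3, 4], done.append, lambda: next(readings))
    print(count, done)
```

Stopping early leaves the migration partially applied, so a real runner would also need to be resumable; that part is outside this sketch.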

resolved

This incident is now resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We have identified the issue and have taken steps toward a fix.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Some customers having issues logging in"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Some Customers unable to access the application"

Last update
resolved

Customers are reporting a blank screen when logging in.

Report: "Some customers not able to access the application"

Last update
resolved

Issue Resolved

Report: "We are aware of an issue that's causing some customers to not be able to access the application"

Last update
postmortem

**Date:** May 14, 2024
**Status:** Resolved

**Summary**

On May 14, 2024, our service experienced a downtime event due to multiple frontend versions being served concurrently. This incident was caused by a discrepancy in our build process that led to different JavaScript chunks being deployed despite using the same Git commit. Immediate actions were taken to redeploy the code and resolve the issue, and measures have been implemented to prevent recurrence.

**Impact**

The downtime affected all users attempting to access our site, resulting in an inability to load the app. This incident primarily impacted customers on the main version of the app, while those on Canary were unaffected.

**Root Causes**

The incident was triggered by a container restart that resulted in different JavaScript chunks being served. Our build process, though using the same Git commit, produced non-idempotent results, causing one of our webApp containers to serve incorrect chunks.

**Resolution and Improvements**

Immediate actions included redeploying the rollback branch, ensuring all containers served the correct chunks, and implementing a fix to skip build/push for the container image if the SHA tag already exists.

Additionally, the following improvements are planned:

* **Logging Enhancements**
* **Container Health Checks**
* **Immutable Tags**
* **Build Process Update**
* **Nginx Configuration**

**Description of Events**

* **6:04 AM MST:** Incident detected
* **6:15 AM MST:** Investigation and communication initiated.
* **6:46 AM MST:** Redeployment initiated using rollback branch.
* **6:51 AM MST:** Successful redeployment and app loading restored.
* **7:50 AM MST:** Incident resolved
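
The fix mentioned above, skipping build/push when an image tagged with the commit SHA already exists, can be sketched in a few lines of Python. This is a simplified model rather than Limble's pipeline: the registry is represented as an in-memory set of tags, and `publish_image` and `build_and_push` are hypothetical names.

```python
def publish_image(sha, registry_tags, build_and_push):
    """Build and push only if no image is already tagged with this commit SHA.

    `registry_tags` is a set standing in for the tags present in a registry;
    `build_and_push` is a caller-supplied callable (hypothetical here).
    """
    if sha in registry_tags:
        # An image for this commit already exists; reuse it instead of
        # rebuilding, since a rebuild may not produce identical output.
        return "skipped"
    build_and_push(sha)
    registry_tags.add(sha)
    return "built"


if __name__ == "__main__":
    tags = set()
    builds = []
    print(publish_image("abc123", tags, builds.append))  # first call builds
    print(publish_image("abc123", tags, builds.append))  # second call skips
```

Treating a SHA tag as write-once means every container started from that tag serves the same JavaScript chunks, even when the build itself is not reproducible.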

resolved

Services are recovering and we have identified the cause of the outage. We are working to implement a fix to prevent the same thing from happening in the future.

investigating

We are currently investigating this issue.

Report: "Users unable to access the limble applications"

Last update
postmortem

**Date**: April 26, 2024
**Status**: Resolved

# Summary

On April 26th, 2024, our service experienced a downtime event due to an SSL certificate failure, affecting most of our customers from approximately 2:20 AM to 3:00 AM MST. The issue stemmed from recent changes in the rules around SSL certificate auto-renewal by our SSL provider, which were not fully accounted for in our system. As a result, we have enhanced our alerting systems and updated our infrastructure configurations to prevent similar issues in the future.

# Impact

The downtime affected all users attempting to access our site without a cached version of the SSL certificate, resulting in about 40 minutes of service disruption.

# Root Causes

The primary cause of the downtime was a change in the auto-renewal rules for SSL certificates by our provider, which led to an unexpected, silent expiration. Although a fix was previously developed, it was not yet fully propagated across the entire Limble infrastructure.

# Resolution and Improvements

Immediate actions were taken to renew the SSL certificate and mitigate the situation. Following the incident, we fully integrated the auto-renewal fix and enhanced our monitoring and alerting capabilities to detect similar issues ahead of potential disruptions.

# Description of Events:

* **2:18 AM MST:** SSL certificate expired for [limblecmms.com](http://forlimblecmms.com/) and some of its subdomains.
* **2:20 AM MST:** Service disruption detected.
* **2:25 AM MST:** The on-call team was alerted to the issue.
* **2:33 AM MST:** Critical alert alarm raised.
* **2:44 AM MST:** Official incident declaration posted on our status page.
* **3:03 AM MST:** New SSL certificate procured and implemented; service restored.
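
A certificate that expires silently can be caught ahead of time by checking its `notAfter` date on a schedule. The Python sketch below uses only the standard library (`ssl`, `socket`). It is an assumed approach, not the alerting Limble deployed: the helper names and the 14-day warning window are hypothetical, and `fetch_not_after` needs network access and will raise a verification error if the certificate has already expired.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by getpeercert(),
    for example "May  1 00:00:00 2024 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def needs_renewal(not_after, now, warn_days=14.0):
    """True when the certificate expires within `warn_days` (assumed window)."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the notAfter field from a live server's certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    # Offline example with a made-up expiry date.
    now = datetime(2024, 4, 26, tzinfo=timezone.utc)
    print(needs_renewal("May  1 00:00:00 2024 GMT", now))
```

Alerting on the certificate itself, rather than on the renewal job, still fires when a provider changes its auto-renewal rules and the job quietly stops working.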

resolved

This incident has been resolved.

monitoring

We have implemented a fix to address this issue and are monitoring the results.

investigating

We are currently aware of an issue where Limble applications appear to be down. We are investigating.

Report: "We are aware of an issue that's causing some customers to not have elements / data in the app load"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Some customers may be experiencing issues accessing the CMMS application"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "System Unstable"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

Report: "System Unstable"

Last update
resolved

We are currently investigating this issue.

Report: "Localized service degradation in Denver (Western US)"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Gateway timeout errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing issues with pages not loading, and general app instability.

Report: "Scheduled tasks are not generating"

Last update
resolved

This incident has been resolved.

monitoring

We have applied a fix for this issue and are proceeding with a post-mortem analysis.

Report: "Partial Outage of 21cfr servers"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

Report: "Partial Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "We are investigating reports of a small percentage of requests to the Limble application failing causing data to not load"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Outbound Emails Not Sending"

Last update
resolved

Incident has been resolved.

monitoring

We have implemented a fix and are monitoring for errors.

identified

We are continuing to work with our service provider to fix this issue.

identified

We are continuing to work on a fix for this issue.

identified

We have identified an issue with one of our service providers that is making it so emails going out of Limble are not sending successfully.

Report: "We are experiencing an issue that is causing some people to not be able to login to the Limble Application"

Last update
resolved

This incident has been resolved.

monitoring

Issue seems to be resolved, continuing to monitor.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.

Report: "We are experiencing issues with our service provider (AWS). This is causing users to not be able to login, the app not load properly and other issues."

Last update
resolved

This incident has been resolved.