Historical record of incidents for Losant
Report: "Blob Storage Incident"
Last update: We are currently investigating an issue with Blob Storage that began at approximately 1:58pm EDT. We will update with additional information as we determine the cause.
Report: "Request Failures and Workflow Delays"
Last update: Between 5:41 and 5:46am EDT there were delays in processing workflows as well as failures for API requests, Webhook requests, and Experience Endpoint requests. This was due to a critical machine failure, after which the automatic failover took longer than expected. By 5:47am Losant was fully operational, but we are investigating why the automatic failover took longer than expected to kick in.
Report: "VM failure leading to temporary errors and repeated closed broker connections"
Last update: At 10:17am EST a critical VM failed and automated failover to a secondary took longer than expected. This led to errors and broker connection issues between 10:17am and 10:21am. Errors subsided at 10:21am as a secondary took over the duties of the failed machine. The platform is now operating normally, and the failed machine has been replaced.
Report: "Experience Workflow Delays And Timeouts"
Last update: From about 3:25 until 3:40, experience workflows were delayed and some experience endpoint requests timed out. We have mitigated the issue with additional processing capacity and are looking into the root cause.
Report: "Experience Workflow Backlog"
Last update: This incident has been resolved.
The issue has been mitigated and requests should no longer be timing out, but we are continuing to investigate and are working on a longer-term fix.
We are currently investigating an issue causing delayed or timed out responses for experience workflows.
Report: "Experience Workflow Latency"
Last update: This incident has been resolved.
A configuration change has been applied that has mitigated the issue and we are monitoring current behavior.
We are currently investigating an issue with experience workflow latency that is causing experience page timeouts.
Report: "We are currently experiencing an outage. We are investigating it now."
Last update: This incident has been resolved.
We are working to identify the issue and implement a fix.
Report: "We are currently experiencing an outage. We are investigating it now."
Last update: We have resolved this issue.
A fix has been implemented and we are monitoring the results.
We are working to investigate and resolve this issue.
Report: "Partial outage of API / databases"
Last update: A longer-term mitigation is now in place, and the system has been operating normally.
A fix has been implemented for the outage and the issue has been mitigated. We will continue to monitor the situation as we put a longer-term resolution in place.
We are seeing a recurrence of issues where some users are experiencing slow read/write performance of application resources.
We have implemented a fix for the platform outage and the issue has been mitigated. We are continuing to monitor the situation as we put a longer-term resolution in place.
The issue has been identified and we are currently working on a fix. We expect to have the issue resolved shortly. We will continue to update this page with more information.
We are experiencing issues with our primary configuration storage database. Reading and writing device and workflow configuration is affected. We will provide an update shortly.
Report: "Broker Connections Unstable"
Last update: This incident has been resolved.
Connections are now stable and we are investigating the source of the issue.
We are currently investigating an issue where connections to the Losant MQTT broker are unstable.
Report: "Webhooks and Experience Endpoints are experiencing elevated 502 errors"
Last update: This incident has been resolved.
We have implemented a fix and are currently monitoring the situation to see if any further modifications are needed.
Report: "Cloud Provider Networking Issues"
Last update: Our cloud infrastructure provider has resolved their networking issues.
Our cloud infrastructure provider is currently experiencing intermittent networking issues with internet connectivity. We are currently monitoring the situation.
Report: "We are currently experiencing an outage. We are investigating it now."
Last update: The incident has been resolved.
The platform is now stable, but we are continuing to monitor the situation to ensure stability.
We are investigating this issue.
Report: "Higher than normal latency in the workflow queues"
Last update: The issue is resolved. We experienced disk latency issues at our infrastructure provider, so we moved processing to compensate.
We are currently investigating higher than normal latency in the workflow queue.
Report: "API Outage and Queue Backups"
Last update: This incident has been resolved.
We have identified that the issue is the result of an ongoing networking problem with our cloud provider's infrastructure and we are continuing to monitor.
We are currently investigating an issue with our queueing system.
Report: "Workflow and Data Ingestion Backups"
Last update: From 10:25 until 10:34 there were issues with our queueing system, and workflows and data recording were delayed. We are currently investigating to find the root cause, but Losant performance is now back to normal.
Report: "Database Master Failover Incident"
Last update: From 9:29pm EDT until 9:50pm EDT, due to an issue with our scheduled maintenance procedure, one of our database clusters was without a promotable master. This caused delays in workflows, data ingestion, and API response time. We will be conducting a post-mortem to identify the underlying cause(s) and possible mitigations/fixes for such events in the future.
Report: "Device state processing delayed"
Last update: This incident has been resolved.
From 2:58pm EST until 3:22pm EST, processing of device state messages was delayed. A mitigation has been rolled out and we are currently processing in real time again. We will continue to monitor for any issues.
Report: "Losant Infrastructure Issues"
Last update: This incident has been resolved.
Losant is now operational again at normal performance levels, but we are continuing to monitor closely, since our infrastructure provider still has underlying service degradation issues.
We have identified the issues and are rolling out new/additional infrastructure to work around our provider's issues. Losant is operational, but may be slower than normal.
We are currently investigating issues with our underlying infrastructure provider.
Report: "Partial system outage"
Last update: This incident has been resolved.
The platform is now operating normally again, and we are monitoring closely for any further issues.
The issue has been identified, and our team is working on a fix.
We are experiencing some issues with our REST API and Webhook connectivity. Our team is investigating the issue.
Report: "Particle Integration Issues"
Last update: This incident has been resolved.
We have implemented temporary mitigations; all Particle integrations are currently functioning properly and we are monitoring. We are currently reaching out to Particle for potential longer-term solutions.
Over the past 24 hours we have started consistently hitting rate limits on Particle's API, preventing the Particle integration from functioning properly. We are currently working to mitigate this issue, and reaching out to Particle to see if they can change our limits.
Report: "Workflow Timers Delayed"
Last update: As of 7:49am EDT, all workflow timers are now fully working again, and we are working on a fix to prevent timers from locking up in the future.
We are currently investigating an issue where some workflow timers are delayed or not firing.
Report: "Platform Downtime"
Last update: The issue has been resolved, and API uptime and workflow queues have returned to normal.
We are experiencing API downtime and workflow message queue backups. The team is investigating and should have the issue resolved shortly.
Report: "Losant API Access"
Last update: Service has been restored. Our team is investigating the root cause to put measures in place to prevent a similar outage from occurring in the future.
Access to the Losant API is currently down. The issue has been identified and service is being restored now.
Report: "Elevated Errors and Processing Delays"
Last update: From 1:45 EST until 2:15 EST, Losant saw elevated error rates and higher than normal data processing latency due to network partitions. Losant is now operating normally again.
Report: "Dashboard and Data API Errors"
Last update: Root cause of the elevated error rate was identified and a permanent fix has been rolled out.
We have mitigated the issue (error rate has returned to normal) and are investigating the root cause.
We are investigating an elevated error rate on our Dashboard and Data Query API endpoints.
Report: "MQTT Connection Cycling"
Last update: Losant experienced a network partition from 10:08pm EDT until 10:25pm EDT that caused excessive MQTT Broker connection cycling. There was also a small increase in workflow execution latency. The network partition has now been resolved, and all MQTT connections should now be stable again.
Report: "Intermittent DNS Issues"
Last update: From 11:30pm EDT on 5/6 until 10:30am EDT on 5/7, DNS resolution for Losant had intermittent failures. We have bypassed the offending DNS provider, and all DNS resolution is now working again.
Report: "Workflow Execution Delays"
Last update: The workflow delay has been identified and mitigated; workflows are now executing with normal latency.
We are seeing delays in executing workflows and are currently investigating.
Report: "Increased disconnect rates for MQTT clients"
Last update: The underlying flapping instance has been removed from the broker pool, and broker connections now appear stable.
We've received reports of increased disconnect rates for MQTT clients. We have verified the issue and have identified the underlying infrastructure components causing it. We are redeploying workers now to resolve the performance issues.
Report: "Workflow Run Latency"
Last update: We experienced workflow run latency between 18:22 EST and 18:55 EST due to extremely degraded performance of the AWS Lambda workflow node. Workflows are now fully recovered.
Report: "API and Broker service interruption"
Last update: All services are back to normal. The underlying cause was a broad loss of connection between availability zones at our underlying infrastructure provider. We will continue to investigate further mitigation strategies for events like this in the future.
A fix has been implemented and we are monitoring the results.
We have identified and are resolving an intermittent issue with our underlying queuing service due to network connectivity issues at our underlying infrastructure provider.
Report: "Workflow Timer Delays"
Last update: The issue with workflow timers not firing has now been resolved.
We have identified an issue preventing the triggering of workflow timers, and are currently working to roll out a fix.
Report: "Data Ingestion Interruption"
Last update: Data ingestion and processing was interrupted between 2:49am UTC and 3:01am UTC due to a queue network partition. Broker connections and workflow executions were adversely affected; API access remained operational.