Historical record of incidents for Onfleet
Report: "iOS Provisioning errors"
Last update: There is currently an issue blocking initial logins with new devices on the iOS app. Most devices that have previously logged in will not be affected. The team is investigating this now.
Report: "ETA calculation errors"
Last update: The ETA calculation issue was traced to a configuration problem, which has now been identified and resolved. We will be improving our monitoring and response process to handle similar issues more quickly in the future.
We are experiencing errors related to ETA calculations
Report: "Webhooks not firing for image upload"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently experiencing an issue where webhooks are not firing when an image is uploaded. A fix has been found, and we are in the process of testing and deploying it.
Report: "Elevated Response Times"
Last update: System performance has improved significantly, and most, if not all, functions should be operating without excessive delay. Further adjustments to database queries are in progress to prevent this issue in the future.
The team has identified some changes to database queries to alleviate this issue, and will be publishing these changes as soon as possible.
Monitoring indicates increased response times overall. Our infrastructure teams are investigating the issue.
Report: "Increased API and Dashboard response times"
Last update: The increase in system latency was caused by a query which was designed to depend on a new database index, but the index had not been created in production at the time of the deployment. We are reviewing our deployment practices to add safeguards for this type of issue in the future.
We briefly experienced partial service outage and increased response times following an application update. All systems are now operating normally. We apologize for any inconvenience caused during this incident.
Report: "Chat service unavailable"
Last update: This incident has been resolved.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
The provider has confirmed the issue and identified a root cause. A fix should be available shortly.
Chat is currently unavailable due to an issue with an upstream provider.
Report: "Batch Task Creation Jobs Failing"
Last update: Between 11:45 a.m. and 12:20 p.m. PST, a batch request began to loop unexpectedly; the repeated requests interfered with each other and caused container contention. Because of how batch creations are processed, this prevented other jobs from being processed until the looping job was cleared manually. Once the job was manually stopped, the system returned to normal batch processing. The team will implement enhanced monitoring to avoid this scenario in the future.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
At this time the internal process for managing task creation queues for the asynchronous batch task creation endpoint is experiencing issues. This is being investigated with high urgency.
Report: "Route Optimization"
Last update: This incident has been resolved.
We have detected a slight decrease in errors for Route Optimization.
The upstream provider has confirmed the issue and is currently investigating the problem.
Route Optimizations are currently running very slowly or failing. Initial signs point to an issue with an upstream provider.
Report: "Pages for courier clients unavailable due to deployment issue"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Onfleet integrations subsystem experiencing issues"
Last update: Our DevOps team found a glitch within our AWS infrastructure that required a manual override. All systems are operational.
The Onfleet integrations subsystem is currently experiencing issues, blocking creation of manifests as well as some other third-party integrations.
Report: "Route Optimization issues."
Last update: Route optimization was briefly unavailable due to an outage at a partner provider. Their issues have now been resolved, and monitoring indicates that all features are once again available.
Report: "Brief Outage Due to System Restart"
Last update: We experienced a brief 1-minute outage due to a system restart around 2:40 Pacific. We apologize for any inconvenience caused. Services were fully restored shortly afterward.
Report: "Login failures."
Last update: The incident has been resolved.
A misconfigured firewall rule briefly caused login failures. The rules have been adjusted, and this should now be resolved.
Report: "Increased API and Dashboard response times"
Last update: We briefly experienced partial service outage and increased response times following a planned infrastructure update. All systems are now operating normally. We apologize for any inconvenience caused during this incident.
Report: "Service Unavailability in Australia and Southeast Asia."
Last update: Impacted APAC customers have confirmed that they can now access the Onfleet application without any issues.
While monitoring indicates that Onfleet systems are running normally, we are receiving reports of service unavailability affecting users in Australia and Southeast Asia. Initial assessments suggest this may be due to a network issue with ISPs in this region. We apologize for the inconvenience and will provide updates once we have more information. Thank you for your patience.
Report: "Dashboard search is not working on tasks created after Aug 18"
Last update: This incident has been resolved. We apologize for the disruption and appreciate your understanding.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Elevated Error Rates in Image Uploads"
Last update: We experienced elevated error rates in image uploads following our recent release and deployment. This occurred on August 15th between 5:39 PM and 6:43 PM Pacific. Our team identified the root cause and rolled back the release. Image uploads are now functioning normally. We apologize for the disruption and appreciate your understanding.
Report: "Service instability and outage."
Last update: A package upgrade was required to address a vulnerability, which in turn required updates to other packages. During deployment, some services did not restart correctly because of a usage change in a newer container version, which requires the container service to be restarted after upgrades. This was the root cause: the container service was not restarted, and all containers were stopped. Our operating procedures have been updated to reflect the proper usage pattern for newer versions.
Report: "Dispatchers aren’t able to view driver details"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Drivers locations not updating"
Last update: This incident has been resolved.
Our team has successfully identified and reverted the changes that caused the disruption. Services are being restored to full functionality.
We have received reports of driver locations not updating on the Map view. We are currently investigating the location service for driver locations and will keep updating as we learn more.
Report: "Service unavailable"
Last update: This incident has been resolved.
The dashboard rollback has completed and we are verifying the fix.
The team has confirmed an issue with inconsistent sidebar loading in the recent release and is initiating a rollback.
Report: "Issues with SMS deliverability for some customers"
Last update: Some customers experienced issues with SMS task notifications in which those SMS messages were not delivered to end recipients. On investigation, we found that the issues were caused by a recent change to our production telephony service. As soon as we discovered the issue, we rolled back the change and verified that normal service and SMS deliverability had resumed.
Report: "Network Connectivity in California"
Last update: Per AWS, between 11:43 AM and 1:30 PM PST, some customers in California may have experienced connectivity issues between their ISPs and AWS destinations. Onfleet was able to reproduce the issue from San Francisco on Comcast connections but not on AT&T connections.
AWS reports connectivity issues for some customers in California. Onfleet has received customer reports of problems connecting to the service.
Report: "Issues with Task Exports"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently seeing issues with task export emails being sent out.
Report: "Service Degraded"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
An error in code led to an unintended functionality change. The problem release is being rolled back.
Report: "Service Degraded"
Last update: Between 7:25 PM January 4th and 1:43 AM January 5th, tasks appeared out of order in the dashboard side panel. The issue has been resolved.
Report: "System Unavailable"
Last update: All changes were reverted, and the service is now fully operational.
Our team has successfully identified and reverted the changes that caused the disruption. Services are being restored to full functionality.
We are currently addressing an unexpected disruption that occurred during the deployment/upgrade process. Our team is actively working to resolve the issue.
Report: "Telephony - US shared toll-free number degradation"
Last update: This incident has been resolved.
We have worked with our telephony provider to resolve the issue and are now monitoring traffic from the shared high throughput toll-free phone number.
We are investigating issues related to the shared high throughput US toll-free phone number used for notifications and call anonymization that is affecting some customers.
Report: "System unavailable"
Last update: Our API and Dashboard experienced intermittent errors and elevated response times between 1:01 AM and 1:08 AM PDT. We restarted all affected servers, and API requests resumed processing normally.
Report: "Service degraded"
Last update: Our dashboard experienced intermittent errors between 4:32 PM PDT and 4:48 PM PDT. Some users may have experienced issues loading the dashboard and tracking pages.
Report: "System unavailable"
Last update: Our API and Dashboard experienced intermittent errors and elevated response times between 1:33 PM and 1:46 PM PDT. We restarted all affected servers, and API requests resumed processing normally.
Report: "Service degraded"
Last update: One of the web proxy instances failed, causing transient errors for API and dashboard services from 1:57 PM PDT to 2:04 PM PDT.
Report: "System unavailable"
Last update: Our API experienced intermittent errors and elevated response times between 03:14 and 03:30 PDT. We restarted all affected servers, and API requests resumed processing normally.
Report: "Issues performing imports"
Last update: The incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating issues experienced by some users in performing imports in the dashboard.
Report: "System Unavailable"
Last update: We identified elevated response times on our API between 10:00 PM and 10:12 PM PDT. We restarted all affected servers, and API requests resumed processing normally.
Report: "System Unavailable"
Last update: The service was unavailable for approximately 7 minutes due to our database provider losing an instance which hosted multiple databases.
The service is unavailable / inaccessible.
Report: "Email Delivery Issues Affecting Exports and Other Transactions"
Last update: This incident has been resolved.
Exports from the dashboard are starting to deliver normally. Continuing to monitor until the transactional email provider has declared all clear.
The transactional email provider has confirmed the issue and is investigating it on their end.
We are investigating issues with email delivery, which appear to stem from our transactional email provider.
Report: "Geo-coding errors preventing task creation in certain regions"
Last update: Around 2023-03-03 06:00 UTC, a strict address validation check was introduced to task creation, which disrupted API workflows for a range of customers. After noticing a spike in user errors, we rolled back this change.
Errors in geo-coding temporarily prevented tasks from being created, especially in regions of the world with inaccurate or incomplete mapping data.
Report: "Unavailable Driver Location Data"
Last update: At 2023-01-27 9:45 UTC, a failover event occurred due to a transient networking issue for a database cluster related to location storage. This event lasted for around a minute. As a result, several processes related to persisting location data entered a state which prevented them from persisting locations. Due to a lack of effective monitoring, we were not immediately aware of this issue. The root issue was resolved at 2023-01-30 22:11 UTC, soon after we became aware of it. However, as a result of this issue, complete location information for about 20% of tasks during this time period was not collected. We subsequently backfilled derived location information when appropriate, providing distance information for about 75% of these tasks by 2023-01-31 19:07 UTC. We understand how crucial location information is and apologize for any impact this incident may have caused. We have since improved our monitoring so a similar incident would have immediately paged an on-call engineer.
We are currently investigating this issue.
Report: "Route Optimization / Auto-dispatch experiencing increased failure rates."
Last update: Optimization services are fully operational.
We no longer see elevated failure rates from the route optimization provider. We will continue to monitor the situation.
We have detected an elevated failure rate from our route optimization provider and are in contact with their support team. They are working on the issue but do not have an ETA.
Report: "Errors sending email"
Last update: This incident has been resolved.
Mailchimp Transactional has reported that the issue is resolved. Onfleet is monitoring the recovery.
Mailchimp Transactional error rates are decreasing.
Mailchimp Transactional is continuing their investigation to resolve connection issues.
Onfleet has identified an upstream issue with our Mailchimp email provider. Mailchimp is working to resolve the issue.
We are currently investigating the issue.
Report: "Increased API error rate"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Export processing delayed"
Last update: At 2022-11-01 6:50 UTC, an export error occurred that caused the corresponding queue worker to stop processing other exports. The same error had previously gone unnoticed and had caused other export queue workers to fail in the same way. As a result, no further exports could be processed until the error was resolved. Once we became aware of the issue, we resolved it by 2022-11-01 10:52 UTC. All of the previously deferred exports were processed by 2022-11-01 11:07 UTC. We apologize for any interruption this deferred processing may have caused. We are validating a fix for the root cause of this issue, which will be released when ready. We will improve our monitoring procedures to ensure that we respond more quickly to any similar situations in the future.
This incident has been resolved.
An issue with exports caused processing to be delayed. Previously delayed exports have now been processed. We are currently monitoring.
Report: "Increased error rates for routing functions"
Last update: Routing has been stable since 8:42 AM PDT.
The routing functions are working correctly. We're going to continue monitoring and stay in touch with the upstream provider.
We are working with an upstream provider to determine the cause of this issue.
Report: "Increased error rates for routing functions"
Last update: Routing has been stable since 19:20 PDT.
We are working with an upstream provider to determine the cause of this issue.
Report: "API & analytics unavailable"
Last update: On 2022-09-01 03:06 UTC, we were notified that some users were experiencing issues accessing certain features through the dashboard and API. A database update earlier in the day resulted in some documents being modified incorrectly, which impacted feature access for the associated users. We performed a fix at 2022-09-01 04:43 UTC that resolved the issue for most users. We were notified again at 2022-09-01 14:08 UTC that some users continued to experience issues with the API. We determined that this was due to unexpected behavior in our caching system and performed a full cache refresh at 2022-09-01 15:50 UTC. This fully resolved the issue for all affected users. We are enacting procedural changes to prevent similar database errors in the future and to resolve related incidents more quickly. We are also implementing changes to our caching system to fix the issue experienced and to detect such issues more quickly.
This incident has been resolved.
We have applied a fix and are watching to make sure access to the API and analytics has been restored.
We discovered an issue after a recent database update. This issue affects some customers and prevents their use of our API and analytics, as well as other functionality. We are working on a fix currently.
Report: "Increased error rates for routing functions"
Last update: The routing services are fully operational.
Routing services are functioning normally. We are monitoring the service status.
Our VRP provider has identified the problem and is working on resolving the issue.
We have escalated this issue to the affected provider and are working with them to identify & correct it.
Report: "Dashboard error when editing drivers"
Last update: On Aug 29 at 17:18 PDT (Aug 30 00:18 UTC), we deployed a change to support upcoming improvements in our backend systems. This change introduced a bug in the way data for driver addresses was being sent to the dashboard. We investigated the issue and determined that rolling back this change was necessary. The rollback was completed at 21:29 PDT (04:29 UTC), and we monitored for some time to make sure all dashboard functionality had been restored.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are looking into reports of an error when editing drivers in the dashboard.
Report: "API Instability"
Last update: Beginning at 11:15 PDT on September 2, 2022, Onfleet services were unstable for approximately 3 minutes. This outage was due to a server under maintenance that stopped responding unexpectedly. The maintenance was rolled back, and service was restored to normal.
Report: "System unavailable"
Last update: Our API was responding slowly between 02:08 and 02:37 PDT. We have restarted all affected servers, and API requests are now being processed normally.
Report: "Issues with US SMS deliverability for some numbers"
Last update: This incident has been resolved.
We are continuing to work with our provider to restore SMS traffic on this number. If these efforts do not yield any results, we will start working on provisioning a new number.
We're working with our communications provider to restore message deliverability.