Pronto

Is Pronto Down Right Now? Check if there is a current outage ongoing.

Pronto is currently Operational

Last checked from Pronto's official status page

Historical record of incidents for Pronto

Report: "Delayed notifications"

Last update
resolved

We have resolved the issue and notification times are back to normal.

investigating

We are currently experiencing longer than normal delays delivering new message notifications. We are actively investigating the source of the issue and will update as we learn more.

Report: "Delayed notifications"

Last update
investigating

We are currently experiencing longer than normal delays delivering new message notifications. We are actively investigating the source of the issue and will update as we learn more.

Report: "Intermittent issues with web app"

Last update
resolved

One of our hosting providers had intermittent problems resulting in errors in the Pronto web app. They have since identified and fixed the problem.

Report: "Database Maintenance"

Last update
resolved

We've wrapped up the loose ends on this database maintenance. Everything looks good and the Pronto platform is fully functional. Thanks for your patience.

identified

The database maintenance is taking longer than expected. Service may be unreliable until it is complete. We apologize for the disruption; our team is working as quickly as possible to restore full service.

Report: "Database issues"

Last update
resolved

The issue has been fully resolved. Thanks for your patience.

monitoring

A fix has been implemented and we are monitoring for stability.

investigating

There is a connectivity issue with our primary database. We are working urgently with our database vendor to understand the issue and get it fixed as soon as possible. We apologize for the issues and will send out another update as soon as we know more.

Report: "Pronto System Outage"

Last update
resolved

This incident has been resolved.

monitoring

The system has recovered and is currently operational. We will continue to monitor the database and work to identify the cause.

monitoring

We have cleared out the hung queries and performance seems to be back to normal. We will continue monitoring for the next little while to ensure that the problem is resolved.

investigating

Database queries have begun to hang for an unknown reason and have resulted in system downtime. We are investigating this right now with our database vendor and will update here as soon as we know more. We apologize for the downtime.

Report: "Real-time web sockets issue"

Last update
resolved

This incident has been resolved.

monitoring

Our provider has implemented a fix and we are monitoring the results.

investigating

We are again seeing increased error rates and problems connecting to our real-time messaging service provider. They are investigating the issue.

Report: "Real-time web sockets issue"

Last update
resolved

A fix has been implemented and we are now seeing normal connection rates. Thanks for your patience. As we gather more details about what happened we will share them in a post-mortem.

investigating

Our provider has made strides in fixing the issue but it is not completely resolved yet. We continue to see sporadic connection issues. These issues may be resolved temporarily by refreshing your web browser or by restarting your app (on mobile). We will continue to post updates as we know more.

investigating

Overall things are working better, but we are still seeing sporadic connection issues. We continue to work with our provider to identify the root cause. We apologize for the disruption today. We will continue to post updates as we get them.

investigating

We are again seeing increased error rates and problems connecting to our real-time messaging service provider. This issue seems to be identical to the one we experienced yesterday. Our service provider has acknowledged the issue and is working on a resolution. Thank you for your patience.

Report: "Pronto system outage"

Last update
resolved

Our service provider has marked this issue as resolved. Our systems look good and everything is functioning as normal. Thank you for your patience.

monitoring

Our service provider seems to have resolved the issue and Pronto is currently functional. We will continue to monitor the situation and await further updates from them until they have confirmed the solution.

identified

We are again experiencing the same issue as before, causing a service disruption to Pronto. We will continue to monitor the situation and update when we have further information.

monitoring

Our service provider seems to have resolved the issue and Pronto is currently functional. We will continue to monitor the situation and await further updates from them until they have confirmed the solution.

identified

Our real-time messaging service provider is currently experiencing a major system outage. This event is also affecting Pronto. They are aware of the issue and are currently working to resolve it as quickly as possible.

Report: "Pronto system outage"

Last update
resolved

We are confident that the system is now fully back to normal. We will be working with our database vendor further to understand how to avoid this situation during future migrations. Thank you for your patience.

monitoring

The issue appears to be that when the migration was performed some unexpected database locks caused certain queries to hang, blocking other queries from happening. We have cleared out the hung queries and performance seems to be back to normal. We will continue monitoring for the next little while to ensure that the problem is resolved.

investigating

After the team performed a routine database migration, queries began to hang for an unknown reason and have resulted in system downtime. We are investigating this right now with our database vendor and will update here as soon as we know more. We apologize for the downtime.

Report: "Slow performance and request failures"

Last update
resolved

We identified a rarely occurring slow database query that caused a cascading effect on the database. We have restored the production database performance and are in the process of testing a permanent fix.

investigating

We are currently investigating an issue where requests are failing. We will update this incident as we learn more.

Report: "Real-time web sockets issue"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

Performance seems to be improving. We are continuing to monitor and will post any further updates until the issue is resolved.

identified

The vendor has acknowledged there's an error on their side and they're working to resolve it.

investigating

Our websocket provider is currently experiencing an outage that is affecting sending, receiving, and loading messages. Customers may see prolonged progress spinners and message send errors. We are in contact with our vendor and will provide updates here as we get them.

Report: "Message send errors"

Last update
resolved

The vendor has updated their SSL cert and all is back to normal.

identified

The SSL certificate of one of our vendors expired. We are contacting them so they can update it.

investigating

Attempting to send a message is resulting in a server error. Meetings may also be affected. The team is investigating.
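
The expired vendor certificate behind the incident above is the kind of failure an expiry check can flag ahead of time. The following is a minimal sketch of such a check, not a description of Pronto's or the vendor's tooling; the function names and the 14-day warning threshold are assumptions chosen for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". `now` should be timezone-aware.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    return (expires_at - now.timestamp()) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of the certificate a server presents.

    Verification stays on, so an already-expired certificate raises
    ssl.SSLCertVerificationError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(not_after: str, now: datetime, warn_days: int = 14) -> bool:
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

A scheduled job could call `needs_renewal(fetch_not_after("vendor.example.com"), datetime.now(timezone.utc))`, where the hostname is a placeholder, and alert when it returns `True`.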

Report: "Failed web deployment"

Last update
resolved

The deployment has been successfully rolled back. We will be taking a look at what happened before re-deploying.

identified

We are currently performing a rollback to the previous version which will take a few minutes.

investigating

During an attempt to deploy new changes to the Pronto web app, there was a failure resulting in blank screens and loading errors. We are currently investigating. The mobile app and APIs are not affected.

Report: "Real-time web sockets issue"

Last update
resolved

This incident has been resolved.

monitoring

We are seeing some elevated error rates once again and are working with our vendor.

monitoring

Our websockets vendor has implemented a fix and we are monitoring the results. Right now the system appears to be back to normal.

investigating

We are continuing to investigate this issue.

investigating

Our websocket provider is currently experiencing an outage that is affecting sending, receiving, and loading messages. Customers may see prolonged progress spinners and message send errors. We are in contact with our vendor and will provide updates here as we get them.

Report: "Notification delay"

Last update
resolved

Since our last update we have been carefully monitoring notification delays and the numbers look great and have stayed that way for about 4 hours now. We believe that this issue is now fully resolved and will close this incident. The root cause was that a database upgrade appears to have reset some of the optimizations that were used to ensure fast database access. After reconfiguring the database with the proper optimizations, performance returned to normal levels.

monitoring

After making some more database optimizations all our numbers are looking good again. We will continue monitoring for the next few hours, but for now notifications are back to normal delivery times.

investigating

We are once again experiencing delayed notifications, due to some slow database queries. We are working again with our vendor to investigate and will update here as we learn more. We sincerely apologize for the ongoing issue. The team is working diligently to solve this issue for good.

Report: "Delayed notifications"

Last update
resolved

Notifications have returned to normal after the fixes made by our database vendor. We will continue to monitor notification delay over the next 24 hours to ensure no further issues. Thanks again for your patience.

monitoring

Notification delivery times have returned to normal after our database vendor applied some optimizations. Initial findings indicate a positive impact on all affected database operations. We will continue to monitor the results. Thank you for your patience.

investigating

Overall notification delays have improved, but we are still seeing some periodic spikes. We continue to investigate and will update here again once we know more. Thank you for your patience.

investigating

We are currently experiencing issues with notifications being delayed. We believe this is due to a database being slower than normal after an upgrade. We are working with our database vendor to identify and solve the issue. The rest of the Pronto platform is working normally.

Report: "Platform issues"

Last update
resolved

After reviewing logs and metrics, all evidence points to a temporary, transient network issue within AWS that lasted from 16:53 - 17:00 UTC. We have opened a case with AWS support and will update with a post-mortem if there are any changes to our assessment.

investigating

We are continuing to investigate this issue.

investigating

The Pronto service is now back up. We are still investigating the cause.

investigating

Most requests are currently failing to the backend API servers. We are actively investigating and will update here once we know more.

Report: "Database issue"

Last update
resolved

The Pronto platform is back to normal. We will be analyzing data from this incident over the coming days to understand what went wrong and how to better prevent a similar event in the future.

monitoring

Some of the affected components have been restored and we are starting to see some improvement. The Pronto service is back up, but may have degraded performance for a little while as the other components are restored.

identified

As mentioned in the previous update, the issue is due to some underlying hardware failures. Our database vendor is working to move the affected components to new hardware.

investigating

Our database vendor has identified a hardware issue in the database cluster and is working to remediate it.

investigating

Our backend database provider has acknowledged an issue on their platform and is working to diagnose it. We will continue to update here as soon as we know more.

investigating

We have been alerted to an issue with our backend database that is causing a system outage. We are currently investigating.

Report: "Degraded performance on user queries"

Last update
resolved

Our workarounds have been deployed successfully and performance has returned to normal. Both User Search and User Count endpoints have been re-enabled and all Pronto functionality is restored. At this point we believe the root cause was a database-level bug. We are working with our database vendor to confirm the bug and get a permanent fix. In the meantime, now that we know how to work around it, we expect no further problems from this.

identified

We are continuing to work on a fix for this issue. We have deployed some workarounds and seen some promising results. We are working to identify and address the remaining slow areas using a similar approach. We will provide another update after testing and deploying those additional workarounds.

identified

We've identified the very slow queries and are attempting some workarounds to speed them up. It is still unclear what the root cause of the slow queries is, but we are hopeful that this workaround will resolve the immediate issues to get performance back to normal and allow us to re-enable all endpoints.

investigating

We have temporarily disabled two endpoints that are causing the issue in order to protect the rest of the application. These two endpoints are:

1. User search - Attempting to search for a user in any context will not work. This includes starting a new DM (existing DMs are unaffected), adding new users to a group, or searching for users in org management.
2. User counts - In org management the overall user count will not be displayed.

We apologize for this loss of functionality, but deemed it necessary in order to prevent further issues across the Pronto app. We are working with our database vendor directly to diagnose and solve this issue as quickly as possible. Thanks for your patience.

investigating

We are currently seeing degraded performance on user-related queries in the main Pronto database. User searches in org management and also in the client apps are currently taking a long time or timing out. User online status is also affected and may not be reflecting the correct state right now. We are actively investigating and will update as soon as we know more.
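
Temporarily disabling the two expensive endpoints, as described in the updates above, is commonly implemented as a kill switch in front of the handler. The sketch below shows the general idea only; the flag store, the `kill_switch` decorator, and the `search_users` stand-in are all hypothetical and are not taken from Pronto's codebase.

```python
from functools import wraps

# Hypothetical in-memory flag store; a real service would likely read flags
# from shared configuration so every server sees the same state.
DISABLED_ENDPOINTS: set[str] = set()


class EndpointDisabled(Exception):
    """Raised instead of running a handler that has been switched off."""


def kill_switch(name: str):
    """Wrap a handler so it fails fast while `name` is in DISABLED_ENDPOINTS."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            if name in DISABLED_ENDPOINTS:
                raise EndpointDisabled(f"{name} is temporarily disabled")
            return handler(*args, **kwargs)
        return wrapper
    return decorator


@kill_switch("user_search")
def search_users(query: str) -> list[str]:
    # Stand-in for the expensive database query.
    directory = ["alice", "bob", "carol"]
    return [user for user in directory if query in user]
```

Adding `"user_search"` to `DISABLED_ENDPOINTS` makes every call fail immediately without touching the database, and removing it restores normal behavior, so the slow query cannot drag down unrelated requests while the root cause is investigated.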

Report: "AWS outage"

Last update
resolved

All Pronto services are now back to normal. Push notifications are now being delivered in real-time and other async jobs such as URL previews are speedy once again. Canvas integration has also been re-enabled. Canvas course syncing will need some time to catch up, but should be up to date for all customers within the next 6 hours. Thank you for your patience today. We will spend some time analyzing this event to see what changes we can make to be more resilient to a similar failure in the future.

monitoring

AWS has implemented their root cause mitigation plan and core Pronto services are once again working well. We are still experiencing some minor latency with push notifications as scaling on that service has not yet been restored by AWS engineers. Canvas integration is also still disabled for the same reason. We are hopeful that these issues will both be resolved quickly.

monitoring

As AWS starts to see significant recovery, we also are seeing some Pronto services scaling up again. Push notifications are still delayed, but response times are improving on the core Pronto services. Canvas integration is still disabled. We will continue to provide updates as services recover.

identified

We just saw a major increase in traffic from an integration platform, perhaps as it itself was recovering. This caused our small cluster to get overloaded. To mitigate we have temporarily disabled the Canvas integration platform until we are once again able to scale the Pronto services. This mitigation appears to have worked and Pronto core services are back up, albeit with slower response times than normal.

identified

We are continuing to work on a fix for this issue.

identified

As expected, traffic increases finally pushed Pronto over the edge and we are now experiencing a system-wide outage due to our inability to scale because of the AWS outage. We will continue to do whatever we can within our power to bring Pronto back up. We sincerely apologize for the disruption we know this is causing you.

identified

AWS says they are starting to see some signs of recovery, but do not have an ETA for full recovery at this time. We have tried various ways to scale Pronto servers, but because AWS internal APIs are failing this has not been successful. Thus, Pronto is currently running on less than half the capacity we normally would at this time of day. Push notifications continue to be delayed, and general response times are increasing. We expect that if AWS has not recovered their services in the next hour we will start to see much higher latency and an increase in error rates on Pronto core services. We will continue to explore alternatives in the meantime and will keep you up to date. Thanks for your patience.

identified

AWS has identified the root cause and is working towards recovery. Pronto core services are still running smoothly for now (except for delays in push notifications and other async jobs as noted in the last update), but because of the outage we are unable to automatically or manually scale up our servers as we normally would. As traffic increases in the next couple of hours this could result in slower response times across Pronto services. We are investigating alternative ways to scale up our servers in the meantime and will continue to keep you updated.

investigating

There seem to be problems in the us-east-1 AWS region resulting in some services being slow or having increased error rates. Core Pronto services are not currently impacted, but push notifications and other async jobs such as URL previews may be delayed. We are monitoring the situation and will post updates as we learn more. AWS status is available here: https://status.aws.amazon.com/

Report: "Database node failures"

Last update
postmortem

A postmortem

resolved

Just after 8:00pm MDT on Sep 26th the Pronto database cluster had a simultaneous failure on multiple nodes. The Pronto database cluster is designed to automatically withstand the loss of individual nodes in succession, but not the simultaneous loss of multiple nodes, as happened in this case. The Operations team at Pronto immediately engaged multiple avenues of support at both our hosting provider and our database vendor. In the meantime they also prepared to perform a full database restore (something that is tested regularly, including one last week). After some time, our hosting provider alerted us to an underlying service failure on their part that resulted in the node failures. They worked to restore services, but this took several hours. After our hosting provider’s fix, Pronto services began to come back online at about 1:15am MDT on Sep 27th and were working normally by 1:30am. We are extremely sorry for the disruption to Pronto services. We will learn from this incident and work towards improving our services to be more resilient to underlying failures like this in the future.
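
The difference between losing nodes one at a time and losing several at once can be illustrated with a simple majority-quorum model. This is only a sketch under stated assumptions: the postmortem does not say how the Pronto database cluster replicates data, so the majority rule, the reconfiguration step, and the five-node cluster size used here are illustrative, not a description of the actual system.

```python
def has_majority(alive: int, voters: int) -> bool:
    """A strict majority of the current voting members is still reachable."""
    return alive > voters / 2


def survives(cluster_size: int, failure_batches: list[int]) -> bool:
    """Apply failures batch by batch; each batch is a simultaneous loss.

    After every batch that leaves a majority, the cluster is assumed to
    reconfigure itself around the surviving nodes before the next batch.
    """
    voters = cluster_size
    for lost in failure_batches:
        alive = voters - lost
        if not has_majority(alive, voters):
            return False  # quorum lost: the cluster cannot make progress
        voters = alive  # assumed reconfiguration onto the survivors
    return True
```

Under this model `survives(5, [1, 1, 1])` is `True`, because the cluster re-forms after each single loss, while `survives(5, [3])` is `False`, because three simultaneous failures leave only two of five voters.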