Historical record of incidents for Retreaver
Report: "Purchasing new local (non toll-free) numbers is temporarily disabled for certain area codes."
Last updateThis incident has been resolved.
Vendors providing the numbers are experiencing difficulties, and we've temporarily disabled the ability to purchase numbers in certain local area codes.
Report: "Identified an issue impacting a subset of our users when visiting the reporting page"
Last updateThis incident has been resolved.
This issue will be resolved shortly, as we are currently rolling out a fix.
Report: "Clients using Convoso are experiencing issues"
Last update
#### Summary
At the start of the business day, several clients reported issues with calls. Our team quickly determined that the issue was isolated to clients using Convoso. Other call traffic was unaffected. Over the next hour, additional clients, all using Convoso, experienced the same symptoms. At 9:30 AM EST, Convoso updated their [status page](https://statusgator.com/services/convoso/dialer) indicating a “potential issue.” At 9:40 AM EST, Convoso updated their [status page](https://statusgator.com/services/convoso/dialer) indicating a “likely issue.” We continued monitoring independently. Convoso have acknowledged an incident stating:
#### Resolution
Convoso resolved the issue on their end. No action was required from our systems.
Convoso's status health is marked as back up. https://statusgator.com/services/convoso/dialer
Convoso have updated their status page with 'likely issue'. We are waiting for updates from them.
https://statusgator.com/services/convoso
Report: "RTB performance degraded"
Last updateThis incident has been resolved.
We are continuing to investigate this issue.
Customers are complaining that calls are ending on transfer for no apparent reason
We are currently investigating this issue.
Report: "Investigating an increase in errors from our phone provider (Twilio)"
Last updateThis incident has been resolved.
Twilio has updated their status to "operating normally". We will continue monitoring over the next hour.
Twilio provided information about issues with their service: 'Programmable Voice - 11200 HTTP Retrieval Failures' https://status.twilio.com/incidents/djx8tktw4z0p
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Some resources are not loading"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We have received a complaint that in some cases resources are not loading and some pages cannot be displayed
Report: "Reported issue affecting components loading on our website"
Last updateThis incident has been resolved.
We are continuing to investigate this issue.
UI components are loading.
We're experiencing issues with a small number of components not loading in our web app, such as the datepicker in Call Log. This has only been reported by a subset of users and may not impact everyone. Our development team is investigating.
Report: "Some reports are taking longer than usual and we are looking into it"
Last updateThis incident has been resolved.
We are currently investigating this issue.
Report: "Investigating issue with our background jobs system"
Last updateThis incident has been resolved
This may impact routing of calls and outbound API requests.
Report: "Investigating Carrier Issue"
Last updateThis incident has been resolved.
We are continuing to investigate this issue.
We are seeing carrier issues where some calls are not correctly marked as completed.
Report: "Investigating issue with a call provider (Twilio)"
Last updateTwilio has marked the incident as resolved. Everything is now fully operational. https://status.twilio.com/incidents/5b8wmb5djj4g
Twilio has updated their status page: > Currently, our baseline call functionality has been restored and is operating normally. [..] Our team is working diligently to resolve any residual issues to minimize any disruption to our users. https://status.twilio.com/incidents/5b8wmb5djj4g We are continuing to monitor and will post an update to sign off when everything is fully operational.
We are waiting on the all-clear from Twilio, but we believe the service is back online. Please stay tuned for further updates. You can find additional information on https://status.twilio.com/incidents/5b8wmb5djj4g
Twilio incident impacting our service: https://status.twilio.com/incidents/5b8wmb5djj4g Our team is in communication with Twilio and closely monitoring our system. We will update as soon as we get more information.
Twilio is reporting issues: https://status.twilio.com/
We are currently investigating this issue.
Report: "Partial outage resulting in delays accessing the web site has been resolved; we are continuing to monitor"
Last updateIssue is resolved, we are continuing to monitor live call numbers.
Status changed from investigating -> monitoring
We are continuing to monitor.
Some users may have experienced delays when loading the user interface. This is no longer occurring and we will continue to closely monitor it.
Calls are operational. We will continue to monitor for issues.
We are currently investigating this issue.
Report: "Carrier outage"
Last updateThis issue has been resolved. At approximately 1:09 pm EST our alerting systems informed us of an issue with an inbound carrier. At 1:12 pm we determined the cause and implemented the change away from the Telnyx Chicago site, where they were experiencing an audio degradation issue causing calls to drop. Please see the Telnyx issue for more information.
We are monitoring a situation at Telnyx, where there is a partial outage affecting some numbers on Retreaver. We switched our numbers to move away from the affected site and are seeing some recovery. We will provide more details shortly.
Report: "Twilio is experiencing an outage"
Last updateTwilio has acknowledged system recovery, and calls are processing normally. We are continuing to monitor and will post a new incident if one occurs.
We are seeing signs of recovery at Twilio and are receiving call progress events again. We are continuing to monitor and will update this incident shortly.
We are currently not receiving call progress events from Twilio. Some calls are currently not closing correctly, so you may see calls in progress that are not actually active. Calls may not route correctly past initial routing stages. We will update this incident as soon as we have more information, as we are waiting on Twilio.
Twilio appears to be having a serious problem. We are actively looking into the incident and will post more information shortly.
Report: "Twilio is down, calls are not going through"
Last updateTwilio has given the all clear and calls are going through. We are continuing to monitor and will post a new incident if anything changes.
Twilio appears to be back up as calls are going through again. We're waiting on them for confirmation that their issue has been resolved.
We're currently investigating this issue and are waiting for Twilio to come back up.
Report: "Inbound carrier outage"
Last updateWe are continuing to monitor. We will post a new incident if there are further disruptions.
The issue at the carrier has been resolved and call traffic is flowing normally again. We continue to monitor for further issues.
One of our inbound carriers is currently experiencing an intermittent outage. Calls may not complete or may have audio issues. We will update this issue once we have more information.
Report: "Inbound carrier outage"
Last updateWe are continuing to monitor for further issues.
One of our inbound carriers experienced a partial outage from 10:33 am EST until 10:59 am EST. Calls are currently being processed normally. We are monitoring for further developments.
Report: "Carrier outage"
Last updateWe are continuing to monitor the situation at our carrier. We will post a new incident if there are further developments.
We are continuing to monitor the situation at our inbound carrier, which appears to have stabilized. Calls are currently completing normally. We will post an update as soon as we receive new information.
We are continuing to monitor a situation at one of our inbound carriers. While calls are completing, we are unsure about the stability of the situation at the carrier. We are monitoring for updates and will provide more information as it becomes available.
We are experiencing ongoing intermittent call completion issues due to a carrier outage. We are monitoring for updates and will provide more information as it becomes available.
Report: "Carrier outage"
Last updateThis incident has been resolved.
Calls are processing normally. We are continuing to follow updates from the carrier throughout the night and will post a new incident if needed.
We are experiencing ongoing intermittent call completion issues due to a carrier outage. We are monitoring for updates and will provide more information as it becomes available.
Report: "Intermittent carrier issues"
Last updateWe are observing stability at the carrier and have received the all clear. We continue to monitor and will post a new incident if necessary.
We are continuing to monitor as we observe call traffic flowing at normal rates again. We are still waiting on the carrier to give the all clear.
We are experiencing ongoing call completion issues due to a carrier issue and are continuing to monitor for updates. We will provide more information as soon as possible.
We apologize for the interruption. Calls are coming through again. We are still waiting on the carrier to give the all clear.
We are currently experiencing intermittent carrier issues that are impacting phone calls. We are communicating with the carrier and will update as soon as possible.
Report: "Historical reporting"
Last updateThis incident has been resolved.
We are currently recovering from an issue with reporting historical data from July 2021 and earlier. Calls from this time period are currently being reindexed on our reporting system. Call traffic and underlying data storage are unaffected. Thank you for your patience.
We are currently experiencing an issue with historical reporting. All calls are continuing to be processed and no data has been lost. Today's data is fully live and data for this week is fully reindexed. We are working on reindexing earlier data as quickly as possible. Thank you for your patience.
Report: "US phone call connection issues experienced with Twilio"
Last updateTwilio's API was down between 2:37 and 2:52pm ET which may have caused some calls to not connect, and also caused errors with outbound dialing. We are no longer seeing any errors and marking this as resolved.
Report: "Carrier Issue"
Last updateTwilio has made a change to mitigate the issue and our charts have returned to normal. This issue is resolved. We continue to monitor.
Twilio is continuing to have intermittent issues, but they are reporting that they have located the issue. We are noticing intermittent timeouts when attempting to dial outbound calls in less than 4% of calls. On affected calls, you will see a call log entry stating that the number couldn't be dialed and asking you to email support. Additionally, some calls in Retreaver may take longer to close out than normal as Twilio is also having issues with their outgoing webhooks, which inform us of call progress. We will provide another update as soon as Twilio resolves the issue.
We are continuing to investigate this issue.
We are receiving reports from one of our carriers that there may be issues with some calls timing out. We will keep you informed as the situation progresses.
Report: "Routing Settings not loading"
Last updateRouting Settings were not loading on campaign and number editing pages due to a misconfiguration on one of our app servers. This would only occur if our load balancer happened to send your initial request to the errant server. The issue has been fully resolved.
Report: "Carrier Outage"
Last updateWe have received the all clear from our carrier. We apologize for the incident.
At 3:17 pm EST we experienced a call drop event due to a carrier outage. Call traffic is currently being rerouted. We are monitoring the changes and following the carrier for updates. We will update this incident again in an hour.
Report: "Carrier Outage"
Last updateWe haven't observed any additional incidents, and have received an all-clear from the carrier. Switching to the carrier's alternative datacenter has resolved this issue. As always, we are continuing to monitor.
Our carrier experienced a brief outage at 14:20:40 EDT and fully recovered 7 minutes later. This resulted in dropped calls and failure to create new calls. SIP calls were not affected. We experienced another call drop event at 15:22:00 EDT. We are currently routing calls through another data centre to avoid further issues. The change to the carrier's alternative site is working. We are continuing to monitor this issue.
Our carrier experienced a brief outage at 14:20:40 EDT and fully recovered 7 minutes later. This resulted in dropped calls and failure to create new calls. SIP calls were not affected. We experienced another call drop event at 15:22:00 EDT. We are currently routing calls through another data centre to avoid further issues. We are continuing to monitor this issue.
Report: "Issue with some incoming voice calls"
Last updateTwilio's REST API appears to be responding again and operations have returned to normal. Some calls may appear to be hanging in Retreaver when they have actually finished; these calls will close out automatically and be finalized shortly. We're sorry for the issue, and we're continuing to monitor the situation as the night progresses.
Twilio's REST API is down. We first noticed elevated errors from the Twilio API at 8:46 pm EST. We're currently monitoring the issue. Not all calls are affected. We sincerely apologize to our customers for the impact.
Report: "Call log and dashboards were down"
Last updateAll calls have been reindexed and the service has returned to normal.
Historical call data continues to be backfilled (calls prior to April 10 2018). We expect the process to complete around noon EST. All other functionality continues to operate as normal, and call flow was not affected during this incident. We apologize for the inconvenience this morning. This issue will be updated again once calls have been fully synced to our search servers.
We have restored our search servers. Phone numbers have been fully synced. Live call data is up to date, but the call log is currently being backfilled. Note that historical call data will remain out of sync for approximately 2 hours as we reindex the calls in reverse chronological order. We apologize for the inconvenience this morning, this issue will be updated again when calls have been fully reindexed.
We're working on a fix. Calls continue to be processed as normal. Services will resume as normal shortly.
Report: "There's an issue with Call Handling at Twilio"
Last updateTwilio has issued the all clear, and systems seem to be operating normally. We haven't seen any additional errors in the past hour. We sincerely regret the impact of this issue on our customers.
We're continuing to see small, intermittent bursts of errors from our telephony provider, Twilio (from 1:51 to 1:54 pm EST). This outage is affecting their entire platform. We're continuing to monitor the situation and sincerely apologize to our customers for the issue.
This morning and afternoon, our telephony provider, Twilio, experienced degraded service in 3 separate periods, from 10:58 AM until 11:54 AM (EST), 12:15 PM - 12:40 PM, and 1:00 PM - 1:04 PM. During this time, some calls were negatively impacted, resulting in error messages being played to the caller. We're continuing to monitor the situation and will provide further updates soon.
We're seeing errors again. Some calls may intermittently fail as we wait for Twilio to get their services under control. Our apologies to our users for this.
Twilio is reporting that the issue has been identified and their services have recovered. We're continuing to monitor for issues.
We haven't seen any errors in the past 5 minutes. We're continuing to monitor the situation.
Twilio is back down. We’re continuing to monitor.
The last error was seen by our engineers around 11:52 AM EDT, so we're assuming that Twilio is now fully operational. We continue to monitor this situation.
Some calls may fail to go through. We're waiting on Twilio to fix their issue; historically, they have had an incredibly reliable service. Updates will follow shortly.
Report: "Some freshly provisioned numbers not routing correctly"
Last updateOur carrier partner has informed us that the fix was a success and that the incident has been resolved. We're continuing to work with them to determine the root cause and will be providing a full report to affected customers.
Our carrier has applied a fix for the issue and is currently testing numbers for routability. We will update this incident again shortly.
Our carrier is currently all-hands-on-deck fixing the issue with some recently acquired toll-free numbers not routing correctly. They are working toward a full resolution. We will provide another update shortly.
Our carrier is currently investigating an issue where some toll-free numbers acquired in the past 24 hours are not routing correctly. We are working closely with them to fix this issue and will provide updates shortly.
Report: "Some inbound calls to local numbers are currently failing"
Last updateCalls are coming through again on affected numbers. We regret the occurrence and will be moving our remaining local numbers off Twilio very soon.
Our carrier, Twilio, is currently investigating an issue with some North American phone numbers. We are monitoring the situation and will provide updates soon. Toll-free numbers, and recently provisioned local numbers, are working.
Report: "Twilio is down"
Last updateTwilio is back up. Calls are completing successfully. Some calls in Retreaver may appear to be hanging but will resolve within 4 hours and will not be billed.
Our phone service provider, Twilio, is currently down. New calls are unable to complete. We have been monitoring the situation closely and will provide updates soon.
Report: "Reporting is down"
Last updateThis incident has been resolved.
We stopped our previous search server instance and started it again in order to force the server to boot on new hardware at AWS. All calls have been reindexed and reporting is back up. We believe the issue has been resolved, but we're continuing to monitor.
We're spinning up a replacement instance and expect the situation to return to normal in about 10 minutes. Calls continue to be handled as normal. We apologize for the inconvenience.
Report: "Reporting is down"
Last updateThis incident has been resolved.
Reporting is back up. At approximately 6:45 am HST our Solr server experienced an error writing to AWS Elastic Block Storage. This issue unfortunately failed to trigger our monitoring systems. Retreaver is architected so that a failure in our search servers will not affect call processing or any other functionality. Once aware of the failure, the server was rebooted to restore access to EBS, and the search index was updated. We regret that our monitoring did not flag the error and we apologize for the interruption to our customers. We're working to improve our monitoring so that this type of incident can be handled more expediently in the future.
Reporting is back up and we're fixing the issue of missing calls in the search index - zero data has been lost. We'll be fully reindexed shortly. Our apologies for the inconvenience.
Calls continue to be processed as normal. We are working on a fix.
Report: "Incorrect Conversion Charts"
Last updateCharts have returned to normal.
Currently some visual charts on our dashboards and the call log are showing a 100% conversion rate. This is a visualization error which will be fully fixed by approximately 5 PM ET. The underlying data is correct, and webhooks and other reporting are unaffected.
We're working on a fix for the charts on our dashboards, which are currently showing calls as converting at 100%. The underlying data is correct. This is a visualization error and will be fixed shortly. Callbacks and other reporting are unaffected.
Report: "Asynchronous webhook/tracking URL fires are delayed"
Last updateAsynchronous Tracking URL and Webhook fires have been caught up.
We've brought additional bot capacity online to clear out the queue of unfired webhooks and tracking URLs. Will update this incident as soon as we're caught up. Test firing tracking URLs may take longer than expected.
Report: "Amazon S3 is down"
Last updateWe're back and fully recovered. All asynchronous webhooks have been fired.
We have resumed firing pixels/webhooks. We'll update this issue as soon as we're caught up.
We're still unable to write to S3, so asynchronous webhooks continue to be delayed. Users will be unable to upload audio prompts, download reports, or listen to call recordings until this is resolved. All other systems are go. We'll update this issue shortly.
We're still waiting for Amazon to fully recover. Prompt recordings which are not cached by our telephony provider are unable to be played and may lead to call handling failures. We suggest temporarily switching your campaigns to use text-to-speech. Pixel fires and call recordings are currently delayed. Once S3 comes back online we'll get caught up. You may also experience slowness in our UI since we rely on a couple third-party libraries such as New Relic and Hubspot which we now know are hosted on S3.
S3 is back up and we're working on recovery. Webhooks will start firing momentarily.
Prompt recordings which are not cached by our telephony provider are unable to be played and may lead to call handling failures. We suggest temporarily switching your campaigns to use text-to-speech. Pixel fires and call recordings are currently delayed. Once S3 comes back online we'll get caught up. You may also experience slowness in our UI since we rely on a couple third-party libraries such as New Relic and Hubspot which we now know are hosted on S3.
Webhooks/pixel fires are currently delayed due to an outage at Amazon S3. Our HAR files, which log web traffic between our server and the webhook servers, are stored on S3. We're actively monitoring the situation and will update shortly.
We've brought additional capacity online to deal with the high volume of traffic, asynchronous webhooks/pixel fires will be caught up shortly. Synchronous fires are not experiencing an issue and all other systems continue to operate normally.
Report: "Outage at Twilio involving inbound Toll-free"
Last updateToll-free traffic has returned to normal levels and calls are being processed correctly. We've been informed that the outage in US Inbound Toll-free was being caused by a major nation-wide outage at Level 3. Retreaver is currently fast-tracking a solution to provide carrier-redundancy on inbound US Toll-free lines, which we will detail in a future blog post. We continue to monitor the situation at Level 3 and Twilio and will provide more updates as necessary.
Twilio has reported that inbound Toll-free is restored: "Inbound traffic to US numbers has been restored. We will keep monitoring until confirmed full resolution with our carrier partner." Retreaver continues to monitor the situation.
Twilio has provided an update regarding the outage in US Toll-Free inbound calls: "At this point of time, a large part of inbound Toll Free calls and approximately 50% of inbound calls to numbers in United States are not able to reach Twilio. We are working towards identifying ETA of service recovery." Retreaver continues to follow the situation closely and will provide updates shortly.
Twilio's upstream provider is reporting an outage at a major carrier involving some inbound US Toll-Free numbers. This is currently affecting some Retreaver customers. We are following the situation closely and will provide an update as soon as possible.
We're investigating an outage at our phone service provider, Twilio, which is affecting some customers on inbound US Toll-Free numbers. We will provide updates as soon as they are available.
Report: "Account Portal is having intermittent issues"
Last updateThis incident has been resolved.
We initiated a manual failover to our standby database server at 20:49 UTC. The account portal is back online. We're monitoring the stack and working on a root cause analysis. We regret the interruption.
We're looking into the problem and will update this issue shortly.
Report: "Webhooks/Pixel Fires are behind"
Last updateThe backlog has been cleared out and webhooks/pixels are now firing synchronously again.
We identified and fixed an incident with a group of pixel firing bots causing webhooks/pixels to fire more than once, creating duplicate leads and conversions in some external systems. After fixing the issue, we have brought more bots online and the backlog of pixel fires is now being cleared out.
Report: "Reporting is down"
Last updateAll systems are now back online and up to date.
We've replaced the reporting server and are currently reindexing it. Reporting on calls and phone numbers will be out of date or incomplete. We expect the new server to be fully online in a couple hours.
We're working on bringing our reporting server back online. All calls are being handled as expected, but dashboards and call reporting are currently being affected. We will update shortly.
Report: "Pixel fires delayed"
Last updateWe brought some additional capacity online, and we've cleared the backlog.
We're currently working through a backlog in pixel fires/web hooks that was caused by the previous database issue. We expect this to be resolved in the next 30 minutes.
Report: "Database Outage"
Last updateWe have identified the root cause and deployed a fix. A sudden surge of thousands of calls at 14:10 EST revealed an engineering deficiency in call handling which caused our database to lock. This deficiency has been patched, and the fix has been deployed. We are currently conducting a thorough code review to ensure that this deficiency does not exist in other parts of the codebase.
At approximately 14:10 EST we were alerted to a surge in CPU usage on our primary database server. Unable to locate the cause, we manually failed over to our backup server at 14:16 EST. This action succeeded and operations returned to normal 4 minutes later. We're currently working to identify the root cause and will update this incident in the next 2 hours.
We're investigating an outage in our primary database server. We will provide updates shortly.
Report: "Reporting was down"
Last updateAll calls and phone numbers have been indexed.
Everything is back to normal. We're currently experiencing higher than average traffic, resulting in a surge of traffic to our search servers. This prevents things like the dashboards, reports, and calls and phone numbers from being indexed and searchable. As such, some users weren't able to get to our homepage and this gave the impression that the site was down. We have upgraded the search servers by changing their instance type and are currently working to index all calls and numbers that were not indexed in the past 15 minutes. We'll update this issue again once the situation is resolved. This is simply a reporting issue.
We're working on a solution. All calls are currently being handled as normal.
Report: "Payment problems."
Last updateHooray! Stripe has resolved their issue and everyone is caught up.
Apparently Stripe, our credit card processor, is down. We've decided that since they're down everyone (existing accounts that aren't banned, TYVM) can have free rein of the system and we'll try charging all your cards again once they're back up. Sorry for the inconvenience.
Report: "Collaborators can't see phone numbers"
Last updateWe've solved the issue.
We're aware of an issue preventing collaborators from seeing phone numbers in their account. We're preparing a fix for the issue that will be live shortly.
Report: "Pixel firing is delayed"
Last updateThis incident has been resolved.
Twilio is actively working toward a fix on call pricing. Due to this problem calls are not currently being charged to your Retreaver account. We will be catching up on all charges later tonight, and will be automatically placing clients who become overdrawn due to this on a post-payment plan. Once the pricing issue is fixed, pixels will begin firing live again, as opposed to the current 30 minute delay. Thank you for your patience.
Twilio is currently taking their sweet time providing us with the final price on calls. Unfortunately we need this information before we can fire pixels due to potential unforeseen long distance charges. We've opened a support ticket with them and will update when we find out more. In the meantime, we're developing a workaround. Thanks for your patience.
Report: "DB Issue"
Last updateAs of 12:17 PM EDT we made a change to our RDS master that resolved this issue. We're down to one engineer here due to the summer and obviously fixing the issue took priority over our initial status update. Total downtime was less than 3 minutes. Postmortem to follow.
We've resolved an issue with our master database server and are monitoring the situation.
We're investigating an issue with our master database server and will provide updates shortly.
Report: "Database Instability"
Last updateWe've reverted a change in our architecture and operations have returned to normal.
We're monitoring a recent change in our architecture that caused some instability in the site. We're working to stabilize the system.
Report: "Pixel Fires Delayed"
Last updatePixel firing is back online and caught up.
Scheduled maintenance is now complete and we're working on restoring pixel firing.
Due to our scheduled maintenance window, pixel fires are currently delayed. We'll be taking the site offline from 4:00 am EST until 4:30 am. We'll get everything caught up once we're finished with our upgrades.