Historical record of incidents for 2600Hz
Report: "Update Kazoo-Callflow to 5.4-0.6 on Zswitch US."
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We will be updating Kazoo-Callflow to 5.4-0.6 on Zswitch US.
Report: "Possible issues"
Last update: This incident has been resolved.
Please be advised that we are currently hearing of widespread ISP issues across the country. Major media outlets are reporting Verizon-related outages, but we suspect the impact is at least somewhat wider. No 2600Hz services are directly impacted at this time. However, resellers can expect to see increased reports from clients, especially Verizon clients
Report: "Investigating call center issues"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
We have identified that we were receiving alerts from two of our apps servers having issues connecting to each other. We've restarted Kazoo Apps on both servers. We've received confirmation from some customers that this is resolved, we've received no new reports, and we've been unable to replicate since the restarts. We're monitoring to ensure this is resolved.
We are currently investigating reports of issues with loading Call Center in our UI and comm.land.
Report: "Investigating issues on EWR"
Last update: This incident has been resolved.
We have mitigated the issue and our engineers are working to identify the root cause
We are investigating reports of degraded call services in our EWR zone
Report: "Reports Of Call Completion Issues in EWR"
Last update: This incident has been resolved.
We have mitigated the issue and are monitoring the system for any further issues.
We are receiving reports of call completion issues on EWR. We are investigating.
Report: "Unanswered semi-attended transfers not hitting voice mail"
Last update: This incident has been resolved.
Testing has been successful so far and we've deployed the patch to all remaining Hosted Platform servers. We believe this to be fixed now, but will continue monitoring for any further issues.
We believe we have identified the issue. We have deployed a patch to the SJC zone and are currently testing before we deploy to ORD and EWR.
We are seeing increasing reports of semi-attended transfers not reaching the recipient's voice mail if the recipient does not answer. This is impacting our Hosted Platform only and we are currently investigating the issue.
Report: "Increased Reports Of Call Completion Issues using WSS"
Last update: We've had multiple customer reports that this issue is now resolved.
At this time we believe this issue affected clients that were using comm.land AND immediately updated their comm.land software this morning. This issue should now be resolved, we are monitoring the situation to validate.
We are receiving reports of call completion issues over web sockets registered devices including comm.land phones
Report: "Single Kamailio Server - No TCP"
Last update: This incident has been resolved.
We would like to inform you that several clients have reported successful phone registrations after we implemented a TCP traffic block to and from this specific SBC. This leads us to believe that the issue is currently fully mitigated. In light of this, we are intentionally leaving this SBC in a degraded state to facilitate thorough investigation by our engineering teams as required. Our intention is to conduct a server reboot tomorrow evening. We will keep this incident active until that time to facilitate full transparency into the issue.
Presently, a concern has arisen wherein one out of the three Kamailio servers located in EWR is encountering difficulty in processing SIP registrations. It's important to note that this issue is not expected to affect our services, as phones that are appropriately configured will automatically attempt registration through alternate SBCs and/or zones. If you're currently facing challenges with phone registrations, we encourage you to reach out to our support team. This way, we can assist you in enhancing your phone's configuration to proactively avoid similar issues moving forward.
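Several updates in this history, including the one above, note that phones configured with proper SRV/NAPTR records will fail over to alternate SBCs or zones automatically. As an illustrative aid only, here is a minimal sketch of how one might inspect the SRV targets a phone would try, assuming the third-party dnspython package and a hypothetical SIP domain; this is not 2600Hz tooling.

```python
# Minimal SRV lookup sketch (requires the third-party "dnspython" package:
# pip install dnspython). The domain below is a placeholder; substitute the
# SIP domain your phones are actually configured to register against.
import dns.resolver

SIP_DOMAIN = "sip.example.com"  # hypothetical domain

# SRV records for SIP over UDP; lower priority values are tried first,
# and equal priorities are load-balanced by weight.
answers = dns.resolver.resolve(f"_sip._udp.{SIP_DOMAIN}", "SRV")
for record in sorted(answers, key=lambda r: (r.priority, -r.weight)):
    print(f"priority={record.priority} weight={record.weight} "
          f"port={record.port} target={record.target}")
```

A phone honoring records like these will retry registration against the next target when one SBC stops answering, which is the failover behavior the update above relies on.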
Report: "p3.zswitch Down (Provisioner)"
Last update: Resolved - The restoration has been completed and we have moved back to our main Provisioning server. Prevention Plan: 1. Near Term: Infrastructure will establish a sync between zones in case this happens again so that we have a failover. 2. Mid Term: Infrastructure will set up NFS to share the files across the main and backup servers.
The restoration has been completed and we have moved back to our main Provisioning server. Prevention Plan: 1. Near Term: Infrastructure will establish a sync between zones in case this happens again so that we have a failover. 2. Mid Term: Infrastructure will set up NFS to share the files across the main and backup servers.
Provisioner has been switched to the backup server while the main server undergoes maintenance due to a hardware fault; no files have been compromised. The backup server is expected to take an estimated 10 hours while the latest information is restored and synchronized. Existing devices will return 404 "not found" from Provisioner until this process is completed. The p3.zswitch.net IP has been switched to 161.38.209.149 (backup server). This means that firewalls need to accept this additional IP for HTTP and HTTPS outgoing traffic (ports 80 and 443). Clients can add, update, and remove devices; however, they may need to take the additional step of unlocking them to get them to work, as we have locked everything as a preventive measure. This should not disrupt calls or other services in Kazoo.
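Since the update above asks firewalls to allow outbound HTTP/HTTPS to the backup provisioner IP, a quick TCP connect test can confirm that traffic is not being blocked. This is only an illustrative sketch: the IP and ports come from the notice above, and the script is not an official 2600Hz tool.

```python
# Hypothetical connectivity check: verifies that outbound HTTP/HTTPS traffic
# to the backup provisioner IP (161.38.209.149, from the update above) is not
# blocked by a local firewall. Adjust for your own environment as needed.
import socket

BACKUP_PROVISIONER_IP = "161.38.209.149"
PORTS = (80, 443)

for port in PORTS:
    try:
        # create_connection performs a plain TCP handshake to the given port.
        with socket.create_connection((BACKUP_PROVISIONER_IP, port), timeout=5):
            print(f"TCP {port}: reachable")
    except OSError as exc:
        print(f"TCP {port}: blocked or unreachable ({exc})")
```

If either port reports blocked, the local firewall rules would need to be extended to allow the additional IP, as described in the update.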
We have encountered issues with our provisioner server and are currently working to migrate over to a new host. We will keep this page updated with any further updates. You may have issues provisioning new phones for the next few hours.
Report: "Increased reports of call failures in EWR"
Last update: After investigation, Kamailio was restarted, which resolved the issue. Please let us know if you encounter any further disruption. We are working on investigating the root cause.
We are currently investigating this issue.
Report: "Increased Reports Of One Way Audio Issues in EWR datacenter"
Last update: We have multiple confirmations that our workaround has resolved the issue.
We do believe this issue is with a particular peer. We are advertising new BGP routes to attempt to work around this issue
At this time we do believe this is isolated to clients on a single ISP. We are continuing to investigate
We are receiving increased reports of one-way audio issues in the EWR datacenter.
Report: "Increased Reports Of Call Completion Issues in EWR"
Last update: This incident has been resolved.
We have paused the EWR zone and are monitoring for any further issues
We are receiving reports of call completion issues in EWR
Report: "US-West and US-East connectivity"
Last update: At 13:13:56 PT we received alerts that multiple BGP sessions had dropped in our US-West datacenter. This included our backhaul between US-East and US-West, as well as one of our outbound ISPs. The backhaul traffic failed over to the secondary by 13:14:03 PT, and the session with the outbound ISP recovered by 13:25:21 PT. We believe there was no impact on inbound or outbound traffic to/from the 2600Hz network during this incident. We have engaged our providers to get more information on what occurred, but everything appears to have recovered as of 13:25:21 PT. Thank you, 2600Hz Infrastructure
Report: "Increased Reports Of Call Completion Issues in EWR"
Last update: On Thursday the 16th of February we experienced a Zswitch outage between 19:25-19:45 GMT. As soon as we noticed alerts appearing for disconnects, a 911 bridge was started with Support, Operations, Engineering, and the CTO. The issue appeared to be localised to EWR; we therefore paused all EWR servers as a precaution, which remediated the issues with calls.

From there we troubleshot to determine what could have caused the outage. We realised compaction had been started on the bc003.ewr server only a few minutes beforehand, and we could also see that the load was unreasonably high on this server. The compaction was stopped, which brought the load down to a stable level. We also noticed that this BigCouch server was delivering DB responses in the 5-6 second range, rather than the below 50ms we would expect. This was due to the server not behaving in the expected manner when compaction was running. Further tests helped us determine this was an issue with the server itself, and we put a plan in place to migrate it over to new hardware. After the migration was complete, we further stress tested the new server to confirm compaction did not cause any issues. It's clear that this server was a major component in the 4 most recent outages.

We have added new alerting which will give us higher visibility of response times from all servers to BigCouch, allowing us to notice any similar issues before they start to impact service. We are also further improving this alerting to exclude crossbar (API) calls, which will help keep out any false positives, as well as adding similar active testing across all HA layers in Kazoo.

During the holiday weekend we staged an "all hands on deck" meeting to go over the outages we've seen and the steps we can put in place to improve. We want to convey that the seriousness of the downtime we have seen is understood and we're continuing to work to prevent any recurrences. Following this call we have come up with a list of action items that we'll be undertaking with the highest priority, including the aforementioned monitoring changes and reprioritising engineering load to speed up the migration from BigCouch to CouchDB 3. If anyone has any further questions, or you'd like deeper information on what steps we're taking to prevent further outages, please don't hesitate to get in touch.
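The post-mortem above mentions new alerting on BigCouch response times, with responses far above roughly 50ms treated as abnormal. Purely as an illustration of that idea, here is a minimal latency probe; the node URL and threshold are placeholders, not 2600Hz values or tooling.

```python
# Rough latency probe in the spirit of the response-time alerting described
# above: time a trivial HTTP request against a CouchDB/BigCouch node and flag
# it if it exceeds a threshold. Host and threshold are hypothetical.
import time
import urllib.request

NODE_URL = "http://db.example.internal:5984/"  # hypothetical BigCouch node
THRESHOLD_MS = 50

start = time.monotonic()
with urllib.request.urlopen(NODE_URL, timeout=10) as resp:
    resp.read()
elapsed_ms = (time.monotonic() - start) * 1000

status = "OK" if elapsed_ms <= THRESHOLD_MS else "SLOW"
print(f"{status}: {NODE_URL} responded in {elapsed_ms:.1f} ms")
```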
Multiple clients are reporting no further issues
We have had no further reports of issues and all alerts cleared. Please write into support if you have any example calls after 13:35 PT
We have paused the EWR zone in full and are redirecting traffic arriving in EWR to ORD in Kamailio. Call completion rates seem to be returning to normal levels.
We are receiving reports of call completion issues in EWR.
Report: "Cluster Wide Outage"
Last update: On Thursday 2nd Feb we encountered an issue on Zswitch causing call failures and issues accessing the UI. We were first alerted to issues by our monitoring system, telling us there were errors on FS within EWR. As a precautionary measure we decided to pause FS on the alerting nodes. The issue instantly spread to the rest of the FS servers; at this point we involved the engineering team and jumped on an all hands "911" call.

Our engineering, operations and support teams further investigated the issues, which uncovered that BigCouch seemed to be in a bad state. (We could see that when trying to pull information from the DB, the apps were occasionally getting timeouts or at least seeing very long response times.) This again pointed to the databases being overloaded. All of the DB nodes were checked, all compactions stopped, and a slow restart of all the DB nodes was carried out.

After the DBs were back up and working we found the following combination of tasks caused the issues. (We would like to stress that any individual task, or even two of these tasks, wouldn't usually cause issues; this is a very edge case scenario.)

1. The "DB of DBs" (a file called dbs.couch which contains the locations of all shards within the database) was much larger than it should be. This would cause DB tasks not to be completed as quickly and left BigCouch in a more fragile state. To shrink this, a separate compaction task needs to be carried out, separate from the usual compaction carried out on a regular basis.
2. A new feature added in the latest version to bill for ephemeral tokens executed an inefficient implementation of a monthly roll-up, which put heavy load on the already struggling DB.
3. Two compaction tasks were running on separate nodes throughout the cluster to address disk space alerts. (This is a normal and routine task, although in combination with the two other issues it caused heavier load on the DB.)

To fully resolve the issue, the DB of DBs was subsequently compacted on all of the databases, followed by a restart of BigCouch. To prevent this issue from reoccurring we are taking the following steps in the short term:

1. A full audit of the DB compaction procedure to make sure we are compacting everything which needs to be, and that there's no way BigCouch can reach the same fragile state again.
2. Our engineering team are fixing the ephemeral report generator so it's not as aggressive with the DB.

In the long term, we have a plan to upgrade all servers onto CouchDB 3, which will compact automatically in a much more proficient manner.

We again apologise for the interruption to service; please be assured we are working hard to improve. We strive to provide the best telecommunications platform available. This was, as mentioned previously, a very edge case scenario, a perfect storm of DB tasks which resulted in instability; we are confident that any two of the tasks outlined wouldn't have caused an issue. If anyone has any further questions, please do send a ticket into support. We'll be more than happy to address any concerns and go into further detail on the steps we're taking to improve.
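For readers unfamiliar with the compaction jobs described in this post-mortem: CouchDB and BigCouch expose running tasks, including compactions, through the standard /_active_tasks endpoint. The sketch below shows one way to list them; the node URL and credentials are placeholders, and this is not how 2600Hz monitors its cluster.

```python
# Sketch: list running compaction tasks on a CouchDB/BigCouch node via the
# standard /_active_tasks endpoint (normally requires admin credentials).
# The URL and credentials below are hypothetical placeholders.
import base64
import json
import urllib.request

NODE_URL = "http://db.example.internal:5984"                 # hypothetical node
AUTH = base64.b64encode(b"admin:password").decode()          # placeholder creds

req = urllib.request.Request(
    f"{NODE_URL}/_active_tasks",
    headers={"Authorization": f"Basic {AUTH}"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    tasks = json.load(resp)

# Compaction tasks report a "type" containing "compaction", plus the database
# name and a percentage progress field.
compactions = [t for t in tasks if "compaction" in t.get("type", "")]
for task in compactions:
    print(f"{task['type']}: db={task.get('database')} "
          f"progress={task.get('progress')}%")
print(f"{len(compactions)} compaction task(s) running")
```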
We believe the issue is now resolved. We are continuing to monitor services
We are having success improving the DB response rate. Though we are still seeing a small percentage of intermittent failures
We have identified this as a database issue. We are still investigating the root cause
We are continuing to investigate this issue.
We are receiving and have verified reports that there are call failures happening in all zones on ZSwitch
Report: "Increased Reports Of Call Completion Issues"
Last update: All of our clients that reported this issue have gotten back to us and told us it's clear now. Full RCA to follow.
Reports of call issues are continuing. We have paused the EWR zone in full and are redirecting traffic arriving in EWR to ORD in Kamailio.
Two of our Freeswitch servers appear to be experiencing unusually high CPU, which may have caused a short period of call disruption. We have paused the servers in question; please let us know if you experience further issues. We are continuing to monitor the situation and dig into the root cause.
Increased Reports Of Call Completion Issues in the EWR zone
Report: "Ecallmgr Issues on EWR"
Last update: This incident has been resolved.
Our Operations Team found that ecallmgr on one of our EWR nodes kept disconnecting from other nodes intermittently. Ecallmgr and kazoo-applications were restarted and platform is stable. We are monitoring.
Report: "Increased Reports Of Call Completion Issues"
Last update: We have stabilized EWR and are resuming service in this zone.
We have removed the EWR servers from Kamailio rotation. This should force traffic to unaffected zones until we can resolve the root cause of this issue. Phones that have proper SRV/NAPTR records should resume service.
We've seen a spike of reports from clients about call completion issues. At this time most reports seem to be coming from the EWR zone. We are investigating the issue now
Report: "We are investigating reports of inbound and outbound calls failing on ZSwitch"
Last update: Following the Zswitch upgrade to 5.1 you likely experienced downtime between 9:20am and 10:40am PT. This was due to a bug in the new version of Freeswitch: if it encountered any presence IDs with spaces, Freeswitch would crash. There was at least one account with every device configured this way, which explains the almost total FS downtime; only one call would need to be placed on a Freeswitch server to cause a crash.

Our monitoring picked up these crashes and our operations team began to investigate with the help of engineering. After around 15 minutes we had identified the issue. Operations began to disable any accounts with a space in the presence ID as a short-term workaround while the engineering team worked on a hot patch to ecallmgr to strip the spaces out of presence ID strings prior to passing them to Freeswitch. The hot patch was installed onto the Freeswitch servers at around 10:35am PT, at which point we confirmed the issue was resolved.

To prevent a recurrence of this issue on future releases we have added an additional step to our QA testing: confirming that special characters and spaces in presence IDs do not interact in an unexpected way with Freeswitch. If you have any further questions on the downtime please don't hesitate to contact us. We will be more than happy to provide further information if required.
A hot patch has been deployed to ZSwitch to prevent a recurrence.
We had to again restart freeswitch, but believe we have verified the root cause and taken steps to mitigate it. We are now monitoring the situation and engineering is working on a proper fix.
We have restarted the freeswitch servers and believe the platform is functional again. We have identified what we believe is the root cause and are working with engineering to validate this and get an emergency build out to ZSwitch now
We are investigating reports of inbound and outbound calls failing on ZSwitch
Report: "UI issues this morning on ZSwitch & How to get the new comm.land"
Last update: This incident has been resolved.
We are getting a number of tickets this morning from clients on ZSwitch about various UI issues. If you are having any UI issues, please press Control-Shift-R in your web browser. This will force your browser to clear the cache for your UI and reload, which should resolve your issue. Additionally, we have a number of clients asking how they can get the new comm.land app. To do this you will need to: 1) Uninstall the comm.io app first. 2) Re-download the new comm.land installer from the user portal -> Comm.land tab, then install comm.land. Upon initial launch, you'll be prompted to upgrade. Click "Restart" to re-launch the app and you'll be on the latest version with the newest features.
Report: "We are investigating call completion issues in our EWR zone"
Last update: This incident has been resolved.
The cause appears to be extremely high load on one of the apps servers in EWR. We are mitigating this now. This will have only affected ZSwitch clients. PCloud clients should not be impacted.
We are investigating call completion issues in our EWR zone
Report: "Qubicle/Call Center"
Last update: We haven't seen any further issues after restarting qubicle.
We have restarted qubicle/call center in all zones and this seems to have resolved the issue. We are seeing normal call progression again. Please call into support if you have new call examples
We have received multiple reports of call failures for qubicle/call center calls. We are investigating the issue now.
Report: "Hosted zswitch API/UI issues"
Last update: This incident has been resolved.
We received reports of users unable to access the hosted zswitch UI, and API earlier this morning. We identified that this was part of a low risk maintenance that we performed on the firewall of one of our 3rd party hosting providers. We believe that this has been resolved as of 15:33 UTC. Please notify support if you experience any further issues. Thank you, 2600Hz Operations
Report: "We are investigating a network routing issue in our EWR datacenter"
Last update: The issue in EWR should be resolved.
Services are operating normally again while we route around the issue. We will post an update here once the issue is believed fully resolved.
We have identified the issue and are routing around it temporarily while we work on a resolution.
We are currently investigating this issue.
Report: "BGP routing issues in ORD"
Last update: Routing issues are resolved. We are seeing normal traffic load in ORD again.
We appear to have BGP routing issues in our ORD data center affecting our Private Cloud and Zswitch customers there. We are routing around the issue now. However, there should be minimal impact for those with proper SRV/NAPTR records and carrier configs.
Report: "Call competion issues in ORD"
Last update: We have routed around this issue.
It would seem that the issue is with large UDP packets from Rackspace to our data centers. If you are having trouble, we ask at this time that you use the proxy central2.p.zswitch.net instead of central.p.zswitch.net. This will send you directly to our datacenter and bypass Rackspace. We are currently unsure why this is specifically affecting traffic from Rackspace to our data centers and are investigating.
We are investigating some call completion issues in the ORD data center. At this time it appears to be related to our routing via CloudFlare. The NetOps team is evaluating the best course of action.
Report: "Degradation of service in US-Central"
Last update: Our provider for the subset of servers in the US-Central zone has acknowledged that there was an issue on their network in their ORD datacenter where our servers live. They have advised us that their issues have been resolved, and our monitoring and manual tests agree. We have rolled back all changes made to alleviate any issues in the US-Central zone, and have resumed normal operations. This should be all clear. Please reach out to our support team if you are still experiencing any issues after this incident has been marked as resolved. Thank you, 2600Hz
The 2600Hz Operations team received alerts that some of our servers in US-Central were unreachable by ping at 8:05 AM PT. We began investigating at that time, and the 2600Hz support team reported an escalation in the number of reports of UI and call issues. Operations found that the servers experiencing issues were the subset hosted with a 3rd party provider. No status messages have been posted for that provider, and our Operations team has opened a ticket with them to identify the cause of the issue. The issue does appear to be intermittent, depending on which path is being taken to or from the servers hosted with this provider.

To attempt to relieve the impact of this issue, we have moved the preferred API server, which looks to have cleared up the UI issues based on our testing. To attempt to improve call completion in this zone, we have pushed call handling to media servers in our US-East and US-West datacenters. If you have devices that are having trouble registering to US-Central, based on the behavior we have seen this should be due to getting no response from the server. If you are using proper SRV records, those devices should attempt to register to the next group of servers in that SRV record. If you are directly registered to US-Central or are not experiencing the failover via SRV, then please update your configs/DNS to prefer a proxy in the US-East or US-West datacenters. Thank you, 2600Hz
Report: "Call Center (qubicle) degradation"
Last update: We have been watching for the log line that presented alongside the qubicle issue this morning and have seen no further recurrence. We believe this is resolved.
We have fully stopped qubicle in all zones and restarted them. All reported queues seem to be back up and functioning at this time. We are monitoring their state now.
We are investigating a call center degradation on Zswitch. This is preventing some, but not all, queues from starting in the UI and processing calls. Please be aware it may be necessary to log off and back on once this issue is resolved.
Report: "Private Cloud / Zswitch outage"
Last update: This incident has been resolved.
We believe we have mitigated this issue. Please file a support ticket for more information. We are continuing to monitor the situation
We are investigating an issue where traffic seems unable to reach our networks. Initial indications are that our network itself is fine, but our DDoS service (CloudFlare) is not routing data to us. More info to follow.
Report: "Issues with upstream carrier Bandwidth.com / DTMF issues"
Last update: It appears Bandwidth has resolved all items at this time.
Bandwidth.com is reporting all services are restored. We are continuing to monitor the situation and will update here if a significant change occurs.
We are seeing some improvements in DTMF issues and calls connecting properly. We are still waiting for an official update from Bandwidth
2600Hz clients who are using their own hardware and data centers have also confirmed to us that Bandwidth numbers are experiencing DTMF problems. This would seem to further confirm our suspicions that this morning's issues are with Bandwidth's service. Please monitor at https://status.bandwidth.com/
Hi all, it seems Bandwidth is having more troubles today. Details available at https://status.bandwidth.com/. We are also receiving numerous reports of DTMF issues on Bandwidth numbers. At this time all examples are on Bandwidth numbers. If you have any non-bandwidth.com examples, please call in with them. 2600Hz does not control any of this; we pick providers for things we can't control. We pick providers that we believe are good quality. Bandwidth.com always has been good quality. They're having a bad day today, though, and it's impacting us, and you, too. We trust their team is all over this and we'll be waiting it out, along with you, for resolution. Please note that we can NOT move your numbers to another provider for inbound - that's not a thing that can be done in VoIP or telecom in the US (phone numbers belong to one carrier only and the only way to change that is a port out, which takes days).
Report: "Issues with upstream carrier Bandwidth.com"
Last update: We've confirmed reports that all services are operational. We'll keep an eye out just in case, but we're going to resolve this issue for now.
We've heard that bandwidth.com has resolved this issue. We are monitoring to verify but wanted to pass that along.
Hi folks, since I know it has been a rough week, I wanted to provide some details on this issue.

ALL providers who do not own physical wires in the ground have to interconnect with the old telecom network providers in order to provide services, and then transit those old phone lines and PSTN networks via their newer datacenters that are IP-based. There are only a few companies left who really do that - Bandwidth, Level3, Inteliquent, Peerless to name a few. The list is probably less than 10. Most companies big or small (ours, presumably Twilio, RingCentral, etc.) don't publicize this but they outsource this work to the companies listed above.

With that in mind, this outage is out of our hands but it will have a big, big impact - probably hundreds of thousands of lines, if I had to guess (we haven't verified that). Outages at bandwidth.com are rare because they have become so large and trusted and their network is pretty good. We have already shifted outbound traffic away from bandwidth.com, but your customers may be calling other people who use bandwidth.com. Those calls will fail until this is resolved. Additionally, inbound numbers, industry-wide, are single-homed to one provider - so if an inbound number is currently being managed by bandwidth.com, it will be impacted. We can not fail over the inbound number routing (as far as I know, nobody can), so we have to wait this out.

2600Hz does not control any of this; we pick providers for things we can't control. We pick providers that we believe are good quality. Bandwidth.com always has been good quality. They're having a bad day today, though, and it's impacting us, and you, too. We trust their team is all over this and we'll be waiting it out, along with you, for resolution. Please note that we can NOT move your numbers to another provider for inbound - that's not a thing that can be done in VoIP or telecom in the US (phone numbers belong to one carrier only and the only way to change that is a port out, which takes days).

Hope that helps understand the current situation.
Please note that 2600Hz systems are fully functional and operating normally at this time. That said, Bandwidth.com has reported an outage. Bandwidth.com is used by a LOT of customers and powers a fraction of numbers belonging to 2600Hz DID services as well. They also power a lot of customers OFF our system, so while we may be up, calls TO those users who use Bandwidth.com may fail also. Bandwidth.com has reported that they are aware of the outage and working on it. From their status site: "As of 3:31pm EDT Bandwidth has acknowledged the issue: Investigating - Bandwidth is investigating an incident impacting Voice and Messaging Services. Calls and Messages may experience unexpected failures. All teams are actively engaged. Sep 25, 15:31 EDT" We're posting this as mostly informational. We don't control the bandwidth.com network. Outages with Bandwidth.com are rare. We suspect they have their best people on it and will resolve it asap. But since they are one of the largest wholesale term/orig providers in the US, this will impact a lot of people, so you may see call failures related to their network issues. As we learn more we'll post here.
Report: "Portal/Phone outage"
Last update: This incident has been resolved.
This is resolved. Please file a support ticket for access to specifics if needed. Thank you!
We are investigating reports of outbound calls and portal API calls failing
Report: "We are investigating slow response times and page not loading issues on the Monster portal on ZSwitch"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "We are investigating alerts in our SJC datacenter related to BGP routing."
Last update: This incident has been resolved.
This issue appears resolved, but we will leave this open and continue to monitor just in case.
Circuits appear to have auto-failed over with a brief degradation. We will continue investigating/monitoring but customer impact appears to be minimal.
We are currently investigating this issue.
Report: "Advanced Provisioner is having issues loading"
Last update: This issue should be resolved.
The provisioner server has suffered a hardware failure. We are resolving the issue now and expect resolution momentarily.
We are currently investigating this issue.
Report: "French prompts playing instead of English"
Last update: This is resolved.
We are investigating an issue where French prompts are incorrectly playing instead of English.
Report: "We are investigating issues with the call center application on ZSwitch hosted"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified the issue with call center on ZSwitch and are working to resolve it
We are investigating issues with the call center application on ZSwitch hosted
Report: "Issues Provisioning Phones"
Last update: This issue is resolved. A follow-up email to some clients more directly impacted has been sent - if you received an email from our support system, please reply to it with any questions.
We believe this issue is now resolved and are monitoring.
The issue has been identified and a fix is being implemented. Unfortunately we expect some delay on this repair. We are hopeful to restore services by 6pm PT.
We are investigating issues provisioning phones using our provisioner.
Report: "Network issues OUTSIDE of 2600Hz"
Last update: The alerts we were receiving regarding external providers' connectivity issues seem to have subsided. We will consider this resolved for now.
Sorry, that last update was erroneous! There is no fix implemented as this issue is outside our control. Here is a recap of the issue we are seeing from earlier: "We are receiving customer reports of bad call quality in some areas. Please note that 2600Hz systems are running properly at this time - our new network setup is working well and all servers are functioning normally. We have double-checked and are unable to find any errors or degraded performance at present. That said, phone calls traverse more than just 2600Hz links and carriers when traversing the country from one point to another. While we have redundant uplinks for our own connections we are unable to control links that are beyond our direct path. We are seeing degradation of some upstream carriers whom we do not directly connect to which we suspect is resulting in poor call quality in some areas. We suspect the reports we are receiving related to call quality / call completion are related to other carriers having issues outside of our own (for example, https://downdetector.com/status/8x8/ reports a complete outage for some customers at 8x8, and we're seeing similar complaints for other providers who utilize Lumen/Level3). We are not able to control or repair other providers circuit issues, these are outside of our control. At this time we will need to wait for those upstream providers to resolve their own issues to resolve what is being reported. To be clear, these issues are not related to 2600Hz, our servers, or our direct carriers. We will update this notice when we see the reports of call quality issues decrease so that you are at least aware of the situation upstream."
We are monitoring the situation.
We are receiving customer reports of bad call quality in some areas. Please note that 2600Hz systems are running properly at this time - our new network setup is working well and all servers are functioning normally. We have double-checked and are unable to find any errors or degraded performance at present. That said, phone calls traverse more than just 2600Hz links and carriers when traversing the country from one point to another. While we have redundant uplinks for our own connections we are unable to control links that are beyond our direct path. We are seeing degradation of some upstream carriers whom we do not directly connect to which we suspect is resulting in poor call quality in some areas. We suspect the reports we are receiving related to call quality / call completion are related to other carriers having issues outside of our own (for example, https://downdetector.com/status/8x8/ reports a complete outage for some customers at 8x8, and we're seeing similar complaints for other providers who utilize Lumen/Level3). We are not able to control or repair other providers circuit issues, these are outside of our control. At this time we will need to wait for those upstream providers to resolve their own issues to resolve what is being reported. To be clear, these issues are not related to 2600Hz, our servers, or our direct carriers. We will update this notice when we see the reports of call quality issues decrease so that you are at least aware of the situation upstream.
Report: "Failed outbound calls when registered to certain SBCs"
Last update: We have confirmed reports that these outbound dialing issues are resolved. We will be closing this incident. Thank you, 2600Hz Operations
We believe this issue has been solved. We will monitor for future reports.
We have identified the problem, and are working on restoring services now. Thank you, 2600Hz Operations
We received reports after our firewall maintenance where some devices were unable to make outbound calls. We are investigating. Thank you, 2600Hz Operations
Report: "We are investigating a network routing issue in our EWR datacenter"
Last update: This issue appears resolved.
We believe a routing issue with Level3 is impacting our services, we are working to route around the problem.
We are currently investigating this issue.
Report: "Reports of issues with Secure WebSockets after power maintenance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified an issue with WebSockets not functioning after the power maintenance last night. This prevents the call center UI from updating in real time. We believe this issue is already resolved, but we are posting this status update so that customers are aware of the issue. If you are still experiencing this issue please refresh your browser. If the issue persists, please contact support.
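Because the issue above concerned Secure WebSockets feeding the call center UI, a simple connectivity probe against the WSS endpoint can help distinguish a browser-cache problem from a server-side one. This is only a sketch, assuming the third-party websockets package and a placeholder endpoint URL; check your own client configuration for the real address.

```python
# Quick secure-WebSocket connectivity probe (requires the third-party
# "websockets" package: pip install websockets). The endpoint is a
# placeholder; point it at whatever WSS URL your client is configured to use.
import asyncio
import websockets

WSS_URL = "wss://ws.example.com:5443"  # hypothetical endpoint

async def probe() -> None:
    try:
        # Opening the connection is enough to confirm the TLS/WebSocket
        # handshake succeeds end to end.
        async with websockets.connect(WSS_URL):
            print(f"Connected OK to {WSS_URL}")
    except Exception as exc:
        print(f"Failed to connect to {WSS_URL}: {exc}")

asyncio.run(probe())
```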
Report: "US West Network Degredation"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
Routing should be back to normal; we will continue to monitor for any further degradation.
We are currently experiencing intermittent routing issues on US West. Currently investigating.
Report: "We are receiving reports of intermittent delays in the web UI and are investigating."
Last update: We believe this incident is now resolved. We will continue to monitor accordingly.
We believe we have (now correctly) identified the source of the problem, which appears to trace back to a particular customer who is impacting the cluster. We apologize for the trouble this is causing. We are working to resolve the issue.
While the GUI is working, it is still a bit sluggish; we are still mitigating the issue.
We are still investigating some slowness on the web portal.
We believe this issue is resolved and are monitoring.
We are receiving reports of intermittent delays in the web UI and are investigating.
Report: "US-West power maintenance extended"
Last update: At this time, we believe we have completed the recovery of all private cloud resources after the power maintenance. If you are experiencing residual issues, please contact the support team. Thank you, 2600Hz Operations
The work required to recover from our scheduled power maintenance in our US-West datacenter has exceeded the amount of time in our scheduled window. At this time not all private cloud resources have recovered. We are working on this as our highest priority. We will update this incident when all resources have been restored. Thank you, 2600Hz Operations
Report: "Investigating call routing issues in EWR zone"
Last update: This issue is resolved.
We are continuing to monitor for any further issues.
This issue should be resolved, we are monitoring to be sure.
We have identified the issue in EWR and are working on a resolution.
We are investigating multiple reports of call routing issues in our EWR zone.
Report: "Multiple reports of calls going not ringing devices in ORD"
Last update: This issue is believed resolved.
This issue is believed resolved. We are monitoring the issue to ensure it does not recur. Please note that unfortunately call center agents may have been logged out of their queues in the process of resolving this. Please ask call center agents to log back in.
We have paused the ORD zone, calls should be processing normally again while we sort out the issue.
We are investigating reports that calls are completing in ORD but some phones are not ringing devices when they should be.
Report: "ZSwitch: Some Inbound Caller ID showing "UNKNOWN""
Last update: This issue has been identified and is now resolved.
We are investigating reports that some inbound Caller ID names are showing "Unknown"
Report: "We are investigating reports of outbound calls failing"
Last update: This issue was resolved at 10:35am ET this morning from a customer perspective. A root cause will be posted once the team has had time to meet about the findings on the incident, but no additional issues have been identified since then.
We have identified the source of this issue. Normal call processing resumed for outbound calls as of 10:35am ET. We are still mitigating the problem, but all services should be restored for clients who were experiencing the issue. We will update here once full redundancy is restored.
At this point, we have been unable to locate any faults in our system. We suspect a third-party provider issue, but do not have enough data to prove this yet. Due to the volume of tickets we received, it is clear that some clients experienced issues dialing outbound, while many others did not. If you are experiencing this issue, please contact support so we can look at your issue case by case - please make sure to include the phone number dialed and the account which made the call. We can then attempt to trace the issue further. Thank you.
We are investigating reports of outbound calls failing
Report: "Issues Provisioning Phones"
Last update: This issue should be resolved.
This issue has been identified and should be resolved. We are monitoring to ensure it is fully resolved.
We are investigating multiple reports of the provisioning service responding with 500 errors.
Report: "Latency issues being investigated"
Last update: All services appear restored and operational.
We believe a brief network issue occurred at one of our hosting providers (we have a few servers we host external to our datacenters for DoS protection purposes). The issue resolved on its own by the time we began our investigation. This may have caused some busy signals or failed call setups. This should not have impacted calls which were already active (since media is handled elsewhere). We are monitoring the situation but believe all systems to be operating normally at this time.
We are receiving alerts from our monitoring systems that there is latency in one of our datacenters. We are investigating.