Historical record of incidents for Simwood
Report: "London: Reduced fibre network redundancy"
Last update: The span has been restored and the links have been brought back into service, restoring full redundancy around our London ring.
We are seeing one of our physical fibre spans between Telehouse North and IXN hard down currently, suggesting physical disturbance. We have preemptively shut down links using this whilst it is investigated and repaired. This reduces redundancy around our London ring but as we have multiple other paths, no service impact is expected.
Report: "London: Reduced fibre network redundancy"
Last updateThe span has been restored and the links have been brought back into service, restoring full redundancy around our London ring.
We are seeing one of our physical fibre spans between Telehouse North and IXN hard down currently, suggesting physical disturbance. We have preemptively shut down links using this whilst it is investigated and repaired. This reduces redundancy around our London ring but as we have multiple other paths, no service impact is expected.
Report: "Slough AZ – At-Risk Advisory (Power Maintenance)"
Last update: This incident has been resolved.
The Slough Availability Zone is currently operating on a single power feed due to planned power infrastructure maintenance by our datacenter provider. While all services remain fully operational, the zone should be considered at risk until full power redundancy is restored. The maintenance window is scheduled to conclude by 22:59 UTC on Saturday, 10 May. We are closely monitoring the situation and will provide updates as needed.
Report: "Reduced network redundancy in Manchester availability zone"
Last update: All alarms have been cleared. Resolving this incident.
The optical card has been replaced and the link has been restored. Status will remain in “monitoring” until we have confirmation from the engineers and vendor, and until we have satisfied our internal checks.
There is currently reduced network redundancy in our Manchester availability zone due to an identified issue with an optical transport link. The vendor has identified an issue with a line card in Leeds and an engineer is being sent to perform a replacement, expected on-site at 22:15. No service or traffic is impacted, but the site is considered at risk.
Report: "At risk - London Availability Zone"
Last update: Volta has been returned to N+3 connectivity and Telehouse East reconnected at N+1. Voice service was unaffected by this incident and full redundancy in the affected one of our 3 UK Availability Zones has been restored.
The datacentre have confirmed that these two fibre pairs were indeed separately cut within the building and are reviewing repair options.
We noticed some instability on the network around London this afternoon which resulted from two of our fibre pairs out of Volta (both East and West loops) having been disconnected/cut within the building within 5 seconds of each other. This is being investigated but the result is that Volta is reduced to N+1 from N+3 redundancy, and Telehouse East (which has no voice services whatsoever) has been temporarily isolated. The network continues to operate at 100% otherwise with no interruption to voice or related services but further events are always possible. We do not expect a rapid resolution to the cut fibre but will update when there is one.
Report: "Inbound calls to some hosted ranges failing"
Last update: Recently we discovered that calls being sent to us by BT, particularly to ported numbers, were increasingly including the destination number in a non-standard (i.e. invalid) format. These calls were as a result matching unexpected routing and causing our customers issues. However, upon reporting this non-conformity to BT they confirmed they were unable/unwilling to fix it, since the format matched their own routing plan and they did not have the flexibility in their routing engine to accommodate a fix. We therefore subsequently prepared a config change in order to accommodate the invalid numbers.

Automated testing by replaying historical live call scenarios, and continuous deployment, are standard practice for us and, following completion of that, we initiated the rollout to production during the afternoon of 14th November. The rollout was to each call routing instance, of which there are many in each of our 5 availability zones, watching channel/call levels closely for any signs of issues. Part-way through the rollout, at 15.45, we were alerted to a number of inbound calls being rejected by some customers due to the RURI and To header in the outgoing INVITEs for those calls being truncated. We halted the rollout and rolled everything back to the previous state, which remedied the situation, with normal state confirmed by 15.55.

Following further investigation it was discovered that in the config change and suite of tests we had failed to consider a particular routing scenario involving hosted ranges, applying to a very small number of customers. In all, around 2% of _inbound_ calls across the entire network were affected, which made it extremely difficult to identify from the channel metrics, particularly as we were now intentionally rejecting the improper calls BT couldn’t/wouldn’t fix. Whilst the overall impact was very small and our Community Slack was uncharacteristically silent on the issue, the 15 customers affected by this issue on their hosted ranges in some cases saw a much higher percentage of calls affected, depending on their individual traffic and configuration mix.

In terms of lessons learned from this incident, we do not believe that not making changes at all, as some would advocate, is a competent approach. Equally, we do not believe it is acceptable to deploy out of hours when some scenarios are absent, only to see issues the following day when they return, the deployment is complete, and attention has turned elsewhere - that assures bigger impact, later, and a slower response. Further, with a large distributed network, our approach of progressive automated roll-out is one we defend over manual updates to monolithic instances. Thus, as is our standard practice when our test suite fails to accommodate a scenario, it has been updated to do so. This enables us to continue to rapidly iterate, with automated testing providing the assurance it has so far through thousands of deployments, and absolute consistency around the network.

We’re sorry to those customers affected who, for the avoidance of doubt, had nothing “wrong” in their configuration at all. It was simply an edge case we’d missed, but which will now be tested automatically with every committed change in future.
At around 15.45 this afternoon we were made aware of inbound calls to some number ranges hosted on our network, in some circumstances, being delivered to customers with the destination number in the RURI and To header truncated, resulting in those calls failing to connect due to the target number not being recognised. Within 5 minutes we had identified that the issue was related to an update we were in the process of rolling out, which was intended to work around an increasing number of calls being sent to us by BT with invalid destination numbers. This was necessary because they were unwilling/unable to fix this in their own call routing, the numbers matching their routing plan despite being invalid. By 15.55 we had rolled back the update across all availability zones, and affected customers had confirmed calls were once again being delivered as expected. We are currently working through examples provided to understand why each specific routing case was impacted and will update here with our findings in due course.
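The scenario class described above lends itself to an automated regression test. Below is a minimal, hypothetical sketch (the function name, example number and prefix handling are illustrative assumptions, not Simwood's implementation) of the kind of check that would catch a hosted-range destination being truncated by a normalisation change.

```python
# Hypothetical regression test: replay a hosted-range scenario and assert the
# normalised destination is complete, never truncated. The normaliser below is a
# stand-in for the config change described above, not the real implementation.

def normalise_destination(raw: str) -> str:
    """Toy normaliser: keep only digits and rewrite a UK national '0' prefix to +44."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if digits.startswith("0"):
        return "+44" + digits[1:]
    return "+" + digits

def test_hosted_range_destination_not_truncated():
    raw = "05511234567"                      # hypothetical hosted-range number
    out = normalise_destination(raw)
    assert out == "+445511234567"            # full number survives normalisation
    assert len(out) == len("+44") + len(raw) - 1

test_hosted_range_destination_not_truncated()
```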
Report: "Elevated PDD"
Last update: At 14h44 we were notified of the failure of a primary database node in London (Volta). This is a planned failure scenario and, as designed, service failed over cleanly to a candidate replacement in Slough (LD4). At 14h50 our call monitoring reported increased PDD (Post Dial Delay) from some parts of the network. This was owing to several call-routing nodes, which were previously slaves to the failed master, resyncing and thus being unavailable for service. In this scenario, call-routing fails over to other back-up instances, which it did. Depending on the precise local state at the time of the call this can increase PDD. The first node had resynced by 14h58 and by 15h03 the last node had fully resynced and PDD had returned to normal levels everywhere. Our monitoring shows that less than 15% of calls network-wide were impacted by increased PDD but customer experiences may vary according to their own timers and failover protocols. We are however investigating utilisation of the backup routing instances which, whilst not experience-affecting, was not as evenly distributed as designed.
Report: "Delayed inbound SMS"
Last update: We have identified delays in inbound SMS delivery.
Report: "London SIP edge"
Last update: This incident has been resolved.
This has been stable since the incident was opened but we continue to investigate the root cause and will likely need to continue to do so once the incident is closed. DNS has been reverted, but we again urge customers to respect SRV, or at least DNS, for seamless failover in cases like this.
We are seeing elevated errors on our London SIP edge. Whilst this will not affect customers properly configured to use FQDN and SRV, we have manually failed DNS over to another Availability Zone for those who are not. We are investigating the underlying issue.
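For customers who are not already doing so, the following is a minimal sketch of the SRV-based failover encouraged above. It assumes the dnspython library, uses a placeholder record name rather than a published Simwood hostname, and simplifies RFC 2782 weight handling to a plain sort.

```python
# Sketch: resolve SIP SRV records and try targets in priority order instead of
# pinning a single IP address. The record name below is a placeholder.
import dns.resolver  # pip install dnspython

def ordered_sip_targets(srv_name: str) -> list[tuple[str, int]]:
    answers = dns.resolver.resolve(srv_name, "SRV")
    # Lower priority first; higher weight preferred within a priority (simplified).
    records = sorted(answers, key=lambda r: (r.priority, -r.weight))
    return [(str(r.target).rstrip("."), r.port) for r in records]

# Usage: attempt each target in turn, moving on if one Availability Zone is impaired.
for host, port in ordered_sip_targets("_sip._udp.example.net"):
    print(f"next candidate: {host}:{port}")
```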
Report: "Manchester power issues"
Last update: At 10.01 we lost reachability to equipment in Manchester over certain routes and a loss of power in Equinix Kilburn became apparent. Whilst power and service were restored a few minutes later we understand the site is running on generators and a UPS fault has been identified. We have very little active equipment in Kilburn but it is a major hub for networks and further power issues will affect reachability; it should be considered at risk.
Report: "Brief failures on inbound calling"
Last update: It appears that a routine upgrade to our call routing engine disrupted some incoming calls this evening from 20:25 to 20:57. This type of update is quite normal for us and happens very often as part of our continuous development and deployment. It was being progressively rolled around the network such that calls could fail over to the previous version in the event of any failure. However, it did not fail outright and, whilst passing all unit tests, appears to have caused unexpected call rejections for calls hitting it. The deployment was paused as soon as we were aware of issues and has been rolled back. We will investigate the underlying issue and resolve it before resuming continuous deployments. Apologies to anyone affected.
Report: "Power loss in London Volta"
Last update: All services have been fully restored in Volta.
Portal access has been fixed for all customers and we continue to monitor all services.
Whilst continuing to monitor the earlier problem, we are aware that the portal is unavailable for some customers and are addressing that issue.
We have been checking services and are confident all is working normally. We will be monitoring this until power is fully restored and services migrated back.
We have lost a power feed in Volta which has reduced redundancy across the whole site and taken some servers off-line. Affected services have already automatically migrated to other sites for those following our standard configurations and DNS has been modified for those forcing traffic to this particular site. Customers who haven't followed the interop at all (and are forcing traffic by IP address) will need to manually update their config. We are monitoring the situation and will advise on recovery in due course.
Report: "Intermittent media delays in calls"
Last update: This incident is now closed. Thanks to our customers for their assistance and patience and apologies once more for the service issue.
We are continuing to monitor for any further issues.
No new valid reports have been submitted since the above change. We remain alert to any fresh evidence but for now remain in close monitoring mode. Thank you for your patience and assistance with this issue and our apologies for any degraded service.
As part of continuing to investigate this issue, a change was made around midday, so we continue to welcome fresh examples of any issues since that time to aid the investigation. Thank you.
We have had sporadic reports of delays in audio commencing on calls and have been investigating these since late last week. We would welcome fresh examples to assist us. Please submit these through team@simwood.com. Thank you
Report: "Intermittent call issues"
Last update: We received reports of intermittent call issues from around 9:10. Engineers were working on reports from internal monitoring systems at this time and no further reports or alarms have been seen since 9:20. The service is continuing to be monitored.
Report: "Portal issues"
Last update: The Simwood Portal has been stable for some time and the issue is now resolved.
We have implemented a fix and the Simwood Portal is now accessible again. We will monitor the situation for the next hour or so.
We are investigating an issue with the Simwood Portal.
Report: "Intermittent issues with portal / API - degraded performance"
Last update: This incident has been resolved.
We are continuing to investigate the cause of the issue and to identify a robust resolution - the portal is accessible again.
The issue has been identified and a fix is being implemented.
We have received reports of intermittent issues that are impacting the API / Portal. This is currently being investigated and we'll update accordingly.
Report: "Intermittent API/portal timeout"
Last update: Reports are processed sequentially and a single customer was effectively causing a Denial of Service with thousands of requests for a heavy report, which was in turn delaying others beyond the API's timeout. The API, portal and the platform in general were otherwise fine and this functioned as expected.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
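As a rough, purely illustrative worked example of the failure mode described in the final update above (all numbers are assumptions, not measured values): with reports generated sequentially, a single burst of heavy reports from one customer pushes everyone queued behind it well past the API timeout.

```python
# Illustrative only: sequential report generation plus one customer's burst of heavy
# reports means the wait for the next customer far exceeds the API timeout.
HEAVY_REPORT_SECONDS = 20        # assumed time to build one heavy report
QUEUED_HEAVY_REPORTS = 1000      # assumed burst from a single customer
API_TIMEOUT_SECONDS = 30         # assumed API timeout

wait_for_next_customer = HEAVY_REPORT_SECONDS * QUEUED_HEAVY_REPORTS  # 20,000 s
print(wait_for_next_customer > API_TIMEOUT_SECONDS)  # True: later requests time out
```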
Report: "Reports of Network issues"
Last update: We saw instability in a single router within our Telehouse East site. This was due to a hardware limit being reached which we believe was the indirect result of overnight maintenance at a peer network. Addressing this caused widespread reconvergence across the network. We are monitoring the status of this router in case of recurrence but it has been stable since 9.34am.
A fix has been implemented and we are monitoring the results.
We are still investigating, and traffic appears to have improved.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Reports of audio issues"
Last update: We saw network instability in our Telehouse North location which we believe to be due to malformed routes propagated externally from peers, triggering a reload of the BGP process. This in turn caused instability and packet loss for any calls traversing that equipment. Affected external BGP sessions were disabled, restoring stability. The underlying issue will be investigated out of hours, and mitigated over the medium term.
We received 7 examples of affected calls which helped track down a network disturbance leading to audio loss on a very small number of calls. Once isolated, we have been able to resolve the issue and have received no further reports. If you do notice any further issues, then please get in touch.
We believe we have identified the cause of these audio issues and from the examples provided we can see that the issues aren't widespread - it has only affected a very minor volume of calls (less than 10). The audio issues have been highly intermittent and we are currently working on implementing a fix to ensure these issues no longer persist.
We are continuing to investigate the cause of the audio issues and performing the necessary tests in order to identify where the cause lies. We have received a few examples which is helping in the diagnosis, and when we're able to identify and/or confirm anything further, we'll provide the relevant updates.
We have received reports of audio issues and are currently obtaining and investigating examples.
Report: "VoIP DDoS Preparations – IMPORTANT customer update"
Last update: We have completed work in connection with this potential threat and have interacted and tested with customers, so this notice is marked as resolved. The potential threat of DDoS attacks remains.
We have been monitoring the current DDoS situation and working with industry colleagues. Our adapted plans for handling such an attack against Simwood properties have been published on our blog at https://blog.simwood.com/2021/09/voip-ddos-preparations-important-customer-update/. Some of those changes require customer action in order to benefit from them. Please digest this blog posting and contact us for any advice.
Report: "Support Ticket System down"
Last update: The Zendesk service is now operating satisfactorily.
Zendesk have reported this as resolved and we are not experiencing any major issues, although some minor internal-only issues appear to remain.
Zendesk is still in the middle of a service interruption but is mostly operational for our staff now. We are mitigating the areas where it isn't. We anticipate that ticket and porting functions should work for our customers, albeit that Zendesk are still working on their service.
Service has partially been restored but it remains slow and sporadic so you may experience some issues or delay. Please continue to mail to team@simwood.com directly.
Our support system, supplied by Zendesk, is down as per status.zendesk.com. This will affect our ability to receive tickets from the portal and some porting functions. We will continue to process support tickets mailed into team@simwood.com.
Report: "Portal / API working slowly for some functions"
Last update: This incident is now resolved. Apologies again for any interruption accessing information.
A change has been implemented and we believe all services are responding correctly and promptly. Please raise a ticket or comment in our Community Slack channel should you find otherwise. We will continue to monitor the service in the interim. We apologise for any interruption to your service with us.
Some mitigation has been applied and some functions appear to be working correctly. Those are being monitored but work continues in other areas.
The API is working slowly for some functions and, as the Portal works off of that, it is also affected. We have noted that downloads of CDRs, rates and invoices are reported as being affected and are investigating.
Report: "Calls not completing to imported numbers"
Last update: This incident has been resolved. Apologies to those affected by this.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring calls to previously affected numbers.
A partial fix has been implemented which should help some calls. Work on the core problem is ongoing.
An issue has been identified for inbound calls to some ported numbers and is being worked on.
We are receiving notifications of calls not completing to some ported numbers and are investigating.
Report: "Support Ticket system failure"
Last update: This incident has been resolved.
Zendesk appears to be functioning normally now. Their status page reflects there is an incident, so we will continue to monitor performance and their updates.
Zendesk has acknowledged they have a problem. Their status page doesn't reflect this at this time.
Our Support Ticket system, provided by Zendesk, is down. We can continue to take calls and tickets emailed to team@simwood.com. We have raised an urgent ticket with Zendesk.
Report: "Google outage"
Last update: Google report the incident resolved and we're still seeing email flowing.
We are seeing some emails arriving now although the service is still down according to Google.
Google have now acknowledged the issue on https://www.google.com/appsstatus#hl=en&v=status
Google appear to have a large outage which they have yet to acknowledge. We use Google Apps for our email and thus are presently not receiving emails into our ticketing system. If you have urgent requirements, please telephone us. No Simwood services are otherwise affected.
Report: "September CDRs"
Last update: The cluster remains stable and reconciliation continues to run in the background.
The cluster is fully restored. We will monitor for an hour before closing this incident and will perform a reconciliation of CDRs in the background, which should complete over the next 24 hours.
We currently have an issue with some of the search nodes that present CDRs to customers through the portal and API. This means CDRs viewed through the portal or API are incomplete. We are working on restoring the cluster to 100% but will then need to perform a reconciliation of CDRs for September, against the master database, which we will do over the next 24 hours.
Report: "API access is failing for some services"
Last update: This incident has been resolved. Apologies for the API service interruption.
A range of errors have been reported from the Simwood API. The problem has been identified and a fix is being worked on.
Report: "Support Ticket system failure"
Last update: Our Zendesk ticketing service continues to operate satisfactorily and Zendesk are now in monitoring mode, so this incident will now be closed.
The Zendesk ticketing system service seems good for us although they continue to work on it: "Pod 18 is online in a degraded state. Our teams are working to restore service. We sincerely apologise for the disruption this has caused to your Zendesk service." So this service status remains open, but left as monitoring, and will be updated when Zendesk's work is complete.
The Zendesk ticketing service has resumed but is slow.
An update from Zendesk (we're on Pod 18): We’re seeing some improvements on Pod 17. We are still working towards a full resolution for Pod 18. Please bear with us as we work to fully resolve this issue. The next update by us will be by 3pm.
Zendesk remains unavailable, which also means you will be unable to submit porting requests via the portal at this time. No further information is available yet. Any update available will be provided by 2pm.
Our support ticketing system is hosted by Zendesk and they are experiencing server issues, which can be seen at https://status.zendesk.com/. The submission of tickets via the support centre is affected by this, as is our ability to process existing tickets. In the meantime, please mail ticket requests to team@simwood.com.
Report: "Vodafone issues"
Last update: This incident has been resolved.
Having previously taken all Vodafone interconnects out of route whilst testing, we have determined this is only affecting our Vodafone interconnects in the Leeds area. All others have been restored to service now and are passing traffic. We will leave those in the Leeds area down and escalate with Vodafone separately. We will monitor for an hour and then mark this issue resolved if that proves to be the case; we have ample redundant capacity between our networks elsewhere.
We have noticed a higher number of failures than normal to Vodafone destinations out-with the Simwood network. We have made on-net changes where we can but are investigating further.
Report: "Vodafone issues"
Last update: This incident has been resolved.
We are continuing to monitor this.
Inbound volumes have now normalised and things look much better from here across all our Vodafone interconnects. We are still seeing elevated volumes of outbound traffic directly to Vodafone, which could illustrate issues elsewhere still.
Our outbound traffic directly to Vodafone is now at elevated levels (suggesting issues elsewhere still) but we're only seeing about 15% of the usual levels of inbound traffic from the Vodafone network.
Our Leeds interconnects are now passing traffic and levels are returning to normal.
Our interconnects with Vodafone in London and Manchester are back passing calls, but we see those in Leeds still down. Traffic is running at about 40% now.
We're now seeing a low level of calls to/from Vodafone completing.
Vodafone operated services such as 101 are not available as part of this incident. 101 is the non-emergency contact number for any police force in England and Wales and it is available 24 hours a day, 7 days a week
We're now seeing 0% of normal traffic to and from Vodafone destinations. This issue appears to have started at 13.07 BST and is ongoing.
We're seeing traffic on our bilateral interconnects with Vodafone operating at about 10% of normal levels and hearing widespread reports of issues on the Vodafone network. This is not a Simwood issue and is off-net, but will be affecting a large proportion of calls. We have mitigated on-net as far as possible and will update with any resolution by Vodafone.
Report: "Reports of issues on Vodafone Network"
Last update: This issue appears to be resolved and our monitoring shows calls completing normally.
We are continuing to monitor this issue.
We are aware of an issue affecting calls to and from numbers on the Vodafone network. This fault is beyond the Simwood network, and our own services are operating normally, however we will continue to update this incident with further information as it becomes available.
Report: "London edge proxy restart"
Last update: We have been closely monitoring since the restart and are happy that everything is working as expected.
We have just performed (16.37 BST) an emergency reload of one of our London edge proxies which had stopped processing TCP traffic. Calls were failing over to Slough internally but with increased PDD.
Report: "Outbound SMS"
Last update: We have determined this incident has been resolved.
There was an issue this morning with outbound SMS via the portal and API timing out. We have identified this issue and it is now resolved. We continue to monitor this and will update accordingly.
Report: "BT Openreach porting"
Last update: BT appear to have closed their Openreach porting team. Please see our blog for background: http://blog.simwood.com/2020/03/bt-to-halt-porting-operations-an-open-letter-to-ofcom/ We will update open porting orders as and when we have more information but at the present time it looks like they will not complete.
Report: "Invoices in Portal and API"
Last update: This appears to be resolved and is now working as expected. We will continue to monitor this, and await a response from Xero regarding the underlying cause. Please accept our apologies for any inconvenience caused.
This has been identified as an integration issue between Simwood and the Xero API and is being investigated. Only retrieval of invoices is affected; invoices are still being generated as normal and this does not affect any other aspect of billing, such as CDRs.
We are currently investigating an issue preventing the display and retrieval of invoices in the Portal and API. No other services are affected and we would like to reassure you this does not affect any other aspect of billing, such as CDRs.
Report: "Calls issues"
Last update: At 11.08 today we were alerted to call volumes reporting as lower than normal and declining. We also began to receive reports of increased PDD and 503 call failures in our community Slack channel. Our investigations later identified the root cause of these intermittent failures to be excessive memory fragmentation on the master Redis node, causing increased latency and connection failures. Those connection failures caused slave nodes, which are distributed throughout the network and used for all read activities, to in some cases resynchronise. Two call-routing nodes, one in Slough and one in Volta, alerted on intermittently increased PDD as a result of one of the (many) local nodes they were querying being in this unstable state. Other nodes and those in other sites were functional at this stage.

By 11.27, mid-investigation, our system automatically elected a new master node, and call volumes immediately began to climb, reaching normal levels quite quickly. In hindsight, this was the primary issue mitigated. However, by this time we were seeing more widespread reports of 403 ‘out of call credit’ and ‘account not enabled’ errors, both for call traffic and in the portal. These were network wide and not just restricted to a few nodes. We realised that numerous accounts were marked as ‘credit blocked’ and by 11.54 had manually reset them, causing the remaining accounts to have successful calls again.

As our investigations continued, during which service was working normally, there was a second instance of some ‘credit locked’ accounts at 13.47. These were immediately corrected and this repeat assisted us in identifying the cause. We subsequently discovered that one of the services responsible for monitoring calls in progress and disabling accounts had not failed over and was still connected to the old master, and thus continuing to experience connection difficulties. A bug was identified which meant that should this (out of band) service fail to get a returned value for a particular key in Redis (rather than a negative result), the account was treated as disabled. ‘Disabled’ accounts with call attempts are blocked at a different level in our stack, to enable more efficient call rejection. This accounts for the progression of error messages some customers were seeing, i.e. they were first identified as disabled, then marked as blocked for no credit. This calls-in-progress service was stopped as a precautionary measure, the bug was patched and the service was restarted. The incident did not then recur.

Technically speaking, service was degraded from 11.07 until 11.27, with a relatively small percentage of calls affected. However, as a number of accounts experienced complete network-wide call rejection until 11.54, starting at different times for each affected account, we are treating this as an SLA-eligible incident from 11.07 until 11.54, and again, for 1 minute at 13.47. This grossly overstates the aggregate impact according to our statistics but we appreciate that those accounts that were affected experienced a complete loss of service.

We’ve learned some useful lessons through this incident and will schedule remedial work to prevent a recurrence as soon as possible. We’re sorry for the disruption caused here and very grateful to our community Slack members who provided helpful insight to complement our own telemetry, which helped us identify a potentially elusive issue.
This incident was resolved earlier but we've now determined and mitigated root cause. A Post Mortem will follow shortly. We're sorry for the disruption caused and confirm this will trigger SLA credits for eligible customers.
Calls in progress are now updating again.
We have just stopped a service that updates the "calls in progress" values in the portal and is related to credit control. This means the portal will not show calls in progress for the moment.
A fix has been implemented and we are monitoring the results.
We've seen our call volumes drop and a number of customers in our community Slack channel have reported issues. We're investigating.
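The post-mortem above describes a credit-control service that treated "no value returned from Redis" the same as "account disabled". The following is a minimal, hypothetical Python sketch of that bug class and an obvious correction; it is illustrative only and not Simwood's code, and the key names and fallback behaviour are assumptions.

```python
# Hypothetical illustration of the bug class: failing to distinguish "Redis did not
# answer" from "the key explicitly marks the account as disabled".
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)  # placeholder connection

def account_enabled_buggy(account_id: str) -> bool:
    try:
        value = r.get(f"account:{account_id}:enabled")
    except redis.exceptions.ConnectionError:
        value = None
    # BUG: None may simply mean the (old) master could not be reached, yet it is
    # treated like an explicit "disabled" flag, so healthy accounts get blocked.
    return value == b"1"

def account_enabled_fixed(account_id: str) -> bool:
    try:
        value = r.get(f"account:{account_id}:enabled")
    except redis.exceptions.ConnectionError:
        return True  # fail open, or retry against the newly elected master
    return value != b"0"  # only an explicit negative result disables the account
```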
Report: "Bristol Office"
Last update: We are now back in the office and calls are being answered as normal.
Due to a fire alarm within the Bristol office we are currently unable to receive incoming calls to our main telephone number and ticket responses may be delayed. All Simwood services, and the Simwood network, remain unaffected. Please accept our apologies for any inconvenience this may cause.
Report: "Virgin Ported Numbers"
Last update: As of 1845 Virgin have confirmed that this should be resolved, and traffic has been re-routed so affected numbers should now be in service. This incident is part of a wider issue affecting the Virgin network after a third party damaged part of their fibre infrastructure and is outwith the control of Simwood.
We are aware of an issue affecting numbers ported from the Virgin network to Simwood. Virgin have confirmed they are experiencing issues with a fibre break and are working to re-route the affected traffic. This fault is outwith the Simwood network, and our own services are operating normally, however we will continue to update this incident with further information as it becomes available.
Report: "Elevated PDD"
Last update: This incident has been resolved.
The PDD issue has been identified and addressed. Traffic over the last five minutes appears to be returning to normal. We will continue to monitor traffic.
We have identified an issue with excessive PDD on some inbound and outbound calls causing calls to fail with a timeout affecting many customers. We will update as soon as we have affirmative information and by 11:50 regardless.
Whilst aggregate volumes look normal, some customers have reported high PDD or timeouts on certain calls. We're investigating.
Report: "SMS Services"
Last update: This was resolved as of 1615. We are aware that some customers are continuing to receive messages from earlier; this is a result of them being queued upstream of Simwood and we cannot control when the originating network retries where the message could not be delivered initially. Please accept our apologies for any inconvenience caused.
We are aware of an issue affecting some SMS message deliverability and are investigating. Some customers may experience delays in inbound and outbound SMS. This incident will be updated with more information as soon as it is available.
Report: "Database Performance"
Last update: Normal service was restored at 1045, and the backlogged CDRs were processed entirely by 1300. This has been monitored continuously since and has operated as expected. Please accept our apologies for any inconvenience this caused.
We believe this is now resolved and are continuing to monitor the situation. The Portal and API should function as expected, although CDRs will remain backlogged on some accounts for a short time. We will update this incident further when complete.
We are experiencing performance issues with our primary database cluster and are investigating as a priority. As a result, you may experience issues with the performance of the API and Portal; new provisioning and reconfiguration of numbering is not possible, and CDRs are currently lagging behind live. Normal calling and messaging services are unaffected, as these are completely isolated from the primary database. Please accept our apologies for any inconvenience this causes. We will update this incident with further information as soon as it becomes available.
Report: "Support Centre SSL"
Last update: Our Support Centre SSL certificate has changed. This may generate a warning in some browsers, depending on your configuration, or you may notice the "Green Padlock" provided by EV Certificates on some browsers has been replaced. This does not affect the security of your account or tickets, nor does it affect any calls made using TLS or access to the API or Portal, even where strict validation is in use.

Due to an error made by COMODO when issuing our SSL certificate, uncovered as part of a Comodo/Sectigo CA internal audit, there is an encoding error in the certificate used for our support site, support.simwood.com. Although this is an encoding issue and does not affect website security, the CA/B Forum Baseline Requirements for the Issuance and Management of Extended Validation Certificates require that the CA corrects the error by revoking the previously issued certificate and issuing a new certificate. Unfortunately, they have only given just over 24 hours' notice of this, and will not be able to issue a new certificate before the previous one is revoked, as the re-issuance process takes around one working day.

As a result we have temporarily moved the Support Centre to use a certificate provided by LetsEncrypt (https://letsencrypt.org), which is an automated CA operated by the non-profit Internet Security Research Group. This does not affect the API or Portal, which use a different SSL certificate unaffected by this issue.
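For anyone wanting to confirm which CA is currently presenting for support.simwood.com, the following is a small sketch using only the Python standard library; the printed issuer simply reflects whichever certificate is in place at the time, and the expected output is an assumption based on the update above.

```python
# Check the issuer of the certificate currently served by support.simwood.com.
import socket
import ssl

HOST = "support.simwood.com"
ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

issuer = dict(rdn[0] for rdn in cert["issuer"])
print(issuer.get("organizationName"))  # e.g. "Let's Encrypt" while the temporary certificate is in use
```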
Report: "Database cluster issues"
Last update: Billing has fully caught up. Thanks for your patience.
Failover is largely complete and CDRs are now being processed.
We are about to commence failover to the standby cluster as this query rollback is showing no signs of concluding. Once this is concluded we'll mark this incident as 'monitoring'. There are several million CDRs to catch up on so we will leave it unresolved until they are processed.
This remains ongoing but we are making progress. The offending query remains on one node and continues to be in the process of rolling back. Unfortunately, rolling back is less efficient than the problem it caused in the first place. Note this is not an issue with the query per se (a single row delete) but an internal Galera issue triggered by it. Until this rollback completes the cluster remains effectively write-locked but serviceable for reads. We know why this happened and how to prevent it going forwards and have backup nodes with current data ready to take over should we decide to fail over from the existing cluster. As we have no idea whatsoever how long the trigger query will take to roll back on the final node, we have held off failing over in anger in the hope it may be soon, but cannot delay indefinitely. Call traffic remains unaffected and our ops team have been handling most urgent customer issues such as locked balances. We will therefore continue monitoring and update here should anything change.
Whilst not affecting call traffic, we are presently unable to write to our primary database cluster. This is due to an overnight job triggering a bug. The query will eventually work through but we have no way presently of determining how long that will take. We are meanwhile investigating more invasive options. In the interim, this means portal, API and administration options which would normally update the database (e.g. billing, number allocation and pre-pay top-ups) are delayed or non-functional. We're sorry for any impact this will have but, to repeat, call traffic is not affected.
Report: "Database cluster issues"
Last update: The database issues were resolved as of 2154 UK time and CDRs were catching up thereafter. All has been back to normal for some time and we're therefore closing this incident. Thanks for your patience.
The database appears to be recovering and we're continuing to monitor the situation.
We are monitoring a situation with our primary database cluster. We have an expectation of this remedying itself in the next few hours but have an action plan in place for overnight if it does not. In the interim, whilst there is zero impact on production call traffic, CDRs, number provisioning and other configuration changes will be delayed. Given the late hour and non-impact on call traffic, we do not intend sending notifications for updates until resolution or a dramatic change in circumstances. We will however update this page where possible.
Report: "London - Reduced fibre network redundancy"
Last update: This link has been brought back into service, restoring full redundancy around our London ring.
We are seeing one of our physical fibre spans between Telehouse North and Interxion hard down currently, suggesting physical disturbance. We have preemptively shut down links using this whilst it is investigated and corrected. This reduces redundancy around our London ring but as we have multiple other paths, no service impact is expected.
Report: "Elevated PDD"
Last update: Whilst we have very few credible examples here, and all of them demonstrate non-compliance with our interop, we have been able to investigate this issue and pushed some code changes.

We use anycast at every level of our stack, with micro-services consumed by voice nodes anycasted, and the Redis nodes consumed by them similarly anycasted. It is, therefore, incumbent upon call-routing nodes to monitor the health of the services they're consuming, and fail over should the IP address respond but the service not be available. We've found that at certain times, usually after the Redis master has performed a backup, the Redis slaves which are consumed by call-routing show a slight increase in latency. This increase in latency was slight (sub-second) but the tolerances before failover were too tightly set. This caused call-routing to return a failure response, forcing a lookup against a backup [unicast] instance. However, this response was malformed but valid, causing the voice routing node to actually fail the call, and our edge proxy to try another. Further, that voice routing node would be taken out of service for a few seconds, causing something of a cascade which manifested in increased PDD.

In the first instance, we have pushed a change which prevents the false-positive trigger here, i.e. more tolerance of latency increase, and a properly formed failure response. We have however tasked further improvements to prevent the increasing latency in the first place.

Lastly, we do need to highlight that this was only present in one particular site, which anyone conforming to our interop would not have been sending traffic to. The root cause of traffic ending up here in many cases appears to be very old versions of Asterisk which do not respect DNS TTL and will continue to cache a host-name until restarted. Others are hardcoding IP addresses. Whilst there was limited scope for some inbound calls to have been affected, customers who were sending outbound traffic according to our interop, using equipment which respects DNS record expiry, were unaffected.
We are still investigating this but are pleased to say it was short-lived and we have no examples since 12.35BST. We have the grand sum of 7 example calls with PCAPs, after stripping out other issues such as invalid numbers or unrelated interop issues. We are working through those, our own telemetry and monitoring but so far, we have not found the cause. As an aside, all remaining example calls were forced to our London site, either owing to stale DNS or non-use of FQDNs.
Whilst aggregate volumes look normal, some customers have reported high PDD or timeouts on certain calls. We're investigating.
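As a minimal sketch of the kind of failover logic discussed in the post-mortem above (hypothetical names, addresses and thresholds; not Simwood's code): tolerate brief sub-second latency increases on the anycast Redis instance and only fall back to a unicast backup after repeated breaches, returning a well-formed result either way.

```python
# Hypothetical sketch: latency-tolerant lookup against an anycast Redis instance,
# falling back to a unicast backup only after consecutive slow or failed replies.
import time
import redis  # pip install redis

ANYCAST = redis.Redis(host="192.0.2.10")             # placeholder anycast address
UNICAST_BACKUP = redis.Redis(host="198.51.100.10")   # placeholder backup address

LATENCY_BUDGET = 0.5             # seconds; generous enough to ride out post-backup slowdowns
BREACHES_BEFORE_FAILOVER = 3
_consecutive_breaches = 0

def lookup(key: str):
    """Return the value for key, preferring the anycast instance."""
    global _consecutive_breaches
    start = time.monotonic()
    try:
        value = ANYCAST.get(key)
        slow = (time.monotonic() - start) > LATENCY_BUDGET
        _consecutive_breaches = _consecutive_breaches + 1 if slow else 0
        if _consecutive_breaches < BREACHES_BEFORE_FAILOVER:
            return value
    except redis.exceptions.RedisError:
        _consecutive_breaches += 1
    # Fall back to the unicast backup; the caller always gets a well-formed result.
    return UNICAST_BACKUP.get(key)
```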
Report: "Elevated PDD"
Last update: We believe this is now resolved and apologise to anyone affected. Please remember to use DNS in accordance with our interop guidance, to enable speedier resolution of issues like this. We still saw more traffic hitting London than other availability zones even after swinging it away in DNS, suggesting many are not. Thanks also to those in our community Slack for the realtime feedback.
The DNS change at 11am BST (10 UTC) rectified this for all customers using out.simwood.com rather than forcing traffic to London. New nodes have since been deployed in London and are now carrying most traffic. Former nodes are being drained down and will be destroyed once clear.
Some of the nodes in our London availability zone are showing elevated load. DNS has already been changed to Slough and we're in the process of adding new nodes to London in order to cycle the existing ones out.
Whilst aggregate volumes look normal, some customers have reported high PDD or timeouts on certain calls. We're investigating.
Report: "Intermittent issues to UK 080 destinations."
Last update: This has been resolved by rerouting affected destinations.
We're aware of intermittent issues reaching some UK 080 (Freephone) destinations. We're working to reroute traffic and this should be complete shortly. Inbound calls to Simwood or hosted 080 numbers are not affected.
Report: "Slough - spurious 503 errors"
Last update: As discussed in the community Slack channel, we had credible reports of unexpected 503 errors from some calls routed to our Slough Availability Zone. Volumes were also showing as lower than normal in our own telemetry. On investigation, one of the edge routing containers was in an unusual state whereby it was unable to route calls onwards internally. It was up and otherwise responding normally. This was rectified by restarting it. We did not force a DNS change on this occasion as the issue was resolved quickly once credible reports were received, but appropriately configured customer equipment should have respected published SRV records and retried via another site. We will log a non-conformance report internally, as our own monitoring should have detected this situation and will need work to do so.
Report: "Intermittent Calls Failing"
Last update: This incident is fully resolved and CDRs are caught up.

In an overnight configuration change, one of our customers created a situation where, for each inbound call to their numbers, they generated repeated attempts outbound to that number, sometimes thousands. This was configured across multiple numbers and affected two of them in separate incidents this morning. The consequence was tens of thousands of additional calls in-flight outbound at any one time, which were then coming back in to the Simwood network. To make matters worse, their outbound calls were also egressing over other routes, but looping back and coming into the Simwood network from various other carriers. We show over 1m such calls in the first hours of this morning. The customer's equipment was then overloaded, causing calls to eventually time out, but of course compounding the number in-flight. Calls were rejected here due to rate and channel limits but the rate and amplification were such that this didn't totally alleviate the problem.

From the Simwood side, this caused load issues predominantly in London which manifested as increased PDD for customers. Based on reports so far and what we've seen, other Availability Zones were unaffected but we were seeing this traffic across all of them. In both cases the numbers were blocked here, which restored the situation to normal, and, separately, BT NMC mitigated a problem caused for them by 'gapping' (a.k.a. rate-limiting) the affected number range on their network, for calls destined for Simwood having originated from other carriers we do not have bilaterals with.
Calls should be completing normally now but we’re aware there is a delay in CDRs being processed.
We are investigating sporadic reports of calls failing, and more information will be provided as soon as it becomes available.
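The post-mortem above mentions calls being rejected under rate and channel limits. As a purely illustrative sketch (the thresholds and data structure are assumptions, not Simwood's values), a per-number sliding-window rate limit of the kind that caps this sort of amplification might look like the following.

```python
# Hypothetical per-destination rate limit: reject further attempts to a number once
# attempts within the last window exceed a cap.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_ATTEMPTS_PER_WINDOW = 100    # assumed cap, for illustration only

_attempts: dict[str, deque] = defaultdict(deque)

def allow_call(number: str, now: float | None = None) -> bool:
    """Return True if a call attempt to `number` is within the rate limit."""
    now = time.monotonic() if now is None else now
    window = _attempts[number]
    # Drop attempts that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_ATTEMPTS_PER_WINDOW:
        return False  # e.g. reject the attempt rather than route it
    window.append(now)
    return True
```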
Report: "Routing instability London"
Last update: Cogent acknowledged this incident an hour or so after we mitigated it and have now provided the following RFO:

> Your service connected at London Volta may have experienced connectivity issues for some minutes.
>
> During the execution of a non impacting maintenance in our network, we made a mistake and applied some configuration changes on the wrong device, causing the isolation of a node router. As soon as we realized about the mistake, we reverted the config changes on the affected device.
>
> This has been a human error. However we will review the maintenance process to see if there is any room of improvement to avoid a similar issue in the future.
>
> Apologies for any inconvenience caused.

Thankfully, we have multiple transit providers and multiple connections to each distributed around the network, so shutting one down in this kind of situation, or even for a prolonged period, is a non-event. Further, only 10% of our traffic flows over transit - 90% is directly on-net or flows over bilateral peering. We continue to encourage customers to connect as directly as possible - either by being on-net directly, cross-connected to us in a common data centre, or having your colocation provider peer with us if they don’t already. Please speak to us if you’d like this. We will be testing Cogent’s fix and re-enabling this session later this evening.
Between 23.21 and 23.30 (UK time), customers connecting services in our London availability zone but reaching us over Cogent transit, may have seen instability. Cogent were announcing our routes but not passing traffic. The session was shut down from our side to alleviate things. Those on-net or reaching us over peering sessions (which is the majority) and other ultimate transit providers, or reaching other availability zones over Cogent would have been unaffected. We strongly encourage all customers to connect into us directly wherever possible.