Historical record of incidents for MacStadium
Report: "Private Cloud in Atlanta"
Last update: Dear Customers, Private Cloud/VergeIO services in ATL1 have been restored. We have reached out individually to affected customers regarding the restoration of their environments. MacStadium Support Team
Dear Customers, The diagnostic review is still ongoing, and the VergeIO team has successfully recovered a portion of the remaining missing data. Efforts continue to restore as much data as possible before bringing the system back into production. We remain actively engaged in the recovery process and will provide further updates as more information becomes available. At this time, we do not have a definitive ETA for full restoration. Thank you for your patience and understanding. MacStadium Support Team
Dear Customers, The diagnostic review is ongoing, and the VergeIO team is actively working to recover as much missing data as possible. Our teams remain fully engaged to support the recovery efforts and bring the system back into production as quickly as possible. We do not have a definitive ETA for full restoration, but we will continue to provide updates as more information becomes available. Thank you for your patience and understanding. MacStadium Support Team
Dear Customers, Our efforts to restore the Atlanta-based Private Cloud environment are progressing, and we have now reached Step 4: Diagnostic Review in VergeIO’s five-step recovery plan. At this stage, the system is being analyzed to identify any gaps or unrecoverable data blocks. We remain actively engaged with VergeIO leadership to work toward a resolution as quickly as possible. Unfortunately, an exact ETA for full restoration is still unknown, but we will continue providing updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, Our efforts to recover the Atlanta-based Private Cloud environment continue as we work hand in hand with Verge through their restoration process. The system is still working through the first step (full disk scan); we anticipate another 10 hours for this step to complete. An overall restoration ETA is unknown; we believe it will likely take some time to bring Private Cloud back online. We will provide additional updates as we have more information to share. Thank you for your patience. MacStadium Support Team
Dear Customers, During a standard VSAN expansion process, the Verge system experienced unexpected behavior, resulting in the environment going down. Verge has outlined a five-step plan to recover the environment, including a full disk scan, journal walk process, repair cycle, diagnostic review, and VSAN connection. At this time, the first step (full disk scan) is approximately 70% complete. We are actively engaged with Verge leadership to work towards a resolution as quickly as possible, but unfortunately, an ETA is not known at this time, and we believe there is likely quite a way to go to bring Private Cloud back online. We will provide additional updates as we have more information to share. Thank you for your patience. MacStadium Support Team.
Dear Customers, We continue to work actively with our vendor to repair and restore our Private Cloud service in our Atlanta, GA USA data center as quickly as possible. We will continue to provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, We have identified an issue with the VSAN that has caused the Private Cloud service in our Atlanta, GA USA data center to be down. Our team is actively working with our vendor to take down the environment for repair. We will continue to provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, We are aware of an issue affecting the Private Cloud service in our Atlanta, GA USA data center. At this time, the service is hard down, and customers cannot access the system. Our team is actively investigating and working with our vendor to resolve the issue quickly. We will provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Report: "Customer Portal Ticketing System Incident"
Last update: Dear Customers, Since implementing the solution, we have observed no further issues with the Customer Portal ticketing system. As a result, we are closing out this incident. Thank you for your patience and trust. MacStadium Support Team
Dear Customers, The issue affecting the Customer Portal ticketing system has been resolved. You should now be able to create, view, and respond to tickets as expected. We are actively monitoring the system to ensure continued stability. Thank you for your patience, and we apologize for any inconvenience this may have caused. MacStadium Support Team
Dear Customers, We are aware of an issue affecting the Customer Portal ticketing system. Some users may be unable to create new tickets or view and respond to existing ones. Our team is actively investigating and working to resolve the issue as quickly as possible. We will provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Report: "Winter Storm Advisory – Our Preparedness for the Dublin Data Center Operations"
Last update: This incident has been resolved.
As Storm Éowyn approaches the Dublin data center this Friday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. However, the Irish government has advised that travel may pose a potential danger to life. After careful consideration, MacStadium has decided not to have onsite support at our Dublin data center from Friday 6am GMT until Friday 1pm GMT in order to protect our employees' safety. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "Winter Weather Advisory – Our Preparedness for Atlanta Data Center Operations"
Last update: This incident has been resolved.
As winter weather approaches the Atlanta, Georgia area this Tuesday and Wednesday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. • On-site and on-call personnel ready to respond swiftly to any weather-related challenges. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "Winter Weather Advisory – Our Preparedness for Atlanta Data Center Operations"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
As winter weather approaches the Atlanta, Georgia area this Friday and Saturday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. • On-site and on-call personnel ready to respond swiftly to any weather-related challenges. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "LAS Network Infrastructure Maintenance"
Last update: This incident has been resolved.
Apologies, this was sent in error.
Valued MacStadium Customers, On Dec 19th, 2024 from 02:30 UTC to 04:30 UTC, the MacStadium Engineering Team will be conducting network infrastructure upgrades. All hosts in the Las Vegas Data Center will experience a loss of network connection during the upgrade process. The maintenance is rolling, so no server should experience more than 15 minutes of downtime. If you have questions, please contact MacStadium Customer Support by opening a ticket via the Customer Portal or email us at support@macstadium.com. Additionally, visit MacStadium’s Status Page (status.macstadium.com) for updates before, during, and after this change.
Report: "Inclement Weather - Atlanta Data Center"
Last update: This incident has been resolved.
We are expecting inclement weather related to Hurricane Helene at our Atlanta Data Center. Our data center partner is monitoring all critical infrastructure at the site. All generators have been exercised, and fuel levels are at maximum. MacStadium and our DC partner’s staff will be on site to provide support during this weather event. Please open a ticket should you experience an issue.
Report: "LAS1 NETWORK ISSUE"
Last update: Our Las Vegas Data Center Network issues have been resolved. Our Engineers will continue to monitor. Please contact us if you experience any issues. Thank you.
Our Las Vegas Data Center is currently experiencing network issues. Our Engineers are looking into this, and we will provide updates as we work through the issue. Thank you for your patience.
Report: "Atlanta Power Interruption Affecting Some Customers"
Last update: We have confirmed power has been restored to all affected customers at this time. Please open a ticket if you’re experiencing any issues with your Atlanta-based servers.
Our data center colo partner has restored power to the facility. We are working to confirm power has been restored to all affected customers.
Our data center colo partner has isolated the failure to a specific UPS. They are currently working to remediate the failure. There is still currently no ETA.
Our data center colo partner is actively engaged and troubleshooting the power interruption at the site. We will post the ETA when it is provided to us.
There is a power interruption affecting some of our customers in our Atlanta facility. We are currently working with our data center partner, DataBank, to restore power.
Report: "ATL - Loss of Access to Cloud Environments"
Last update: A fix has been implemented. All services are online, and we are monitoring the results. If you are still experiencing issues, please open a ticket via the Customer Portal.
We are implementing the fix now and customers are coming back online. We should have all systems up shortly.
We have determined root cause and are working to resolve the issue.
All cloud environments are still operational; only the ability to communicate with them has been affected. We have narrowed down the issue to a network configuration. We expect to correct the issue in the near term.
We are experiencing a broad outage of customers' ability to access their cloud environments in the Atlanta data center. We are actively investigating and will provide additional updates.
Report: "LAS - Reduced Internet Capacity"
Last update: We wish to advise our Las Vegas customers that connectivity to our Las Vegas data center remains at full capacity. We will step down these notifications but will continue to monitor. We would like to thank our customers for their patience and once again apologise for any inconvenience related to this issue.
We wish to advise our Las Vegas customers that we have been informed by our data carriers that full connectivity capacity has been restored. We will continue to closely monitor the situation and once again we apologise for any inconvenience related to this issue.
Our data carriers have provided us with an ETA of 13:00 UTC/ 05:00 PT when they expect the restoration of data capacity. We apologise for any inconvenience related to this reduced capacity.
We wish to advise our Las Vegas customers that we are experiencing carrier failures to and from our Las Vegas data center at this time. We expect no customer impact apart from operating at reduced capacity.
Report: "LAS - Reduced Internet Capacity"
Last update: This incident has been resolved.
Services have been routed off the failed hardware, and no further impact is expected. Normal capacity has been restored.
The ISP identified the piece of hardware that failed and is in the process of replacing it now. No updated ETA has been provided.
One of the Las Vegas ISPs has experienced a hardware failure, causing the site to experience reduced Internet capacity. Currently, traffic has converged to alternate ISPs to minimize impact. The Estimated Time of Repair from the ISP is 2030 UTC.
Report: "Customer Portal Outage"
Last update: Our Developers have resolved the service interruption in our Customer Portal, and we have confirmed that both server details and Support Ticket creation are currently functioning as expected. All systems are now Operational.
We are continuing to investigate the issue. At this time, as Support Ticket creation through the Customer Portal is blocked, please either email your support request to support@macstadium.com or reach out to us through LiveChat.
Different aspects of our Customer Portal, including creating Support Tickets and viewing individual host information, are currently experiencing a service interruption. Our Developers are actively investigating this issue at present.
Report: "[DUBLIN] Dublin Network Outage"
Last update: This incident has been resolved.
Our Network Engineers were able to identify the issue that caused the Network impact and resolve it. All traffic has been restored. Our Engineers are continuing to monitor the network.
Our Network Engineers are continuing to investigate the issue and look for a solution. We will update you once we have more information.
There is a network issue in Dublin that is currently affecting multiple customers and may cause loss of connectivity. We have multiple Network Engineers investigating the issue, and we will provide an update as soon as we have more information.
Report: "[Atlanta] Power disruption in Atlanta Data Center"
Last update: This incident has been resolved.
The Atlanta Data Center power and services are now fully restored. We apologise for any inconvenience, and we will continue to work with our data center provider to mitigate the risk of similar events recurring.
MacStadium has worked with our data center provider to restore power to affected customers and is continuing to restore service to any remaining affected customers. Once again, our apologies for any inconvenience. Kind Regards, Luke Kavanagh, MacStadium Incident Manager
The Atlanta Data Center has had a power disruption which will affect a small number of customers. MacStadium has engaged the Data Center provider and power will be restored momentarily. We apologise for any inconvenience. Kind Regards, Luke Kavanagh, MacStadium Incident Manager
Report: "[DUBLIN] Network Latency for Multiple Customers"
Last update: The issue has been resolved. We do not expect any further issues, but we will continue to monitor the situation.
MacStadium Engineers have discovered a Network issue that may cause 10% or more packet loss for multiple customers at the Dublin data center. The engineers are currently investigating the issue and working on remediation steps. We will update the message once we have more information.
Report: "[DUBLIN] Network Latency for Multiple Customers"
Last update: The issue has been resolved. We do not expect any further issues, but we will continue to monitor the situation. We are following up with our external resources to ensure no similar incidents occur in the future.
External resources continue to work with MacStadium Engineers to help resolve the issues. Still no ETA for total resolution of the issue that is impacting several customers.
MacStadium Engineers have engaged external resources to help identify the issue.
MacStadium Engineers have discovered a Network issue that may cause 10% or more packet loss for multiple customers at the Dublin data center. The engineers are currently investigating the issue and working on remediation steps. We will update the message once we have more information.
Report: "[LAS] Possible Network Issues"
Last update: Monitoring complete.
The Network issue has been discovered and resolved. Customers should be able to reach their Hosts. MacStadium Engineers will continue to monitor the Network.
We have received reports from multiple customers who are unable to connect consistently to their hosts in the Las Vegas data center. We have multiple Engineers engaged and investigating to find a resolution. We will update this page once we have more information.
Report: "[GLOBAL] Wide Spread Internet Outages"
Last update: Global DNS issue appears to have been resolved.
There are currently reports of widespread Internet outages globally that may affect your ability to access MacStadium resources. We will update this page as we receive more information.
Report: "Atlanta Data Center - Extended maintenance"
Last update: | **Postmortem owner** | **Luke Kavanagh, Manager - Operations** | | --- | --- | | **Incident** | The Atlanta data center experienced a network outage | | **Priority** | **P1** | | **Affected services** | All Atlanta-based customer host traffic affected | # Executive summary During a scheduled maintenance window the Atlanta data center experienced a network outage lasting approximately two hours (02:10 AM EST - 04:05 AM EST). During a network device code update, the border leaf switch configurations were inadvertently modified by the update, which prevented layer-3 customer traffic from flowing. Once the offending code lines were identified & updated, service was restored. # Postmortem report ### **Fault** A scheduled-maintenance network device code update overwrote existing network configuration, creating a scenario in which the Atlanta data center was uncontactable for customer traffic. ### **Impact** All customer layer-3 traffic was unable to flow inbound/outbound from the Atlanta data center. ### **Detection** The network engineer performing the scheduled maintenance detected the network outage while performing the maintenance. ### **Response** Once the MacStadium Incident Management Team was alerted, it initiated the incident response to conduct a thorough investigation. ### **Recovery** With the help of our network vendor partners, the offending network code was identified & updated, which allowed traffic to resume normally. ### **Timeline** June 20th, 2021 00:01 AM EST - Scheduled maintenance was started by the MacStadium Network Engineering Team. 02:05 AM EST - A network engineer noted that customer traffic had been halted. 02:10 AM EST - The MacStadium Engineering Team performs a rollback of all changes made prior to the halting of traffic, but this unfortunately does not resolve the issue. 02:14 AM EST - The Incident Manager was notified of the issue. 02:24 AM EST - The MacStadium Incident Manager declares an outage & opens an outage bridge for the Incident Team to investigate. 03:07 AM EST - The MacStadium Incident Team contacts the network vendor for additional support. 03:08 AM EST - The MacStadium Status page is updated to reflect the outage status. 03:12 AM EST - The network vendor partner joins the outage bridge to help identify the issue. 03:50 AM EST - After numerous troubleshooting steps, the offending code is identified by the vendor. 03:53 AM EST - One of the border leaf switches is updated with new code & this starts to allow traffic to resume. 04:03 AM EST - Service is fully restored; the Incident Team continues to work with the network vendor to identify the root cause. 04:09 AM EST - The MacStadium status page is updated to reflect the restoration of service. 04:44 AM EST - The MacStadium Incident Manager declares the outage over & service fully restored. The outage bridge is closed. ### **Root Cause Analysis** **Problem:** During a scheduled maintenance window, a network device code update halted all inbound/outbound traffic from the Atlanta data center. **Investigation:** A thorough investigation by the MacStadium incident team, along with our network vendor team, identified that the network code update inadvertently modified our layer-3 network configurations, which prevented customer traffic from flowing.
**Mitigation steps:** An improper layer-3 configuration had been deployed/stretched in a multisite configuration, which caused layer-3 to be disabled within the EVPN/VXLAN fabric after the network switches rebooted, regardless of the version of code; this was therefore deemed to be human error. The configuration has since been corrected, validated, and tested with the network vendor to ensure network traffic continuity in the event of any border leaf network switches rebooting under any circumstances. ### **Follow-up tasks** MacStadium understands our role as a service provider. We can and do apply all relevant lessons learned from past events as we strive to provide reliable remote hands capabilities to our customers. Our goal is to provide unparalleled service and delivery for our customers' Mac infrastructure needs. We take that responsibility seriously and are constantly working to improve the customer experience.
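For illustration only (a minimal sketch, not MacStadium's actual validation process; the file names are hypothetical), one lightweight safeguard against this class of failure is to capture a switch's configuration before and after a code upgrade and diff the two snapshots, so an inadvertent overwrite is flagged before the device is returned to service:

```python
# Sketch of a post-upgrade configuration drift check (file names are hypothetical).
# Compare config snapshots captured before and after a switch code update and
# surface any changed lines so an inadvertent overwrite is caught early.
import difflib
from pathlib import Path

before = Path("border-leaf-1.pre-upgrade.cfg").read_text().splitlines()
after = Path("border-leaf-1.post-upgrade.cfg").read_text().splitlines()

diff = list(difflib.unified_diff(before, after,
                                 fromfile="pre-upgrade", tofile="post-upgrade",
                                 lineterm=""))
if diff:
    print("\n".join(diff))
    raise SystemExit("Configuration drift detected; review before returning to service")
print("No configuration drift detected")
```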
The Engineering Teams have identified and corrected a configuration issue. We will continue to monitor the network closely.
Services remain fully operational with no ongoing concerns. We will continue to monitor our systems closely.
The Engineering Teams have identified and corrected a configuration issue, and normal service has been restored to the Atlanta Data Center at this time. We are continuing our monitoring of the situation to ensure ongoing stability.
Scheduled maintenance for today (Reference ID: CHANGE - 97) must be extended beyond the original maintenance window, which was scheduled to end at 7:00 AM UTC. The Atlanta Data Center is currently experiencing packet loss, and the MacStadium Engineering Team has engaged the vendor Engineering Support Team.
Report: "[DUBLIN] Network Equipment Failure"
Last update: ## Postmortem summary | **Postmortem owners** | Mark Ryan, Network Engineer ⎮ Michael Weir, Systems Engineer ⎮ Michael Rhoades, Manager, Service Center | | --- | --- | | **Incident** | Infrastructure router crash/module failure | | **Priority** | **P1** | | **Affected services** | Partial network connectivity at MacStadium’s Dublin data center | ## Executive summary At 20:25 UTC on Sunday, September 13, 2020 MacStadium’s Dublin data center experienced a partial network outage affecting some customers. This outage lasted for 2 hours and was due to a Supervisor module crash on a key infrastructure router (and the hardware failure of one of its modular components). The supervisor module was manually rebooted and the majority of traffic was restored. A sub-module had experienced a failure that additionally required a hardware replacement. Service to all customers was restored at 22:20 UTC. ## Postmortem report | **Instructions** | **Report** | | --- | --- | | **Timeline** | 2020-09-13, 9:12 UTC – A MacStadium Technician escalates an unusual pattern of alerts to the Incident Manager. The Engineering Team begins an immediate investigation. ⎮ 19:25 UTC – MacStadium Engineers confirm there is a partial outage affecting some customers in the Dublin data center and continue with troubleshooting. ⎮ 19:35 UTC – It was determined a key infrastructure router had crashed. Engineers reboot the router, which restored traffic to the majority of affected servers. ⎮ 19:50 UTC – One of the router sub-modules fails to power back on, and Engineers are dispatched to the Dublin data center to continue the investigation. ⎮ 19:54 UTC – The Incident Manager updates MacStadium’s Status Page with investigation status. ⎮ 20:20 UTC – Engineers arrive on site. They test and confirm the failure of the module. They locate, install, and test a replacement module. Once the cabling is fully reconnected to the new module, all traffic is restored. ⎮ 21:26 UTC – Status Page is updated that the incident has cleared. | | **Root cause** | The primary supervisor module on a key infrastructure router crashed, necessitating a reboot of the router. After reboot, it was found that a sub-module had failed and required hardware replacement. | | **Follow-up Actions** | Approximately one year ago MacStadium set out to upgrade its Dublin data center to a new network architecture in order to provide greater east-west bandwidth, resiliency, and scale. The final piece of that plan will be executed by December 2020. Unfortunately, this incident occurred due to a single point of failure in the Dublin network that our network upgrade will address. |
Monitoring will continue; no further issues are anticipated. This incident is resolved.
Failed equipment has been replaced and the network issue should be stabilized. On site Engineers are currently monitoring the replacement equipment.
We have received alerts of a down network device that could affect several customers in the Dublin data center. Engineers are onsite investigating the issue.
Report: "CenturyLink/Level 3 Outage"
Last update: The following is based on a Reason For Outage document provided by CenturyLink: **Incident Start:** August 30, 2020 10:04 GMT **Incident Clear:** August 30, 2020 15:10 GMT Affecting Internet Services in Multiple Markets **Cause** A problematic Flowspec announcement interfered with correctly establishing Border Gateway Protocol (BGP). This had a significant impact on client services. **Resolution** CenturyLink deployed a configuration change to block the problematic Flowspec. This restored services to normal functioning. **Summary** On August 30, 2020 at 10:04 GMT, CenturyLink became aware of an issue that was starting to affect customers in multiple markets. CenturyLink teams were immediately engaged, and they began an intensive investigation. They were unable to determine a cause at first but attempted to deploy potential solutions. By 10:52 GMT, the effects on MacStadium customers had grown, and the NOC escalated the incident to the Incident Manager and Engineering Team, who immediately began an analysis of the situation based on information from monitoring alerts and customers. The team attempted to contact CenturyLink but was still unable to get through by 11:36 GMT due to overwhelming demand on CenturyLink support. By 12:18 GMT the Engineering Team shut down the CenturyLink links to temporarily bypass all CenturyLink issues at the level of MacStadium’s network. MacStadium network traffic returned to normal, but traffic was still affected by intermediate carrier dependencies on CenturyLink across the globe. At approximately 14:00 GMT, CenturyLink found a Flowspec announcement (used for managing routing rules) that was preventing Border Gateway Protocol (BGP) sessions from establishing as intended. The original source of this was determined to be the unintentional introduction of wildcards into an attempt to block a single IP. At 14:14 GMT, CenturyLink deployed a global configuration change to block the problematic Flowspec announcement. The command began propagating across devices and the problematic announcement was successfully removed. This allowed BGP to correctly establish again. By 15:10 GMT, all alarms had cleared and service returned to nominal. After an observation and monitoring period, MacStadium restored CenturyLink links at 4:55 GMT on August 31, 2020.
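To make the wildcard mistake concrete (the addresses and prefixes below are documentation examples, not the actual rule CenturyLink deployed), here is a small sketch showing how a rule intended to block a single IP differs from one whose wildcarded octets effectively match an entire range:

```python
# Illustration of scope: an exact-host block versus an overly broad rule.
# Both prefixes are documentation examples, not the rule involved in the outage.
import ipaddress

intended = ipaddress.ip_network("198.51.100.23/32")  # block exactly one address
accidental = ipaddress.ip_network("198.51.0.0/16")   # what wildcarded octets can effectively match

print(f"Intended rule matches {intended.num_addresses} address")
print(f"Accidental rule matches {accidental.num_addresses:,} addresses")
print("Accidental rule covers the intended target:", accidental.supernet_of(intended))
```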
We have detected no further issues related to this incident. CenturyLink/Level 3 confirms full restoration of services, and our monitoring will continue as usual.
CenturyLink/Level 3 has recovered. MacStadium has re-introduced CenturyLink Internet routing for Atlanta, Las Vegas, and Dublin alongside our other transit carrier partners. Network traffic remains nominal, but the Network Team will continue monitoring for possible anomalies.
CenturyLink/Level 3 is experiencing a widespread outage internationally. MacStadium has rerouted traffic to other providers, and all traffic is restored. Customers might still see problems at a global scale due to CenturyLink's outage. We are working with this provider now to identify an estimated time to restoration.
CenturyLink/Level 3 Communications is experiencing a widespread network outage affecting some customers. We are now rerouting traffic from those affected links.
Report: "Las Vegas Carrier Outage"
Last update: MacStadium has been monitoring for 48 hours and now considers this issue resolved.
The local carrier has resolved the fiber cut. Redundancy has been restored. MacStadium will monitor to ensure stability.
Local carrier is still working to resolve the fiber cut.
Local carrier Splicing Team is still working on the fiber cut.
The Splicing Team is onsite at the fiber cut and is working to restore services.
The issue has been confirmed to be due to a regional fiber cut. The local provider advises the construction crew has pulled in replacement cable. The splice team should arrive shortly and begin preparing the cable for splicing. MacStadium will continue to monitor the situation as it develops.
In the Las Vegas market, MacStadium is aware of a major fiber outage affecting multiple carriers. We are operating at reduced redundancy with potential latency.
Report: "[ATLANTA] Network Issue"
Last update: The issue has been monitored for 48 hours and is now considered resolved.
Our upstream carriers have filtered the offending traffic upstream, which appears to have stabilized the network. Our Engineers are continuing to work with the carriers and monitor the network.
Our Engineers have determined that MacStadium has been the target of a DDoS attack. We are currently working with our upstream providers to help resolve the issue.
We are continuing to investigate this issue.
We have received reports from multiple customers that they are experiencing intermittent connectivity issues in Atlanta. We have multiple Engineers investigating currently.
Report: "Provider DNS Outage"
Last update: This incident has been resolved.
The provider has advised that the issue appears to be resolved. MacStadium will continue to monitor and will provide additional updates if necessary.
Due to a network outage from one of MacStadium's DNS providers, customers utilizing MacStadium DNS servers for Reverse PTR records will see an impact on their hosted email servers. MacStadium is monitoring the issue with the provider and will pass along updates as they are available.
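As background, receiving mail servers commonly check the reverse (PTR) record of a sending host, which is why a PTR outage affects hosted email. A quick spot-check of whether a PTR record is resolving can be run from any host; the address below is a documentation placeholder, not a real MacStadium IP:

```python
# Spot-check a reverse (PTR) DNS lookup for a mail server address.
# The IP below is a documentation placeholder; substitute your own assigned address.
import socket

ip = "203.0.113.10"
try:
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    print(f"PTR for {ip}: {hostname}")
except socket.herror as exc:
    print(f"PTR lookup for {ip} failed: {exc}")
```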
Report: "Partial Outage Affecting Some Cloud Customers"
Last update: # Partial Outage of vCenter Access Root Cause Analysis Date: 04/02/2020 Location: Atlanta, Georgia, USA This Document Provides a Root Cause Analysis of a Partial Outage Affecting Some VMware Cloud Customers _**Date: Wednesday, April 1, 2020**_ ## Prepared by * Paul Benati, SVP Operations * David Hunter, VP Engineering * Preston Lasebikan, Systems Architect & Team Lead - Systems Engineering * Michael Rhoades, Manager, Service Center ## Location Atlanta (ATL1) data center ## Event Description At 00:10 EST on Wednesday, April 1, 2020 our Atlanta data center experienced a partial network outage specifically affecting the ability of some customers to access management functions of their vCenter appliances. This outage lasted approximately 55 minutes and was due to a faulty high availability (HA) cable preventing failover during planned steps in a server platform upgrade. Engineers manually invoked the failover and full service was restored by 01:05 EST. ## Chronology of Events / Timeline * 2020/04/01 at 00:10 EST - Engineering remotely pushed planned configuration updates to the fabric interconnects (FI) in preparation for the upgrade to an advanced, future-proof server platform. * 2020/04/01 at 00:14 EST - Monitoring alerts indicated intermittent, disrupted traffic to multiple Atlanta vCenter server appliance (VCSA) hosts, causing delays or an inability to reach VCSAs externally for some customers. * 2020/04/01 at 00:15 EST - Engineering immediately ascertained that their VPN authentication was interrupted by the changes and dispatched personnel to intervene onsite. * 2020/04/01 at 01:02 EST - Manual intervention onsite successfully initiated FI failover and network traffic was immediately restored. Further inspection revealed the HA heartbeat cables had failed, and they were subsequently replaced. * 2020/04/01 at 01:15 EST - Engineering validated that the configuration would prevent any future events by performing failover tests. ## Summary Findings on Technical Causes In preparation for an upcoming upgrade to advanced new fabric interconnects, configuration updates were pushed to one of the current FIs. Normally, failover would occur without interruption to network traffic; however, a failure in one of the physical HA cables between FIs prevented the failover, which simultaneously prevented authentication of the VPN connection used to remotely remediate the situation, thus requiring onsite intervention. The failed failover caused intermittent network traffic loss to multiple Atlanta vCenter appliances, affecting the ability of some customers to perform management functions. As a note, this did not affect running hosts but only access to VCSA. ## Root Causes 1. A failed HA cable prevented automatic failover of the fabric interconnect. 2. Subsequent loss of VPN authentication prevented remote manual initiation of FI failover. ## Reaction, Resolution, and Recommendations MacStadium immediately dispatched Engineering to perform the manual failover procedure onsite and network traffic flow immediately restarted. HA cables were then replaced, and the configuration was checked and confirmed. Internal and external alerting quickly confirmed access, and spot checks were run system-wide. Although the issue was isolated to the FI, full cluster checks were conducted. Going forward, VPN authentication will be performed prior to the use of virtual desktop infrastructure, precluding the possibility of a similar incident occurring in the future.
Additionally, the integrity of physical HA links will be inspected at regular intervals. MacStadium is also investigating the implementation of advanced self-healing measures. MacStadium takes issues resulting in a loss or degradation of service very seriously and we strive to continually understand the root cause so we may improve our level of service for our customers. ## Sponsor Acceptance ### Approved by the Project Executive Date: 2020-04-02 Paul Benati SVP Operations ## Revision History
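As an illustration of the kind of external check that surfaced this issue (a minimal sketch, not MacStadium's monitoring stack; the hostnames are hypothetical), a simple TCP probe against the vCenter management port is enough to confirm whether VCSA endpoints are reachable from outside:

```python
# Minimal external reachability probe for vCenter (VCSA) management endpoints.
# Hostnames are placeholders; substitute your own appliance addresses.
import socket

VCSA_HOSTS = ["vcenter-1.example.com", "vcenter-2.example.com"]

def is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in VCSA_HOSTS:
    state = "reachable" if is_reachable(host) else "UNREACHABLE"
    print(f"{host}: {state}")
```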
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
Our onsite Engineers have identified the issue and service is now restored. We will be closely monitoring the environment to ensure all services are fully functioning.
We are currently investigating an issue where a subset of VMware cloud customers are experiencing failures when attempting to access their vCenter for management purposes. All VMs appear to be running and general production is unaffected.
Report: "Atlanta Network Outage"
Last update: # Atlanta Network Outage _**Date: Thursday March 19, 2020**_ ## Prepared by * Paul Benati, SVP Operations * David Hunter, VP Engineering * Khoa Tran, Network Architect & Team Lead - Network Engineering * Michael Rhoades, Manager, Service Center ## Location Atlanta (ATL1) data center ## Event Description At 15:05 EST on Thursday, March 19, 2020 our Atlanta data center experienced a network outage. This outage lasted for 41 minutes and was due to a border leaf switch's configuration file inadvertently being modified. The switch's configuration was restored from backup and network service was restored at 15:46 EST. ## Chronology of Events / Timeline * 2020/03/19 at 15:05 EST - Engineering remotely executed a configuration script inadvertently against one of Atlanta's border leaf switches instead of the desired top-of-rack switch, resulting in a network outage * 2020/03/19 at 15:05 EST - VPN and VDI connections down for Atlanta * 2020/03/19 at 15:08 EST - Onsite Engineering at our Atlanta DC (ATL1) was engaged to provide access for remote engineers to troubleshoot the outage * 2020/03/19 at 15:35 EST - Access was provided to remote engineers * 2020/03/19 at 15:35 EST - Remote engineers begin to investigate the outage * 2020/03/19 at 15:40 EST - Engineering determines one of the border leaf switches' configuration had been altered * 2020/03/19 at 15:46 EST - Engineering restored the original configuration from backup to the affected border leaf switch, thereby restoring service ## Summary Findings on Technical Causes The configuration of one of the Atlanta border leaf switches was inadvertently modified. This modification left the two HA-paired border leaf switches out of sync with each other. Due to the nature of the Cisco HA capability, this partially altered configuration resulted in disabling external connectivity. ## Root Causes 1. The configuration of one of the Atlanta border leaf switches was inadvertently modified ### Root cause 1 A script was remotely executed that inadvertently affected one of the two high-availability paired border leaf switches. The script partially altered one switch's configuration, resulting in disabled external connectivity at 15:05 EST. Network service was restored when the original configuration file was re-introduced from backup. ## Reaction, Resolution, and Recommendations MacStadium has disabled remote execution of certain shell scripting on primary switches. Significant changes can now be made only when directly connected to the switch, precluding the possibility of a similar incident in the future. MacStadium takes issues resulting in a loss or degradation of service very seriously and we strive to continually understand the root cause so we may improve our level of service for our customers.
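As a purely illustrative companion to the remediation above (a sketch under an assumed device-naming convention, not MacStadium's actual tooling), a configuration script can refuse to run when the target device does not match the intended device class, which would stop a top-of-rack change from landing on a border leaf:

```python
# Illustrative pre-flight guard (not MacStadium's actual tooling): refuse to apply a
# configuration script unless the target's hostname matches the intended device class,
# e.g. top-of-rack switches named "tor-NN" rather than border leaves named "bleaf-NN".
import re
import sys

EXPECTED_TARGET = re.compile(r"^tor-\d+$")  # hypothetical naming convention

def check_target(reported_hostname: str) -> None:
    """Abort before any change is pushed if the device is not a top-of-rack switch."""
    if not EXPECTED_TARGET.match(reported_hostname):
        sys.exit(f"Refusing to run: '{reported_hostname}' is not a top-of-rack switch")
    print(f"Target '{reported_hostname}' confirmed; proceeding with configuration push")

check_target("tor-12")    # proceeds
check_target("bleaf-01")  # aborts, preventing the change from landing on a border leaf
```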
We continued monitoring overnight and did not observe any issues. This incident has been resolved. Thank you again for your patience and understanding.
As of approximately 19:50 UTC a fix was implemented and we are monitoring the results. Again, we apologize for the inconvenience and are working to identify the root cause of our Atlanta border leaf systems failure.
Our Engineers have identified the problem and are now working to resolve the incident. Initial reports point to one of our border leafs failing. More details to be provided shortly.
We are currently experiencing a network outage affecting our Atlanta data center. Engineers are working to identify the cause and remediate as soon as possible. The outage is not affecting any other services (e.g., servers, etc.). We apologize for the inconvenience and will post an ETA as soon as possible.
Report: "[LAS VEGAS] Hardware Incident"
Last update: Valued Customers, We have been monitoring the affected systems and they continue to be stable. We are closing this incident and are now focused on our problem management activities.
We are continuing to monitor for any further issues.
Dear Valued MacStadium Customers, Today, January 27th, around 6 PM Eastern time, our monitoring system alerted us to a possible hardware incident in the Las Vegas market. Our on-site team responded to this alert and was able to determine and resolve the issue in approximately 15 minutes. This hardware incident may have impacted the ability of several customers to access their environment during the incident window. As of now, all affected systems are stable, and onsite technicians will continue to monitor. Our Engineering team will conduct an investigation into the incident, and we will make these details available. Please monitor status.macstadium.com for additional updates. If you believe you are still having an issue reaching your environment, please contact us immediately via portal.macstadium.com or by calling 877.250.3497
Report: "Customer Portal Outage"
Last update: Outage Notification Reference ID: 88634 Message type: Monitoring Impact type: Non-disruptive to customers' environments Valued Customer, The fix implemented for the Customer Portal continues to be stable and therefore we are closing this incident. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
Outage Notification Reference ID: 88634 Message type: Monitoring Impact type: Non-disruptive to customers' environments Valued Customer, A fix has been implemented and our customers should now be able to access the Customer Portal. We are monitoring the system to ensure stability. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
We are continuing to investigate this issue.
Outage Notification Reference ID: 88634 Message type: Initial Message Impact type: Non-disruptive to customers' environments Valued Customer, Please be advised that MacStadium's Customer Portal is experiencing an issue whereby users attempting to log in are redirected back to the login page, resulting in an inability to leverage the portal. We are actively working the issue and will provide updates here. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
Report: "Atlanta DC - Cooling Incident"
Last update: The data center cooling continues to be stable, and no additional systems and services have had to be remediated.
At this time we have identified and remediated all known impacted systems and services. Please open a ticket if you’re experiencing any issues.
Due to a water pump failure at our data center colo provider, temperatures at our Atlanta site spiked. The water pump is now working and cooling is flowing again, bringing the site's temperatures back into appropriate ranges. Unfortunately, these high temperatures negatively impacted a number of systems, which we are assessing and working to resolve.
Report: "Las Vegas Route Impairment"
Last update: The incident has been resolved, and services are running without issue.
We have addressed the issue with the carrier and testing shows the issue should be resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue related to the supernet 207.254.32.0/21. We are engaging with our carriers to address the issue.
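If you want to check whether a particular host falls inside the impaired supernet (a quick sketch; the customer address below is a placeholder, not a real assignment), Python's ipaddress module makes the membership test a one-liner:

```python
# Check whether a given host address falls within the impaired supernet.
import ipaddress

impaired = ipaddress.ip_network("207.254.32.0/21")
host = ipaddress.ip_address("207.254.35.17")  # placeholder address for illustration

print(f"{host} affected: {host in impaired}")
```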
Report: "Atlanta Fiber Ring Incident"
Last update: Our fiber carrier, Zayo, identified two issues on the short-side of our fiber ring. The first was a bad fiber at their fiber hut nearest the Atlanta DC3 data center; upon identification, they remediated the bad fiber. The second was a dirty fiber jack at the Atlanta DC3 meet-me room (MMR) panel; having identified the issue, Zayo cleaned the jack and reset the fiber cable connection. In sum, these remediation actions have resolved the signal loss, bringing the short-side of our fiber ring back into specification. Monitoring shows the short-side of the fiber ring has stabilized, and we are closing this incident with no related issues reported in the past 12+ hours.
Our fiber carrier has isolated the issue to the span between their fiber facility at the street and our Atlanta DC3 data center. We are continuing to work directly with Zayo and are going to leverage an optical time-domain reflectometer (OTDR) to determine where the issue lies on the short-side of our fiber ring. Additionally, the long-side of our fiber ring is up and operational. We will continue to monitor the situation closely and will make further announcements as necessary.
As of 9:36 PM ET on 10-15-2018 our upstream carrier, Zayo, began experiencing a dark fiber outage of unknown origin impacting a small portion of our Atlanta-based customers. This outage was isolated to one side of our redundant fiber ring connecting our Atlanta DC3 and DC4 data centers. Due to this redundancy, Zayo was able to restore service at 10:19 PM ET on 10-15-2018. As Zayo continued to explore the origin of the fiber ring failure, they experienced a complete failure of this fiber at approximately 12:54 AM ET on 10-16-2018, causing both sides of the ring to collapse. This collapse has resulted in a larger portion of our Atlanta DC3-based customers experiencing a network outage. We continue to be in direct communication with Zayo and are actively working with them to resolve the incident.
Report: "Atlanta Data Center for Hurricane Michael"
Last update: This incident has been resolved.
MacStadium continues to actively monitor the progress of Hurricane Michael, now a Tropical Storm located 60 miles southeast of Athens, GA and moving north-northeast at 21 miles per hour. Current conditions at the site include bands of rain with wind gusts of 25 mph; the potential for localized flooding remains in place. The data center remains stable, in normal operating configuration, with no storm-related impact. We will continue to monitor the situation closely and will make further announcements as necessary.
As an update to our previous notification Hurricane Michael is now a Category 4 Hurricane. The eye of Michael is expected to move ashore over the Florida Panhandle later today, move northeastward across the southeastern United States tonight and Thursday, and then move off the Mid-Atlantic coast away from the United States on Friday. Some additional strengthening is possible before landfall. After landfall, Michael should weaken as it crosses the southeastern United States. Georgia, the Carolinas, and southern Virginia will see rainfalls of 3 to 6 inches, with isolated maximum amounts of 8 inches. This rainfall could lead to flash floods. A few tornadoes will be possible into parts of central and southern Georgia and southern South Carolina this afternoon and tonight. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Sunnyvale, CA, Las Vegas, NV, and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements as necessary.
As of October 8, 2018, the National Weather Service is predicting a strong hurricane will make landfall Wednesday, October 10th on the Florida Panhandle and on a trajectory towards Atlanta sometime Thursday, October 11th. It is possible that the storm could bring strong winds and heavy rain to the Atlanta area starting as early as Wednesday night. Our data center in Atlanta is physically elevated and not subject to any local flood risks. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Sunnyvale, CA, Las Vegas, NV, and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements as necessary.
Report: "Dublin, Ireland - Core routing issue"
Last update: We are closing this incident with no issues reported in the past 12+ hours related to the routing engine failure. A non-impacting maintenance will be scheduled in the coming days to replace the failed hardware.
The cause of the issue has been identified as a core switch supervisor failure and subsequent fail-over to the secondary. Services should be restored after a brief drop during the fail-over. We are continuing to validate full service availability and will be monitoring closely.
We are investigating a core switching issue in our Dublin, Ireland datacenter, potentially causing connectivity issues to various customers in that facility.
Report: "Global Issue"
Last update: Carriers resolved their global issue yesterday and monitoring shows the issue has stabilized.
There is currently a known issue between Level 3 and Comcast that could affect customers' ability to access MacStadium. We are keeping an eye on this global issue, which is outside of our network, and will keep this message updated as we know more.
Report: "Upstream Carrier Issue"
Last update: Routing of the affected range has been restored and stable for 24 hours.
The upstream carrier was able to identify the affected subnet, 208.52.164.x/24, and resolve the routing issue around 8 AM EST. Only customers on that subnet were affected. We will continue to work with the carrier and monitor for 24 hours.
Due to an upstream carrier network change this morning, we are experiencing a network outage affecting a subset of our public IP ranges. We are in direct communication with our carrier and actively working with them to resolve this incident.
Report: "Atlanta - Upstream Internet Flap"
Last update: We identified a momentary upstream BGP network drop related to several IPv4 subnets in Atlanta at ~8:00 AM EST (12:00 GMT). Upon investigation, we were able to validate that our primary upstream carrier was making routing adjustments in an unplanned maintenance on their side, causing BGP re-convergence to alternate paths and, in turn, portions of the MacStadium Atlanta network to become unreachable for minutes at a time. Related to this incident, MacStadium will be proactively communicating a couple of planned maintenance activities in the very near future to obviate this carrier and further enhance all Atlanta internet routing to ensure these types of events do not re-occur.
Report: "Official Response - Atlanta Data Center for Hurricane Irma"
Last update: After investigating thoroughly, we found that we did NOT suffer any significant outage in Atlanta as a result of the single generator going offline. We did identify a couple of customer-specific, isolated PDUs which failed due to a power surge related to the fail-over. At this time, all incidents are identified and addressed. We are leaving this incident open until the fuel contractor tops off the diesel tanks completely and the datacenter's standby fuel levels are fully restored. Please note that our data center is now fully on city power and no longer on generator power. Services are at normal status.
We have just received notification from our Atlanta datacenter operator that at least 1 of 3 generators has gone offline after running on diesel fuel for 6+ hours since power was lost in the area due to the hurricane. Engineering is investigating what is offline and online; we will update this incident once more clarity is available.
As of September 8th, 2017, the National Weather Service is predicting a strong hurricane will make landfall Sunday, September 10th in Southern Florida and is on a trajectory towards Atlanta by late Monday, September 11th. It is predicted that the storm could bring strong winds and heavy rain to the Atlanta area starting on Monday, September 11th. Our data center in Atlanta is physically elevated and not subject to any local flood risks. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Las Vegas, NV and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements, if necessary.
Report: "Error Messages Reported"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
After investigating thoroughly, we found that we did NOT suffer any significant outage in Atlanta as a result of the single generator going offline. We did identify a couple of customer-specific, isolated PDUs which failed due to a power surge related to the fail-over. At this time, all incidents are identified and addressed. We are leaving this incident open until the fuel contractor tops off the diesel tanks completely and the datacenter's standby fuel levels are fully restored. Services are at normal status.
We are currently experiencing multiple error messages via our monitoring systems at our Atlanta data center. Our Engineering team is engaged and investigating. We will continue to update you via status.macstadium.com.
Report: "Atlanta - Isolated intermittent network connectivity"
Last update: This issue has been resolved and all service is stable.
We have been tracking an intermittent core switching issue which began late Saturday 7/29 around 14:30 UTC and has affected service to a segment of customers in Atlanta. The condition has been highly intermittent: traffic would fail to flow for several minutes, then flow freely again for many hours without recurring. Log and NMS reports initially pointed us to repair the isolated issue by replacing a single 10Gb optic which had been identified as failed. Unfortunately, the failure condition re-surfaced early Sunday morning, and on-site staff have been busy re-arranging network paths to isolate the issue since then. More recently, at around 10:30 UTC on 7/30, the issue expanded in nature, pointing to a multi-port 10Gb switch line card which unexpectedly crashed and rebooted itself. We have since moved all network traffic off of that card and are in the process of replacing it entirely. This recent development has expanded the scope of this incident to a more significant portion of Atlanta customers. We do not expect any further intermittent service disruption related to this incident. However, due to the highly intermittent and expanding nature of the issue, we will be monitoring it closely for 48+ hours before officially closing it out as resolved.
Report: "Atlanta - Internet Uplink Flapping"
Last update: We have received closure from Digital Realty on this global case, and are now closing our incident accordingly. --- RESOLVED RED Event - Generator Failure / Critical Load Impact ITST# 448320 56 Marietta Atlanta GA As an update to the Generator failure at 56 Marietta, the engineering team reports that power has been restored and all power systems have been normalized. An RFO, that will outline the root cause as well as the scope of the event, will be published. Further updates will be provided as they become available.
At this time, we have put a workaround in place with Level3 via a different POP in Atlanta, and traffic is flowing again. We are operating with limited upstream connectivity, so performance may be suboptimal until further notice.
The issue in Atlanta has grown to be of a critical nature. The root cause is a major power failure at the Digital Realty / Telx 56 Marietta Street facility in Atlanta, which acts as the crossroads for most traffic in the region. Although there are other upstream connections in place outside of 56 Marietta, the global nature of the incident at 56 Marietta is causing issues with other upstream connections in and around the region. We will continue to update this incident as we gain additional information.
We are investigating an upstream internet BGP issue with one or more carriers which may be causing intermittent service disruption to some Atlanta infrastructure.
Report: "Atlanta - Internet Uplink Failure: Level3 / XO"
Last update: The root cause of this router failure has been identified and will be permanently addressed in a planned maintenance window in the near future. All Internet up-links in/out of Atlanta are stable at this time.
As of 06:30 UTC (14:30 EST), we have been monitoring an issue with one of our Juniper border routers in Atlanta which has in turn caused Level 3 and XO Communications 10GB Internet up-links to flap. These 10GB up-link failures are causing Internet traffic in/out of Atlanta to fail over to other 10GB Uplink connections on other carriers which we maintain in Atlanta. As the Internet traffic re-converges across the Internet via BGP, you may be experiencing intermittent accessibility to hosts within the MacStadium Atlanta data-center facilities.
Report: "Internet Outage - Atlanta"
Last update: Further investigation found the BGP issues to be rooted with Level3, which was having regional routing issues at the time. Level3 is the preferred carrier for a significant amount of traffic in and out of MacStadium Atlanta, including traffic to popular network peers like AWS. During this incident, we found no network transport problems inside MacStadium Atlanta or in its interconnections to our Tier 1 network providers in the region. Our Internet BGP sessions with Level 3 remained up; however, traffic stopped flowing to/from Level3 due to an issue inside their network at the time. Because the BGP session never went down, manual intervention was required to force Internet traffic onto other BGP peers we have in Atlanta. We have still not received a full explanation from Level 3 as to what occurred, but we feel it is necessary to close out this incident at this time as we have seen no other effects or symptoms related to this event in over 24 hours.
From approximately 14:02 EST to 14:18 EST, we suffered a very rare incident which significantly degraded performance to various public upstream carriers at the Metro Transport layer. We are still investigating the root cause and will provide more information as it becomes available. Connectivity is stable at this time, and we are monitoring closely.
We are investigating a BGP peering issue in Atlanta causing significant loss of access to MacStadium - Atlanta.
Report: "LAS VEGAS Network Latency"
Last update: This incident has been resolved.
Level3 seems to have stabilized over the past 15 minutes. We will keep this incident open for several hours and keep monitoring. https://goo.gl/KMQaJx
We are monitoring an issue in our Las Vegas datacenter where we are observing increased latency and inefficient BGP routing through our upstream connection with Level3. Engineers are working to reroute traffic around the issue at this time.
Report: "Intermittent DNS Resolution - Global Internet Issue"
Last update: This incident has been resolved.
We're providing a heads-up to our customer base that there are global DNS issues in North America today (rooted outside of MacStadium), and we are intermittently seeing some symptoms of it here at MacStadium as well. The symptoms present themselves as certain Internet sites/assets not being reachable; however, IP connectivity is intact if you run ping/traceroute tests. Here is a decent write-up on it: http://gizmodo.com/this-is-probably-why-half-the-internet-shut-down-today-1788062835
Report: "Atlanta Core - Line Card Down"
Last update: All 10Ge uplinks have been migrated. As far as we can tell, everything is back to 100% at this time. Closing incident.
All 10Ge uplinks have been migrated off of the affected core interface card. We are completing some cleanup and expect to be at 100% within about 10-15 minutes.
We have a 16-port 10Ge line card down in Atlanta; 10Ge circuits are being moved to bypass the issue.
Report: "LAS VEGAS Network Latency"
Last update: Cogent cleared the issue in Los Angeles, which was reported as being a fiber cut. We have been monitoring closely for a couple of hours since resolution and are now closing the incident, as we are no longer seeing any symptoms.
We are observing excessive latency related to Cogent circuits in the region.
Report: "LAS VEGAS Network Latency"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
As of ~9:15 AM PST, packet loss has cleared. The issue is rooted in the Cogent transport network in/around Los Angeles, CA - outside of MacStadium's data center environment in Las Vegas. We will continue to monitor the incident closely.
We are investigating a severe packet loss issue on transport circuits in/out of the Las Vegas datacenter, which is responsible for degradation of service.
Report: "Dublin Core Issue"
Last update: Issue resolved.
We are investigating a routing issue in/out of Dublin at this time.
Report: "Atlanta Upstream Internet BGP - Host.net Flapping"
Last update: This incident has been resolved.
After adjusting upstream BGP routing to mitigate the Host.net carrier feed, all connectivity has returned to normal for approx. 3 hours. Incident resolved.
Around 8:15 AM (EST) we noticed routing issues with one of our upstream providers. Traffic destined for MacStadium from various locations around the internet could be impacted if that traffic chose to utilize Host.net as its primary pathway into MacStadium Atlanta. Tier 3 engineers are investigating and adjusting routing to work around the issue until Host.net has resolved the core problem.