Historical record of incidents for MacStadium
Report: "Private Cloud in Atlanta"
Last update: Dear Customers, Private Cloud/VergeIO services in ATL1 have been restored. We have reached out individually to affected customers regarding the restoration of their environments. MacStadium Support Team
Dear Customers, The diagnostic review is still ongoing, and the VergeIO team has successfully recovered a portion of the remaining missing data. Efforts continue to restore as much data as possible before bringing the system back into production. We remain actively engaged in the recovery process and will provide further updates as more information becomes available. At this time, we do not have a definitive ETA for full restoration. Thank you for your patience and understanding. MacStadium Support Team
Dear Customers, The diagnostic review is ongoing, and the VergeIO team is actively working to recover as much missing data as possible. Our teams remain fully engaged to support the recovery efforts and bring the system back into production as quickly as possible. We do not have a definitive ETA for full restoration, but we will continue to provide updates as more information becomes available. Thank you for your patience and understanding. MacStadium Support Team
Dear Customers, Our efforts to restore the Atlanta-based Private Cloud environment are progressing, and we have now reached Step 4: Diagnostic Review in VergeIO’s five-step recovery plan. At this stage, the system is being analyzed to identify any gaps or unrecoverable data blocks. We remain actively engaged with VergeIO leadership to work toward a resolution as quickly as possible. Unfortunately, an exact ETA for full restoration is still unknown, but we will continue providing updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, Our efforts to recover the Atlanta-based Private Cloud environment continue as we work hand in hand with Verge through their restoration process. The system is still working through the first step (full disk scan); we anticipate another 10 hours for this step to complete. An overall restoration ETA is unknown; we believe it will likely take some time to bring Private Cloud back online. We will provide additional updates as we have more information to share. Thank you for your patience. MacStadium Support Team
Dear Customers, During a standard VSAN expansion process, the Verge system experienced unexpected behavior, resulting in the environment going down. Verge has outlined a five-step plan to recover the environment, including a full disk scan, journal walk process, repair cycle, diagnostic review, and VSAN connection. At this time, the first step (full disk scan) is approximately 70% complete. We are actively engaged with Verge leadership to work towards a resolution as quickly as possible, but unfortunately, an ETA is not known at this time, and we believe there is likely quite a way to go to bring Private Cloud back online. We will provide additional updates as we have more information to share. Thank you for your patience. MacStadium Support Team.
Dear Customers, We continue to work actively with our vendor to repair and restore our Private Cloud service in our Atlanta, GA USA data center as quickly as possible. We will continue to provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, We have identified an issue with the VSAN that has caused the Private Cloud service in our Atlanta, GA USA data center to be down. Our team is actively working with our vendor to take down the environment for repair. We will continue to provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Dear Customers, We are aware of an issue affecting the Private Cloud service in our Atlanta, GA USA data center. At this time, the service is hard down, and customers cannot access the system. Our team is actively investigating and working with our vendor to resolve the issue quickly. We will provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Report: "Customer Portal Ticketing System Incident"
Last update: Dear Customers, Since implementing the solution, we have observed no further issues with the Customer Portal ticketing system. As a result, we are closing out this incident. Thank you for your patience and trust. MacStadium Support Team
Dear Customers, The issue affecting the Customer Portal ticketing system has been resolved. You should now be able to create, view, and respond to tickets as expected. We are actively monitoring the system to ensure continued stability. Thank you for your patience, and we apologize for any inconvenience this may have caused. MacStadium Support Team
Dear Customers, We are aware of an issue affecting the Customer Portal ticketing system. Some users may be unable to create new tickets or view and respond to existing ones. Our team is actively investigating and working to resolve the issue as quickly as possible. We will provide updates as we have more information. Thank you for your patience. MacStadium Support Team
Report: "Winter Storm Advisory – Our Preparedness for the Dublin Data Center Operations"
Last update: This incident has been resolved.
As Storm Éowyn approaches the Dublin data center this Friday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. However, the Irish government has advised that travel may pose a potential danger to life. After careful consideration, MacStadium has decided not to have onsite support at our Dublin data center from Friday 6am GMT until Friday 1pm GMT in order to protect our employees' safety. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "Winter Weather Advisory – Our Preparedness for Atlanta Data Center Operations"
Last update: This incident has been resolved.
As winter weather approaches the Atlanta, Georgia area this Tuesday and Wednesday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. • On-site and on-call personnel ready to respond swiftly to any weather-related challenges. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "Winter Weather Advisory – Our Preparedness for Atlanta Data Center Operations"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
As winter weather approaches the Atlanta, Georgia area this Friday and Saturday, we want to assure you that we are fully prepared to maintain seamless operations at our data center. Our co-location provider and our team have robust measures in place, including: • Backup power systems, including generators and fuel reserves, to ensure uninterrupted service. • Redundant network connections to maintain connectivity. • On-site and on-call personnel ready to respond swiftly to any weather-related challenges. We continuously monitor weather conditions and have contingency plans in place to ensure your services remain operational and secure. Should you have any concerns or need assistance during this time, please don’t hesitate to reach out to our support team. Thank you for trusting us with your business.
Report: "LAS Network Infrastructure Maintenance"
Last update: This incident has been resolved.
Apologies, this was sent in error.
Valued MacStadium Customers, On Dec 19th, 2024 from 02:30 UTC to 04:30 UTC, the MacStadium Engineering Team will be conducting network infrastructure upgrades. All hosts in the Las Vegas Data Center will experience a loss of network connection during the upgrade process. The maintenance is rolling, so no server should experience more than 15 minutes of downtime. If you have questions, please contact MacStadium Customer Support by opening a ticket via the Customer Portal or email us at support@macstadium.com. Additionally, visit MacStadium’s Status Page (status.macstadium.com) for updates before, during, and after this change.
Report: "Inclement Weather - Atlanta Data Center"
Last update: This incident has been resolved.
We are expecting inclement weather related to Hurricane Helene at our Atlanta Data Center. Our data center partner is monitoring all critical infrastructure at the site. All generators have been exercised, and fuel levels are at maximum. MacStadium and our DC partner’s staff will be on site to provide support during this weather event. Please open a ticket should you experience an issue.
Report: "LAS1 NETWORK ISSUE"
Last update: Our Las Vegas Data Center Network issues have been resolved. Our Engineers will continue to monitor. Please contact us if you experience any issues. Thank you.
Our Las Vegas Data Center is currently experiencing network issues. Our Engineers are looking into this, and we will provide updates as we work through the issue. Thank you for your patience.
Report: "Atlanta Power Interruption Affecting Some Customers"
Last update: We have confirmed power has been restored to all affected customers at this time. Please open a ticket if you’re experiencing any issues with your Atlanta-based servers.
Our data center colo partner has restored power to the facility. We are working to confirm power has been restored to all affected customers.
Our data center colo partner has isolated the failure to a specific UPS. They are currently working to remediate the failure. There is still currently no ETA.
Our data center colo partner is actively engaged and troubleshooting the power interruption at the site. We will post the ETA when it is provided to us.
There is a power interruption affecting some of our customers in our Atlanta facility. We are currently working with our data center partner, DataBank, to restore power.
Report: "ATL - Loss of Access to Cloud Environments"
Last update: A fix has been implemented. All services are online, and we are monitoring the results. If you are still experiencing issues, please open a ticket via the Customer Portal.
We are implementing the fix now and customers are coming back online. We should have all systems up shortly.
We have determined root cause and are working to resolve the issue.
All cloud environments are still operational; only the ability to communicate with them has been affected. We have narrowed down the issue to a network configuration. We expect to correct the issue in the near term.
We are experiencing a broad outage of customers' ability to access their cloud environments in the Atlanta data center. We are actively investigating and will provide additional updates.
Report: "LAS - Reduced Internet Capacity"
Last update: We wish to advise our Las Vegas customers that connectivity to our Las Vegas data center remains at full capacity. We will step down these notifications but will continue to monitor. We would like to thank our customers for their patience and once again apologise for any inconvenience related to this issue.
We wish to advise our Las Vegas customers that we have been informed by our data carriers that full connectivity capacity has been restored. We will continue to closely monitor the situation and once again we apologise for any inconvenience related to this issue.
Our data carriers have provided us with an ETA of 13:00 UTC/ 05:00 PT when they expect the restoration of data capacity. We apologise for any inconvenience related to this reduced capacity.
We wish to advise our Las Vegas customers that we are experiencing carrier failures to and from our Las Vegas data center at this time. We expect no customer impact apart from operating at reduced capacity.
Report: "LAS - Reduced Internet Capacity"
Last update: This incident has been resolved.
Services have been routed off the failed hardware, and no further impact is expected. Normal capacity has been restored.
The ISP identified the piece of hardware that failed and is in the process of replacing it now. No updated ETA has been provided.
One of the Las Vegas ISPs has experienced a hardware failure, causing the site to experience reduced Internet capacity. Currently, traffic has converged to alternate ISPs to minimize impact. The Estimated Time of Repair from the ISP is 2030 UTC.
Report: "Customer Portal Outage"
Last update: Our Developers have resolved the service interruption in our Customer Portal, and we have confirmed that both server details and Support Ticket creation are currently functioning as expected. All systems are now Operational.
We are continuing to investigate the issue. At this time, as Support Ticket creation through the Customer Portal is blocked, please either email your support request to support@macstadium.com or reach out to us through LiveChat.
Different aspects of our Customer Portal, including creating Support Tickets and viewing individual host information, are currently experiencing a service interruption. Our Developers are actively investigating this issue at present.
Report: "[DUBLIN] Dublin Network Outage"
Last update: This incident has been resolved.
Our Network Engineers were able to identify the issue that caused the Network impact and resolve it. All traffic has been restored. Our Engineers are continuing to monitor the network.
Our Network Engineers are continuing to investigate the issue and look for a solution. We will update you once we have more information.
There is a network issue in Dublin that is currently affecting multiple customers and may cause loss of connectivity. We have multiple Network Engineers investigating the issue, and we will provide an update as soon as we have more information.
Report: "[Atlanta] Power disruption in Atlanta Data Center"
Last update: This incident has been resolved.
The Atlanta Data Center power and services are now fully restored. We apologise for any inconvenience, and we will continue to work with our data center provider to mitigate the risk of similar events recurring.
MacStadium has worked with our data center provider to restore power to affected customers and is continuing to restore service to any remaining affected customers. Once again, our apologies for any inconvenience. Kind Regards, Luke Kavanagh, MacStadium Incident Manager
The Atlanta Data Center has had a power disruption which will affect a small number of customers. MacStadium has engaged the Data Center provider and power will be restored momentarily. We apologise for any inconvenience. Kind Regards, Luke Kavanagh, MacStadium Incident Manager
Report: "[DUBLIN] Network Latency for Multiple Customers"
Last update: The issue has been resolved. We do not expect any further issues, but we will continue to monitor the situation.
MacStadium Engineers have discovered a Network issue that may cause 10% or more packet loss for multiple customers at the Dublin data center. The engineers are currently investigating the issue and working on remediation steps. We will update the message once we have more information.
Report: "[DUBLIN] Network Latency for Multiple Customers"
Last update: The issue has been resolved. We do not expect any further issues, but we will continue to monitor the situation. We are following up with our external resources to ensure no similar incidents occur in the future.
External resources continue to work with MacStadium Engineers to help resolve the issues. Still no ETA for total resolution of the issue that is impacting several customers.
MacStadium Engineers have engaged external resources to help identify the issue.
MacStadium Engineers have discovered a Network issue that may cause 10% or more packet loss for multiple customers at the Dublin data center. The engineers are currently investigating the issue and working on remediation steps. We will update the message once we have more information.
Report: "[LAS] Possible Network Issues"
Last update: Monitoring complete.
The Network issue has been discovered and resolved. Customers should be able to reach their Hosts. MacStadium Engineers will continue to monitor the Network.
We have received reports from multiple customers who are unable to connect consistently to their hosts in the Las Vegas data center. We have multiple Engineers engaged and investigating to find a resolution. We will update this page once we have more information.
Report: "[GLOBAL] Wide Spread Internet Outages"
Last update: Global DNS issue appears to have been resolved.
There are currently reports of widespread Internet outages globally that may affect your ability to access MacStadium resources. We will update this page as we receive more information.
Report: "Atlanta Data Center - Extended maintenance"
Last update: | **Postmortem owner** | **Luke Kavanagh, Manager - Operations** | | --- | --- | | **Incident** | The Atlanta data center experienced a network outage | | **Priority** | **P1** | | **Affected services** | All Atlanta-based customer host traffic affected | # Executive summary During a scheduled maintenance window the Atlanta data center experienced a network outage lasting approximately two hours (02:10 AM EST - 04:05 AM EST). During a network device code update, the border leaf switch configurations were inadvertently modified by the update, which prevented layer-3 customer traffic from flowing. Once the offending code lines were identified & updated, service was restored. # Postmortem report ### **Fault** A scheduled-maintenance network device code update overwrote existing network configuration, creating a scenario in which the Atlanta data center was uncontactable for customer traffic. ### **Impact** All customer layer-3 traffic was unable to flow inbound/outbound from the Atlanta data center. ### **Detection** The network engineer performing the scheduled maintenance detected the network outage while performing the maintenance. ### **Response** Once the MacStadium Incident Management Team was alerted, it initiated the incident response to conduct a thorough investigation. ### **Recovery** With the help of our network vendor partners, the offending network code was identified & updated, which allowed traffic to resume normally. ### **Timeline** June 20th, 2021 00:01 AM EST - Scheduled maintenance was started by the MacStadium Network Engineering Team. 02:05 AM EST - A network engineer noted that customer traffic had been halted. 02:10 AM EST - The MacStadium Engineering Team performs a rollback of all changes made prior to the halting of traffic, but this unfortunately does not resolve the issue. 02:14 AM EST - The Incident Manager was notified of the issue. 02:24 AM EST - The MacStadium Incident Manager declares an outage & opens an outage bridge for the Incident Team to investigate. 03:07 AM EST - The MacStadium Incident Team contacts the network vendor for additional support. 03:08 AM EST - The MacStadium Status page is updated to reflect the outage status. 03:12 AM EST - The network vendor partner joins the outage bridge to help identify the issue. 03:50 AM EST - After numerous troubleshooting steps, the offending code is identified by the vendor. 03:53 AM EST - One of the border leaf switches is updated with new code & this starts to allow traffic to resume. 04:03 AM EST - Service is fully restored; the Incident Team continues to work with the network vendor to identify the root cause. 04:09 AM EST - The MacStadium status page is updated to reflect the restoration of service. 04:44 AM EST - The MacStadium Incident Manager declares the outage over & service fully restored. The outage bridge is closed. ### **Root Cause Analysis** **Problem:** During a scheduled maintenance window, a network device code update halted all inbound/outbound traffic from the Atlanta data center. **Investigation:** A thorough investigation by the MacStadium incident team, along with our network vendor team, identified that the network code update inadvertently modified our layer-3 network configurations, which prevented customer traffic from flowing.
**Mitigation steps:** An improper layer-3 configuration had been deployed/stretched in a multisite configuration, which caused layer-3 to be disabled within the EVPN/VXLAN fabric after the network switches rebooted, regardless of the version of code; this was therefore deemed to be human error. The configuration has since been corrected, validated, and tested with the network vendor to ensure network traffic continuity in the event of any border leaf network switches rebooting under any circumstances. ### **Follow-up tasks** MacStadium understands our role as a service provider. We can and do apply all relevant lessons learned from past events as we strive to provide reliable remote hands capabilities to our customers. Our goal is to provide unparalleled service and delivery for our customers' Mac infrastructure needs. We take that responsibility seriously and are constantly working to improve the customer experience.
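For illustration only (a minimal sketch, not MacStadium's actual validation process; the file names are hypothetical), one lightweight safeguard against this class of failure is to capture a switch's configuration before and after a code upgrade and diff the two snapshots, so an inadvertent overwrite is flagged before the device is returned to service:

```python
# Sketch of a post-upgrade configuration drift check (file names are hypothetical).
# Compare config snapshots captured before and after a switch code update and
# surface any changed lines so an inadvertent overwrite is caught early.
import difflib
from pathlib import Path

before = Path("border-leaf-1.pre-upgrade.cfg").read_text().splitlines()
after = Path("border-leaf-1.post-upgrade.cfg").read_text().splitlines()

diff = list(difflib.unified_diff(before, after,
                                 fromfile="pre-upgrade", tofile="post-upgrade",
                                 lineterm=""))
if diff:
    print("\n".join(diff))
    raise SystemExit("Configuration drift detected; review before returning to service")
print("No configuration drift detected")
```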
The Engineering Teams have identified and corrected a configuration issue. We will continue to monitor the network closely.
Services remain fully operational with no ongoing concerns. We will continue to monitor our systems closely.
The Engineering Teams have identified and corrected a configuration issue, and normal service has been restored to the Atlanta Data Center at this time. We are continuing our monitoring of the situation to ensure ongoing stability.
Scheduled maintenance for today (Reference ID: CHANGE - 97) must be extended beyond the original maintenance window, which was scheduled to end at 7:00 AM UTC. The Atlanta Data Center is currently experiencing packet loss, and the MacStadium Engineering Team has engaged the vendor Engineering Support Team.
Report: "[DUBLIN] Network Equipment Failure"
Last update: ## Postmortem summary | **Postmortem owners** | Mark Ryan, Network Engineer ⎮ Michael Weir, Systems Engineer ⎮ Michael Rhoades, Manager, Service Center | | --- | --- | | **Incident** | Infrastructure router crash/module failure | | **Priority** | **P1** | | **Affected services** | Partial network connectivity at MacStadium’s Dublin data center | ## Executive summary At 20:25 UTC on Sunday, September 13, 2020 MacStadium’s Dublin data center experienced a partial network outage affecting some customers. This outage lasted for 2 hours and was due to a Supervisor module crash on a key infrastructure router (and the hardware failure of one of its modular components). The supervisor module was manually rebooted and the majority of traffic was restored. A sub-module had experienced a failure that additionally required a hardware replacement. Service to all customers was restored at 22:20 UTC. ## Postmortem report | **Instructions** | **Report** | | --- | --- | | **Timeline** | 2020-09-13, 9:12 UTC – A MacStadium Technician escalates an unusual pattern of alerts to the Incident Manager. The Engineering Team begins an immediate investigation. ⎮ 19:25 UTC – MacStadium Engineers confirm there is a partial outage affecting some customers in the Dublin data center and continue with troubleshooting. ⎮ 19:35 UTC – It was determined a key infrastructure router had crashed. Engineers reboot the router, which restored traffic to the majority of affected servers. ⎮ 19:50 UTC – One of the router sub-modules fails to power back on, and Engineers are dispatched to the Dublin data center to continue the investigation. ⎮ 19:54 UTC – The Incident Manager updates MacStadium’s Status Page with investigation status. ⎮ 20:20 UTC – Engineers arrive on site. They test and confirm the failure of the module. They locate, install, and test a replacement module. Once the cabling is fully reconnected to the new module, all traffic is restored. ⎮ 21:26 UTC – Status Page is updated that the incident has cleared. | | **Root cause** | The primary supervisor module on a key infrastructure router crashed, necessitating a reboot of the router. After reboot, it was found that a sub-module had failed and required hardware replacement. | | **Follow-up Actions** | Approximately one year ago MacStadium set out to upgrade its Dublin data center to a new network architecture in order to provide greater east-west bandwidth, resiliency, and scale. The final piece of that plan will be executed by December 2020. Unfortunately, this incident occurred due to a single point of failure in the Dublin network that our network upgrade will address. |
Monitoring will continue; no further issues are anticipated. This incident is resolved.
Failed equipment has been replaced and the network issue should be stabilized. On site Engineers are currently monitoring the replacement equipment.
We have received alerts of a down network device that could affect several customers in the Dublin data center. Engineers are onsite investigating the issue.
Report: "CenturyLink/Level 3 Outage"
Last update: The following is based on a Reason For Outage document provided by CenturyLink: **Incident Start:** August 30, 2020 10:04 GMT **Incident Clear:** August 30, 2020 15:10 GMT Affecting Internet Services in Multiple Markets **Cause** A problematic Flowspec announcement interfered with correctly establishing Border Gateway Protocol (BGP). This had a significant impact on client services. **Resolution** CenturyLink deployed a configuration change to block the problematic Flowspec. This restored services to normal functioning. **Summary** On August 30, 2020 at 10:04 GMT, CenturyLink became aware of an issue that was starting to affect customers in multiple markets. CenturyLink teams were immediately engaged, and they began an intensive investigation. They were unable to determine a cause at first but attempted to deploy potential solutions. By 10:52 GMT, the effects on MacStadium customers had grown, and the NOC escalated the incident to the Incident Manager and Engineering Team, who immediately began an analysis of the situation based on information from monitoring alerts and customers. The team attempted to contact CenturyLink but was still unable to get through by 11:36 GMT due to overwhelming demand on CenturyLink support. By 12:18 GMT the Engineering Team shut down the CenturyLink links to temporarily bypass all CenturyLink issues at the level of MacStadium’s network. MacStadium network traffic returned to normal, but traffic was still affected by intermediate carrier dependencies on CenturyLink across the globe. At approximately 14:00 GMT, CenturyLink found a Flowspec announcement (used for managing routing rules) that was preventing Border Gateway Protocol (BGP) sessions from establishing as intended. The original source of this was determined to be the unintentional introduction of wildcards into an attempt to block a single IP. At 14:14 GMT, CenturyLink deployed a global configuration change to block the problematic Flowspec announcement. The command began propagating across devices and the problematic announcement was successfully removed. This allowed BGP to correctly establish again. By 15:10 GMT, all alarms had cleared and service returned to nominal. After an observation and monitoring period, MacStadium restored CenturyLink links at 4:55 GMT on August 31, 2020.
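To make the wildcard mistake concrete (the addresses and prefixes below are documentation examples, not the actual rule CenturyLink deployed), here is a small sketch showing how a rule intended to block a single IP differs from one whose wildcarded octets effectively match an entire range:

```python
# Illustration of scope: an exact-host block versus an overly broad rule.
# Both prefixes are documentation examples, not the rule involved in the outage.
import ipaddress

intended = ipaddress.ip_network("198.51.100.23/32")  # block exactly one address
accidental = ipaddress.ip_network("198.51.0.0/16")   # what wildcarded octets can effectively match

print(f"Intended rule matches {intended.num_addresses} address")
print(f"Accidental rule matches {accidental.num_addresses:,} addresses")
print("Accidental rule covers the intended target:", accidental.supernet_of(intended))
```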
We have detected no further issues related to this incident. CenturyLink/Level 3 confirms full restoration of services, and our monitoring will continue as usual.
CenturyLink/Level 3 has recovered. MacStadium has re-introduced CenturyLink Internet routing for Atlanta, Las Vegas, and Dublin alongside our other transit carrier partners. Network traffic remains nominal, but the Network Team will continue monitoring for possible anomalies.
CenturyLink/Level 3 is experiencing a widespread outage internationally. MacStadium has rerouted traffic to other providers, and all traffic is restored. Customers might still see problems at a global scale due to CenturyLink's outage. We are working with this provider now to identify an estimated time to restoration.
CenturyLink/Level 3 Communications is experiencing a widespread network outage affecting some customers. We are now rerouting traffic from those affected links.
Report: "Las Vegas Carrier Outage"
Last update: MacStadium has been monitoring for 48 hours and now considers this issue resolved.
The local carrier has resolved the fiber cut. Redundancy has been restored. MacStadium will monitor to ensure stability.
Local carrier is still working to resolve the fiber cut.
Local carrier Splicing Team is still working on the fiber cut.
The Splicing Team is onsite at the fiber cut and is working to restore services.
The issue has been confirmed to be due to a regional fiber cut. The local provider advises the construction crew has pulled in replacement cable. The splice team should arrive shortly and begin preparing the cable for splicing. MacStadium will continue to monitor the situation as it develops.
In the Las Vegas market, MacStadium is aware of a major fiber outage affecting multiple carriers. We are operating at reduced redundancy with potential latency.
Report: "[ATLANTA] Network Issue"
Last update: The issue has been monitored for 48 hours and is now considered resolved.
Our upstream carriers have filtered the offending traffic upstream, which appears to have stabilized the network. Our Engineers are continuing to work with the carriers and monitor the network.
Our Engineers have determined that MacStadium has been the target of a DDoS attack. We are currently working with our upstream providers to help resolve the issue.
We are continuing to investigate this issue.
We have received reports from multiple customers that they are experiencing intermittent connectivity issues in Atlanta. We have multiple Engineers investigating currently.
Report: "Provider DNS Outage"
Last update: This incident has been resolved.
The provider has advised that the issue appears to be resolved. MacStadium will continue to monitor and will provide additional updates if necessary.
Due to a network outage from one of MacStadium's DNS providers, customers utilizing MacStadium DNS servers for Reverse PTR records will see an impact on their hosted email servers. MacStadium is monitoring the issue with the provider and will pass along updates as they are available.
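As background, receiving mail servers commonly check the reverse (PTR) record of a sending host, which is why a PTR outage affects hosted email. A quick spot-check of whether a PTR record is resolving can be run from any host; the address below is a documentation placeholder, not a real MacStadium IP:

```python
# Spot-check a reverse (PTR) DNS lookup for a mail server address.
# The IP below is a documentation placeholder; substitute your own assigned address.
import socket

ip = "203.0.113.10"
try:
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    print(f"PTR for {ip}: {hostname}")
except socket.herror as exc:
    print(f"PTR lookup for {ip} failed: {exc}")
```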
Report: "Partial Outage Affecting Some Cloud Customers"
Last update: # Partial Outage of vCenter Access Root Cause Analysis Date: 04/02/2020 Location: Atlanta, Georgia, USA This Document Provides a Root Cause Analysis of a Partial Outage Affecting Some VMware Cloud Customers _**Date: Wednesday, April 1, 2020**_ ## Prepared by * Paul Benati, SVP Operations * David Hunter, VP Engineering * Preston Lasebikan, Systems Architect & Team Lead - Systems Engineering * Michael Rhoades, Manager, Service Center ## Location Atlanta (ATL1) data center ## Event Description At 00:10 EST on Wednesday, April 1, 2020 our Atlanta data center experienced a partial network outage specifically affecting the ability of some customers to access management functions of their vCenter appliances. This outage lasted approximately 55 minutes and was due to a faulty high availability (HA) cable preventing failover during planned steps in a server platform upgrade. Engineers manually invoked the failover and full service was restored by 01:05 EST. ## Chronology of Events / Timeline * 2020/04/01 at 00:10 EST - Engineering remotely pushed planned configuration updates to the fabric interconnects (FI) in preparation for the upgrade to an advanced, future-proof server platform. * 2020/04/01 at 00:14 EST - Monitoring alerts indicated intermittent, disrupted traffic to multiple Atlanta vCenter server appliance (VCSA) hosts, causing delays or an inability to reach VCSAs externally for some customers. * 2020/04/01 at 00:15 EST - Engineering immediately ascertained that their VPN authentication was interrupted by the changes and dispatched personnel to intervene onsite. * 2020/04/01 at 01:02 EST - Manual intervention onsite successfully initiated FI failover and network traffic was immediately restored. Further inspection revealed the HA heartbeat cables had failed, and they were subsequently replaced. * 2020/04/01 at 01:15 EST - Engineering validated that the configuration would prevent any future events by performing failover tests. ## Summary Findings on Technical Causes In preparation for an upcoming upgrade to advanced new fabric interconnects, configuration updates were pushed to one of the current FIs. Normally, failover would occur without interruption to network traffic; however, a failure in one of the physical HA cables between FIs prevented the failover, which simultaneously prevented authentication of the VPN connection used to remotely remediate the situation, thus requiring onsite intervention. The failed failover caused intermittent network traffic loss to multiple Atlanta vCenter appliances, affecting the ability of some customers to perform management functions. As a note, this did not affect running hosts but only access to VCSA. ## Root Causes 1. A failed HA cable prevented automatic failover of the fabric interconnect. 2. Subsequent loss of VPN authentication prevented remote manual initiation of FI failover. ## Reaction, Resolution, and Recommendations MacStadium immediately dispatched Engineering to perform the manual failover procedure onsite and network traffic flow immediately restarted. HA cables were then replaced, and the configuration was checked and confirmed. Internal and external alerting quickly confirmed access, and spot checks were run system-wide. Although the issue was isolated to the FI, full cluster checks were conducted. Going forward, VPN authentication will be performed prior to the use of virtual desktop infrastructure, precluding the possibility of a similar incident occurring in the future.
Additionally, the integrity of physical HA links will be inspected at regular intervals. MacStadium is also investigating the implementation of advanced self-healing measures. MacStadium takes issues resulting in a loss or degradation of service very seriously and we strive to continually understand the root cause so we may improve our level of service for our customers. ## Sponsor Acceptance ### Approved by the Project Executive Date: 2020-04-02 Paul Benati SVP Operations ## Revision History
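As an illustration of the kind of external check that surfaced this issue (a minimal sketch, not MacStadium's monitoring stack; the hostnames are hypothetical), a simple TCP probe against the vCenter management port is enough to confirm whether VCSA endpoints are reachable from outside:

```python
# Minimal external reachability probe for vCenter (VCSA) management endpoints.
# Hostnames are placeholders; substitute your own appliance addresses.
import socket

VCSA_HOSTS = ["vcenter-1.example.com", "vcenter-2.example.com"]

def is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in VCSA_HOSTS:
    state = "reachable" if is_reachable(host) else "UNREACHABLE"
    print(f"{host}: {state}")
```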
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
Our onsite Engineers have identified the issue and service is now restored. We will be closely monitoring the environment to ensure all services are fully functioning.
We are currently investigating an issue where a subset of VMware cloud customers are experiencing failures when attempting to access their vCenter for management purposes. All VMs appear to be running and general production is unaffected.
Report: "Atlanta Network Outage"
Last update: # Atlanta Network Outage _**Date: Thursday March 19, 2020**_ ## Prepared by * Paul Benati, SVP Operations * David Hunter, VP Engineering * Khoa Tran, Network Architect & Team Lead - Network Engineering * Michael Rhoades, Manager, Service Center ## Location Atlanta (ATL1) data center ## Event Description At 15:05 EST on Thursday, March 19, 2020 our Atlanta data center experienced a network outage. This outage lasted for 41 minutes and was due to a border leaf switch's configuration file inadvertently being modified. The switch's configuration was restored from backup and network service was restored at 15:46 EST. ## Chronology of Events / Timeline * 2020/03/19 at 15:05 EST - Engineering remotely executed a configuration script inadvertently against one of Atlanta's border leaf switches instead of the desired top-of-rack switch, resulting in a network outage * 2020/03/19 at 15:05 EST - VPN and VDI connections down for Atlanta * 2020/03/19 at 15:08 EST - Onsite Engineering at our Atlanta DC (ATL1) was engaged to provide access for remote engineers to troubleshoot the outage * 2020/03/19 at 15:35 EST - Access was provided to remote engineers * 2020/03/19 at 15:35 EST - Remote engineers begin to investigate the outage * 2020/03/19 at 15:40 EST - Engineering determines one of the border leaf switches' configuration had been altered * 2020/03/19 at 15:46 EST - Engineering restored the original configuration from backup to the affected border leaf switch, thereby restoring service ## Summary Findings on Technical Causes The configuration of one of the Atlanta border leaf switches was inadvertently modified. This modification left the two HA-paired border leaf switches out of sync with each other. Due to the nature of the Cisco HA capability, this partially altered configuration resulted in disabling external connectivity. ## Root Causes 1. The configuration of one of the Atlanta border leaf switches was inadvertently modified ### Root cause 1 A script was remotely executed that inadvertently affected one of the two high-availability paired border leaf switches. The script partially altered one switch's configuration, resulting in disabled external connectivity at 15:05 EST. Network service was restored when the original configuration file was re-introduced from backup. ## Reaction, Resolution, and Recommendations MacStadium has disabled remote execution of certain shell scripting on primary switches. Significant changes can now be made only when directly connected to the switch, precluding the possibility of a similar incident in the future. MacStadium takes issues resulting in a loss or degradation of service very seriously and we strive to continually understand the root cause so we may improve our level of service for our customers.
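As a purely illustrative companion to the remediation above (a sketch under an assumed device-naming convention, not MacStadium's actual tooling), a configuration script can refuse to run when the target device does not match the intended device class, which would stop a top-of-rack change from landing on a border leaf:

```python
# Illustrative pre-flight guard (not MacStadium's actual tooling): refuse to apply a
# configuration script unless the target's hostname matches the intended device class,
# e.g. top-of-rack switches named "tor-NN" rather than border leaves named "bleaf-NN".
import re
import sys

EXPECTED_TARGET = re.compile(r"^tor-\d+$")  # hypothetical naming convention

def check_target(reported_hostname: str) -> None:
    """Abort before any change is pushed if the device is not a top-of-rack switch."""
    if not EXPECTED_TARGET.match(reported_hostname):
        sys.exit(f"Refusing to run: '{reported_hostname}' is not a top-of-rack switch")
    print(f"Target '{reported_hostname}' confirmed; proceeding with configuration push")

check_target("tor-12")    # proceeds
check_target("bleaf-01")  # aborts, preventing the change from landing on a border leaf
```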
We continued monitoring overnight and did not observe any issues. This incident has been resolved. Thank you again for your patience and understanding.
As of approximately 19:50 UTC a fix was implemented and we are monitoring the results. Again, we apologize for the inconvenience and are working to identify the root cause of our Atlanta border leaf systems failure.
Our Engineers have identified the problem and are now working to resolve the incident. Initial reports point to one of our border leafs failing. More details to be provided shortly.
We are currently experiencing a network outage affecting our Atlanta data center. Engineers are working to identify the cause and remediate as soon as possible. The outage is not affecting any other services (e.g., servers, etc.). We apologize for the inconvenience and will post an ETA as soon as possible.
Report: "[LAS VEGAS] Hardware Incident"
Last update: Valued Customers, We have been monitoring the affected systems and they continue to be stable. We are closing this incident and are now focused on our problem management activities.
We are continuing to monitor for any further issues.
Dear Valued MacStadium Customers, Today, January 27th, around 6 PM Eastern time, our monitoring system alerted us to a possible hardware incident in the Las Vegas market. Our on-site team responded to this alert and was able to determine and resolve the issue in approximately 15 minutes. This hardware incident may have impacted the ability of several customers to access their environment during the incident window. As of now, all affected systems are stable, and onsite technicians will continue to monitor. Our Engineering team will conduct an investigation into the incident, and we will make these details available. Please monitor status.macstadium.com for additional updates. If you believe you are still having an issue reaching your environment, please contact us immediately via portal.macstadium.com or by calling 877.250.3497
Report: "Customer Portal Outage"
Last update: Outage Notification Reference ID: 88634 Message type: Monitoring Impact type: Non-disruptive to customers' environments Valued Customer, The fix implemented for the Customer Portal continues to be stable and therefore we are closing this incident. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
Outage Notification Reference ID: 88634 Message type: Monitoring Impact type: Non-disruptive to customers' environments Valued Customer, A fix has been implemented and our customers should now be able to access the Customer Portal. We are monitoring the system to ensure stability. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
We are continuing to investigate this issue.
Outage Notification Reference ID: 88634 Message type: Initial Message Impact type: Non-disruptive to customers' environments Valued Customer, Please be advised that MacStadium's Customer Portal is experiencing an issue whereby users attempting to log in are redirected back to the login page, resulting in an inability to leverage the portal. We are actively working the issue and will provide updates here. If you have questions, then please contact MacStadium Customer Support at 1-877-250-3497 or via email at support@macstadium.com. Additionally, visit MacStadium’s Status Page (http://status.macstadium.com/) for updates before, during, and after this change.
Report: "Atlanta DC - Cooling Incident"
Last update: The data center cooling continues to be stable, and no additional systems and services have had to be remediated.
At this time we have identified and remediated all known impacted systems and services. Please open a ticket if you’re experiencing any issues.
Due to a water pump failure at our data center colo provider, temperatures at our Atlanta site spiked. The water pump is now working and cooling is flowing again, bringing the site's temperatures back into appropriate ranges. Unfortunately, these high temperatures negatively impacted a number of systems, which we are assessing and working to resolve.
Report: "Las Vegas Route Impairment"
Last update: The incident has been resolved, and services are running without issue.
We have addressed the issue with the carrier and testing shows the issue should be resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue related to the supernet 207.254.32.0/21. We are engaging with our carriers to address the issue.
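If you want to check whether a particular host falls inside the impaired supernet (a quick sketch; the customer address below is a placeholder, not a real assignment), Python's ipaddress module makes the membership test a one-liner:

```python
# Check whether a given host address falls within the impaired supernet.
import ipaddress

impaired = ipaddress.ip_network("207.254.32.0/21")
host = ipaddress.ip_address("207.254.35.17")  # placeholder address for illustration

print(f"{host} affected: {host in impaired}")
```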
Report: "Atlanta Fiber Ring Incident"
Last update: Our fiber carrier, Zayo, identified two issues on the short-side of our fiber ring. The first was a bad fiber at their fiber hut nearest the Atlanta DC3 data center; upon identification, they remediated the bad fiber. The second was a dirty fiber jack at the Atlanta DC3 meet-me room (MMR) panel; having identified the issue, Zayo cleaned the jack and reset the fiber cable connection. In sum, these remediation actions have resolved the signal loss, bringing the short-side of our fiber ring back into specification. Monitoring shows the short-side of the fiber ring has stabilized, and we are closing this incident with no related issues reported in the past 12+ hours.
Our fiber carrier has isolated the issue to the span between their fiber facility at the street and our Atlanta DC3 data center. We are continuing to work directly with Zayo and are going to leverage an optical time-domain reflectometer (OTDR) to determine where the issue lies on the short-side of our fiber ring. Additionally, the long-side of our fiber ring is up and operational. We will continue to monitor the situation closely and will make further announcements as necessary.
As of 9:36 PM ET on 10-15-2018 our upstream carrier, Zayo, began experiencing a dark fiber outage of unknown origin impacting a small portion of our Atlanta-based customers. This outage was isolated to one side of our redundant fiber ring connecting our Atlanta DC3 and DC4 data centers. Due to this redundancy, Zayo was able to restore service at 10:19 PM ET on 10-15-2018. As Zayo continued to explore the origin of the fiber ring failure, they experienced a complete failure of this fiber at approximately 12:54 AM ET on 10-16-2018, causing both sides of the ring to collapse. This collapse has resulted in a larger portion of our Atlanta DC3-based customers experiencing a network outage. We continue to be in direct communication with Zayo and are actively working with them to resolve the incident.
Report: "Atlanta Data Center for Hurricane Michael"
Last update: This incident has been resolved.
MacStadium continues to actively monitor the progress of Hurricane Michael, now a Tropical Storm located 60 miles southeast of Athens, GA and moving north-northeast at 21 miles per hour. Current conditions at the site include bands of rain with wind gusts of 25 mph; the potential for localized flooding remains in place. The data center remains stable, in normal operating configuration, with no storm-related impact. We will continue to monitor the situation closely and will make further announcements as necessary.
As an update to our previous notification Hurricane Michael is now a Category 4 Hurricane. The eye of Michael is expected to move ashore over the Florida Panhandle later today, move northeastward across the southeastern United States tonight and Thursday, and then move off the Mid-Atlantic coast away from the United States on Friday. Some additional strengthening is possible before landfall. After landfall, Michael should weaken as it crosses the southeastern United States. Georgia, the Carolinas, and southern Virginia will see rainfalls of 3 to 6 inches, with isolated maximum amounts of 8 inches. This rainfall could lead to flash floods. A few tornadoes will be possible into parts of central and southern Georgia and southern South Carolina this afternoon and tonight. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Sunnyvale, CA, Las Vegas, NV, and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements as necessary.
As of October 8, 2018, the National Weather Service is predicting a strong hurricane will make landfall Wednesday, October 10th on the Florida Panhandle and on a trajectory towards Atlanta sometime Thursday, October 11th. It is possible that the storm could bring strong winds and heavy rain to the Atlanta area starting as early as Wednesday night. Our data center in Atlanta is physically elevated and not subject to any local flood risks. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Sunnyvale, CA, Las Vegas, NV, and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements as necessary.
Report: "Dublin, Ireland - Core routing issue"
Last update: We are closing this incident with no issues reported in the past 12+ hours related to the routing engine failure. A non-impacting maintenance will be scheduled in the coming days to replace the failed hardware.
The cause of the issue has been identified as a core switch supervisor failure and subsequent fail-over to the secondary. Services should be restored after a brief drop during the fail-over. We are continuing to validate full service availability and will be monitoring closely.
We are investigating a core switching issue in our Dublin, Ireland datacenter, potentially causing connectivity issues to various customers in that facility.
Report: "Global Issue"
Last update: Carriers resolved their global issue yesterday and monitoring shows the issue has stabilized.
There is currently a known issue between Level 3 and Comcast that could affect customers' ability to access MacStadium. We are keeping an eye on this global issue, which is outside of our network, and will keep this message updated as we know more.
Report: "Upstream Carrier Issue"
Last update: Routing of the affected range has been restored and stable for 24 hours.
The upstream carrier was able to identify the affected subnet, 208.52.164.x/24, and resolve the routing issue around 8 AM EST. Only customers on that subnet were affected. We will continue to work with the carrier and monitor for 24 hours.
Due to an upstream carrier network change this morning, we are experiencing a network outage affecting a subset of our public IP ranges. We are in direct communication with our carrier and actively working with them to resolve this incident.
Report: "Atlanta - Upstream Internet Flap"
Last update: We identified a momentary upstream BGP network drop related to several IPv4 subnets in Atlanta at ~8:00 AM EST (12:00 GMT). Upon investigation, we were able to validate that our primary upstream carrier was making routing adjustments in an unplanned maintenance on their side, causing BGP re-convergence to alternate paths and, in turn, portions of the MacStadium Atlanta network to become unreachable for minutes at a time. Related to this incident, MacStadium will be proactively communicating a couple of planned maintenance activities in the very near future to obviate this carrier and further enhance all Atlanta internet routing to ensure these types of events do not re-occur.
Report: "Official Response - Atlanta Data Center for Hurricane Irma"
Last update: After investigating thoroughly, we found that we did NOT suffer any significant outage in Atlanta as a result of the single generator going offline. We did identify a couple of customer-specific, isolated PDUs which failed due to a power surge related to the fail-over. At this time, all incidents are identified and addressed. We are leaving this incident open until the fuel contractor tops off the diesel tanks completely and the datacenter's standby fuel levels are fully restored. Please note that our data center is now fully on city power and no longer on generator power. Services are at normal status.
We have just received notification from our Atlanta datacenter operator that at least 1 of 3 generators has gone offline after running on diesel fuel for 6+ hours since power was lost in the area due to the hurricane. Engineering is investigating what is offline and online; we will update this incident once more clarity is available.
As of September 8th, 2017, the National Weather Service is predicting a strong hurricane will make landfall Sunday, September 10th in Southern Florida and is on a trajectory towards Atlanta by late Monday, September 11th. It is predicted that the storm could bring strong winds and heavy rain to the Atlanta area starting on Monday, September 11th. Our data center in Atlanta is physically elevated and not subject to any local flood risks. MacStadium is equipped with multiple backup generators in the unlikely event of a short-term municipal power failure. Our staff will be both onsite and remote to support any service tickets or requests. While we don't anticipate any impact to our normal response times, please refer to this status page for updates. Our agents in both Las Vegas and Europe have global access to our ticketing system to respond to customer requests, as needed. Additionally, MacStadium offers customers the ability to sign up for hosts in our other data center locations (i.e., Las Vegas, NV and Dublin, Ireland) to mitigate any potential local outages and enhance business continuity. We will continue to monitor the situation closely and will make further announcements, if necessary.
Report: "Error Messages Reported"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
After investigating thoroughly, we found that we did NOT suffer any significant outage in Atlanta as a result of the single generator going offline. We did identify a couple of customer-specific, isolated PDUs which failed due to a power surge related to the fail-over. At this time, all incidents are identified and addressed. We are leaving this incident open until the fuel contractor tops off the diesel tanks completely and the datacenter's standby fuel levels are fully restored. Services are at normal status.
We are currently experiencing multiple error messages via our monitoring systems at our Atlanta data center. Our Engineering team is engaged and investigating. We will continue to update you via status.macstadium.com.
Report: "Atlanta - Isolated intermittent network connectivity"
Last update: This issue has been resolved and all service is stable.
We have been tracking an intermittent core switching issue which began late Saturday 7/29 around 14:30 UTC and has affected service to a segment of customers in Atlanta. The condition has been highly intermittent: traffic would fail to flow for several minutes, then flow freely again for many hours without recurring. Log and NMS reports initially pointed us to repair the isolated issue by replacing a single 10Gb optic which had been identified as failed. Unfortunately, the failure condition re-surfaced early Sunday morning, and on-site staff have been busy re-arranging network paths to isolate the issue since then. More recently, at around 10:30 UTC on 7/30, the issue expanded in nature, pointing to a multi-port 10Gb switch line card which unexpectedly crashed and rebooted itself. We have since moved all network traffic off of that card and are in the process of replacing it entirely. This recent development has expanded the scope of this incident to a more significant portion of Atlanta customers. We do not expect any further intermittent service disruption related to this incident. However, due to the highly intermittent and expanding nature of the issue, we will be monitoring it closely for 48+ hours before officially closing it out as resolved.
Report: "Atlanta - Internet Uplink Flapping"
Last update: We have received closure from Digital Realty on this global case, and are now closing our incident accordingly. --- RESOLVED RED Event - Generator Failure / Critical Load Impact ITST# 448320 56 Marietta Atlanta GA As an update to the Generator failure at 56 Marietta, the engineering team reports that power has been restored and all power systems have been normalized. An RFO, that will outline the root cause as well as the scope of the event, will be published. Further updates will be provided as they become available.
At this time, we have put a workaround in place with Level3 via a different POP in Atlanta, and traffic is flowing again. We are operating with limited upstream connectivity, so performance may be suboptimal until further notice.
The issue in Atlanta has grown to be of a critical nature. The root cause is a major power failure at the Digital Realty / Telx 56 Marietta Street facility in Atlanta, which acts as the crossroads for most traffic in the region. Although there are other upstream connections in place outside of 56 Marietta, the global nature of the incident at 56 Marietta is causing issues with other upstream connections in and around the region. We will continue to update this incident as we gain additional information.
We are investigating an upstream internet BGP issue with one or more carriers which may be causing intermittent service disruption to some Atlanta infrastructure.
Report: "Atlanta - Internet Uplink Failure: Level3 / XO"
Last update: The root cause of this router failure has been identified and will be permanently addressed in a planned maintenance window in the near future. All Internet up-links in/out of Atlanta are stable at this time.
As of 06:30 UTC (14:30 EST), we have been monitoring an issue with one of our Juniper border routers in Atlanta which has in turn caused Level 3 and XO Communications 10GB Internet up-links to flap. These 10GB up-link failures are causing Internet traffic in/out of Atlanta to fail over to other 10GB Uplink connections on other carriers which we maintain in Atlanta. As the Internet traffic re-converges across the Internet via BGP, you may be experiencing intermittent accessibility to hosts within the MacStadium Atlanta data-center facilities.
Report: "Internet Outage - Atlanta"
Last update: Further investigation found the BGP issues to be rooted with Level3, which was having regional routing issues at the time. Level3 is the preferred carrier for a significant amount of traffic in and out of MacStadium Atlanta, including traffic to popular network peers like AWS. During this incident, we found no network transport problems inside MacStadium Atlanta or in its interconnections to our Tier 1 network providers in the region. Our Internet BGP sessions with Level 3 remained up; however, traffic stopped flowing to/from Level3 due to an issue inside their network at the time. Because the BGP session never went down, manual intervention was required to force Internet traffic onto other BGP peers we have in Atlanta. We have still not received a full explanation from Level 3 as to what occurred, but we feel it is necessary to close out this incident at this time as we have seen no other effects or symptoms related to this event in over 24 hours.
From approximately 14:02 EST to 14:18 EST, we suffered a very rare incident which significantly degraded performance to various public upstream carriers at the Metro Transport layer. We are still investigating the root cause and will provide more information as it becomes available. Connectivity is stable at this time, and we are monitoring closely.
We are investigating a BGP peering issue in Atlanta causing significant loss of access to MacStadium - Atlanta.
Report: "LAS VEGAS Network Latency"
Last update: This incident has been resolved.
Level3 seems to have stabilized over the past 15 minutes. We will keep this incident open for several hours and keep monitoring. https://goo.gl/KMQaJx
We are monitoring an issue in our Las Vegas datacenter where we are observing increased latency and inefficient BGP routing through our upstream connection with Level3. Engineers are working to reroute traffic around the issue at this time.
Report: "Intermittent DNS Resolution - Global Internet Issue"
Last update: This incident has been resolved.
We're providing a heads-up to our customer base that there are global DNS issues in North America today (rooted outside of MacStadium), and we are intermittently seeing some symptoms of it here at MacStadium as well. The symptoms present themselves as certain Internet sites/assets not being reachable; however, IP connectivity is intact if you run ping/traceroute tests. Here is a decent write-up on it: http://gizmodo.com/this-is-probably-why-half-the-internet-shut-down-today-1788062835
Report: "Atlanta Core - Line Card Down"
Last update: All 10Ge uplinks have been migrated. As far as we can tell, everything is back to 100% at this time. Closing incident.
All 10Ge uplinks have been migrated off of the affected core interface card. We are completing some cleanup and expect to be at 100% within about 10-15 minutes.
We have a 16-port 10Ge line card down in Atlanta; 10Ge circuits are being moved to bypass the issue.
Report: "LAS VEGAS Network Latency"
Last update: Cogent cleared the issue in Los Angeles, which was reported as being a fiber cut. We have been monitoring closely for a couple of hours since resolution and are now closing the incident, as we are no longer seeing any symptoms.
We are observing excessive latency related to Cogent circuits in the region.
Report: "LAS VEGAS Network Latency"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
As of ~9:15 AM PST, packet loss has cleared. The issue is rooted in the Cogent transport network in/around Los Angeles, CA - outside of MacStadium's data center environment in Las Vegas. We will continue to monitor the incident closely.
We are investigating a severe packet loss issue on transport circuits in/out of the Las Vegas datacenter, which is responsible for degradation of service.
Report: "Dublin Core Issue"
Last update: Issue resolved.
We are investigating a routing issue in/out of Dublin at this time.
Report: "Atlanta Upstream Internet BGP - Host.net Flapping"
Last update: This incident has been resolved.
After adjusting upstream BGP routing to mitigate the Host.net carrier feed, all connectivity has returned to normal for approx. 3 hours. Incident resolved.
Around 8:15 AM (EST) we noticed routing issues with one of our upstream providers. Traffic destined for MacStadium from various locations around the internet could be impacted if that traffic chose to utilize Host.net as its primary pathway into MacStadium Atlanta. Tier 3 engineers are investigating and adjusting routing to work around the issue until Host.net has resolved the core problem.