Cigo

Cigo is currently Operational

Last checked from Cigo's official status page

Historical record of incidents for Cigo

Report: "Network Degradation Impacting Cigo Services"

Last update
postmortem

## 📣 Incident Summary – March 18–19, 2025

**Service Impact on Cigo Tracker due to Azure Regional Outage**

On March 18, 2025, Cigo Tracker experienced intermittent service disruption due to a regional outage within Microsoft Azure’s East US data center region. Below is a summary of the root cause, impact, and the steps being taken to prevent future occurrences.

### 🕒 What Happened?

Azure’s East US region suffered two separate impact windows:

* March 18, 13:37 to 16:52 UTC
* March 18, 23:20 to March 19, 00:30 UTC

The incident was triggered by a **third-party fiber cut** during external drilling work, which caused reduced network capacity in one of Azure’s Availability Zones. A **tooling failure** during Azure's recovery efforts later reintroduced traffic prematurely, leading to congestion and a second round of intermittent connectivity issues.

### 🔍 Root Cause

1. **Fiber Cut**: A construction-related accident physically damaged fiber cabling serving the East US datacenter, degrading network capacity.
2. **Router Maintenance**: A key router in the same zone was already under repair, limiting redundancy.
3. **Tooling Error**: Azure’s automated recovery system failed to fully isolate damaged infrastructure, inadvertently reintroducing traffic too early.
4. **Congestion Spillover**: The unexpected traffic load caused congestion to spread beyond AZ03 into neighboring zones.

### 🎯 Impact on Cigo Tracker

While the Azure issue only affected a subset of inter-zone traffic in East US, this included infrastructure we rely on, resulting in **intermittent connectivity issues** for some customers during the incident windows. Core services were restored once Azure manually completed isolation and fiber recovery work.

### 🛠 Resolution Timeline

* **13:37 UTC, Mar 18** – Outage begins due to fiber cut
* **13:55 UTC** – Initial mitigation starts; traffic rerouted
* **16:52 UTC** – First impact window ends
* **23:20 UTC** – Second outage begins due to tooling error during recovery
* **00:30 UTC, Mar 19** – Final mitigation complete
* **06:50 UTC** – Full restoration of all infrastructure

### ✅ What Azure is Doing to Prevent Recurrence

* Fixing tooling failures that allowed reintroduction of unready capacity (by May 2025)
* Accelerating a capacity upgrade for the East US datacenter (by July 2025)
* Architecting better safeguards to prevent impact from spreading across zones (by February 2026)

We apologize for the inconvenience caused. Please rest assured that our team is working closely with Azure and continuing to invest in the resiliency of our platform. If you have any questions or would like help designing a more resilient setup, feel free to reach out to our support team.

Thank you for your continued trust.

resolved

After monitoring the situation over the past few hours, we can confirm that the immediate impact of the outage has been fully mitigated. Our services are stable, and network performance has returned to normal. We will provide a more detailed post-mortem once we receive a conclusive report from Microsoft's Azure Operations Support (OSS) team. Thank you for your patience and understanding. We sincerely apologize for any inconvenience this may have caused.

monitoring

We want to inform you that we recently experienced network degradation affecting our services due to an ongoing issue within Microsoft's Azure infrastructure in the East US region.

What Happened?

According to Azure, between 13:09 UTC and 18:51 UTC, a fiber cut impacted network capacity in the region, leading to intermittent connectivity loss and increased latency. While Azure has since mitigated the issue, we observed disruptions in our own services between 7:20 PM and 8:27 PM (Eastern Time), specifically affecting connections between the Cigo Tracker web app and our Redis service.

Current Status

As of 8:27 PM (Eastern Time), network latencies have returned to normal, and service stability has been restored. However, to ensure a prompt and complete resolution, we have escalated this matter to Azure's Operations Support with a critical priority. We appreciate your patience and will continue monitoring the situation closely. If you experience any further issues, please reach out to our support team.

Report: "DNS and Network Downtime for cigotracker.com and cigopay.com"

Last update
resolved

On November 14th, at approximately 10:15 PM, during a planned network maintenance session, we undertook a migration of core network routing resources. Unfortunately, this migration inadvertently impacted our DNS records, causing DNS Probe errors for some customers attempting to access our services. To address the issue, our operations team promptly remapped the affected DNS records and adjusted network restriction rules to restore connectivity. The total downtime was less than one hour. The root cause of this incident was an unforeseen interaction between the migration process and automatic changes to both networking rules and DNS records. While this was an unanticipated challenge, we’re grateful for the swift recovery efforts of our operations team. The issue has been fully resolved, and measures have been implemented to mitigate the risk of similar occurrences in the future. If you are still experiencing connectivity issues, we recommend: 1. Allowing additional time for DNS propagation to reach your network. 2. Performing a DNS flush on your network or operating system. If the issue persists, please contact us at support@cigotracker.com, and we’ll be happy to assist you. We sincerely apologize for the inconvenience caused and appreciate your patience and understanding during this time.
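For readers who want to confirm that DNS has propagated to their own network after the steps recommended above, the following is a minimal sketch in Python. It is an illustration only, not a tool provided by Cigo: the function name `resolve_addresses` and the injectable `resolver` parameter are assumptions made for this example, and the check relies solely on the standard library's `socket.getaddrinfo`.

```python
import socket
from typing import Callable, List


def resolve_addresses(hostname: str,
                      resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted unique IP addresses for hostname.

    An empty list means the local resolver could not resolve the name,
    which is what a browser would surface as a DNS probe error.
    The `resolver` argument is a hypothetical hook so the function can be
    exercised without network access; it defaults to socket.getaddrinfo.
    """
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed: the record has not reached this resolver,
        # or a stale negative answer is still cached locally.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve_addresses("cigotracker.com")` from an affected machine should return at least one address once propagation has completed. An empty list after a DNS flush suggests the upstream resolver has not yet picked up the remapped records.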

Report: "Email Consumer Disruption and Queue Management Issue Impacting Notification Service"

Last update
resolved

Incident Summary: On August 11th, a significant issue occurred with the Email Consumer in the Notification Service. The Email Consumer stopped functioning at approximately 8:00 PM EDT, leading to a series of cascading issues within the Notification Service. This incident did not impact the rest of the Cigo Tracker service.

Timeline of Events:
- 8:00 PM EDT: The Email Consumer ceased functioning.
- 8:00 PM - 7:00 AM EDT: Messages began queuing at a significantly reduced rate.
- 7:00 AM - 10:00 AM EDT: The rate at which messages were queued increased substantially.
- 11:00 AM - 2:08 PM EDT: The Email queue reached its capacity, resulting in the rejection of new messages. During this period, both email and SMS message publishing were halted, despite the expectation that only the Email queue should have been affected.

Impact:
- The system experienced a complete halt in both email and SMS message publishing between 11:00 AM and 2:08 PM EDT, affecting communication and potentially leading to delays in message delivery.

Root Cause:
- There is a suspected code-logic error that may be causing incorrect checks on the queues, leading to the stoppage of both email and SMS publishing when the Email queue reaches its capacity.

Actions Taken:
- An alert was sent to the appropriate internal communications channel when the Email Consumer failed. However, due to a configuration issue, the notification was not received by the intended recipients at the time of the failure.
- The issue was remediated at around 2:00 PM EDT by restarting the Email Consumer and fixing the Email queue. Adjustments have been made to the channel configuration to include additional team members and ensure quicker responses in the future.

Next Steps:
- Investigate the root cause of the Email queue failure.
- Determine why reaching the maximum queue size for emails also impacted the SMS queue.
- Ensure that the monitoring and alerting systems are fully operational and that all team members are notified promptly of any critical failures.
- Conduct a thorough review of the queue management logic to identify and rectify any underlying issues.

Conclusion: We are taking appropriate actions to prevent this issue from recurring. Our team is committed to ensuring the reliability of the Notification Service, and we will continue to monitor and improve our systems to provide the best possible service to our customers.
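The report above states the expectation that a full Email queue should not halt SMS publishing. As a rough illustration of that isolation, here is a minimal Python sketch. It is not Cigo's Notification Service code, whose logic the report describes only as a suspected error; the class name `NotificationPublisher`, the channel names, and the queue size are all hypothetical.

```python
import queue


class NotificationPublisher:
    """Hypothetical publisher that keeps one bounded queue per channel."""

    def __init__(self, max_size: int) -> None:
        # Separate queues so capacity is tracked per channel, not globally.
        self.queues = {
            "email": queue.Queue(maxsize=max_size),
            "sms": queue.Queue(maxsize=max_size),
        }

    def publish(self, channel: str, message: str) -> bool:
        """Enqueue a message; return False only if this channel's queue is full."""
        try:
            # The capacity check applies to the target channel alone.
            self.queues[channel].put_nowait(message)
            return True
        except queue.Full:
            return False


# With no consumer draining the email queue, it fills after two messages.
publisher = NotificationPublisher(max_size=2)
email_results = [publisher.publish("email", f"email {i}") for i in range(3)]
# SMS publishing is unaffected because its queue is checked independently.
sms_result = publisher.publish("sms", "sms 0")
```

In this sketch the third email is rejected while the SMS message is still accepted. A check that consulted a shared capacity counter, or the wrong queue, would instead reject both, which matches the symptom described in the timeline.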

Report: "Azure Outage"

Last update
resolved

Microsoft Azure has claimed that the issue has been fully resolved. For details, you can review the complete Azure outage history here for today's partial outage: Mitigation Statement - Azure Front Door - Issues accessing a subset of Microsoft services (Tracking ID: KTY1-HW8) [ https://azure.status.microsoft/en-us/status/history/ ] Everything seems to be fully operational on our end now. If you are still experiencing any issues, please reach out to our support team at support@cigotracker.com

monitoring

The latest update from Microsoft Azure indicates that the majority of the issue should now be mitigated: "An unexpected usage spike resulted in Azure Front Door (AFD) components performing below acceptable thresholds, leading to intermittent errors, timeout, and latency spikes. We have implemented network configuration changes and have performed failovers to provide alternate network paths for relief. Our monitoring telemetry shows improvement in service availability from approximately 14:10 UTC onwards. As we investigate reports of specific services and regions that are still experiencing intermittent errors, we believe that our network configuration changes have successfully mitigated the impacts of the usage spike, but that these changes are causing some side effects to certain services. We are updating our mitigation approach to minimize these side effects, and applying these following Safe Deployment Practices - beginning in Asia Pacific regions and then expanding in phases. We will provide an update on our continued mitigation efforts by 19:00 UTC, or sooner if we have progress to share." This message was last updated at 17:59 UTC on 30 July 2024 You can refer to Azure's latest updates here: https://azure.status.microsoft/en-us/status

monitoring

A fix has been implemented by the Microsoft Azure team. As of their latest update: "We have implemented networking configuration changes and have performed failovers to alternate networking paths to provide relief. Monitoring telemetry shows improvement in service availability from approximately 14:10 UTC onwards, and we are continuing to monitor to ensure full recovery." This message was last updated at 14:54 UTC on 30 July 2024. On our end all connectivity issues seem to be resolved so far, but we are monitoring until Azure confirms full resolution.

identified

We have identified the issue as being related to Azure’s network infrastructure. Multiple engineering teams at Microsoft are engaged to diagnose and resolve the issue. Azure's latest update ( https://azure.status.microsoft/en-us/status ): "We are investigating reports of issues connecting to Microsoft services globally. Customers may experience timeouts connecting to Azure services. We have multiple engineering teams engaged to diagnose and resolve the issue. More details will be provided as soon as possible." This message was last updated at 13:13 UTC on 30 July 2024. We will provide updates as soon as we have more information. Thank you for your patience and understanding as we work through this issue.

investigating

We are currently experiencing issues accessing our services due to an outage reported by Microsoft Azure. Microsoft Azure has reported an issue impacting access to the Azure portal and Azure services in general. We suspect that this may be related to general DNS issues affecting their services, which in turn is impacting access to our website and services. Our team is actively monitoring the situation and working closely with Azure to resolve the issue as quickly as possible. We will provide updates as soon as we have more information. Thank you for your patience and understanding as we work through this issue. If you have any questions or need further assistance, please contact our support team at support@cigotracker.com.

Report: "SMS and Email Notifications Service Outage"

Last update
resolved

The incident has been resolved, and SMS and email notification services are now fully restored. We are currently analyzing the issue to prevent it from happening again. Thank you for your patience and understanding.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We've identified the issue with our messaging service that started at 9:40 AM (Eastern Time). Rest assured, all scheduled and event notifications sent before this time were successfully delivered. Our team is actively working on a fix, and we will post an update as soon as the problem is resolved. Thank you for your patience.

investigating

We are currently investigating an emerging issue with our SMS and Email notifications service. Our team is actively investigating the issue to identify the root cause and implement a solution. We will provide updates as we make progress. We apologize for any inconvenience this may have caused and appreciate your patience as we work to resolve this issue promptly.

Report: "SMS Notifications Service Outage"

Last update
resolved

This incident has been resolved. We identified the root cause of the problem and will work on improving our system's health checks to ensure such issues are identified and mitigated more rapidly in the future. Once again, we apologize for the inconvenience this has caused and appreciate your patience throughout this process.

monitoring

The backlog of SMS notifications has been fully processed, and our SMS service is back to normal operation. We are continuing to monitor the system to ensure everything is working well.

identified

The issue has been identified, and a fix has been implemented. We determined that the system stopped sending SMS messages yesterday at 2:28 PM EDT. Our SMS notification system is now operational and is currently processing the backlog at a rate of 300 SMS per second. We will continue to monitor the system to ensure a full recovery.

investigating

We are currently experiencing an outage with our SMS service that began last night. Our team is actively investigating the issue to identify the root cause and implement a solution. We will provide updates as we make progress. We apologize for any inconvenience this may have caused and appreciate your patience as we work to resolve this issue promptly.

Report: "Homepage Accessibility Issue"

Last update
resolved

This incident has been resolved.

investigating

We are actively investigating the issue causing delayed loading on the homepage. As a temporary measure, we have redirected traffic from cigotracker.com to the login page of our web application. Meanwhile, we are thoroughly examining our operational logs and conducting trace analysis to pinpoint the cause of the outage on the landing page. Thank you for your patience.

investigating

As of 8:55 AM EDT today, we are experiencing an unplanned outage affecting our homepage (https://cigotracker.com). Our technical team is actively investigating the cause of this issue. Impact: Please note that this outage does not affect the overall availability of our web application. Users can continue to access our web application through the following link: https://app.cigotracker.com/site/login. The operator applications on Android and iOS, as well as all other components of our web application, are operating normally. Next Steps: We are working diligently to identify and resolve the issue. Updates will be provided as more information becomes available.

Report: "Cigo Pay Tip Submission Error (April 23rd, 6 PM EDT - April 24th, 1 PM EDT)"

Last update
resolved

On Tuesday, April 23rd, at approximately 6 PM EDT, we implemented a network configuration adjustment in our backend routing rules. This adjustment inadvertently affected Cigo Pay, specifically causing errors when users attempted to submit tip transactions. Although Cigo Pay remained technically online, users encountered difficulties with the final submission of tips. Reports of this error began to surface on Wednesday morning. Upon investigation, we identified that the network configuration change was the root cause of the issue. We promptly rectified the configuration error by 1 PM EDT on the same day. We sincerely apologize for any inconvenience this incident may have caused. To prevent similar occurrences in the future, we have implemented enhancements to our monitoring systems. Additionally, we regret the oversight in failing to create an incident report at the time of the problem. Thank you for your patience and understanding as we worked to resolve this issue. If you have any further questions or concerns, please don't hesitate to reach out.

Report: "Unplanned Platform Downtime (Cloud Vendor Outage)"

Last update
postmortem

We want to provide you with an update on the January 21st incident that impacted our services. Here's a breakdown of the situation:

**Incident Summary:** On January 20th, 2024, at around 9 PM EST, an internal maintenance process by the Azure OSS team resulted in a configuration change to Azure Resource Manager. Unfortunately, this led to repeated failures of the Azure Resource Manager's node upon startup.

**Root Cause:** The configuration change triggered a negative feedback loop, overwhelming the remaining Azure Resource Manager nodes and causing a rapid drop in availability. This, in turn, affected our backend storage, leading to random failures on data plane API calls. These failures disrupted the functionality of our database server, leading to intermittent crashes, particularly between 12 AM and 2 AM.

**Resolution:** The Azure engineering team worked to address the issue, and we were able to fully resolve it around 4 AM EST on January 21st, 2024.

**Preventive Measures:** To prevent similar incidents in the future, we are closely reviewing our internal processes and working collaboratively with the Azure OSS team to implement additional safeguards.

We sincerely apologize for any inconvenience this may have caused, and we appreciate your understanding as we continue to enhance our systems to provide you with a more reliable experience.

resolved

The incident has been successfully resolved. We're now awaiting the Root-Cause Analysis report from Microsoft Azure's team. Once received, we'll compile a post-mortem of the event to provide you with a comprehensive overview.

monitoring

Our team is actively monitoring our database server instances to guarantee service availability. While things are looking positive with the recent improvements, we are awaiting official confirmation from the Azure team to ensure that the problem has been fully resolved.

monitoring

It appears that Microsoft Azure has successfully implemented a fix, and our database server connections are now operational. However, we are currently awaiting official confirmation from the Microsoft Operational Systems Support team to validate that the issue has been fully mitigated. Thank you for your continued understanding.

identified

Our ongoing investigation into the connectivity disruption impacting a subset of our database servers and various Azure services has identified an issue on Microsoft's end. The Microsoft Operational Systems Support team has acknowledged the problem and is actively addressing it. We are diligently awaiting further updates from their team and will keep you informed as soon as new information becomes available. Your patience during this time is sincerely appreciated.

investigating

We are currently experiencing extended database downtimes stemming from Microsoft Azure's database instances. This has been occurring intermittently since 9:45 PM (EST), and the issue has been recurring at a higher frequency since 12 AM (EST). Our team is actively investigating the root cause of this service disruption. We appreciate your patience as we work to resolve this issue promptly.

Report: "Intermittent platform availability issue"

Last update
resolved

We encountered a temporary glitch in domain name resolution, leading to connectivity issues with our caching database and causing a partial service outage. We've taken immediate measures to address the issue, and it also appears to be self-healing. The disruption occurred from 3:04 PM to 3:16 PM (ET) and briefly from 3:25 PM to 3:28 PM (ET). We apologize for any inconvenience this may have caused. Rest assured, we're proactively engaging with our cloud hosting vendor to conduct a thorough analysis and prevent future occurrences. Thank you for your understanding.

identified

We experienced connectivity issues between our web platform and one of our caching database services. We have identified a potential cause and implemented corrective measures. The situation appears to have been resolved, and the platform is now stable. Our team is currently conducting a thorough analysis to understand the root cause. Further updates will follow as we gather more information.

investigating

We are investigating the cause of the intermittent platform availability issue. We will post an update once we have more information.

Report: "Degraded web and mobile app performance due to server connectivity issues"

Last update
postmortem

### Incident Timeline

**Incident Duration:** Approximately 6 hours, from 11:20 UTC to 17:30 UTC on September 13, 2023

### Timeline of Events

#### Incident Identification (September 13, 2023)

* **11:20 UTC / 7:20 ET:** Customers started experiencing issues with degraded web and mobile app performance, including higher latency and disconnects.

#### Incident Response (September 13, 2023)

* **13:00 UTC / 9:30 ET:** We initiated an internal investigation into the performance degradation and suspected connectivity issues with our servers.

#### Mitigation and Communication (September 13, 2023)

* **17:28 UTC / 13:28 ET:** Full platform performance was restored, and intermittent connectivity errors disappeared. However, we continued monitoring the situation.

#### Incident Closure and Ongoing Investigation (September 13, 2023)

* **19:00 UTC / 15:00 ET:** The incident was officially closed as platform performance returned to normal levels.

### Root Cause Analysis

The root cause of this incident was identified as a faulty device in Azure Front Door. This device continued transmitting traffic from the edge sites for an extended period of time, leading to congestion and packet drops. The prolonged transmission from the faulty device resulted in higher latency, disconnects, and failed service responses.

### Mitigation

Microsoft Azure mitigated the issue by routing traffic away from the problematic device to a healthy one. This action restored normal service operations.

### Preventive Measures

To prevent future occurrences, we are committed to implementing the following measures:

1. **Collaboration with Azure:** We will maintain a strong collaboration with Microsoft Azure's OSS team to ensure a proactive approach to identifying and addressing potential issues promptly.
2. **Traffic Monitoring:** Regular monitoring of traffic patterns will be implemented to detect anomalies and address them swiftly.
3. **Redundancy and Failover:** We will explore redundancy options and failover mechanisms to minimize the impact of similar incidents.

### Conclusion

We sincerely apologize for the inconvenience and disruption this incident may have caused our customers during the impact window of 11:20 UTC to 17:30 UTC (7:20 ET to 13:30 ET) on September 13, 2023. We appreciate your patience and understanding throughout the incident resolution process. Our commitment to providing reliable and performant services remains unwavering, and we will continue to work diligently to improve our systems and prevent future incidents.

If you have any further questions or require additional information, please do not hesitate to reach out to us. Thank you for your continued support.

resolved

We are pleased to inform you that we are closing this incident with the following important notes: 1. Platform Performance: Our platform's performance has returned to its normal levels since approximately 1:28 PM (Eastern Time). We've closely monitored the situation, and the intermittent connectivity errors that were affecting our services have now disappeared. 2. Ongoing Investigation: While the immediate issue has been resolved, we continue to work closely with the Azure Operations Support System (OSS) team to conduct a comprehensive root cause analysis. Our joint efforts aim to identify the underlying reasons for the incident. 3. Future Updates: As soon as we gather more information and insights from our collaboration with the Azure OSS team, we will provide a post-mortem update on this incident. This update will offer a detailed account of our investigation results and outline our plan of action to prevent a recurrence of this issue. We sincerely apologize for any inconvenience or disruption this incident may have caused to our customers' operations. Our team is committed to ensuring the reliability and performance of our services, and we appreciate your patience and understanding throughout this process. If you have any further questions or require additional information, please do not hesitate to reach out to us.

monitoring

We wanted to provide you with an update on the recent server connectivity issues that were impacting our web and mobile app performance. Based on our observations, it appears that the server connectivity issues have been resolved, and our systems are now showing signs of stability. However, we are still actively monitoring our systems to ensure that everything remains in good working order. We understand the importance of a comprehensive analysis to prevent future occurrences, and to that end, we are eagerly awaiting further information from the Microsoft Azure Operations Support System (OSS) team. We hope that their expertise will help us pinpoint the root cause of the issue, allowing us to take any necessary preventive measures going forward. We appreciate your patience and understanding as we continue to work on this matter, and we will keep you updated as soon as we receive more information from the Microsoft Azure OSS team. If you have any questions or concerns in the meantime, please don't hesitate to reach out to us.

investigating

We are presently in the process of identifying the underlying reasons for the diminished performance of our web and mobile applications, which appears to be stemming from connectivity problems with our servers. Our initial examination has not uncovered any issues originating from our side. Consequently, we have initiated contact with our cloud hosting provider, Microsoft Azure, to collaborate with their Operations Support System (OSS) team in order to further investigate this matter.

Report: "Gateway connection errors"

Last update
resolved

The incident that took place from 12:48 PM to 1:31 PM EDT (UTC -4) has been successfully rectified and the networking infrastructure has now returned to a stable state. While we wait for Microsoft Azure's OSS team to furnish us with an RCA (Root Cause Analysis) in response to the additional logs and information we provided, we take this opportunity to apologize for any operational disruption that may have been caused by the partial outage. We are grateful for your patience and understanding in this matter.

monitoring

It appears that the issue was caused by the infrastructure of our cloud vendor, which has since stabilized. Our team is awaiting confirmation from Azure's Operations Support Systems (OSS) team and we are closely observing the performance and availability of our web application.

investigating

We are currently investigating a partial outage that is causing our web services to be unavailable intermittently. When our service fails to load, some of our users may observe a 502 gateway error page. We apologize for the inconvenience and are working with Microsoft Azure's operational support systems team to resolve the problem as soon as possible.

Report: "Microsoft Azure networking outage"

Last update
resolved

The Cigo Tracker platform and backend systems that feed the mobile app were impacted by an outage within Microsoft Azure's networking infrastructure. The outage also impacted Microsoft 365, Teams, and Outlook. Due to this widespread outage, our systems were either fully or partially unavailable from 2:05 AM to 3:45 AM EST (7:05 AM to 9:45 AM UTC). The root cause was identified by Azure, and mitigated. This seems to have primarily impacted our customers in Europe and in the Middle East during their core hours of operation. We apologize for any inconvenience this may have caused. You can read more about the incident (Tracking ID VSG1-B90) here: https://status.azure.com/en-us/status/history/ This outage also made it to numerous online publications: - https://www.reuters.com/technology/microsoft-teams-down-thousands-users-india-downdetector-2023-01-25/ - https://www.theguardian.com/technology/2023/jan/25/microsoft-investigates-outage-affecting-teams-and-outlook-users-worldwide - https://techcrunch.com/2023/01/25/microsoft-teams-outlook-service-outage

Report: "DNS connectivity issues"

Last update
resolved

The problem on Microsoft Azure's end seems to have been mitigated. Our workloads have returned to normal.

monitoring

Microsoft Azure's team hasn't reported any additional updates, but based on our monitoring, traffic seems to have returned to normal in the last 30 minutes (3:30 PM EDT). We are standing by for the Azure team to confirm full resolution. Thank you for your patience and understanding.

monitoring

Microsoft Azure's team has confirmed that they are working on a full system recovery following a large spike in traffic that disrupted their network infrastructure: "We are recovering intermittent connectivity issues. Traffic managed by Azure Front Door service is being recovered by systematically going through the regions where we are observing resource impact and enforcing traffic management on the same. Once the recovery process is completed, the service should be able to resume handling traffic normally." We are monitoring the health of our Front Door and are seeing a steady recovery.

identified

The Microsoft Azure team has confirmed that they identified the potential cause as "a spike in traffic". They have further clarified the following: "While we are not currently observing any traffic spikes currently, we are working on remediating the residual impact. We are recovering a number of nodes that are showing intermittent connectivity issues. For customers who are experiencing connectivity issues, retries are likely to be successful. Most customers should be seeing recovery at this stage." From our monitoring systems, our DNS health seems to be recovering, but there is still a 5% fluctuation that may impact some customers in some regions.
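Azure's guidance in the update above is that retries are likely to succeed during intermittent connectivity issues. The following Python sketch shows one common way a client could apply that advice, using retries with exponential backoff. It is an assumption-laden illustration rather than anything prescribed by Azure or used by Cigo: the function name `call_with_retries`, the attempt count, and the delay values are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The `sleep` parameter is a hypothetical hook so the waits can be
    replaced in tests; it defaults to time.sleep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Backing off between attempts, rather than retrying immediately, avoids adding load to a network path that is already congested. Python's built-in `TimeoutError` is a subclass of `OSError`, not `ConnectionError`, so a real client would likely need to widen the caught exception types to match its HTTP or socket library.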

identified

New update from Azure. They are still investigating the issue and we remain on hold: Starting at 16:10 (UTC) on 07 Sep 2022, customers using Azure Front Door could be experiencing connectivity issues. This could also be impacting customers’ ability to access the Azure Management Portal. We are investigating a spike in traffic as a potential cause. Retries are likely to be successful.

identified

Our platform is currently experiencing a partial outage. A portion of our users are fully impacted and are unable to access the platform; for others, response times are slow; and the rest of our users are unaffected. The report from the Microsoft Azure team is as follows (as of 2022-09-07 1:05 PM EDT): "Connectivity issues: We are aware of connectivity issues to the Azure Portal and customers using Azure Front Door. We will provide more information as it is known." This message was last updated at 17:00 UTC on 07 September 2022.

Report: "Issues with routing and route optimization"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified with one of our third party providers, and we are working with them to resolve it as soon as possible.

Report: "System outage"

Last update
postmortem

After reviewing this incident with the Microsoft Azure OSS (Operations support systems) team, the issue appears to have been related to a 30-minute outage between our Virtual Network and the Azure DNS (between 1:45 pm and 2:15 pm EST, or 18:45 and 19:15 UTC). Based on the recommendations we were given, our operations team has made some initial configuration adjustments that should prevent this issue from re-occurring in the future. We’re also investigating and implementing additional fallbacks to continue to ensure the highest system availability possible. We apologize for the inconvenience this may have caused to your operations, your fulfilment teams, and your end consumers.

resolved

This incident has been resolved, and we will continue our investigation with our Cloud hosting provider to better understand what happened and how we can prevent it in the future.

monitoring

The issue seems to have been resolved; we're still seeing some degraded performance as things stabilize. We've identified the cause of the issue, and will further investigate how to prevent it from re-occurring in the future. We apologize for the downtime experienced.

investigating

We are experiencing a system outage, and are investigating the issue.

Report: "Customer tracker not accessible via cigo.io links due to Google service outage"

Last update
resolved

As of 9:37 PM EST, this incident has been confirmed as resolved by the Google Firebase team. See their latest update: https://status.firebase.google.com/incidents/yioFLgdcS81z3keW16n5

monitoring

In their latest status update (9:23 PM EST), the Google Firebase team has confirmed that a fix has been put into place, and that our short link service is now operational. They are monitoring the service, and we will continue to closely monitor this on our end as well until they confirm the issue has been fully resolved. You can continue to monitor the incident updates directly from our vendor here: https://status.firebase.google.com/incidents/yioFLgdcS81z3keW16n5

monitoring

We're observing a gradual return of the Google short link redirection service, but we are waiting on an official update on their status page for this incident: https://status.firebase.google.com/incidents/yioFLgdcS81z3keW16n5 We are continuing to monitor the situation.

identified

We have received reports that the cigo.io short links are not correctly redirecting to the web tracker. We've investigated the issue and have determined that the 3rd party service we use to generate these short links (Google Firebase) is currently experiencing an outage. Their team has identified the problem and is currently working on implementing a fix. To read more about the outage from our vendor, please visit: https://status.firebase.google.com/incidents/yioFLgdcS81z3keW16n5 Customers who have received a notification by email can track their order by entering their phone number and access code on this page: https://cigotracker.com/site/tracker-lookup/ We are actively monitoring the situation, and we apologize for the inconvenience.

Report: "Degraded performance on Sunday (Web and Mobile)"

Last update
resolved

On November 28th, before 12 PM (GMT-5), we experienced a spike in traffic and URL crawling attempts that resulted in a temporary slowdown of our application. Unfortunately, this also impacted a part of our backend infrastructure, causing increased response times for a few hours. We were able to mitigate and resolve the issue later in the day at around 3 PM (GMT-5), and we made further adjustments that night at around 9:30 PM (GMT-5). Our engineering team is actively working on additional ways to mitigate and minimize the impact of such incidents in the future. We apologize for any inconvenience this may have caused for operations that were active throughout the day yesterday. Thank you for your understanding.

Report: "Network Gateway errors from our Cloud hosting provider"

Last update
resolved

This incident has been resolved, but our infrastructure team and the Azure support team are still actively monitoring specific gateways in case the problem resurfaces. We apologize for the inconvenience this has caused to your operations throughout the day, and we hope to get a root cause analysis from the Azure Operational support teams involved in this incident.

monitoring

We have implemented a temporary workaround while we await further instructions from the Azure Support team.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified. We are awaiting further instructions from the Azure support team, since the problem seems to persist intermittently.

investigating

We are currently investigating an issue that causes intermittent loading errors of resources across our platform and services. Our team suspects the error is linked to the Microsoft Azure DNS. An update will be provided once we have more information.

Report: "Notification system internal connectivity issues"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating a connection issue with our notification system, which manages the sending of email and SMS notifications.

Report: "Network DNS errors from our hosting provider"

Last update
resolved

Microsoft Azure has confirmed that the underlying DNS outage has been mitigated.

monitoring

Cigo Tracker and its services have recovered according to our monitoring systems. With that said, Microsoft Azure has not fully closed the incident report on their end. We will continue to monitor their infrastructure status updates and we will provide updates accordingly.

monitoring

According to the last update issued by the Microsoft Azure engineering team (see: https://status.azure.com/en-us/status), traffic has been rerouted to an alternative DNS. Our services seem to have resumed. We are continuing to monitor the situation, and we will continue to provide updates.

identified

Microsoft Azure, our cloud hosting provider, is currently experiencing a widespread outage affecting our platform and many others. For more information, you can visit: https://status.azure.com/en-us/status We apologize for the inconvenience. We're waiting for more information and a resolution from them.

identified

We are continuing to work on a fix for this issue.

identified

Our cloud hosting provider seems to be experiencing a major service disruption. Our team has contacted them and we are actively looking into it.

Report: "Exports to Files Server"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Our initial fix does not seem to have completely resolved the issue. Our networking team is actively working on resolving this problem.

monitoring

We were able to resolve the networking error and we are currently monitoring the resolution.

identified

We've identified the cause of the issue to be related to a firewall configuration. We are actively working on resolving the issue.

investigating

Exporting itinerary data from the planner to our remote Files server is currently not possible.

Report: "Issues with Importation, Job creation, and Route Editing"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Intermittently degraded service performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we will be actively monitoring the results. All components are operational.

identified

The platform is in maintenance mode while a fix is being applied.

identified

The platform will be under maintenance shortly for a fix to be applied.

identified

The issue has been identified and a fix will be deployed later this evening. The deployment of this fix will incur some downtime after 11 PM EDT (UTC -04:00).

investigating

We are still investigating the issue and making the necessary adjustments, while monitoring the current stability of the platform. Our main components are currently operating normally.

investigating

We are currently investigating an issue that is intermittently affecting overall platform performance.