urllo

Is urllo Down Right Now? Check whether there is an ongoing outage.

urllo is currently Operational

Last checked from urllo's official status page

Historical record of incidents for urllo

Report: "Possible Connectivity Issues with Redirector Cluster"

Last update
resolved

This incident has been resolved.

identified

We've identified the source of the issue affecting redirects on our IP 54.68.182.72. We are currently taking steps to resolve the issue.

investigating

Our automated systems have detected possible connectivity issues with a redirector cluster. We're currently investigating, and will provide more information soon.

Report: "Connectivity Issues with 54.68.182.72 Redirector Cluster"

Last update
resolved

An issue with our upstream provider resulted in 38 minutes of downtime on the 54.68.182.72 redirector cluster earlier today. We are performing a root-cause analysis and, based on the results, will implement further strategies to mitigate issues like this in the future.

Report: "Possible Connectivity Issues with Management Dashboard"

Last update
resolved

The issues impacting the connectivity of the management dashboard have been fully resolved.

investigating

Our automated systems have detected possible connectivity issues with the management dashboard. We're currently investigating, and will provide more information soon.

Report: "Background job latency"

Last update
resolved

We have identified and resolved an issue which caused background jobs to take longer than usual. This resulted in a large queue of work we needed to process. All important jobs have been completed (e.g. redirect updates, certificate requests and renewals) and we do not expect any further customer-facing issues as a result of this.

Report: "Redirector connectivity issues"

Last update
postmortem

We’d like to shed more light on the responsiveness issues experienced by our redirection services on February 6th. **Firstly, we’d like to sincerely apologize that this incident occurred.** The performance of our services over this period does not reflect our goal of 100% uptime, and we recognize this affected our customers negatively. We hope the following information shows that we have understood what happened, and that we have laid plans to reduce the likelihood of this occurring in the future.

#### **Customer-impacting incident start time**: February 6, 2020 @ 20:01 MST (2020-02-07 @ 03:01 UTC)

#### **Customer-impacting incident end time**: February 6, 2020 @ 20:33 MST (2020-02-07 @ 03:33 UTC)

#### **Impact**: The redirection services on 54.68.182.72 and 34.213.106.51 responded to requests very slowly, and in some instances requests were dropped.

#### **Root cause**: A distributed denial-of-service (DDoS) attack on a customer website.

#### **Solution**: Provision enough capacity to handle the full load of the attack while ensuring all traffic was processed within our typical response times.

## Background

The EasyRedir redirection services are hosted on AWS across multiple availability zones (AZs) within the US-West-2 region. An AWS Network Load Balancer (NLB) has an interface in each AZ, and a fleet of EC2 instances in each AZ actually processes the redirection requests. This architecture has proven to be highly reliable and easily scales to very high traffic levels.

## Incident

On February 6, 2020 we received alerts from our monitoring tools of high load on our redirection servers. We immediately began an investigation and determined the servers were receiving vastly higher traffic than we typically process at any given time. At peak, our servers were handling 44x our typical traffic levels. It’s important to note that although our systems were loaded far more heavily than usual, we were still responding to this traffic within our typical response times.

Our systems have a variety of tools at their disposal to mitigate attacks from bad actors. Our AWS NLB has DDoS mitigation functions built into it (which typically operate at the IP or TCP layers of the network stack). Our redirection servers also have a variety of tools to handle this level of traffic: highly tuned Linux kernel parameters, iptables-based IP blocking, request and connection limits built into the web server configuration, and crucially, a carefully constructed series of RAM-based caches that hold redirect configuration information.

The nexus of the customer-visible impact originated from our action to make a web server configuration change to block this traffic at an earlier point in our processing pipeline. This change required a reload of our server configurations. What was not fully understood at that time was the degree to which our caches were contributing to our low (and typical) response times. When each server configuration was reloaded, the cache was cleared. This had a knock-on effect throughout our processing pipeline: connections to backing cache servers had to be reestablished, and RAM caches rebuilt. It was this action that caused the start of the customer-visible incident, as our systems struggled to respond to client requests in a timely manner.

We immediately began to provision additional EC2 instances and added them to the NLB. Once this capacity started to come online, response times began dropping back towards normal levels. Fully normal response times and traffic processing capabilities returned 32 minutes into the customer-visible incident. It’s important to note that during this time, many requests were serviced successfully, albeit at times taking much longer than we typically take to process a request.
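To make the knock-on effect concrete, here is a minimal sketch in Python. It is purely illustrative, not our production code: the hostnames, lookup latency, and cache layout are assumptions chosen for the example. It models a per-process RAM cache in front of a shared backing store, and shows how clearing the RAM layer (as a configuration reload does) sends every request back to the backing store until the cache is rebuilt.

```python
"""Illustrative two-layer redirect-configuration cache (a sketch, not
urllo/EasyRedir production code). A per-process RAM cache sits in front
of a shared backing store; clearing the RAM layer, as a web server
reload effectively does, pushes every request onto the backing store
until the cache is rebuilt."""

import time

# Hypothetical stand-in for a shared cache server (e.g. Redis/memcached).
BACKING_STORE = {f"host{i}.example.com": f"https://target{i}.example.com"
                 for i in range(1000)}
BACKING_STORE_LATENCY = 0.0005  # assumed network round trip, in seconds


def backing_store_lookup(host: str) -> str | None:
    """Simulate a lookup that must cross the network to a cache server."""
    time.sleep(BACKING_STORE_LATENCY)
    return BACKING_STORE.get(host)


class RedirectCache:
    """Per-process RAM cache of redirect configuration."""

    def __init__(self) -> None:
        self._ram: dict[str, str] = {}

    def resolve(self, host: str) -> str | None:
        # Fast path: answer from process-local RAM.
        if host in self._ram:
            return self._ram[host]
        # Slow path: fall through to the shared backing store.
        target = backing_store_lookup(host)
        if target is not None:
            self._ram[host] = target
        return target

    def clear(self) -> None:
        """What a configuration reload effectively does to the RAM layer."""
        self._ram.clear()


def serve_burst(cache: RedirectCache, n: int = 1000) -> float:
    """Resolve n requests and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for i in range(n):
        cache.resolve(f"host{i % 1000}.example.com")
    return time.perf_counter() - start


if __name__ == "__main__":
    cache = RedirectCache()
    serve_burst(cache)          # first burst warms the RAM layer
    warm = serve_burst(cache)   # served entirely from RAM
    cache.clear()               # simulate the configuration reload
    cold = serve_burst(cache)   # every request hits the backing store again
    print(f"warm: {warm*1000:.1f} ms, after reload: {cold*1000:.1f} ms")
```

Running the sketch, the post-reload burst should take orders of magnitude longer than the warm one. This is the same shape of slowdown described above, and it is why fronting the backing datastores with a proxy layer (one of the corrective actions below) is attractive: reloaded worker processes could share a warm layer instead of each rebuilding its own.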
## Resolution

The redirection services were fully restored within 32 minutes of the start of the customer-visible event.

## Corrective Action

This failure was regrettable on both a corporate and personal level. The decision to initiate the actions that led to this incident was taken by our staff; this was not a failure of our architecture or technology. This has been felt personally by us, and we are sincerely sorry. We have already taken a number of actions as a result of this incident, and plan to take many more in the days to come.

##### Actions already taken:

* Provisioned additional “standby” capacity that is ready to be activated at a moment’s notice
* Added additional system monitors to detect anomalies in our traffic levels before they would trigger a “high load” monitor (see the sketch after this list)
* Changed some system/server configuration parameters to be more aggressive towards limiting bad actors

##### Actions which will be taken over the coming days:

* Investigate using proxy technology on each redirection instance to prevent process restarts from crushing caching datastores
* Investigate how to detect and ignore spoofed IP addresses
* Tune logging related to request and connection limits
* Investigate whether using instances with greater network capacity would be helpful for our caching datastores
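On the traffic-anomaly monitors mentioned in the first list: our monitoring internals are not detailed here, so the following is only a hedged sketch of one common approach, an exponentially weighted moving average (EWMA) baseline that flags abnormal growth before an absolute “high load” threshold would trip. The smoothing factor, threshold, and sample values are assumptions for the example.

```python
"""Sketch of an EWMA-based traffic anomaly detector (an assumption-laden
example, not urllo/EasyRedir's actual monitoring). It tracks a moving
baseline and variance of request rates and flags samples that deviate
far above the baseline, catching a spike before a fixed "high load"
threshold would."""

import math


class TrafficAnomalyDetector:
    def __init__(self, alpha: float = 0.1, k: float = 4.0) -> None:
        self.alpha = alpha              # EWMA smoothing factor (assumed)
        self.k = k                      # alert threshold in std deviations
        self.mean: float | None = None  # baseline request rate
        self.var = 0.0                  # EWMA variance of the rate

    def observe(self, requests_per_sec: float) -> bool:
        """Feed one sample; return True if it looks anomalous."""
        if self.mean is None:           # first sample seeds the baseline
            self.mean = requests_per_sec
            return False
        deviation = requests_per_sec - self.mean
        # Floor of 20% of baseline tolerates normal jitter while the
        # variance estimate is still warming up.
        threshold = self.k * math.sqrt(self.var) + 0.2 * self.mean
        anomalous = deviation > threshold
        # Update the baseline *after* testing, so an attack spike does
        # not immediately absorb itself into the average.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation**2)
        return anomalous


if __name__ == "__main__":
    detector = TrafficAnomalyDetector()
    for rps in [100, 105, 98, 102, 110, 99, 101]:  # typical traffic
        assert not detector.observe(rps)
    print(detector.observe(4400))  # ~44x the baseline -> True
```

The design points worth noting are the relative (rather than absolute) threshold, which reacts to growth against the baseline, and updating the baseline only after testing, so a sudden spike cannot mask itself.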

resolved

We're satisfied the issues impacting the connectivity of the redirection clusters have been resolved. We'll continue to closely monitor this situation and will perform a root-cause analysis to determine steps to help prevent a disruption like this in the future.

monitoring

The additional capacity we've provisioned has resolved the issues with response times. We're currently monitoring the situation to ensure systems are performing as expected.

investigating

We are provisioning more server capacity to reduce response times. We will post further updates shortly.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating connectivity and responsiveness issues related to our redirection cluster. We will report further information as we know it.

Report: "Intermittent connectivity issues"

Last update
resolved

Our monitoring over the past 2 hours has shown that the intermittent network connectivity issues have been rectified. We now consider this issue resolved.

monitoring

We have made some adjustments to our network flows to route around the problems at our upstream provider. They have identified the root cause of the issue and are actively working on remediating it. We are still seeing some infrequent high latencies. We are continuing to monitor this closely and will provide updates as we work towards a full resolution.

monitoring

We continue to closely monitor connectivity across our services. Our upstream provider has implemented changes to resolve the intermittent connectivity issues, and based on our monitoring they are improving the situation. We can see that the vast majority of traffic to EasyRedir services is being answered quickly and correctly, with some requests having increased latencies, and an extremely small number of requests timing out. We will continue to provide updates as we work towards a full resolution.

identified

We are continuing to monitor the intermittent connectivity issues we've identified. We have brought additional compute capacity online and are monitoring the status of our upstream providers.

investigating

We are currently experiencing intermittent connectivity issues across our infrastructure due to an issue with our upstream provider. We are investigating the situation and will provide updates as we work to resolve this.

Report: "Intermittent connectivity issues"

Last update
resolved

AWS has resolved the internet connectivity issue affecting their US-West-2 region. We can confirm that our services are now operating normally.

monitoring

AWS has identified the root cause of their internet connectivity issues and has taken corrective action that is resulting in improved performance. We continue to closely monitor the situation and will provide further updates shortly.

identified

AWS is reporting internet connectivity issues in their US-West-2 region. This issue is the cause of our intermittent connectivity issues. We note that our redirector edge V2 infrastructure remains fully available and unaffected. We are actively working on this issue and will provide further updates as we have them.

investigating

We are currently investigating intermittent connectivity issues affecting some of our services. Currently, our management dashboard and our V1 redirector edge network are experiencing issues. We continue to investigate and will provide an update shortly.

Report: "Request analytics data is delayed"

Last update
resolved

This incident has been resolved.

monitoring

The issue has been resolved.

monitoring

A fix has been implemented and we continue to monitor the results.

identified

We have identified the root cause of the issue and are working on implementing a fix. This should be completed within the next few hours. We will post an update when the issue has been solved.

investigating

We are currently investigating an issue that is delaying the availability of some types of request analytics in our dashboard, and also preventing the delivery of recurring analytics emails.