Historical record of incidents for The DOI Foundation
Report: "doi.org outage"
Last update: The [doi.org](http://doi.org) resolution downtime on 2024-05-29 was caused by the sudden loss of our access to Cloudflare’s “Load Balancing” product, which [doi.org](http://doi.org) had used to steer traffic to appropriate backend servers based on location. We were not able to re-enable “Load Balancing” and instead worked around the problem using AWS services, eventually settling on AWS “Global Accelerator”. We opened a ticket with Cloudflare at that time, but after three weeks we have not received any useful information from them. During this time, Cloudflare has noted an issue with billing and subscriptions at [https://www.cloudflarestatus.com/incidents/5t270n2ndf0h](https://www.cloudflarestatus.com/incidents/5t270n2ndf0h). We suspect that our issue is related to this and simply happened to unfold in a way that resulted in a loss of service. Although widespread billing issues may explain Cloudflare’s lack of response to our ticket, we find it disappointing, especially given that we had actual service downtime as a result. We have decided to make our use of AWS “Global Accelerator” permanent. For [doi.org](http://doi.org), this service is significantly less expensive than Cloudflare’s “Load Balancing”, and even seems to have a small positive effect on latency. We also continue to use Cloudflare services, especially for rate limiting and potential DDoS mitigation. We will continue to monitor our Cloudflare subscriptions, billing, and tickets for any new issues or any explanatory information. We apologize for the inconvenience caused by this downtime.
doi.org resolution services were inaccessible for approximately an hour, from 00:30 to 01:30 UTC on 2024-05-29. The issue seems to have been caused by a failure of Cloudflare's "Load Balancing" product, which doi.org uses to steer global traffic to appropriate backend servers. We have worked around this issue in order to restore doi.org resolution services, but were not immediately able to restore full geo-steering functionality or determine why the failure happened. We will add information as it becomes available. By roughly 03:30 UTC we restored geo-steering by routing through AWS's "Global Accelerator" product. We are still working with Cloudflare to determine the source of the issue. As of 2024-05-31, Cloudflare has escalated this internally, but has still not fixed the issue or provided more information. We are evaluating AWS Global Accelerator as a permanent replacement for Cloudflare Load Balancing.
Report: "doi.org outage due to Cloudflare HTTPS configuration issues"
Last update: Shortly after 2023-06-07 00:00Z, doi.org became inaccessible due to HTTPS issues. This was eventually tracked down to the Cloudflare configuration, but we do not yet have an explanation for how it suddenly changed. We are working with Cloudflare to analyze the issue.
Report: "Possible doi.org slowdowns related to heavy traffic"
Last update: From roughly 5:00 to 10:00 this morning EDT (UTC-04:00) there was unusually heavy traffic on doi.org servers. We believe the traffic was sufficient to cause slowdowns in ordinary DOI resolution. We are investigating the cause of the traffic and possible mitigations to prevent similar problems in the future.
Report: "doi.org outage due to widespread Cloudflare outage"
Last update: An outage at Cloudflare made doi.org (and many other sites) unusable for some 30-60 minutes starting around 06:30 UTC. Cloudflare has published a discussion of the outage (https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/). Cloudflare outages are notably difficult to deal with. We have a project related to bypassing Cloudflare, but any method to temporarily bypass Cloudflare requires the use of Cloudflare services that are often (but not always) down as part of the outage. Luckily, Cloudflare outages (which often seem to affect "half the internet") tend to be resolved quickly.
Report: "doi.org proxy and website outage"
Last update: The DOI Proxy service and the DOI website were unavailable to users in multiple geographical regions from roughly 17:20 to 17:40 EDT. The issue seems to be resolved now. Post-mortem: The DNS resolution of doi.org, which is provided by Cloudflare, has been determined to be the cause. This was apparently a Cloudflare issue that affected a large number of sites (including, for example, Discord, Patreon, npmjs, DigitalOcean, Coinbase, Zendesk, Medium, GitLab, Fiverr, Upwork, and Udemy, according to a post on ycombinator), although some commentators have suggested something more widespread. We anticipate a thorough post-mortem from Cloudflare. We determined the outage time based on both the Nagios service that CNRI runs internally and the externally run Pingdom service that we use to monitor the DOI services.
We can confirm that the issue was caused by Cloudflare infrastructure. Things seem to have been resolved. We will monitor for a few more minutes before confirming resolution of the issue.
We noticed that the doi.org website and proxy services are currently unavailable to users in multiple geographical regions, possibly due to issues on the Cloudflare side. We are currently investigating.
Report: "DOI proxy service notification"
Last update: The DOI proxy service was unavailable for a few minutes due to a Cloudflare [1] issue. DOI services were impacted between 13:50 and 14:07 UTC. This is unrelated to today's planned maintenance, as no change had yet been made. [1] https://www.cloudflarestatus.com/incidents/tx4pgxs6zxdr
Report: "DOI proxy service notification"
Last update: The DOI proxy service was intermittently unavailable today due to a “widespread BGP routing leak” that particularly affected Cloudflare [1] and many other internet services. DOI services were impacted between 10:41 and 11:00 UTC, with further partial failures until 11:24, followed by another brief failure for a few minutes at 11:55. It is possible that users experienced other intermittent failures that our monitors did not record; Cloudflare did not mark their incident as resolved until 13:02. [1] https://www.cloudflarestatus.com/incidents/46z55mdhg0t5
Report: "DOI proxy service notification"
Last update: One of the proxy servers was down for a total of about 2 minutes on December 22, 2018, from 04:13 to 04:14 UTC and from 04:32 to 04:33 UTC. This had minimal to no effect on service availability. Statuspage incorrectly reported a Major Outage even though the services were operational. We have manually corrected the Statuspage reporting error.
Report: "DOI proxy service notification"
Last update: While transitioning to a more robust DOI proxy service infrastructure, we introduced a bug that prevented users from resolving DOI links containing an encoded slash (%2F). Links with %2F encoding would have stopped working after 2018-07-02 15:30 UTC and started working again by 2018-07-03 08:40 UTC, when a fix was successfully applied. The transition to the new proxy service infrastructure is now complete.
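To illustrate the kind of link affected, the following minimal Python sketch shows that a DOI written with a literal slash and the same DOI written with an encoded slash (%2F) should name the same identifier once the request path is percent-decoded. The helper names are hypothetical and the DOI is used only as an example; this is not the proxy's actual code, which the report does not describe.

```python
from urllib.parse import quote, unquote


def encode_doi_for_link(doi: str) -> str:
    """Percent-encode a DOI for use in a link path.

    safe="" forces "/" to be encoded as %2F instead of being left as-is.
    """
    return quote(doi, safe="")


def doi_from_request_path(path: str) -> str:
    """Recover the DOI from a raw (still percent-encoded) request path.

    Hypothetical helper: strip the leading "/" and percent-decode the rest.
    """
    return unquote(path.lstrip("/"))


# Both link forms should yield the same DOI after decoding.
plain = doi_from_request_path("/10.1000/182")
encoded = doi_from_request_path("/" + encode_doi_for_link("10.1000/182"))
print(plain == encoded)
```

A resolver that treats %2F as an opaque string, or that rejects it outright, would fail only on the encoded form, which matches the symptom described above.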
Report: "DOI proxy service notification"
Last update: We are in the process of transitioning to a more robust DOI proxy service infrastructure, which includes an extended set of proxy servers and geo-load balancing capabilities. During this transition, we have seen minor to moderate service interruptions. First, customers in Asia and Latin America may occasionally have been unable to connect to the service; we have now resolved that issue. Second, for approximately 30 minutes, between 19:30 and 20:00 UTC, some clients connecting through the HTTPS endpoint saw service interruptions. This issue has also been fixed. We apologize for any inconvenience. We are still in the middle of the transition process, and we will continue to monitor service availability. We will report back when we have completed the move.
Asian and Latin American customers might be experiencing brief network issues when using the DOI proxy service. We are currently trying to resolve the underlying issue. We will post an update as soon as the issue is resolved. Customers from other regions should not see any service issues.
Report: "DOI proxy service notification"
Last update: One of the proxy servers was down for approximately 9 minutes between 03:19 and 03:30 UTC. This had minimal to no effect on the service availability.
Report: "DOI proxy service notification"
Last update: One of the proxy servers was down for 10 minutes between 00:20 and 00:30 UTC. This had minimal to no effect on the service availability.
Report: "dx.doi.org was temporarily inaccessible"
Last update: The service is now operational. There was an issue for about eight minutes. Investigation is ongoing. Details will be updated here. Postmortem: It was a DNS-related issue. Issues of this kind should be mitigated by our new DNS management plan once it is put into practice.
Report: "DOI proxy service notification"
Last update: One of the proxy servers was down for 18 minutes between 00:09 and 00:27 UTC. This had minimal to no effect on the service availability.
Report: "DOI proxy service notification"
Last update: One of the proxy servers was down for just over an hour, between 11:41 and 12:59 UTC. This had minimal to no effect on the service availability.
Report: "dx.doi.org was temporarily inaccessible"
Last update: At 10:11 AM UTC, dx.doi.org became unreachable because of a DNS issue. The issue was resolved as of 11:44 AM. Postmortem: Today's problem was entirely a DNS problem, caused by a combination of factors. A weather-related power outage took down the primary DNS name server for dx.doi.org, and an ill-considered setting in that service caused the secondary DNS name servers to conclude that stale information was worse than no information, so they stopped responding as well. This continued for 90 minutes while we figured out what was happening and then brought the primary back up, which in this case required physical access to the machine. That setting has now been changed and this particular problem will not happen again.
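The report does not name the setting involved. One standard mechanism with this behavior is the EXPIRE field of a zone's SOA record (RFC 1035): a secondary that cannot reach the primary for longer than EXPIRE stops treating its copy of the zone as authoritative. Assuming that mechanism for illustration only, the sketch below shows how a short expiry interacts with a 90-minute primary outage. All names and timer values are hypothetical, not taken from the actual dx.doi.org zone.

```python
def secondary_still_answers(seconds_since_last_refresh: int,
                            soa_expire_seconds: int) -> bool:
    """Return True while a secondary may keep serving its copy of the zone.

    Models only the SOA EXPIRE rule: once the time since the last successful
    refresh from the primary reaches EXPIRE, the zone data is discarded.
    """
    return seconds_since_last_refresh < soa_expire_seconds


OUTAGE_SECONDS = 90 * 60               # outage length stated in the postmortem
SHORT_EXPIRE_SECONDS = 30 * 60         # hypothetical, overly aggressive value
LONG_EXPIRE_SECONDS = 14 * 24 * 3600   # hypothetical, more forgiving value

# With a short expiry the secondaries go silent during the outage;
# with a long expiry they keep answering from their last good copy.
print(secondary_still_answers(OUTAGE_SECONDS, SHORT_EXPIRE_SECONDS))
print(secondary_still_answers(OUTAGE_SECONDS, LONG_EXPIRE_SECONDS))
```

Under this assumption, lengthening the expiry trades freshness for availability: secondaries may serve older data, but they keep resolving while the primary is being restored.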
Report: "DOI proxy service few seconds outage"
Last update: Two of the six proxy servers experienced a brief outage, lasting from a few seconds up to one minute, on Jan 16, 2018 at 12:10 UTC for unknown reasons. We will update the statement here when we discover the source of the problem. Update on Feb 16, 2018: After some investigation, we concluded that there was a brief network outage, but the servers themselves were not affected.
Report: "DOI proxy service notification"
Last update: Because of a Rackspace issue, one of the proxy servers was down for just over an hour, between 21:08 and 22:16 EST. This had minimal to no effect on the service availability.
Report: "DOI proxy service and website was temporarily inaccessible for limited number of users"
Last update: A DNS administration error resulted in roughly 10 minutes of downtime for some fraction of users, depending on the location of those users and the DOIs being resolved.
Report: "dx.doi.org was temporarily inaccessible for limited number of users"
Last update: While the third-party monitoring services that automatically update the operational status on this dashboard did not notice any service interruptions, we noticed a temporary issue that might have affected a limited number of DOI proxy users. Details follow. A DNSSEC configuration issue caused dx.doi.org to be unavailable for some users for a brief period. CNRI's monitoring service noticed the issue from 4:22 pm until 4:45 pm Eastern time. Based on our current understanding of the issue, the problem could potentially have started as early as 4:00 pm and ended as late as 4:50 pm for some users. Other users will not have been affected at all. In particular, the third-party monitoring services we additionally use did not detect any issue. Links to doi.org were not affected.