Historical record of incidents for DataCite
Report: "Service slowness and occasional errors"
Last update: This incident has been resolved.
Mitigation strategies have been deployed to help with some anonymous traffic that was causing increased load on our search cluster. Services appear to be resuming normal function, though performance may still be somewhat degraded. We will continue monitoring.
We're currently experiencing increased load on our services and are investigating the underlying cause and what mitigation we can take. This mostly appears to be affecting our search cluster, and therefore has a knock-on effect on queries and retrieval through DataCite Commons and some functionality in Fabrica. DOI Registration services should be unaffected, although you may experience timeouts when viewing content after registration.
Report: "Service slowness"
Last update: This incident has been resolved.
Services appear stable; we will continue monitoring.
We are continuing to investigate this issue.
We're currently experiencing increased load on our services and are investigating mitigation strategies. This mostly appears to be affecting our search cluster. Registration services should be unaffected, though you may experience timeouts when viewing content through frontend services or API search interfaces.
Report: "Search cluster slow affecting multiple services."
Last update: Yesterday we made some changes to how traffic is handled by our services. Combined with an overall reduction in load, this means everything should be working as normal now.
We've noticed an increased load on our services, which is having an effect on returning search results in different places. We are investigating possible causes and attempting to compensate.
Report: "Services unavailable"
Last update: All systems are operational and working. We continue to investigate whether there are any data issues and we will contact impacted members if necessary.
DataCite experienced issues with our database services earlier today. Systems are now operational and we continue to monitor closely. We are investigating whether DOI registration and/or metadata updates were impacted and if so, will contact affected members directly.
We have restored connectivity to the database. DOI registration and Search/Index lookups are now operational but marked as degraded while we establish the root causes of the incident. We will post a further update once we have a clear picture.
We are continuing to investigate this issue.
We are currently experiencing problems with our database services. This is affecting all DOI registration services and some search/indexing service responses. We are investigating as a matter of priority.
Report: "Services experiencing reduced performance and instability."
Last update: Services are operational. We continue to investigate and consider further long-term improvements.
Yesterday evening (UTC) we introduced a few extra changes to increase support for load in certain areas of our services. We will continue to monitor the situation. Services are still slightly degraded but overall should be functional.
The issues currently seen are related to increased load across our services. Some specific problems have been identified and resolved. Services are not down, but you may experience intermittent timeouts or errors depending on the service. We are looking at various options to further improve this and increase our capacity.
We are investigating degraded performance across our services. This is manifesting as occasional error messages or slow response times for API requests and/or frontend services. This appears to be due to increased traffic across our services, but we are investigating the source and what capacity we can increase.
Report: "Issues with DOI pages on DataCite Commons"
Last update: All systems remain operational and the overall health of services appears normal. Incident resolved.
We have identified a problem, and individual DOI detail pages should be available again within Commons. However, we have been receiving an influx of heavy usage from some automated sources, so we will continue to monitor for additional effects.
We are continuing to investigate the underlying cause of the errors appearing on individual DOI pages on the DataCite Commons service. Individual DOI metadata can be accessed via REST and GraphQL - for more information, please see our Support Pages: https://support.datacite.org/docs/api and https://support.datacite.org/docs/datacite-graphql-api-guide
We are aware of an issue where individual DOI pages on DataCite Commons are returning a 404 error, and we are investigating the root cause. Individual DOI metadata and related information can still be accessed via the REST and GraphQL API endpoints.
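For readers who need DOI metadata while the Commons pages are unavailable, the following is a minimal sketch of a REST lookup. It assumes the public endpoint `https://api.datacite.org/dois/{doi}` and a JSON:API-style response with the metadata under `data.attributes`; the helper names and the response layout are illustrative assumptions, so the Support Pages remain the authoritative reference.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed base URL of the DataCite REST API; confirm against the Support Pages.
API_BASE = "https://api.datacite.org"


def doi_url(doi: str) -> str:
    """Build the REST URL for a single DOI, percent-encoding unusual characters."""
    # Keep "/" unescaped so the prefix/suffix separator stays intact.
    return f"{API_BASE}/dois/{urllib.parse.quote(doi, safe='/')}"


def extract_title(payload: dict) -> Optional[str]:
    """Return the first title in a response body, or None (assumed response layout)."""
    titles = payload.get("data", {}).get("attributes", {}).get("titles", [])
    return titles[0].get("title") if titles else None


def fetch_doi(doi: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the metadata record for one DOI (performs a network call)."""
    request = urllib.request.Request(
        doi_url(doi), headers={"Accept": "application/vnd.api+json"}
    )
    # An explicit timeout keeps the caller from hanging if the service is slow.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `extract_title(fetch_doi(some_doi))` would then return the first title of the record, provided the response follows the assumed layout.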
Report: "Instability in DOI registration and other services"
Last update: The applied mitigation has allowed services to stabilise back to normal operating conditions. Services are still scaled up to prevent further issues.
Mitigation has been applied, and the services are starting to stabilise. We will continue to monitor the situation.
We believe we have identified the source of the increased load, and are working on further mitigation.
We are investigating instability affecting DOI registration and other services. This appears to be related to increased service load and we are scaling services to compensate.
Report: "Slow response times from DOI Registration Services"
Last update: All services appear normal now; closing incident. Services are still marginally scaled up to prevent issues.
Services have been scaled to cope with load; we are monitoring the situation.
We are investigating an issue with response times being slow for DOI registration. This appears to be related to increased service load and we are scaling services to compensate.
Report: "DOI Registration Issues"
Last update: All services appear normal now; closing incident. Services are still marginally scaled up to prevent issues.
We increased our capacity to cope with an increased load on our services. We are still monitoring at this stage but affected systems appear operational.
We've received reports of errors when attempting to register DOIs, and our internal monitoring is showing them as well. This does not appear to affect all registrations, but rather shows up as intermittent performance issues. We are currently investigating.
Report: "High load on DOI registration and related services"
Last update: All services appear normal.
The initial capacity increases look to have provided the needed support for services. Indexing of DOIs may be slow and take longer than the usual few minutes, as a backlog queue is currently being processed after the load spike. We are monitoring for further problems.
Increased load has been creeping up for the past couple of hours. Some automatic and manual mitigation has already taken place, but we are investigating further.
Report: "Slow DOI registration and timeouts"
Last update: The service has appeared stable for a few hours now. The incident is resolved, but we will continue monitoring.
We have identified that our services are experiencing high load. It is primarily affecting MDS; however, the REST API for DOI registrations may also be affected, and consequently our frontend application Fabrica is potentially affected. We've already compensated by running a higher number of servers in our production cluster and are now monitoring the situation.
Report: "Commons experiencing errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating errors in Commons. Other services are not affected.
Report: "MDS experiencing timeouts and slowness"
Last update: We believe the performance issues were due to increased load. We've increased our capacity and are still investigating further options. We hope changes made yesterday have returned stability, but we will continue monitoring.
There are still some reported issues with the MDS API. We are investigating and continuing to monitor for any further issues.
A fix has been deployed. There was an issue with automatic scaling of servers; this has now been corrected and the service should be operating as normal. We will continue to monitor.
We are currently investigating this issue.
Report: "Degraded performance for MDS and REST APIs"
Last update: This incident has been resolved. We will continue to monitor the MDS and REST APIs for performance issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue that may be causing degraded performance for the MDS and REST APIs in our test and production systems.
Report: "OAI-PMH API Outage"
Last update: Due to a deployment problem, search and the OAI-PMH service could not be deployed into our infrastructure after they were routinely cycled. They have now been upgraded to use our new standard workflow and have deployed successfully.
search.datacite.org is also affected. commons.datacite.org, however, is not affected; please use it for any front-end search queries.
The OAI-PMH service is currently down, we are investigating the probable cause.
Report: "High load amongst content negotiation service."
Last update: We are still receiving a high load of requests, but we are now fulfilling them thanks to increased capacity, so this issue is being marked resolved. Requests for DataCite DOIs should be unaffected going forward. Requests for Crossref DOIs made directly against DataCite Content Negotiation may occasionally return a 404 when the service is unable to process them. Where possible, it is preferable to use Content Negotiation via doi.org, which will redirect you to the appropriate registration agency's content negotiation service.
The service appears stable now, and we are processing requests. We are monitoring now with the changes we've made. As mentioned in the previous update, do note requests for Crossref DOIs via DataCite Content Negotiation may still return a 404 and this is expected behaviour at present.
A large number of requests have been identified that specifically attempt to resolve Crossref DOIs via the DataCite content negotiation service, and we are having ongoing conversations about this use case. As a remedy, we've implemented some timeout logic and additional rate limiting. This may have a knock-on effect: those attempting Content Negotiation for Crossref DOIs via DataCite Content Negotiation may receive a 404 (Crossref Content Negotiation and doi.org are unaffected). Requests for DataCite DOIs via Content Negotiation should now start resolving.
During the past week or so we've noticed a huge surge in traffic to the Content Negotiation service. This has unfortunately caused various problems with requests not being resolved. This is currently affecting both the Content Negotiation service and the Citation Formatter service. We've made some initial improvements to support the increased load on the service, but it is unfortunately still causing issues. We will continue to investigate.
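The recommendation in this report to use Content Negotiation via doi.org can be sketched as follows. This is a minimal illustration rather than an official client: the helper names are hypothetical, and the CSL JSON media type is used only as an example of a commonly supported format, so check the content negotiation documentation for the formats actually available.

```python
import urllib.parse
import urllib.request

# Resolver recommended in the updates above; it redirects to the
# content negotiation service of the responsible registration agency.
RESOLVER = "https://doi.org"

# Example media type (CSL JSON); the set of supported types is an assumption here.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def negotiation_request(doi: str, media_type: str = CSL_JSON) -> urllib.request.Request:
    """Build a request that asks the resolver for a DOI in the given format."""
    url = f"{RESOLVER}/{urllib.parse.quote(doi, safe='/')}"
    return urllib.request.Request(url, headers={"Accept": media_type})


def fetch_formatted(doi: str, media_type: str = CSL_JSON, timeout: float = 10.0) -> bytes:
    """Resolve the DOI and return the negotiated body (performs a network call)."""
    # urlopen follows the resolver's redirects; the timeout bounds a slow response.
    with urllib.request.urlopen(negotiation_request(doi, media_type), timeout=timeout) as resp:
        return resp.read()
```

Going through the resolver means the same client code works for DataCite and Crossref DOIs, without sending Crossref DOIs to the DataCite service directly.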
Report: "Fabrica in Production and Test are not working properly because of a CORS issue"
Last update: The issue has been resolved.
We are deploying a fix that should address the issue in the next 30 min.
Report: "Stability issues with the DataCite REST API"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified and fixed the issue. We will continue to monitor the REST API for irregularities.
We are currently investigating the issue.
Report: "Issue registering DOIs in the handle system"
Last update: In the last 8 hours there has been an issue registering DataCite DOIs in the handle system managed by CNRI. The issue has been identified and fixed by CNRI, and DataCite has re-registered all DOIs from the last 24 hours where handle registration failed. The issue has been resolved.
Report: "The Fabrica service is not responding properly."
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating the issue.
Report: "We are experiencing issues with Fabrica service."
Last update: The incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating the issue. Other services are not affected.
Report: "We are seeing issues with the DataCite Fabrica service. We are investigating. Other services are not affected."
Last update: This incident has been resolved.
DataCite Fabrica is operating normally again. We continue to monitor the service.
We are currently investigating this issue.
Report: "Fabrica service not available"
Last update: The incident has been resolved.
We have encountered an issue with the Fabrica service. We have found the problem and deployed a fix, and we expect this issue to be resolved within the next 30 min. Other DataCite services are not affected by this issue.
Report: "Repository creation and DOI registration is currently not working correctly in the test system"
Last update: The incident has been resolved and the test system is operating normally again.
A fix has been deployed and we are monitoring the situation.
We are investigating the issue. The production infrastructure is not affected.
Report: "Intermittent problems with the MDS API in the Test System"
Last update: This incident has been resolved.
This issue has been resolved and the Test MDS is operating normally again.
The MDS API in the Test System is only reachable intermittently. We are investigating.
Report: "DOI Fabrica service down"
Last update: This incident has been resolved.
We identified the problem related to the availability of the Fabrica service. A fix has been applied and the service should be operational. We will continue to monitor.
The Fabrica service for DOI registrations is not working properly and we are investigating. Other services, including the REST and MDS APIs, are not affected.
Report: "We are experiencing a slow REST API"
Last update: The slowness of the REST API was caused by an unusually high amount of traffic to our Event Data API. This incident has been resolved.
We are currently seeing a slow REST API because of extra traffic. We are monitoring the situation.
Report: "Missing DOIs in DOI index"
Last update: The DOI index has been successfully refreshed.
We last updated our DOI index on Sunday, but unfortunately about 10% of our DOIs, or 2.5 million, were not indexed properly. We have started another reindexing, and this issue should be resolved in about five hours, at around 8 PM GMT today. Searching for the other 90% of DOIs, as well as DOI registrations and updates, works normally.
Report: "DOI index incomplete for the next few hours"
Last update: The DOI index has been completely refreshed, and all services are operating normally again.
The reindexing of all DOIs took longer than expected. More than 80% of DOIs are again available in the search index, and we expect the indexing to be completed by 12 PM GMT.
We had an unexpected issue with our search index for DOIs. We have started to reindex all DOIs, and the full index should again be available by 8 PM GMT. We are sorry for the inconvenience. DOI registrations or updates are not affected.
Report: "Performance issues with the REST API"
Last update: This incident has been resolved.
We are currently experiencing performance issues with the REST API. We are investigating.
Report: "The Fabrica service is currently experiencing issues and can't be reached"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating.
Report: "Our REST API is under heavy load because of large numbers of event registrations in the Event Data service"
Last update: This incident has been resolved. The REST API is again under normal load and responding appropriately.
We are continuing to work on a fix for this issue.
Our REST API (and downstream services such as DataCite Search, Fabrica and OAI-PMH) is currently responding slower than usual because of an unexpectedly high number of event registrations in the Event Data system. This is not a system outage, and we expect the REST API to be back to normal operation within the next 16 hours.
Report: "Crosscite Services are unavailable"
Last update: The issue has been resolved and all services should be operating normally again. The problem was a misconfiguration of the name servers for the domain crosscite.org.
We are continuing to investigate this issue. Initial investigation indicates that the previous TTL configuration of the services (https://en.wikipedia.org/wiki/Time_to_live) is large enough that the change does not take effect immediately. We will continue to monitor over the next 12 hrs to see if the change takes effect.
We are currently experiencing downtime in our services citation.crosscite.org and data.crosscite.org.
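The TTL reasoning in this report can be illustrated with a small calculation: a resolver that cached the old record just before a DNS change may keep serving it until the old TTL runs out. The function names, the change time and the 12-hour TTL below are hypothetical values chosen for illustration, not the actual crosscite.org configuration.

```python
from datetime import datetime, timedelta, timezone


def latest_stale_time(changed_at: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment a cache filled just before the change may still serve the old record."""
    return changed_at + timedelta(seconds=old_ttl_seconds)


def may_still_be_stale(now: datetime, changed_at: datetime, old_ttl_seconds: int) -> bool:
    """True while some resolvers could still legitimately hold the old record."""
    return now < latest_stale_time(changed_at, old_ttl_seconds)


# Hypothetical example: a record changed at 12:00 UTC with an old TTL of 12 hours.
changed = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(latest_stale_time(changed, 12 * 3600).isoformat())
```

This is why a DNS fix is typically monitored for roughly one full TTL period before concluding whether it has taken effect.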
Report: "We are experiencing disk space issues with one of our databases"
Last update: This issue has been resolved.
The issue has been resolved, but we continue to monitor our services.
We are continuing to investigate this issue.
This affects the REST API and related services (including MDS API, DataCite Search and Fabrica). We are working on this, and the issue should be resolved within the next 60 min.
Report: "Processing of DOI registrations can take several hours today as part of service updates"
Last update: The service update was completed this morning, and the backlog of DOI indexing and handle registration requests that had queued up since yesterday morning has been processed. DOI registration is again operating normally.
As part of ongoing service updates we are seeing a delay of up to several hours in the processing of new DOI registrations. This should be resolved by the end of today. Retrieval of information from DataCite APIs and frontend services is not affected and works as usual. We will give an update when the issue has been resolved.
Report: "Heavy load across services."
Last update: The incident has been resolved and all systems are operating normally again. Some DOIs registered or updated since about 18:00 GMT on July 23 might not have been updated properly. We are working on this and hope to have this resolved by 16:00 GMT tomorrow. The incident was triggered by very unusual activity by a member, trying to update more than a million DOIs in a single morning.
We have manually removed a large number of queued jobs from our backlog and are monitoring the situation. There may still be some delay for DOIs registered today.
We are currently experiencing an increased load on our core services, which is delaying the completion of background queued jobs such as the indexing of DOIs for search results or for availability in the API. No data loss is occurring; only the time it takes for changes to be fully reflected across all our services is affected.
Report: "Database migration issues with the DataCite Event Data API"
Last update: The issue has been resolved and the Event Data API is working normally again.
The DataCite Event Data API at https://api.datacite.org/events is currently not available because of unexpected database migration issues. We are working on a fix and hope to have this resolved later tonight.
Report: "REST API under heavy load"
Last update: The issue has been resolved.
We are seeing an unusually heavy load on our REST API, which is causing the service to not respond properly. This affects a number of upstream services, including MDS API, EZ API, DOI Fabrica and DataCite Search.
Report: "We have issues with our Solr Search Index"
Last update: The incident with our Solr Search index was resolved at 3 AM GMT. We are continuing to monitor the service.
We have partially restored search functionality: the majority of DOIs (9.5 million out of 13 million) are now available for searching via the REST API and DataCite Search. OAI-PMH, Content Negotiation and the Stats Portal continue to be affected. We are working on a fix to fully restore search functionality in all services.
Unfortunately it is taking longer than expected to fix the Solr Search index issue. We have stopped all other database maintenance work related to our migration to Elasticsearch, as it is putting too much load on our database servers and thus interferes with Solr indexing. We have re-indexed 70% of all DOIs in the past few hours and hope to have the indexing completed by 20:00 GMT today. We will give an update at that time. Please keep in mind that DOI registration and metadata updates are not affected by this Solr outage.
We are continuing to work on a fix for this issue.
Our Solr Search index is currently not indexing, probably because of heavy database load. This affects the REST API, DataCite Search, OAI-PMH, Content Negotiation, and the Stats Portal. We hope to have this resolved via re-indexing in the next 6-8 hours.
Report: "Error in Solr Search indexing"
Last update: The issue with our Solr Search index has been resolved.
We had an issue with our Solr Search index this morning and only 4 million DOIs were indexed. We are re-indexing all DOIs and the issue should be resolved by around 19:00 GMT today. This issue affects DataCite Search, the REST API, Content Negotiation and OAI-PMH.
Report: "REST API and DataCite Search not responding properly"
Last update: The issue has been resolved and all services are again operating normally.
The REST API and DataCite Search are currently not working properly and respond with a 422 HTTP status code. We are investigating the issue.
Report: "We are experiencing issues with DOI registration in the test system"
Last update: We have resolved the known issues with the new test system. Please send a message to DataCite support if you still experience issues with the test system.
After a major update of the test system yesterday we are still experiencing issues with DOI registration via MDS API, EZ API and DOI Fabrica web frontend. We are working hard to resolve those issues. The production infrastructure is not affected, and is working normally.
Report: "REST API, OAI-PMH and Search will not be available until 15:00 GMT"
Last update: All services are operating normally again.
The DataCite REST API for DOIs, OAI-PMH and Search are not available until 15:00 GMT as we had to update the server running our Solr index. The DOI Fabrica service is not affected by this. We are sorry for the inconvenience.
Report: "Slow responses from our Solr Search index"
Last update: The incident has been resolved.
Since the evening of October 1st we have been experiencing issues with our Solr Search index. This is caused by unusually large numbers of slow queries. The issue was partly resolved today, but we are still investigating. This service interruption affects all services using our Solr index, including parts of the DataCite REST API, DataCite Search, and OAI-PMH.
Report: "DOI registration in handle system was delayed 3 AM GMT to 1 PM GMT today"
Last update: DOI registration in the handle system was not working properly from 3 AM GMT to 1 PM GMT. This issue has been resolved. We have re-registered all DOIs where handle registration had issues. We have also written a permanent fix for this issue. This fix needs more testing and will be deployed tomorrow.
Report: "DOI registration in handle system may be delayed"
Last update: This issue has been resolved, and DOI registration via the handle system has been operating normally again since 8 AM this morning. We are re-registering all DOIs where handle registration had issues, and plan to finish this by tonight.
Since 3 AM GMT this morning the handle registration part of the Metadata Store (MDS) has had intermittent service timeouts. The registration of metadata is not affected, and we are trying to automatically re-register those DOIs in the handle system where this step failed. The MDS web interface is also affected. We are investigating the issue.
Report: "MDS Service upgrade on July 16, DOI registration issue resolved"
Last update: On Monday July 16 we launched a new version of the Metadata Store (MDS) API for DOI registration. There were a small number of unanticipated short service interruptions on Monday and Tuesday, and there was an issue with a number of DOIs (3,862) not being properly registered. The DOI registration issues were resolved this morning at around 8 AM GMT, and we had registered all 3,862 DOIs by 12 PM GMT. There are currently a number of specific issues affecting a small group of users, but DOI and metadata registration is operating normally for the vast majority of users.
Report: "We are currently experiencing issues with our Solr Index"
Last update: We have resolved the issue and our search index is working normally again. Our Solr instance was hit very hard by specific requests coming in from our OAI-PMH service that take a very long time to process. We have disabled the OAI-PMH service for the next 24 hours and will put limits in place on the types of requests possible.
We have redeployed a new container for our Solr Search index. Reindexing will take four hours, and we expect our services to then work as expected again.
Since 15.22 GMT today we have been experiencing issues with our Solr index, and this affects all downstream services (REST API, Search, OAI-PMH, Content Negotiation). We are investigating.
Report: "Metadata Store (MDS) API Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "DNS issue causing service interruption today"
Last update: Today from 12:52 GMT to 15:22 GMT DataCite services were not reachable because of a DNS issue. The issue has been resolved and services are working normally again. We are sorry for the inconvenience.