Historical record of incidents for Altmetric
Report: "Altmetric Explorer on-going issues"
Last update: The root cause has been identified and fixed. Our team monitored Altmetric Explorer until the end of the business day. The issue is marked as resolved.
A fix for the root cause of the service interruption has been implemented; we are now continuing preventative monitoring until EOB.
Functionality of the Altmetric Explorer has now been restored, but we are continuing to investigate the root causes of the service interruption.
We are continuing to investigate this issue.
We are currently investigating a service outage for the Altmetric Explorer application. We will post further updates as we work through the incident.
Report: "Data processing delays"
Last update: All data has been reprocessed successfully. Incident resolved.
Research outputs are now being processed. We expect the full reprocessing to take less than 5 hours.
There might be some data delays in a small subset of research outputs processed between Wed-Fri that we're currently reprocessing to make sure they're up to date.
Report: "Periodic maintenance"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Altmetric core infrastructure will be undergoing periodic updates and maintenance during this time window. No extended downtime is expected as part of this maintenance but occasional brief interruptions to Altmetric applications are possible and will be announced in advance where expected.
Report: "Altmetric Explorer maintenance"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
The Altmetric Explorer will be undergoing urgent essential maintenance operations to its primary database store during this time window. Although all care will be taken to avoid any extended downtime, brief interruptions to Altmetric Explorer availability should be expected and planned for. Further updates will be posted during the maintenance window as needed.
Report: "Altmetric Explorer Data refresh delay"
Last update: The Explorer data refresh has now been successfully completed and Altmetric Explorer data is up to date.
We have identified an issue with the data refresh process which has resulted in a delay for the data available in the Altmetric Explorer. This issue has now been corrected and the data refresh process is now running and expected to complete by EOB today. We will update this incident once the Altmetric Explorer data is up to date.
Report: "Application deployment infrastructure instability"
Last update: All systems are now operating normally so we are closing this incident.
A fix has been implemented and we are monitoring the results.
We are currently investigating a series of issues causing instability in our application deployment infrastructure. No major impact on client facing products is being reported at present, however there is a possibility that sporadic issues may develop.
Report: "Detail pages API downtime"
Last update: This incident has been resolved.
Data reprocessing is done and we're now monitoring the results.
A fix has been implemented and we are reprocessing data in order to bring the API data up to date.
The issue has been identified and a fix is being implemented.
Report: "Altmetric website outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Details Page outage and data delays"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues. Altmetric Details Pages are fully operational. The data processing delays only affect news and blogs from today, and only in Details Pages and Details Pages API.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating an issue that is affecting Detail Pages and Data Processing. Our Incident Team is investigating and we will report back in 30 minutes.
Report: "Data processing delays"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we're monitoring the results. The data processing delays only affect news from the day before.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating data delays related to upgrade maintenance work performed earlier today. No interruptions to application service have been reported or are expected but new mentions are not currently being processed and updated across our products. We will update as more information becomes available.
Report: "Altmetric Explorer data delays"
Last update: Altmetric Explorer data has been refreshed and is now up to date. Root causes of the delay in data processing have been identified and resolved.
The root cause of the incident has now been addressed and the Explorer data refresh is underway; we will update once this is complete.
The cause of the data delays has been identified and the team is now working on a solution.
We are currently investigating issues with the data processing pipelines for the Altmetric Explorer. The data presented is currently more than 24 hours old and should be treated as stale until this incident is resolved.
Report: "Altmetric Details Page API downtime"
Last update: This incident has been resolved.
The Altmetric Details API and Detail Pages are now operational again. We are continuing to monitor the situation to ensure full resolution.
We are currently investigating an on-going outage of the Altmetric Details API (api.altmetric.com). We will provide further details once we have identified the cause and a resolution pathway.
Report: "Intermittent access issues to Altmetric Explorer"
Last update: Continued monitoring has not revealed any more connection issues and no further reports have been received.
The root cause of the issues being reported has been identified and a configuration change has been put in place to mitigate the issues. We are continuing to monitor for further reports of connection issues.
We are currently investigating reports of intermittent access issues to the Altmetric Explorer and will update when further details are available.
Report: "Altmetric Explorer functionality issues"
Last update: This incident has been resolved. Altmetric staff will continue to monitor all services and complete a post-incident review. We thank our customers for their patience and apologise for any disruption.
A fix has been implemented and the reported issues have been resolved. We'll leave the incident open for a couple of hours while we monitor to make sure there are no other issues.
We are currently experiencing issues affecting search, registration and export functionality in the Altmetric Explorer. Our development team are currently investigating a root cause. Our next update will be before 1300 UTC on Thursday 18th May. Thank you for your patience and apologies for any disruption this incident causes.
Report: "Explorer delay in data processing"
Last update: The Altmetric Explorer data has now been refreshed and confirmed. Altmetric staff will continue to monitor all services and complete a post-incident review. We thank our customers for their patience while we resolved the issue and apologise again for any disruption. There'll be no further updates to this incident.
The data recovery process is going as expected and we are on track to complete the process by tomorrow. Our next update will be before 1000 UTC on Friday 24th March. We’re very sorry for any users who are impacted. The Details Pages and Details Page API remain unaffected.
We've calculated the likely duration of the first steps of the recovery process. In the best case scenario we'd expect the data to be available on Friday, however this may extend into the weekend. Our major incident team are continuously monitoring and seeking opportunities to bring this forward. Our next update will be before 1000 UTC on Thursday 23rd March. We're very sorry for any users who are impacted. The Details Pages and Details Page API remain available throughout.
Our major incident team has identified a backlog of processing which is not expected to clear before early next week. Incremental updates will be posted here as they become available, with a full update before 1000 UTC on Monday 20th March. We're very sorry for any users who are impacted. The Details Pages and Details Page API remain available throughout.
Report: "Altmetric Details Page API downtime"
Last update: This incident has now been resolved with all components operational and no further issues detected.
We are currently investigating an on-going outage of the Altmetric Details API (api.altmetric.com). We will provide further details once we have identified the cause and a resolution pathway.
Report: "Altmetric Details API outage"
Last update: This incident has been resolved.
A solution has been implemented and the Altmetric Details API and Detail Pages are now operational again. We are continuing to monitor the situation to ensure full resolution.
We are continuing to investigate this issue.
We are currently investigating an on-going outage of the Altmetric Details API (api.altmetric.com). We will provide further details once we have identified the cause and a resolution pathway.
Report: "Altmetric Explorer data delays"
Last update: This incident has been resolved.
We've fixed the issue with the data refresh process and the data is now up to date. We are therefore closing the incident at this time and will be following up internally to identify any improvements and safeguards applicable to prevent these issues from re-occurring.
We are currently experiencing delays to the Altmetric Explorer data refresh. The Explorer data can be considered current up to Wed, 01 Mar 2023. We are currently working on a fix and will update when the data is up to date once again.
Report: "Altmetric Explorer data refresh delay"
Last update: We are now satisfied that the data refresh issues initially encountered have been fully resolved and the data refresh process has remained stable. We are therefore closing the incident at this time and will be following up internally to identify any improvements and safeguards applicable to prevent these issues from re-occurring.
All data refresh and cache warming operations are now running as scheduled. We continue to monitor the situation for any further issues.
The data refresh process for the Altmetric Explorer has now successfully finished and the Altmetric Explorer data is now fully up to date. We are currently running the subsequent cache warming processes to improve the performance of the system when accessing the new data and will mark the systems as fully operational once this final step is complete.
The data refresh process is currently running for the Altmetric Explorer and we expect it to complete before 2PM GMT. We will update again once the Explorer data is up to date and to advise on further steps that we will be taking before closing the incident.
Remediation work has been resumed on the issues affecting the Altmetric Explorer data refresh. We are currently in a position where we have been able to start the data refresh process and are currently monitoring its progress. We will post a further update at 11AM GMT.
Our teams continue to work on remediation for the issues affecting the data refresh process for the Altmetric Explorer. At this time the Altmetric Explorer is stable, but further work is still required to bring the data up to date. None of the other Altmetric products are currently affected by these or any other issues. Although an additional operational plan has been put together and work continues in the background, we no longer expect a resolution to these issues today and will resume incident communication at 9:30AM GMT tomorrow.
Unfortunately we have encountered further issues trying to restore functionality to the data refresh process and continue to work on a remediation pathway. The Altmetric Explorer remains functional with data current up to February 14th. We will post a further update at 5:30 PM GMT.
The majority of the remediation infrastructure work planned has now been completed without incident. We are continuing to work on minor configuration changes that will allow us to re-enable the data refresh process for the Altmetric Explorer and will update again at 3PM GMT.
Remediation work is continuing as planned at this time. We will release a further update at 1PM GMT.
The remediation work aimed at addressing the data refresh issues affecting the Altmetric Explorer has started as scheduled. Short interruptions to Altmetric Explorer functionality are to be expected throughout the day until this emergency maintenance is complete, so we ask all users to plan accordingly. We expect to post a further update at 11AM GMT.
Unfortunately the investigation so far has revealed that the issues preventing the Altmetric Explorer data refresh are more complex than initially estimated and will require further infrastructure work. As this will involve periods of interruption to Altmetric Explorer functionality, an emergency maintenance window has been scheduled for tomorrow morning, Feb 16th, beginning at 9:00 AM GMT. We will post further updates at that time and through the course of the maintenance window, with the aim of restoring full Altmetric Explorer functionality by the end of business hours on Feb 16th.
We are continuing to work on a solution for the underlying issues preventing the Altmetric Explorer data refresh from completing successfully and will issue a further update at 5:30PM GMT.
The issues preventing a successful Altmetric Explorer data refresh have been identified and we are currently working on a resolution. We will update further once a solution has been put in place and we have an estimate on when the data refresh will be completed.
We are currently investigating issues with the data refresh for the Altmetric Explorer. Customers may not be able to see the latest available data in the Altmetric Explorer while we identify and resolve the underlying causes. Explorer data is expected to be current up to Feb 14th 11AM GMT.
Report: "Altmetric Explorer outage"
Last update: The incident is now fully resolved.
We are continuing to monitor for any further issues.
The root cause for the incident has now been remediated and normal access to the Altmetric Explorer has been restored. We will continue to monitor the situation and perform post-mortem diagnostics for the next 3 hours. Final update will be posted once this has been completed.
Initial investigations have indicated a problem with the backing database for the Altmetric Explorer. Remediation work is currently under way to restore database functionality and subsequently access to the Altmetric Explorer.
We are currently investigating this issue.
Report: "Altmetric website outage"
Last update: This incident has now been resolved. The Altmetric website is functioning as normal.
The Altmetric website is currently unresponsive; we are aware of the issue and are investigating. The Explorer and all APIs should continue to function normally.
Report: "Delays in data updates/processing for Altmetric Explorer"
Last update: The Altmetric Explorer data has now been refreshed and confirmed. Altmetric staff will continue to monitor all services and complete a post-incident review. We thank our customers for their patience while we resolved the issue and apologise again for any disruption. There'll be no further updates to this incident.
The root cause of this issue has been resolved, a fix implemented and the data processing for the Altmetric Explorer is now in progress. We anticipate that the data in the explorer will be refreshed in approximately two hours. Data accessed via the Details Pages and Details Page API are unaffected. We'd like to apologise for any disruption this may have caused. Our next update will be before 1700 (5pm) UTC.
There is currently a delay in data processing impacting the Altmetric Explorer. The specific issues involved have been identified and our engineers are currently working to resolve the situation, with a data refresh to follow once this has been fully resolved. We will update this incident before 3PM UTC.
Report: "Altmetric Explorer and Detail Pages outage"
Last update: Following electrical work within the datacenter, our equipment has been online and stable for several hours. The Altmetric team continue to monitor the service. This incident will now be closed.
Altmetric staff have enacted secondary services as a workaround which has restored access to all Altmetric services. We shall continue to monitor the performance of the service and be ready to re-enable the primary systems once they are restored. We apologise for the interruption to service and will be working with our hosting supplier to conduct a root cause analysis.
The Altmetric hosting provider has detected a critical electrical connection issue that has required them to make some of our infrastructure unavailable. In order to recover quickly, Altmetric staff have enacted the disaster recovery switchover to secondary equipment in an alternative location. We anticipate the Explorer and Details Pages being returned to service within the next 30 minutes. Our next update will be at 16:15 UK / UTC.
We are currently investigating an issue that is affecting the Explorer & Detail Pages. Our Incident Team is investigating and we will report back in 30 minutes.
Report: "Explorer delay in data processing"
Last update: All customer-facing impacts have now been resolved. The incident team will now review the incident, identify lessons learned and implement improvements to reduce the likelihood of recurrence and speed up recovery times.
Our incident management team have now reached the final stages of this incident and are processing the current news mention backlog. We shall be doing this over the course of the next couple of days so that normal operation remains unaffected. The next update on this incident will be by 1700 UTC/UK on Friday 12th November.
We are nearing resolution for the issues currently impacting news mention processing and expect to be able to resume updates as early as tomorrow morning. We expect the current news mention backlog to take some further time to process once the stream is up and running. We will post an update tomorrow with our progress at that time.
We can confirm that the data lost in the original incident has now been recovered and fully synchronised across our databases. Unfortunately, this recovery process had an unexpected impact on our news processing, and as a result we are still behind in processing news mentions from November 2nd onwards. The team are working to catch up and we will post another update on Monday 8th November. We thank you for your continued patience while we ensure the stability and integrity of our data.
Our incident management team have completed the recovery of the missing data and are now synchronising this data across our databases. Because our data is continuously changing, this has to be done slowly; however, we have reduced our recovery time objective. We had thought that it could take up to another week to fully recover and test all aspects of the data; we've revised this estimate to the end of the current week. Our teams will continue to work until the incident is resolved, and if anything changes in the meantime, we shall update this page. Thank you for being patient while we ensure the stability and integrity of our data.
Our incident team have continued to work on validating the accuracy of our data following the incident. We identified up to a 2-hour period from 25th Oct, which has now been resolved, and up to a 7-hour period from 20th Oct. The outstanding data from the 20th requires our teams to rebuild the data in an alternative environment for testing before repeating the process in Production. This is a slow process of recreating and manually checking the data, which is likely to take at least 2 weeks, and we'll continue to report our progress here. We would like to apologise again for any impact that this incident has had on our users.
The system has successfully worked through the backlog of data, and all data has been successfully synced apart from a small 2 hour period that will require some manual curation to re-sync. We are setting out a plan to conduct this work and will continue to keep this incident updated with progress.
The system is continuing to work through the processing backlog and at current rates we hope to be back up to date within the next 24 hours. This means that the vast majority of data will match between the Details Pages and the Explorer once our daily snapshot process has completed on Wednesday 27th October. We will update again tomorrow morning (Tuesday 26th).
Our major incident team has identified a backlog of processing which is not expected to clear before early next week. Non-essential processing has been paused to allow the Altmetric Explorer processing to take priority. Incremental updates will be posted here as they become available, with a full update before 1700 UTC/1800 BST on Monday 25th October. We're very sorry for any users who are impacted. The Details Pages and Details Page API remain available throughout.
Our major incident team reconvened this morning to review the overnight processing progress and are assessing the likely recovery time for Altmetric Explorer data. The impact on our Explorer services remains the same and we expect to provide a further update before 12:00 UTC/1400 BST on 22nd Oct.
We have identified the root cause of the issue, which relates to a backend service responsible for preparing the nightly snapshot used to populate the Altmetric Explorer database for our Explorer application and API. Our teams are working to identify the quickest way to restore full access to the latest data while safeguarding the availability of the service. We would like to apologise for the inconvenience to our users and will provide a further update before 0900 UTC/1000 BST on 22nd Oct.
We are currently investigating the root cause of an issue which is causing the following: new mentions have not been visible in the Altmetric Explorer since 23:59 on 19th Oct; research outputs that had not previously been mentioned have not appeared in the Altmetric Explorer since 23:59 on 19th Oct; the Altmetric score on publisher badges may not match the score within the Altmetric Explorer; and Altmetric searches may be less performant than customers would normally experience. The Explorer database is usually updated nightly, which means that one update is currently missing for customers.
Report: "ISP Network Outage"
Last update: Our datacenter ISP has confirmed full resolution of the incident and we have verified that our services are now fully operational.
Our datacenter ISP has restored network connectivity and we are working to check status of services and data pipelines across our infrastructure. The Altmetric Explorer, Detail Pages and API are now operational. We will continue to monitor the situation as we wait for our ISP to confirm full resolution of issues.
The current network outage affecting our datacenter ISP appears to have occurred during maintenance of routing network equipment, and we are currently working with them to obtain a timeline for service restoration.
We are currently investigating a widespread outage affecting all of our services that seems to be due to a complete network outage of our main datacenter ISP.
Report: "Altmetric outage affecting multiple services"
Last update: This incident has been resolved, all services remain stable and performant, and all mentions have been processed.
We are continuing to monitor for any issues, there is a two hour window where we are required to reprocess mentions and this incident will remain open until that activity is complete. We expect the incident to remain open until the next update at 1800 BST. In the meantime, all services remain up and operating as normal.
All services have now been restored. We have requested a post-incident root cause report and are monitoring the performance of our services. We expect to monitor for the next 12-15 hours at which point this incident will be closed. We apologise for the interruption in service for our customers.
Our data continues to synchronise across our databases. All servers have been restored. We expect full service to be resumed shortly. Our next update will be before 2100 BST
The Altmetric website and Explorer have been returned to service. The Details Pages and API are being restored now. 50 of 58 servers are restored and data is now synchronising across our databases. We expect full service to be resumed shortly. Our next update will be before 1800 BST
Our hosting provider has acknowledged the root cause and is working to manually restore our services. At present, approximately 10% has been restored. We shall continue to monitor the restoration and provide updates here. The next update will be before 1630 BST.
We are currently experiencing a technical outage across most of our customer-facing services. This relates to an issue with our external hosting provider. We are in direct contact with our hosting provider’s leadership team and are working to restore service as a matter of the utmost urgency. We expect this to happen quickly, but we do not yet have an estimate from them. We shall keep the service status page updated as regularly as possible. Our Major Incident Team is in place and ready to respond once our hosting provider resolves the root cause. Our next update is expected before 15:30 BST.
We are currently investigating an issue that is affecting multiple services. Our Incident Team is investigating and we will report back in 30 minutes.
Report: "Data Processing Delays - new mention processing"
Last update: This incident has been resolved and mentions are processing as normal.
Our data processing infrastructure is operating with reduced resilience which is causing a slight delay when processing new Mentions. Customers may notice Mentions taking a little longer than usual to appear. No data has been lost and the system should be caught up shortly.
Report: "Altmetric API and Detail Pages outage"
Last update: This incident has been resolved.
The main fault has been corrected and full functionality has been restored to the Altmetric API and Detail Pages. We will be continuing to monitor the situation over the next hour and begin a full post-mortem inquiry into the root causes of the outage.
The underlying issue has been addressed and all Altmetric services are back to normal.
We have identified a fault with one of the database systems backing the Altmetric API, which is causing an outage for this application as well as for the Altmetric Detail Pages. We are working to remedy the fault and hope to have these systems operational shortly. Updates will be posted as the situation evolves.
Report: "Degraded performance for Altmetric Explorer"
Last update: The Altmetric Explorer is stable with no further issues reported so we are closing this incident.
The reported issues have been tracked down to a long-running database query that has now been addressed and we are currently seeing no further errors or performance issues with the Altmetric Explorer. We will continue to monitor the situation for the next hour.
We are currently investigating reports of degraded performance and sporadic errors for the Altmetric Explorer.
Report: "Fastly CDN global outage"
Last update: Fastly has resolved their on-going incident and Altmetric applications continue to be available as normal. As such we are now closing this incident as well.
Altmetric applications are now operational again but our team will continue monitoring the situation until the Fastly incident is confirmed as fully resolved.
Our upstream provider, Fastly, has now identified the issue and begun implementing a fix. We will continue monitoring while the fix is being deployed to ensure services return to normal operation.
Our upstream content distribution network provider, Fastly, is currently experiencing a global outage of its primary services, which is critically affecting all Altmetric applications. The incident is currently being investigated by Fastly, with updates available on their own status page at https://status.fastly.com/. We will update when further information becomes available.
Report: "Altmetric Explorer outage"
Last update: This incident has been resolved.
The root cause of the outage has been identified as a cascading failure of the primary database instance powering the Altmetric Explorer due to a failure of replication to secondary instances. The main fault has been rectified with the primary database now online and functionality restored to the Altmetric Explorer. We will continue monitoring the situation for any further issues and proceed with a full post-mortem analysis during the course of the day.
We have identified an incident involving the main database powering the Altmetric Explorer, resulting in a complete loss of functionality, and are currently working on a solution to restore service. A further update will be posted at 5AM BST.
Report: "Altmetric Explorer outage"
Last update: All systems look stable and no further outages have been detected.
The Altmetric Explorer application was determined to be in a non-responsive deadlocked state and a general restart of all instances has been issued. This seems to have resolved the outage and the Altmetric Explorer is now accessible with initial tests showing normal functionality. We will be continuing to monitor the situation for any further issues and will perform a detailed investigation of the ultimate cause of the incident.
We have detected an on-going outage for the Altmetric Explorer and are currently investigating.
Report: "Large-scale incident with a number of services inaccessible"
Last update: We are pleased to report that the Major Incident is now closed, our Major Incident Team has disbanded and all services are returned to normal operation. If you see anything that you believe to be incorrect, please contact your support team. Our teams shall be working to review the Major Incident for opportunities to improve our resilience and disaster recovery and we would like to take this opportunity to thank our customers for their patience throughout. There will be no further updates to this incident.
We are continuing to finalise the resolution of our last outstanding piece of processing. This relates to a very small number of social mentions with negligible customer impact however we will continue until fully resolved. Our next update will be before 1700 UTC Thursday, 1st April.
After the bulk processing was completed, we identified an edge case affecting a small number of social mentions. We are working on a manual fix for this, after which we hope to close the incident. Our next update will be before 1700 UTC Wednesday, 31st March.
We have now completed our bulk processing of the missed attention that occurred as a result of the major incident earlier this month. Tomorrow we will be performing our final checks to ensure everything is restored. Our next update will be before 1700 UTC Tuesday, 30th March.
We are now in the final stages of processing the missed attention and we expect this to complete early next week. Our next update will be before 1700 UTC Monday, 29th March.
We are now in the final stages of processing the missed attention and we expect this to complete early next week. Our next update will be before 1000 UTC Monday, 29th March.
Our team are working to complete the processing of missed attention and this is expected to continue later into the week. Our next update will be before 1700 UTC Friday, 26th March.
All customer facing services are fully operational. Our team are working to complete the processing of missed attention and this is expected to continue later into the week. Our next update will be before 1000 UTC Friday, 26th March.
We are continuing to process our backlog of mentions and we expect this to continue until later in the week. Our next update will be before 1030 UTC Thursday, 25th March.
We are continuing to process our backlog of mentions and we expect this to continue until later in the week. Our next update will be before 1700 UTC Wednesday, 24th March.
We are continuing to process our backlog of mentions and are continuously adding mentions to our badges, details pages and API. Unfortunately, processing is slower than we expected and is proving harder to predict. This means our completion date for the backlog is likely to be pushed back to later this week. Our next update will be before 1000 UTC Wednesday, 24th March.
We are continuing to process our backlog of mentions. We aim to provide a revised estimate for completion later today. Our next update will be before 1700 UTC Tuesday, 23rd March.
We are continuing to process our backlog of mentions. We aim to provide a revised estimate for completion tomorrow. Our next update will be before 1000 UTC Tuesday, 23rd March.
The backlog reprocessing continues. To provide the best service to our customers, we are prioritising the processing of recent mentions. In addition, we're taking a very cautious approach to the speed of reprocessing so that performance across our services remains consistent for customers. Our next update will be before 1700 UTC today, Monday, 22nd March.
We are continuing to process our backlog of mentions and we expect this to continue over the weekend. Our next update will be before 1000 UTC Monday, 22nd March.
We are continuing to process our backlog of mentions and we expect this to continue over the weekend. Our next update will be before 1700 UTC today, Friday 19th March.
We are continuing to process our backlog of mentions and we expect this to continue over the weekend. Our next update will be before 1000 UTC tomorrow, Friday 19th March.
The details pages, Explorer and APIs are all synchronised. We have started correctly attributing the remaining mentions and processing new and queued mentions. Our next update will be before 1700 UTC today, Thursday 18th March.
We have identified and removed 97% of misattributed mentions caused by our recent outage. This is now reflected in the Detail Pages and API. The Explorer will need some more time to process the changes, and we expect this to be available before the end of today. Over the course of the next week, we will be focusing on correctly attributing the remaining mentions to more recent research outputs, re-enabling the processing of new and queued mentions and taking steps to improve resilience. Our next update will be before 10am tomorrow.
Our teams continue to work on the incident and to progress our restoration plan. Our next update will be before 1800 UTC today, Wednesday 17th March.
In order to provide our customers with as much information as possible, we have published a blog post with additional details: https://www.altmetric.com/blog/customer-update-altmetric-technical-major-incident/ Our teams continue to work on the incident and our next update will be before 1000 UTC tomorrow, Wednesday 17th March.
Our Counts and Commercial API, including badges and details pages, are now fully restored. Our work to resolve the issue concerning several mis-attributed mentions is ongoing. We will provide another update before 6pm.
We are continuing to work on a fix for this issue.
Our data centre provider has restored functionality to our hardware infrastructure and we are currently working to re-synchronise databases and bring services back online. We expect our services to start restoring within the next hour. Our next update will be at 12pm.
Unfortunately, one of our data centres is suffering from an outage which has taken down our Counts and Commercial API, including badges, and details pages. We do not have an ETA for a fix, but we are investigating workarounds. Our next update will be at 10am.
We have identified the root cause of several mis-attributed mentions which we are in the process of resolving. This will require us to restore mentions data to a point in time (9th March) and to review all mentions processed since then to ensure they're attributed correctly. Our next update will be before 1700 UTC tomorrow, Tuesday 16th March.
Following our response to the Major Incident declared by our hosting provider last week, we have identified several mis-attributed mentions which we are in the process of resolving. As a further measure, we are planning to review all mentions processed since the Major Incident and ensure they're attributed correctly. Our next update will be before 1700 UTC today, Monday 15th March.
Following a stable weekend, the Major Incident Team are regrouping this morning to assess outstanding actions with a view to resolving the incident and completing the post-incident report. Our next update will be before 1700 UTC today, Monday 15th March.
All Altmetric services have been restored and queues restarted. System resilience has been restored and data is being synchronised across our data centres. The incident will remain open at a reduced severity while we monitor performance and backup processes over the weekend. We continue to see no direct customer impact in accessing our services. We will update again by Monday 15th March at 10am.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
Our teams are processing the backlog of missed mentions and related data, and we are working across our infrastructure teams to bolster capacity and resilience for the weekend. This incident will remain in a monitoring state until we are confident that we have fully addressed all systems affected by the outage and returned them to their previous state. We will update again by Monday 15th March at 10am.
The stability of our critical systems is holding steady, and we are making our way through the backlog of missed mentions from the last 24 hours. You may still experience some degradation in performance within the Explorer, but this should continue to improve over the coming days. Our next update will be by 5pm today.
We have successfully stabilized all of our critical systems, and improved our capacity with temporary servers to mitigate performance issues while we fully recover and reprovision our lost infrastructure. We are receiving new mentions, but there will be a delay in these being visible in our platform as we work through the backlog and pull in mentions that were missed during the outage. We will update this page again by 9am tomorrow morning.
We have successfully restored all of our customer facing services, though we continue to operate with reduced server capacity so some services may be slower than normal. There is still a delay to populating the system with new mentions, so you may not see the mentions you expect attached to research outputs. Now that our core services are restored, we will begin working to retrieve any missed mentions, and get the pipeline of new mentions back to a stable state. Our next update will be at 5pm.
Our team has begun bringing services back online. Our database capacity is still reduced, so we are monitoring server capacity carefully, and this will delay mentions appearing in the Explorer. Our next update will be by 2pm.
Our team is now working through a plan to restore necessary infrastructure, and we will continue to update as we restore our service capacity back to normal levels. Our next update will be at 12pm.
Good morning. The team has returned to assess the changes we need to make to get back into a healthier position. Overnight on March 10th 2021, our data centre provider suffered a major incident, during which one of its data centres was destroyed. This leaves us with less capacity than normal, with some noticeable impact on our services. You may notice a lack of availability or degradation of the Explorer, Details Pages and Badges, and some Explorer API endpoints are not responding. The Details Page API should be unaffected, but its data will only be up to date as of 1.30am. New mentions are not being processed at the moment (including the removal of mentions which should be removed). Our team is now focused on restoring key services, and then reclaiming missing data from our sources. We are not currently expecting any permanent data loss; however, we will update this page immediately if that changes. Our next update to this page will be at 11am.
We've been made aware of a fire in one of our data centres. This means that we're not running as many instances of our software as we would like, so some services may be slow, as usage is not spread across as many machines. Our main data processing pipeline is also offline at the moment, so new mentions will take some time to appear on details pages and in the Explorer. As it's 3am here in England, and our services are largely stable, work to restore data processing will continue in the morning. We are hopeful that all data from this downtime can be recovered.
An issue was identified with our API and service has been restored.
There's an outage on our provider's side which has caused disruption to one of our data centres. We're currently able to provide services from only one of our data centres, which will cause disruption in some places.
We're currently attempting to track down the reason behind a number of our services being unresponsive. Current signs point to an issue at one of our data centres.
Report: "API outage"
Last update: After a period of monitoring, all systems appear to be stable. Thank you all for your patience.
After further investigation, we're more confident in the cause of the incident. It is not likely to happen again. We'll monitor for the next three hours.
The cause of the issue has been found to be a misconfiguration of our service discovery software. This is being fixed now to avoid further issues.
All services are currently expected to be working as normal. However, the root cause has not yet been identified, so we're staying on high alert until we have more information.
We are currently investigating an outage in our core APIs.
Report: "Some details pages are returning an error message"
Last update: This incident has been resolved.
Some details pages across Altmetric are experiencing issues today, which will leave some inaccessible for the time being. We're aware of the cause and will be implementing a hotfix shortly. This will remove one of the tabs from "Score in Context", which will be added back as soon as our data has been reprocessed.
Report: "Explorer and Details Pages performance degradation"
Last update: This incident has been resolved.
We've implemented a fix for our performance issues and things are now back to normal.
We've identified performance degradation issues with some of our products and we are working on a fix.
Report: "Service interruption"
Last update: This incident has been resolved.
A fix has been implemented and service functionality fully restored. We are now monitoring the situation for any further problems.
The core issue has been traced to newly introduced configuration and a fix is being implemented.
We are currently investigating an infrastructure incident that is causing service interruption for several Altmetric services. We will update when we have identified the cause.
Report: "Unplanned Explorer outage"
Last update: We were made aware of, and have resolved, an issue that affected our main load balancers, causing many parts of altmetric.com to become unresponsive. We apologise for the interruption and are confident your experience should now return to normal.
Report: "API Outage"
Last update: All dataset reloads have completed successfully and we have now verified that the API and dependent services are fully operational and in a stable state.
Dataset reload has been completed on two of three instances and we are currently monitoring the third for completion. API functionality is currently restored for all services, but we will continue monitoring for the next hour to ensure service is fully stable.
The issues have been traced back to a database primary instance restarting and causing a full re-sync and dataset reload across all secondary instances, thus blocking requests while the dataset is loaded into memory. We are currently working to resolve the situation and restore API service and will update shortly.
We are currently experiencing issues with the main Altmetric API (this includes basic, badges and Detail pages functionality) and are currently investigating the cause.
Report: "Network outage"
Last update: Network conditions look stable and all services have been restored to normal functioning.
ISP network issues have now been resolved and Altmetric applications should once again be available. We are currently performing post-incident recovery and monitoring and will update when the network state is considered stable and the outage has been fully resolved.
Our ISP has confirmed there is an ongoing network incident and engineers have been dispatched to diagnose and resolve the issue.
We are currently investigating.
We are currently investigating a network level incident affecting our core infrastructure. Currently there are on-going outages for the Explorer and Detail Pages as well as our main website, but API access should be unaffected.
Report: "Delays in new Twitter data"
Last update: The issues we were seeing with our Twitter feed are now resolved. Details pages are now showing the recovered Twitter mentions, though some of those mentions may take up to 12 hours to propagate to the Explorer.
We've resolved the issue with the Twitter feed and are now processing the newly received data.
We're currently experiencing issues with our Twitter feed, which means new Twitter data is being delayed. We're working to get this resolved and restore all data as soon as possible.
Report: "Datacenter outage affecting multiple Altmetric products"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
Our hosting provider has confirmed that they are having major network issues. We are monitoring the situation as things are slowly recovering.
We are currently investigating an outage affecting multiple Altmetric products.
Report: "OVH Network Issue"
Last update: The backend network issue is now resolved.
All applications are now fully operational. We're flagging the network issue with OVH and will continue to monitor until the incident is fully resolved.
We are having an issue with the OVH network. We have failed over to our secondary servers, which means that applications have limited functionality (no writes are allowed). We are working on solving the issue.
Report: "Intermittent outage on altmetric.com"
Last update: Following some recent configuration changes to the altmetric.com website, we started noticing intermittent 502 error messages. An investigation revealed resource overload problems on one of the machines in our infrastructure, and the problematic node has consequently been removed from deployment rotation. This has resolved the 502 errors, and an investigation will follow into the root causes of the overload on that node, as well as ways we can address this more quickly or automatically in the future.
Report: "Network incident"
Last update: This incident has been resolved.
We have now successfully switched over to our secondary database and functionality is restored to the Altmetric Explorer and Details Pages. We'll continue to monitor the situation.
As the network issue seems to be affecting one of our database primaries, we are now failing over to our secondary to restore functionality to our applications.
We're currently investigating a network incident at our hosting provider, OVH, which is preventing some of our applications from contacting essential services and therefore causing issues with the Altmetric Explorer and Details Pages.
Report: "Network Issue"
Last update: All services are operational and network issues have been resolved.
Connectivity has been restored. We will continue to investigate the network issue until it is fully resolved.
We are continuing to investigate this issue.
We're currently having network issues with our primary databases. We'll continue to monitor the situation and post an update once resolved.
Report: "Strasbourg Data Centre Outage"
Last update: All servers are now back online and we are recovering news mentions.
Most user-facing services are now restored (Explorer for Institutions, Explorer for Publishers, Details Pages) but some of our servers responsible for mainstream media monitoring and policy document mining are still unavailable. A detailed account of the OVH outage is available at http://status.ovh.net/?do=details&id=15162 and we continue to liaise with them to fully restore all services.
Most of our servers are once again available but we're still seeing issues within our hosting provider's network which we continue to monitor while restoring services as best we can.
Access to our servers in Roubaix has been restored but we're still waiting for the second data centre in Strasbourg to come back online before we can fully restore service.
Our hosting provider OVH have now posted an update that power is being restored to the Strasbourg data centre with an ETA of 15 minutes, and that the corruption in the Roubaix data centre is being fixed by restoring a backup with an ETA of 30 minutes. Please see https://twitter.com/olesovhcom/status/928552251458818048 for more information. Once access to our servers is restored, we'll work to bring the main website, Explorer and details pages online as soon as possible.
Disruption across all services continues due to an outage with multiple data centres at our hosting provider OVH.
We're currently experiencing issues across Altmetric services due to an electrical fault with one of our data centres in France. Our hosting provider OVH is investigating the issue and we'll keep this incident updated as the situation develops.
Report: "DNS Resolution issues"
Last update: Internal domain DNS changes are now confirmed to have fully propagated and all services are operating normally.
Public facing services continue to be operational and we are monitoring global DNS propagation of the internal domain.
All public facing services are now restored to full operational status. The root cause of the outage has now been resolved, however we are still working to alleviate internal DNS resolution issues whilst monitoring the situation and waiting for full propagation of authoritative domain nameserver changes.
We are currently encountering an issue with authoritative nameservers on one of our internal domains, the cause of which has been traced to a faulty domain transfer. This has caused a DNS resolution outage in certain parts of Altmetric infrastructure which is impacting access and performance across a number of our services. Action is ongoing to remedy the problem and updates will be posted as they become available.
Report: "Altmetric Explorer for Institutions Primary Database Failover"
Last update: The Explorer for Institutions database has been stable and was updated with the latest data at 3:21 PM. We are closing this incident; we will decide whether to switch back once the old primary has fully resynchronised.
At 1:41 AM this morning, we received alerts that the Altmetric Explorer for Institutions health check was failing. Shortly afterwards, we received a message from our hosting provider OVH that a fault had been discovered on our primary database server and that their on site technicians would intervene. At 2:15 AM, while waiting for the intervention, we switched over to our secondary database server, updated our internal DNS configuration and restarted relevant applications, restoring access to the Explorer by 2:24 AM. OVH completed their intervention at 2:30 AM, restarting our primary database server and bringing it back online. We continue to monitor the two servers and ensure everything is running as expected.
Report: "OVH Network Issue"
Last update: OVH have now closed the original incident and all services are operational.
Connectivity has been restored and the API is responding once again. OVH continue to investigate the network issue and we'll continue to monitor until that is fully resolved.
We're currently having issues with our API servers due to an ongoing network problem at our hosting provider, OVH (http://status.ovh.co.uk/?do=details&id=14316). We'll continue to monitor the situation and post an update once resolved.