Historical record of incidents for Logit.io
Report: "Some Stacks not appearing on the Dashboard"
Last update: Some customers are reporting that stacks do not appear on their dashboard. This is not related to the upcoming maintenance and we are investigating. We rolled back the latest build, which resolved the issue.
Report: "[Logs] [Europe] Filter server offline"
Last update: The server is back online. All Logstash filter containers have started up again.
We have started additional filter processes and almost all stacks should have processed their backlogs. The provider has reported a power issue and is expecting to provide their next update in 3 hours.
Our alerts have detected that one of our filter servers has gone offline. No data should be lost, but processing of incoming logs may be interrupted for some stacks.
Report: "New Stacks and DNS Changes are failing due a cloudflare outage"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Cloudflare Pages and Load Balancers are offline"
Last update: This incident has been resolved.
It appears that Cloudflare has resolved the issue for our Marketing site and Dashboard Authentication.
The Dashboard is operational for existing logged-in users. Our authentication provider has the same issue, and new users will not be able to log in.
Our upstream provider Cloudflare has an outage and we will continue to monitor (https://www.cloudflarestatus.com/incidents/l6x2h1zp69bc). Data ingestion is still operational.
We are currently investigating this issue.
Report: "Issues connecting to our dashboard"
Last update: Our provider has resolved the issue
We have applied some mitigations and you should be able to access our dashboard. We will continue to monitor our upstream provider
We are having issues with some load balancers due to our provider. https://www.cloudflarestatus.com/incidents/cx141zv0rfd8
We are currently investigating this issue.
Report: "UK - Opensearch underlying hosts are offline"
Last update: All stacks are operating as we expect and processing messages
Stacks are now processing queued messages
Servers are back online, we are just monitoring the recovery
We should have the rack back in less than 1 hour
A rack at our data centre is offline, affecting a number of hosts. We are working with our provider to restore access. Update in 1 hour
We are currently investigating this issue.
Report: "Network Outage to Opensearch Dashboards and Kibana"
Last update: We had a number of alerts for Opensearch Dashboards and Kibana instances that could not connect to the public network with our provider. We have confirmed that connections have been re-established and are working with our provider to understand the root cause of their outage. We had connectivity issues between 22:57 and 23:22 UTC.
Report: "A number of US Kibana and Grafana Hosts are offline"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our engineers are continuing to investigate this issue and will update in 1 hour.
Our engineers are continuing to investigate this issue and will update in 1 hour.
Our engineers are continuing to investigate this issue and will update in 1 hour.
We are continuing to monitor for any further issues.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Our engineers are continuing to investigate this issue and will update in 1 hour.
Our engineers are currently investigating this issue and will provide an update in 30 minutes.
Report: "Logit.io Response to Apache Log4j 2 CVE-2021-44228"
Last update: All stacks have been patched and are operational. We have removed the org/apache/logging/log4j/core/lookup/JndiLookup.class file from all installations of log4j
All Elasticsearch clusters in the UK and US regions have been patched and updated. We continue in parallel to update the EU region
We can confirm we have applied the patches to all Logstash Inputs for Stacks in the EU Regions.
Update: We can confirm that we have now also applied the patches to Logstash for Filter nodes in the EU region.
Update: Our engineers are in the process of rolling out a patched version of Logstash for both Input and Filter nodes for all active versions of ELK. We can confirm we have applied the patches to all Stacks in the UK and US Regions, and we are awaiting the completion of the EU Region.
Follow updates here on how our teams are currently responding, with the highest priority, to vulnerability CVE-2021-44228, which impacts multiple versions of Apache Log4j 2 and was disclosed publicly via GitHub on December 9th 2021. Logit.io engineers and security incident teams continue to actively analyse, identify and, where necessary, patch all affected log4j versions across all Logit.io data centre instances.

Logit.io are initially following the current remediation advice as a first priority, as detailed here: https://github.com/advisories/GHSA-jfh8-c2jp-5v3q "In previous releases (>=2.10) this behaviour can be mitigated by setting system property "log4j2.formatMsgNoLookups" to "true" or by removing the JndiLookup class from the classpath (example: zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class)". This remediation will be applied to all versions of Logit.io instances and we will post further updates here as it is rolled out.

In addition, the Logit.io security team has analysed its logs and updated its monitoring of all internal services to include active alerting on any attempts to issue remote code execution via the JndiLookup class. The information leak in Log4j does not permit access to data within an Elasticsearch cluster, but allows an attacker to extract certain environmental data via DNS. The data leaked is limited to that available via Log4j lookups.

If you have any questions or concerns, please reach out to us via the normal support channels or email support@logit.io. You can also choose to subscribe to updates at the top of this page to receive all future notifications.
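For illustration only, the class-removal mitigation quoted above could be scripted roughly as follows. This is a minimal sketch, not Logit.io's actual tooling; the search root and function name are placeholders chosen for the example.

```python
# Minimal sketch (not Logit.io tooling): remove JndiLookup.class from any
# log4j-core jars found under a directory, mirroring the mitigation quoted
# in the advisory above. The search root "/opt" is a placeholder path.
import glob
import shutil
import zipfile

VULNERABLE_ENTRY = "org/apache/logging/log4j/core/lookup/JndiLookup.class"

def strip_jndi_lookup(jar_path: str) -> bool:
    """Rewrite the jar without JndiLookup.class; return True if it was removed."""
    with zipfile.ZipFile(jar_path) as src:
        names = src.namelist()
        if VULNERABLE_ENTRY not in names:
            return False  # already clean, or not a log4j-core jar
        patched_path = jar_path + ".patched"
        with zipfile.ZipFile(patched_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in names:
                if name != VULNERABLE_ENTRY:
                    dst.writestr(name, src.read(name))
    shutil.move(patched_path, jar_path)  # replace the original jar in place
    return True

if __name__ == "__main__":
    for jar in glob.glob("/opt/**/log4j-core-*.jar", recursive=True):
        patched = strip_jndi_lookup(jar)
        print(f"{jar}: {'patched' if patched else 'JndiLookup.class not present'}")
```

As the advisory notes, setting the system property log4j2.formatMsgNoLookups to "true" is an alternative mitigation for Log4j releases >= 2.10.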
Report: "Elasticsearch server offline"
Last update: The server is back online - all stacks are showing green again
One of our elasticsearch servers is offline. The issue has been raised with the provider and we will begin migrating the affected stacks onto other servers shortly.
Report: "Dashboard.logit.io unavailable for some customers"
Last update: On the morning of October 13th at 07:30 UTC, Logit.io on-call engineers were automatically alerted to a new issue affecting the stability of the platform. Our engineers responded quickly to identify the root cause of the problem and by 07:50 UTC had confirmed with our incident response team that the issue related to an intervention by the underlying infrastructure provider, which led to disturbances across their entire network. These interventions were aimed at reinforcing anti-DDoS protections.

Our teams worked closely with the underlying infrastructure provider's teams, who isolated the affected equipment at 08:15 UTC, restoring normal service. Our engineers then performed a series of health checks across the infrastructure and individual stacks to confirm the incident was resolved.

We sincerely apologise to all of our customers affected by this incident and we commit to being as transparent as possible about its causes and consequences.
This incident has been resolved.
Our engineers have reported that all services are available. Our engineers are now performing healthchecks across all Stacks.
Our engineers have reported that we now have connectivity and are continuing to check that all services are fully operational
The issue persists with the underlying networking at our hosting provider; our engineers are working with them to resolve it. We will provide an update in 30 minutes.
Our engineers have reported that all services are available. Our engineers are now performing healthchecks across all Stacks.
The issue is with the underlying networking at our hosting provider; our engineers are working with them to resolve it. We will provide an update in 30 minutes.
We have a network outage with our provider and we are working with them to fix their networking issue. We will update in 30 minutes
We have been notified that dashboard.logit.io is unavailable for some customers and our engineers are actively investigating. We will update in 30 minutes.
Report: "Server offline due to provider maintenance"
Last update: This incident has been resolved.
Servers are all back online - all stacks should be returning to green
We have a server offline due to provider maintenance
Report: "A number of UK Elastic search nodes are offline"
Last update: The components have been replaced
There is an underlying networking issue with some of the servers; this has been identified and the faulty components are being replaced
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Reports of issues connecting to kibana instances"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Elasticsearch server offline"
Last update: The server is back online and all stacks are reporting healthy
One server is offline. Performance might be slightly impaired while clusters rebalance
One elasticsearch server is offline - we've raised this with our hosting provider
Report: "Major outage affecting EU Data Center"
Last update: [Click here to view postmortem](https://logit.io/blog/post/logit-io-platform-outage-incident-report)
This incident has been resolved.
We have recovered from backups all of the major core services that had been lost in the EU data center fire. We will continue to monitor the platform for the coming hours to ensure stability, but we believe we have fully recovered all services. If you have any questions or need support, please reach out to us
The ingestion API is now back online. If you are still having issues with the API, please reach out to the support team. Our engineers are working to bring the alerting infrastructure back online. We will update in 1 hour.
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are bringing the API and other core services back online now. We will update in 1 hour.
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are continuing to work to restore other core services, including the shared API. We will update in 1 hour.
All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the team. Our engineers are continuing to work to restore other core services. We will update in 1 hour.
The majority of affected Kibana instances are now back online. Our engineers are continuing to work to restore other core services. We will update in 1 hour.
Our engineers are in the process of recreating all Kibana instances and are progressing well with implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remain unaffected. We will update in 1 hour.
Our engineers are continuing to restore all services and are progressing well with implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remain unaffected. We will update in 1 hour.
Our engineers have restored the platform dashboard https://dashboard.logit.io and other core services. We will provide another update in 2 hours
There has been a major fire at one of our data centers affecting some core services. Note this does not impact Logstash and Elasticsearch logs ingestion which remain unaffected. We have invoked our DR/BCP plan to migrate and restore the affected services to a different data center. We will update in 2 hours.
We are working with our hosting provider to restore access to services. We will update in 60 minutes
We are working with our hosting provider to restore access to services. We will update in 60 minutes
We are working with our hosting provider to restore access to services. We will update in 30 minutes.
We are continuing to work on a fix for this issue. We will update in 30 minutes
We are working with our hosting provider to restore access to services. We will update in 30 minutes.
We are currently investigating this issue.
Report: "Reports of delays on some Logstash Input nodes"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have been notified of some delays on certain Logstash Input nodes and our engineers are actively investigating. We will update in 30 minutes.
Report: "Elastic Search Servers - Some clusters Yellow and Red"
Last update: This incident has been resolved.
All servers are back online and nodes are rebalancing where required.
We are continuing to work on a fix for this issue and will update in 1 hour.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are working on restoring the affected Servers and will update in 30 minutes
Report: "Some Logstash hosts are offline"
Last update: This incident has been resolved.
All hosts are online, we are continuing to monitor.
We are continuing work on recovering the affected instances. We will update in 30 minutes.
We have identified the issue and are working on recovering the affected instances. We will update in 30 minutes.
We are currently investigating this issue.
Report: "Elastic Search - Clusters Yellow and Red"
Last update: All clusters have recovered and are green
All servers are operational and we are monitoring cluster health as clusters move from Yellow to Green
We have connectivity back to a number of servers and are working on ensuring all stacks on the affected hosts are operating as expected
We are working with our hosting provider to fix the networking; they are replacing the hardware. Further update in 1 hour
We are working with our hosting provider to restore network connectivity to the affected hosts. We will provide an update in 30 minutes
We are investigating the root cause; we have a range of yellow and red clusters
Report: "Logit Cloud Logstash Incident - Failed healthcheck for some EU Region Logstash instances"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our engineers are actively investigating this issue and we will update in 30 minutes.
Report: "Some Logstash Instances are offline"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
There are only a small number of Logstash instances still offline. We will update in 1 hour
There are only a small number of Logstash instances still offline. We will update in 1 hour
We are continuing to work on a fix for this issue.
We have an issue with the underlying hosting provider and a range of servers are offline; we are working with them to resolve it
Report: "Small number of Stacks are reporting failed healthcheck for Logstash"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Intermittent Networking Issue"
Last update: This incident has been resolved.
We are experiencing an intermittent networking issue and our engineers are actively investigating
Report: "Small number of stacks are reporting failed healthcheck for logstash"
Last update: This incident has been resolved.
Logstash instances have been restarted and are reporting healthy
We are investigating the issue
Report: "Intermittent Network Issues"
Last update: All services continue to report healthy. All networking is now restored and operating as we would expect
We no longer have any alerts for our networking, but we are still working with our hosting provider to monitor the situation
The issue is with the underlying networking at our hosting provider; we are working with them to resolve it. We will provide an update in 30 minutes.
We are currently investigating this issue.
Report: "Alerts unavailable due to hosting provider issue"
Last update: This incident has been resolved.
A fix has been applied and we are monitoring the affected instances
We are still working with the hosting provider to resolve the issue. Further update in 2 hours
We are still working with the hosting provider to resolve the issue. Further update in 1 hour
The alerting infrastructure will be unavailable for some customers due to an issue with our hosting provider. We will keep you updated but expect it to be resolved within the next hour
Report: "Partial Logstash ingestion"
Last update: All instances are now fully operational
A fix has been implemented and we are monitoring the result
We are currently investigating an issue with an underlying host for some Logstash instances and will post an update in the next hour
The issue has been identified and the underlying host is being rebooted
We are currently investigating an issue with an underlying host for some Logstash instances and will post an update in the next hour