Logit.io

Is Logit.io Down Right Now? Check if there is a current outage ongoing.

Logit.io is currently Operational

Last checked from Logit.io's official status page

Historical record of incidents for Logit.io

Report: "Some Stacks not appearing on the Dashboard"

Last update
resolved

Some customers are reporting that stacks do not appear on their dashboard. This is not related to the upcoming maintenance; we are investigating. We rolled back the latest build, which resolved the issue.

Report: "[Logs] [Europe] Filter server offline"

Last update
resolved

The server is back online. All logstash filter containers have started up again.

monitoring

We have started additional filter processes and almost all stacks should have processed their backlogs. The provider has reported a power issue and expects to provide their next update in 3 hours.

monitoring

Our alerts have detected that one of our filter servers has gone offline. No data should be lost, but processing of incoming logs may be interrupted for some stacks.

Report: "New Stacks and DNS Changes are failing due to a Cloudflare outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

Report: "Cloudflare Pages and Load Balancers are offline"

Last update
resolved

This incident has been resolved.

monitoring

It appears that Cloudflare has resolved the issue for our Marketing site and Dashboard Authentication.

identified

The Dashboard is operational for users who are already logged in. Our authentication provider has the same issue, and new users will not be able to log in.

identified

Our upstream provider Cloudflare has an outage; we will continue to monitor. https://www.cloudflarestatus.com/incidents/l6x2h1zp69bc Data ingestion is still operational.

investigating

We are currently investigating this issue.

Report: "Issues connecting to our dashboard"

Last update
resolved

Our provider has resolved the issue

monitoring

We have applied some mitigation and you should be able to access our dashboard. We will continue to monitor our upstream provider.

investigating

We are having issues with some load balancers due to our provider. https://www.cloudflarestatus.com/incidents/cx141zv0rfd8

investigating

We are currently investigating this issue.

Report: "UK - Opensearch underlying hosts are offline"

Last update
resolved

All stacks are operating as we expect and processing messages

monitoring

Stacks are now processing queued messages

monitoring

Servers are back online; we are monitoring the recovery.

identified

We should have the rack back in less than 1 hour.

identified

A rack at our data centre is offline, affecting a number of hosts. We are working with our provider to restore access. Update in 1 hour.

investigating

We are currently investigating this issue.

Report: "Network Outage to Opensearch Dashboards and Kibana"

Last update
resolved

We had a number of alerts for Opensearch Dashboard and Kibana instances that could not connect to the public network with our provider. We have confirmed that connections have been re-established and are working with our provider to understand the root cause of their outage. We had connectivity issues between 22:57 and 23:22 UTC.

Report: "A number of US Kibana and Grafana Hosts are offline"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our engineers are continuing to investigate this issue and will update in 1 hour.

investigating

Our engineers are continuing to investigate this issue and will update in 1 hour.

investigating

Our engineers are continuing to investigate this issue and will update in 1 hour.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our engineers are continuing to investigate this issue and will update in 1 hour.

investigating

Our engineers are currently investigating this issue and will provide an update in 30 minutes.

Report: "Logit.io Response to Apache Log4j 2 CVE-2021-44228"

Last update
resolved

All stacks have been patched and are operational. We have removed the org/apache/logging/log4j/core/lookup/JndiLookup.class file from all installations of log4j.

identified

All Elasticsearch clusters in the UK and US regions have been patched and updated. In parallel, we are continuing to update the EU region.

identified

We can confirm we have applied the patches to all Logstash Inputs for Stacks in the EU Regions.

identified

Update: We can confirm that we have now also applied the patches to Logstash for Filter nodes in the EU region.

identified

Update: Our engineers are in the process of rolling out a patched version of Logstash for both Input and Filter nodes for all active versions of ELK. We can confirm we have applied the patches to all Stacks in the UK and US Regions, and we are awaiting the completion of the EU Region.

identified

Follow updates here on how our teams are responding, with the highest priority, to vulnerability CVE-2021-44228, which affects multiple versions of Apache Log4j 2 and was disclosed publicly via GitHub on December 9th 2021. Logit.io engineers and security incident teams continue to actively analyse, identify and, where necessary, patch all affected log4j versions across all Logit.io data centre instances.

As a first priority, Logit.io is following the current remediation advice detailed at https://github.com/advisories/GHSA-jfh8-c2jp-5v3q:

"In previous releases (>=2.10) this behaviour can be mitigated by setting system property "log4j2.formatMsgNoLookups" to "true" or by removing the JndiLookup class from the classpath (example: zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class)"

This remediation will be applied to all versions of Logit.io instances and we will post further updates here as it is rolled out. In addition, the Logit.io security team has analysed its logs and updated its monitoring of all internal services to include active alerting on any attempts to trigger remote code execution via the JndiLookup class.

The information leak in Log4j does not permit access to data within an Elasticsearch cluster, but it allows an attacker to extract certain environmental data via DNS. The data leaked is limited to that available via Log4j lookups.

If you have any questions or concerns, please reach out to us via the normal support channels or email support@logit.io. You can also subscribe to updates at the top of this page to receive all future notifications.
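For readers who want to see what the class-removal mitigation quoted from the advisory amounts to, the following is a minimal Python sketch of the same idea as the `zip -q -d` example: rewrite a jar without the JndiLookup.class entry. It is an illustration only, not the procedure Logit.io used; the function name `strip_jndi_lookup` and the temporary-file approach are assumptions made for this example.

```python
import os
import tempfile
import zipfile
from pathlib import Path

# Entry named in the advisory's zip example.
JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def strip_jndi_lookup(jar_path: Path) -> bool:
    """Rewrite the jar without JndiLookup.class (hypothetical helper).

    Returns True if the entry was found and removed, False if the jar
    did not contain it and was left untouched.
    """
    jar_path = Path(jar_path)
    with zipfile.ZipFile(jar_path) as src:
        if JNDI_LOOKUP not in src.namelist():
            return False
        # Write the filtered copy next to the original so the final
        # swap stays on the same filesystem.
        fd, tmp_name = tempfile.mkstemp(suffix=".jar", dir=str(jar_path.parent))
        os.close(fd)
        with zipfile.ZipFile(tmp_name, "w") as dst:
            for info in src.infolist():
                if info.filename != JNDI_LOOKUP:
                    # Passing the ZipInfo keeps each entry's metadata
                    # and compression type.
                    dst.writestr(info, src.read(info.filename))
    # Swap the filtered copy into place only after it is fully written.
    os.replace(tmp_name, jar_path)
    return True
```

A caller would typically loop over every matching jar (for example `Path(root).rglob("log4j-core-*.jar")`, where `root` is whatever installation directory applies) and restart the affected service afterwards, since an already running JVM may still have the class loaded.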

Report: "Elasticsearch server offline"

Last update
resolved

The server is back online - all stacks are showing green again

identified

One of our Elasticsearch servers is offline. The issue has been raised with the provider and we will begin migrating the affected stacks onto other servers shortly.

Report: "Dashboard.logit.io unavailable for some customers"

Last update
postmortem

On the morning of October 13th at 07:30 UTC, Logit.io on-call engineers were automatically alerted to a new issue affecting the stability of the platform. Our engineers responded quickly to identify the root cause and by 07:50 UTC had confirmed with our incident response team that the issue was related to an intervention at the underlying infrastructure provider, which led to disturbances across the entire network. The intervention was aimed at reinforcing anti-DDoS protections. Our teams worked closely with the infrastructure provider's teams, who isolated the equipment at 08:15 UTC, restoring normal service. Our engineers then performed a series of health checks across the infrastructure and individual stacks to confirm the incident was resolved. We sincerely apologise to all of our customers affected by this incident and we commit to being as transparent as possible about its causes and consequences.

resolved

This incident has been resolved.

monitoring

Our engineers have reported that all services are available. Our engineers are now performing healthchecks across all Stacks.

identified

Our engineers have reported that we now have connectivity and are continuing to check that all services are fully operational.

identified

The issue with the underlying networking at our hosting provider persists; our engineers are working with them to resolve it. We will provide an update in 30 minutes.

monitoring

Our engineers have reported that all services are available. Our engineers are now performing healthchecks across all Stacks.

identified

The issue is with the underlying networking at our hosting provider; our engineers are working with them to resolve it. We will provide an update in 30 minutes.

identified

We have a network outage with our provider and are working with them to fix their networking issue. We will update in 30 minutes.

investigating

We have been notified that dashboard.logit.io is unavailable for some customers and our engineers are actively investigating. We will update in 30 minutes.

Report: "Server offline due to provider maintenance"

Last update
resolved

This incident has been resolved.

monitoring

Servers are all back online - all stacks should be returning to green

monitoring

We have a server offline due to provider maintenance

Report: "A number of UK Elasticsearch nodes are offline"

Last update
resolved

The components have been replaced

identified

There is an underlying networking issue with some of the servers. It has been identified and the faulty components are being replaced.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Reports of issues connecting to Kibana instances"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Elasticsearch server offline"

Last update
resolved

The server is back online and all stacks are reporting healthy

identified

One server is offline. Performance might be slightly impaired while clusters rebalance.

investigating

One Elasticsearch server is offline - we've raised this with our hosting provider

Report: "Major outage affecting EU Data Center"

Last update
postmortem

[Click here to view postmortem](https://logit.io/blog/post/logit-io-platform-outage-incident-report)

resolved

This incident has been resolved.

monitoring

We have recovered from backups all of the major core services that had been lost in the EU data center fire. We will continue to monitor the platform for the coming hours to ensure stability, but we believe we have fully recovered all services. If you have any questions or need support, please reach out to us.

identified

The ingestion API is now back online. If you are still having issues with the API, please reach out to the support team. Our engineers are working to bring the alerting infrastructure back online. We will update in 1 hour.

identified

All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are bringing the API and other core services back online now. We will update in 1 hour.

identified

All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the support team. Our engineers are continuing to work to restore other core services, including the shared API. We will update in 1 hour.

identified

All Kibana instances are now back online. If you are still having issues with Kibana, please reach out to the team. Our engineers are continuing to work to restore other core services. We will update in 1 hour.

identified

The majority of affected Kibana instances are now back online. Our engineers are continuing to work to restore other core services. We will update in 1 hour.

identified

Our engineers are in the process of recreating all Kibana instances and are progressing well in implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We will update in 1 hour.

identified

Our engineers are continuing to restore all services and are progressing well in implementing our DR plan. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We will update in 1 hour.

identified

Our engineers have restored the platform dashboard https://dashboard.logit.io and other core services. We will provide another update in 2 hours.

identified

There has been a major fire at one of our data centers affecting some core services. Note: this does not impact Logstash and Elasticsearch log ingestion, which remains unaffected. We have invoked our DR/BCP plan to migrate and restore the affected services to a different data center. We will update in 2 hours.

identified

We are working with our hosting provider to restore access to services. We will update in 60 minutes

identified

We are working with our hosting provider to restore access to services. We will update in 60 minutes

identified

We are working with our hosting provider to restore access to services. We will update in 30 minutes.

identified

We are continuing to work on a fix for this issue. We will update in 30 minutes

identified

We are working with our hosting provider to restore access to services. We will update in 30 minutes.

investigating

We are currently investigating this issue.

Report: "Reports of delays on some Logstash Input nodes"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We have been notified of some delays on certain Logstash Input nodes and our engineers are actively investigating. We will update in 30 minutes.

Report: "Elasticsearch Servers - Some clusters Yellow and Red"

Last update
resolved

This incident has been resolved.

monitoring

All servers are back online and nodes are rebalancing where required.

identified

We are continuing to work on a fix for this issue and will update in 1 hour.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are working on restoring the affected servers and will update in 30 minutes.

Report: "Some Logstash hosts are offline"

Last update
resolved

This incident has been resolved.

monitoring

All hosts are online, we are continuing to monitor.

identified

We are continuing work on recovering the affected instances. We will update in 30 minutes.

identified

We have identified the issue and are working on recovering the affected instances. We will update in 30 minutes.

investigating

We are currently investigating this issue.

Report: "Elasticsearch - Clusters Yellow and Red"

Last update
resolved

All clusters have recovered and are green

monitoring

All servers are operational. We are monitoring cluster health as clusters move from yellow to green.

identified

We have connectivity back to a number of servers and are working on ensuring all stacks on the affected hosts are operating as expected.

identified

We are working with our hosting provider to fix the networking; they are replacing the hardware. Further update in 1 hour.

identified

We are working with our hosting provider to restore network connectivity to the affected hosts. We will provide an update in 30 minutes.

investigating

We are investigating the root cause. We have a range of yellow and red clusters.

Report: "Logit Cloud Logstash Incident - Failed healthcheck for some EU Region Logstash instances"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our engineers are actively investigating this issue and we will update in 30 minutes.

Report: "Some Logstash Instances are offline"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Only a small number of Logstash instances are still offline. We will update in 1 hour.

identified

Only a small number of Logstash instances are still offline. We will update in 1 hour.

identified

We are continuing to work on a fix for this issue.

identified

We have an issue with the underlying hosting provider and a range of servers are offline. We are working with them to resolve it.

Report: "Small number of Stacks are reporting failed healthcheck for Logstash"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Intermittent Networking Issue"

Last update
resolved

This incident has been resolved.

investigating

We are experiencing an intermittent networking issue and our engineers are actively investigating

Report: "Small number of stacks are reporting failed healthcheck for Logstash"

Last update
resolved

This incident has been resolved.

monitoring

Logstash instances have been restarted and are reporting healthy

investigating

We are investigating the issue

Report: "Intermittent Network Issues"

Last update
resolved

All services continue to report healthy. All networking is now restored and operating as we would expect.

monitoring

We no longer have any alerts for our networking. We are still working with our hosting provider to monitor the situation.

identified

The issue is with the underlying networking at our hosting provider; we are working with them to resolve it. We will provide an update in 30 minutes.

investigating

We are currently investigating this issue.

Report: "Alerts unavailable due to hosting provider issue"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been applied and we are monitoring the affected instances.

identified

We are still working with the hosting provider to resolve the issue. Further update in 2 hours

identified

We are still working with the hosting provider to resolve the issue. Further update in 1 hour

identified

The alerting infrastructure will be unavailable for some customers due to an issue with our hosting provider. We will keep you updated but expect it to be resolved within the next hour

Report: "Partial Logstash ingestion"

Last update
resolved

All instances are back to fully operational

monitoring

A fix has been implemented and we are monitoring the result

identified

We are currently investigating an issue with an underlying host for some Logstash instances and will post an update in the next hour.

identified

The issue has been identified and the underlying host is being rebooted

investigating

We are currently investigating an issue with an underlying host for some Logstash instances and will post an update in the next hour.