Sleuth

Is Sleuth Down Right Now? Check if there is a current outage ongoing.

Sleuth is currently Operational

Last checked from Sleuth's official status page

Historical record of incidents for Sleuth

Report: "Website down"

Last update
resolved

We experienced minor issues during the deploy of a new version, which caused our website and services to be unavailable for a period of 4 minutes.

Report: "Service unavailability"

Last update
resolved

We experienced service unavailability for a period of 7 minutes, caused by network configuration issues.

Report: "Reduced performance across the board"

Last update
resolved

Why, oh why is 'ANALYZE VERBOSE' not automatically run after a major Postgres upgrade? :( (We are all green across the board, and even faster, if you can believe it.)

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We identified a database issue related to the recent upgrade, and performance seems to be returning to normal. We will continue to monitor the situation.

investigating

We are currently investigating this issue.
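
The resolution note above points at missing planner statistics after a major Postgres upgrade. As a hedged illustration only (not Sleuth's actual runbook), the sketch below builds the `vacuumdb --all --analyze-in-stages` command that is commonly run after such an upgrade to regenerate statistics. The function names, the connection defaults, and the dry-run behaviour are assumptions made for this example.

```python
import shlex
import subprocess


def post_upgrade_analyze_command(host="localhost", port=5432, user="postgres"):
    """Build a vacuumdb invocation that regenerates planner statistics.

    The host, port, and user defaults are placeholders, not values from the incident.
    """
    return [
        "vacuumdb",
        "--all",                # every database in the cluster
        "--analyze-in-stages",  # quick rough statistics first, then finer passes
        "--host", host,
        "--port", str(port),
        "--username", user,
    ]


def run_post_upgrade_analyze(dry_run=True, **connection):
    """Return the command line; execute it only when dry_run is False."""
    command = post_upgrade_analyze_command(**connection)
    if not dry_run:
        # Raises CalledProcessError if vacuumdb exits with a non-zero status.
        subprocess.run(command, check=True)
    return shlex.join(command)


if __name__ == "__main__":
    print(run_post_upgrade_analyze())
```

Running the staged variant first gives the planner usable estimates quickly, which may shorten the window of degraded performance described in this incident.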

Report: "Delayed deploy and impact processing"

Last update
resolved

This incident has been resolved.

monitoring

The underlying infrastructure issue has been resolved and Sleuth is again fully operational. We're still actively monitoring the situation.

identified

The underlying AWS infrastructure problem was identified. To ensure data consistency we will delay processing of deploy and impact data until the issues are resolved.

investigating

We are currently experiencing a degradation of service due to infrastructure network issues. Please stand by as we investigate a possible resolution.

Report: "Performance Degraded"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating degraded performance on the Sleuth application.

Report: "Degraded Application Performance"

Last update
resolved

This incident has been resolved.

monitoring

The web application and deploy processing are no longer experiencing a performance degradation and we are actively monitoring them.

investigating

The Sleuth application is still experiencing delays in deploy processing and slower than normal load times. We have mitigated one root cause and are still investigating the continued performance issues.

investigating

We are currently investigating degraded performance of the Sleuth website and deploy processing.

Report: "Deploys incorrectly marked as rolled back"

Last update
resolved

We have corrected data for all impacted rollback deploys.

monitoring

We are working to correct the incorrectly marked rollback deploys.

identified

A bug in the system caused deploys processed between 2023-02-02 22:31 UTC and 2023-02-03 09:24 UTC to be incorrectly marked as rolled back. The team is working on correcting the affected deploys.

Report: "Immediate Session Expiration: All Users"

Last update
resolved

This morning, a security attack vector was discovered by a paid independent researcher. There is no evidence that this attack vector has been exploited. This has been addressed by our team and the vector has been closed at this time. Out of an abundance of caution, we have logged out all Sleuth users. The only action required from you is logging back in the next time you access Sleuth. Thank you for your understanding and continued trust. Please reach out to us if you have any questions regarding this matter.

Report: "We're experiencing a delay in actions processing"

Last update
resolved

Action execution has been running normally and has completely stabilized. The incident has been resolved.

monitoring

We have stabilized the execution of affected actions. We are continuing to monitor the performance, but you should be seeing normal behavior with action execution and Slack messages.

identified

We have identified an issue that is causing a slowdown in the following areas:
* Sleuth action evaluations
* Slack message delivery
* PR locking

You will experience delays in these areas. All actions are still being registered and will be executed at a later time. No deploy data is being lost.

Report: "Impact collection is delayed"

Last update
resolved

Impact collection has been running normally and has completely stabilized. The incident has been resolved.

monitoring

We have stabilized impact collection and it's running normally now.

investigating

We are currently investigating an issue causing us to collect impact at a delayed rate.

Report: "We're experiencing a delay in detecting deploys from CI/CD integrations"

Last update
resolved

Deploy detection via CI/CD integrations is now operating normally. We've implemented a workaround that allows us to mitigate this kind of issue moving forward, and the provider has also completed their maintenance.

identified

We've identified an issue with processing deploys from CI/CD integrations. One of our supported CI/CD integrations is undergoing maintenance, and this has revealed an issue with how we handle this situation. You will experience delayed deploy detection through CI/CD providers while we mitigate the issue. Webhook deploy processing is functioning normally but is also slightly delayed.

Report: "We're seeing site-wide slowdowns, we're investigating an increase in DB operations"

Last update
resolved

This incident has been resolved.

monitoring

We've identified the problem and remediated it. A few very long-running queries got stuck in our database, which had negative follow-on effects. We've killed the offending queries and removed the code that triggered them. We'll be following up with a change to prevent this kind of problem in the near future.

investigating

We are seeing slowdowns related to increased queries against our database. We are investigating the cause.

Report: "Impact tracking has been suspended for a short time"

Last update
resolved

We have reenabled impact collection for all.

identified

We are seeing some issues related to collecting impact. We've temporarily suspended impact collection. We will reenable within a few hours.

Report: "We are seeing some issues with our Redis instance"

Last update
resolved

We're back to fully operational.

monitoring

We identified the issue. Our background tasks were creating keys that weren't being cleaned up and eventually chewed up most of our storage. We've cleared those keys and are putting in place a way to stop this from happening moving forward. The service is now restored to normal and we are monitoring.

investigating

Users may see some sporadic errors and impact processing may be delayed. We are investigating the issue.
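
The root cause in this incident was keys written by background tasks and never cleaned up. One common guard, sketched here under assumptions and not taken from Sleuth's code, is to give every such key an expiry at write time. `store_task_result`, the `task-result:` key prefix, and the one-hour default are hypothetical. `FakeRedis` is a tiny in-memory stand-in so the example runs without a server; it mimics only the `set(name, value, ex=...)`, `get`, and `dbsize` calls of a Redis client.

```python
import time


class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical, illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, name, value, ex=None):
        # ex is a time-to-live in seconds; None means the key never expires.
        expires_at = self._clock() + ex if ex is not None else None
        self._data[name] = (value, expires_at)

    def _purge_expired(self):
        now = self._clock()
        expired = [k for k, (_, exp) in self._data.items()
                   if exp is not None and now >= exp]
        for key in expired:
            del self._data[key]

    def get(self, name):
        self._purge_expired()
        item = self._data.get(name)
        return None if item is None else item[0]

    def dbsize(self):
        self._purge_expired()
        return len(self._data)


def store_task_result(client, task_id, result, ttl_seconds=3600):
    """Write a background-task result that expires instead of living forever."""
    client.set(f"task-result:{task_id}", result, ex=ttl_seconds)


if __name__ == "__main__":
    fake_now = [0.0]
    client = FakeRedis(clock=lambda: fake_now[0])
    for task_id in range(3):
        store_task_result(client, task_id, "ok", ttl_seconds=60)
    print(client.dbsize())  # all three keys are still live
    fake_now[0] = 61.0      # move the fake clock past the TTL
    print(client.dbsize())  # expired keys no longer count toward storage
```

With a real client the same `store_task_result` call shape should apply, since only the `ex` argument of `set` is used.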

Report: "Website down"

Last update
resolved

Website unresponsive, investigating

Report: "We're seeing a slowdown on all provided services"

Last update
resolved

We identified the issue and the site is back to fully functional. Our primary DB was running low on disk IOPS credits. We've increased our RDS instance size and storage size, which has significantly increased our available IOPS.

identified

We've identified an issue with our DB such that we are seeing slower performance than usual. The site is still operational but is running at a reduced capacity.
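
To make the "IOPS credits" explanation concrete, here is a rough arithmetic sketch. It assumes the storage was a burstable gp2-style volume, which the incident does not state: baseline of 3 IOPS per GiB (minimum 100, maximum 16,000), burst up to 3,000 IOPS, and a full bucket of 5.4 million I/O credits. The volume sizes and demand figures are illustrative, not Sleuth's.

```python
FULL_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket (gp2 model, assumed)
BURST_IOPS = 3_000              # maximum burst rate (gp2 model, assumed)


def baseline_iops(size_gib):
    """Baseline rate: 3 IOPS per GiB, clamped to the 100..16,000 range."""
    return min(max(100, 3 * size_gib), 16_000)


def seconds_until_depleted(size_gib, demand_iops, credits=FULL_CREDIT_BUCKET):
    """Seconds a sustained demand can be served before credits run out.

    Returns None when demand does not exceed the baseline, because the
    bucket is then never drained.
    """
    baseline = baseline_iops(size_gib)
    demand = min(demand_iops, max(BURST_IOPS, baseline))
    if demand <= baseline:
        return None
    # Credits drain at the rate by which demand exceeds the baseline.
    return credits / (demand - baseline)


if __name__ == "__main__":
    print(seconds_until_depleted(100, 3_000))    # 2000.0 (about 33 minutes)
    print(seconds_until_depleted(1_000, 3_000))  # None: baseline already covers it
```

Under these assumptions, a larger volume raises the baseline rate, which is consistent with the fix described above: increasing storage size increases the IOPS available without relying on credits.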

Report: "We are experiencing downtime related to a bad migration"

Last update
resolved

We've successfully re-run an updated version of the migration and all systems are back to normal.

monitoring

We have cleared the problem. A migration locked our main deploys table, and killing the initiating process did not clear the lock. We've cleared the lock and the site has resumed its normal functioning. We're monitoring to make sure everything is completely back to normal.

investigating

We are continuing to investigate this issue.

investigating

We're investigating the cause now.

Report: "Site unavailable due to a bad deploy"

Last update
resolved

We've fully resolved the issue.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have rolled back the bad change and the site is available again. We are monitoring and will update this incident as we learn more.

investigating

We are investigating site issues that seem to be due to a bad code deploy. We will update as soon as we have more details.

Report: "Impact collection is delayed"

Last update
resolved

This incident has been resolved.

monitoring

We have rolled out a fix and impact collection has returned to normal. We're just monitoring a bit before we resolve this incident.

identified

We've identified an issue with our impact collection being delayed for new deploys. We're working on a fix and will update this incident as we progress.

Report: "We're having trouble with our background jobs"

Last update
resolved

The behavior of the application has returned to normal. The issue was a bad deploy where we changed the threading model of our background jobs. Some of the libraries we depend on were not supported in this new model. We have reverted to the old model for now.

identified

We are having trouble with our background processing. We're working on a fix.

Report: "We are seeing an issue collecting data from sources that are authenticated via an API key"

Last update
resolved

We've restored the integrations and all things are working again. If you see any issues please contact support.

identified

We have identified the issue and are working to restore normal operations. Integrations that are authenticated via API key are affected. This includes Jira, Sentry, Rollbar, Honeybadger, Datadog, and CircleCI. We will be able to restore full service once we've worked through the root cause.

Report: "Performing a major server upgrade"

Last update
resolved

Service is back to normal.

identified

We've run into issues with the upgrade and are in the process of rolling back.

identified

We're currently performing a major server upgrade. The service will be unavailable for about 15 minutes.