Storj DCS

Storj DCS is currently Operational

Last checked from Storj DCS's official status page

Historical record of incidents for Storj DCS

Report: "Elevated error levels on Storj US1"

Last update
investigating

Some requests to Storj US1 are failing with error 500. We are investigating the issue.

Report: "Performance Impact"

Last update
postmortem

On May 22, 2025 from 00:12 to 02:54 UTC Storj experienced a service disruption causing elevated error rates and increased request latency that affected the US1 API, Gateway, Linksharing, and US1 - Select services. We discovered lock contention triggered by high-load delete operations. In the short term, we are artificially slowing down some of these intensive requests to leave needed headroom. We will eliminate the lock contention going forward.
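
The postmortem does not say how the slowdown is implemented. As an illustration only, one common way to leave headroom is a token bucket that caps how often expensive operations may start. The class name, parameters, and injectable clock below are hypothetical and are not taken from Storj's code.

```python
import time


class TokenBucket:
    """Allow about `rate` operations per second, with bursts up to `capacity`.

    Hypothetical sketch: not Storj's implementation.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Return True if the operation may run now, False if it should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller handling a heavy delete would check `try_acquire()` first and delay or reject the request when it returns `False`, which keeps capacity free for other traffic.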

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are investigating reports of performance impacts to one or more components of our system.

Report: "Performance Impact"

Last update
postmortem

On May 22, 2025 from 00:12 to 02:54 UTC Storj experienced a service disruption causing elevated error rates and increased request latency that affected the US1 API, Gateway, Linksharing, and US1 - Select services. We discovered lock contention triggered by high-load delete operations. In the short term, we are artificially slowing down some of these intensive requests to leave needed headroom. We will eliminate the lock contention going forward.

resolved

This incident has been resolved.

investigating

We are investigating reports of performance impacts to one or more components of our system.

Report: "Performance Impact"

Last update
postmortem
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

update

We are continuing to investigate this issue.

investigating

We are investigating reports of performance impacts to one or more components of our system.

Report: "Accidental Deletion of Paid User Accounts"

Last update
postmortem

## **Incident Summary**

Between March 18 and April 10, 2025, Storj initiated a routine cleanup of long-frozen user accounts as part of ongoing system maintenance. Unfortunately, due to a series of procedural oversights, this process resulted in the accidental deletion of about a dozen paid user accounts on the EU1 satellite that had upgraded their frozen account during the deletion procedure. This post provides a high-level overview of what happened, how we responded, and what we’re doing to prevent similar issues in the future.

## **What Happened**

* On March 18, a list of accounts was generated for deletion, targeting users whose accounts had been frozen (due to lack of payment) prior to January 1, 2025, and were still frozen as of March 18th, 2025.
* On April 10, an internal command was run to prepare a set of frozen EU1 accounts for deletion and ultimately delete them. This internal command was not part of our usual SOP for account deletion, which is automated with robust QA and testing.
* This command mistakenly included some active, paid users who had upgraded their frozen account during the two week period after the deletion lists were generated and did not include robust safeguards to catch this sort of error. The command directly set a new account status, regardless of its previous state.
* The issue was discovered on April 24 after a user reported account access issues, triggering a broader investigation.

## **Impact**

* 12 users total were identified as paid users who were marked for deletion while active.
* 11 users were mistakenly deleted from the EU1 satellite.
* No users on AP1 or US1 were affected.

## **Root Cause**

The deletion logic did not account for users who had recently upgraded to a paid tier after the deletion lists were generated but before the deletion process was executed (a two week period). We mistakenly assumed that accounts which had been frozen (i.e. data was inaccessible) and delinquent on payment for three months or more, were abandoned and therefore would not be upgraded. Our processes did include one or more reviews on all steps, including the list generation and the development and execution of the account deletion steps. The reviewers of the list generation and tool generation and command execution were independent and were not aware of the broader context and sequence of steps being taken in this case.

## **Response and Remediation**

* We immediately halted further deletions once the issue was identified.
* Three independent investigations and cross-references were run to definitively identify all affected accounts.
* Impacted users are being contacted directly, and we are offering account restoration support and compensation where possible.
* Internal tools are being updated to:
    * Prevent any future deletions of active paid users in the edge case that they upgrade their account while account cleanups are happening. The old internal tools are being removed and replaced with new versions that only allow explicit account status transitions from one state to another, rather than direct status update.
    * Add validation checks before executing bulk actions including end to end reviews of any future manual deletion processes. This will be in addition to our existing review, QA, and testing processes.
    * Require and include additional defensive safeguards on internal tools used for deletion including checks for other account upgrade information, including date filters beyond just status check. These checks are additional defense-in-depth measures that will catch any future mistakes on top of the above measures.

## **Our Apology**

We deeply regret the disruption this has caused our affected users. Ensuring data integrity and account reliability is our top priority, and this event fell short of our standards.

We appreciate the community’s patience and feedback as we work to make things right and build stronger safeguards moving forward.

The Storj Team
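
The postmortem contrasts a command that set an account status directly with tools that only allow explicit transitions from one state to another. A minimal sketch of that idea follows. The status names, the `ALLOWED` table, and the `transition` helper are all hypothetical and do not reflect Storj's actual schema or tooling.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for illustration only.
    ACTIVE = "active"
    FROZEN = "frozen"
    PENDING_DELETION = "pending_deletion"
    DELETED = "deleted"


# Allowed (from, to) pairs; any other change is rejected.
ALLOWED = {
    (Status.ACTIVE, Status.FROZEN),
    (Status.FROZEN, Status.ACTIVE),            # e.g. the user pays or upgrades
    (Status.FROZEN, Status.PENDING_DELETION),
    (Status.PENDING_DELETION, Status.ACTIVE),
    (Status.PENDING_DELETION, Status.DELETED),
}


class TransitionError(Exception):
    """Raised when a requested status change is not permitted."""


def transition(account, expected, new):
    """Move `account` from `expected` to `new`, or raise without changing it."""
    current = account["status"]
    if current != expected:
        # The account changed since the list was built (for example, it was
        # upgraded), so the stale instruction must not be applied.
        raise TransitionError(f"expected {expected.name}, found {current.name}")
    if (expected, new) not in ALLOWED:
        raise TransitionError(f"{expected.name} -> {new.name} is not allowed")
    account["status"] = new
    return account
```

Because the caller must state the status it expects, an account that moved from frozen back to active after a deletion list was generated fails the check instead of being overwritten.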

resolved

.

Report: "Elevated Error Rates"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are investigating reports of elevated error rates from our API services.

Report: "AP1 Linksharing Intermittent Connectivity Issues"

Last update
resolved

The incident has been resolved.

monitoring

A fix has been implemented, and we are monitoring the results.

identified

We have identified the issue and are implementing a fix.

investigating

We are investigating intermittent connectivity issues affecting the Linksharing service in the AP1 region.

Report: "US1 Elevated Error Rates"

Last update
resolved

We are investigating reports of elevated error rates from our US1 services.

Report: "Brief period of US Select Uploads interruption"

Last update
resolved

This incident has been resolved. The short-lived disruption in uploads was due to an entire zone losing connectivity for a brief period. US Select is designed with three-zone redundancy, and the service automatically excluded the affected zone. The brief upload failures occurred during the cutover. Reads were never impacted. We apologize to all who might have been affected by the disruption.

identified

The issue has been identified, and automated failover kept the disruption minimal.

investigating

Our monitoring detected a brief period of uploads failing for our US Select service only. Our engineering team is actively investigating.

Report: "Issue with the US metadata region"

Last update
resolved

The issue with the US metadata region has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are experiencing an issue with the US1 satellite (metadata region), which is intermittently inaccessible. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "EU1: Link Sharing service degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently experiencing issues with the Link Sharing service in the Europe region. Customers may notice increased request times and error rates. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "High load causing application instability"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have scaled up to meet the load and are monitoring behavior.

Report: "AP1: Elevated errors and degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "US1 Elevated Error Rates"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating reports of elevated error rates from our API services.

Report: "Issue with edge services connectivity affecting customers in Asia-Pacific"

Last update
resolved

This issue has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

investigating

We are experiencing an issue with edge services connectivity affecting customers in Asia-Pacific. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "US1: Elevated Errors and Degraded Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

Report: "US1: Elevated Errors and Degraded Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating reports of performance impacts to one or more components of our system in the US1 metadata region.

Report: "US1: Elevated errors and performance Impact"

Last update
resolved

An API pod started experiencing issues at 14:09 UTC. It was taken out of rotation at 14:25 UTC, resolving the incident.

Report: "US1: Elevated Errors and Degraded Performance"

Last update
resolved

This incident has been resolved.

investigating

The issue has been resolved.

investigating

We are investigating reports of performance impacts to one or more components of our system in the US1 metadata region.

Report: "EU1: Elevated Errors and Degraded Performance"

Last update
resolved

The previously identified issue has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating reports of performance impacts to one or more components of our system in the EU1 metadata region.

Report: "Intermittent listing failures"

Last update
resolved

Duration: December 19, 2024 - January 7, 2025

Customers experienced intermittent failures when attempting to list objects in storage buckets. Affected users received incomplete bucket listings, with some objects not appearing in their directory listings. This issue did not affect object storage or retrieval operations.

As of January 7, 2025, full object listing functionality has been restored and validated across all regions.

For customers who experienced this issue: We recommend performing a fresh listing of your buckets to confirm all objects are now visible. If you continue to observe any inconsistencies, please contact our support team.

We apologize for any disruption this may have caused to your operations. We are implementing additional monitoring and failsafes to prevent similar incidents in the future.

Report: "Intermittent EU1 upload errors"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

Investigating increased error rates.

investigating

We are currently investigating this issue.

Report: "AP1 Satellite"

Last update
resolved

The incident has been resolved.

monitoring

We have implemented a fix and are monitoring the situation.

identified

Today, at approximately 5:50 PM UTC, the AP1 satellite was affected by an issue that rendered all API keys inoperable. We have identified the issue and are working on a fix. Thank you for your patience.

Report: "Performance Impact"

Last update
resolved

We made a minor configuration change in our infrastructure that disabled a required setting, which impacted our Europe services. We identified the issue and restored the required setting, and the services resumed working normally.

Report: "Elevated error rates on us-select-1 uploads"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.

Report: "Issue with edge services' outage affecting customers in Europe and North America"

Last update
resolved

After extended monitoring, we are marking this incident as resolved. We again apologize to all who were affected by the disruption and appreciate your patience and support. The team will conduct a postmortem, and we will share more details as we investigate fully.

monitoring

All services have been restored for nearly all customers. We are currently addressing isolated cases with individual customers to ensure full resolution. We continue to monitor.

monitoring

We are deploying an additional fix. All services are stabilizing, but the situation is ongoing so we continue to monitor and fix as we see issues.

monitoring

US1 API services are operational. We are still monitoring US1 linksharing and gateway elevated error rates. EU1 Linksharing and EU1 gateway are also now seeing elevated error rates.

monitoring

US1 API has been restored. We are still monitoring linksharing and gateway, where error rates are decreasing. EU1 and AP1 continue to function normally.

monitoring

US1 API is now experiencing errors affecting all US1 services (gateway, linksharing, uplink). EU1 and AP1 are unaffected.

monitoring

All services are operational. We are continuing to monitor.

identified

Linksharing is being brought online in the U.S. and error rates are dropping. E.U. will be restored next.

identified

Gateway services (S3 API) are still online and functioning normally. Linksharing is being brought back online.

identified

E.U. Gateway services (S3 API) are restored. The team is focusing on bringing linksharing back online next.

identified

U.S. Gateway (S3 API) services are restored. E.U. Gateway services are seeing decreased error rates and will be restored soon.

identified

We are continuing to bring gateway services (S3 API) back online. Requests in the U.S. are seeing error rates decrease, and E.U. gateway services are coming back online.

identified

We've implemented a fix, and gateway services (S3 API) are gradually coming back online. Linksharing is still experiencing an issue.

identified

The issue has been identified and a fix is being implemented. The team will send another update in 15 minutes.

investigating

We are experiencing an edge services outage affecting customers in the Europe and North America regions. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "Issue with edge services' outage in Europe"

Last update
resolved

The edge services outage in the Europe region has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

monitoring

The emergency team has implemented a mitigation in which all users from the Europe region are rerouted to North America. We are actively working on bringing the Europe region back online and apologize to all who are affected by the disruption.

investigating

We are experiencing an edge services outage in the Europe region. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "Intermittent US1 select upload issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

Report: "Intermittent US1 select upload issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating reports of elevated error rates from our API services.

Report: "Increased Edge Services' Latency in Europe"

Last update
resolved

The previously identified issue affecting performance has been resolved.

monitoring

Due to a sudden load spike in the Europe region, customers might be experiencing increased latency while they are load-balanced to other regions. The on-call engineering team is monitoring this incident. We apologize for any inconvenience caused.

Report: "Temporary outage on AP1"

Last update
resolved

Services on AP1 were unreachable for a few minutes. Our team is investigating the root cause.

Report: "Issue with edge services in Europe having intermittent connection timeouts and slow response times"

Last update
resolved

The issue with edge services in Europe having intermittent connection timeouts and slow response times has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

investigating

We are experiencing an issue with European edge services capacity that is causing intermittent connection timeouts and slow response times. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "Issue with us-select-1"

Last update
resolved

The issue with us-select-1 has been resolved. We thank you for your patience while we worked on resolving the issue.

monitoring

We have applied a fix for the previously mentioned connection issues and are continuing to monitor the system.

monitoring

We are applying a fix for the previously mentioned connection issues.

identified

We have identified the cause of an issue affecting performance in one or more components of the system and are working on a fix.

investigating

We are experiencing an issue with us-select-1 connectivity. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "AP1 Edge connectivity issues"

Last update
resolved

The issue with AP1 Edge services has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

monitoring

We experienced connectivity issues to AP1 Edge (Gateway and Linksharing) between 11:33 and 11:48 UTC from US West and EMEA. We are monitoring the service and analyzing the root cause.

Report: "Issue with connectivity to edge services while using NTT as transport"

Last update
resolved

The issue with connectivity to edge services while using NTT as transport has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

identified

Customers using NTT as their transit provider might be experiencing issues while connecting to edge services. We are working with our upstream providers to resolve the issue. We apologize to all who are affected by the disruption.

Report: "US1 Outage"

Last update
resolved

This incident has been resolved.

monitoring

Service is completely restored. We are continuing to monitor.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified, and fixes are being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are investigating an outage of our API services.

Report: "Performance Impact"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are investigating reports of performance impacts to one or more components of our system.

Report: "Issue with edge services connectivity affecting customers in the Northeastern United States"

Last update
resolved

The issue with intermittent timeouts while trying to reach edge services (S3 Gateway, Link Sharing Service, and Auth Service) has been resolved for all affected users. We thank you for your patience while we worked on resolving the issue.

investigating

We are experiencing an issue with intermittent timeouts while trying to reach edge services (S3 Gateway, Link Sharing Service, and Auth Service), impacting customers in the Northeastern United States. Our engineering team is actively investigating. We apologize to all who are affected by the disruption.

Report: "US1 service degradation"

Last update
postmortem

# Overview

On May 20, 2024, at approximately 14:56 UTC, the Storj US1 satellite experienced a service disruption due to an issue that occurred during routine database maintenance. The incident affected only the US1 satellite, while the AP1 and EU1 satellites remained fully operational throughout the event.

# Root Cause

During a routine, manual database maintenance procedure, a transaction was inadvertently left open while other work was being performed. Open database transactions can cause issues because they hold locks on the affected data, preventing other operations from accessing or modifying that data until the transactions are committed or rolled back. This open transaction eventually caused write operations to fail, and as the situation progressed, read operations were also impacted, leading to a general service disruption.

# Impact

The incident affected customers and applications relying on the Storj US1 satellite for storage and retrieval operations. Uploads were the first to be impacted, followed by downloads. The AP1 and EU1 satellites were not affected by this incident and continued to operate normally.

# Timeline

* 14:55 UTC: Routine database maintenance began on the US1 satellite.
* 14:55 UTC: A transaction was inadvertently left open during the maintenance process.
* 14:56 UTC: Write operations began to fail due to the open transaction.
* 14:59 UTC: The on-call team received a page and started investigating the issue.
* 15:22 UTC: The open transaction was closed, and the team started the recovery process.
* 15:23 UTC: Operations were restored.

# Remediation and Prevention

To address the issue, the open transaction was identified and closed, allowing the satellite to recover and resume normal operation. To prevent similar incidents from occurring in the future, we will be implementing the following measures:

1. Reviewing and updating our database maintenance procedures and training to ensure that all transactions are properly closed before moving on to other tasks.
2. Implementing additional monitoring and alerting mechanisms to detect and notify the team of any open transactions that exceed a predetermined duration.
3. Conducting thorough post-mortem analysis to identify any other potential improvements to our processes and systems, including building tools that ease or eliminate the need for manual maintenance.

We apologize for any impact this service disruption may have caused our customers and users. We are committed to learning from this incident and continuing to improve the reliability and resilience of our platform.
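
The monitoring measure above, flagging open transactions that exceed a predetermined duration, can be sketched as a simple age check. The postmortem names neither the database nor the threshold, so the function below is a hypothetical illustration. It takes plain `(transaction_id, started_at)` pairs, however they are obtained, and the one-minute default is an assumption.

```python
from datetime import datetime, timedelta, timezone


def find_stale_transactions(transactions, now, max_age=timedelta(minutes=1)):
    """Return (transaction_id, age) for transactions open longer than `max_age`.

    `transactions` is an iterable of (transaction_id, started_at) pairs.
    How those pairs are read from the database is outside this sketch.
    """
    return [
        (txn_id, now - started_at)
        for txn_id, started_at in transactions
        if now - started_at > max_age
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration.
    now = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
    sample = [
        ("txn-1", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)),
        ("txn-2", datetime(2024, 1, 1, 12, 4, 45, tzinfo=timezone.utc)),
    ]
    for txn_id, age in find_stale_transactions(sample, now):
        print(f"ALERT: {txn_id} has been open for {age}")
```

A scheduled job could run this check and page the on-call team when the result is non-empty.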

resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Storj Select instability"

Last update
resolved

Intermittent failure of some uploads to Storj Select. Primary issue resolved within 30 minutes.

Report: "AP1 services - degraded performance"

Last update
resolved

The issue identified at approximately 2024-04-18T12:18:00Z has been resolved.

Report: "AP1 services - degraded performance"

Last update
resolved

The issue identified at approximately 2024-04-18T04:16:00Z has been resolved.

Report: "US Edge Services intermittent failures"

Last update
resolved

The issue identified at approximately 1:52 UTC has been resolved.

Report: "AP1 services - degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

Report: "Intermittent timeouts while connecting to edge services within proximity of Los Angeles Metropolitan Area"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating reports of performance impacts to one or more components of our system.

Report: "Investigating reports of slower uploads"

Last update
resolved

While we are continuing to monitor this issue, p90 and p95 upload metric values are returning to our expected ranges. We anticipate some upcoming changes will continue to positively affect these values in the coming weeks. For the interim, we have scaled backend resources accordingly to account for any spikes in load.

monitoring

Issues have been identified, and fixes have been scheduled. Updating the state to monitoring until the changes can be applied.

monitoring

Issues have been identified, and fixes have been scheduled. Updating the state to monitoring until the changes can be applied.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.

Report: "Increased satellite error rates especially with listing objects"

Last update
resolved

This incident has been resolved.

monitoring

The fix has been implemented and deployed and we are monitoring the results.

identified

We are continuing to work on this issue. Satellite operations have degraded performance.

identified

The issue has been identified and a fix is being deployed.

investigating

We are currently investigating the issue.

Report: "Small percentage of gateway requests with slow response rates"

Last update
resolved

This incident has been resolved.

monitoring

Performance has returned to normal. We are continuing to monitor for any issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating the issue.

Report: "Connectivity Outages to Edge Services in Eastern North America Between 10:35 UTC – 10:48 UTC"

Last update
resolved

Customers in the eastern part of North America might have experienced a series of intermittent connectivity issues, where clients would indicate that no network route is available to edge services, within the 10:35 UTC – 10:48 UTC window. We apologize for any inconvenience caused, and we thank you for your patience.

Report: "Connectivity Outages to Edge Services in Asia, North America and South America Between 20:00 UTC – 20:19 UTC"

Last update
resolved

Customers in Asia, North America and South America might have experienced a series of intermittent connectivity issues lasting up to 1 minute within the 20:00 UTC – 20:19 UTC window. We apologize for any inconvenience caused, and thank you for your patience.

Report: "Increased Edge Services’ Latency in North America"

Last update
resolved

This incident has been resolved. We thank you for your patience while we worked on resolving the issue.

monitoring

Users in the North American region should no longer be experiencing increased latency.

identified

Due to a sudden load spike in the North American region, customers might be experiencing increased latency while they are load-balanced to other regions. The on-call engineering team is monitoring this incident closely and is scaling on-demand capacity. We apologize for any inconvenience caused.

Report: "Unavailability of stats.storjshare.io"

Last update
resolved

This incident has been resolved. We thank you for your patience while we worked on resolving the issue.

monitoring

The on-call team has finished applying the fix, and we are currently waiting for DNS to propagate.

identified

Due to an invalid DNS configuration applied to stats.storjshare.io, it has become temporarily unavailable. The on-call team has identified the issue and is currently applying a fix. We apologize for any inconvenience caused.

investigating

We are currently investigating this issue.