Scale AI

Is Scale AI Down Right Now? Check whether an outage is currently ongoing.

Scale AI is currently Operational

Last checked from Scale AI's official status page

Historical record of incidents for Scale AI

Report: "Degraded performance"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Site Outage"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Nucleus Degraded Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Nucleus and Data Engine are currently experiencing degraded performance. Pages and queries will take longer to respond than usual.

Report: "Cloudflare Outage"

Last update
resolved

This incident has been resolved.

investigating

Confirmed that the issue has been resolved.

investigating

Cloudflare has implemented a fix and seems to be resolving any certificate issues.

investigating

We are currently investigating this issue.

Report: "Donovan Web Application Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an application issue. We will post updates shortly.

Report: "Site Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue; some operations are experiencing high latency or are failing.

Report: "Elevated Error Rate"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A temporary fix has been implemented and we are monitoring the status of the platform.

identified

We are continuing to work on mitigating the issue. Our website is experiencing a technical issue with the hosting provider. Current ETA for resolution is ~2 hours, or around 6:30 PT.

identified

We are seeing elevated error rates with our cloud provider. We are currently investigating the issue.

investigating

We are currently investigating this issue.

Report: "Spellbook Application Error"

Last update
resolved

The Spellbook application is fully back up and operational.

identified

The fix for the deployment is in progress now.

identified

We have identified a platform issue impacting all Spellbook applications. Applications will not appear in the UI or be available over the API. We are deploying a fix for this issue.

Report: "Catalog Forge Platform Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "ML Pod Startup Failures"

Last update
resolved

There was an outage with a service that manages the metadata for models from 7:06am PT to 7:50am PT. It prevented model services from starting up new pods. ML services that received security updates, deployed a new version, or attempted to scale up were impacted. There was an additional 8 to 12 minutes of service degradation after restoring the impacted ML services to catch up on missed requests.

Report: "Site Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Elevated latencies"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Elevated API Error Rates"

Last update
resolved

This incident has been resolved.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "Catalog Forge slow or unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Degraded Site Performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Many users may still be experiencing slow application loading times and reduced responsiveness. We have identified that our database has been experiencing internal issues and are continuing to work to resolve them. We have taken steps to mitigate the impact and have tuned our database for the current situation.

investigating

We are currently investigating this issue.

Report: "ML Inference Failures"

Last update
resolved

Networking issue with ML infra.

Report: "Platform slow or unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

investigating

We are seeing elevated error rates that may indicate a slow or unresponsive platform. We are currently investigating the issue.

Report: "Platform slow or unresponsive"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Degraded performance"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue with our asynchronous jobs.

Report: "Site Outage"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.

Report: "Site Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are starting to see signs of recovery and are monitoring for issues.

investigating

We are currently investigating an issue; jobs may be affected across all build types.

Report: "Error processing signed S3 attachment URLs"

Last update
resolved

A fix has been implemented and we are monitoring the results.

Report: "ML Infrastructure Latency Issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Site incident"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Platform slow or unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

Scale AI is now operational.

investigating

We are seeing elevated error rates that may indicate a slow or unresponsive platform. We are currently investigating the issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Increased site latency & error rate"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are actively working with our underlying service providers to resolve the issues impacting Scale's service.

investigating

We are seeing elevated error rates that may indicate a slow or unresponsive platform. We are currently investigating the issue and its potential impact.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are continuing to work with our underlying service providers to resolve the issues impacting Scale's service.

identified

We are actively working with our underlying service providers to resolve the issues impacting Scale's service.

investigating

We are continuing to investigate the issue.

investigating

We are seeing elevated error rates that may indicate a slow or unresponsive platform. We are currently investigating the issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing elevated error rates that may indicate a slow or unresponsive platform. We are currently investigating the issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

The site is unstable again and we are working to implement a fix.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Platform availability issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Platform availability issues"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Platform Outage"

Last update
resolved

We have identified an issue related to database index performance and have finished deploying a fix. The platform is up and running with normal error rates.

investigating

The platform is experiencing a degraded service level and we are continuing to investigate. API and Website error rates are elevated.

monitoring

The platform has recovered. We are actively monitoring and investigating the root cause of the issue.

investigating

We are currently experiencing a platform outage. All services are impacted, including the Scale Dashboard, API, and associated services. We are currently reviewing this issue.

Report: "Elevated error rates & increased latency"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Elevated error rates"

Last update
resolved

This incident has been resolved.

identified

We are experiencing elevated error rates across the platform.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Nucleus Database Offline"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

The upgrade appears stuck. We're restoring the database.

Report: "Intermittent site errors"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

Report: "Site down"

Last update
resolved

This incident was resolved at 4:56 PM PST.

investigating

We are investigating the issue.

investigating

The site became inaccessible at around 3:55 PM PST.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue. Instability started around 7:15 PM PST.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

We have identified the cause, and the error rate returned to normal around 9:40 PM. We will keep monitoring the issue.

investigating

We have been seeing significantly higher latencies since around 9 PM and are investigating.

Report: "Platform Slow or Unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.