DataRobot

Is DataRobot Down Right Now? Check if there is a current outage ongoing.

DataRobot is currently Operational

Last checked from DataRobot's official status page

Historical record of incidents for DataRobot

Report: "DataRobot Experiencing Stability Issues"

Last update
identified

External cloud providers are experiencing outages, which are impacting the DataRobot platform and services.

Report: "Managed EU AI Cloud Degraded Performance"

Last update
resolved

Managed EU AI Cloud experienced degraded performance between 7:00 and 7:30 AM UTC. Some users may have experienced intermittent connection interruptions. Please reach out to support@datarobot.com if you have any questions.

Report: "Deployment drift and accuracy charts are broken"

Last update
resolved

The issue is now resolved, and the drift charts are working as expected.

identified

The team is working on implementing the fix.

investigating

The root cause has been identified, and the team is working on the fix.

Report: "Codespaces are not starting"

Last update
resolved

The Engineering team has applied the fix and the issue has been resolved.

identified

Codespaces are not starting on EU MTS because of internal connection issues. The Engineering team is working on the fix.

Report: "Notebook sessions getting terminated."

Last update
resolved

Engineering has successfully applied the fix, and all notebooks are now running as expected across all MTS environments.

monitoring

Engineering has found the potential root cause and applied a fix. Engineering is now monitoring impacted services.

investigating

A subset of running notebook sessions is being terminated in all environments (US/EU/JP). Multiple customers are impacted. The Engineering team is investigating the root cause.

Report: "Creation of new trial users is occasionally failing"

Last update
resolved

The Engineering team has applied the fix to US and EU MTS, and trial user creation has been restored across all MTS environments.

identified

The Engineering team has applied the fix to Japan MTS, and trial user creation has been restored in that cluster. The team is now proceeding to apply the fix to US and EU MTS.

identified

The Engineering team has identified the root cause and is working on a fix.

Report: "Users with language settings other than English in profile unable to launch or access AI apps on MTS"

Last update
resolved

A fix has been applied to all production environments; no-code applications are operational in all languages.

identified

The ETA for deploying the hotfix to the Japan (JP) environment is approximately 4 hours.

identified

We are continuing to work on a fix for this issue.

identified

Engineering has identified the root cause of the issue and is currently preparing a fix.

investigating

Users whose profile language is set to anything other than English are unable to launch or access AI apps on MTS because the translation files are not loading correctly. As a workaround, users can switch to English. We are actively investigating the issue and working on a fix.

Report: "DataRobot STS degraded performance"

Last update
resolved

The Engineering team has identified the issue and is currently applying mitigation steps. If you continue to experience any issues, please contact DataRobot Support.

investigating

STS customers might observe degraded performance caused by regular maintenance performed by DataRobot Engineering. The Engineering team is working on mitigation.

Report: "AWS outage in ap-northeast-1 region"

Last update
resolved

AWS had an outage in the ap-northeast-1 region that has since been resolved. Outage window: 7:47 AM GMT to 9:30 AM GMT. Engineering is working to restore any affected customers on STS.

Report: "Code Assistant functionality in Notebooks is unstable on US MTS"

Last update
resolved

This incident has been resolved.

identified

Code Assistant API requests are currently taking longer than usual for US Production customers. This is a result of a partial outage on Azure. The Engineering team is monitoring the situation and will provide updates when functionality is restored.

Report: "Issues during VDB creation in Multi-Tenant SaaS environments"

Last update
resolved

The issue has been successfully mitigated in the MTS production environments. The incident is now contained, and VDB creation functionality has been fully restored.

investigating

Engineering has observed that VDB creation began failing across all MTS environments following today's production deployment. The team is currently investigating the root cause and actively working on a fix.

Report: "DataRobot SSO Issues"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

SSO login is broken, which may prevent users from logging in. The Engineering team has prepared a fix. The approximate ETA for the hotfix release across all MTS environments is 8 hours.

Report: "Issues while accessing Custom Applications on US Prod"

Last update
resolved

The affected users can access the Custom Applications now. The incident has been mitigated.

investigating

Some users on US SaaS are unable to access their Custom Applications. The Engineering team is currently investigating; we will keep you updated.

Report: "The Trial user provisioning is not working."

Last update
resolved

The Trial user provisioning is back on and working as expected in the DataRobot US production environment. The issue is resolved.

investigating

The Trial user provisioning is not working on the DataRobot US production environment. The engineering team is currently investigating.

Report: "Issues with creating Custom Models from LLM Playground"

Last update
resolved

This incident has been resolved.

monitoring

The issue is mitigated, and users are able to create custom models again. The engineering team will continue to monitor the environment while preparing a permanent fix to contain the incident. The current estimate is about 2 hours.

investigating

Japan MTS cluster is experiencing issues with creating custom models from LLM Playground. The engineering team is investigating.

Report: "Delay updating the Deployment Monitoring Information."

Last update
resolved

This incident has been contained.

monitoring

The unprocessed message backlog continues to catch up. The engineering team is closely monitoring the process. We will provide an update once the processing of delayed messages is caught up.

monitoring

Our team has identified the root cause and implemented the fix. Service Health and Accuracy no longer have a delay and are operating normally. The delay in Data Drift monitoring is improving; however, the Engineering team expects it will take several hours to fully recover as the system processes the accumulated data. The team can confirm there has been no data loss during this time. The team is currently monitoring the situation.

identified

Our team has identified an issue with our Deployment Monitoring Information. This is a processing delay, and no data loss is expected. Our team is currently investigating the root cause and is working on a fix. The following services are currently impacted: Service Health, Data Drift, and Accuracy monitoring.

Report: "The custom workload is failing to start"

Last update
resolved

Engineering has identified the root cause and fixed the issue. The Custom Workloads are starting normally on the US MTSaaS environment, and the incident is resolved.

investigating

The Custom Workload is failing to start in the US MTSaaS environment. Existing running Custom Apps aren't affected. The Engineering team is investigating the root cause.

Report: "Notebooks are not starting in the US MTS environment"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently experiencing an issue with Notebook services which is preventing Notebooks from starting in the US MTS environment. Our engineering team is actively working on resolving the problem.

Report: "A Network Policy service issue is causing an interruption"

Last update
resolved

The problem related to Network Policy has been resolved. All services are operational.

monitoring

Engineering has identified the problem and mitigation has been applied. Engineering is currently monitoring the progress.

investigating

A Network Policy service issue is causing an interruption in functionality for some components on app.datarobot.com. Engineering is actively working on a resolution.

Report: "Custom model services issue"

Last update
resolved

Engineering has resolved the issue. This incident is now contained.

identified

We are continuing to work on a fix for this issue.

identified

Engineering has identified the root cause and is currently working on fixing the issue.

investigating

We are observing an issue with the Custom model services on the US production environment causing Custom Models, Custom Jobs and Custom Apps to stop working. Engineering is currently investigating the issue.

Report: "Notification policies, health checks, and scheduled custom jobs are currently not updating"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Due to an ongoing incident, notification policies, health checks, and scheduled custom jobs are currently not updating. Deployments themselves remain unaffected. The DataRobot team is working on resolution.

Report: "Search engine for documentation is not working for https://docs.datarobot.com/"

Last update
resolved

Engineering has implemented the fix, and the documentation search engine at https://docs.datarobot.com/ is working as expected. This incident is now contained.

identified

The DataRobot engineering team has identified the issue and is working on a permanent fix.

investigating

The documentation search engine at https://docs.datarobot.com/ is not working, but the documentation itself is still available. The DataRobot team is working on resolving the issue. We apologise for the inconvenience caused.

Report: "Issue with Custom Models and Custom Applications."

Last update
resolved

Custom Models and Custom Applications are back to Operational state. No further updates are expected.

monitoring

Our team has implemented a fix and the next update will be shared in an hour.

identified

Our team has noticed some issues with custom models and custom applications after the recent production deployment on US MTSaaS. The Engineering team is working on mitigation.

Report: "Sending LLM blueprint to Model Workshop doesn't work for LLM Blueprints without a VDB"

Last update
resolved

This incident has been resolved.

identified

Starting at 10 AM UTC on the morning of September 16, sending an LLM blueprint to the Model Workshop fails if no Vector Database is specified. The fix is ready, and the engineering team is currently verifying it.

Report: "Connection Issues with DataRobot Notebooks"

Last update
resolved

Engineering has implemented a fix, and notebooks on the DataRobot US, EU, and JP clouds are functioning as expected. The incident is marked as contained.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently experiencing issues with Notebooks. Users on the DataRobot US, EU, and JP clouds are experiencing issues with both new and existing Notebooks. Engineering is working on identifying the root cause.

Report: "Predictions for custom models and custom apps are unavailable for users on the US cloud"

Last update
resolved

Custom Models and Custom Applications are back to Operational state. No further updates are expected.

investigating

Predictions for custom models and custom apps are unavailable for users on the US cloud. The Engineering team is investigating the issue. The next update will be in 30 minutes.

Report: "Free Trial users on US and EU SaaS environments have issues with real-time predictions on text generation deployment features"

Last update
resolved

This incident has been resolved.

identified

We are aware that a subset of free trial users on US and EU SaaS environments may experience some issues with real-time predictions on text generation deployment features. Engineering is working on a resolution.

Report: "Issue with a subset of email notifications on US and EU production."

Last update
resolved

Users may have experienced problems with receiving email notifications related to deployed models.

Report: "Users on US Production may experience slower allocation of Workers than normal"

Last update
resolved

The issue has been mitigated.

investigating

DataRobot Engineering has identified an issue where certain users on US Production may experience slower allocation of Workers than normal. We are investigating potential causes.

Report: "Issues Occurring with Project Creation and Model Deployments"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and engineering is currently monitoring the results.

investigating

Users are experiencing issues when creating projects or deploying models in DataRobot. Our engineering team is currently investigating the issue.

Report: "Customers Experiencing Errors with Custom Modeling and Notebooks"

Last update
resolved

Engineering has applied a fix in the managed EU AI cloud which resolved the issue. The issue is contained.

identified

Engineering has identified an issue with custom modeling and notebooks for customers in the Managed EU AI Cloud. Engineering has found a potential fix and is currently working on deploying that fix to production.

Report: "5xx errors from the prediction servers on US production"

Last update
resolved

The engineering team has identified the root cause and rolled back the changes. That has mitigated the issue and all the services are working as expected. The incident is marked as contained.

investigating

Multiple services including dedicated predictions are experiencing degraded performance. Engineering is investigating.

Report: "Project and Notebook Creation Down"

Last update
resolved

Our Engineering team noticed that Project and Notebook creation were temporarily down. Engineering applied a fix, and Project and Notebook creation are working as expected.

Report: "Increased error rate on US production"

Last update
resolved

The issue with the increased error rate on US production has been resolved after the token certificates were updated and the service was restarted.

investigating

DataRobot is experiencing an increased error rate on US production. Engineering is investigating the root cause.

Report: "Predictions monitoring getting dropped intermittently."

Last update
resolved

The fix was deployed to the US production environment and has been verified. The incident is contained.

identified

The Engineering team has identified the root cause of the issue with predictions monitoring getting dropped intermittently. A fix is being worked on to resolve the issue. As the fix deploys this week, the Engineering team will continue to monitor and ensure no further issues arise.

investigating

Engineering has identified an issue with predictions monitoring getting dropped intermittently. Predictions themselves are not impacted by this issue. Engineering is currently investigating the root cause.

Report: "Customer experiencing 503 error on US prod when launching Custom apps."

Last update
resolved

The incident has been resolved.

investigating

Users are unable to launch custom apps on US prod. The Engineering team is investigating the issue.

Report: "Users cannot start Notebook session since 05:00 UTC in US SaaS environment"

Last update
resolved

The Engineering team has been monitoring the fix for the last 3 days and the issue has been contained. No further updates are expected.

monitoring

Engineering has applied a fix, and notebook services are operational again. Engineering is continuing to monitor.

investigating

Notebooks are not able to start on US Prod. Engineering is currently investigating.

Report: "DataRobot STS - Single Tenant SaaS (Managed by DataRobot)"

Last update
resolved

This incident has been resolved.

identified

The Engineering team has identified the issues, and a fix is under way.

investigating

The DataRobot Single Tenant SaaS platform is unable to ingest data through AI Catalog. Our engineering team is investigating the issue.

Report: "Users cannot start Notebook session since 11:30 CET in US SaaS environment"

Last update
resolved

The issue has been mitigated as of 12:45 CET. Notebooks are working as expected.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

Engineers are working on fixing the issue.

Report: "Delay in Deployment Alerts"

Last update
resolved

This incident has been resolved.

monitoring

Engineering has applied a mitigation to stabilize the system. The issue is being monitored, and a permanent fix is under way.

identified

Engineering has noticed delays in health notifications. The issue has been identified, and a fix is under way. On DataRobot US Managed AI Cloud, scheduled health check notifications are not working, while real-time notifications are not affected.

investigating

We are continuing to investigate this issue.

investigating

Some DataRobot users on US SaaS may see a delay in deployment alerts. Engineering is investigating.

Report: "Users are unable to sign-up for trial"

Last update
resolved

Users are now able to sign up for a trial in both the US and EU environments. Engineering has resolved the issue.

investigating

Users are unable to sign up for a trial in both the EU and US environments. Engineering is investigating.

Report: "Issue with Notebooks in the DataRobot EU cluster (app.eu.datarobot.com)"

Last update
resolved

This incident has now been resolved.

monitoring

Notebook functionality has been restored on the EU cluster. Engineering is currently monitoring this issue.

investigating

We are continuing to investigate this issue.

investigating

The creation of new Notebooks on the EU cluster is currently down. Engineering is currently investigating.

Report: "Issue with creating new AI applications on DataRobot US cluster (app.datarobot.com)"

Last update
resolved

The issue is contained. Please contact support@datarobot.com if you have any questions.

investigating

The engineering team is working on fixing the issue. No impact is expected for the existing applications.

Report: "Network Latency/Timeout noticed in Kubeworkers"

Last update
resolved

AWS has reported that the operational issue in the US-EAST-1 (use1-az1) Region is fully resolved. DataRobot has monitored its services and confirmed that everything is up and operational. The issue is contained.

monitoring

AWS has operational issues in the US-EAST-1 (use1-az1) Region. This could impact some DataRobot services in that region. Our team will continue to monitor for any impact on DataRobot services.

Report: "Users unable to start notebooks in the EU Production Environment"

Last update
resolved

Engineering has applied a fix and the problem of starting notebooks in the EU Production environment has been contained.

investigating

There is an incident preventing notebooks from starting in the EU Production environment. Our Engineering team is currently investigating.

Report: "Deployment report generation failure in US and EU Prod"

Last update
resolved

This incident has been resolved.

identified

The engineering team is still working on mitigating the broken deployment reports on production. The new ETA for the resolution of the issue is Thursday, 13:00 UTC, 31st of August.

identified

Some customers may experience unexpected behavior when generating deployment reports. This issue is limited to report generation only; all other DataRobot services, including predictions and model monitoring, are functioning normally. A fix will be provided on Monday evening (UTC).

identified

We are continuing to work on a fix for this issue.

identified

Some customers may experience unexpected behavior when generating deployment reports. This issue is limited to report generation only; all other DataRobot services, including predictions and model monitoring, are functioning normally. Engineering is currently working on a fix for the issue.

Report: "A delay in prediction monitoring was observed on the EU SaaS cluster"

Last update
resolved

From 10:50 UTC to 13:27 UTC on 13 July 2023, a delay in prediction monitoring was observed for EU SaaS customers. The issue has been identified and corrected. All systems are functioning normally at this time.

Report: "Partial outage is reported with DataRobot"

Last update
resolved

This incident has been resolved. All affected DataRobot Services are functional.

monitoring

Our team has implemented a fix, and we are actively monitoring the status.

identified

The DataRobot SaaS platform is experiencing import errors on Prediction requests. Our team has identified the issue and is working on a fix. Please expect the next update within 60 minutes, or reach out to support@datarobot.com if there are any questions.

Report: "Partial outage is reported with DataRobot"

Last update
resolved

This incident has been resolved. All affected DataRobot Services are functional.

investigating

Certain services within DataRobot are down, including but not limited to Notebooks and Modeling Jobs. Engineering is investigating.

Report: "Users might have experienced delay in data upload job processing on US Production."

Last update
resolved

An issue delayed the upload jobs from May 10th 2:34 AM UTC to May 10th 3:35 AM UTC. Users who tried uploading the data might have experienced a delay. The incident has been contained.

Report: "DataRobot Notebooks Unavailable."

Last update
resolved

This incident has been resolved.

monitoring

Engineering has identified the root cause and applied a mitigation. We are currently monitoring.

investigating

There is an incident affecting all usage of notebooks in our US Production environment. Our Engineering team is currently investigating.

Report: "Zepl Is Experiencing Partial Outage"

Last update
resolved

Engineering has applied the additional fixes to the database, and this incident has been resolved. Zepl services are back to normal.

monitoring

The fix that engineering applied to Zepl did not resolve the 404 errors. We have applied additional modifications to the Zepl backend and are monitoring the stability of the application.

monitoring

We have applied a fix to the Zepl platform backend. We are currently monitoring.

investigating

Zepl is experiencing intermittently degraded application performance and 404 Page Not Found errors. Our Engineering team is currently investigating.