DataOps.live

Is DataOps.live Down Right Now? Discover if there is an ongoing service outage.

DataOps.live is currently Operational

Last checked from DataOps.live's official status page

Historical record of incidents for DataOps.live

Report: "Platform Outage"

Last update
resolved

During an automated database patch, requests to the platform failed between 08:06 BST and 08:08 BST on Saturday 21st September 2024. This may have interrupted users accessing the platform during that time. Additionally, some pipeline jobs failed during this window because of failed requests to the platform. The system recovered automatically. We have identified the cause of this issue and have taken action to ensure this particular situation does not happen again. We sincerely apologise for any impact this had.

Report: "Issue with Login"

Last update
resolved

We are pleased to inform you that all systems are operating normally, and users are able to log in without any issues. As a result, we consider this incident resolved, and we will be closing it. The login issue appeared to affect only platform users without an active session. Users with an active session, as well as scheduled pipelines, were unaffected. Thank you for your understanding and cooperation throughout this process. Please reach out if you have any further concerns.

monitoring

We have observed successful logins, indicating that login functionality appears to be restored. However, our team will continue to closely monitor the situation for any potential issues. Please let us know if you experience any further difficulties. Thank you for your patience.

investigating

We are aware that some users may currently be unable to log in due to an ongoing issue with our third-party authorisation provider. We are monitoring the provider's status updates on the resolution of the problem and are aiming to restore login functionality as soon as possible. Thank you for your patience.

Report: "Degraded Performance"

Last update
resolved

From 02:20 BST to 03:24 BST and from 04:31 BST to 04:59 BST, the platform experienced periods of increased response time, resulting in degraded performance. Some requests failed between 03:05 BST and 03:08 BST, which may have caused interruptions for a minority of users accessing the platform during that time. The system recovered automatically, and all pipeline metrics remained at normal levels. We have identified and isolated the cause of this issue and will continue to monitor the situation.

Report: "Issue with login"

Last update
resolved

We are pleased to inform you that all systems are operating normally, and users are able to log in without any issues. As a result, we consider this incident resolved, and we will be closing it. Thank you for your understanding and cooperation throughout this process. Please reach out if you have any further concerns.

monitoring

We have observed successful logins, indicating that login functionality appears to be restored. However, our team will continue to closely monitor the situation for any potential issues. Please let us know if you experience any further difficulties. Thank you for your patience.

identified

We have observed that some Auth0 services are starting to return. Our team is in direct contact with Auth0, who are actively working to address and resolve the underlying cause of the problem. We appreciate your patience, and we will provide further updates as the situation progresses.

investigating

We are continuing our investigation into the login issues caused by Auth0.com. In the meantime, we can confirm that scheduled pipelines are functioning as expected. We will provide further updates regarding the login issue as soon as more information becomes available. Thank you for your understanding.

investigating

We are continuing to investigate this issue.

investigating

We are aware that users are currently unable to log in due to an ongoing issue with auth0.com. Our team is communicating with Auth0 to resolve the problem and restore login functionality as soon as possible. Thank you for your patience.

Report: "Docker Hub issue"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

Docker have resolved their issues.

investigating

We are aware of an issue with Docker which may affect pipelines trying to pull Docker images. We are monitoring the situation and awaiting updates from Docker on when they have resolved their issue.

Report: "Partial outage"

Last update
resolved

We have identified that the Snowflake Bundle 2023_05, automatically enabled on 22nd August, included a change in behaviour that causes grant failures to stop a pipeline within DataOps.live. We believe we have found a permanent fix, which we are currently testing and plan to release as a hotfix on Thursday 24th August. We will also provide instructions on how to enable the Snowflake Bundle afterwards. As a workaround, using the ACCOUNTADMIN role within your Snowflake tenant, you can disable the Snowflake Bundle: select SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2023_05'); After this, the affected pipelines will start to run again. Once the hotfix is released, you can re-enable the Snowflake Bundle.
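
The workaround in this update can be sketched as the sequence below, assuming a Snowflake session in which the ACCOUNTADMIN role is available to you. Only the disable call appears in the original update. The status check and the re-enable call are illustrative additions that use Snowflake's SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS and SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE system functions, and they should be checked against the instructions DataOps.live provides with the hotfix.

```sql
-- Assumption: your user has been granted the ACCOUNTADMIN role.
USE ROLE ACCOUNTADMIN;

-- Illustrative addition: check whether bundle 2023_05 is currently enabled.
SELECT SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS('2023_05');

-- Workaround from the update: disable the bundle so that grant failures
-- no longer stop the pipeline.
SELECT SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2023_05');

-- Illustrative addition: re-enable the bundle only after the hotfix
-- has been released and applied.
SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2023_05');
```

Running the status check again after each call is a simple way to confirm that the change took effect in your account.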

monitoring

We are continuing to monitor for any further issues.

monitoring

We are working on an update to handle the 2023_05 Snowflake behaviour change.

identified

We believe that a Snowflake behaviour change introduced by the Snowflake Bundle 2023_05 is causing the issue. This change causes grant failures to stop the pipeline, whereas previously the pipeline would continue. Learn more about the bundle from the Snowflake release notes: https://docs.snowflake.com/en/release-notes/bcr-bundles/2023_05_bundle As a workaround, using the ACCOUNTADMIN role within your Snowflake tenant, run: select SYSTEM$DISABLE_BEHAVIOR_CHANGE_BUNDLE('2023_05'); We are continuing to investigate and are working on an update to handle this behaviour change.

investigating

We are aware of some DataOps.live pipelines failing and are investigating.

Report: "Issue pulling orchestrator images."

Last update
resolved

Orchestrator job image pulls have returned to normal, and we are no longer seeing elevated failure rates when pulling Docker images.

monitoring

We have observed successful orchestrator image pulls and believe the issue has been resolved at hub.docker.com. We will keep monitoring the situation.

investigating

We've observed a rise in the job failure rate when pulling Docker images from the repository and are actively monitoring the situation. We suspect that the issue lies with hub.docker.com, a third-party platform.

Report: "Delay in Pipeline Processing"

Last update
resolved

We experienced an issue which delayed pipeline jobs from starting. This issue has been resolved, all pipelines have started, and we are continuing to monitor the situation.

Report: "Delay in Pipeline Processing"

Last update
resolved

We experienced an issue which delayed pipeline jobs from starting. This issue has been resolved, all pipelines have started, and we are continuing to monitor the situation.

Report: "SOLE Configuration Validation Issue"

Last update
resolved

This incident has been resolved.

monitoring

Rollback complete.

identified

This morning, a DataOps Orchestrator release tightened configuration validations within SOLE. This caused jobs to fail where invalid SOLE configurations existed. We are rolling this upgrade back and will re-release it later with these validations reported as warnings rather than errors.

Report: "DataOps Platform Web Interface - Intermittent Issues"

Last update
resolved

We experienced a short intermittent outage on app.dataops.live between 06:34 UTC and 06:54 UTC that may have affected a small number of users.

Report: "DataOps Platform Outage"

Last update
resolved

We experienced an outage on app.dataops.live between 01:16 and 01:53 UTC and between 02:36 and 04:54 UTC.