Historical record of incidents for Astro
Apr 18, 2025
Stuck worker pods resulting in tasks failing in the queued state
Last update: In some deployments, worker pods are stuck in the initialization state for an extended period. As a result, queued tasks cannot run and fail. This does not affect all deployments. We are investigating which deployments are affected and why.
Apr 17, 2025
Customers will not be able to create new Azure Clusters
Last update: A fix has been implemented and we are monitoring the results.
Cluster maintenance
Apr 7, 2025
Cost Breakdown Dashboard data update delayed
Last update: Data shown in the Organization Dashboards Cost Breakdown (for Enterprise customers) is delayed. As stated on the page itself, the latest data is as of April 4th. The processing to update this dashboard is currently ongoing, and we expect the data to be refreshed at approximately 16:00 UTC.
Mar 27, 2025
We are experiencing an issue with new task execution on AWS clusters
Last update: This incident has been resolved.
The team has identified the issue and is actively working on mitigating it. The issue affects new task execution on AWS clusters and access to the UI.
We are experiencing an issue with new task execution on AWS clusters and with accessing the UI. The team is actively investigating the issue.
Mar 24, 2025
Airflow UI and API Unavailable for a few customers
Last update: We have confirmed that no additional clusters were affected beyond those that were initially identified. This incident is fully resolved.
We have identified that this issue is specific to clusters that have custom networking, specifically route tables that require a carve-out for traffic back to Astro's control plane. The public IPs for the control plane were changed, and certain custom networking setups required that the IPs be updated accordingly. We have fixed this for all customers who reported this issue and are checking all clusters to determine if there are any others affected.
We are experiencing an issue in a few clusters, causing the Airflow UI and API to become unavailable. The team is actively investigating the issue.
Mar 23, 2025
Certain Astro clusters on AWS experiencing downtime
Last update: We have determined the event that caused this downtime and we are confident that it will not occur again. We will post a public RCA in the coming week.
We have applied a remediation for all of the affected clusters. No clusters are currently experiencing downtime. We are continuing to examine the root cause and will update again when we are confident that the issue will not recur.
We have identified a problem with scaling behavior that is causing a limited number of clusters to experience downtime. The message 'Internal Server Error' is displayed in the UI, preventing users from viewing DAGs and the Airflow UI. In some cases this also affects task execution. We are currently working on a fix.
Mar 22, 2025
'Internal Server Error' when attempting to access Airflow UI
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are currently investigating this issue. Tasks do not appear to be impacted.
Mar 18, 2025
Astro UI and Astro API not available.
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently experiencing an issue impacting the Astro Control Plane Cluster during routine maintenance activities.
Current Impact: Astro UI and Astro API are not available at this moment. However, Airflow tasks will continue to run.
Actions Being Taken: Our engineering team is actively monitoring and working to restore services promptly.
Next Update: We will provide further status updates as more information becomes available.
We apologize for the inconvenience and thank you for your patience.