Historical record of incidents for Fly.io
Report: "IPv6 Connectivity Loss in GDL"
Last update: We have experienced a temporary loss in IPv6 connectivity in Guadalajara, Mexico (GDL) and are currently working with our upstream providers to resolve the issue. IPv4 connectivity is currently unaffected.
Report: "Network issues in LHR"
Last update: We are observing network issues in LHR region. Apps continue to run, but may have network issues, and deploying/updating apps may fail.
Report: "Network maintenance in GRU (São Paulo, Brazil)"
Last update: An upstream provider is performing network maintenance in GRU, from 2025-05-30 at 12:00 UTC (9:00am BRT local time) to 14:00 UTC (11:00am BRT local time). You may experience a short total loss of connectivity for up to 5 minutes within the scheduled maintenance window.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Report: "Network maintenance in LHR"
Last update: Scheduled maintenance is currently in progress. We will provide updates as necessary.
An upstream provider is performing network maintenance on a subset of our servers in LHR, from 2025-05-29 at 23:00 UTC to 2025-05-30 at 03:00 UTC. You may experience network connectivity disruptions for some time within the maintenance window.
Report: "Network maintenance in CDG (Paris, France)"
Last update: Scheduled maintenance is currently in progress. We will provide updates as necessary.
An upstream provider is performing network maintenance in CDG, from 2025-05-29 at 22:00 UTC (2025-05-30 12:00am CEST local time) to 2025-05-30 at 02:00 UTC (4:00am CEST local time). You may experience a short total loss of connectivity for up to 5 minutes within the scheduled maintenance window.
Report: "Burst of network related alerts from some servers in LHR"
Last update: This incident has been resolved.
Alerts appear to be related to a network blip caused by an upstream provider's router failover, with no ongoing disruption.
We've received a flood of networking related alerts from a subset of servers running in LHR. We are not yet sure of the impact on customer workloads.
Report: "Burst of network related alerts from some servers in LHR"
Last updateAlerts appear to be related to a network blip caused by an upstream provider's router failover, with no ongoing disruption.
We've received a flood of networking related alerts from a subset of servers running in LHR. We are not yet sure of the impact on customer workloads.
Report: "WireGuard gateway issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating issues with WireGuard over websockets (the default connection mode in flyctl). `flyctl ssh`, `flyctl proxy`, `flyctl logs` commands as well as others may fail. If you are on a network that allows UDP connections, running `fly wg websockets disable` may fix the issue as a workaround.
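For reference, a minimal sketch of the workaround that was suggested, assuming your network allows outbound UDP; the `enable` form used to restore the default is assumed to mirror the `disable` subcommand mentioned above, and the app name is a placeholder:

```
# Switch flyctl's WireGuard tunnel from websockets to plain UDP (workaround)
fly wg websockets disable

# Verify the tunnel works, for example by opening an SSH session to an app
fly ssh console -a <your-app>

# Switch back to the default websockets mode once it is no longer needed
fly wg websockets enable
```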
Report: "WireGuard gateway issues"
Last updateWe are investigating issues with WireGuard over websockets (the default connection mode in flyctl).`flyctl ssh`, `flyctl proxy`, `flyctl logs` commands as well as others may fail.If you are on a network that allows UDP connections, running `fly wg websockets disable` may fix the issue as a workaround.
Report: "Production database is being migrated"
Last update: This incident has been resolved.
The issue has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented, and we're monitoring the results. API performance should be back to normal, although app creates may still be degraded.
We're continuing to work on fully restoring the Machines API. API calls are still taking longer than usual but we're no longer seeing failures.
We are continuing to work on a fix for this issue.
We identified an issue while migrating our production traffic, and have applied a fix to restore dashboard functionality. We're continuing to work on fully restoring the Machines API.
The Fly Dashboard is also affected and may prevent certain dashboard functionality, like the support portal. If you're on a paid support plan, please submit tickets using your support email address in the meantime.
We’re migrating production traffic over to a new production database. GraphQL queries, including flyctl commands, may be slow.
Report: "Production database is being migrated"
Last updateWe’re migrating production traffic over to a new production database. GraphQL queries, including flyctl commands, may be slow.
Report: "Network maintenance in AMS (Amsterdam, The Netherlands)"
Last update: Scheduled maintenance is currently in progress. We will provide updates as necessary.
An upstream provider is performing network maintenance in AMS, from 2025-05-19 22:30 UTC (00:30 local time) to 2025-05-20 04:00 UTC (06:00 local time). No operational impact is expected.
Report: "Network maintenance in BOG (Bogotá, Colombia)"
Last update: An upstream provider is performing network maintenance in BOG on 2025-05-17, from 11:00 UTC (06:00am local time) to 15:00 UTC (10:00am local time). No operational impact is expected.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Report: "Machines API degraded performance"
Last update: We identified the problem and deployed a fix.
We're investigating degraded performance with the Machines API metadata update endpoint.
Report: "Machines API degraded performance"
Last updateWe're investigating degraded performance with the Machines API metadata update endpoint.
Report: "Network issues in NRT/HKG"
Last update: This incident has been resolved.
Machines API requests (including `fly deploy` or `fly machines` commands) may occasionally fail when trying to create/update machines in NRT or HKG regions. We are investigating.
An upstream provider is investigating a network issue in NRT and HKG regions. Apps continue to run, but requests may occasionally fail.
Report: "Network issues in NRT/HKG"
Last updateAn upstream provider is investigating a network issue in NRT and HKG regions. Apps continue to run, but requests may occasionally fail.
Report: "Network maintenance in SEA (Seattle, Washington, USA)"
Last update: An upstream provider is performing critical network maintenance in SEA, from 14:00 UTC (07:00am PDT local time) to 16:00 UTC (09:00am PDT local time). You may experience a short total loss of connectivity for up to 15 minutes within the scheduled maintenance window.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Report: "New MPG clusters cannot be provisioned in FRA"
Last update: This incident has been resolved; all operations in FRA are working as expected.
A fix has been implemented and we are seeing MPG creations in FRA succeed again. The MPG tab of the fly.io dashboard is working again for users with clusters in FRA.
New MPG cluster creations in the Frankfurt (FRA) region are currently failing. Cluster creation in other MPG regions is working as normal. We are working to restore FRA cluster creation. Existing, running database clusters in FRA are not impacted and continue to work as normal. However, the MPG page in the Fly.io dashboard may not load for users with clusters in FRA.
Report: "Errors (5xx, timeouts) in Fly.io dashboard"
Last update: This incident is resolved; Dashboard, API and CLI operations should be working normally now.
We continue to monitor the deployed fix. Dashboard and API/CLI operations should be functional now.
We have identified the troublesome component and a fix has been rolled out. We are monitoring the results and may need to perform further updates to fully stabilize things.
Our metrics and user reports show that the Fly.io dashboard and portions of the API backend are timing out or returning 5xx errors. All operations in the Fly dashboard and most operations using the fly CLI will fail or time out at this point. Currently-running machines or workloads should not be affected.
Report: "New MPG clusters cannot be provisioned in FRA"
Last updateNew MPG cluster creations in Frankfurt (FRA) region are currently failing. Cluster creation in other MPG regions is working as normal. We are working to restore FRA cluster creation.Existing, running database clusters in FRA are not impacted and continue to work as normal. However the MPG page in the Fly.io dashboard may not load for users with clusters in FRA.
Report: "Errors (5xx, timeouts) in Fly.io dashboard"
Last updateOur metrics and user reports show Fly.io/dashboard and portions of the API backend are timing out or returning 5xx errors. All operations in the Fly dashboard and most operations using fly CLI will fail or timeout at this point.Currently-running machines or workloads should not be affected.
Report: "Depot builders experiencing issues"
Last update: From roughly 11:00AM Pacific to 3:00PM Pacific, Depot builders were unable to complete deploys (https://status.depot.dev/cmafni8la004z9pwuozks8vwx). During this time, deploys defaulted back to our legacy Fly builders, and users may have seen slower-than-usual deploys depending on the size of the build. This has been resolved, and deploys are now defaulting to Depot builders again.
Report: "Depot builders experiencing issues"
Last updateFrom roughly 11:00AM Pacific to 3:00PM Pacific, Depot builders were unable to complete deploys (https://status.depot.dev/cmafni8la004z9pwuozks8vwx). During this time, deploys defaulted back to our legacy Fly builders, and users may have seen slower-than-usual deploys depending on the size of the build.This has been resolved, and deploys are now defaulting to Depot builders again.
Report: "IAD Managed Postgres control plane unavailability"
Last update: This incident has been resolved.
We are investigating intermittent unavailability of the Managed Postgres control plane in IAD region. Database clusters continue to run.
Report: "IAD Managed Postgres control plane unavailability"
Last updateWe are investigating intermittent unavailability of the Managed Postgres control plane in IAD region. Database clusters continue to run.
Report: "Some or all *.fly.dev subdomains are currently returning NXDOMAIN errors in IAD"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Applications may be inaccessible via DNS.
Report: "WireGuard connectivity into CDG is unavailable"
Last update: We have re-enabled the CDG gateway for flyctl.
A fix has been implemented and we are monitoring the results.
Inbound WireGuard connections to our CDG gateways are currently unavailable due to an upstream networking issue. Any static peers configured in CDG will be unavailable until this is resolved.
Report: "Loss of connectivity in IAD"
Last update: We experienced an outage with one of our upstream transit providers in IAD for around 10 minutes. Traffic has been re-routed to alternate paths and connectivity should be back to normal.
Report: "Loss of connectivity in IAD"
Last updateWe are experienced an outage with one of our upstream transit providers in IAD for around 10 minutes. Traffic has been re-routed to alternate paths and connectivity should be back to normal.
Report: "Some or all *.fly.dev subdomains are currently returning NXDOMAIN errors in IAD"
Last updateapplications may be inaccessible via DNS.
Report: "WireGuard connectivity into CDG is unavailable"
Last updateInbound wireguard connections to our CDG gateways is currently unavailable due to an upstream networking issue. Any static peers configured in CDG will be unavailable until this is resolved.
Report: "Upstream network outage in MAD"
Last update: This incident has been resolved.
Power has been brought back online for the region. We're closely monitoring for any further complications.
Our edges in Madrid, Spain are currently affected by an upstream outage caused by ongoing power issues in the region. Regional and static egress IPs may be temporarily unavailable. Access via Anycast IPs is currently unaffected. We are working with our upstream to resolve this situation.
Report: "Upstream network outage in MAD"
Last updateThis incident has been resolved.
Power has been brought back online for the region. We're closely monitoring for any further complications.
Our edges in Madrid, Spain are currently affected by an upstream outage caused by ongoing power issues in the region. Regional and static egress IPs may be temporarily unavailable. Access via Anycast IPs is currently unaffected. We are working with our upstream to resolve this situation.
Report: "Fly.io dashboard down"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Fly.io dashboard down"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Network performance issues in ORD"
Last update: This incident has been resolved. The issues impacting performance on the affected routes do not seem to have been caused by issues within our network infrastructure.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
Some network paths in a single region (ORD) are slightly slower than expected. You may experience lower network performance for requests in ORD.
Report: "Network performance issues in ORD"
Last updateThis incident has been resolved. The issues impacting performance on the affected routes do not seem to have been caused by issues within our network infrastructure.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
Some network paths in a single region (ORD) are slightly slower than expected. You may experience lower network performance for requests in ORD.
Report: "Degraded performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We're investigating degraded performance on our web dashboard and GraphQL API. You may notice slower responses as well as occasional 500 errors at this time.
Report: "Degraded performance"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We're investigating degraded performance on our web dashboard and GraphQL API. You may notice slower responses as well as occasional 500 errors at this time.
Report: "Network maintenance in SCL (Santiago, Chile)"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
An upstream provider is performing critical network maintenance in SCL, from 7:00am UTC (3:00am local time) to 9:00am UTC (5:00am local time). You may experience a short total loss of connectivity for up to 25 minutes within the scheduled maintenance window.
Report: "Scheduled Maintenance in GIG Region (Rio De Janeiro)"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
We are performing networking upgrades at our GIG data centre from 06:00 - 09:00 UTC (03:00 - 06:00 Local Time). Users with machines in GIG may experience networking downtime of up to 40 minutes within the scheduled maintenance period. We recommend users scale up to nearby regions, such as GRU, if needed.
Report: "Network maintenance in QRO"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
An upstream provider is performing critical network maintenance in QRO. You may experience a short total loss of connectivity for up to 25 minutes within the scheduled maintenance window.
Report: "Issues with API"
Last update: A fix has been deployed and the API is back up.
We are currently investigating issues with our GraphQL API. You might experience issues connecting to the dashboard and flyctl.
Report: "Organization invites failing on dashboard"
Last update: This incident has been resolved.
We are investigating an issue where inviting users to an organization from the web dashboard may fail. As a workaround, inviting users using the flyctl command line (`fly orgs invite` command) is working.
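As a hedged illustration of that CLI workaround (the positional organization slug and email shown here are assumed; check `fly orgs invite --help` for the exact form):

```
# Invite a user to an organization from the command line instead of the dashboard
fly orgs invite my-org someone@example.com
```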
Report: "Networking issues in HKG"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are investigating intermittent network issues in the HKG region. Apps running in the region may have trouble reaching apps in other regions at this time.
Report: "Network issues in GDL"
Last update: This incident has been resolved.
We are investigating network issues in the GDL region. Apps running in the region may be unreachable at this time.
Report: "504 Errors from Logs API"
Last update: Historical logs are back up.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue with looking up historical logs. The `fly logs` command may fail. Streaming logs with NATS is not affected.
Report: "Capacity issues in FRA"
Last update: This incident has been resolved.
New capacity has been added in FRA; we will continue to monitor the region for capacity constraints.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
We are actively working to add additional capacity in the FRA region. We'll provide another update in the next 15-30 minutes.
We are experiencing low capacity in FRA. You may see machine launch failures. We are working on adding new capacity to FRA as soon as possible.
Report: "Network issues in SJC"
Last update: Networking in SJC is working as expected on all hosts. This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We've identified the cause of the issue and have applied a fix. We are seeing improvements and are continuing to monitor for full recovery.
A small number of hosts in SJC are continuing to experience networking issues after the earlier scheduled maintenance. We are working with our upstream provider to restore full connectivity to these hosts. Machines on impacted hosts may see reduced networking performance connecting to other machines within Fly.io and the broader internet.
We are investigating network issues resulting from the earlier scheduled maintenance in SJC.
Report: "Capacity issues in LHR region"
Last update: This incident has been resolved.
We've provisioned new host capacity in LHR region, machine/volume creates have been re-enabled and deploys should now be possible again. We are monitoring capacity and will provide updates if the situation changes.
New machine/volume creates in the LHR region are currently unavailable as there is no host capacity available. Any workloads currently running will continue to run; it is also still possible to update existing machines/volumes. Increasing `fly scale count` in the LHR region is not possible. Blue-green deploys are also not possible at the moment, as well as deploys with `release_command`. We expect more capacity to become available in the coming weeks. For the time being, please choose a nearby region for new workloads, such as AMS (Amsterdam, Netherlands) or ARN (Stockholm, Sweden).
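A rough sketch of shifting new capacity to a nearby region while LHR was constrained; the app name and count are illustrative, and the `--region` flag on `fly scale count` is assumed to be available in your flyctl version:

```
# Add machines in Amsterdam instead of London while LHR has no spare capacity
fly scale count 2 --region ams -a <your-app>

# Confirm where the machines were placed
fly status -a <your-app>
```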
Report: "Management plane for managed postgres in ORD is unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Degraded connectivity to Fly Registry"
Last update: We have identified that transoceanic subsea cable faults resulted in degraded connectivity to some registry instances in AMS, FRA, WAW regions. Our monitoring indicates error rates have improved after cordoning the affected instances at 16:40 UTC.
We are continuing to monitor results after cordoning affected registry instances.
We are investigating timeouts connecting to instances of registry.fly.io in AMS, FRA, WAW regions. Customers may experience slower image pushes and pulls within Fly Machines in the affected regions.
We have cordoned the affected registry instances in AMS, FRA, WAW and are seeing timeout errors decrease.
We are continuing to investigate the cause of increased connection timeouts to instances of our primary registry in AMS, FRA, WAW. Affected customers may be able to work around this by pushing images to an alternate registry, registry2.fly.io: `FLY_REGISTRY_HOST=registry2.fly.io fly deploy`
We are investigating timeouts connecting to registry.fly.io. Customers may experience slower image pushes and pulls within Fly Machines.
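The alternate-registry workaround mentioned in the updates above, written out as a single deploy invocation (a sketch; all other flags stay the same as your normal deploy):

```
# Push the build to the secondary registry instead of registry.fly.io
FLY_REGISTRY_HOST=registry2.fly.io fly deploy
```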
Report: "Capacity issues in IAD and AMS"
Last update: We have provisioned additional capacity in the affected regions.
New machine/volume creates in IAD regions may fail as there is no host capacity available. Any workloads currently running will continue to run; it is also still possible to update existing machines/volumes. Increasing `fly scale count` in these regions may not work. Blue-green deploys may also be unavailable at the moment, as well as deploys with `release_command`. We are provisioning additional capacity in this region.
Report: "Leader Election Issues with PG Flex Clusters close to NA region"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating an issue where postgres flex clusters are unable to elect a new leader.
Report: "Network issues in AMS region"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
Networking on the impacted hosts has been restored. Machines and apps on those hosts will now be reachable. We're continuing to monitor to ensure everything remains stable.
The hardware switchover is complete. We are continuing the process of re-connecting the downed hosts to the network.
Installation of the new hardware has completed and we are starting the switchover process. A networking blip may be observed on Machines in the AMS region during this process.
Installation of the replacement hardware is still ongoing.
Replacement hardware is onsite and is being installed.
The upstream provider has traced this issue to a broken switch and is working to replace it. They expect connectivity to return in ~1 hour.
Various hosts in AMS region have lost network connectivity. We are investigating this along with our upstream provider.
Report: "Network issues in ARN"
Last update: Load has subsided on the edge nodes and we are not observing any related errors at this time.
Our edge nodes in Stockholm are currently experiencing high load. Some incoming connections may fail while we work to address the issue.
Report: "Network outage"
Last update: This incident has been resolved.
Network connectivity in IAD has been restored. Our APIs should be working again, but might have higher response times.
Network connectivity in IAD has been restored. Our APIs should be working again, but might have higher response times.
We're bringing our platform up in another region and waiting for things to settle. Our upstream provider is also replacing the affected networking devices in IAD.
We're continuing work to move our APIs away from affected regions/providers. Another update will be provided at 13:00 UTC or earlier.
The IAD region is unavailable due to an incident at an upstream provider. Our API is hosted in this region and as such is unavailable.
We are investigating widespread reports of networking issues. Apps appear to be running correctly but requests made to the apps may fail. The API and dashboard are also unavailable at the moment.
Report: "Edge network issues in GRU and SCL"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are seeing network issues on our edge servers in regions GRU and SCL. Machines are running correctly, but inbound requests from clients in those regions may fail intermittently.
Report: "Network issues in JNB"
Last update: This incident has been resolved.
We have implemented a workaround for the network issue and are monitoring the situation.
There is an issue with an upstream network provider in JNB. Apps are still running but may observe network issues. New deploys for apps may fail.
Report: "Depot builders failing with internal error"
Last update: This incident has been resolved.
A fix has been implemented on Depot side. https://status.depot.dev/cm6zolsn40009f2dj5ss7lrd7
The Depot service is currently degraded due to a database outage. We're continuing to monitor for recovery. Customers can also follow the Depot status page at https://status.depot.dev/ for updates. Customers that need to deploy can use legacy Fly.io hosted builders with `fly deploy --depot=false`
We are investigating failures when trying to build using the default Depot builders. The recommended workaround is to use `--depot=false` with `fly deploy`. The error from Depot builders is `Error: failed to fetch an image or build from source: error building: input:3: ensureDepotRemoteBuilder {"code"=>"internal", "message"=>"internal error"}`
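The workaround from the updates above as a complete command (a sketch; any other flags you normally pass to deploy are unchanged):

```
# Build with a legacy Fly.io hosted builder instead of the default Depot builders
fly deploy --depot=false
```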
Report: "SSH failing for newly created machines"
Last update: This incident has been resolved.
This issue has been fixed, newly created machines will have working SSH. Machines created during this incident will need to be updated (`fly machine update --yes <id>`) or deleted/recreated to fix SSH.
As a workaround, run the `fly ssh console` command with `--pty --command /bin/sh` flags.
We are investigating reports that connecting to newly created machines via SSH (`fly ssh console`) may fail.
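A brief sketch combining the workaround and the post-incident fix described above; the machine ID is a placeholder:

```
# Workaround during the incident: request a PTY and an explicit shell
fly ssh console --pty --command /bin/sh

# After the fix was deployed: update machines created during the incident so SSH works again
fly machine update --yes <machine-id>
```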
Report: "Elevated network latency in FRA"
Last update: Network functionality is fully restored in FRA.
We've deployed a fix for this incident and we are monitoring while network latency and bandwidth return to normal. All user apps should start seeing improved and normal response times.
We're addressing elevated network latency and saturation affecting the FRA region. Apps with machines in this region might experience longer response times and possible timeouts (502 errors).
Report: "Capacity Constraints in IAD"
Last update: This incident has been resolved.
We have brought additional IAD capacity online. Customers should see machine creation, deploy, and scaling operations succeed as normal in the region. We're continuing to monitor to ensure full recovery.
We are continuing the process of adding additional machine capacity in the IAD region.
Machine capacity in the IAD region is currently low. We're working to bring additional capacity online. In the meantime, you may see errors deploying new machines in IAD, or increasing the size of existing machines in the region. Customers may want to deploy machines to nearby regions, such as EWR.
Report: "Deploys using Depot Builders failing"
Last update: This issue has been resolved; deploys using Depot Builders are succeeding as expected.
The Depot builder service is partially recovered and we are seeing deploys using Depot builders succeed again. Some customers may still experience degraded performance using Depot builders at this time. We're continuing to monitor for full recovery. Customers can still deploy using Fly.io hosted builders with `fly deploy --depot=false`
The Depot service is currently degraded due to a database outage. We're continuing to monitor for recovery. Customers can also follow the Depot status page at https://status.depot.dev/ for updates. Customers can still deploy using Fly.io hosted builders with `fly deploy --depot=false`
We are investigating increased error rates when deploying apps using the default Depot Builders. Customers who experience this issue can work around it by using `fly deploy --depot=false` to deploy your image with a Fly.io hosted builder.
Report: "API errors"
Last update: This incident has been resolved.
We are investigating 503 errors when making requests to our GraphQL API or running flyctl commands.
Report: "Bluegreen healthchecks not passing"
Last update: This incident has been resolved.
A fix has been implemented and bluegreen deploys are succeeding as expected. We're continuing to monitor deploys to ensure stability, but customers should see BlueGreen deploys succeed in all regions.
The issue has been identified and a fix is being implemented.
We are seeing signs of recovery, with Bluegreen deployments succeeding for many customers. We are continuing to investigate the root cause of the issue. Customers who still experience a Bluegreen deployment failure can retry using the rolling strategy with `fly deploy --strategy rolling`.
A temporary workaround for new deployments is to use rolling strategy: `fly deploy --strategy rolling`.
We are still investigating the issue.
When deploying with the bluegreen strategy, some green machines (new app version) won't pass healthchecks. Temporary workaround: unless bluegreen is a must for your app, you can temporarily deploy using a different strategy with `fly deploy --strategy NAME`.
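The temporary workaround from the updates above as a concrete command; rolling is one example, and any non-bluegreen strategy supported by your app would work:

```
# Deploy with the rolling strategy instead of bluegreen
fly deploy --strategy rolling
```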
Report: "Machine creation errors in LHR"
Last update: We observed several periods where Machine creations in LHR resulted in authentication errors from 11 Jan to 15 Jan 2025. Customers creating new Machines in the region may have seen failures with: `failed to launch VM: permission_denied: bolt token: failed to verify service token: no verified tokens; token <token>: verify: context deadline exceeded`. The disruptions were caused by degraded connectivity to our token creation service from three hosts. We deployed a preventative fix for the network issues on 15 Jan 2025 at 12:58 UTC.
Timestamps of occurrences (UTC):
2025-01-11 03:32 to 2025-01-11 04:11
2025-01-11 17:07 to 2025-01-11 17:54
2025-01-14 11:36 to 2025-01-14 12:14
2025-01-15 07:46 to 2025-01-15 09:49
Report: "Network issues in SJC region"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating inbound network connectivity issues in SJC region. Users routed to SJC may be unable to access apps, or latency may be increased.
Report: "Transient networking issue in FRA"
Last update: This incident has been resolved.
We have noticed a spike in packet loss across the FRA region at around 14:44 UTC caused by an upstream issue. This has recovered since 14:47 UTC, and we are currently monitoring the situation along with our upstream providers.
Report: "IPv6 Networking Issue in SCL"
Last update: This incident has been resolved.
We are aware of a temporary IPv6 networking issue in SCL when accessing certain IPv6 ranges / providers, caused by upstream maintenance, and are working with our upstream for a fix. IPv6 requests originating from your machines in SCL may see increased error rates.
Report: "Network Instability"
Last update: This incident has been resolved.
We're monitoring the platform which continues to be stable and work normally. Additionally we are in the process of deploying the Fly Proxy build that contains the fix for the bug that caused this issue.
We have identified the cause of the network blip to be a bug in our Fly proxy and we're applying a fix.
We have noticed a temporary blip in our upstream network(s) between 16:38-16:40 UTC that affected our platform. This has since resolved, and we are monitoring for any continuing effects.
Report: "Machine Creates and Updates currently failing"
Last update: All changes have been deployed and Machine Create/Update API operations are healthy.
The validation fix has been deployed and our alerts for the API error rate have resolved.
We were alerted to elevated error rates for machine creates and updates. A deploy caused a validation error which is now being reverted.
Report: "Networking issues in GDL"
Last update: This incident has been resolved.
The issue has been identified and a fix is currently being implemented.
Report: "sjc region capacity"
Last update: This incident has been resolved.
We are currently at capacity in our SJC region. We're actively working on fixing this, however you may wish to deploy to nearby regions (lax or phx) as a workaround.
We are currently at capacity in our SJC region. We're actively working on fixing this, however you may wish to deploy to nearby regions (lax or phx) as a workaround.
Report: "Elevated API Latency and Timeout Errors"
Last update: This incident has been resolved.
A fix has been implemented and both Machines API and GraphQL API performance have returned to normal.
We have identified the cause of the API latency increase and are working to mitigate it.
We are currently investigating elevated error rates with our Machines and GraphQL APIs. Users may experience slower responses or timeouts using the Machines API and flyctl commands.
Report: "Degraded Connectivity"
Last update: We have determined that some customers' machines are being throttled due to our full rollout of CPU quotas, separate from the incident yesterday. This in turn caused apparent networking issues. We have now temporarily rolled back these changes while we work with customers to better adapt to CPU quotas.
We are aware of customer-reported issues with internal networking and are investigating.