Historical record of incidents for Sanity
Report: "Google Cloud Platform outage impacting auto-updating Studios, asset services, and Google auth"
Last update: A Google Cloud Platform outage is currently impacting auto-updating Studios, the image pipeline, file asset service, and Google authentication. Cached images will continue to be served. API and API CDN endpoints are not currently affected.
Report: "Dataset and projectId GROQ functions returned null in webhooks"
Last update: Between 12:30 and 18:50 UTC, the sanity::dataset and sanity::projectId GROQ functions returned null in GROQ webhook filters and projections. This null value may have caused unexpected GROQ webhook results. The underlying cause has been resolved, and additional testing is now in place to prevent this from recurring. We apologize for the disruption to operations caused by these unexpected GROQ webhook values.
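For context, sanity::dataset() and sanity::projectId() surface the current dataset and project inside GROQ-powered webhook filters and projections. A minimal sketch of the affected pattern follows; the document type and field names are illustrative, not taken from any specific project:

```groq
// Webhook filter: only fire for documents in the production dataset
_type == "post" && sanity::dataset() == "production"

// Webhook projection: include project context in the delivered payload
{
  "projectId": sanity::projectId(),
  "dataset": sanity::dataset(),
  "documentId": _id
}
```

During the incident window both functions evaluated to null in this context, so a filter comparing against a concrete dataset name, like the one above, would not have matched.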
Report: "Customers may experience difficulty assigning roles"
Last update: We are currently experiencing an issue where some customers may have trouble assigning roles to their users. We are investigating and working toward remediation.
Report: "Dashboard fails to load intermittently"
Last update: We have found the cause, and services are now restored.
Around 12:40 UTC we started seeing intermittent errors when loading the Sanity Dashboard. Our engineers are investigating the root cause. In the meantime, you can still reach your project by browsing directly to its Studio URL. We'll provide an update once we know more.
Report: "Elevated CDN error rates"
Last update: This incident has been resolved.
Asset CDN error rates have returned to normal levels as of 17:36 UTC. We are continuing to monitor.
We are currently investigating a partial outage of the Asset CDN.
Report: "Increased request failures for some customers"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results. As of 12:52 UTC, error rates have returned to a nominal level.
Beginning at 12:10 UTC, some customers have been experiencing request failures, the majority of which are mutation errors. We have identified the root cause and are actively working to resolve the issue.
Report: "Issue with login and signup via email/password"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Login functionality via email/password has been restored. Account creation using email/password as a login provider is currently unavailable. Please use Google or GitHub as a login provider for immediate account creation.
We are currently investigating an issue with account login and signup via email/password (i.e., sanity as the login provider). This does not affect account creation or login via Google, GitHub, or SAML SSO.
Report: "Elevated API and API CDN error rates for some customers"
Last update: Between 12:10 and 12:45 UTC, we observed an increased number of 5xx errors for a small subset of customers. This affected the API and the API CDN with errors surfacing in both queries and mutations. The issue has since been resolved.
Report: "Dataset creation failures"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Starting Feb 21 at 13:35 UTC, some customers are experiencing dataset creation failures. Around 1 out of 4 dataset creation attempts are impacted by this issue. We have identified the root cause and are deploying a fix.
Report: "Reduced Query Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
The issue has been identified and a fix is being deployed.
We're experiencing reduced query performance for some customers and are currently looking into the issue.
Report: "Increased error rate for API calls"
Last update: We're confident that the issue is now resolved and all operations are back to normal. Apologies for the turbulence!
Engineering has implemented a fix and we are monitoring API request activity now. API errors should be returning to expected levels across impacted customers. We'll continue to monitor the situation closely.
We have identified the root cause of the issue and are working to implement a fix.
Around 16:44 UTC we began experiencing increased error rates on API requests. Currently, some customers may experience failing requests. We are working to identify the issue as quickly as possible.
Report: "Empty key in GROQ queries causing errors"
Last update: GROQ queries are now functioning as expected and this issue is resolved.
A fix has been implemented and rolled out fully. We continue to monitor the issue, but things should be back to normal operation now.
At 13:44 UTC, a change was deployed impacting GROQ queries using the "": shorthand syntax for empty keys. As a result, these queries may return an unexpected response. We have rolled back the change and are working to fully resolve the issue. This issue will not affect actual data, just the format of query results.
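For reference, the affected query shape uses an empty string as a projection key. A minimal sketch follows; the document type and field are illustrative only:

```groq
// Projection with an empty-string ("") key; queries of this shape could
// return an unexpectedly formatted response during the incident window
*[_type == "post"]{ "": title }
```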
Report: "Issues with Sanity onboarding via external services"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
We are currently investigating failures with Sanity onboarding via external services, including Vercel. While we work to resolve the issue, we recommend that people sign up directly via sanity.io/get-started or the CLI.
Report: "Login/Signup with Sanity Username/Password is temporarily down"
Last update: This issue has now been resolved.
The issue has been identified and we are working toward a fix.
Login/signup with Sanity username/password is temporarily down. SSO with other providers (Google, GitHub, etc.) will still work. The team is actively investigating.
Report: "New role assignments may not have taken effect for some users"
Last update: This incident has been resolved. All role assignments should now proceed without delay. Apologies for any confusion or business impact the earlier role sync delays may have caused.
We now know that the role assignment delay during that six-day period was at most an hour: a role assignment would have taken effect no more than 60 minutes after the change was made.
Between January 15 and January 21, new role assignments may not have taken effect for some users. This would have impacted document-related permissions and would have given affected roles fewer permissions than expected. We have identified the cause and released a fix. We are monitoring the change and are syncing the role assignment changes that were made over that timeframe. There will be no data loss or dropped changes once this sync is complete.
Report: "Elevated error rates for dataset copy functionality"
Last update: Dataset copy functionality is now restored for all customers. This incident has been resolved.
A fix has been implemented and dataset copy functionality is restored for most customers. We continue monitoring results.
The issue has been identified and a fix is being implemented.
We are experiencing an elevated level of errors for dataset copy (Cloud Clone) jobs that were started after 11:49 UTC. We are currently investigating this issue.
Report: "Webhook delivery failures"
Last update: This incident has been resolved. We apologize for the impact this issue had on your operations and business.
We are continuing to monitor for any further issues. Apart from a small subset of failures between 09:30 and 09:31 UTC that will not be retried, all failed webhook deliveries have now been retried.
From 09:00 to 09:30 UTC, we experienced an issue impacting webhook delivery. A fix has been implemented and webhook delivery is now resumed.
Report: "Limited Permissions on default developer role"
Last update: Between 10:49 AM UTC and 8:50 PM UTC today (11/20/2024), users with the default Project Developer role were unable to access the API tab or Project Settings tab in the Manage web application.
Report: "Elevated API CDN error rate for a small subset of requests"
Last update: The incident causing an elevation in 503 responses has been resolved, and all systems are operating normally. We apologize for any inconvenience this may have caused and appreciate your patience.
A fix has been implemented and we are monitoring the results.
We are seeing an elevated number of 503 responses from the API CDN endpoint, affecting a small subset of requests. We have identified the issue and are implementing a fix.
Report: "Elevated error rates for dataset copy functionality"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
We are experiencing an elevated level of errors for people using our dataset copy (Cloud Clone) functionality. We are currently investigating the issue.
Report: "Increased API and API CDN latency for some customers"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently observing increased latency for queries to the API and API CDN, affecting a small subset of customers. The issue has been identified and a fix is being implemented.
Report: "Issue with Role Mapping User Interface (SAML SSO)"
Last update: The issue has now been fully resolved, and the role mapping modals are rendered as expected. We are actively monitoring the system to ensure continued performance and reliability.
We are currently experiencing an issue affecting enterprise customers using SAML SSO. The role mapping modal is not rendering correctly, which may prevent users from managing role assignments effectively. Our team has identified the underlying issue and is reviewing the fix for release. We are committed to resolving this as quickly as possible and will provide updates as they become available. We apologize for any inconvenience this may cause and appreciate your patience. If you have any urgent concerns, please reach out to our support team for assistance.
Report: "Delayed API CDN Invalidations"
Last update: We experienced a delay in API CDN invalidations between 08:30 and 09:30 UTC. During this period, users may have noticed that changes to cached content were not reflected immediately due to the delay in invalidating the content on our CDN. The issue has now been fully resolved, and invalidations are processing as expected. We are actively monitoring the system to ensure continued performance and reliability. We apologize for any inconvenience caused and appreciate your understanding. If you continue to experience any issues, please contact our support team.
Report: "Mutations failing for a small subset of customers"
Last update: This incident has been resolved.
This incident has now been resolved. The fix will temporarily affect the ability to access document history for the subset of customers that were impacted. No data was lost and all history will be restored shortly.
We are continuing to implement a fix for this issue.
We are continuing to implement a fix for this issue. Until roll-out is complete, impacted customers will still see mutations failing and webhooks will be unavailable on their affected datasets. All queries to the API and API CDN remain unaffected by this incident.
We are experiencing an issue with mutations failing for a small subset of customers. The main impact of this issue is on Studio connectivity. We have identified the cause and a fix is being implemented.
Report: "Webhook delivery delayed for some customers"
Last update: This incident has been resolved.
The pending webhook delivery queue has dropped to zero. We are continuing to monitor.
The issue has been identified and the webhook delivery queue is shrinking.
There is a backlog of webhooks that is causing delayed delivery for some customers. We are currently investigating and are working on clearing that queue.
Report: "Authentication slowness and timeouts affecting usage of Studio and Manage interface"
Last update: This incident has been resolved.
The authentication service has fully recovered and normal service is restored. We are continuing to monitor for any further issues.
From 07:05 to 07:35 UTC, we experienced an issue with our authentication service causing slowness and timeouts for people logging in to Sanity. This issue impacted people's ability to use the Studio and the Manage interface. Other core services including the API and API CDN were unaffected. We have mitigated the issue and continue monitoring closely.
Report: "Sanity.io home page unavailable for a subset of visitors"
Last update: From 08:18 to 08:43 UTC, a subset of visitors were served a 404 response when attempting to access the home page of our marketing site at https://www.sanity.io/. All other pages (including login, sign-up, and docs) and the global navigation bar continued to be available to everyone. Our core services were also unaffected by this issue.
Report: "Sanity.io marketing site outage - core services unaffected"
Last update: This incident has been resolved.
Vercel has implemented a fix and is continuing to monitor the situation. Access to www.sanity.io has been restored.
People are encountering 400 errors when accessing our marketing site at sanity.io. Our core services are unaffected by this issue. Vercel has reported an incident on their status page of elevated 400s across middleware and edge function invocations. They have identified the root cause and are working on resolving the issue.
Report: "Elevated API error rates for some customers"
Last update: Between 14:29 and 14:47 UTC, we observed an increased number of 5xx errors for a subset of customers. This affected the API, which would have been observed in queries, mutations, and in the Studio. The issue has since been resolved.
Report: "Custom plan features incorrectly showing as disabled for some customers"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are experiencing an issue with custom features incorrectly showing as disabled for a small subset of customers. This may lead to a "Feature Unavailable" warning when trying to use the feature in the studio. We have identified the issue and a fix is being implemented.
Report: "Webhook delivery failures"
Last update: This incident has been resolved.
A fix has been implemented and we have resumed sending webhooks. We are closely monitoring the results.
We are investigating the stalled delivery of webhooks since 16:40 UTC.
Report: "Increased API latency for some customers"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring results.
We are investigating reports of intermittent slowness in the Studio for some customers. We are currently observing increased latency on two shards, affecting customers with datasets on those shards.
Report: "Increased API and API CDN latency for some customers"
Last update: This incident has been resolved.
The issue has been identified and a fix was recently implemented. We are observing decreased CPU load on the affected shard. We are currently monitoring.
We are continuing to investigate this issue. We have determined that the API CDN is not affected.
At 17:16 UTC, we began to experience high load on one particular shard. This will only affect customers with datasets on that shard. We are currently investigating.
Report: "Delayed API CDN cache invalidation affecting some customers"
Last update: We have observed no subsequent degradation of API CDN performance. This incident is resolved.
Performance of the API CDN has returned to normal for affected customers. We are continuing to monitor.
Starting at 16:45 UTC, we are seeing stalled cache invalidation for a small number of customers. We have identified the issue and are in the process of restoring normal cache invalidation performance.
Report: "Elevated API CDN error rates for some customers"
Last update: In response to the elevated 502 errors, we reviewed and increased capacity in the ingress layer. There have been no recurrences since, and we are considering the issue resolved.
We are continuing to investigate the cause of this issue. There have been no recurrences of these elevated 502 errors since 13:36 UTC.
Between 13:34 and 13:36 UTC, we noticed an elevated number of 502 errors served for requests to the API CDN. Although error levels have since returned to normal, we are investigating the cause.
Report: "Small subsection of customers experienced CDN issues related to DNS resolution"
Last update: This incident has been resolved.
A small subset of customers experienced degraded service due to DNS resolution issues with our CDN. The issue was identified, a fix was implemented, and we continue to monitor.
Report: "Elevated API error rates for some customers"
Last update: Between 13:54 and 14:18 UTC, we experienced an elevated level of API errors (502 status code) served to some of our customers. The issue was identified, a fix was implemented, and we continue to monitor.
Report: "Manage for Plans is unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue with Plans being inaccessible on sanity.io/manage. Customers are currently unable to view and update plans. This issue is affecting both upgrades and downgrades of plans.
Report: "Some queries failing with 5xx status codes"
Last update: Between 2024-01-24 at 22:00 and 2024-01-25 at 04:10 UTC, we observed a small number of 504 and 524 errors served for query and graphql requests. We have resolved the underlying issue and are no longer observing those errors.
Report: "Sanity.io marketing domains reporting down - core services unaffected"
Last update: Vercel has reported that error rates have stabilized and is continuing to monitor the situation.
Multiple users have reported encountering 500 errors when accessing sanity.io. Vercel has reported an incident on their status page of elevated 500 error rates on edge requests. We have updated the status page to inform users about the ongoing issue.
We are continuing to investigate this issue.
We are currently investigating reports of domains on sanity.io returning 500 internal server errors.
Report: "Studio login issues for European users seeing telemetry data prompt"
Last update: Between 06:03 UTC and 10:58 UTC, European users experienced difficulties logging in to their Studios because a telemetry data prompt got them stuck in a loop. We have rolled back an update and the issue is now resolved. We continue investigating the root cause and apologize for the impact this issue may have had on your operations and business.
Report: "Some cross-dataset references returning null"
Last update: Between 16:02 and 23:14 UTC, some customers may have experienced cross-dataset references returning null rather than the expected data. This regression was introduced by a changed query in the datastore codebase; customer data was not impacted and remained accessible via direct query. Normal reference functionality was unaffected. We regret that this may have resulted in incomplete query results and are sorry for any impact this had on your operations or business.
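For context, cross-dataset references are dereferenced with the same -> operator as ordinary references. A minimal sketch of the affected pattern follows; the document types and field names are illustrative only:

```groq
// Follow a cross-dataset reference; during the incident window the
// dereferenced value could come back null instead of the referenced document
*[_type == "product"]{
  title,
  "brandName": brand->name
}
```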
Report: "Issues accepting invitations to projects"
Last update: Between 14:38 and 16:59 UTC, people experienced difficulties accepting project invitations, receiving an error message "Unable to accept invitation. An internal server error has occurred." A fix was implemented and the issue was resolved. People should now be able to accept their original invites. We continue monitoring closely and apologize for any impact this may have caused.
Report: "Issues with Sanity's Vercel integration"
Last update: Between Dec 5, 15:47 UTC and Dec 6, 12:10 UTC, people were unable to use our Vercel integration. A fix was implemented and the issue is now resolved. We apologize for any impact this might have caused when integrating Sanity with your Vercel projects.
Report: "Elevated API error rates for a subset of customers"
Last update: Between 13:06 and 13:20 UTC, we experienced an elevated level of API errors for some of our customers. The issue was identified, a fix was implemented, and we continue monitoring results.
Report: "Datasets tab does not render the list of datasets for some customers"
Last update: This issue has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and we are releasing a fix.
The Datasets tab in Manage is not rendering the list of datasets for some customers. This is limited to the Manage dashboard and does not impact the datasets themselves. We are currently investigating the issue.
Report: "Elevated API and API CDN error rates for some customers"
Last update: Between 16:03 UTC and 16:15 UTC, we experienced an elevated level of API and API CDN errors for some of our customers. An upstream issue with a service provider was mitigated and we continue monitoring closely.
Report: "Elevated failures for dataset Cloud Clone"
Last update: This incident has been resolved.
Dataset Cloud Clone is failing for some users. We're currently investigating this issue.
Report: "Queries and mutations failing for some datasets"
Last update: Between 12:31 and 12:41 (UTC), queries and mutations failed for some datasets. We have identified and fixed the cause of this incident, and will continue to monitor API requests.
Report: "Elevated failures for dataset Cloud Clone"
Last update: This incident has been resolved.
We have identified and fixed the cause of this issue. We're continuing to monitor Cloud Clone processes.
Dataset Cloud Clone is failing for some users. We're currently investigating this issue.
Report: "Degraded performance and issues loading data on Manage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Starting at 10:50 UTC, we are experiencing degraded performance and issues loading data on Manage. People may be unable to see and edit their plans. Plan data and billing are not impacted. We are currently investigating this incident.
Services affected: Manage
Next update: 30 minutes
Report: "Asset pipeline issue for some customers"
Last update: We experienced an issue with our asset pipeline that affected a small subset of customers. This issue may have impacted those trying to upload an asset for a subsequent time, which is a common occurrence during the dataset import process. The issue would have started at or shortly after 17:47 UTC on 2023-09-13 and was resolved at 08:26 UTC on 2023-09-14. We apologize for the impact this had and any backup processes that were disrupted.
Report: "Global API CDN unavailable for 30 minutes"
Last update:
## Incident summary
On Tuesday, 5 September 2023, from 11:40 to 12:10 UTC, customers observed errors consisting of 503 response codes when trying to access cached objects through the API CDN. Access to Studio and the ability to log in to Sanity were also disrupted during this incident. The incident was identified and mitigated within 30 minutes.
## Incident timeline
All times UTC on 2023-Sep-05.
- **11:40 Tracing changes rolled out - Outage begins**
- 11:44 First system alert is fired (elevated 5xx errors)
- **11:47 Incident declared**
- 12:06 Health check identified as root cause of API CDN outage
- 12:08 Health check corrected
- 12:08 Tracing work rolled back - **Incident mitigated**
- 12:09 503 error rate returns to normal level
- 12:10 API CDN service fully recovered, customers can log in to Sanity
- **12:15 Incident state moved to monitoring - Root cause analysis underway**
## Root cause
We recently developed more advanced tracing capabilities for our platform to improve system observability. This change was rolled out several weeks ago, but hit an edge case that unexpectedly increased the load on our identity management service in a way that was not caught in our testing and staging environments. The safety mechanism in the tracing library to prevent this sort of failure had a default value set too high for the system to cope with, causing our identity management service to fail. Our API CDN depended on a health check to this service, and this dependency caused our API CDN to stop serving traffic. The ability to use Sanity Studio or log in to Sanity was also blocked by the unavailability of the identity service.
_This is our current understanding of the incident; as we continue to investigate, we will update with further details if anything new and material comes to light._
## Remediation and prevention
Sanity engineers were alerted to the issue at 11:44 UTC and began investigating the API CDN failures promptly. At 12:06 UTC the team had determined that the root cause of the API CDN outage was a health check to the identity service, which was immediately corrected. The team also rolled back the tracing change, which was the root cause of the identity service outage. As these changes were rolled out, error rates subsided, the identity service started answering requests again, and regular API CDN traffic resumed.
In addition to resolving the underlying cause, we will be implementing updates to both prevent and minimize the impact of this type of failure in the future. Given the critical nature of our CDN infrastructure, we are also initiating a complete audit of our caching layer, including making sure no additional legacy dependencies exist.
We would like to apologize to our customers for the impact this incident had on their operations and business. We take the reliability of our platform extremely seriously, especially when it comes to availability across regions.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are experiencing an elevated level of API errors for some of our customers and are currently investigating the issue.