Historical record of incidents for Census
Report: "GCP Outage May Affect Syncs to and from Google Resources"
Last update: Internal monitoring indicates that syncs to and from BigQuery started experiencing increased failure rates at around 18:05 UTC. The rate of failures is now decreasing but has not returned to usual levels.
We are aware of the current GCP outage and we are monitoring its impact on Census syncs to and from Google and Google Cloud Resources. Census does not run any critical components on GCP but we do use GCP resources, including Cloud Storage, to facilitate interaction with BigQuery and other Google Cloud resources.
Report: "Sync Runs Failing with Undefined Column Error"
Last update: Some syncs started failing at 12:18pm (EST) with the following error: "ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column sync_configurations.new_diff_timestamp_expression_id does not exist". This was due to a brief issue during a schema update. The problem has been resolved, and we are monitoring.
Report: "Sync list UI not loading"
Last update: The Census sync list UI is not loading for some customers. We have identified a faulty deployment and we are reverting it. The sync engine is not affected.
Report: "Web Outage"
Last update: During a window of time from 9:58 am EST until 10:32 am EST, the Census web UI, and a portion of the Census Public Management API experienced an outage. Sync Runs were unaffected and operated normally.
Report: "Some syncs failing with "T.cast: Expected type T::Struct, got type TrueClass""
Last update: A recent code change introduced a bug resulting in some syncs failing with the message "T.cast: Expected type T::Struct, got type TrueClass". This impacted sync runs where there were no incremental changes identified and the source connection utilizes the Advanced Sync Engine. We identified the problematic change, and have reverted. Syncs are now running successfully.
Report: "Background job delays (Syncs not impacted)"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We've identified an issue that caused background jobs to be delayed. The issue does not impact running syncs.
Report: "Sync might be failing with mismatch errors"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
This issue is resolved.
We experienced a performance issue from ~3:00-3:15pm Pacific that may have caused some syncs to fail with a message about "mismatch counts". We believe we have resolved the problem, but syncs that encountered the issue may not report an error until sometime after the fact, so errors may continue to trickle in. If you encounter this error, please attempt to rerun your sync. If a rerun does not succeed, please contact support.
We are currently investigating an issue with some syncs failing with mismatch errors.
Report: "Mirror Syncs Failing with SQL error"
Last update: This incident has been resolved.
We have reverted to the previous working version of the system and sync success rates have returned to normal levels. We are investigating why our integration tests were insufficient to prevent this issue.
We have identified a bug that is causing "mirror" syncs to fail with an error in the SQL generated by Census; the error message varies by data warehouse, but relates to a missing column. We are rolling back the incorrect deployment. Note that this is unrelated to the scheduled maintenance for Census Store, which has not yet started and will be postponed until this issue is resolved.
Report: "New SaaS Dataset, CSV Dataset, and ER Dataset creation unavailable for some customers"
Last update: This incident has been resolved.
We have applied a fix and functionality has been restored. We are continuing to monitor.
We have identified the cause and are working on a resolution.
Report: "Some syncs failing with "Missing corresponding dataset file for prefix: record_deletes.0""
Last update: No errors have been seen for over 30 minutes. We are declaring the incident resolved.
The revert has deployed. We are monitoring to ensure that the errors subside.
A recent code change introduced a bug resulting in some syncs failing with the message "Missing corresponding dataset file for prefix: record_deletes.0". We have identified the problematic change, and a revert is in progress.
Report: "Degraded sync performance"
Last update: This incident has been resolved.
Our monitoring has notified us that some syncs are performing more slowly than usual; we are investigating.
Report: "IP Address Change for Database Requests"
Last update: This incident has been resolved.
We have rolled back the service to a previous configuration and we are monitoring our system to ensure database / warehouse connectivity returns to normal levels.
We have identified the root cause and are applying a fix to the networking configuration.
Due to a networking misconfiguration, requests from Census's database access service are originating from IP addresses that we have not previously advertised. If your database is configured to reject requests that are not from Census's IP addresses, syncs will start failing. We are investigating the root cause of the issue, and it is our intent to restore service from the original IP addresses; we do not expect customers to need to reconfigure their databases at this time. More updates will be posted here.
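For readers unfamiliar with this kind of setup, here is a hypothetical example of the sort of IP allowlist rule that would reject the new addresses. This is an illustrative PostgreSQL `pg_hba.conf` fragment only; the database name, user, and CIDR are placeholders (203.0.113.0/24 is a reserved documentation range), not real Census egress addresses:

```text
# Illustrative pg_hba.conf entry: permit the warehouse user to connect
# over TLS only from an allowlisted CIDR block. Requests arriving from
# any other source address are rejected, which is why an unannounced
# change of egress IPs causes sync failures.
hostssl  analytics  census_user  203.0.113.0/24  scram-sha-256
```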
Report: "Many syncs failing for 30 minute period"
Last update: We have not seen any further failures and are declaring the incident resolved.
A misconfiguration of our sync engine resulted in many syncs failing from approximately 2025-01-25 01:08 to 01:38 UTC. We have identified the issue and deployed a fix.
Report: "Census unavailable. 503 Error page from the Web."
Last update: Everything is now 100% operational. The problem came from an infrastructure upgrade that inadvertently removed services. We are now putting safeguards in place to ensure this dependency problem does not recur in future upgrades.
All services are operational. Any queued syncs are starting up now.
The Census UI is now back up. Our sync engine is in the process of coming back up.
The root cause has been identified and we are working to resolve.
We are investigating the issue and will update ASAP.
Report: "Slowness and some requests failing"
Last update: The outage was due to a database migration that took longer than anticipated. The issue should be resolved now, and we will monitor for further slowness.
We are currently investigating this issue.
Report: "Connection tests failing"
Last update: We've confirmed the issue only affects a few connection tests, and the failures have stopped as of the last hour after we deployed a previous version of our application. We will continue to monitor for connection tests failing with 401 errors.
We are currently investigating an issue where Source & Destination connection tests result in 401 errors.
Report: "Sync Completion Rate Slowed"
Last update: This incident has been resolved.
We have rolled back a feature that we believe caused the slowdown and added extra capacity to compensate for the slowdown. Sync completion rates have returned to normal levels and any slowdown or queueing should be resolved in the next 30 minutes.
We have identified stuck sync workers and we are replacing these containers with new capacity.
Syncs are completing at slower than usual rates and we are investigating the issue.
Report: "Sync performance degraded"
Last update: The team considers this issue resolved. We will continue to monitor for delayed issues but do not expect any further impact.
An issue with an earlier deployment was identified as the cause. The issue has been fixed and the sync engine has largely recovered. The team is monitoring the recovery and expects it to be fully resolved shortly.
We are currently investigating an issue causing slow sync execution. The team has been paged and is currently investigating the cause of the issue. Census management console is unaffected and there are no problems with sync completion at this time. Only the speed of sync execution is currently affected. We will update the status page once the issue is identified.
Report: "Sync Alerts missing as of last 2 days"
Last update: Sync Alert data has been restored. Sync alerting should work again for any syncs that trigger alerts moving forward.
As part of this incident, any sync runs that should have triggered an alert in the last 2 days would not have resulted in an email alert. Once we restore the alert data, any sync runs that fail will trigger an alert, even though the sync may have been failing for the last 2 days. In very rare cases, we may email you at most twice about a failure as we perform the recovery process.
Sync Alerts were deleted from our database in the last 2 days. The cause has been identified, and we are working on restoring the deleted alerts.
Report: "Partial Sync Outage"
Last update: The issue has been resolved.
A subset of the previously affected syncs are now hitting "No function matches the given name and argument types" errors. It's primarily limited to advanced sync engine mirror syncs with no deletes. Investigation continues.
A fix has been implemented and we are monitoring the results.
Syncs with empty unload batches may fail with either an "IO Error: No files found that match the pattern IGNORED_EMPTY_FILE" or a "Parquet Conversion Error". The issue has been identified and a fix is being released.
Report: "Increased Sync Failure Rate"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are seeing instances of Syncs failing with a message indicating "Parquet conversion failed." The problematic system has been identified and we are working to mitigate and resolve the issue as soon as possible.
Report: "Sync execution halted"
Last update: This incident has been resolved.
Sync success rates have returned to expected levels. There is some delay in sync execution as we work through the backlog that accumulated during the outage period. We are continuing to monitor.
We have identified the likely root cause and are deploying a fix at this time.
We have determined that the bug does not affect all syncs, nor does it cause records to be sent multiple times - this was a red herring. A high percentage of syncs are still failing, but there is not a correctness issue, so we are reenabling the sync engine globally so that syncs that are not affected by the issue can return to operation.
We have identified an issue which could cause records to be synced multiple times to destinations. Since this can affect the correctness of syncs for certain destinations, we have paused all syncs proactively until the issue can be diagnosed and resolved. We are continuing to investigate and are in the act of rolling back to a previous version of the sync engine.
Production sync execution has been paused while we investigate an issue
Report: "LinkedIn Syncs Failing"
Last update: This incident has been resolved.
We have fixed the issue. All LinkedIn syncs affected by the issue should succeed the next time they run.
LinkedIn syncs are failing because the API version used by Census has been deprecated. The issue has been identified, and we are bumping the API version to unblock syncs.
Report: "Elevated Error Rates Detected"
Last update: This incident has been resolved.
Mitigation has been deployed to aid in reducing failures from our upstream provider. We have seen a significant reduction in errors and are now monitoring the system.
We are currently seeing increased error rates in our US region.
Report: "[ROLLBACK OF] Modified logic for calculating Failed Records % could result in new Sync Alert emails"
Last update: This change has been rolled back to reduce noisy alerts for existing customers. Any alerts that were opened yesterday due to this change have been closed. Recovery emails will not be sent for the alerts opened yesterday. Failed record alerts will be triggered with the old logic moving forward.
Original update: To improve accuracy, we recently updated the calculation that powers the Failed Records % Sync Alert. This one-time correction could result in new alerts for those syncs where the correction applies. The new logic calculates the percent of failed records based on the number of changed records in a sync run, rather than the total number of records in the source.
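The difference between the two calculations, and why the new one could suddenly trip alert thresholds, can be sketched as follows. This is an illustrative reconstruction; the function names and figures are hypothetical, not Census's actual code:

```python
# Illustrative sketch of the two Failed Records % calculations described
# above. Names and numbers are hypothetical, not Census's actual code.

def failed_pct_old(failed: int, total_source_records: int) -> float:
    """Old logic: failures as a share of every record in the source."""
    return 100.0 * failed / total_source_records

def failed_pct_new(failed: int, changed_records: int) -> float:
    """New (since rolled-back) logic: failures as a share of changed records."""
    return 100.0 * failed / changed_records

# A sync run over 1,000,000 source records, of which 500 changed and 50 failed:
# the old logic reports 0.005%, the new logic reports 10%. That jump is why
# the recalculation could open alerts that had never fired before.
print(failed_pct_old(50, 1_000_000))  # 0.005
print(failed_pct_new(50, 500))        # 10.0
```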
Report: "Delays starting scheduled syncs"
Last update: We have confirmed that syncs are now running as scheduled.
We have identified the issue, which appears to have predominantly impacted syncs scheduled to run around 9am Eastern/6am Pacific. We have put in place a mitigation and will monitor sync runs running that time tomorrow to make sure the issue does not recur.
We are currently experiencing delays of up to 20 minutes starting scheduled syncs due to a downstream vendor dependency. This is especially pronounced for syncs that run at common times (e.g., around hour boundaries). We are working with the vendor to identify and fix the issue.
Report: "Sync runs failing"
Last update: This incident has been resolved.
We have identified a storage issue affecting a core database and applied a fix; error rates are decreasing.
A subset of sync runs are failing due to an infrastructure issue. We are investigating.
Report: "Brief downtime for unscheduled database maintenance"
Last update: This incident has been resolved.
We are performing unscheduled emergency maintenance on one of our primary databases. Users may see brief downtime of the Census UI and API. Syncs that fail due to this will be automatically retried.
Report: "Some Microsoft Advertising Connections Require Reauthorization"
Last update: Census has updated our integration with Microsoft Advertising (Bing Ads) in order to comply with new verified publisher requirements from Microsoft. Depending on how your company's Microsoft Advertising account is set up (as a Microsoft "personal account" or as an Entra / Azure "work account") and your organization's security policies, you may need to reauthorize your Microsoft Advertising connection in Census. To reauthorize, perform the following steps:
1. Sign in to Census at https://app.getcensus.com
2. Select the workspace that contains your Microsoft Advertising connections
3. Click "Destinations" under "Connect" in the navigation sidebar
4. Find your "Microsoft Advertising" destination and click "Reauthorize"
On authorization, you should see the Census logo with the verified publisher listed as "Sutro Labs" (the company that develops Census). You can use the "Test" button to verify that your connection is working properly. Please reach out to our support team or your account manager with any questions.
Report: "Some users unable to log in to Census UI"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Census management UI login is failing with "Verify Your Email" and "Something went wrong with your session" for some users. We are investigating.
Report: "Syncs not running"
Last update: This incident has been resolved.
A fix has been implemented and we're monitoring for further impact.
We are investigating the issue.
Report: "UI slowdown across several areas"
Last update: This incident has been resolved.
A previous version of the job processor has been deployed and these tasks are once again being completed. We are continuing to monitor.
We have identified a faulty deploy and we are rolling it back.
We have identified slowdowns in the Census UI associated with several areas:
* Model Previews
* Metadata Refreshes
* Sync Alerts
These are related to a slowdown in our job processing engine. At this time, our metrics do not show that sync performance or scheduling is affected, but we are investigating these as well.
Report: "Census UI not loading for some users"
Last update: This incident has been resolved.
The fix has been deployed. We are monitoring to make sure we are no longer seeing the issue.
We have identified an issue causing pages for large workspaces to fail to render and 0.5% of other requests to fail. We are rolling out a fix.
We are currently investigating this issue.
Report: "Records not being sent to destination"
Last update:

## Incident Summary

From Nov 28, 2023 until Feb 23, 2024 the Census Sync Engine contained a timing bug which could cause syncs to mark records as successfully synced even though they had not been sent over to the destination. The bug impacted 0.012% of sync runs during the incident and was patched Feb 23, 2024. We will be reaching out to impacted customers with steps to remediate impacted syncs. In most cases running a full sync restores correct record tracking.

## Incident Details

### Background

The Census Sync Engine runs syncs as a workflow of multiple discrete activities: sync preflight, unload, service load, commit, etc. Historically these activities would run to completion on a single host before scheduling the next one. On Nov 28, 2023 our team introduced a change, referred to from here on as *asynchronous activities*, which allows an activity to suspend itself after issuing a query, or a set of queries, to the warehouse via our Query Runner Service. Since certain warehouse queries may take many minutes, this allows for much more efficient utilization of our worker fleet: we can pipeline other activities while waiting for warehouse queries to complete. This pattern is heavily utilized in our unload activity.

### Initial Report and Discovery

On February 13, 2024 a customer reported seeing records marked as synced in the UI which could not be found in the sync's destination service. Our initial investigation seemed to suggest that the query we use to unload data from the warehouse was not producing any files in the cloud storage system (this customer was using our [Advanced Sync Engine](https://docs.getcensus.com/sources/overview)). After adding additional telemetry to track down the cause of the failed unload, the team discovered that the unload queries were actually never being issued to the warehouse, because the entire query set they were a part of was being cancelled by the Query Runner Service.

### Root Cause Analysis

The cause of the query cancellation was a timing bug between two modules of the Query Runner Service: one that supports asynchronous activities, and the query garbage collector [1]. The code that added asynchronous queries to the query execution queue would also mark these queries as ineligible for garbage collection. These two calls did not happen atomically or in a protected block, however. This meant that under periods of high load in the Query Runner Service, which makes extensive use of multi-threading, it was possible for the garbage collector to cancel an asynchronous query before it was opted out of garbage collection. This would occur when all of the following were true:

* The asynchronous activity thread was paused by the thread scheduler after adding the query to the execution list but before adding it to the garbage collection exclusion list.
* The query took longer than one minute to execute.
* The garbage collector was scheduled to run before the asynchronous activity thread was resumed.

## Impact

The bug impacted 0.012% of all runs and 0.026% of runs with row changes, but had selection effects that made it more likely for certain customers to be impacted:

* The bug only affected syncs on the Advanced Sync Engine.
* Customers with slow or congested warehouses were more likely to be impacted, since the longer queries ran, the more likely they were to be garbage collected.
* Customers who run many similar syncs on the exact same schedule were also more likely to be impacted. These syncs were more likely to issue asynchronous queries at the same time, increasing the load on the Query Runner Service and the odds of one of them being selected for garbage collection.

## Remediation

Our team has rolled out a fix for the timing issue to prevent further occurrences. In addition, we are putting in place additional safety checks throughout the sync pipeline.

While this particular bug was subtle, its effects could easily have been detected by a simple invariant: ensuring that the number of records we unloaded was consistent with the count inside the warehouse. We take our responsibility as stewards of our customers' data seriously, and while we strive to deliver that data as quickly and efficiently as possible, we value correctness above all else. In this case we failed to deliver on that promise, and we will be reaching out to impacted customers to offer our full support with remediation options. In most cases running a full sync of the data is sufficient, but we'll work with customers for cases where that's not possible or desirable. If you have any questions about any of the above details, don't hesitate to reach out to your Census representative or to [support@getcensus.com](mailto:support@getcensus.com).

[1] Query garbage collection exists in the Query Runner Service to facilitate other query modes: synchronous and polled. It ensures that we're not running queries that are no longer of interest to the requester.
Some Advanced Sync Engine syncs are showing successfully synced records that aren't visible in the destination.
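The race described in the root cause analysis above, and its fix, can be sketched in a few lines. This is a minimal hypothetical reconstruction, not Census's actual Query Runner code:

```python
# Illustrative sketch (not Census's actual code) of the timing bug described
# above: enqueueing a query and exempting it from garbage collection were two
# separate, non-atomic steps, so a GC pass scheduled between them could cancel
# the query. The fix performs both steps under one lock.
import threading

class QueryRunner:
    def __init__(self):
        self._lock = threading.Lock()
        self.execution_queue = []   # queries waiting to run
        self.gc_exempt = set()      # queries opted out of garbage collection

    def submit_buggy(self, query_id):
        # BUG: the thread can be descheduled between these two calls,
        # leaving the enqueued query visible to the garbage collector.
        self.execution_queue.append(query_id)
        self.gc_exempt.add(query_id)

    def submit_fixed(self, query_id):
        # FIX: enqueue and GC-exemption happen atomically.
        with self._lock:
            self.execution_queue.append(query_id)
            self.gc_exempt.add(query_id)

    def garbage_collect(self):
        # Cancels any queued query that is not explicitly exempt.
        with self._lock:
            cancelled = [q for q in self.execution_queue if q not in self.gc_exempt]
            self.execution_queue = [q for q in self.execution_queue if q in self.gc_exempt]
            return cancelled
```

With `submit_fixed`, a garbage collection pass can never observe a query that is enqueued but not yet exempt, which closes the window that caused the silent cancellations.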
Report: "Syncs scheduled with a cron expression are run immediately"
Last update: We have resolved this issue.
Syncs that have a cron schedule (either at sync creation time, or edited to change to a cron schedule) are being run immediately upon edit / creation, irrespective of their scheduled time. This applies to all syncs created since Feb 28, at approximately 22:00 UTC. Our team has identified the issue and we are rolling out a fix.
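The guard this bug bypassed can be illustrated with a small sketch: a newly created or edited sync should only run once its next scheduled slot arrives, never immediately on save. This is a hypothetical stand-in; a real implementation would parse the full cron expression, while here a simple daily "HH:MM" schedule serves as the example:

```python
# Hypothetical sketch of schedule gating. Function names and the simplified
# daily "HH:MM" schedule are illustrative, not Census's actual scheduler.
from datetime import datetime, timedelta

def next_run_at(schedule_hhmm: str, now: datetime) -> datetime:
    """Next occurrence of a daily HH:MM schedule at or after `now`."""
    hour, minute = map(int, schedule_hhmm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate

def should_run(schedule_hhmm: str, created_at: datetime, now: datetime) -> bool:
    """Run only once `now` reaches the first scheduled slot after creation."""
    return now >= next_run_at(schedule_hhmm, created_at)

# A sync created at 08:30 with a 09:00 daily schedule must not run on save:
created = datetime(2025, 3, 1, 8, 30)
print(should_run("09:00", created, created))                     # False
print(should_run("09:00", created, datetime(2025, 3, 1, 9, 0)))  # True
```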
Report: "Facebook Ads Destination Partial Outage"
Last update: This incident has been resolved.
This issue has been resolved. Impacted connections are healthy again.
We are continuing to investigate this issue.
Some customers are reporting that some Facebook Ads destination connections are not working. We are currently investigating this issue.
Report: "Slow Sync Starts"
Last update: This incident has been resolved.
We have applied a fix and we are continuing to monitor.
Census is experiencing increased latency starting syncs. We are investigating.
Report: "Census Web Increased Latency and Increased Error Rate"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Duplicate Sync Runs In UI"
Last update: This incident has been resolved.
We are no longer attempting to create duplicate Sync Runs.
We are currently displaying two triggered Sync Run instances for every actual triggered Sync that has been recently created. The duplicate run is getting marked as Canceled. This is a display and notification bug only at this time, and Syncs are not actually running twice. We have found the root cause and are working to mitigate.
Report: "Google Ads Customer Match List outage"
Last update: This incident has been resolved. If your Google Ads Customer Match List sync has failed as a result of this incident please retry manually.
The fix is in production, reported syncs have been manually rerun successfully.
Root cause has been identified and a revert has been deployed.
Affected syncs are failing with the message "An internal error occurred while loading data into destination service. Cannot sync to list because type is not CrmBasedUserList."
We are currently investigating this issue.
Report: "Sync Engine Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Upgrading incident to full outage. Syncs currently only succeed if they have a minimal set of updates to track.
Some syncs are failing during our Service Load phase. We are actively investigating the issue.
Report: "HubSpot Connector Outage"
Last update: This incident has been resolved.
We have applied a fix and we are monitoring the health of HubSpot connections.
We have identified the issue and are working on a fix.
Syncs to HubSpot are currently failing due to a Cloudflare issue. The team is investigating now.
Report: "Syncs Failing to Schedule / Run"
Last update: This incident has been resolved.
We are still observing elevated delays in sync start times due to the backlog of syncs; we will update this incident once those return to normal levels.
All scheduled syncs are running as usual, though they may take slightly more time to complete as we add additional sync processing resources to clear the backlog.
We have reverted a configuration change that led to the issue and we are monitoring to ensure syncs are running as scheduled.
We are investigating an issue that is causing a large percentage of syncs to not be scheduled or executed.
Report: "Connectivity issue to HubSpot API"
Last update: This incident has been resolved.
We have updated our HubSpot API configuration and sync behavior is returning to normal. We are working with HubSpot on a resolution of the root cause.
Census is investigating an issue connecting to the HubSpot API. All syncs to HubSpot are impacted.
Report: "Census Experiencing Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue affecting the Census Sync Management UI and Sync Engine.
Report: "Snowflake sources performing unnecessary full syncs"
Last update: The advanced sync engine state has been repaired and all syncs are unpaused. If you have questions about a particular sync's state or history please reach out to your customer success team or support@getcensus.com.
We are unpausing all of the affected syncs and they will run on their normal schedules. We are still investigating an issue with detection of deleted rows for "Mirror" syncs - delete propagation to destination systems may be delayed. We have a plan in place to restore delete propagation and we'll update here when in place.
We've finished repairing the Snowflake syncs, and have re-enabled the Snowflake syncs that we paused. We'll post more updates as we finish repairing Postgres and Redshift syncs.
We've expanded the pauses to Postgres and Redshift syncs. We're repairing data for all affected syncs and will provide further updates soon.
We have administratively paused all affected Snowflake syncs while we repair the data in the advanced sync engine to ensure no more unnecessary full syncs occur.
We are investigating an issue that has impacted Snowflake syncs. Syncs from Snowflake using the Advanced Sync Engine are sending records that have already been synced (acting like full syncs).
Report: "Issues with BigQuery using Advanced Sync Engine"
Last update: This incident has been resolved.
Syncs are functioning correctly. We have identified an issue with warehouse writeback that we are investigating as well.
We have applied a fix for this issue and we are continuing to monitor the health of these syncs.
We have identified additional issues with the Census BigQuery connector causing a portion of syncs to fail.
Report: "Syncs from BigQuery Tables and Models Failing"
Last update: This incident has been resolved.
We have identified a regression causing syncs from BigQuery sources (tables and models) to fail. A fix is being applied.
Report: "Unable to Authenticate"
Last update: The incident is resolved. The underlying cause was an outage at Cloudflare impacting our authentication service. We'll be monitoring for any changes to this status.
We are currently investigating a problem users are having trying to log in to Census. It appears our authentication provider is down.
Report: "Elevated Rate of Sync Failures"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified. All syncs have been paused until it is resolved.
We are continuing to investigate this issue.
We are currently investigating the issue.
Report: "Elevated Rate of Sync Failures"
Last update: This incident has been resolved.
We have put a fix in place and are monitoring syncs to ensure they complete successfully.
We have identified an issue that is causing an elevated rate of sync failures and are deploying a fix
Report: "Sync runs not starting"
Last update: This incident has been resolved.
We've kicked off rework of failed jobs from this incident. Things should be running nominally, and we'll continue to monitor.
We've identified the issue impacting < 10% of our sync runs. The problem has been fixed, and now we're working to get those bad syncs re-run.
We are investigating an issue that is causing syncs not to run.
Report: "Census Web UI Unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to see issues with the Census web UI and our team is taking steps to restore service.
Some syncs are affected by the issue as well; we are monitoring those and will ensure they are retried as needed
The UI has been restored though performance is still degraded; we are continuing to monitor and fix the underlying issue
We are applying a fix for this issue.
The Census management UI is currently unavailable. We are investigating.