Historical record of incidents for Vitally
Report: "External vitally.io Site Down"
Last update: Our vendor has resolved this and the page loads on refresh.
Our marketing vendor is experiencing an outage (https://status.webflow.com) which is causing `vitally.io` to not load. In order to log in, you can navigate directly to https://login.vitally.io. No Vitally product experiences are currently impacted.
Report: "External vitally.io Site Down"
Last updateOur vendor has resolved this and the page loads on refresh
Our marketing vendor is experiencing an outage (https://status.webflow.com) which is causing `vitally.io` to not load.In order to login, you can directly navigate to https://login.vitally.io. No Vitally Product experiences are currently impacted
Report: "Feature flags not evaluating, 360s reverting for some users"
Last update: LaunchDarkly has resolved the incident affecting our feature flags. Access to the new 360s, calendar views, and meetings should be fully operational.
Our feature flag provider, LaunchDarkly, has identified the issue. Access to the new 360s, calendar views, and meetings should now be restored for users. We are continuing to monitor the situation to ensure stability.
Our feature flag provider, LaunchDarkly, is experiencing an incident which is causing feature flags not to evaluate for some users. Users who have switched to the new 360 may see a regression to the old 360. Additionally, calendar views and meetings may be unavailable.
Report: "Feature flags not evaluating, 360s reverting for some users"
Last updateLaunchDarkly has resolved the incident affecting our feature flags. Access to the new 360s, calendar views, and meetings should be fully operational.
Our feature flag provider, LaunchDarkly, has identified the issue. Access to the new 360s, calendar views, and meetings should now be restored for users. We are continuing to monitor the situation to ensure stability.
Our feature flag provider, LaunchDarkly, is experiencing an incident which is causing feature flags not to evaluate for some users. Users who have switched to the new 360 may see a regression to the old 360. Additionally, calendar views and meetings may be unavailable.
Report: "CSV Upload Errors"
Last update: We have not seen errors since shipping a fix yesterday and consider this incident resolved.
The fix has shipped to restore CSV upload ability and we are monitoring for additional errors.
We've identified why CSV upload requests are failing and are preparing a fix now.
We are seeing a high rate of CSV uploads failing and are investigating the root cause now.
Report: "Elevated Server Response Times"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are seeing elevated server response times which can result in a slow experience using Vitally. We are investigating the root cause.
Report: "Degraded Search Performance"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Playbooks have been resumed. We are monitoring as the system recovers from delays.
Search clusters are currently reindexing. Playbooks are paused until complete. Analytics tracks are not currently being ingested.
The issue has been identified and a fix is being implemented.
Certain searches in the product will be significantly slower. Some searches may error. We are currently investigating this issue.
Report: "Analytics API Degraded Performance in EU"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Unexpected playbook behavior"
Last update: This incident has been resolved. Please contact support if you see any remaining issues.
At approximately 1:30pm ET Thursday, March 6th, we shipped a bug to production that caused all “is not set” filters to evaluate to “false”. This caused a number of systemic issues:
- Lifecycle tracking rules may have been incorrectly applied.
- Many views and widgets were not showing accurate data.
- Playbooks with an “is not set” rule would have all of their enrollees unmatched and undone.

At 4pm ET, our Support team escalated widespread reports of these issues and an incident was declared. The team identified the pull request with the bug and reverted it. We also paused playbook execution pre-revert to evaluate impact. That revert finished at 5pm ET, at which point views & widgets started working again. At that point, the team also forced a reprocess of lifecycle tracking rules, which corrected incorrectly tracked accounts and organizations.

However, re-enrolling entities into playbooks was potentially dangerous, so the team started writing a restore job to reset playbooks to their prior place within the playbook run and redo actions wherever possible. Writing and testing that job took until about 10pm ET Friday, March 7th. We have since run those jobs for all affected playbooks. As things stand now, we have restored all undone playbooks to the best of our ability.

For playbooks with an “is not set” rule, enrollees would have seen the following action behavior:
- Assign a Key Role: *if* the assignment strategy is “unassign”, affected enrollees would have been unassigned. Those have been restored.
- Create an Indicator: would have ended the indicator. Those have been restored.
- Add to Segment: would have exited the segment. Those have been added back into the segment.
- Start a conversation, reply to a conversation: *if* the undo strategy is “delete if unset” *and* messages had been created but not yet sent, they would have been deleted. This is unrecoverable.
- Create a task: *if* the undo strategy is “delete the task”, tasks would have been deleted. Those have been restored.
- Update a task: *if* the undo strategy is “revert the updates”, task updates would have been reverted. Those have been restored.
- Create a project: *if* the undo strategy is “delete the project”, projects and incomplete tasks would have been deleted. Those have been restored.
- Create a doc: *if* the undo strategy is “delete the doc”, docs would have been deleted. Those have been restored.
- Set a goal: *if* the undo strategy is “delete the goal”, goals would have been deleted. Those have been restored.
- Update a trait: *if* the undo strategy is “delete the trait’s value”, traits would have been set to null. Those values have been restored.
- Send a segment track: unaffected.
- Ping a webhook: unaffected.
- Show an NPS survey: NPS surveys would not have been shown to enrollees previously matching affected playbooks until they were restored.

However, there may have been second-order effects throughout the system that are difficult to identify. If a playbook responsible for maintaining customer segmentation had an “is not set” filter, it would have removed customers from segments; if a second playbook had filters on those segments, it may have also run incorrectly. All playbooks have since been re-run and should be reset to the most accurate state we’re able to attain right now. If you still see systemic issues with any of your playbooks, please report them to our Support team via in-app chat or at support@vitally.io.
We have a dedicated internal channel for these issues and will work to the best of our ability to minimize the impact on your business and customers. We deeply apologize for the impact this has caused you in managing your customer relationships and will continue to dedicate resources to ensure continuity of your business operations.
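For readers interested in the mechanics, here is a minimal, hypothetical sketch of how a broken “is not set” predicate can unenroll playbook targets and trigger undo behavior. This is not Vitally's actual code; the evaluator, field names, and structure are assumptions for illustration only.

```python
# Hypothetical sketch only -- not Vitally's implementation.
# Models how an "is not set" filter that always evaluates to false
# causes previously matching enrollees to fall out of a playbook.

def is_not_set(value, buggy=False):
    """Return True when a trait has no value. The buggy variant models the
    regression described above: it evaluates to False for every enrollee."""
    if buggy:
        return False  # the shipped bug: every "is not set" filter fails
    return value is None or value == ""

def matches(account, filters, buggy=False):
    """An account stays enrolled only while every filter matches."""
    return all(
        is_not_set(account.get(field), buggy) if op == "is not set" else True
        for field, op in filters
    )

account = {"churn_reason": None}            # trait genuinely not set
filters = [("churn_reason", "is not set")]  # playbook enrollment rule

print(matches(account, filters))              # True  -> stays enrolled
print(matches(account, filters, buggy=True))  # False -> unenrolled; undo actions fire
```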
We are continuing to work towards recovery. At this point, we have restored data for most misapplied playbooks and are continuing to work towards resolving the remaining ones. Once we finish this portion of the incident response, we will be resuming all playbook execution. We will continue to do everything possible to expedite the resolution of this incident.
All playbooks remain paused while we work to correct the state of the data, so that resuming playbook execution only partially does not propagate unintended effects. The team is working through this and apologizes for the disruption to your workflows.
The incident has been isolated to playbooks that contain an “is not set” filter. Playbooks that do not contain an “is not set” filter have been resumed. Playbooks that contain an “is not set” filter remain paused. Targets in playbooks with an “is not set” filter were incorrectly removed and may have had actions undone. We are working to restore those playbooks to the state they were in before the bug shipped, after which we will resume execution.
Due to a bug, we are experiencing unexpected playbook behavior. We have identified the root cause and are currently working to correct these unexpected behaviors. Playbooks are paused in the meantime.
Report: "Degraded performance in US"
Last update: Due to the database maintenance we performed on Feb 15, our database was struggling to handle operational load. We rolled back some of those changes and have prioritized work to optimize our database usage to prevent incidents like this in the future.
We have mitigated the issue, though background jobs may experience delays.
Vitally is experiencing an unusual amount of server timeouts. We are looking into the incident.
Report: "Degraded performance in US"
Last update: We have identified and resolved the cause of the deadlocking which caused increased latencies for customers in our Virginia data center, and now consider the incident resolved.
We have identified the cause of increased latencies and are in the process of mitigating the issue. We do expect some background job delays in the meantime
Vitally is actively monitoring performance following some database maintenance in one of our US data centers
Report: "Elevated Error Rates in EU"
Last update: We have seen error rates return to our baseline levels. AWS's most recent update indicates successful mitigation. We will continue to monitor but are not expecting impact to business-hours operations in our EU data center.
AWS has just provided an update, though we are still seeing an elevated error rate. Thank you for your patience as our infrastructure provider addresses their issue. AWS update, Feb 13 4:31 PM PST: "We are seeing early signs of recovery for error rates and latencies for multiple AWS Services in the EU-NORTH-1 Region. As we work toward full recovery, some requests may continue to timeout or be throttled. We recommend customers retry failed requests where possible. We will continue to provide additional information as we have it, or within the next 60 minutes."
AWS has provided an update ~2 minutes ago. We can confirm that the networking issues within the eu-north-1 region persist and affect connectivity to the Vitally web app as well as Vitally's ability to receive analytics data via our APIs. AWS update, Feb 13 3:55 PM PST: "We can confirm increased error rates and latencies for multiple AWS Services in the EU-NORTH-1 Region. This is due to a networking issue that we are actively working to mitigate as quickly as possible, and have all engineers engaged at this time. This issue is not impacting connectivity to and from the region, but is impacting inter-region traffic. During this time, the bulk of the impact will be contained to eun1-az3, but some individual services and operations may be impacted in other zones in the EU-NORTH-1 Region. If possible, we recommend weighting traffic away from AZ eun1-az3. We will provide an additional update within the next 45 minutes, or sooner if we have additional information to share."
AWS has acknowledged their outage (https://health.aws.amazon.com/health/status): Operational issue - Multiple services (Stockholm): Increased Error Rates and Latencies. AWS update, Feb 13 3:35 PM PST: "We are investigating increased error rates and latencies for AWS Services in the EU-NORTH-1 Region."
We are seeing elevated error rates in requests and general operations in our EU data center. This appears to be an issue internal to AWS but we are monitoring
Report: "Dashboards are currently broken"
Last update: This incident has been resolved, but we will continue to monitor closely.
A fix is being deployed and is expected to be live in 10-15 minutes.
We are investigating an issue where dashboards are erroring. We have identified the issue and service should be restored shortly.
Report: "Playbook Execution Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our playbook execution infrastructure is experiencing an unusually high number of jobs in our EU data centers, causing playbook executions to be delayed up to several hours. The delays started approximately 8 hours ago. No data has been lost, and playbook execution is almost caught up.
Report: "Login page loading errors"
Last update: Starting at 12:50 PM EST, we experienced errors when loading the login page. This has now been resolved.
Report: "External doc sharing errors"
Last update: This incident has been resolved.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is rolling out now.
We are currently investigating an issue with external docs shared on Vitally not loading correctly. The core product remains available.
Report: "Elevated Analytics API Errors"
Last update: We experienced an elevated level of Analytics API errors in our EU region for less than 10 minutes, from 16:45 to 16:53 UTC. Some requests to the Analytics API failed with a 504 error during this time. The error rate has since recovered. We have identified the issue and are working to mitigate it moving forward.
Report: "User-Level Filtering Error"
Last update: This incident has been resolved.
A fix has been implemented and filtering is now restored.
We are continuing to investigate this issue.
We have identified a problem with user-level filters in the Vitally product and are working on a fix.
Report: "Login Failures"
Last update: This incident has been resolved.
A fix has been implemented, and we are monitoring the results.
We have identified a problem with login failures and are working on a fix
Report: "Analytics API Data Processing Delays"
Last update: This incident has been resolved.
A small portion of Vitally Analytics API requests failed due to an unhealthy Kafka broker, returning an error response to clients. We are now recovering from residual data delays in consumption of inbound Analytics API data.
Report: "Degraded Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Vitally is currently experiencing degraded performance in one of our US data centers causing longer response times. We are actively investigating.
Report: "Degraded Performance"
Last update: A fix was deployed over the weekend and we are no longer seeing increased latencies in our ongoing operations.
Vitally is currently experiencing degraded performance in one of our US datacenters causing longer response times. We are actively investigating.
Report: "Degraded performance"
Last update: This incident has been resolved.
Response times have returned to normal, and we are no longer seeing degraded performance.
Vitally is currently experiencing degraded performance in one of our US datacenters. We are actively investigating. We have implemented a partial fix and response times are improving. We will continue to update as we make progress.
Report: "Intercom Data Delays"
Last update: Delays in processing Intercom data are now resolved.
We have found the cause of the delays, and have implemented a mitigation. We will continue to monitor closely as the remaining data is processed.
We are aware of delays in processing Intercom data received by Vitally through webhooks and are working towards a solution
Report: "Vitally Not Loading"
Last update: We have resolved the issue impacting page loads.
We are investigating an issue preventing users from loading the Vitally web app
Report: "Delay in Custom Metrics"
Last update: This incident has been resolved.
Our overnight data pipeline for computing Custom Metrics encountered a stall. We have mitigated this and are monitoring as we recover. In the meantime, you may experience a delay in accessing the most recent Custom Metric data.
Report: "Search Unavailable"
Last update: Search in Vitally was unavailable for approximately 20 minutes, between 3:45 and 4:05 PM Eastern Time. We apologize for the inconvenience.
Report: "Reports of the Vitally app not loading on Google Chrome"
Last update: We have not received additional reports of this issue, and as a next step we are implementing additional instrumentation to help us identify and troubleshoot any other instances of this happening. If you do experience problems with Vitally loading in Chrome, a workaround is to close your current tab and reopen Vitally in a new tab. Additionally, please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties.
We are continuing to investigate this issue. Based on our investigations so far, if you are encountering slow load times when loading Vitally on Google Chrome, the best workaround is to close the current tab and open a new tab. (Note: This supersedes the previous guidance to update Google Chrome.) Please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties, and we'll continue to update our status page as we have updates.
We have received several reports this week of the Vitally app not loading on Google Chrome browsers. We are investigating the root cause of this issue, but in the meantime one workaround is to update Google Chrome to its latest version. Here is an article on how to do this: https://support.google.com/chrome/answer/95414?sjid=18099954259407209262-NC. Please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties, and we'll update our status page as we have updates.
Report: "Elevated error rates"
Last update: This incident has been resolved.
We have resolved the root cause of the errors and will continue to monitor.
We are currently investigating an issue with access to Vitally.io and our APIs.
Report: "Site slowness"
Last update: Our systems are functioning as expected.
Maintenance has completed. Vitally was unavailable for approximately 2 minutes between 01:17 ET and 01:19 ET (06:17 UTC and 06:19 UTC). We will continue to monitor performance.
Maintenance is beginning now in the US.
Vitally’s primary US database is under increased load leading to general slowness. Some integration data is delayed as a result. We will be performing a brief database maintenance at 05:00 UTC (00:00 ET) tonight to increase database capacity in our US datacenter. Vitally’s platform and API will be unavailable for up to 10 minutes.
Report: "DB Failover"
Last update: We experienced a database failover overnight, causing a period of 3 minutes between 1:37 AM and 1:40 AM Eastern in which web requests against Vitally would have failed. We do not expect lingering impact on users.
Report: "Duplicate Background Jobs"
Last update: We have prioritized the delayed workstreams that were impacted by this incident. They continue to catch up without issue, and we have not observed any additional degradation. This incident is now resolved.
Between 1am and 12pm US Eastern Time on 10/12, Vitally's background jobs system experienced sporadic failures. These failures caused individual background job runners to degrade and have their work marked as failed so that another job runner would pick up the slack. Unfortunately, the original runner would eventually complete the work that had been marked as failed, in addition to another runner starting the same work, causing duplicative downstream impacts.

This surfaced to customers in a few ways, such as multiple emails to reconnect an integration being sent when only one was needed, some delayed and duplicated integration syncs, and potential duplicative pushes of updated data to integrations. This would not have impacted execution of playbooks: conversations, tasks, and projects created by playbooks would remain singular and were not impacted by this specific cluster's outage, so we do not expect your customers to have been impacted by this incident.

We've identified the workload that caused the instability in our background clusters and have mitigated it as of 12pm US Eastern Time today. We are catching up on our delayed workstreams and will dedicate resources to preventing this in the future.
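As a general illustration (an assumption-laden sketch, not a description of Vitally's internal job system), one common safeguard against this failure mode is to key each unit of work with an idempotency token, so that a second runner completing the same logical job becomes a no-op:

```python
# Generic idempotency-key sketch; job names and the in-memory store are
# hypothetical and not a description of Vitally's internals.
import threading

_completed = set()
_lock = threading.Lock()

def run_once(idempotency_key, action):
    """Execute `action` only if no runner has already claimed this key.
    A production system would persist keys and handle failures after a
    claim; this sketch only shows the deduplication idea."""
    with _lock:
        if idempotency_key in _completed:
            return "skipped (duplicate)"
        _completed.add(idempotency_key)
    action()
    return "executed"

def send_reconnect_email():
    print("sending reconnect email")

# Two runners racing on the same logical job: only one email is sent.
print(run_once("reconnect-email:integration-42", send_reconnect_email))
print(run_once("reconnect-email:integration-42", send_reconnect_email))
```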
Report: "Analytics Ingestion Delayed"
Last update: This issue has been resolved and all systems are functioning normally.
Most of our customers are no longer facing data ingestion delays. We will continue to monitor the ones which are. We are also monitoring delays in computing success metrics which are impacted by the degraded read replica.
The issue has been identified as a failed launch of a new read replica. We've taken action to remove it and are monitoring as our old replica catches back up to real time.
We have identified degraded DB performance reducing throughput of data consumption in our analytics pipeline. We are investigating now
Report: "Elevated error rates"
Last update: This incident has been resolved.
Error rates have returned to normal and we are continuing to monitor.
We have identified the cause and fixed the underlying issue; error rates are returning to normal.
We are currently investigating elevated error rates for our APIs and application.
Report: "Success Metric Calculation Delays"
Last update: This incident has been resolved and Success Metrics are calculating.
We've identified and mitigated the root cause of the Success Metrics delays and are currently backfilling any metrics that were missed. We'll continue to backfill and monitor, and will update once fully resolved.
We are continuing to investigate delays in calculating Success Metrics.
We are currently investigating delays in calculating Success Metrics
Report: "Integration data delays (Gmail, Stripe, Intercom, Recurly, Chargebee)"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating delays with syncing data from some integrations. This is limited to impacting Gmail, Stripe, Intercom, Recurly, and Chargebee integrations.
Report: "Data processing delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Slowness and elevated error rates"
Last update: This incident has been resolved.
Performance has normalized and we're continuing to monitor.
We're investigating application slowness and elevated error rates
Report: "Analytics API Data Processing Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Data from the Vitally Analytics API is currently delayed in updating in Vitally. We are actively investigating this issue.
Report: "Elevated error rates"
Last update: This incident has been resolved.
We've implemented a fix and are continuing to monitor.
We have identified and are currently working on mitigating a database issue leading to an elevated error rate for writes to the database.
Report: "Elevated error rates & latency"
Last update: This incident has been resolved. We are continuing to monitor for any residual issues.
We are continuing to see elevated error rates for writes to the database. We are actively working on resolving this incident.
We have identified and are currently working on mitigating a database issue leading to an elevated error rate for writes to the database, as well as elevated latency in the Vitally application.
Report: "Analytics API Data Processing Delays"
Last update: This incident is resolved.
We have observed improved stability in inbound data processing and are continuing to monitor the situation. There may be residual delays as we process a backlog of data.
Data from the Vitally Analytics API is currently delayed in updating in Vitally. We are actively investigating this issue.
Report: "Elevated Error Rates and Latency in the Vitally App"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're experiencing an elevated level of latency and errors and are currently looking into the issue.
Report: "Elevated 500 errors and data delays"
Last update: From Sunday, March 12th at 02:35 UTC through Tuesday, March 15th at 21:06 UTC (5:06 PM ET), Vitally experienced an issue with our primary PostgreSQL database that caused writes to the database to fail at an elevated rate. Those write failures impacted our users directly by causing some actions that modify data in the Vitally UI to sporadically fail. This impacted most user workflows, including but not limited to: taking notes, starting conversations, creating tasks and projects, and using the REST API. Additionally, background processing of data was degraded due to the write failures. Integration syncs, analytics data processing, playbooks, success metric calculation, and more were all affected.

The root cause of the incident was a multi-terabyte database table containing event data that became too large for routine database maintenance to complete in a timely manner. Some background: PostgreSQL requires periodic "vacuuming" of tables for general system health and to prevent what is called "transaction ID wraparound", an error state that causes the database to completely shut down for maintenance. That process had not successfully completed for our largest database table in over a week, which caused resource exhaustion in the "multixact" table, a PostgreSQL-internal system table, which in turn caused writes to the database to intermittently fail.

The engineering team had been aware of the risk of transaction ID wraparound for some time and had been working to prevent this from happening. However, despite our awareness and monitoring of the risk, we had not been aware of an _**earlier**_ threshold, exhaustion of the multixact table, which we had less visibility into. This was the error that started occurring on Sunday and is the reason that only database **writes** were affected.

Before this incident, we had multiple projects in progress targeted at removing this risk. A scheduled database configuration change to make our vacuum config more aggressive went out three hours into the start of the errors. Unfortunately, that configuration change also had the side effect of interrupting the ongoing vacuum process and restarting the clock on maintenance. Additionally, we had made significant progress on a migration of all that data to a new home, which we were able to leverage during the incident by completing the cut-over on an accelerated timeline.

Once our database entered this error state, the only way to recover was to complete the vacuum process, which we knew could take days. Our incident response team, which grew to half the engineering team, including the CTO and CEO, focused on three separate workstreams:
1. Finding any way we could to get the VACUUM process to complete more quickly.
2. Completing the cut-over to a new home for analytics data, which we completed on Monday.
3. Mitigating the implications of write failures across the system through adding retries, shifting system load, and attempting to reduce the load on the multixact table for the most critical user workflows, like creating notes.

Ultimately, we were able to complete an accelerated VACUUM process on Tuesday, March 15th at 21:06 UTC (5:06 PM EDT). Errors immediately subsided, and the team turned their attention to ensuring every part of the system was working well, including catching up job and integration queues and ensuring that the risk of recurrence was low.
Because of the migration on Monday, we are no longer writing to the problematic table and therefore are no longer at risk of recurrence of this specific issue. That said, we learned many lessons during this incident, and the engineering team is hard at work making the system more resilient, putting in place additional monitoring, and adjusting database configuration to prevent an incident like this from ever recurring. We sincerely apologize for the impact of this issue. System availability and stability are job #1 for the engineering team at Vitally, and we understand the gravity of a disruption like this to your ability to do your jobs. We are working hard to ensure that Vitally comes out of this incident even more resilient.
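For readers who want to watch for this class of risk in their own PostgreSQL databases, the sketch below shows a generic way to monitor per-database transaction ID and multixact age using standard PostgreSQL functions. The connection string and alert threshold are placeholder assumptions; this is not Vitally's monitoring tooling.

```python
# Generic PostgreSQL wraparound/multixact age check; connection details and
# the alert threshold are placeholders, not Vitally's configuration.
import psycopg2

# Vacuum must freeze old tuples well before ages approach ~2 billion,
# where PostgreSQL's wraparound protection forces a shutdown for maintenance.
ALERT_THRESHOLD = 1_000_000_000

conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT datname,
               age(datfrozenxid)    AS xid_age,
               mxid_age(datminmxid) AS multixact_age
        FROM pg_database
        ORDER BY xid_age DESC;
        """
    )
    for datname, xid_age, multixact_age in cur.fetchall():
        status = "ALERT" if max(xid_age, multixact_age) > ALERT_THRESHOLD else "ok"
        print(f"{datname}: xid_age={xid_age} multixact_age={multixact_age} [{status}]")
conn.close()
```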
This incident is now resolved. Database maintenance has completed and errors have returned to normal levels. Integrations and analytics data as well as other asynchronous operations may continue to be delayed as our system catches up, but we anticipate these delays to be resolved soon.
Database maintenance has completed and error levels have returned to normal. Integrations, analytics data, and other asynchronous operations may continue to be delayed as the system catches up. We are continuing to monitor closely.
We are continuing to work on a fix for this issue.
Database maintenance continues to progress and is still expected to complete today.
Database maintenance is ongoing and we expect it to complete at some point today. We continue to ship mitigations to improve the success rate of integration syncs and user and API requests.
Database maintenance is ongoing and we are continuing to work on mitigations.
Database maintenance is ongoing and we are continuing to work on mitigations.
We are performing database maintenance which we expect to resolve elevated error rates when performing writes. Concurrently, we are actively working on multiple mitigations to reduce error rates at the application layer. We sincerely apologize for the inconvenience and are working hard to fix this.
We are continuing to investigate and remediate elevated 500 errors across the app. Additionally, integrations data may temporarily be delayed.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue
Report: "Performance degraded"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Site and API performance is degraded. We are investigating and resolving.
Report: "Elevated latency and 500 errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue
Report: "Elevated latency and 500 errors"
Last update: This incident has been resolved.
This issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Degraded performance across Vitally REST and integrations APIs"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Vitally REST and integrations APIs are no longer facing degraded performance. Playbook execution and analytics API data processing continue to be delayed.
We are continuing to investigate this issue. Playbook execution and analytics API data processing are currently delayed in addition to degraded performance on integrations API endpoints.
We are currently investigating this issue.
Report: "Degraded Performance for Server and Analytics API"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Degraded app performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results
We are currently seeing poor API and app performance across several parts of the app. We're investigating and will update shortly.
Report: "Elevated error rates & latency"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are experiencing loading delays and errors and are investigating.
Report: "Elevated Error Rates"
Last update: This issue has been resolved, but we continue to monitor closely.
Error rates have normalized and we are continuing to monitor
We have mitigated the main cause of errors but are still seeing degraded performance in some areas
We are seeing elevated error rates when accessing the product and our REST APIs