Historical record of incidents for Vitally
Report: "External vitally.io Site Down"
Last update: Our vendor has resolved this and the page loads on refresh.
Our marketing vendor is experiencing an outage (https://status.webflow.com) which is causing `vitally.io` to not load. In order to log in, you can navigate directly to https://login.vitally.io. No Vitally product experiences are currently impacted.
Report: "External vitally.io Site Down"
Last updateOur vendor has resolved this and the page loads on refresh
Our marketing vendor is experiencing an outage (https://status.webflow.com) which is causing `vitally.io` to not load.In order to login, you can directly navigate to https://login.vitally.io. No Vitally Product experiences are currently impacted
Report: "Feature flags not evaluating, 360s reverting for some users"
Last update: LaunchDarkly has resolved the incident affecting our feature flags. Access to the new 360s, calendar views, and meetings should be fully operational.
Our feature flag provider, LaunchDarkly, has identified the issue. Access to the new 360s, calendar views, and meetings should now be restored for users. We are continuing to monitor the situation to ensure stability.
Our feature flag provider, LaunchDarkly, is experiencing an incident which is causing feature flags not to evaluate for some users. Users who have switched to the new 360 may see a regression to the old 360. Additionally, calendar views and meetings may be unavailable.
Report: "Feature flags not evaluating, 360s reverting for some users"
Last updateLaunchDarkly has resolved the incident affecting our feature flags. Access to the new 360s, calendar views, and meetings should be fully operational.
Our feature flag provider, LaunchDarkly, has identified the issue. Access to the new 360s, calendar views, and meetings should now be restored for users. We are continuing to monitor the situation to ensure stability.
Our feature flag provider, LaunchDarkly, is experiencing an incident which is causing feature flags not to evaluate for some users. Users who have switched to the new 360 may see a regression to the old 360. Additionally, calendar views and meetings may be unavailable.
Report: "CSV Upload Errors"
Last update: We have not seen errors since shipping a fix yesterday and consider this incident resolved.
The fix has shipped to restore CSV upload ability and we are monitoring for additional errors.
We've identified why CSV upload requests are failing and are preparing a fix now.
We are seeing a high rate of CSV uploads failing and are investigating the root cause now.
Report: "Elevated Server Response Times"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are seeing elevated server response times which can result in a slow experience using Vitally. We are investigating the root cause.
Report: "Degraded Search Performance"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Playbooks have been resumed. We are monitoring as the system recovers from delays.
Search clusters are currently reindexing. Playbooks are paused until complete. Analytics tracks are not currently being ingested.
The issue has been identified and a fix is being implemented.
Certain searches in the product will be significantly slower. Some searches may error. We are currently investigating this issue.
Report: "Analytics API Degraded Performance in EU"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Unexpected playbook behavior"
Last update: This incident has been resolved. Please contact support if you see any remaining issues.
At approximately 1:30pm ET Thursday, March 6th, we shipped a bug to production that caused all “is not set” filters to evaluate to “false”. This caused a number of systemic issues:
- Lifecycle tracking rules may have been incorrectly applied.
- Many views and widgets were not showing accurate data.
- Playbooks with an “is not set” rule would have all of their enrollees unmatched and undone.

At 4pm ET, our Support team escalated widespread reports of these issues and an incident was declared. The team identified the pull request with the bug and reverted it. We also paused playbook execution pre-revert to evaluate impact. That revert finished at 5pm ET, at which point views & widgets started working again. At that point, the team also forced a reprocess of lifecycle tracking rules, which corrected incorrectly tracked accounts and organizations.

However, re-enrolling entities into playbooks was potentially dangerous, so the team started writing a restore job to reset playbooks to their prior place within the playbook run and redo actions wherever possible. Writing and testing that job took until about 10pm ET Friday, March 7th. We have since run those jobs for all affected playbooks. As things stand now, we have restored all undone playbooks to the best of our ability.

For playbooks with an “is not set” rule, enrollees would have seen the following action behavior:
- Assign a Key Role: *if* the assignment strategy is “unassign”, affected enrollees would have been unassigned. Those have been restored.
- Create an Indicator: would have ended the indicator. Those have been restored.
- Add to Segment: would have exited the segment. Those have been added back into the segment.
- Start a conversation, reply to a conversation: *if* the undo strategy is “delete if unset” *and* messages had been created but not yet sent, they would have been deleted. This is unrecoverable.
- Create a task: *if* the undo strategy is “delete the task”, tasks would have been deleted. Those have been restored.
- Update a task: *if* the undo strategy is “revert the updates”, task updates would have been reverted. Those have been restored.
- Create a project: *if* the undo strategy is “delete the project”, projects and incomplete tasks would have been deleted. Those have been restored.
- Create a doc: *if* the undo strategy is “delete the doc”, docs would have been deleted. Those have been restored.
- Set a goal: *if* the undo strategy is “delete the goal”, goals would have been deleted. Those have been restored.
- Update a trait: *if* the undo strategy is “delete the trait’s value”, traits would have been set to null. Those values have been restored.
- Send a segment track: unaffected.
- Ping a webhook: unaffected.
- Show an NPS survey: NPS surveys would not have been shown to enrollees previously matching affected playbooks until they were restored.

However, there may have been second-order effects throughout the system that are difficult to identify. If a playbook responsible for maintaining customer segmentation had an “is not set” filter, it would have removed customers from segments; if a second playbook had filters on those segments, it may have also run incorrectly. All playbooks have since been re-run and should be reset to the most accurate state we’re able to attain right now. If you still see systemic issues with any of your playbooks, please report them to our Support team via in-app chat or at support@vitally.io.
We have a dedicated internal channel for these issues and will work to the best of our ability to minimize the impact on your business and customers. We deeply apologize for the impact this has caused you in managing your customer relationships and will continue to dedicate resources to ensure continuity of your business operations.
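For readers interested in the mechanics, here is a minimal, hypothetical sketch of how a broken “is not set” predicate can unenroll playbook targets and trigger undo behavior. This is not Vitally's actual code; the evaluator, field names, and structure are assumptions for illustration only.

```python
# Hypothetical sketch only -- not Vitally's implementation.
# Models how an "is not set" filter that always evaluates to false
# causes previously matching enrollees to fall out of a playbook.

def is_not_set(value, buggy=False):
    """Return True when a trait has no value. The buggy variant models the
    regression described above: it evaluates to False for every enrollee."""
    if buggy:
        return False  # the shipped bug: every "is not set" filter fails
    return value is None or value == ""

def matches(account, filters, buggy=False):
    """An account stays enrolled only while every filter matches."""
    return all(
        is_not_set(account.get(field), buggy) if op == "is not set" else True
        for field, op in filters
    )

account = {"churn_reason": None}            # trait genuinely not set
filters = [("churn_reason", "is not set")]  # playbook enrollment rule

print(matches(account, filters))              # True  -> stays enrolled
print(matches(account, filters, buggy=True))  # False -> unenrolled; undo actions fire
```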
We are continuing to work towards recovery. At this point, we have restored data for most misapplied playbooks and are continuing to work towards resolving the remaining ones. Once we finish this portion of the incident response, we will be resuming all playbook execution. We will continue to do everything possible to expedite the resolution of this incident.
All playbooks remain paused while we work to correct the state of the data, so that resuming playbook execution only partially does not propagate unintended effects. The team is working through this and apologizes for the disruption to your workflows.
The incident has been isolated to playbooks that contain an “is not set” filter. Playbooks that do not contain an “is not set” filter have been resumed. Playbooks that contain an “is not set” filter remain paused. Targets in playbooks with an “is not set” filter were incorrectly removed and may have had actions undone. We are working to restore those playbooks to the state they were in before the bug shipped, after which we will resume execution.
Due to a bug, we are experiencing unexpected playbook behavior. We have identified the root cause and are currently working to correct these unexpected behaviors. Playbooks are paused in the meantime.
Report: "Degraded performance in US"
Last update: Due to the database maintenance we performed on Feb 15, our database was struggling to handle operational load. We rolled back some of those changes and have prioritized work to optimize our database usage to prevent incidents like this in the future.
We have mitigated the issue, though background jobs may experience delays.
Vitally is experiencing an unusual amount of server timeouts. We are looking into the incident.
Report: "Degraded performance in US"
Last update: We have identified and resolved the cause of the deadlocking which caused increased latencies for customers in our Virginia data center, and now consider the incident resolved.
We have identified the cause of increased latencies and are in the process of mitigating the issue. We do expect some background job delays in the meantime
Vitally is actively monitoring performance following some database maintenance in one of our US data centers
Report: "Elevated Error Rates in EU"
Last update: We have seen error rates return to our baseline levels. AWS's most recent update indicates successful mitigation. We will continue to monitor but are not expecting impact to business-hours operations in our EU data center.
AWS has just provided an update, though we are still seeing an elevated error rate. Thank you for your patience as our infrastructure provider addresses their issue. AWS update, Feb 13 4:31 PM PST: "We are seeing early signs of recovery for error rates and latencies for multiple AWS Services in the EU-NORTH-1 Region. As we work toward full recovery, some requests may continue to timeout or be throttled. We recommend customers retry failed requests where possible. We will continue to provide additional information as we have it, or within the next 60 minutes."
AWS has provided an update ~2 minutes ago. We can confirm that the networking issues within the eu-north-1 region persist and affect connectivity to the Vitally web app as well as Vitally's ability to receive analytics data via our APIs. AWS update, Feb 13 3:55 PM PST: "We can confirm increased error rates and latencies for multiple AWS Services in the EU-NORTH-1 Region. This is due to a networking issue that we are actively working to mitigate as quickly as possible, and have all engineers engaged at this time. This issue is not impacting connectivity to and from the region, but is impacting inter-region traffic. During this time, the bulk of the impact will be contained to eun1-az3, but some individual services and operations may be impacted in other zones in the EU-NORTH-1 Region. If possible, we recommend weighting traffic away from AZ eun1-az3. We will provide an additional update within the next 45 minutes, or sooner if we have additional information to share."
AWS has acknowledged their outage (https://health.aws.amazon.com/health/status): Operational issue - Multiple services (Stockholm): Increased Error Rates and Latencies. AWS update, Feb 13 3:35 PM PST: "We are investigating increased error rates and latencies for AWS Services in the EU-NORTH-1 Region."
We are seeing elevated error rates in requests and general operations in our EU data center. This appears to be an issue internal to AWS but we are monitoring
Report: "Dashboards are currently broken"
Last update: This incident has been resolved, but we will continue to monitor closely.
A fix is being deployed and is expected to be live in 10-15 minutes.
We are investigating an issue where dashboards are erroring. We have identified the issue and service should be restored shortly.
Report: "Playbook Execution Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our playbook execution infrastructure is experiencing an unusually high number of jobs in our EU data centers, causing playbook executions to be delayed up to several hours. The delays started approximately 8 hours ago. No data has been lost, and playbook execution is almost caught up.
Report: "Login page loading errors"
Last update: Starting at 12:50 PM EST, we experienced errors when loading the login page. This has now been resolved.
Report: "External doc sharing errors"
Last update: This incident has been resolved.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is rolling out now.
We are currently investigating an issue with external docs shared on Vitally not loading correctly. The core product remains available.
Report: "Elevated Analytics API Errors"
Last update: We experienced an elevated level of Analytics API errors in our EU region for less than 10 minutes, from 16:45 to 16:53 UTC. Some requests to the Analytics API failed with a 504 error during this time. The error rate has since recovered. We have identified the issue and are working to mitigate it moving forward.
Report: "User-Level Filtering Error"
Last update: This incident has been resolved.
A fix has been implemented and filtering is now restored.
We are continuing to investigate this issue.
We have identified a problem with user-level filters in the Vitally product and are working on a fix.
Report: "Login Failures"
Last update: This incident has been resolved.
A fix has been implemented, and we are monitoring the results.
We have identified a problem with login failures and are working on a fix
Report: "Analytics API Data Processing Delays"
Last update: This incident has been resolved.
A small portion of Vitally Analytics API requests failed due to an unhealthy Kafka broker, returning an error response to clients. We are now recovering from residual data delays in consumption of inbound Analytics API data.
Report: "Degraded Performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Vitally is currently experiencing degraded performance in one of our US data centers causing longer response times. We are actively investigating.
Report: "Degraded Performance"
Last update: A fix was deployed over the weekend and we are no longer seeing increased latencies in our ongoing operations.
Vitally is currently experiencing degraded performance in one of our US datacenters causing longer response times. We are actively investigating.
Report: "Degraded performance"
Last update: This incident has been resolved.
Response times have returned to normal, and we are no longer seeing degraded performance.
Vitally is currently experiencing degraded performance in one of our US datacenters. We are actively investigating. We have implemented a partial fix and response times are improving. We will continue to update as we make progress.
Report: "Intercom Data Delays"
Last update: Delays in processing Intercom data are now resolved.
We have found the cause of the delays, and have implemented a mitigation. We will continue to monitor closely as the remaining data is processed.
We are aware of delays in processing Intercom data received by Vitally through webhooks and are working towards a solution
Report: "Vitally Not Loading"
Last update: We have resolved the issue impacting page loads.
We are investigating an issue preventing users from loading the Vitally web app
Report: "Delay in Custom Metrics"
Last update: This incident has been resolved.
Our overnight data pipeline for computing Custom Metrics encountered a stall. We have mitigated this and are monitoring as we recover. In the meantime, you may experience a delay in accessing the most recent Custom Metric data.
Report: "Search Unavailable"
Last update: Search in Vitally was unavailable for approximately 20 minutes, between 3:45 and 4:05 PM Eastern Time. We apologize for the inconvenience.
Report: "Reports of the Vitally app not loading on Google Chrome"
Last update: We have not received additional reports of this issue, and as a next step we are implementing additional instrumentation to help us identify and troubleshoot any other instances of this happening. If you do experience problems with Vitally loading in Chrome, a workaround is to close your current tab and reopen Vitally in a new tab. Additionally, please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties.
We are continuing to investigate this issue. Based on our investigations so far, if you are encountering slow load times when loading Vitally on Google Chrome, the best workaround is to close the current tab and open a new tab. (Note: This supersedes the previous guidance to update Google Chrome.) Please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties, and we'll continue to update our status page as we have updates.
We have received several reports this week of the Vitally app not loading on Google Chrome browsers. We are investigating the root cause of this issue, but in the meantime one workaround is to update Google Chrome to its latest version. Here is an article on how to do this: https://support.google.com/chrome/answer/95414?sjid=18099954259407209262-NC. Please don't hesitate to reach out to support@vitally.io if you are experiencing any difficulties, and we'll update our status page as we have updates.
Report: "Elevated error rates"
Last update: This incident has been resolved.
We have resolved the root cause of the errors and will continue to monitor.
We are currently investigating an issue with access to Vitally.io and our APIs.
Report: "Site slowness"
Last update: Our systems are functioning as expected.
Maintenance has completed. Vitally was unavailable for approximately 2 minutes between 01:17 ET and 01:19 ET (06:17 UTC and 06:19 UTC). We will continue to monitor performance.
Maintenance is beginning now in the US.
Vitally’s primary US database is under increased load leading to general slowness. Some integration data is delayed as a result. We will be performing a brief database maintenance at 05:00 UTC (00:00 ET) tonight to increase database capacity in our US datacenter. Vitally’s platform and API will be unavailable for up to 10 minutes.
Report: "DB Failover"
Last update: We experienced a database failover overnight, causing a period of 3 minutes between 1:37 AM and 1:40 AM Eastern in which web requests against Vitally would have failed. We do not expect lingering impact on users.
Report: "Duplicate Background Jobs"
Last update: We have prioritized the delayed workstreams that were impacted by this incident. They continue to catch up without issue, and we have not observed any additional degradation. This incident is now resolved.
Between 1am and 12pm US Eastern Time on 10/12, Vitally's background jobs system experienced sporadic failures. These failures caused individual background job runners to degrade and have their work marked as failed so that another job runner would pick up the slack. Unfortunately, the original runner would eventually complete the work that had been marked as failed, in addition to another runner starting the same work, causing duplicative downstream impacts.

This surfaced to customers in a few ways, such as multiple emails to reconnect an integration being sent when only one was needed, some delayed and duplicated integration syncs, and potential duplicative pushes of updated data to integrations. This would not have impacted execution of playbooks: conversations, tasks, and projects created by playbooks would remain singular and were not impacted by this specific cluster's outage, so we do not expect your customers to have been impacted by this incident.

We've identified the workload that caused the instability in our background clusters and have mitigated it as of 12pm US Eastern Time today. We are catching up on our delayed workstreams and will dedicate resources to preventing this in the future.
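As a general illustration (an assumption-laden sketch, not a description of Vitally's internal job system), one common safeguard against this failure mode is to key each unit of work with an idempotency token, so that a second runner completing the same logical job becomes a no-op:

```python
# Generic idempotency-key sketch; job names and the in-memory store are
# hypothetical and not a description of Vitally's internals.
import threading

_completed = set()
_lock = threading.Lock()

def run_once(idempotency_key, action):
    """Execute `action` only if no runner has already claimed this key.
    A production system would persist keys and handle failures after a
    claim; this sketch only shows the deduplication idea."""
    with _lock:
        if idempotency_key in _completed:
            return "skipped (duplicate)"
        _completed.add(idempotency_key)
    action()
    return "executed"

def send_reconnect_email():
    print("sending reconnect email")

# Two runners racing on the same logical job: only one email is sent.
print(run_once("reconnect-email:integration-42", send_reconnect_email))
print(run_once("reconnect-email:integration-42", send_reconnect_email))
```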
Report: "Analytics Ingestion Delayed"
Last update: This issue has been resolved and all systems are functioning normally.
Most of our customers are no longer facing data ingestion delays. We will continue to monitor the ones which are. We are also monitoring delays in computing success metrics which are impacted by the degraded read replica.
The issue has been identified as a failed launch of a new read replica. We've taken action to remove it and are monitoring as our old replica catches back up to real time.
We have identified degraded DB performance reducing throughput of data consumption in our analytics pipeline. We are investigating now
Report: "Elevated error rates"
Last update: This incident has been resolved.
Error rates have returned to normal and we are continuing to monitor.
We have identified the cause and fixed the underlying issue; error rates are returning to normal.
We are currently investigating elevated error rates for our APIs and application.
Report: "Success Metric Calculation Delays"
Last update: This incident has been resolved and Success Metrics are calculating.
We've identified and mitigated the root cause of the Success Metrics delays and are currently backfilling any metrics that were missed. We'll continue to backfill and monitor, and will update once fully resolved.
We are continuing to investigate delays in calculating Success Metrics.
We are currently investigating delays in calculating Success Metrics
Report: "Integration data delays (Gmail, Stripe, Intercom, Recurly, Chargebee)"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating delays with syncing data from some integrations. This is limited to impacting Gmail, Stripe, Intercom, Recurly, and Chargebee integrations.
Report: "Data processing delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Slowness and elevated error rates"
Last update: This incident has been resolved.
Performance has normalized and we're continuing to monitor.
We're investigating application slowness and elevated error rates
Report: "Analytics API Data Processing Delays"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Data from the Vitally Analytics API is currently delayed in updating in Vitally. We are actively investigating this issue.
Report: "Elevated error rates"
Last update: This incident has been resolved.
We've implemented a fix and are continuing to monitor.
We have identified and are currently working on mitigating a database issue leading to an elevated error rate for writes to the database.
Report: "Elevated error rates & latency"
Last update: This incident has been resolved. We are continuing to monitor for any residual issues.
We are continuing to see elevated error rates for writes to the database. We are actively working on resolving this incident.
We have identified and are currently working on mitigating a database issue leading to an elevated error rate for writes to the database, as well as elevated latency in the Vitally application.
Report: "Analytics API Data Processing Delays"
Last update: This incident is resolved.
We have observed improved stability in inbound data processing and are continuing to monitor the situation. There may be residual delays as we process a backlog of data.
Data from the Vitally Analytics API is currently delayed in updating in Vitally. We are actively investigating this issue.
Report: "Elevated Error Rates and Latency in the Vitally App"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're experiencing an elevated level of latency and errors and are currently looking into the issue.
Report: "Elevated 500 errors and data delays"
Last update: From Sunday, March 12th at 02:35 UTC through Tuesday, March 15th at 21:06 UTC (5:06 PM ET), Vitally experienced an issue with our primary PostgreSQL database that caused writes to the database to fail at an elevated rate. Those write failures impacted our users directly by causing some actions that modify data in the Vitally UI to sporadically fail. This impacted most user workflows, including but not limited to: taking notes, starting conversations, creating tasks and projects, and using the REST API. Additionally, background processing of data was degraded due to the write failures. Integration syncs, analytics data processing, playbooks, success metric calculation, and more were all affected.

The root cause of the incident was a multi-terabyte database table containing event data that became too large for routine database maintenance to complete in a timely manner. Some background: PostgreSQL requires periodic "vacuuming" of tables for general system health and to prevent what is called "transaction ID wraparound", an error state that causes the database to completely shut down for maintenance. That process had not successfully completed for our largest database table in over a week, which caused resource exhaustion in the "multixact" table, a PostgreSQL-internal system table, which in turn caused writes to the database to intermittently fail.

The engineering team had been aware of the risk of transaction ID wraparound for some time and had been working to prevent this from happening. However, despite our awareness and monitoring of the risk, we had not been aware of an _**earlier**_ threshold, exhaustion of the multixact table, which we had less visibility into. This was the error that started occurring on Sunday and is the reason that only database **writes** were affected.

Before this incident, we had multiple projects in progress targeted at removing this risk. A scheduled database configuration change to make our vacuum config more aggressive went out three hours into the start of the errors. Unfortunately, that configuration change also had the side effect of interrupting the ongoing vacuum process and restarting the clock on maintenance. Additionally, we had made significant progress on a migration of all that data to a new home, which we were able to leverage during the incident by completing the cut-over on an accelerated timeline.

Once our database entered this error state, the only way to recover was to complete the vacuum process, which we knew could take days. Our incident response team, which grew to half the engineering team, including the CTO and CEO, focused on three separate workstreams:
1. Finding any way we could to get the VACUUM process to complete more quickly.
2. Completing the cut-over to a new home for analytics data, which we completed on Monday.
3. Mitigating the implications of write failures across the system through adding retries, shifting system load, and attempting to reduce the load on the multixact table for the most critical user workflows, like creating notes.

Ultimately, we were able to complete an accelerated VACUUM process on Tuesday, March 15th at 21:06 UTC (5:06 PM EDT). Errors immediately subsided, and the team turned their attention to ensuring every part of the system was working well, including catching up job and integration queues and ensuring that the risk of recurrence was low.
Because of the migration on Monday, we are no longer writing to the problematic table and therefore are no longer at risk of recurrence of this specific issue. That said, we learned many lessons during this incident, and the engineering team is hard at work making the system more resilient, putting in place additional monitoring, and adjusting database configuration to prevent an incident like this from ever recurring. We sincerely apologize for the impact of this issue. System availability and stability are job #1 for the engineering team at Vitally, and we understand the gravity of a disruption like this to your ability to do your jobs. We are working hard to ensure that Vitally comes out of this incident even more resilient.
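For readers who want to watch for this class of risk in their own PostgreSQL databases, the sketch below shows a generic way to monitor per-database transaction ID and multixact age using standard PostgreSQL functions. The connection string and alert threshold are placeholder assumptions; this is not Vitally's monitoring tooling.

```python
# Generic PostgreSQL wraparound/multixact age check; connection details and
# the alert threshold are placeholders, not Vitally's configuration.
import psycopg2

# Vacuum must freeze old tuples well before ages approach ~2 billion,
# where PostgreSQL's wraparound protection forces a shutdown for maintenance.
ALERT_THRESHOLD = 1_000_000_000

conn = psycopg2.connect("dbname=postgres user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT datname,
               age(datfrozenxid)    AS xid_age,
               mxid_age(datminmxid) AS multixact_age
        FROM pg_database
        ORDER BY xid_age DESC;
        """
    )
    for datname, xid_age, multixact_age in cur.fetchall():
        status = "ALERT" if max(xid_age, multixact_age) > ALERT_THRESHOLD else "ok"
        print(f"{datname}: xid_age={xid_age} multixact_age={multixact_age} [{status}]")
conn.close()
```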
This incident is now resolved. Database maintenance has completed and errors have returned to normal levels. Integrations and analytics data as well as other asynchronous operations may continue to be delayed as our system catches up, but we anticipate these delays to be resolved soon.
Database maintenance has completed and error levels have returned to normal. Integrations, analytics data, and other asynchronous operations may continue to be delayed as the system catches up. We are continuing to monitor closely.
We are continuing to work on a fix for this issue.
Database maintenance continues to progress and is still expected to complete today.
Database maintenance is ongoing and we expect it to complete at some point today. We continue to ship mitigations to improve the success rate of integration syncs and user and API requests.
Database maintenance is ongoing and we are continuing to work on mitigations.
Database maintenance is ongoing and we are continuing to work on mitigations.
We are performing database maintenance which we expect to resolve elevated error rates when performing writes. Concurrently, we are actively working on multiple mitigations to reduce error rates at the application layer. We sincerely apologize for the inconvenience and are working hard to fix this.
We are continuing to investigate and remediate elevated 500 errors across the app. Additionally, integrations data may temporarily be delayed.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue
Report: "Performance degraded"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Site and API performance is degraded. We are investigating and resolving.
Report: "Elevated latency and 500 errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue
Report: "Elevated latency and 500 errors"
Last update: This incident has been resolved.
This issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Degraded performance across Vitally REST and integrations APIs"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Vitally REST and integrations APIs are no longer facing degraded performance. Playbook execution and analytics API data processing continue to be delayed.
We are continuing to investigate this issue. Playbook execution and analytics API data processing are currently delayed in addition to degraded performance on integrations API endpoints.
We are currently investigating this issue.
Report: "Degraded Performance for Server and Analytics API"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Degraded app performance"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results
We are currently seeing poor API and app performance across several parts of the app. We're investigating and will update shortly.
Report: "Elevated error rates & latency"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are experiencing loading delays and errors and are investigating.
Report: "Elevated Error Rates"
Last update: This issue has been resolved, but we continue to monitor closely.
Error rates have normalized and we are continuing to monitor
We have mitigated the main cause of errors but are still seeing degraded performance in some areas
We are seeing elevated error rates when accessing the product and our REST APIs