Historical record of incidents for Fullstory
Report: "Fullstory login and UI unavailable"
Last update: We are currently investigating a site outage that is impacting login and loading of the Fullstory UI. We are actively working to identify the root cause and will provide regular updates as they become available.
Report: "Slow loading in Fullstory UI"
Last update: We are currently experiencing delays in loading the Fullstory Web Application. The issue started around 10:25am ET. We are actively investigating and will provide an update as soon as possible.
Report: "Scheduled Maintenance"
Last update: The scheduled maintenance has been completed.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Fullstory will be conducting scheduled maintenance for our EU data center on Thursday, March 27th from 7:00 PM EDT to 9:00 PM EDT. Customers may experience partial search and API unavailability, as well as potential indexing delays, during the period of planned maintenance. If you are experiencing any issues past 9:30 PM EDT, please email us at support@fullstory.com.
Report: "Slow or failure to load Fullstory"
Last update: We have implemented a fix that has resolved the issue, and our team is still actively investigating the underlying cause to prevent future occurrences. Please reach out to support@fullstory.com if you experience any further issues.
At 4:11 PM EDT, we detected an issue that resulted in degraded performance on Fullstory. The service has been restored, and we are actively investigating the root cause. We will provide further updates as our investigation progresses. If you encounter any additional issues or have questions, please contact support@fullstory.com.
We are continuing to investigate this issue.
At 4:19pm ET, we noticed users experiencing slow loading or failure to load Fullstory. We are rolling back recent changes and continuing to investigate and fully resolve this issue.
Report: "Slow or failure to load Fullstory"
Last update: After careful consideration, we have made changes on our end and this incident is now resolved. Please write to support@fullstory.com if you experience further trouble.
We have identified the root cause but, out of an abundance of caution, are planning to implement a fix on Monday. Our teams will monitor in the meantime and, should there be any further issues, they will be quickly addressed. We will provide another status update when our fix is implemented on Monday, or by 12:00pm EDT at the latest. If you experience any issues prior to Monday, please reach out to support@fullstory.com for assistance.
We’ve identified a slow query that was leading to degraded performance. We are implementing the necessary changes to guard against future performance degradation. Fullstory should be back up and running. Please reach out to support@fullstory.com with any further issues or questions.
At 12:44am EST today, we noticed users experiencing slow loading or failure to load Fullstory. We are actively investigating the issue.
Report: "Delayed session processing for EU"
Last update: From 6:30am ET until 9:00am ET, we were experiencing delays in session processing for accounts in our EU data center. New sessions may not have appeared until 9:30am ET, but no data was lost. The issue is now fully resolved and all sessions captured during this time have been processed.
Report: "Fullstory unavailable for some users"
Last update: All systems are back to normal operations and are working as expected.
We've rolled back some changes that caused this outage and Fullstory should be back up and running again. Please reach out to support@fullstory.com with any further issues/questions.
We are currently investigating a Fullstory web application outage for some users. Note: This outage does not affect session capture.
Report: "Delayed session processing"
Last update: A fix has been implemented and we are monitoring the results.
We are currently experiencing delays in session processing and indexing that began at 4:00pm ET. There may be a delay in your ability to view newly captured sessions. We are investigating the issue.
Report: "Issues logging in to Fullstory"
Last update: From 10:15 - 10:35am EST today, we identified an issue with logging in to the Fullstory application. This issue has since been resolved and access to Fullstory has been restored. As a result of this issue, there is a slight delay in session processing that will be caught up within 30 - 60 minutes following the resolution.
Report: "Elevated Data Display Latency"
Last update: Things are back to normal now and users should no longer be seeing latency in data loading. If you continue to have issues, please reach out to support@fullstory.com.
We are continuing to work on a fix for the elevated data display latency. Thanks for your patience.
We are experiencing elevated latency in data loading for a subset of orgs. Users may receive errors while attempting to load Dashboards, Metrics, and Segments. Some pages may take longer than normal to appear.
Report: "Elevated Data Display Errors, Delayed Segment Alerts"
Last update: The delays have subsided and this incident is now resolved. Please write to support@fullstory.com if you experience further trouble.
We are experiencing an elevated error rate displaying data in app for a subset of orgs. Charts may be slow to load or may time out with an error. Data capture is not affected. Segment alerts are also delayed for all orgs. We are actively investigating the issue.
Report: "Partial data capture issues (web & mobile)"
Last update:
# Postmortem 2024/11/13 Failed Tasks Not Retried / Pages Not Initialized
On October 31st, 2024, an issue introduced in our service caused a small percentage (< 0.05%) of sessions to not process a portion of data within them. This data was effectively “stuck” in our system. The issue lasted until November 13th, when it was fixed for all future sessions.
# Customer Impact
The issue caused the affected sessions to be missing data, causing gaps/disruptions when loading playback, webhooks not firing, and events not being exported via the Data Direct feature. Affected sessions captured after Nov 6th were identified and frozen so that we could re-process the data. Starting the week of Dec 9th, we ran all the frozen data back through our pipeline, and the majority of it was recovered successfully for sessions with enough data to process. Affected sessions before Nov 6th will remain in this incomplete state, as the unprocessed data was deleted after being misidentified as something our system needed to clean up and remove, due to being stuck for some time. We apologize for any impact this may have had on your business, and we are working to better understand how to identify issues of this nature more quickly and resolve them with even less disruption.
# Root Cause
Due to a code change that altered the behavior of an uncommon failure condition, part of our system marked the data in question as “completed” when it had actually failed. Normally in this case, the system automatically retries the operation and the data gets processed, but it did not do so during the incident period. Because this was a rare circumstance, the issue was not immediately noticed by our typical monitoring systems.
# Resolution
After noticing some odd system behavior and finding that we had unprocessed data that should have been successful, we quickly traced it back to the code change and immediately deployed a fix to ensure no future data gets into this state.
# Process Changes and Prevention
**Actions Taken:**
* Fixed the issue for future sessions.
* Froze and reprocessed all affected sessions still in our system.
* Added more automated testing around this failure case.
* Adjusted our monitoring and alerting to better identify “stuck” data that’s legitimate, even when it’s a very small percentage of overall processing.

**Ongoing Improvements:**
* Applying lessons learned to other parts of the system, including better detection systems and testing for any other components that process data in a similar way.
* Better automated observability triaging so that rare-circumstance issues don’t blend in with general service metric variability.

We deeply regret this incident and invite any FullStory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com). We stand ready to fully address all of your concerns.
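The failure pattern this postmortem describes, a rare error path that records success instead of queuing a retry, can be sketched in a few lines. This is purely illustrative Python; the names and structure are invented for the example, not FullStory's actual pipeline:

```python
from enum import Enum

class RareFailure(Exception):
    """An uncommon failure condition, analogous to the one described above."""
    pass

class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"      # FAILED tasks are picked up again by a retry loop

def process_task(run, swallow_rare_failure=False):
    """Run one unit of work and return the status recorded for it.

    With swallow_rare_failure=True this reproduces the buggy behavior:
    the rare failure is recorded as COMPLETED, so no retry ever happens
    and the task's data stays "stuck".
    """
    try:
        run()
        return Status.COMPLETED
    except RareFailure:
        if swallow_rare_failure:
            return Status.COMPLETED   # bug: failure marked as success
        return Status.FAILED          # fix: failure recorded, retry will run
```

Because the bad path reports success, it is invisible to monitoring that only counts failures, which is why such issues tend to surface only as "stuck" data.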
This issue is now resolved. Most accounts will notice a small percentage of sessions affected by this issue. This would result in entirely missing sessions, or pages in sessions, which would also impact Search, Conversions, Metrics, Funnels, and Dashboards. We’re actively remediating the sessions from November 6th - 13th. After remediation, sessions and analytics will be fully recovered. Data sent to a data warehouse or via a webhook for this time period will be delayed. If you watch a session from October 31 - November 5th and experience skipping of playback, it’s likely caused by this issue. If you would like to confirm that your account was impacted, please reach out to support@fullstory.com.
Our team has identified an issue impacting data capture for web and mobile sessions that started October 31st, 2024 at 4:03pm ET. This issue impacted 4,770 Fullstory accounts, but only affected about 0.5% of total captured sessions per account. Due to our internal retention timeframe, sessions impacted by this issue prior to November 6, 2024 at 5:28pm ET are nonrecoverable. We increased our retention to ensure no further impacted session data would be lost. A fix for the issue has been deployed and we're monitoring this further.
Report: "Fullstory APIs Unavailable"
Last update:
On October 31, 2024, our API service experienced a disruption that resulted in a significant number of dropped requests and errors. The incident lasted approximately 1.5 hours and impacted all users of our public server API.
# Customer Impact
Most public server API requests sent during the incident time frame (18:37:29 UTC to 20:12:39 UTC) were unsuccessful and should be re-submitted. We acknowledge the inconvenience and disruption this caused our customers and sincerely apologize for any impact on your operations.
# Root Cause
The root cause of the incident was a decrease in the capacity of our API service due to an unsuccessful deployment. This decrease in capacity, combined with higher-than-usual request volume, left the service unable to handle the incoming traffic.
# Resolution
Upon discovery of the issue, our on-call operations team was immediately notified and responded swiftly. The team's actions included re-executing the failed deployment to restore normal API service functionality.
# Process Changes and Prevention
**Actions Taken:**
* We re-executed the unsuccessful deployment.
* We permanently scaled up the capacity of our API service to handle the increased traffic.
* We are implementing additional monitoring and alerting to detect and prevent similar issues in the future.

**Ongoing Improvements:**
* We are reviewing our deployment processes to minimize the risk of similar incidents.
* We are investigating ways to improve the resilience of our API service to handle unexpected spikes in traffic.

We deeply regret this incident and invite any Fullstory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com). We stand ready to fully address all of your concerns.
From October 31, 2024 at 2:38PM ET through October 31, 2024 at 4:07PM ET, the Fullstory APIs were unintentionally unavailable. The majority of requests to Fullstory APIs would have returned an error. Fullstory customers that attempted to send an API request during the outage window will need to process these requests again. If you believe your account was impacted or have any questions, please reach out to support@fullstory.com.
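When replaying requests that failed during an outage window like this, a simple exponential-backoff resubmitter is usually enough. The sketch below is hedged: `send` stands in for whatever HTTP client and endpoint your integration uses, and nothing in it is a Fullstory-specific API:

```python
import time

def resubmit_with_backoff(send, payload, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Re-send one request, backing off exponentially on 5xx responses.

    `send(payload)` performs the HTTP call and returns its status code.
    Successes and client errors (anything below 500) are returned
    immediately, since retrying a 4xx will not help; server errors are
    retried with 1s, 2s, 4s, ... waits between attempts.
    """
    status = None
    for attempt in range(max_attempts):
        status = send(payload)
        if status < 500:
            return status
        sleep(base_delay * (2 ** attempt))
    return status
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies the real delays.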
Report: "Unindexed server events"
Last update:
Starting on March 26th, as a result of processing optimization improvements, a bug was introduced that resulted in some server events for some accounts not being properly indexed or sent to data warehouses. We were made aware of this issue on September 11th and released a fix on September 12th. All server events after the fix was released have been properly indexed and sent to data warehouses, where applicable. All server event data was captured and no event data has been lost, but server events may appear missing in session replay. If you are missing server events between March 26th and September 12th and would like us to reindex them for your account, please reach out to support@fullstory.com. Note: reindexing will not send updates to data warehouses.

Specific situation where server events were impacted: Sessions containing multiple server events sent via the Server API may not have all of their server events processed, indexed, and warehoused. When a session receives events that are placed into a new page, that page is processed immediately. When subsequent events arrive with timestamps that are not later than the previous server events, the processing for that updated page is skipped, since no new data is detected. As a result, server events may appear missing if their timestamps were not later than those of the prior events and they were received after those prior events had been processed.
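The skip condition described above can be modeled roughly as follows. This is an illustrative sketch, not FullStory's actual indexer; the batch shape and field names are invented:

```python
def index_server_events(batches):
    """Index successive event batches for one page, mimicking the described bug.

    A batch is (re)processed only when it contains a timestamp newer than
    the last one processed; a batch whose events all carry equal-or-older
    timestamps is skipped as "no new data", so those events never get
    indexed and appear missing.
    """
    indexed = []
    last_processed_ts = None
    for events in batches:
        newest = max(e["ts"] for e in events)
        if last_processed_ts is not None and newest <= last_processed_ts:
            continue  # skipped: these server events are never indexed
        indexed.extend(events)
        last_processed_ts = newest
    return indexed
```

In this model, a batch of events timestamped earlier than an already-processed batch is silently dropped, matching the symptom of server events that were captured but never indexed.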
Report: "Delayed indexing"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing indexing delays for sessions and Data Warehouse updates in our NA1 data center. We are investigating the issue.
Report: "Fullstory dashboard availability issues"
Last update: For a period of 40 minutes starting at May 16 11:25 PM UTC, Fullstory customers experienced an elevated rate of errors in the web application. This was related to networking issues in our infrastructure; application availability has returned to normal and is being monitored by the team.
Report: "Login Issues & Partial Capture Outage (Web & Mobile)"
Last update:
**Partial Capture Outage - Web & Mobile:** For web and mobile traffic captured between 10:50 AM ET and 12:50 PM ET, there will be delayed processing. Additionally, there may be missing sessions, as well as sessions with missing activity, which could impact Session Playback, Search, Metrics, Funnels, and Conversions.
**Login Issues:** After further investigation, attempts to log in to the Fullstory application failed between 10:25 AM ET and 12:00 PM ET. This is fully resolved.
If there are any follow-up questions on the above, please reach out to support@fullstory.com.
We've applied a fix and we're monitoring the results.
At 10:23 AM ET, we detected login issues and partial data capture outage for web and native mobile. Login issues were resolved by 10:54 AM ET. However, the partial capture outage is ongoing and our team is actively investigating. We'll be sure to update you as soon as we have more information.
Report: "Delayed search and dashboard results"
Last update: This incident has been resolved. If you continue to experience any issues, please reach out to support@fullstory.com for assistance.
A fix has been implemented and we're monitoring the results.
At approximately 2:10 PM ET today, we began experiencing application delays affecting search and dashboard results. Our team is actively investigating this. We appreciate your patience and will provide updates as we make progress.
Report: "Data processing delays"
Last update: On Monday, May 6th, 2024 between 6:45pm and 12:30am ET, there was a data processing delay that affected results showing up in the web application or warehouses. All delayed data has now been fully processed and is available.
Report: "Application errors - login"
Last update: We have investigated and resolved errors received while attempting to log in to or use the web application. Data capture was unaffected, though there were delays in processing data which would affect multiple parts of the application (Session Replay, Metrics, Webhooks, APIs, Destinations).
Report: "Data Direct warehousing disruption"
Last update: This issue has been resolved and all data has been successfully backfilled. If you notice any further disruptions to your data pipeline, please reach out to support@fullstory.com.
A fix has been implemented and we are monitoring the results.
We are experiencing some disruption to the data pipeline that powers Data Direct warehousing. We are currently working to resolve the issue.
Report: "Elevated Search & Indexing Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
App queries and overall responsiveness have improved. Users should no longer see errors or slow loading. While the impact has been mitigated, we are still investigating the cause.
We are experiencing an elevated error rate for searches and session indexing. Users may receive errors while attempting to load session lists (e.g. "Failed to load user list.") and new sessions may take longer than normal to appear.
Report: "Delayed session processing for EU data center"
Last update: We experienced delays in processing sessions for our EU data center, starting from Friday, March 22nd at 9 PM ET. This situation was confined to the EU data center and did not affect our NA data center. Despite all sessions being captured and stored, processing was delayed, impacting segments, metrics, funnels, and dashboards. The issue has now been fully resolved, and all systems are operating normally again.
Report: "Delayed processing of web sessions"
Last update: As of 5:00pm EST, the sessions that were impacted by the processing delay have been fully processed and are available to view in FullStory.
On March 6th, 2024 from 3:15pm - 3:55pm EST, FullStory experienced a delay in processing sessions. The root of the issue has been identified and we are actively monitoring the fix that was implemented. The majority of sessions captured during this time period have been processed and are available to view. We will share another update as soon as the remaining sessions have been processed.
Report: "Delayed mobile session data processing"
Last update: Affected session data has been fully indexed. In the process of resolving the issue, it was determined that this affected fewer mobile app session pages than originally believed, only ~160 pages total. For any inquiries related to this incident, please reach out to us at support@fullstory.com.
We are continuing to work on a fix for this issue.
We are currently experiencing delays in ingesting mobile app session pages. No data loss has occurred, but segments, metrics, funnels, and dashboards that rely on this data will be impacted until the data is fully indexed. We have identified the root cause and are working to get the affected session data processed as quickly as possible.
Report: "Delayed Imports with Batch User and Events APIs"
Last update: As of 8:21pm EST on Friday 2/9/24, batch imports are processing at normal speed. Thank you for your patience as we resolved this issue.
We've implemented a fix and we're working through the backlog of data. We anticipate that all batch imports data will be recovered by end of day tomorrow, Saturday February 10th.
Starting on Wednesday, February 7th around 1am EST, batch imports triggered via server API began to experience delayed processing and are currently taking longer than usual to complete. We are investigating the issue and will update you as soon as we know more.
Report: "NA Partial Capture Outage (Web & Mobile)"
Last update:
On January 31 between approximately 5:55 PM EST and 6:40 PM EST, FullStory engineering observed a rise in errors related to session capture. During that time period, a cloud workload impacted the ability to correctly capture new sessions. This issue affected the North America data center, which applies to Org IDs *not* ending in eu1. Due to natural fluctuations in traffic and the possibility that some orgs were more affected than others, it is difficult to state the exact capture impact to an individual Org. That said, the following generalizations apply:
- Sessions that started before 5:55 PM EST would likely have had their captured events delayed during and shortly after the issue period.
- New web and mobile sessions starting after 5:55 PM EST may not have been captured.
- Events sent to FullStory using the Server API would likely have received HTTP error responses and failed to be associated with a corresponding session.
- From 6:15 PM EST to 6:40 PM EST, the number of new sessions that failed to be captured was approximately half the number from 5:55 PM EST to 6:15 PM EST (i.e., the capture rate improved during this time).

If you have any questions regarding this incident, please reach out to support@fullstory.com.
Report: "Server Event Processing Issue"
Last update: We did not properly ingest server event data from Jan 11, 2024 through Jan 19, 2024, and as a result data processing was delayed across all customers, meaning that custom events sent during the incident period would not appear in FullStory search, sessions, etc., or in Data Destinations. Our engineers worked to restore as much of this data as possible. Due to the delay, some associated event context (e.g. timestamps, location, screen size, etc.) was lost. Any customers that had webhooks tied to custom events would not have received those webhooks during this incident. If you have any questions regarding this incident, please reach out to support@fullstory.com.
Report: "DeleteIndividual API Unavailable"
Last update: From November 22, 2023 at 3:28PM EST through November 23, 2023 at 6:13PM EST, the DeleteIndividual API was unintentionally unavailable as a part of our holiday code-freeze. Any requests to delete users through this API would have returned an error. FullStory customers that attempted to process a user deletion during the outage window will need to process these deletions again, either via API or our in-app delete user option. If you believe your account was impacted or have any questions, please reach out to support@fullstory.com.
Report: "Session Capture Outage"
Last update:
Between 2023-11-07 5:49 PM UTC and 2023-11-08 11:39 AM UTC, an update to our capture service caused some web sessions to be initialized in a corrupted state that prevented capture data from being processed successfully. Replay and analytics features that rely on this session and event data were impacted; missing activity during this time period may affect Metrics, Funnels, Dashboards, and Conversions. Additionally, the impacted sessions are not available for session replay. This postmortem details the customer impact, the root cause of what happened, how we addressed the problem, and how we will prevent similar incidents from happening in the future.
# Customer Impact
During the incident, customers using [FullStory Relay](https://help.fullstory.com/hc/en-us/articles/360046112593-How-to-send-captured-traffic-to-your-First-Party-Domain-using-FullStory-Relay) or those with [CSP](https://help.fullstory.com/hc/en-us/articles/360020622854-Can-I-use-Content-Security-Policy-CSP-with-FullStory-) policies that disallow access to the [edge.fullstory.com](http://edge.fullstory.com) CDN might have failed to capture entire sessions or have captured sessions that are missing a subset of their pages.
# Root Cause
An update to our capture settings service caused some sessions to be initialized with inconsistent state, for which our backend data capture service was not able to process some pages. In cases where the primary CDN-backed settings could not be accessed (see above notes on Relay and CSP), the client hits a fallback endpoint. This fallback endpoint did not contain accurate capture instructions for the client, leading to corruption of the local capture state.
# Resolution
Our internal monitoring alerted Engineering of corrupted sessions being processed, which resulted in the deployment being rolled back at 8:52 PM UTC. After observing lingering data capture errors after the rollback and testing possible impacts of the defect, a remediation process was developed to recover corrupted device identifiers for impacted users so that new capture sessions could be initiated.
# Process Changes and Prevention
We are committed to preventing this type of incident in the future. We’ve completed the following action items:
* Implemented a change that allows recovery of device identifiers when invalid settings have been propagated
* Developed and ran a data recovery process for sessions captured with invalid identifiers during the brief period of time for which data recovery is possible
* Increased the period of time for which data with invalid identifiers is preserved, allowing a greater possibility of data recovery in case of a similar incident

Here are additional steps we’re taking:
* Add additional automated testing to ensure consistency between primary and fallback capture settings
* Add additional checks in our web capture client to guard against possible corruption of device identity in the observed and similar circumstances

We deeply regret this incident and invite any FullStory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com). We stand ready to fully address all of your concerns.
On November 7, 2023, a subset of FullStory orgs experienced a session ingestion outage. If your organization uses Relay or restrictive Content Security Policy (CSP) settings, sessions captured between November 7, 12:49 PM ET and November 8, 6:39 AM ET may be lost or missing some pages. The problem was caused by a settings change, which was quickly reversed once we noticed the impact on November 7 at 3:52 PM ET. However, users who initiated sessions during the specified time frame might still have been affected until a repair fix was deployed on November 8 at 6:39 AM ET. We're sorry for the inconvenience and are working to put safeguards in place to prevent and detect similar scenarios in the future. If you suspect your org was affected or want more information, contact us at support@fullstory.com.
Report: "Rolled back: login changes for SAML-based SSO"
Last update: On Wednesday, September 27th at 6:25 PM ET, we released an account security enhancement for a subset of orgs that have SAML-based single sign-on enabled. Unfortunately, this introduced friction to the login process, and some users were unable to access their accounts as a result. On Thursday, September 28th at 11:07 AM ET, we reverted the change, allowing users to authenticate as they previously did. We will review the implementation of the change and work with impacted customers to ensure that future releases will be non-disruptive. Communications will be shared in advance of future updates affecting login flows.
Report: "Data capture outage for orgs on EU data center"
Last update:
# Postmortem 2023/09/13 EU1 Data Capture Outage
Due to an infrastructure issue caused by a sudden spike of unanticipated traffic, end user session data was not captured for both web and mobile sessions on 2023-09-13 between 18:15 UTC and 19:05 UTC for all Orgs hosted in our European (EU1) data center. Orgs hosted in our North American (NA1) data center were not impacted. Any existing sessions being captured during the impacted timeframe may have gaps in intermediate pages, resulting in missing segments of time in playback, and any new sessions (web or mobile) started during this time may have been dropped. Analytics features that rely on this session and event data are also impacted, which means that there may be missing data points in metrics, funnels, dashboards, and conversions. This postmortem details the impact on our customers, the root cause of the issue, how we addressed the problem, and the steps we're taking to prevent this and similar types of issues in the future.
# Customer Impact
All Orgs on our EU data center were impacted by this incident. You can check whether your Org was impacted by seeing if your [Org’s ID](https://help.fullstory.com/hc/en-us/articles/360047075853-How-do-I-find-my-FullStory-Org-Id-) ends in “-eu1”. Any web and mobile capture data coming to FullStory between 18:15 UTC and 19:05 UTC was not captured and is not recoverable.
# Root Cause
On 2023-09-13 at 18:15 UTC, our EU data center experienced an unexpected spike in traffic. Our backend data capture service was unable to scale fast enough to accommodate the traffic increase and eventually crashed. The service would then crash again on attempted restarts, as it could not scale fast enough to handle the incoming traffic.
# Resolution
Our on-call operations team was immediately alerted and intervened to resolve this issue after it presented itself, including scaling up the data capture service manually to resume proper operation.
# Process Changes and Prevention
So far we have:
* Scaled up existing resources for our data capture service to handle the new volume of traffic
* Updated our infrastructure so that the data capture service will be able to scale up faster in the future

To prevent a recurrence of this incident we will be:
* Modifying our service scaling policies further so that we can handle similar spikes more smoothly
* Improving our monitoring and alerting so we can address these kinds of issues more quickly

We deeply regret this incident and invite any FullStory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com). We stand ready to fully address all of your concerns.
End user sessions and data were not captured from 2:25pm ET to 3:05pm ET for accounts on our EU data center. All missing activity during this time period is non-recoverable and may impact Metrics, Funnels, Dashboards, and Conversions. During this time, native mobile builds could also have failed. Re-running the build will properly upload the assets. This issue did not impact any FullStory accounts utilizing our US data center. (Your FullStory URLs would start with app.eu1.fullstory.com if you are utilizing our EU data center.)
Report: "Blank pages in a subset of sessions."
Last update:
### Session Replay Corruption Postmortem
Due to a software defect, session replay data for approximately 1% of sessions captured between July 18 (approximately 16:07:21 UTC) and July 31 (20:45:29 UTC) was corrupted. Although the associated product analytics remain unaffected for web sessions, the originating sessions cannot be viewed in session replay. An “Unable to retrieve session” error will display if one of these sessions is accessed. In an attempt to recover lost sessions, mobile analytics data for the affected sessions was also deleted for some accounts. Your Customer Success Manager will be reaching out to you if your account was impacted by this deletion. If you do not have a Success Manager and believe you are missing mobile analytics data, please contact [support@fullstory.com](mailto:support@fullstory.com). This postmortem details the impact on our customers, the root cause of the issue, how we addressed the problem, and the steps we're taking to prevent this and similar types of issues in the future.
## Customer Impact
Approximately 1% of the web and mobile sessions captured during the incident window are unplayable. Product analytics for the mobile sessions impacted by the mitigation attempt are irrecoverable.
## Root Cause
On July 18th, a code change was introduced to address clock skew during session creation. This change inadvertently affected the archival process for raw session data. Pages that were recorded and appeared to be "from the future" had their timestamps clamped down to the server time. If those pages were still sending data, the page would eventually be reinitialized with the "correct" future timestamp. As a result, these sessions were either completely discarded or misattributed in our event storage database. During an attempt to repair raw session data for the missing web sessions, product analytics associated with the affected 1% of mobile sessions were inadvertently deleted.
## Resolution
Upon identifying the defect, we promptly fixed and deployed an update, ensuring page archival for raw session data functioned correctly for all subsequent sessions. All sessions recorded post-incident are and will remain playable.
## Process Changes and Prevention
To prevent a recurrence, we've implemented the following action items:
* We've enhanced monitoring and alerting for failures impacting the archival of session replay data.
* We've updated session replay metadata to be completely immutable, eliminating the possibility of raw event storage becoming corrupted.

We are also working on:
* Eliminating the timestamp from the object key used for session replay archival, which will reduce the possibility of future clock-related session archival issues.
* Integrating monitoring and alerting within the playback client to detect and log potential session storage issues that cause playback failure, in order to provide immediate detection of this type of issue.

We deeply regret this incident and invite any FullStory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com). We stand ready to fully address all of your concerns.
From July 18 to July 31 there was a bug in our raw data storage that affected a subset of sessions (~1%). These sessions currently fail to play back as expected. During attempted recovery, mobile analytics data for the affected sessions was also deleted for some accounts. Please see our detailed postmortem for more information.
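The failure mode described in the root cause (a "future" timestamp clamped at ingest, then trusted again on reinitialization) can be sketched in miniature. This is a hypothetical illustration, not FullStory's actual code; the names (`sessionKey`, `ingestPage`, `reinitializePage`) and the key format are invented for the example.

```javascript
// Hypothetical sketch of the clock-skew defect: a page whose client clock
// runs ahead of the server is clamped on first ingest, but a later
// reinitialization trusts the client clock again -- so one page yields
// two different storage keys.

function sessionKey(pageId, startTime) {
  return `${pageId}:${startTime}`;
}

function ingestPage(pageId, clientTime, serverTime) {
  // Defect: timestamps that appear to be "from the future"
  // are clamped down to the server time.
  const clamped = Math.min(clientTime, serverTime);
  return sessionKey(pageId, clamped);
}

function reinitializePage(pageId, clientTime) {
  // Reinitialization uses the original "future" timestamp.
  return sessionKey(pageId, clientTime);
}

// A client clock 5 seconds ahead of the server:
const first = ingestPage('page-1', 1005, 1000);
const second = reinitializePage('page-1', 1005);

// The keys disagree, so events for the same page are split across
// storage entries -- the session is misattributed or discarded.
console.log(first !== second); // true
```

Keying archival storage on a timestamp is exactly why the prevention item above removes the timestamp from the object key entirely.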
Report: "Data capture outage (web & mobile)"
Last updateOn August 23, 2023 from 2:41 p.m. ET to 3:40 p.m. ET, an update to the server cluster hosting caching services essential to our data capture pipeline created a resource constraint that caused an outage in our session capture pipeline.

* Approximately 95% of new session capture requests failed for the duration of the incident.
* Sessions that had started before the incident were preserved; however, new user activity for those sessions was not captured once the incident began.
* This incident affected session capture for both web and mobile apps, as well as events captured using our Server Events API. The affected sessions will not be available within FullStory.

## Customer Impact

Approximately 95% of new session requests failed throughout the duration of the incident. As a result, customers may notice a significant drop in the number of newly captured sessions for this period. All missing activity during this time period is non-recoverable and may impact Metrics, Funnels, Dashboards, and Conversions. Additionally, the impacted sessions are not available for session replay. Customers using our mobile SDK may have experienced build errors that prevented apps from compiling. Re-running the build will properly upload the assets.

## Root Cause

An update to the log collection process on the server cluster hosting caching services left insufficient resources to guarantee availability of all caches used in the capture pipeline. This change was driven by the deprecation of an existing cloud service, which required migrating services to new clusters and updating the logging configuration on existing clusters to support the migrated services. The updates increased memory usage on these servers, which resulted in caching services being preempted. With multiple levels of caching unavailable, new session capture requests were unable to enforce privacy configuration or validate new session requests. The failure of both cache layers caused these requests to time out before settings could be retrieved, resulting in all session and data capture requests being rejected with a server error. This change was made first in our testing environment and validated there; however, due to differences in resource allocations and traffic volume, the problem was not detected.

## Resolution

FullStory operations detected the problem within 5 minutes of the change and immediately took steps to diagnose it. The update was reverted at 3:15 p.m., which enabled caching services to restart and fully repopulate, approximately 34 minutes after the initial change. Due to the delays in handling requests during the issue, capture services were unresponsive to new requests and to internal health monitoring. These services were automatically restarted, but without sufficient capacity to handle the backlog of requests. Additional capacity was provisioned and made available at 3:37 p.m., which restored session capture fully at 3:41 p.m.

## Process Change and Prevention

We are committed to preventing this incident in the future. We've completed the following action items:

* Removed the log collection process from the caching service cluster to eliminate the resource contention that initially caused the incident.
* Migrated all services and processes not related to caching services to different server clusters.
* Added additional alerting to the caching services cluster to monitor for problematic behaviors when restarting and redistributing currently running services, so that we are promptly notified of additional issues affecting availability.

Here are additional steps we're taking:

* Improve capture pipeline behavior to ensure that capture settings can always be retrieved directly from the source database in the event that multiple levels of caching are unavailable.
* Improve capture pipeline behavior to fail fast when cache retrievals are unsuccessful. This will allow adequate time for multiple fallback data access strategies to be executed.
* Update the rollout policies for capture pipeline services to better handle sudden surges of traffic while completely restarting services. This will guarantee adequate capacity during the rollout process and ensure prompt automated recovery.
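The fail-fast-with-fallback behavior described in the remediation steps above can be sketched as follows. This is a generic illustration, not FullStory's implementation; the names (`getCaptureSettings`, `withTimeout`) and the timeout budgets (50 ms for the cache, 500 ms for the database) are assumptions made for the example.

```javascript
// Sketch of failing fast on a cache lookup so a slow or unavailable
// cache layer doesn't consume the whole request deadline, leaving time
// to fall back to the source database.

function withTimeout(promise, ms) {
  // Race the data source against a short timeout budget.
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('timeout')), ms)),
  ]);
}

async function getCaptureSettings(orgId, cache, database) {
  try {
    // Fail fast: give the cache a small budget...
    return await withTimeout(cache(orgId), 50);
  } catch (err) {
    // ...and fall back to the source database on miss or timeout.
    return await withTimeout(database(orgId), 500);
  }
}

// Usage: a cache that hangs, backed by a healthy database.
const hungCache = () => new Promise(() => {}); // never settles
const db = async (orgId) => ({ orgId, privacy: 'strict' });

getCaptureSettings('org-1', hungCache, db)
  .then((settings) => console.log(settings.privacy)); // prints "strict"
```

With a per-layer budget like this, a total cache failure degrades to slower database reads instead of rejecting every capture request with a server error.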
This incident has been resolved.
End user sessions and data were not captured from 2:41pm ET to 3:40pm ET for accounts on our US data center. All missing activity during this time period is non-recoverable and may impact Metrics, Funnels, Dashboards, and Conversions. During this time, native mobile builds would also have failed. Re-running the build will properly upload the assets. This issue did not impact any FullStory accounts utilizing our EU data center. (Your FullStory URLs would start with app.eu1.fullstory.com if you are utilizing our EU data center.) A fix has been implemented and we are actively monitoring the issue.
Report: "Delayed Indexing - Custom Events via API"
Last updateAll delayed custom events have been fully re-indexed and indexing has returned to normal. We appreciate your patience. If you're still experiencing any issues with custom events from the impacted time frame, please reach out to support@fullstory.com.
We've identified that custom events via API received between 12:25pm EDT and 1:45pm EDT are experiencing delayed indexing. Prior and subsequent custom events via API are indexed as expected. We will provide further updates as soon as they become available. Thank you for your patience and understanding.
We are currently experiencing delays while indexing custom events via API. We are investigating the issue and will update you as soon as we know more.
Report: "Mobile Build Resource Uploads Failing"
Last updateBeginning at 8:37am EDT, all mobile app builds began failing due to errors uploading images during the build. Additionally, mobile apps may have experienced issues uploading dynamic/runtime assets. As a result, some sessions from this time period may be missing images. The issue was resolved by 9:13am EDT.
Report: "Delayed search indexing"
Last updateSearch indexing is back to normal and the incident is resolved. If you're still having trouble please reach out to support@fullstory.com and we'll be glad to help.
A fix has been implemented and we're monitoring the results.
At approximately 9:00am EDT, FullStory's search indexing data pipeline began experiencing delays. This is causing indexing delays and search disruptions for most FullStory accounts. The issue is being actively investigated.
Report: "Data Destinations Indexing Delay"
Last updateAll Data Destinations data has been successfully backfilled and this issue is fully resolved.
Data continues to be backfilled for all Data Destinations customers. There have been no further delays in data syncing.
Data Destinations indexing has been delayed for the past 17 hours. We are aware of the issue and are currently backfilling all data. No data loss has occurred as a result of this incident.
Report: "Data Capture Outage"
Last updateWe understand how important our service is to your business, and we take reliability and performance seriously. On August 23, 2022, beginning at 7:55 a.m. EDT (11:55 a.m. UTC), an outage with one of our primary databases led to session capture failures and application UI degradation.

* Approximately 50% of new capture requests failed between 7:55 and 11:35 a.m.
* This incident affected session capture for both web and mobile apps, as well as events captured using our Server Events API. The affected sessions will not be available within FullStory.
* Additionally, sessions successfully captured during this outage were delayed in processing, as processing was impacted for the duration of the incident. Some users experienced slow page loads and intermittent errors from the product UI.

Our goal is to avoid interruptions, and we use every opportunity to analyze the cause, learn from it, and minimize the chance of it happening again to improve future reliability. This postmortem details the customer impact, the root cause of what happened, how we addressed the problem, and how **we are committed to preventing it from happening in the future**.

## Customer Impact

Approximately 50% of new session requests failed throughout the duration of the incident. As a result, customers may notice a significant drop in the number of sessions for this period. Customers may also notice an underrepresentation of conversion rates and revenue attribution when looking across the affected time due to a lack of data. During this time, we also observed delays loading session replay or accessing other portions of the platform. Finally, this issue also delayed the availability of captured sessions due to delayed indexing. Customers using our mobile SDK may have experienced build errors that prevented apps from compiling.

## Root Cause

The root cause of the incident was database errors impeding the creation of new device and session identifiers during session capture. Due to database infrastructure migration being performed by our cloud provider, additional contention on the creation of new identifiers led to unrecoverable failures in starting the capture process. The additional requirements of managing identifiers during the database migration, coupled with the volume of existing identifiers and the provisioning of additional query indexes, resulted in unexpected contention and exceeded the processing capacity expectations for the migration process. Advance plans to mitigate the impact of the migration and to limit contention during the process were insufficient, and the migration needed to be terminated and rolled back. This caused a delay in restoration.

## Resolution

Once we identified the root cause, the infrastructure migration was canceled and reverted, beginning at 9:56 a.m. and completing at 11:35 a.m., which enabled new identifiers to be allocated. Sessions initiated after this time were captured but not immediately processed. To recover from the canceled migration, additional dedicated database capacity was provisioned by our cloud provider and activated at 1:53 p.m. to eliminate processing delays and enable the backlog of captured sessions to be fully processed. All sessions were processed and available as of 3:45 p.m., and processing fully returned to normal.

## Process Changes and Prevention

We are committed to preventing this incident in the future. We've completed the following action items:

* Postponed database infrastructure migration until additional safeguards and capacity can be established.
* Terminated the creation of additional query indexes impacting the start of session capture to reduce database contention.

Here are additional steps we're taking:

* Launch an enhanced session capture protocol that reduces reliance on stateful database interaction to **greatly improve resiliency, performance, and latency end to end** during session capture.
* Migrate capture of session information to optimized transaction journals to **reliably and expediently store** session data.
* Aggressively clean up expired data and further optimize search and retrieval operations to **reduce database capacity requirements.**
* Improve internal settings caching to **improve resiliency** to database issues during processing of captured session data.
Thanks for your patience and bearing with us. At this time, all issues have been resolved and we're back to normal operations. If you're still experiencing issues please reach out to support@fullstory.com and we'll dig in!
Data capture and mobile builds are back to full operation. We are still experiencing a delay in session indexing and will post an update on that issue by 4:30 PM EDT.
Our indexing and session processing pipeline continues to work through the backlog of recently captured sessions. At this time, new sessions may not be immediately available. A small subset of users may also experience unexpected timeouts and delays when loading parts of the FullStory UI including session replay. Mobile app builds may continue to fail intermittently at this time.
At this time we're happy to report that the issue impacting data and session capture has been resolved. Our processing and indexing pipeline is working through a backlog of captured sessions, so recently-captured sessions may not be immediately available. Customers using our Mobile SDK may still experience intermittent build errors. We are continuing to investigate and will follow up shortly.
In addition to the previously mentioned impact we have identified that customers using our Mobile SDK may experience intermittent build errors as a result of this outage. We'll share more details as we have them. Thank you for your patience.
We have identified additional areas of impact: new sessions that do capture successfully are taking longer to index and process; as a result, they may not appear until several minutes later. Session replay is also slow to load. We are continuing to work with our cloud services provider and exploring alternative approaches in parallel.
The root cause of the outage has been identified and is related to an error with one of our cloud service providers. We are working to mitigate the issue now.
FullStory session and data capture is currently experiencing elevated failure rates for all users. We are actively investigating the issue.
Report: "EU Partial Outage: Integrations - 5/26-6/9"
Last updateBeginning on May 26, a defect was introduced to accounts on our EU data center impacting certain integrations. As a result, custom events from our Data Layer Capture integration may have failed to capture. In addition, users of Slack, Intercom, Jira, MS Teams, Productboard, Trello, Mixpanel, and Qualtrics may be missing session links in these integrations for this time period and notes would have failed to send. A fix was implemented on June 9th; however, any lost data during this period is irretrievable. We apologize for the inconvenience. Accounts on our US data center were not impacted by this defect. If you have any questions, please reach out to our support team via support@fullstory.com.
Report: "Partial Outage: Integrations - 5/5-5/19"
Last updateBeginning on May 5, a defect was introduced impacting certain integrations. As a result, custom events from our Data Layer Capture integration may have failed to capture for some customers. In addition, users of Intercom, Mixpanel, Bugsnag, Olark, Google Analytics Universal and Qualtrics may have missing session URLs attached to users captured in these platforms. A fix was implemented on May 19; however, any lost data during this period is irretrievable. We apologize for the inconvenience. If you have any questions, please reach out to our support team via support@fullstory.com. All impacted customers have been notified via email.
Report: "Mobile build resource uploads failing"
Last updateStarting at 2:50pm ET, all mobile app builds were failing. The issue was resolved at 3:47pm ET and we're now seeing a 0% error rate.
Report: "Delayed page loads within FullStory"
Last updateStarting at 1:50pm ET, initial page load times within the FullStory application were taking 10 seconds or more. The issue was resolved at 2:25pm ET and page load performance has returned to normal.
Report: "Delayed mobile app session ingestion for EU data center"
Last updateThis issue is now resolved and all data has been backfilled.
A fix has been implemented and we are monitoring the results.
We are currently experiencing delays in ingesting mobile app sessions in our EU data center which began on March 28th. No data loss has occurred, but segments, metrics, funnels, and dashboards that rely on this data will be impacted until the data is fully indexed. There is no impact on our NA data center or web sessions in the EU data center. We are actively investigating the issue.
Report: "Delayed page indexing"
Last updateAll page indexing from this issue is now complete. Thank you for your patience.
Page indexing continues to catch up. We expect all pages to be indexed by Monday, March 13th, at 10am ET and will provide an update at that time.
Starting on February 28, 2023 at 5:00pm ET we began experiencing delays in indexing a small subset of pages in sessions. We have identified the cause and page data is now indexing, though data may not yet be available in search or metric results.
Report: "Delayed page loads within FullStory"
Last updateAll pages have continued to load as expected since 11:28am ET.
Starting at 7:23am ET, initial page loads within the FullStory application were taking 15 seconds for about 50% of pages. The issue was resolved at 11:15am ET and page load performance has returned to normal. We will continue to monitor page load times.
Report: "Nonsequential playback"
Last updateThis playback issue is now fully resolved. If you're still experiencing playback inconsistencies, please contact support@fullstory.com.
We have deployed a fix for the playback issue affecting some sessions and are monitoring the situation. All sessions should now play back as expected. If you're still experiencing issues, please contact support@fullstory.com.
Engineering has identified a solution to the issue and is in the process of testing and validating that solution. We'll continue to post updates as we have them. Thanks again for your patience and apologies for the turbulence!
We have identified an issue where some sessions' content may appear out of order during playback. Our engineering team is currently working to resolve the issue. Thank you for your patience.
Report: "Indexing is delayed since ~11am ET for a subset of customers"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently experiencing delays in indexing for a subset of FullStory accounts. New sessions may not appear until several minutes later. We are investigating the issue.
Report: "Delayed session data indexing"
Last updateThis incident has been resolved. If you continue to experience any issues, please reach out to support@fullstory.com for assistance.
We are currently experiencing delayed session data indexing & search results. New sessions may not appear until several minutes later. We are investigating the issue.
Report: "FS.event properties dropped"
Last update
### 2022/07/29 Capture FS.event Outage

Beginning on July 29, 2022 at 9:09 AM EDT (13:09 UTC), a defect released with an update to our fs.js web capture script caused all properties associated with any logged [Custom Events](https://developer.fullstory.com/custom-events) to be dropped from the event payload. The outage lasted for 3 days and 13 minutes, until a fix was deployed on August 1, 2022 at 9:22 AM EDT (13:22 UTC). Note that sessions captured using FullStory for Mobile Apps were not impacted by this defect. Additionally, events captured using our [Server Events](https://developer.fullstory.com/server-events) API were not impacted by this defect.

This postmortem details the customer impact, the root cause of what happened, how we addressed the problem, and how we will prevent it from happening in the future.

## Customer Impact

Data capture and collection was fully available for the duration of the incident, meaning data capture supporting session replay, server events, other API endpoints, and standard FullStory events was unaffected. However, the Custom Events ingested during this period only contained the _name_ of the Custom Event without any of the _event properties_ associated with it. This means that sessions will still show the Custom Event was called, but the properties of those events will not be available or indexed. Capabilities impacted by this unavailability include Segments, Metrics, Funnels, and Conversions relying on this Custom Event data during the outage window. Data Export files for this window are likewise missing the event properties of Custom Events. It's possible that the loss of custom event properties during this time window will appear as an underrepresentation of conversion rates or revenue attribution for that time period. We would like to work with customers to uncover the extent of this impact and explore further actions we can take to help in these areas.

## Root Cause

The root cause was the introduction of a defect in the code of our FullStory script (fs.js) that was incompatible with the current version of the FullStory [Data Capture Snippet](https://help.fullstory.com/hc/en-us/articles/360020828273-Getting-Started-with-FullStory#h_01FXB8T39JB6TPBWMR3727QMVV) used by our web capture clients. Our systems include a rigorous set of automated integration tests of the FullStory script covering all our FullStory Client APIs during deployments; however, in a parallel effort to upgrade our FullStory snippet, we updated our automation suite to use the upcoming snippet and lost coverage for the previous snippet version. The code was thought to be safe because it had passed the checks and tests within our Continuous Integration (CI) pipeline and had been successfully deployed in our staging environment for 14 days without a reported issue.

## Resolution

Once we identified the root cause (the incompatibility between the current FullStory snippet and the FullStory script, fs.js), we rolled out a fix for the FullStory script to resume data collection for Custom Events. Sessions started an hour after the 9:22 AM EDT (13:22 UTC) deployment should work as expected.

## Process Changes and Prevention

To prevent a recurrence of this incident in the future, we've completed the following action items:

* Rolled back the FullStory script and snippet release to the prior working version.
* Added manual QA verification steps for our FullStory Client APIs before each release.

We also have work slated on our immediate roadmap to:

* Expand our Continuous Integration pipeline with more automated testing against all major versions of the FullStory snippet and FullStory script releases (and release candidates) to catch incompatibilities in development.
* Build monitoring dashboards to better and more quickly observe issues in Client APIs as part of our Release Playbook.
* Build internal alerting against our monitoring to more quickly take remediation action.
* Explore the possibility for enterprise customers to subscribe to long-term version options of the FullStory script and snippet.

We will report on the progress of the above action items on a regular basis. We deeply regret this incident and invite any FullStory customer who was materially affected to contact [support@fullstory.com](mailto:support@fullstory.com) if you feel unsatisfied after this explanation or if we can provide additional processing to your data to help with the metrics reporting over the impact period. We stand ready to fully address all of your concerns.
On Friday, July 29th, at 9:09am EDT, a bug was introduced via a release to fs.js that caused FS.event properties to be dropped. This issue was patched on August 1st, at 9:22am EDT, resolving the incident. The eventProperties of any FS.event call (as defined by our API documentation, https://developer.fullstory.com/custom-events) were dropped. Events generated in the affected time period will not have the relevant data attached to the event.
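For reference, the affected call shape follows the Custom Events documentation linked above: an event name plus an eventProperties object. The `FS` stub below is a hypothetical stand-in so the shape can be shown outside a browser (in production, fs.js defines the real `FS` global), and the event name and properties are invented for the example.

```javascript
// Hypothetical stand-in for the FS global defined by fs.js in the browser.
const events = [];
const FS = { event: (name, props) => events.push({ name, props }) };

// The documented FS.event(eventName, eventProperties) call shape:
FS.event('Subscribed', {
  plan_name_str: 'Professional',
  plan_price_real: 299,
});

// During the incident window, only the event name survived ingestion;
// the eventProperties object was dropped from the payload.
console.log(events[0].name); // "Subscribed"
```

Calls like this continued to record the event name throughout the outage, which is why sessions still show the event while searches and metrics keyed on its properties come up empty for that window.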
Report: "Delayed session data indexing"
Last updateThis incident has been resolved. If you continue to experience any issues, please reach out to support@fullstory.com for assistance.
A fix has been implemented and we are monitoring the results.
Engineering has identified the issue and we are working to rectify it at this time.
We have received reports of delayed session data indexing & search results for some FullStory customers. We are currently investigating the issue.
Report: "Delayed Indexing"
Last updateSession indexing has returned to normal. If you experience any delays with newly captured sessions appearing in your org please reach out to support@fullstory.com and we'll be happy to help out.
Engineering has taken steps to resolve the issue and we are monitoring the situation closely.
Engineering has identified the issue and we are working to rectify it at this time.
We are currently experiencing delays while indexing session events. New sessions may not appear until several minutes later. We are investigating the issue and will update you as we know more.
Report: "Delayed session processing"
Last updateSession capture and indexing have stabilized and this incident is now resolved. If you continue to experience any issues, please reach out to support@fullstory.com.
Performance is operational but we are still monitoring closely to ensure there are no remaining issues.
We are continuing to monitor this issue for any further session capture and indexing delays.
We are continuing to monitor this issue throughout the remainder of today and will provide another update by 11:00am ET tomorrow (6/22).
We have implemented a fix but will continue to monitor for any continued session capture and indexing delays.
We've identified the cause for delays in session capture and indexing and are working to resolve.
We are currently experiencing delays in session capture and indexing. New sessions may not appear until several minutes later. We are investigating the issue.