Historical record of incidents for Simon Data
Report: "System outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're experiencing an outage. Flows and journeys will be delayed as we repair the impacted system.
Report: "Snowflake Partial Outage"
Last update: This incident has been resolved.
Simon is experiencing impacts to all data pipe refreshes, syncs, and journeys due to the Snowflake partial outage described at https://status.snowflake.com/.
Report: "Simon Web App and Audience API Disruption"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently experiencing a service disruption with the Simon Web App and Audience API. We are investigating the degraded performance.
Report: "Simon Web App Service Disruption"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results. Web App may be slower to load.
Simon experienced a database outage. The database was restarted, and we are working with AWS to determine the root cause.
We are continuing to investigate this issue.
We are currently experiencing a service disruption with the Simon Web App and are investigating the degraded performance.
Report: "Simon Documents Page Not Accessible"
Last update: This incident has been resolved.
Access to the Simon Documents pages has been restored and we are monitoring the results. We apologize for the downtime.
docs.simondata.com is currently unavailable. We are communicating with our partner on resolution. We will post more updates as the issue is resolved. We apologize for the inconvenience.
Report: "Journey Log Data Outage"
Last update: From 13:00 to 14:40 EST, Journey actions (e.g., sends and delays) continued as expected. However, the associated log data was not written to Journey Contact History. Therefore, Journey actions during this time will not be reflected in the Simon Journeys Reporting UI or be available in Contact History log data feeds.
Report: "Elevated 5xx Transmissions error rates and latency for some SparkPost customers (US hosted only)"
Last update: Simon has finalized recovery efforts as a result of the reported incident. We are closing out this incident. If you have any questions or see any issues, please report them to our support team.
We are still monitoring as we recover any failed flows or journeys.
SparkPost indicated a fix has been implemented and Simon is monitoring the results as we recover flows and journeys.
SparkPost is experiencing an elevated level of errors and latency on the Transmissions API for some SparkPost customers. Some Simon Mail users may experience issues. Our engineering team is closely monitoring the issue for updates from SparkPost.
Report: "Simon Data Web App may be inaccessible to users"
Last update: As of 1:15PM EDT, AWS has determined that services relevant to Simon Data have recovered. We will continue to monitor on our end for any changes. If you have any specific questions for our team, please reach out via support ticket.
We have a few confirmed reports that the Web App is available for some users, but performance may be degraded. We continue to monitor the situation and will provide updates here.
There is a known AWS outage impacting our Simon Data endpoints. We are actively monitoring the issue and will provide updates here.
We are continuing to investigate the issue with the Simon Web App. It should be noted that the Audience API (AAPI) was also impacted, but has since recovered. The duration of this impact to AAPI was 10:45 - 10:55 AM Central. No other products/features are known to be impacted at this time, but we will update as information becomes available. If you are experiencing an issue with products/features outside what is mentioned, please let us know by submitting a support ticket to our team.
Simon Data is currently investigating an issue where the Web App may not be accessible by users. Our team is currently working on resolution. Updates will be posted here.
Report: "Journeys Degraded"
Last update: This incident has been resolved and all impacted journeys have been recovered.
At 03:40 PM PDT on July 30th, AWS noticed increased error rates and latencies for some service APIs within the US-EAST-1 region. At 03:59 PM PDT on July 30th, AWS determined it was due to the Kinesis service in US-EAST-1. The root cause was identified and they began working on a solution. As of July 30th at 09:55 PM PDT, AWS had fully recovered from the Kinesis errors and closed the issue. Simon Data has been recovering failed journeys since the service returned to normal. We are actively working through the backlog and will provide another update once we are recovered.
The Journeys product is in a degraded state due to an outage of AWS Kinesis, upon which Journeys depends. Though many contacts are not impacted, some contacts that should be flowing through Journeys are not.
Report: "Simon Mail Preheader Rendering"
Last update: Safeguards have been implemented to ensure proper rendering in cases of multiple preheader configurations. It is still recommended that best practices are followed per our documentation.
We are continuing to work on a fix for this issue.
For Simon Mail customers, it has been identified that the inclusion of a preheader in both the email template and the campaign configuration causes rendering errors in some email clients, resulting in a blank page. This issue is not immediately discernible because it manifests differently across various email service providers (ESPs). We are actively working on implementing safeguards to restrict this configuration. More immediately, to avoid this issue, clients should refrain from using duplicate preheaders and follow best practices outlined in our documentation. If they encounter rendering issues during testing, they should remove the preheader from either the template or the campaign configuration.
Report: "Simon Web App Service Disruption"
Last update: This incident has been resolved.
We are currently experiencing a service disruption with the Simon Web App. We are monitoring the degraded performance.
Report: "Journeys Experiencing Performance Issues"
Last update: This incident has been resolved.
We have implemented a fix and contacts are successfully entering journeys. Our team is continuing to monitor the situation.
We are currently investigating an issue that is preventing contacts from entering journeys.
Report: "Audience API Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring performance.
Audience API is currently experiencing an outage. We are currently looking into this issue and will provide updates.
Report: "Flows Unable to Successfully Sync"
Last update: This incident has been resolved.
We have implemented a fix and are continuing to monitor performance.
We have implemented a fix and are continuing to monitor. Syncs are running successfully with the exception of event-triggered syncs. We will continue to provide updates.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
Simon is currently experiencing an issue that is preventing flows from successfully syncing.
Report: "Email Exports for Datasets, Segments and Campaigns Unavailable"
Last update: The new workflow is now live for exports.
We are QA-ing an adjustment to the functionality and will post here when it's resolved.
We are currently investigating an issue that prevents users from successfully exporting datasets, segments, or campaigns via email. Our team is looking into this and we will continue to provide updates.
Report: "Some Event Datasets Experiencing Higher Latency"
Last update: This incident has been resolved.
We have implemented a fix and will provide another update once the issue has been resolved.
We have identified an issue for some customers that is causing event datasets to have higher latency than normal. Our team has identified the issue and is working on a solution. We will provide updates.
Report: "Simon Mail event collection degraded performance"
Last update: This incident has been resolved. Our MTA partner started replaying historical missed data at 8:30 ET. Customers will begin to see missed events delivered to their webhooks, continuing until all data has been replayed. The plan is to replay open and click events one hour at a time, from 2/22 18:00 UTC through (and including) 2/23 19:00 UTC, with a 5-minute pause between each hour of data.
We are continuing to monitor for any further issues.
From our partner: "We have verified that the fix is working as expected and customers are seeing generation of live open and click events. We are actively finalizing testing on the replays and anticipate starting the recovery of historical open and click events within the next few hours." It is expected that events will be replayed and data might not be up to date as they "plan to replay 1 hour of data at a time, starting from the latest hour and working up to present time, with a pause of 5 minutes between each hour of data." We are continuing to monitor the fix and will update this page with any further information.
Due to an outage with our mail transfer agent (MTA) partner, email events like opens and clicks may currently be underreported for any sends after 1PM EST 2/22/24. A fix has been identified and those events will be backfilled once the fix is in place. The fix is expected to go into place in the next few hours, then replay work will begin. We will update here with more info as it comes in.
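The hour-by-hour replay schedule described in this incident can be sketched as follows. This is only an illustration of the arithmetic, not the MTA partner's actual tooling. The function name `replay_windows` is hypothetical, the year 2024 is taken from the 2/22/24 date above, and two readings are assumptions: that 2/23 19:00 UTC is the end boundary of the replayed range (if the hour starting at 19:00 is itself replayed, there is one more window), and that windows are processed oldest first.

```python
from datetime import datetime, timedelta, timezone


def replay_windows(start, end, step=timedelta(hours=1)):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


# Range quoted in the incident update (year assumed to be 2024).
start = datetime(2024, 2, 22, 18, 0, tzinfo=timezone.utc)
end = datetime(2024, 2, 23, 19, 0, tzinfo=timezone.utc)

windows = list(replay_windows(start, end))

# A 5-minute pause sits between consecutive windows, so there is
# one fewer pause than there are windows.
pause = timedelta(minutes=5)
total_pause = pause * (len(windows) - 1)

print(f"{len(windows)} hourly windows, {total_pause} of pauses in total")
```

Under these assumptions the pauses alone add about two hours to the replay, on top of however long each hour of events takes to deliver.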
Report: "Degraded Attentive Performance"
Last update: This incident has been resolved.
Attentive has identified the issue and is actively working on a fix.
Attentive is currently experiencing an issue between our servers. We have a working hypothesis for the problem and are working on a fix.
Report: "Simon Web App Service Disruption"
Last update: This incident has been resolved.
We are currently experiencing a service disruption with the Simon Web App. We are monitoring the degraded performance and will send an additional update in 30 minutes if not resolved.
Report: "Audience API unavailable"
Last update: This incident has been resolved.
We are currently investigating an issue making the Audience API and portions of the UI regarding content blocks/templates unavailable for a subset of our customers.
Report: "Simon Mail Outage"
Last update: This incident has been resolved.
The issue has been identified and we are seeing Simon Mail sends delivering to inboxes. We will let you know once the issue has been fully resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating a Simon Mail outage. Our team will be providing more information as it becomes available.
Report: "Simon Web App Service Disruption"
Last update: This incident has been resolved.
We are monitoring a recent brief service disruption with the Simon Web App. Our engineering team is currently monitoring the issue.
Report: "Simon Web App Service Disruption"
Last update: This incident has been fully resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing a service disruption with the Simon Web App. Our engineering team has identified the root cause and is implementing a solution. All users attempting to use the web application may be affected. We will send an additional update in 30 minutes if not resolved.
Report: "AWS Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The AWS outage has been resolved and Simon recovery is on its way. We are continuing to monitor our systems as components become operational. We will provide more updates as we make progress on recovery efforts.
AWS has started to see signs of recovery, which has positively affected the impacted Simon components. We are continuing to work on this issue in partnership with AWS support.
We are continuing to work on a fix for this issue.
Our upstream provider, Amazon Web Services (AWS), is currently experiencing an issue which is affecting our services. We are currently looking into this issue in partnership with AWS and will provide updates.
Report: "Facebook Integration Service Disruption"
Last update: This issue has been resolved by the Meta Developer Operations Team and performance is back to normal.
We are continuing to investigate this issue.
We are currently experiencing a service disruption with Simon Data's Facebook integration affecting data syncs to the Facebook platform. While we work with the Meta Developer Operations Team to resolve this issue, Facebook syncs will not be updated.
Report: "Data Processing Delays - 5/17/2023"
Last update: This incident has been resolved.
The root cause of the data pipe delay has been identified and a permanent solution has been put in place. We are currently monitoring all pipe runs as they land on a delayed schedule.
Our data processing infrastructure is running behind due to a transient issue identified at 12:06 AM EST. This is causing temporary delays in data delivery and should be resolved soon. No data has been lost and the system should be caught up shortly.
Report: "Network Outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're currently investigating the issue.
Report: "Degraded Simon Data Pipes and Syncs Performance"
Last update: This incident has been resolved.
We are monitoring degraded performance of Simon Data Pipes and Syncs. Your organization may experience delays in journey syncs and/or data refreshes.
Report: "Degraded Simon Data Pipes Performance"
Last update: This incident has been resolved.
We are monitoring degraded performance of Simon Data Pipes. Your organization may experience data refreshes completing slightly later than expected.
Report: "Degraded Campaign Execution"
Last update: This incident has been resolved. All flows have been successfully recovered.
At 5:31PM EST a fix for the infrastructure that executes campaigns was implemented. The new infrastructure is now live and processing jobs in a normal, stable state. We expect all campaign sends that were delayed to be recovered within about an hour. Newly sent flows are processing normally, as expected.
At 3:13PM EST the infrastructure that executes campaigns began experiencing degradation impacting flows, data feed exports, and our legacy journeys product. Our team identified the symptom and addressed it, but there was a subsequent recurrence which is now being addressed via a deeper infrastructural remediation. We expect messaging delays to be no longer than 3 hours in total before all campaigns are back on track.
Report: "Degraded Web App Performance"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
We are beginning to see improved stability on the AWS system causing the service degradation and are monitoring to see if it persists.
We are experiencing sporadic issues with AWS that are leading to intermittent 500 errors for users of the Simon Data application. We have identified the source of this issue and are currently working with AWS to resolve it.
Report: "Simon Mail Drag-and-Drop Editor Outage"
Last update: This incident has been resolved and the Drag-and-Drop Editor is fully operational.
The issue has been identified and a fix has been implemented. We are working on a full resolution of any remaining impact.
We are currently experiencing an outage to our Drag-and-Drop Editor used by all Simon Mail customers. This is leading to errors creating, editing, and viewing templates in the Drag-and-Drop Editor. Any Simon Mail flows leveraging templates that had already been saved are not impacted and templates are being sent appropriately to customers.
Report: "Temporary impact on some data refreshes, scheduled one-time flows, and event-triggered flows"
Last update:
# **Overview**
Last week on July 27th, Amazon Web Services (AWS) was performing routine maintenance on a piece of infrastructure that Simon uses to power segmentation. During the maintenance window, AWS' routine failed and this left parts of Simon's segmentation functionality temporarily unavailable. This had ramifications for use of several Simon Data platform interfaces as well as on-time delivery of downstream flows & journeys.
The attached file documents the steps we took to react to and remediate the issue while in parallel escalating the emergency to AWS, who ultimately was able to identify the source of the issue and fix it. AWS acknowledged that this bug was introduced by AWS and that the remediation they implemented on our infrastructure is a permanent solution.
# **Executive Summary**
In the early morning EST of July 27th 2022, a maintenance routine to an Amazon-managed service that Simon uses to power segmentation left that service unavailable. Simon was immediately paged and our on-call support reacted. Unfortunately, the most direct and fastest-acting remediations were not possible due to persistent hardware failure from Amazon, requiring Simon to fall back on reconstructing data on new services. In the end, Amazon rectified the bug in their maintenance program and remediation took effect before Simon finished reconstructing data on new services, rendering that effort unnecessary.
No data or messages were lost, but both data refreshes and active or planned campaigns were delayed while Simon performed an emergency migration to a functional database. This resulted in delayed data refreshes and campaign launches, in addition to leaving the core segmentation product, and those products that depend upon it (like unified contact view and selecting sample contacts in content), unavailable until the migration completed.
# **Root Cause**
Simon Data splits the data our platform uses by kind into different databases, and keeps business logic in other databases. Each database comes equipped with multiple, redundant nodes. We routinely exercise node failover logic during upgrades, maintenance, and failures. Maintenance on active Amazon Web Services (AWS) databases that Simon Data uses is frequent, and unplanned outages are rare because maintenance processes with AWS are typically predictable.
On July 27th, 2022, AWS' maintenance routine introduced a subtle bug that prevented its maintenance from finishing. As a result, the segmentation database was being reported as still under maintenance. The Simon Data on-call team was alerted and immediately reacted; however, contacting our technical support at AWS took longer than expected and the Simon on-call team only received 1 downstream page.
While the Simon team performed an emergency data reconstruction / migration to healthy infrastructure, the Simon team escalated through multiple teams at AWS. Despite the parallel escalations, it took longer than normal to have AWS view this incident as an emergency situation instead of an unhealthy situation. Once recognized as an emergency situation, AWS resolved it quickly, and this happened before Simon's internal reconstruction / migration completed.
AWS has provided their post mortem to Simon Data: _"temporary tables were inadvertently created [by AWS] during a maintenance period that caused a corruption of metadata and prevented the cluster from powering on"_
# **Impact Analysis**
All customers using one of our specific segmentation databases saw interruptions in service in Unified Contact View (UCV), Segmentation, and content loading tools in Simon.
# **Remediation Plan**
## **Quicker Time to Detection & Alerting**
Simon Data has reviewed the process we use for publishing incidents to our status page. We have cut out the manual steps which delayed the alert to our customers longer than expected.
## **Quicker Time to Resolution & Recovery**
Simon Data has revisited the design and usage of our segmentation databases such that a new process was added to validate that maintenance has been conducted correctly. If it doesn't finish as intended, immediate migration to another segmentation database will occur to stem interruption of service for Simon Data customers.
This incident has been resolved.
Approximately 20% of affected client organizations' pipes have been fully restored. We will continue to update here as we restore the remainder of affected pipes.
The fix has been implemented and we are starting to recover pipes of affected client orgs.
A fix has been identified and is currently being implemented. We will provide further updates as the fix is rolled out.
We have identified that this issue is due to unplanned maintenance by AWS. We have escalated to our technical support team at the provider for immediate assistance and will update this status soon.
Report: "Attentive Integration Transient Error"
Last update: This incident has been resolved.
We've discovered a transient error with the Attentive integration and are currently investigating. While we investigate the issue, we have paused the integration.
Report: "Simon Platform Degraded Performance"
Last update: This issue is now resolved. The root cause was a buildup of queued tasks, resulting in a slowdown of site performance. We have addressed the performance impact and are also working on improvements to increase our processing capacity. Thank you for your patience as we worked to resolve this!
We are continuing to work on a fix for this issue.
The Simon Data platform is currently experiencing an issue that is causing a slower than normal user experience. We are actively remediating the issue and will post an update as soon as the issue is resolved.
Report: "ETF Pipeline Outage"
Last update: This incident has been resolved and all ETFs are now in full recovery.
We are currently seeing issues with the AWS Lambda service that writes events from Kinesis to our Postgres DB. This is leading to a failure of some Event Triggered Flows. Our engineering team is currently investigating this issue.
Report: "AWS Incident Affecting Some Data Pipe Latency"
Last update: This incident has been resolved and any delayed data pipes are now in a recovery state.
AWS East-1 is experiencing an issue affecting its Elastic Load Balancing APIs - leading to intermittent delays in some Simon Data Pipes. We are actively monitoring this issue and will resume normal data operations as soon as the AWS incident is resolved.
Report: "Facebook Integration Service Disruption"
Last update: This incident has been resolved.
We are currently experiencing a service disruption with Simon Data's Facebook integration affecting data syncs to the Facebook platform. Our engineering team is working to restore the data syncs.
Report: "Facebook Integration Service Disruption"
Last update: This incident has now been resolved.
Simon's Facebook integration functionality has been restored; however, we will be actively monitoring for any additional issues. Our team will contact you directly if it is necessary for your account to re-authenticate your Facebook credentials on the Simon Data integrations page.
Our engineering team, in collaboration with Meta's team, has identified the issue and is working with Meta's development team to put the solution in place.
We are currently experiencing a service disruption with Simon Data's Facebook integration affecting data syncs to the Facebook platform. Our engineering team is working to identify the root cause and implement a solution in collaboration with the Meta Developer Operations team.
Report: "Facebook Integration Service Disruption"
Last update: This incident has been resolved.
We are currently experiencing a service disruption with Simon Data's Facebook integration affecting data syncs to the Facebook platform. Our engineering team is working to identify the root cause and implement a solution in collaboration with the Meta Developer Operations team.
Report: "Simon Mail outage 4/14 to 4/15"
Last update: We have reached out to all affected customers to confirm which flows to re-sync.
Simon Mail experienced an outage from April 14 at 11:52am ET to April 15 at 11:12 AM ET. Simon Mail is now operational, and we will be reaching out to individual customers confirming which flows to re-sync.
Report: "Simon Data Audience API is experiencing issues"
Last update: At 11:57AM EST one of our primary datastores had a partial outage. The Simon Data engineering team was immediately notified and began investigating the issue. With the outage, some of the data that is used for the Audience API was unavailable, causing a surge of failures when attempting to retrieve data. While the engineering team worked to restore the hosts, they identified the root cause of the issue that directly impacted the Audience API. At 12:45PM EST the team restored the data and the Audience API immediately started handling requests. The team continued to watch traffic to ensure that there were no other issues before communicating more broadly. The Simon Data engineering team is further reviewing the architecture to see if there are improvements that can be made for increased resiliency or to better isolate outages from impacting our customers. We apologize for any disruption to your business and are constantly striving to improve our systems to better serve your needs.
Simon Data has resolved issues with the Audience API and all customer tables have been fully restored. The team is currently working on a post mortem.
We are continuing to investigate this issue.
The Simon Data team has identified the issue and is taking steps toward recovery.
The Simon Data Audience API is currently down due to issues occurring in our upstream data processes. The team is actively investigating this and an update will be posted shortly.
Report: "Flow index page temporarily unavailable"
Last update: Our engineering team has applied a temporary fix to unblock flow index page use, while unplanned maintenance continues.
We are continuing to investigate this issue.
The flow index page is currently unavailable due to maintenance. Syncs are not affected by this issue.
Report: "Slack Notification Service Outage"
Last update: On January 11th at 11:43 AM EST, Simon Data deployed a change to our Notification Service. This change caused a bug that impacted delivery of alerts about Data Pipe refreshes and Flow Sync statuses via Slack. During this window, email alerts were still working. This bug was identified at 2:28pm EST on January 11th and was fully resolved at 2:42pm EST on January 11th.
Report: "Simon Data Pipes Delayed"
Last update: We can confirm a data refresh has been started for every Simon customer. While pipe completion will still be delayed, the issue has been resolved.
Simon Data Pipes have been delayed due to issues with AWS this morning. While AWS has not yet resolved these issues at this time, we are actively recovering all pipes and expect them to complete later today. See the AWS health dashboard for reference: https://status.aws.amazon.com/
Report: "Simon Mail Subscription Center - Unsubscribe Link Bug"
Last update: On December 8th at 11:15am EST, Simon Data deployed a change to our Simon Mail Subscription Center. This change caused a bug that impacted a recipient's ability to unsubscribe from email using unsubscribe links for all Simon Mail customers. This bug was identified at 9:28pm EST on December 8th and was fully resolved at 12:52pm EST on December 9th.
Report: "Event-triggered flows interrupted"
Last update: On the morning of Tuesday, December 7, [our primary cloud provider had an outage](https://aws.amazon.com/message/12721/) that impacted a number of our systems. The outage impacted Simon from Tuesday, December 7th, 10:44 AM EST to Wednesday, December 8th, 3:38 PM EST. The primary services impacted were Event-Triggered Flows, Data Pipes, and Data Syncs. During our vendor's outage, we noticed delays and problems impacting our services. We initiated our safety protocols to prevent loss of data and continued to monitor and provide updates. At 8:13 PM EST on Tuesday we received information from our vendor that some systems had become operational. After our team validated system health we reverted our safety protocols and allowed our systems to scale back up. Working with our customers, we identified and continued data processes in flight. We finalized the requested processes by 1:30 AM EST. On Wednesday at 12:31 PM EST we found additional processes that needed manual intervention and continued to unblock our data processes. We were completely back to normal operations by Wednesday December 8th, 3:38 PM EST. Simon Data apologizes for any disruption that was caused to your business. We are committed to the efficient processing of your data in a timely manner and take matters like this seriously. Please reach out to your Account Manager if you have any follow-up questions or concerns.
We've identified the root cause of the issue and resolved it.
We're still working on the issue.
We are continuing to work on a fix for this issue.
The issue has been identified as the result of an outage in AWS US-EAST-1.
We are continuing to investigate this issue.
We are currently experiencing issues with AWS that are interrupting flows, pipes, and some aspects of the Simon Web App. See AWS' health dashboard for reference: https://status.aws.amazon.com/