Split by Harness

Is Split by Harness Down Right Now? Check whether there is an ongoing outage.

Split by Harness is currently Operational

Last checked from Split by Harness's official status page

Historical record of incidents for Split by Harness

Report: "Console authentication errors: "Unable to start the FME functionality""

Last update
resolved

The fix resolved the issue. This incident is closed.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our team is currently investigating an issue affecting user authentication into the Split console for some users.

Report: "Data Ingestion Delay"

Last update
resolved

All data pipelines have been caught up.

identified

The issue continues to clear up with only the S3 ingestion still backlogged.

identified

Delays have been brought down and we are continuing to work on this issue.

identified

We've identified an issue where we started experiencing a lag in ingesting events. This is clearing up, but it has caused downstream effects: some customers have a three-hour window in which data is delayed (not yet showing up in the UI or possibly in a webhook). This is not affecting all customers, but if a customer is not seeing data where they would normally expect it, this is quite possibly connected. We are addressing the issue and will let people know once we have backfilled data successfully.

Report: "Increased latency for web console"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and our services are back to normal. We continue to monitor the results.

investigating

We have identified an issue causing increased latency and sporadic timeouts affecting our web console. Customers may also observe increased latencies for API calls made by our SDKs, but these do not affect the SDK's performance or feature flag evaluations. Our team is working to resolve this issue as soon as possible.

Report: "Support email address experiencing issues"

Last update
resolved

This issue has been resolved, and now support cases received through support@split.io are being processed normally.

identified

We have identified an issue affecting our ability to process support cases through support@split.io and our team is working on resolving this. In the meantime, support tickets can be created normally using our support@splitsoftware.zendesk.com address. Support cases opened through our Help Center or Web console are unaffected.

Report: "Elevated error rate for Web Console"

Last update
resolved

A configuration error was identified and mitigated before full propagation. This led to a short period with an elevated error rate affecting our Web Console between 9:14PM and 9:22PM GMT on November 6th. No other services were affected and all of our SDKs operated normally during this period.

Report: "Split console inaccessible"

Last update
resolved

The incident has been resolved.

monitoring

A fix has been implemented and the system is available. We are monitoring the system.

investigating

We are currently investigating an issue where the console is inaccessible.

Report: "Calculate jobs are not starting for some experiments"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and the calculations are recovering. We are monitoring the system.

investigating

We are currently investigating this issue.

Report: "CDN purges not propagating"

Last update
resolved

This incident has been resolved.

monitoring

After working with our CDN vendor, we can confirm purges are now propagating normally and all responses are serving current feature flag definitions. We continue to monitor the results.

investigating

We are investigating an issue affecting our CDN purges, which can in some cases lead to older definitions being served from cached responses. Customers may observe issues with change propagation for their feature flags and segments. SDK instances initializing against a CDN POP with cached data for their environment may fetch outdated definitions. SDK instances connected to streaming will continue to fetch changes normally.

Report: "Investigating an issue with SDK timing out"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are experiencing an issue with segment-related queries that causes SDK calls fetching this data to fail. We are investigating the issue now.

Report: "Elevated error rate for Auth API and login issues"

Last update
resolved

Today between 8:00AM and 9:11AM GMT we experienced a degradation in several internal services. This resulted in an elevated 5xx error rate for our Auth API as well as login issues for some of our users. Any streaming-enabled SDKs facing errors from the Auth API automatically fell back to a polling strategy and continued to receive updates without any interruptions during this period. Users affected by this incident were unable to log in using their username and password at https://app.split.io/login. SSO was unaffected.
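
The fallback behavior described in this update can be illustrated with a minimal sketch. This is not the SDK's actual implementation: `AuthError`, `fetch_updates`, and the three callables passed to it are hypothetical names chosen for illustration, and the sketch only assumes that a failed authentication step should route the client to polling instead of streaming.

```python
from typing import Callable, List, Tuple


class AuthError(Exception):
    """Hypothetical error raised when the auth step fails (e.g. a 5xx response)."""


def fetch_updates(
    authenticate: Callable[[], str],
    stream_updates: Callable[[str], List[str]],
    poll_updates: Callable[[], List[str]],
) -> Tuple[str, List[str]]:
    """Prefer streaming; fall back to polling if authentication fails.

    Returns the mode that was used together with the updates received.
    """
    try:
        # Streaming needs a token from the auth step.
        token = authenticate()
    except AuthError:
        # Auth is unavailable: keep receiving updates by polling instead.
        return "polling", poll_updates()
    return "streaming", stream_updates(token)
```

A caller would pass its own authentication, streaming, and polling functions; the point of the design is that an auth failure degrades the delivery mode rather than stopping updates.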

Report: "S3 data ingestion delay"

Last update
resolved

This incident has been resolved.

identified

We have identified an issue causing the S3 data ingestion to be delayed and are working on a fix.

Report: "Intermittent 503 responses from SDK API"

Last update
resolved

This issue has been resolved and all our systems are healthy. We thank you for your patience while our team addressed this.

monitoring

Our team traced the root cause of this issue to a high number of variants for cached resources being created in certain scenarios. A fix has been implemented to ensure a more efficient caching strategy by reducing unwanted revalidation and preventing excessive variants for objects in entries critical for SDK initialization. All indicators are currently healthy and validate that the fix has effectively mitigated the issue. We continue to monitor the results.

identified

We detected a spike in 503 responses from our SDK API today between 15:04 and 15:06 GMT. Our team continues to work on a fix for this issue.

identified

We are investigating some usage patterns affecting our CDN and sporadically causing slightly elevated error rates during short periods of time in certain regions. This can lead to initialization timeouts for SDKs initializing from an empty cache. Running SDK instances are unaffected, and synchronization will continue to work without issues. All of our SDKs have built-in retry mechanisms, which means they will automatically recover once the event is resolved, typically within minutes of the onset of the issue. Client-side SDKs with a populated cache from a previous session are unaffected. Our mobile SDKs have persistent caching always enabled, and for our browser-based SDKs this can be enabled by setting the storage type to 'localstorage' through the SDK factory configuration (see: https://help.split.io/hc/en-us/articles/360020448791-JavaScript-SDK#configuration). In order to minimize any potential impact to our customers' apps, we highly recommend following our best practices by handling SDK initialization timeouts and control treatments. More information on best practices to build a resilient integration with Split can be found here: https://help.split.io/hc/en-us/articles/25255992258701-How-to-build-a-resilient-integration. Our team is currently working on a fix to remediate this issue as a top priority, and it is expected to be deployed shortly. We at Split sincerely apologize for the impact caused to any customers who were affected by this issue, and thank you for your patience this week as our engineering team investigated and addressed it.
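
As a rough illustration of the recommendation above to handle initialization timeouts and control treatments, the following sketch shows one way an application might guard its flag checks. It does not use the Split SDK: `wait_until_ready`, `treatment_or_default`, and the `get_treatment` callable are hypothetical stand-ins, and the only assumption taken from the update is that an SDK that has not initialized yields the `control` treatment.

```python
import threading
from typing import Callable

# Treatment value assumed to signal "no real evaluation was possible".
CONTROL = "control"


def wait_until_ready(ready_event: threading.Event, timeout_seconds: float) -> bool:
    """Block until the SDK signals readiness or the timeout elapses.

    Returns True if ready, False if initialization timed out.
    """
    return ready_event.wait(timeout_seconds)


def treatment_or_default(
    get_treatment: Callable[[str], str], flag_name: str, default: str
) -> str:
    """Evaluate a flag, substituting a safe default for the control treatment."""
    treatment = get_treatment(flag_name)
    return default if treatment == CONTROL else treatment
```

With this shape, an application that times out waiting for readiness can still call `treatment_or_default` and get a predictable default rather than branching on an unexpected value.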

Report: "Intermittent 503 responses from SDK API"

Last update
resolved

We are investigating some usage patterns affecting our CDN and sporadically causing slightly elevated error rates during short periods of time in certain regions. This can lead to initialization timeouts for SDKs initializing from an empty cache. Running SDK instances are unaffected, and synchronization will continue to work without issues. All of our SDKs have built-in retry mechanisms, which means they will automatically recover once the event is resolved, typically within minutes of the onset of the issue. Client-side SDKs with a populated cache from a previous session are unaffected. Our mobile SDKs have persistent caching always enabled, and for our browser-based SDKs this can be enabled by setting the storage type to 'localstorage' through the SDK factory configuration (see: https://help.split.io/hc/en-us/articles/360020448791-JavaScript-SDK#configuration). In order to minimize any potential impact to our customers' apps, we highly recommend following our best practices by handling SDK initialization timeouts and control treatments. More information on best practices to build a resilient integration with Split can be found here: https://help.split.io/hc/en-us/articles/25255992258701-How-to-build-a-resilient-integration. Our team is currently working on a fix to remediate this issue as a top priority, and it is expected to be deployed shortly. We at Split sincerely apologize for the impact caused to any customers who were affected by this issue, and thank you for your patience this week as our engineering team investigated and addressed it.

Report: "Elevated error rate for SDK API"

Last update
resolved

Today our SDK API experienced a slightly elevated error rate between 4:13PM and 4:26PM GMT. During this period, some customers may have observed isolated SDK initialization timeouts. Any running SDK instances were unaffected.

Report: "Persistent HTTP 500 Errors from streaming.split.io"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue with 500 Errors from streaming.split.io.

Report: "SDK API elevated error rate"

Last update
resolved

This incident has been resolved. A postmortem will be available shortly.

monitoring

We are currently investigating an issue that caused an elevated error rate for our SDK API between 3:05PM and 3:32PM GMT. During this period customers may have observed SDK initialization timeouts. Any running SDK instances were unaffected. The issue has been mitigated and our services fully recovered by 3:45PM GMT.

Report: "Investigating an issue with web console slowness"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are seeing issues in the web console while trying to browse the site. We are investigating the issue now.

Report: "Investigating an issue with delay in processing impressions"

Last update
resolved

This incident has been resolved.

investigating

Split has identified an issue causing a delay in processing impressions. We are investigating the issue now.

Report: "Investigating an issue with SSO login"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are investigating issues affecting SP-initiated SSO. Customers experiencing issues attempting to log in using SSO through app.split.io/login may log into the web console through an IdP-initiated login using their IdP portal.

Report: "Investigating an issue with impressions and events collection"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Split has identified an issue affecting the impressions and events collection in our backend. We are investigating the issue now.

Report: "Intermittent errors with SDK API"

Last update
resolved

This incident has been resolved.

monitoring

We are currently investigating intermittent errors in SDK API. Requests are being processed but you might see occasional errors that will be retried.

Report: "Impressions/Events traffic ingestion issue"

Last update
resolved

This incident has been resolved.

monitoring

We have identified an issue that might have caused some data loss for impressions and events but our SDKs will retry. We are deploying a fix for this issue.

Report: "Investigating an issue with Live Tail not picking up impressions"

Last update
resolved

A fix has been deployed and the incident is resolved.

investigating

Split has identified an issue causing Live Tail to not pick up impressions. We are investigating the issue now.

Report: "We have identified an issue that can cause registering new users to fail."

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have identified the cause of the issue and are taking steps to remediate it.

Report: "Split Corporate site migration"

Last update
resolved

This incident has been resolved.

identified

Currently, we are in the process of migrating our corporate site at split.io to a new version, which could result in temporary degradation.

Report: "Issue with split editor"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue rendering complex feature flag definitions that can cause unresponsiveness in the web console editor on Chrome 116. While we work towards a solution, affected customers may use other browsers such as Firefox, Safari, or Edge to continue their work using feature flags.

Report: "Admin API keys scoping error"

Last update
resolved

This incident has been resolved.

identified

We have identified an issue with the admin API that is causing 404 errors. We are working on a solution.

Report: "S3 data ingestion delay"

Last update
resolved

This incident has been resolved.

identified

We have identified the cause of the issue and are taking steps to remediate it. As of now, the S3 inbound integration is seeing delays, so any events or new event types being uploaded via S3 inbound will be delayed getting into Split.

Report: "Potential Log in issue"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We are investigating a potential issue that could impact logging in. If you are seeing issues, please try clearing the cache.

Report: "Environments tab incorrect layout"

Last update
resolved

This incident has been resolved.

identified

We have identified an issue causing an incorrect layout to be displayed in the environments tab in our web console. A fix is being worked on and will be deployed shortly.

Report: "Investigating an issue with login page"

Last update
resolved

This incident has been resolved.

identified

Split has identified an issue preventing users from logging in. We are investigating the issue now.

Report: "Backend resource issue impacting web console"

Last update
resolved

This incident has been resolved.

identified

We have identified the cause of the issue and are taking steps to remediate. As of now, the impact is slow, unresponsive, or unexpected behavior from the web console.

Report: "Investigating an issue with Live tail"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

Split has identified an issue causing Live tail to be unavailable. We are investigating the issue now.

Report: "Investigating an issue with backend performance affecting some services"

Last update
resolved

We have performed a rollback and the service has returned to normal operation.

identified

Split has identified an issue causing slow services. We are investigating the issue now.

Report: "Web console might be slow or unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

The web console is now back to normal and we are monitoring the results.

identified

Split has identified an issue affecting a few of our applications and causing our web console to become slow or unresponsive due to internal database load. We are investigating the issue now.

Report: "Segment or Metrics page show error when editing owners."

Last update
resolved

This incident has been resolved.

identified

We have identified the cause of the issue and are taking steps to remediate. As of now, the impact is minor.

Report: "Split Console not working on Safari 15"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

investigating

We have identified an issue preventing the web console from loading on Safari versions prior to 16. We are working on a fix for this. In the meantime, the web console can be accessed normally using other browsers or Safari 16.

Report: "Elevated errors for SDK API"

Last update
resolved

A service degradation occurred today between 3:48PM and 4:45PM UTC. Customers might have experienced intermittent errors and increased latencies for the requests performed by our SDKs. All SDKs have retry mechanisms in place to mitigate this kind of issue, resulting in an unaffected end-user experience.
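
For readers unfamiliar with the kind of retry mechanism mentioned here, the sketch below shows a generic retry loop with exponential backoff. It is an illustration only and makes no claim about how the Split SDKs implement retries: `call_with_retries`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` as a parameter keeps the function easy to exercise in tests without real waiting.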

Report: "Issues loading metrics impact tab"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

An issue has been identified causing the metrics impact tab to fail to load in some cases. A fix is currently being worked on.

Report: "Issue with split editor"

Last update
resolved

An issue in our service's backend prevented the split editor from loading for some of our customers. During this period, accessing splits through the web console resulted in infinite loading. Feature flags were unaffected and continued to work according to their rollout plans. The incident lasted from 11:20AM until 12:32PM GMT.

Report: "DNSSEC protocol rollout"

Last update
resolved

Since no impact was found we will close this notification.

monitoring

Forwarding to Twitter

monitoring

Split will start rolling out the DNSSEC protocol this Monday 13th of February in order to increase our security posture. More information here: https://www.cloudflare.com/dns/dnssec/how-dnssec-works/ If you have reasons to believe that your devices don’t support DS responses of more than 512 bytes, please contact support@split.io.

Report: "Data Processing Delays"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

There is currently a delay in the event property job and we are behind. We are working on getting the job to catch up. [Impression data is not delayed]

Report: "Metrics impact calculations delayed"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and our data processing service is back to normal. Customers should see delayed calculations complete over the next few hours as our jobs catch up. We continue to monitor the results.

identified

We have identified an issue on our side causing metrics impact calculations to be delayed since 12/12 15:30 UTC. A fix is being deployed and delayed calculations will be performed without any further user action needed. There is no data loss.

Report: "Investigating an issue with Livetail"

Last update
resolved

The issue has been resolved. Please contact us at support@split.io if you experience any related problems.

monitoring

The problem has been identified and Live Tail has recovered. Last impressions and event types will gradually return to normal as they catch up.

investigating

Split has identified an issue causing Livetail to return an error when queried. We are investigating the issue now.

Report: "Web console increased latency"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and web console latency is back to normal. We are monitoring the results.

investigating

We are investigating an increase in latency for our web console. Some users may have difficulty logging in.

Report: "Increased error rate for web console"

Last update
resolved

A sudden traffic spike caused an increased error rate for our services between 12:11 PM and 12:13 PM GMT. During this time some customers may have had issues accessing the web console. Our SDKs, data processing and integrations were unaffected.

Report: "Delays in data processing"

Last update
resolved

This incident has been resolved.

monitoring

We are seeing recovery for our data processing services. Metrics calculation, data exports, and S3 outbound and inbound integrations are processing data normally now. Some customers may still experience some delays while all systems fully recover. We continue to monitor the results.

identified

An ongoing event for the AWS SSM service is impacting our data processing services, resulting in delays. We are working on applying a remedy for this on our end. Customers should expect delays in the following services: metrics impact calculation, and S3 inbound and outbound integration. There is no data loss. S3 inbound and outbound data will be backfilled once our services are restored. Data exports through our Data Hub may not be available for this period of time. For more details on AWS's SSM service event: https://health.aws.amazon.com/health/status

Report: "Web App unavailable"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Split Help Center Down"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating an issue with the Help Center being offline.

Report: "Intermittent issues loading web console"

Last update
resolved

An unusual traffic spike caused intermittent issues for our web console between 14:53 and 15:14 UTC. Some customers may have experienced prolonged load times or timeouts during this time.