Retail Zipline

Is Retail Zipline Down Right Now? Check if there is a current outage ongoing.

Retail Zipline is currently Operational

Last checked from Retail Zipline's official status page

Historical record of incidents for Retail Zipline

Report: "Possible Platform Instability This Morning"

Last update
monitoring

Our hosting provider, Heroku, is currently investigating an issue in their platform, which hosts the Zipline app: https://status.heroku.com/incidents/2822 As a result of this upstream issue, our engineers currently do not have access to many of the tools that allow us to manage system load, and we understand that the systems which "autoscale" new resources to handle busy peaks may also be offline. While this issue is ongoing, it's possible that our service will be degraded as it becomes busier over the course of the day. We will continue to directly monitor Heroku's (so far scant) information regarding the causes and likely resolution time for this incident.

Report: "Incorrect Zipline Module Availability"

Last update
resolved

This incident has now been fully resolved and all Zipline modules are displaying correctly.

monitoring

A fix has been implemented by our engineering team. Most customers should notice that the correct Zipline modules are now appearing once again. We will continue to monitor the situation closely and provide the next update upon full resolution.

identified

We have identified an issue in the Zipline app where customers will see modules that they have not previously had access to. We have identified the root cause and are working towards a resolution. While this is ongoing there might be unintended interactions with the app.

Report: "Upstream incident affecting Zippy Availability"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Our upstream provider is currently experiencing downtime, which is affecting Zippy Availability. The incident is posted here: https://status.openai.com/incidents/ctrsv3lwd797

Report: "Intermittent Connection Errors in the European Environment"

Last update
resolved

This incident has been resolved. All requests have succeeded since 00:12 PDT / 9:12 am UTC. Our upstream provider has confirmed that the issue is resolved on their end.

monitoring

All requests have succeeded since 00:12 PDT / 9:12am UTC. We are waiting for a confirmation from our upstream provider that the issue is fully resolved on their end.

identified

Impact: Intermittent connection errors affecting approximately 20% of requests
Details: We are currently experiencing intermittent connection errors in our European environment. Approximately 20% of requests are returning error pages. Refreshing the page solves the issue. Our investigation indicates that the issue is related to an upstream provider.
Actions Taken:
- Our team is working with our upstream provider to identify and resolve the issue.
- We are monitoring the situation closely and will provide updates once we have more information.
We apologize for any inconvenience this may cause and appreciate your patience as we work to restore normal service.
Next Update: We will provide the next update when progress is made.

Report: "Partial outage: clock punches"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating an issue with a downstream provider, where clock punches do not seem to be accessible to our API

Report: "Partial slowdown in application"

Last update
resolved

The incident is resolved. Latency has remained at pre-incident levels. Thank you for your patience!

monitoring

We have identified a solution, and have implemented it. We are no longer seeing abnormal latency for the application. We will continue to monitor the situation

investigating

We are investigating an issue where app latency has increased

Report: "Outage due to broken deployment."

Last update
resolved

This incident has been resolved.

monitoring

Services are back up and running. We will continue to monitor the application to ensure there are no further hiccups.

monitoring

We have rolled back and are monitoring all systems to ensure stability.

identified

The application is back up, and we are rebooting satellite services

identified

Our previous deployment has run into a snag in production. We're rolling back the deployment and service should be resumed within the next 5 minutes. We apologise for the impact to your day

Report: "EU Upstream provider: degraded performance affecting customers on EU instance"

Last update
resolved

Resolution of this issue has been confirmed by Heroku. As of yet, no further details on the cause have been given.

monitoring

An upstream provider, Heroku, is experiencing degraded performance on their EU-based platform. As a result, we are seeing degraded performance for customers using the EU-based instance of Zipline. We will continue monitoring and communicating with our provider for clarification on the issue and resolution time

Report: "Latency / Timeout on Resource Library"

Last update
resolved

This incident has been resolved.

monitoring

We found replication lag with a follower database was causing slowdown for the resource library. The follower database was decommissioned, and latency has returned to normal levels

investigating

We are investigating an issue where the Resource Library is slow and/or timing out for some customers

Report: "High Latency and Errors"

Last update
resolved

The delays with Resource Library have been alleviated, and all systems are functioning as expected

investigating

We are continuing to investigate the issue. Resource library has been disabled temporarily for some customers in order to troubleshoot the incident further.

investigating

Zipline is currently investigating an issue with the application resulting in high latency and errors seen throughout the application. Symptoms would include slow response times and page loads within the application as well as timeout errors.

Report: "Errors with file uploads"

Last update
resolved

We have now resolved all upload issues

identified

We've resolved the issue with SFTP, and are now remedying our magic file uploader

identified

We're working to resolve an issue where an upstream provider is experiencing network outages, affecting our ability to upload assets via SFTP and our magic Document uploader.

Report: "Inbound email processing issue"

Last update
resolved

We have added additional handling for when our upstream email service fails, to ensure the issue will not recur. All emails that could be reprocessed have been reprocessed at this point.

identified

We're continuing the work to reprocess emails that initially did not process.

identified

We have tested and deployed the fix for the issue, and are now focussing on processing emails that had previously failed. We do not anticipate any loss of data or emails.

identified

We have identified the root cause for the issue, and we are currently remedying the cause.

investigating

We are still investigating this issue.

investigating

We are investigating an issue where inbound emails are not being processed as expected

Report: "Higher latency for some requests"

Last update
resolved

We believe the latency was caused by a race condition, where a deploy at high traffic pushed load to a reduced number of available application servers.

investigating

We are no longer seeing any increased latency, but we are continuing to investigate the issue to identify a root cause

investigating

We are investigating an issue where we are seeing higher-than-expected latency on requests for some customers

Report: "Zipline Integrations Issue"

Last update
resolved

We have resolved the incident. All services are operational

investigating

We are investigating an issue where features leveraging our integrations infrastructure are reporting errors

Report: "High latency on web application"

Last update
resolved

Latency rates have remained stable. The incident is resolved.

monitoring

Resource Library has been restored for all customers. We are now monitoring for any changes to latency.

monitoring

Resource Library has been re-enabled for a majority of customers. Out of an abundance of caution, we are re-enabling customers one by one and monitoring latency to ensure we do not trigger more latency.

investigating

Response rates have come back down to typical rates. We have temporarily disabled Resource Library for a handful of customers while we further investigate

investigating

We are seeing high latency on page requests on the Zipline web application. Our engineers are currently investigating.

Report: "Digest Emails Not Sent"

Last update
resolved

Due to misconfiguration of our digest email delivery system, digest emails were not sent today (Tuesday, March 14). The issue has been resolved, and customers will receive their digest for today, and their digest for tomorrow, at their regularly scheduled time tomorrow. We are adding further monitoring to ensure this does not happen again.

Report: "iOS Notifications Delayed"

Last update
resolved

All queued notifications have now been pushed. We appreciate your patience!

monitoring

We have identified the issue as an expired certificate, which has been rectified. All notifications are queued and we expect they will be delivered within the next 20 minutes.

investigating

We're investigating an issue where push notifications to iOS devices are currently delayed.

Report: "Heroku OAuth Breach"

Last update
resolved

At Zipline, the trust of our customers and the security of our system and our customers’ data are of utmost importance. Following Heroku’s announcement of an OAuth breach within their infrastructure, Zipline has undergone a rigorous security audit to ensure that our application has not been affected or compromised by the issue. We are confident that Zipline is not affected by the breach at Heroku.
We have completed the following steps as part of our audit process:
- We have disabled connections to GitHub (prior to this incident, these were only used to generate review apps, part of our development and QA process, which do not have access to production databases).
- We have undergone a full security audit of GitHub use, and found no unauthorized access to our code base or repositories.
- We have reviewed our OAuth-authorized application list to ensure all applications are trustworthy and secure.
We will continue to monitor developments at Heroku and GitHub in case further information comes to light.

Report: "Delays in Daily Digest Delivery"

Last update
resolved

This incident has been resolved.

monitoring

We had a brief issue that led to lag in Daily Digests being delivered to some customers. The issue has been resolved and all customers should be receiving their Daily Digests momentarily. We're monitoring to ensure that there aren't any further delays.

Report: "Partial downtime caused by upstream provider"

Last update
resolved

This incident has been resolved.

monitoring

Our monitoring is indicating that everything is back up and running smoothly. We're continuing to monitor the situation.

monitoring

We're seeing the site come back for most users. It seems that the issue has mostly been resolved but we are checking everything and monitoring the fix.

investigating

We have heard from Heroku and they are working on fixing the issue. At the same time, we're working on two different backup plans in case it takes them much longer to resolve.

investigating

Our monitoring system has alerted us to the application being inaccessible for some users. It appears our upstream provider, Heroku, is having issues. We are investigating and preparing for any changes necessary if it is not resolved quickly.

Report: "Experiencing slowness with background jobs"

Last update
resolved

Our background job system is operating smoothly and at normal capacity.

monitoring

We've added capacity to our background job system, and the backed-up jobs are now all processing. We're monitoring the situation to ensure everything is running smoothly.

investigating

We are continuing to investigate this issue.

investigating

We're investigating an issue where our background jobs are processing slower than normal. App activity is not affected.

Report: "Investigating: Tasks throwing error"

Last update
resolved

Monitoring is complete. With the data update, we are back to full functionality.

monitoring

We have identified the issue as being data-driven, related to a new feature deployment. We have updated the data and are monitoring to ensure no further issues ensue

investigating

We're currently investigating an issue where team tasks can't be created

Report: "Bad Deploy"

Last update
resolved

A bad deploy caused a loss of connectivity for all users for 7 minutes. The deploy was rolled back and connectivity was restored.

Report: "Configuration change resulting in unexpected behavior and partial outage"

Last update
resolved

After monitoring the service and double checking the integrity of the system we have determined that the issue has been resolved.

monitoring

The issue was resolved at 12:33, for a total of 9 minutes of unexpected behavior.

monitoring

A configuration update was done to the Zipline service that resulted in an older version of the product being visible to a number of users for a few minutes. The issue has already been resolved, but we are investigating if there are any knock-on effects from the change now.

Report: "Downstream provider incident"

Last update
resolved

The provider has remedied their outage and service levels at Zipline are back to normal.

investigating

One of our downstream providers, AWS CloudFront, is experiencing a networking issue, which is having a minor effect on our image upload service. Approximately 1% of image uploads are failing at this time. All other aspects of our application remain available. We are currently monitoring the situation.

Report: "No impact to Zipline from the recent Log4j vulnerability"

Last update
resolved

After investigating, we're pleased to inform you that Zipline’s system was not affected by CVE-2021-44228, the Apache / Log4j vulnerabilities announced yesterday and over the weekend. We don’t use Java, Apache, or Log4j to serve the application. We have one internal system that uses Elasticsearch to provide our search infrastructure. Elasticsearch is built in Java and uses log4j. We have investigated all access points and confirmed that none of them were vulnerable to an attack. We have patched all Elasticsearch domains to increase their protection going forward. If you have any questions about our response to Log4j, our infrastructure, or anything related to security please email security@zipline.inc

Report: "An upstream provider's outage is affecting our ability to deliver push notifications"

Last update
resolved

This incident has been resolved.

monitoring

Our upstream provider has mitigated the underlying issue and Zipline is fully operational. All delayed mobile push notifications have been delivered and we are processing the user and team alignment files.

identified

Our upstream provider is continuing to work on a number of different mitigation and resolution actions. They've identified the root cause of this issue. While there are some early signs of recovery, there remains no ETA for full recovery yet. User and Team alignments processing is also delayed due to this incident. We have received the files successfully and they will be processed once the incident is resolved.

identified

Push notifications to mobile devices are currently being delayed and will be delivered after the incident is resolved.

identified

An upstream provider is encountering errors that are affecting Zipline's ability to deliver push notifications to mobile devices. The rest of Zipline is fully functional and we are actively monitoring the situation.

Report: "Slower than normal response times"

Last update
resolved

This incident has been resolved.

monitoring

We have seen no additional issues and are continuing to monitor.

investigating

Response times are back to normal. We are continuing to monitor.

investigating

We are currently seeing slower than normal response times that might be resulting in some errors. We are actively investigating.

Report: "Search returning errors"

Last update
resolved

We discovered an issue with a recent code deploy that affected our search system. We've rolled back the deploy, and search is back up and running

investigating

We are investigating an issue where search is returning an error message for users

Report: "Some tasks from HQ Communications not appearing on daysheet"

Last update
resolved

We have verified that the hotfix addresses the issue, and that the incident has been resolved.

monitoring

The fix has been deployed, and we are monitoring to ensure the issue has been resolved

identified

We've identified the issue and are deploying a hotfix.

investigating

We're investigating an issue where customers who have migrated to the Zipline Visual Refresh are not seeing all tasks created by an HQ communication. The tasks are created, and appear on the dashboard and calendar, and do show up if users filter on Day Sheet.

Report: "Delay processing inbound email"

Last update
resolved

Some customers experienced a delay with inbound email processing. This resulted in emails sent to Zipline taking longer to show up. No emails or data were lost during this incident. All inbound emails were caught up as of this time. We've been monitoring and all new emails are being processed as normal. We reached out directly to notify each affected customer. Additional monitoring has been added to prevent this issue from recurring.

Report: "Resource Library is unavailable for some users"

Last update
resolved

The issue has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

The fix has been verified and deployed to production. We are now monitoring to ensure no further issues.

identified

We've generated a fix for the issue, and are testing in lower environments.

investigating

Resource libraries that include documents where thumbnails inherit from parent documents are throwing an error, causing them to be inaccessible. We have identified the issue and are preparing a hotfix.

Report: "Connection issues affecting all users"

Last update
resolved

A misconfiguration of our SSL certificates caused the service to be offline for all users. The certificate was configured to automatically renew itself, but that process failed. As a result, the Zipline Application was unavailable until 5:28am PST, when a new SSL certificate was installed. We are working on a Root Cause Analysis that will be shared out.

monitoring

A misconfiguration of our SSL certificates caused the service to be offline for all users. The certificate was configured to automatically renew itself, but that process failed. We have installed a new SSL certificate and are monitoring the service.

identified

A misconfigured SSL certificate is currently being used in production, which is causing browsers to reject the connection to Zipline. We are investigating the situation.

Report: "Delay processing inbound email"

Last update
resolved

All existing emails were processed and any new emails are being processed as normal. We've seen no new issues as we monitored this overnight.

monitoring

A fix has been put in place and the remaining emails are being processed in a timely manner. We will continue to monitor the inbound emails.

investigating

Some customers are experiencing a delay with inbound email processing. This is resulting in emails sent to Zipline taking longer to show up.

Report: "Delays for iOS push notifications."

Last update
resolved

This has been resolved. All previous queued notifications have been delivered and new iOS push notifications are being processed normally.

monitoring

A fix has been implemented and we are monitoring as the backlog of push notifications is being delivered.

identified

An issue in our iOS notification delivery that is preventing push notifications from being sent to devices has been identified and is being addressed. We are still queuing up notifications and they will be delivered after the fix is applied. Please contact support@retailzipline.com if you have any questions.

Report: "Potential Sign On Issues"

Last update
resolved

This incident has been resolved. Our service provider removed the malfunctioning CDN node, and as a result customers who were affected are now being directed to a functioning node, and are able to continue working.

monitoring

We believe we have identified the issue as a misconfigured downstream provider server (a CloudFront server on the East Coast). The server has been disabled, and we are observing error rates dropping, returning to normal bounds. We are continuing to monitor and test.

identified

We have identified that the problem is only happening with a few select customers. We believe there may have been a browser upgrade or device change overnight that introduced the issue. We have reached out to IT leadership at the identified customers and are having the conversation directly to learn more in an attempt to isolate the problem. If you're not in an active conversation with our IT team, your organization has not been identified as being affected.

investigating

We are continuing to investigate this issue. We believe it may have to do with a browser update that some of our customers rolled out and we are working to verify whether that is actually the case.

investigating

We were concerned this was impacting a large number of users; however, it is isolated to a small number of users. We're continuing to investigate the root cause.

investigating

We're continuing to investigate the issue, and have narrowed the scope of the concern to the switchboard.

investigating

We're hearing reports that some users are having an authentication issue when they navigate from the switchboard page. We're currently investigating the issue.

Report: "Search is partially degraded. Document contents are not searchable for some customers."

Last update
resolved

All documents have been reindexed successfully and should be searchable. We have been monitoring for the past hour and seen no more issues. We are working on a root cause analysis to share what happened in more detail.

monitoring

Resource Library documents are still being reindexed. Organizations have been prioritized by their support tier and geography. We estimate the complete reindexing to be finished by 3:30 PDT. The issue is isolated to the content of Resource Library documents not being searchable. Search continues to operate normally during this time for all other content.

identified

The search issue that users were experiencing has been resolved. However, in an abundance of caution we are rebuilding the document index. The contents of documents are not currently searchable. We will continue to update the status page as that index is rebuilt.

identified

The issue has been identified and a fix is being implemented.

investigating

Some customers are experiencing problems with search results in the Resource Library. Our engineers are currently applying a fix.

Report: "Investigated and Resolved Reports of Increased 404 Responses in the Resource Library"

Last update
resolved

A small number of users reported no longer being able to access certain files in the Resource Library. We investigated the reports and determined that those documents had more limited targeting than intended. This caused those documents to be hidden from those users. Our team responded quickly and resolved the situation in all reported cases. We have also tested all other organizations that use the Resource Library to ensure it is not a broader issue.

Report: "We're investigating elevated error rates"

Last update
resolved

We have seen no more issues relating to this blip. Our system seems to be scaling according to plan without issue.

monitoring

The increase in errors was a 20-second blip due to a massive increase in third-party traffic. Our system adjusted to address the load, but there was a small window that wasn't covered. We're continuing to monitor the service to ensure the traffic load is covered.

investigating

We're seeing elevated error rates across the system. Our team is investigating and we will update as soon as we know more.

Report: "Investigating elevated Error Rates"

Last update
resolved

Everything has been resolved and error rates have subsided as of 9:14 PST. In total, it was 5 minutes of increased error messages impacting a portion of our users. Most users saw no issues. The error rates were due to an overload of the system because of a runaway database backup. We will be revising our backup strategy to ensure this does not happen again.

monitoring

The elevated error rates have subsided. We're continuing to monitor the site and investigate the cause.

investigating

Our monitors are reporting elevated error rates affecting a portion of user traffic. We're investigating the cause and will report as soon as we have more.

Report: "In-App Support out of service due to a third party outage"

Last update
resolved

We've not seen any more issues and Intercom is reporting that everything is resolved. https://www.intercomstatus.com/incidents/kftxs04jy93k

monitoring

Intercom seems to be back up. We're continuing to monitor the situation via their status page and our experience.

identified

Our third-party support vendor, Intercom, is currently experiencing an outage. People are unable to get support responses directly in the app. Email to support@retailzipline.com is the available workaround. We're monitoring Intercom's status page for more information and will update as we know more. https://www.intercomstatus.com/incidents/kftxs04jy93k?u=hs04wzbrmyty

Report: "Outage due to a configuration error"

Last update
resolved

We have seen no new issues with the site. The issue was due to a configuration value being changed in a regular deploy that was incompatible with our application. We were able to immediately identify the change, revert it, and bring the site back up. The delay was due to the fix being sent to all of our servers through our zero-downtime deploy process. Thank you for your patience and we apologize for the blip in service.

monitoring

Service has been restored. We are monitoring the situation but expect no other issues at this time.

investigating

We are currently fixing the issue and should have everything back up and running within 3 minutes.

Report: "Delayed email sending this morning due to larger than expected backlog"

Last update
resolved

This morning we had a backlog build up in our email sending that caused some emails to be delayed. We have increased capacity and the emails are now being delivered as intended. If you have any questions, please contact support@retailzipline.com or speak with your account manager.

Report: "Emergency Maintenance being done by our Support provider"

Last update
resolved

We have monitored the situation for the last few hours and seen no more instability. We're considering this resolved at this time.

monitoring

Intercom says they've fixed the issue and they're currently monitoring it. Everything seems to be working again but we're keeping an eye on it.

identified

Intercom, the provider we use for support, went into emergency maintenance mode and is experiencing downtime. We're monitoring the situation and reverting to email for customer support until they resume service. We are monitoring via their status page here: https://www.intercomstatus.com/

Report: "Investigating Reported Downtime"

Last update
resolved

We will continue to be monitoring this incident for further impact, but we have determined that this incident has been resolved for now.

monitoring

Update from our hosting provider: https://status.heroku.com/incidents/1974

monitoring

We have noticed an increased error rate with our hosting provider that is likely to have caused this downtime. We'll continue to monitor and look to find a root cause shortly.

investigating

We are currently investigating a report of downtime. Will update shortly.

Report: "Investigating Reported Downtime"

Last update
resolved

Heroku has declared this as fully resolved. We are waiting to find out more information from them as they discover what caused the outage. At this time, the site has been up and stable since our last update. We will continue to monitor as usual.

monitoring

Heroku has restored service so the site is back up. We're continuing to monitor the site.

investigating

There seem to be issues with Heroku, our application host. https://status.heroku.com/incidents/1973 We are monitoring and looking to see if we need to revert to our failover.

investigating

We are currently investigating this issue.

Report: "Reports of errors across Zipline: Resolved"

Last update
resolved

Redis Labs, one of our service providers, had a network outage that impacted Retail Zipline. Their DNS issue lasted about 25 minutes, but we were able to reduce it to 3 minutes of downtime for Zipline customers. The issue has been resolved by Redis Labs. More information is available on their status page here: https://status.redislabs.com/incidents/h1gmlvw0g7v0

monitoring

A service that we rely on stopped responding for 3 minutes. We have switched over to a failover and the errors have stopped. We will continue to monitor the situation.

investigating

We will update as soon as we know more.

Report: "Our hosting provider experienced errors that caused Zipline to be down"

Last update
resolved

We have confirmed that this downtime was due to issues with our platform provider Heroku. They retroactively posted an update here, well over an hour after the issue occurred: https://status.heroku.com/incidents/1964 Our team followed the downtime checklist and did everything they could to get the site back up. Unfortunately we had no control over their platform and could do little other than wait for Heroku to resolve the issue. Thankfully they resolved the issue 14 minutes after the outage began and our site was immediately brought back up. We take downtime seriously and work hard to make sure the site is always available. Downtime is never fun, and we appreciate your patience in the rare event that it happens. If you have any questions, please contact your account manager or support@retailzipline.com and we would be happy to address them.

monitoring

We have brought the site back up and are monitoring for issues. We're also investigating what could have happened and reaching out to our platform provider.

investigating

We are currently investigating the situation and working to return service as quickly as possible. We will keep this page updated.

Report: "Some store level Recurring Team Tasks missing from the Day Sheet"

Last update
resolved

We have verified that all tasks are correctly back on the Day Sheet and responded to any customer support requests regarding this issue.

investigating

We have identified the issue and found that only 0.17% of stores were affected, and it only affected Recurring Team Tasks created by those stores on July 15th, 2019. The issue was due to a broken recurrence rule. We have added the recurring tasks back to the Day Sheet for the affected stores. We have also checked surrounding days to make sure there are no more issues. If you have any questions about this issue, please reach out to our support team or your account manager.

investigating

We've identified that some recurring team tasks created more than 6 months ago have not shown up on the Day Sheet for some stores. A small number of customers are affected. We are investigating both the cause and a fix and will update with more information shortly.

Report: "Unplanned downtime due to a failed deploy"

Last update
resolved

At 8:18am UTC Retail Zipline became unavailable due to a failed configuration setting during a deploy. Our team received the notification immediately and worked to roll back to a stable build. The rollback was completed and service was restored at 8:32am UTC. We know how important Zipline is to your day to day and I apologize for this issue. Our services team is looking into what configuration caused the issue to ensure this doesn't happen again. - Jeremy, CTO

Report: "Configuration Issue for Resource Library customers resulting in 404 Not Found"

Last update
resolved

This morning there was a configuration issue that affected all customers who have the Resource Library. When attempting to access a Document in the Resource Library, they were presented with a 404 Not Found page instead of the Document. This configuration change was in effect from 2:54am - 4:50am PST. We are currently investigating how many users were impacted and will provide more detail shortly. 3 users contacted customer support. The issue was resolved as soon as it was identified, so we are posting this notice retroactively.