Historical record of incidents for Omnivore.io
Report: "Omnivore Connectivity Issue"
Last update: We are seeing issues with our third-party cloud connectivity provider. This affects most Omnivore systems to varying degrees. We will provide further updates as we have additional details.
Report: "NCR CloudConnect API - Increased error rate"
Last update: NCR has marked their ordering components as operational and Omnivore has continued to see normal error rates since recovery.
NCR posted an update to their status page at 22:13 UTC indicating they have found a possible solution and are working on stabilizing services. Omnivore calls to the NCR CloudConnect API began seeing a return to more normal error rates at 22:10 UTC and have been maintaining those rates since. We will continue to monitor error rates until NCR has confirmed they have resolved the issue.
At this time, we have no further updates from NCR regarding an estimated resolution. We will continue to monitor the situation and update as soon as the situation changes.
Beginning around 19:20 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. All API calls to fetch ticket and clock entry data are failing and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "Omnivore API - Ticket Writes Failing"
Last update: API has resumed normal functionality.
We have implemented a fix and are observing successful API writes. We will continue to monitor.
We have identified an issue where writes to the Omnivore API are failing 100% of the time. We are implementing actions to recover. During this time, API reads will still function normally. We will provide updates as they are available.
Report: "Brink API - Increased Error Rate for Ticket Calls"
Last update: We have observed Brink API calls return to their normal success levels for the past hour. Functionality of dependent systems has returned to normal.
Starting at 18:25 UTC, we began seeing an increased number of errors when calling the Brink API, impacting ticket reads. API calls to ticket reads will likely fail at an increased rate and webhooks may be delayed until service returns to normal operation. So far, this seems isolated to a single Brink host. We have reached out to our Brink contacts and will continue to monitor until the situation is resolved. There are no other technical actions we can take to resolve the issue at this time.
Report: "Agents Offline"
Last update:
## Overview
On May 29, 2024, Olo’s Omnivore Platform experienced agent degradation between 20:30 UTC and 21:55 UTC. Some API calls were failing during this time, and some agents went offline at 21:30 UTC.
## What Happened
On May 29, 2024, during a routine instance resizing operation for our Connect service cluster, our configuration management system misidentified the IP addresses for the newly deployed instances, causing them to be bootstrapped incorrectly. This resulted in elevated error rates for Omnivore API calls beginning at 20:30 UTC, with 32% of connected Omnivore agents becoming degraded. At 21:30 UTC, additional API calls began to fail, causing 6% of connected agents to go fully offline. We initiated an accelerated rollback of the change, which fully restored service to a healthy state by 21:55 UTC.
## Next Steps
* Improve the provisioning process to detect and alert on this kind of misconfiguration earlier, before new instances are put into rotation (an illustrative sketch follows this report's updates).
* Create additional alerting around agent errors to improve investigation speed.
Affected locations have returned to online status and are operating normally
A fix has been implemented and location statuses have returned to normal. We will continue to monitor at this time.
At approximately 20:30 UTC, we identified an issue causing many locations to enter either a degraded or offline state. We have identified the issue and are working to resolve it.
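The following is a minimal sketch of the kind of pre-rotation validation described in the postmortem's next steps above. The instance record fields, the expected CIDR, and the check itself are assumptions for illustration only, not Omnivore's actual provisioning code.

```python
import ipaddress

def validate_instance(instance: dict, expected_cidr: str) -> list[str]:
    """Return a list of problems that should block rotation; empty means OK.

    `instance` is a hypothetical provisioning record; the field names here
    are illustrative, not Omnivore's real schema.
    """
    problems = []
    subnet = ipaddress.ip_network(expected_cidr)

    ip = instance.get("private_ip")
    if ip is None:
        problems.append("instance has no private IP recorded")
    elif ipaddress.ip_address(ip) not in subnet:
        problems.append(f"IP {ip} is outside the expected subnet {expected_cidr}")

    if not instance.get("bootstrap_complete", False):
        problems.append("bootstrap did not report completion")

    return problems

if __name__ == "__main__":
    candidate = {"private_ip": "192.168.5.23", "bootstrap_complete": True}
    issues = validate_instance(candidate, "10.0.0.0/16")
    # In a real pipeline, any issue would alert and keep the instance out of rotation.
    print(issues or "instance passes pre-rotation checks")
```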
Report: "Lavu Partner Outage"
Last update: With help from the Lavu team, connectivity has been restored for all Lavu Locations with Apps attached.
The Lavu team has communicated that they are targeting a fix for end of Q1 or early Q2.
At this time we are not expecting a resolution until at least Tuesday Jan 16th.
At around 21:47 UTC we were asked by Lavu to take further steps to prevent calls to their services. As such, we are taking action to set all Omnivore Lavu locations offline.
Beginning around 21:26 UTC, at the request of Lavu, we disabled webhooks and background processing for Lavu locations to aid in Lavu's outage recovery. During this time, webhooks will be delayed and data may become stale.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Calls to the NCR CloudConnect API have returned to base level timings and error rates. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 00:02 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 8:32 UTC, we saw calls to the NCR CloudConnect API return to base level timings and error rates. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 2:47 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Beginning around 02:26 UTC, calls to the NCR CloudConnect API began to succeed.
Beginning around 00:00 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: The number of timeouts has returned to normal levels.
Beginning around 17:03 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "MMS Outage"
Last update: At around 19:18, as part of a larger release, Omnivore engineers removed an MMS ingress that was believed to be unused. During the release, we noticed MMS order counts drop and immediately began to roll back. We were completely rolled back by 19:37, with ordering traffic returning to normal at that time. Upon investigation, we found that a manual DNS entry was in place that referenced the removed ingress. Because this entry was not committed to our infrastructure repository, we mistakenly believed the ingress to be unused. We will audit for any other manual DNS entries in our environment before continuing with this release.
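As a hedged illustration of the audit mentioned above, the sketch below compares a list of live DNS names against the names committed to an infrastructure repository and flags anything untracked. The two input files and their format are assumptions; they stand in for whatever zone export and repo-generated list an actual audit would use.

```python
def load_names(path: str) -> set[str]:
    """Read one DNS name per line, ignoring blank lines and comments."""
    names = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                names.add(line.rstrip(".").lower())
    return names

def find_unmanaged(live_zone_file: str, repo_file: str) -> set[str]:
    """Names present in the live zone but absent from the committed records."""
    return load_names(live_zone_file) - load_names(repo_file)

if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for name in sorted(find_unmanaged("live_zone.txt", "repo_records.txt")):
        print(f"manual/untracked DNS entry: {name}")
```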
Report: "Brink API - Increased error rate"
Last update: Around 9:15 UTC, Brink API calls began to succeed.
Beginning around 07:40, we began seeing an increased number of errors when calling the Brink API impacting ticket reads and clock entries. API calls to ticket reads and clock entries will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our Brink contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 20:58 UTC. We will continue to monitor for any further increases.
Beginning around 8:21 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "Brink API - Increased error rate and Timeouts"
Last update: Error rates returned to normal levels around 17:20 UTC. We will continue to monitor for any further increases.
After further investigation, we have found that the timeouts are only happening when making calls to https://api22.brinkpos.net. All other Brink hosts seem to be operational.
Beginning around 16:50 UTC, we began seeing an increased number of timeouts when calling the Brink API impacting our Brink Locations. API calls to fetch Tickets will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our Brink contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
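As a rough sketch of how failures can be confirmed as isolated to a single upstream host such as api22.brinkpos.net, the example below tallies timeout rates per host from call records. The record shape and sample data are purely illustrative.

```python
from collections import Counter

# Hypothetical call records; in practice these would come from request logs.
calls = [
    {"host": "api22.brinkpos.net", "timed_out": True},
    {"host": "api22.brinkpos.net", "timed_out": True},
    {"host": "other-host.example", "timed_out": False},
    {"host": "api22.brinkpos.net", "timed_out": False},
    {"host": "other-host.example", "timed_out": False},
]

totals = Counter(call["host"] for call in calls)
timeouts = Counter(call["host"] for call in calls if call["timed_out"])

for host in totals:
    rate = timeouts[host] / totals[host]
    flag = "  <-- isolated problem host?" if rate > 0.5 else ""
    print(f"{host}: {timeouts[host]}/{totals[host]} timeouts ({rate:.0%}){flag}")
```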
Report: "Omnivore API - Degraded Performance"
Last update: All systems are confirmed stable and the Omnivore API is functioning normally. This incident is now resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and API connectivity has been restored. We are continuing to monitor the effects.
We are still investigating issues reported with the Omnivore API. Clients may experience errors and latency when accessing panel.omnivore.io. We'll provide updates as they come in.
We are currently investigating issues reported with the Omnivore API. Clients may experience errors and latency when accessing panel.omnivore.io.
Report: "Lavu API - Increased Error Rate"
Last update: This incident has been resolved.
The Lavu API is currently experiencing intermittent degradation. Please see their status page for details: https://status.lavu.com. We will continue to monitor until the Lavu API returns to normal functionality. There are no further technical actions we can take at this time.
Report: "Lavu API - Increased Error Rate"
Last update: This incident has been resolved.
The Lavu API is currently experiencing intermittent degradation. Please see their status page for details: https://status.lavu.com. We will continue to monitor until the Lavu API returns to normal functionality. There are no further technical actions we can take at this time.
Report: "Lavu API - Increased Error Rate"
Last update: This incident has been resolved.
The Lavu API is currently experiencing intermittent degradation. Please see their status page for details: https://status.lavu.com. We will continue to monitor until the Lavu API returns to normal functionality. There are no further technical actions we can take at this time.
Report: "API Outage"
Last update:
# Overview
On November 24, 2023, Olo's Omnivore API experienced a disruption between 21:17 UTC and 22:12 UTC. During this time all API operations, with the exception of Add Payment, Open Ticket, and Submit Order, were failing, and 25% of Omnivore-related webhooks experienced delayed delivery.
# What Happened
On November 24, 2023, Olo experienced a disruption to the Omnivore API and related webhook delivery, caused by a failure in the automated process for creating new Omnivore API instances. As traffic to the Omnivore API increased, its auto-scaling system was unable to add capacity to meet it. As a result, at 21:17 UTC all API operations with the exception of Add Payment, Open Ticket, and Submit Order began to fail, and 25% of Omnivore-related webhooks began to experience delayed delivery. We discovered that some of our package dependencies had been updated by their maintainers to require a newer runtime version than what was available in our deployment pipeline. This caused the bootstrapping process to fail for new instances that were needed to handle current traffic levels. With this identified, we implemented and deployed a fix to remove the failing dependencies from the API's critical path, allowing the system to resume scaling out additional API instances and restoring service at 22:12 UTC. (An illustrative sketch of this kind of bootstrap check follows this report's updates.)
# Next Steps
* We have already made improvements to our alerting to automatically detect and mitigate similar issues before they become critical.
* We will complete our in-progress migration of all Omnivore services into our newer hosting environment, which removes these dependencies as a failure point.
All systems have been functioning normally with API and Webhooks flowing normally for several hours. We will follow up with a postmortem by 12/1/2023.
We have identified the issue and implemented a fix. We are monitoring systems to ensure stability. API and webhooks traffic are flowing normally.
We are currently investigating an issue that is affecting the Omnivore API.
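The postmortem above traces the outage to dependencies requiring a newer runtime than the deployment pipeline provided, which broke bootstrapping of new API instances. Below is a minimal, hypothetical sketch of a bootstrap pre-flight check along those lines; the required version is a made-up example, and a real check would read it from the dependency lock file rather than hard-coding it.

```python
import sys

# Hypothetical minimum runtime implied by the pinned dependencies; illustrative only.
REQUIRED_PYTHON = (3, 9)

def runtime_is_compatible(required: tuple[int, int] = REQUIRED_PYTHON) -> bool:
    """True if the interpreter on this instance meets the declared minimum."""
    return sys.version_info[:2] >= required

if __name__ == "__main__":
    if not runtime_is_compatible():
        # Failing loudly at bootstrap keeps an unusable instance out of the
        # auto-scaling pool instead of letting it fail under load.
        sys.exit(f"runtime {sys.version.split()[0]} is older than required "
                 f"{'.'.join(map(str, REQUIRED_PYTHON))}")
    print("runtime check passed")
```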
Report: "Lavu API - Increased error rate"
Last update: The Lavu API outage was resolved around 19:00 UTC. All Omnivore API calls and webhooks involving Lavu Locations have returned to normal operation.
The Lavu API is currently experiencing an outage. Please see their status page for details: https://status.lavu.com. We will continue to monitor until access to the Lavu API has been restored. There are no further technical actions we can take at this time.
Beginning around 18:27 UTC, we began seeing an increased number of errors when calling the Lavu API. API calls to fetch ticket data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are currently investigating the root cause.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 17:35 UTC. We will continue to monitor for any further increases.
Beginning around 17:20 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 5:00 UTC, calls to the NCR CloudConnect API returned to baseline. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 17:21 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. Static data populated by background tasks may become stale. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 17:35 UTC, we began seeing successful calls to the NCR CloudConnect API. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 17:15 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. Static data populated by background tasks may become stale. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 7/13 21:08 UTC, error rates and timeouts for calls to the NCR CloudConnect API resumed nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 7/13 at 20:06 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 7/13 05:04 UTC, error rates and timeouts for calls to the NCR CloudConnect API resumed nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 7/12 at 18:55 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 7/12 18:31 UTC, error rates and timeouts for calls to the NCR CloudConnect API resumed nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 7/12 at 17:20 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 6/27 16:03 UTC, error rates and timeouts for calls to the NCR CloudConnect API resumed nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 6/26 at 9:40 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 11:04 UTC. We will continue to monitor for any further increases.
Beginning around 09:52 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 06:22 UTC. We will continue to monitor for any further increases.
Beginning around 00:52 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 00:14 UTC. We will continue to monitor for any further increases.
Beginning around 22:52 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 21:09 UTC. We will continue to monitor for any further increases.
Beginning around 19:52 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 18:57 UTC. We will continue to monitor for any further increases.
Beginning around 17:17 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 12:45 UTC, NCR CloudConnect API error rates and timeouts returned to nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 12:14 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 13:30 UTC, the NCR CloudConnect API began responding to requests successfully. Error rates have returned to nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 12:57 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: At 17:30 UTC, the NCR CloudConnect API began responding to requests successfully. Error rates have returned to nominal levels. We will continue to monitor the success of calls to the NCR CloudConnect API.
Beginning around 12:12 UTC, we began seeing an increased number of errors when calling the NCR CloudConnect API impacting Tickets, Employee, and Job reads. All API calls may present stale data. Webhooks may also be delayed. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Starting at 17:45 UTC, we observed calls to the NCR CloudConnect API begin succeeding. Cached data is successfully being updated over time in batches. We will continue to closely monitor the success of these background jobs and of calls to the NCR CloudConnect API.
Beginning around 13:35 UTC, we began seeing an increased number of timeouts when calling the Ticket routes of the NCR CloudConnect API. API calls to fetch Tickets will likely fail and Ticket webhooks will be delayed until the NCR outage resolves. NCR is continuing to update their status page (at https://status.aloha.ncr.com/incidents/cnl38krr6n6b). We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Around 20:00 UTC, the number of timeouts when calling the Ticket routes of the NCR CloudConnect API decreased to normal levels. We are continuing to see an increased error rate when calling the NCR CloudConnect API for employee and job data. API calls to fetch employees may present stale data. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Beginning around 8:03 UTC, we began seeing an increased number of timeouts when calling the Ticket routes of the NCR CloudConnect API. Based on this, we are upgrading the scope of the outage. API calls to fetch Tickets will likely fail and Ticket webhooks will be delayed until the NCR outage resolves. NCR is continuing to update their status page (at https://status.aloha.ncr.com/incidents/cnl38krr6n6b). We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
We are continuing to see an increased error rate when calling the NCR CloudConnect API for employee and job data. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time. Please refer to this NCR Incident for details: https://status.aloha.ncr.com/incidents/cnl38krr6n6b
We are continuing to see an increased error rate when calling the NCR CloudConnect API for employee and job data. API calls to fetch employees may present stale data. We have reached out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Beginning around 10:00 UTC, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting employee and job reads. API calls to fetch employees may present stale data. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
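Several updates above note that employee and job data is populated by background tasks and may be served stale while the upstream API is failing. A minimal sketch of that serve-stale-on-error pattern follows; the cache shape, TTL, and fetch callable are assumptions for illustration, not Omnivore's implementation.

```python
import time

# Hypothetical in-memory cache: key -> (value, fetched_at).
_cache: dict[str, tuple[object, float]] = {}

def get_with_stale_fallback(key: str, fetch, max_age_s: float = 300.0):
    """Return (value, is_stale): fresh data when possible, last good value on failure."""
    entry = _cache.get(key)
    if entry and time.time() - entry[1] < max_age_s:
        return entry[0], False  # cached value is still fresh

    try:
        value = fetch()
    except Exception:
        if entry is None:
            raise  # nothing cached to fall back to
        return entry[0], True  # upstream failed; serve stale data instead of an error

    _cache[key] = (value, time.time())
    return value, False

if __name__ == "__main__":
    def flaky_fetch():
        raise TimeoutError("upstream API timed out")

    _cache["employees"] = (["example-employee-a", "example-employee-b"], time.time() - 3600)
    data, is_stale = get_with_stale_fallback("employees", flaky_fetch)
    print(data, "(stale)" if is_stale else "(fresh)")
```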
Report: "CloudPOS Scheduler Queue Length"
Last update: After further investigation, we see no other impacts to address. All systems appear to be fully operational.
As of 19:41 UTC, the Scheduler Queue has returned to baseline. We have confirmed that POS data has been refreshed for all affected POS types (Brink, Toast, Cloud Connect, Lavu, and Lightspeed), including seeing current day Tickets. Webhooks have resumed as well. With the acute phase of the incident being over, we will check for any other impacts before closing the incident.
After scaling up our Scheduler Workers, the queue size has shrunk by ~75%. We will continue to monitor until the queue size is back to baseline.
Around 18:00 UTC, we noticed that our CloudPOS Scheduler queue had an elevated number of tasks waiting to be run. This would likely cause all CloudPOS data to be stale, including Tickets and Clock Entries. It would also lead to delayed webhooks. We are currently scaling up our Scheduler Workers to process the delayed tasks.
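A small sketch of the scale-up decision described above: pick a worker count large enough to drain the backlog, capped by a hard ceiling. The per-worker throughput and ceiling are hypothetical numbers, not values from Omnivore's scheduler.

```python
def workers_needed(queue_depth: int,
                   current_workers: int,
                   tasks_per_worker: int = 500,
                   max_workers: int = 64) -> int:
    """Choose a worker count that can drain the backlog, within a hard ceiling.

    tasks_per_worker and max_workers are illustrative assumptions; a real
    autoscaler would derive them from measured processing rates.
    """
    target = max(current_workers, -(-queue_depth // tasks_per_worker))  # ceiling division
    return min(target, max_workers)

if __name__ == "__main__":
    # e.g. 18,000 queued scheduler tasks with 8 workers currently running
    print(workers_needed(queue_depth=18_000, current_workers=8))  # -> 36
```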
Report: "API and Webhooks Intermittent Unavailability"
Last update:
## Executive Summary
On January 27, 2023, between 02:37 and 03:40 UTC, ECS instances could not be deployed in our environment because a GPG key was changed on a package used on these instances. This caused a cascading outage of Omnivore’s API, with a period of total downtime between 02:50 and 03:40.
## Background and Root Cause
Omnivore utilizes Amazon Web Services Elastic Container Service (ECS) for some of our services. These instances are deployed as needed and built using the Chef configuration management tool. When Chef runs on these instances, it installs the software packages the instances need. Typically, these packages come from repositories maintained by the operating system; however, a few are maintained by the software companies that develop the applications. These repositories are secured using GnuPG (GPG) keys. Software companies change their GPG keys from time to time for security reasons. When this happens, the software will not be installed and an error message is displayed. When this type of error happens with Chef, the installation of the ECS instance is not completed, and the needed extra resources are not deployed. This is what caused this outage.
## Timeline
All times are in UTC.
02:37: Omnivore infrastructure team receives an alert that ECS instances were not able to be deployed.
02:45: Omnivore infrastructure team attempts to manually raise the number of ECS instances.
02:50: Omnivore infrastructure team receives an alert that the Omnivore API is failing.
02:56: Omnivore infrastructure team pages the service team to alert them to the issue.
03:19: Omnivore infrastructure team discovers that Chef is not able to deploy ECS instances.
03:22: Omnivore infrastructure team attempts to run Chef manually to force deployment.
03:40: Omnivore infrastructure team notes that Chef is failing due to a bad GPG key.
03:40: Omnivore infrastructure team downloads and installs the new GPG key, allowing Chef to run to completion.
## Action Items
1. Change the process for Chef deployment to include a fresh download of the GPG key on every run (an illustrative sketch follows this report's updates).
2. Consider using a “Golden Image” over deploying with Chef.
This incident has been resolved.
We have identified the problem and have implemented a fix. The API and webhooks are returning to normal. We will continue to monitor.
We are continuing to investigate this issue.
We are currently investigating instability affecting our API and webhooks.
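As a hedged illustration of the first action item in the postmortem above (downloading the repository GPG key fresh on every run rather than relying on a cached copy), the sketch below fetches a key and imports it with `rpm --import` before package installation. The key URL is a placeholder, and in practice this logic would live inside the Chef run rather than a standalone script.

```python
import subprocess
import tempfile
import urllib.request

# Placeholder URL; the real key location depends on the package vendor.
KEY_URL = "https://example.com/vendor-package-signing-key.gpg"

def refresh_gpg_key(url: str = KEY_URL) -> None:
    """Download the vendor signing key and import it before installing packages."""
    with tempfile.NamedTemporaryFile(suffix=".gpg") as key_file:
        with urllib.request.urlopen(url) as response:
            key_file.write(response.read())
        key_file.flush()
        # Importing on every run means a rotated vendor key no longer blocks bootstrap.
        subprocess.run(["rpm", "--import", key_file.name], check=True)

if __name__ == "__main__":
    refresh_gpg_key()
```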
Report: "API & Webhook Activity Graph Cloud Provider Outage"
Last update: This incident has been resolved.
A third-party cloud provider outage is preventing API & Webhook activity graphs from displaying in the Omnivore Control Panel.
Report: "Toast API - Increased error rate"
Last update: At 9:30 UTC, we found that a service responsible for authenticating against the Toast API had previously lost its database connection and failed to reconnect. We restarted the service and see that Toast API connections have now returned to normal.
Beginning around 8:15 UTC, we began seeing an increased number of errors when calling the Toast API, impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
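The resolution above describes an authentication service that lost its database connection and never re-established it until the service was restarted. Below is a minimal sketch of a reconnect-with-backoff guard that avoids that failure mode; the `connect` callable stands in for whatever database driver the service actually uses, and the retry budget is illustrative.

```python
import time

def connect_with_retry(connect, attempts: int = 5, base_delay_s: float = 1.0):
    """Call `connect` until it succeeds, backing off exponentially between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as error:  # a real client would catch driver-specific errors
            last_error = error
            time.sleep(base_delay_s * (2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error

if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect():
        # Simulated driver that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("database unreachable")
        return "connection established"

    print(connect_with_retry(flaky_connect, base_delay_s=0.01))
```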
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 22:40 UTC. We will continue to monitor for any further increases.
We are continuing to work on a fix for this issue.
Beginning around 21:42 UTC, we began seeing an increased number of errors when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve.
Report: "CloudPOS (Brink, CloudConnect, Toast, Lavu, Lightspeed) Ticket and Ticket List Errors"
Last update: For a period of 20 minutes following a release, requests for tickets to CloudPOS locations (Brink, CloudConnect, Toast, Lavu, Lightspeed) were failing. We identified the issue and rolled back the release.
Report: "Toast API - Increased error rate"
Last update: This incident has been resolved.
Beginning around 6:28 UTC, we began seeing an increased number of errors when calling the Toast API impacting ticket reads. API calls to fetch ticket data will likely fail at an increased rate and webhooks may be delayed until service is restored. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "Order Error Rate Increase"
Last update: This incident has been resolved.
Order error rates have returned to normal levels.
We are investigating an increase in the order error rate.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 05:46 UTC on 2022-05-21. We will continue to monitor for any further increases.
Beginning around 05:08 UTC on 2022-05-21, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 08:59 UTC on 2022-05-19. We will continue to monitor for any further increases.
Beginning around 08:50 UTC on 2022-05-19, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 08:55 UTC on 2022-05-13. We will continue to monitor for any further increases.
Beginning around 08:23 UTC on 2022-05-13, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 08:41 UTC on 2022-05-10. We will continue to monitor for any further increases.
Beginning around 08:24 UTC on 2022-05-10, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 15:54 UTC on 2022-05-06. We will continue to monitor for any further increases.
Beginning around 15:53 UTC on 2022-05-06, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.
Report: "Investigating Webhooks"
Last update: This incident has been resolved. No transaction data was lost.
A fix is in place. We are actively monitoring. You may experience a higher than normal volume of webhooks while our system catches up.
Webhooks are currently not being sent. We are investigating the cause.
Report: "NCR CloudConnect API - Increased error rate"
Last update: Error rates returned to normal levels around 19:11 UTC on 2022-05-04. We will continue to monitor for any further increases.
Beginning around 19:06 UTC on 2022-05-04, we began seeing an increased number of timeouts when calling the NCR CloudConnect API impacting ticket and clock entry reads. API calls to fetch ticket and clock entry data will likely fail at an increased rate and webhooks may be delayed until service is restored. We are reaching out to our NCR contacts. We will continue to monitor for the issue to resolve. There are no further technical actions we can take to resolve the issue at this time.