Historical record of incidents for Cheddar
Report: "502 Bad Gateway"
Last update: This incident has been resolved.
We have rebuilt and redeployed services and are monitoring for performance and residual issues.
We are experiencing downtime as the result of a configuration change. We are working on a fix currently and will update when resolved.
Report: "Website issues"
Last update: We’ve corrected the issue. The marketing website is back up and functioning normally. We’ll continue to monitor, but are resolving this incident.
The Cheddar marketing website is experiencing issues related to our new deployment procedures. Application API requests appear to be operating normally. We’ve identified the issue and are working to correct it.
Report: "Service Temporarily Unavailable"
Last update: This incident has been resolved. Cheddar is operating normally.
We've corrected the issue and are continuing to monitor.
While working on updating our deployment process, we began experiencing an elevated level of errors, including API errors. We are currently investigating the issue.
Report: "Deployment Related Issue"
Last update: Issues with both the dashboard login and API appear to have been resolved. We will continue to monitor.
API access has been restored to normal operation. Continuing to investigate dashboard login issues.
After a deployment this morning, we're experiencing an elevated level of errors including dashboard and API login errors. We are currently investigating the issue.
Report: "Service Unavailable Error"
Last update: The system continues to perform as normal, so we're transitioning this issue to resolved. We're now reviewing the incident to learn more about what went wrong and how we can avoid similar problems in the future.

So far, we've determined that a new disk partition reached capacity around 12:30 AM EDT this morning, which caused the system to become unavailable. We managed to make more disk space available and were back online by 6:15 AM EDT.

We have multiple redundant disk space monitors in place, and all of them failed in this situation. Both our DBA provider and secure hosting provider should have systems to notify us when disk space is close to capacity, but it appears that they either didn't enable monitoring on this particular disk partition or that an issue with their system prevented notifications from being sent.

We also have a database redundancy layer in place that should automatically transition to a different database if a primary database fails. In this case, the system didn't recognize the failure of the primary database, so the failover was never triggered.

To prevent this issue from happening again, we're working with our DBA and hosting provider to confirm that they have monitors in place for this server partition. We're also trying to determine why the system didn't fail over to backup database servers, and we will be making adjustments to ensure that failover is successfully triggered if a similar issue occurs in the future.

We want to apologize to our merchants for the inconvenience this issue caused. We know that you and your customers rely on Cheddar being available for essential billing and subscription activity. Rest assured that our team will be taking all steps possible to make sure that Cheddar's uptime isn't affected by disk capacity again. Please get in touch with our support team at support.getcheddar.com if you have any questions.
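The kind of disk-capacity check described in the update above could look roughly like the sketch below. It is an illustration only, not the monitoring used by Cheddar or its providers: the function names, the 90% threshold, and the checked path are all assumptions.

```python
import shutil


def is_near_capacity(total_bytes: int, used_bytes: int, threshold: float = 0.90) -> bool:
    """Return True when the used fraction is at or above the threshold.

    The 0.90 default is an assumed alerting threshold, not a documented one.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def check_partition(path: str, threshold: float = 0.90) -> bool:
    """Check the partition that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_near_capacity(usage.total, usage.used, threshold)


if __name__ == "__main__":
    # "." is a placeholder; a real check would list every mounted partition,
    # including newly added ones, which is the gap this incident exposed.
    if check_partition("."):
        print("WARNING: partition is close to capacity")
    else:
        print("OK: partition has headroom")
```

A check like this only helps if every partition is enrolled and the alert path itself is tested, since the update above reports that the existing notifications never fired.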
A fix for the database capacity issue has been implemented and the website, API, and dashboard are back online. We're continuing to monitor the system at this time.
We've identified the issue as an unexpected capacity problem in one of our databases. We've engaged our database management provider and are currently working toward a solution.
Some customers have reported that they are unable to log in to Cheddar or successfully send API calls. When attempting to access Cheddar via the dashboard or API they get a "Service Unavailable" error. We're currently investigating and our team is working to resolve this issue as quickly as possible.
Report: "Intermittent 500 Error Responses"
Last update: Further monitoring has confirmed that Cheddar is returning normal responses to API requests, so we're updating the status of this issue to Resolved. If you have any questions, please reach out to us at support.getcheddar.com.
Between 9:30 and 9:48 AM EDT (13:30-13:48 UTC) today, Cheddar was intermittently returning 500 errors to API requests made to our platform. We quickly determined that the cause of the issue was a deployment our team made this morning. We rolled back the deployment right away and the 500 errors stopped. We’ll continue to monitor Cheddar’s API responses over the next few hours, but we believe the issue is resolved. If any of your API requests resulted in 500 errors during that window, please retry those requests now. If you have any questions, get in touch with our support team at support.getcheddar.com.
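For clients that want to automate the retry suggested above, a minimal sketch follows. It is not part of any Cheddar client library: `retry_on_server_error`, its parameters, and the backoff schedule are hypothetical, and `send` stands in for whatever function issues the request and returns the HTTP status code.

```python
import time
from typing import Callable


def retry_on_server_error(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-5xx status or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    returns the last status code seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if not 500 <= status <= 599:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status
```

Blind retries are safest for idempotent calls. Another report in this history ("Some API calls returning 500 errors") notes that data was still recorded despite the 500 response, so for calls that create records it may be worth checking whether the record already exists before retrying.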
Report: "PayPal IPN Failures"
Last update: PayPal has confirmed that this issue is now resolved.
Cheddar is now successfully receiving IPNs from PayPal. Customers of Cheddar merchants using PayPal Adaptive Payments Preapprovals should now be able to create new preapprovals. We recommend that Cheddar merchants using preapprovals check their IPN history in their PayPal account. If there are any IPNs from the past 48 hours with a status other than "sent," please retry the IPN. If the retry is successful, Cheddar will update any customer records in a PayPal Pending status with the accepted preapproval. We are continuing to monitor the situation.
As of 03:51:49 UTC on 11 Sep 2018, Cheddar stopped receiving IPNs (instant payment notifications) from PayPal. As a result, Cheddar merchants using PayPal Adaptive Payments Preapprovals for payment processing are reporting that their customers are unable to create new preapprovals at this time. Recurring transactions for existing customers are not affected by this issue. We're currently working with PayPal support to resolve the issue as soon as possible.
Report: "Deliverability issues with default SMTP provider"
Last update: Our monitoring shows that emails continue to be delivered successfully, so we believe the issue is now resolved. If you have any questions or concerns, get in touch with our support team at support.getcheddar.com.
We've implemented a fix that appears to have resolved the issue. We'll continue to monitor the situation, but transactional emails from Cheddar product accounts are being successfully delivered and are no longer being flagged by Google as spam.
We've submitted a request to Google and they're in the process of reviewing it now. As soon as they send us an update, we'll let you know.
Sending has been re-enabled for our default SMTP provider. However, Google may be erroneously marking some of these emails as spam, so some recipients still might not be receiving emails that are sent using our default SMTP configuration. We're working to resolve this issue as soon as possible, and will keep you updated on our progress.
We are currently experiencing degraded deliverability of emails sent via our default SMTP service. Outgoing transactional customer communication emails from Cheddar product accounts using the default SMTP configuration may not be delivered until the issue is resolved. Customers using their own SMTP service are unaffected. To see if your Cheddar product is using our default SMTP configuration, please visit the customer communications page of your Cheddar dashboard here: https://www.getcheddar.com/admin/emails.
Report: "Data inconsistencies"
Last update:

# Data Inconsistencies Postmortem

Last week, we experienced some issues that temporarily created data inconsistencies and delayed the processing of recurring transactions for some merchants. We know you rely on us to provide a consistent, dependable service, and we regret the disruption that these issues caused. We wanted to take this opportunity to say we’re sorry, and to explain the chain of events that took place from April 29th - May 3rd.

**What Happened**

Starting April 29, we noticed that the system clocks on our servers were no longer synchronized. This issue was caused by a configuration error in our hosting provider’s time sync settings, which affected all of their data centers. Once our hosting provider told us the issue was resolved, we rebooted our servers on April 30th in an effort to immediately synchronize the clocks. After the reboot, we noticed some inconsistencies between the nodes in our high availability database layer. These issues caused our recurring engine, which automatically runs queued invoices several times a day, to stall.

While we worked with our third-party providers to remedy the underlying issues, out of an abundance of caution we disabled the recurring engine on 5/2/18, at around 15:25 UTC. We fully re-enabled it on 5/3/18, at around 20:00 UTC, at which time all of Cheddar’s normal functionality was restored.

**Effects**

- Due to the time sync issue, multiple queued invoices were created on several merchants’ customers’ accounts.
- As a result of the reboot, data sent to Cheddar on April 30th, between approximately 19:04-19:10 UTC, briefly appeared to be “missing” for some of our merchants. For example, some transaction activity or new customer records created during this 6-minute window were unavailable.
- The temporary data inconsistencies resulted in duplicate transactions for some of our merchants.
- While the recurring engine was disabled, recurring transactions were not automatically being run, and invoices sat in a queued state for longer than usual.

**What we did**

Throughout this incident, we worked closely with our third-party providers to remedy the underlying issues. In the meantime, we also worked to minimize the impact on our merchants.

- Fixing customer records: Some of the customers who had multiple queued invoices were in danger of being auto-canceled by the recurring engine. Our engineering team manually corrected those customer records to prevent cancellation. We’ve also been monitoring for duplicate transactions on customer records so we can let our merchants know they might need to issue refunds.
- Restoring data: Thanks to Cheddar’s redundancy, the data that appeared to be “missing” was never truly gone. We worked with our database administrators to restore the data.
- Recurring engine: The recurring engine was brought back in service slowly as we monitored for anomalies.

**What we’ve learned**

The issues we experienced last week had one cause in common: miscommunication with our third-party providers. While technical issues are sometimes inevitable, we recognize our role in minimizing the impact of our upstream services on our customers. Going forward, we’re focusing on some technical and organizational measures to help us fulfill that role:

- We’re working to incorporate additional automated monitoring to catch system clock drift.
- We’ve updated our shared documentation and conferred with all relevant third-party providers regarding mechanisms for keeping that documentation accurate and up to date.
- We’re implementing additional monitoring of the recurring engine processes, so we can be better aware of problems with the recurring engine.
- We’re putting procedures in place for communicating via our [status page](https://status.getcheddar.com) and [support forum](http://support.getcheddar.com), so that we can better keep our merchants informed of any adverse conditions.

Thanks for your patience while we sorted this out. We appreciate the trust you put in Cheddar by allowing us to take care of one of the most important aspects of your business. Rest assured that we're already working hard on implementing these solutions. As always, if you have any questions or concerns, please reach out to us at the [support forum](http://support.getcheddar.com) and we’ll be happy to help!
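The clock-drift monitoring mentioned in the postmortem could be approached along the lines of the sketch below. This is an assumption-laden illustration, not Cheddar's implementation: the `drifted_nodes` function, the node names, and the one-second tolerance are hypothetical, and how each node's clock reading is collected is left out.

```python
from statistics import median
from typing import Dict, List, Optional


def drifted_nodes(
    node_clocks: Dict[str, float],
    reference: Optional[float] = None,
    tolerance_seconds: float = 1.0,
) -> List[str]:
    """Return the names of nodes whose clock differs from the reference
    by more than tolerance_seconds.

    node_clocks maps a node name to its clock reading as a Unix timestamp,
    all sampled at (approximately) the same moment. When no reference is
    given, the median of the readings is used.
    """
    if not node_clocks:
        return []
    if reference is None:
        reference = median(node_clocks.values())
    return sorted(
        name
        for name, reading in node_clocks.items()
        if abs(reading - reference) > tolerance_seconds
    )


if __name__ == "__main__":
    # Hypothetical readings: db3 is about seven seconds ahead of its peers.
    readings = {"db1": 1000.0, "db2": 1000.4, "db3": 1007.5}
    print(drifted_nodes(readings))
```

Using the median as the default reference keeps a single badly drifted node from skewing the comparison; a trusted external time source would be a stronger reference where one is available.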
This incident has been resolved. We will issue a postmortem in the next couple of days.
Our tests have completed successfully, and the recurring engine is now up and running normally. We will continue to monitor the situation for any further issues.
We are currently testing the recurring engine on some customer records and monitoring for any errors.
All data records that were affected during the 6-minute incident window have now been incorporated into the production database. The small subset of customers that were missing transactions or events in their records yesterday is now repaired. You may still notice that some customer accounts have multiple queued invoices. We’ve identified which customer records were affected, and will be fixing those as soon as possible. While we continue to repair invoice data, the recurring engine is still disabled. We will post updates about the status of the recurring engine soon. In the meantime, it is possible to manually run queued transactions. If there are queued invoices you'd like to go ahead and transact now, click on the queued invoice and hit 'run invoice' in the bottom left corner of the page.
Additional info on today's data discrepancy issues: Cheddar experienced a sync issue during maintenance of our production database. This resulted in an approximately 6-minute gap in data. We didn’t lose this data. We have redundancy in place to guard against this kind of data loss. However, now that additional data has been created, we have to bring that 6 minutes of data back into the production database. Our team of engineers and database administrators has been working tirelessly to do just that.

In the meantime, we’ve turned off our recurring engine while we fix this issue. Invoices do not transact when the recurring engine is off. The recurring engine is fault tolerant and, when re-enabled, will transact any invoices that didn’t process while the engine was suspended. We are very close to correcting any missing transactional data and should be able to re-enable the recurring engine soon.

In the meantime, you may see two problems:
1. A limited amount of missing data from that window Monday night. This affects a small subset of Cheddar customers.
2. Invoices appearing queued and being delayed. This affects all customers.

Once resolved:
1. Any missing data should be restored.
2. The recurring engine will be turned on and any queued transactions processed.

We will continue to monitor the situation and update status. Primary systems including the API and Dashboard continue to operate normally.
We are currently experiencing database sync issues which have caused data about some transactions to become temporarily unavailable, and some recent transactions may not have run as expected. We have temporarily suspended recurring transactions while we work with our database provider to remedy the issue, and will post updates when available.
Report: "Some API calls returning 500 errors"
Last update: This incident has been resolved, and the API is functioning normally. Our findings suggest that impact from this incident was minimal. Although Cheddar was returning 500 errors for API calls during the incident window, the application was still properly recording data in the platform (e.g. if a customer create call returned a 500, the customer was still created in your Cheddar product account). We’ve implemented fixes that resolve the underlying issues and correct Cheddar’s API responses, but if you have any concerns about specific updates made during that time, please contact us at team@getcheddar.com.
We identified a database issue that was causing 500 error responses from Cheddar on some API calls and released a fix. The issue should be resolved as of 10:08 EDT. Customers shouldn't be receiving 500 errors at this point. We'll continue to monitor the situation for the next few hours.
API calls are returning 500 errors for some customers. We’re investigating the cause and will provide an update as soon as possible.
Report: "Unable to run transactions"
Last update: All transactions are executing normally. Retries of errant transactions have completed.
Transactions are executing normally. Recurring transactions that failed during the incident will retry automatically and are expected to run as normal.
All new transactions are currently succeeding. Recurring transactions attempted during the incident are continuing to fail when retried. This is being investigated.
The fix has been implemented and verified. We are continuing to monitor for abnormalities.
The cause has been identified and a fix is in progress.
We've received alerts indicating that transactions are failing to run.