Historical record of incidents for Baremetrics
Report: "We’re making some behind-the-scenes updates. Some account data may be delayed while we pause syncing temporarily. Everything will be back to normal shortly!"
Last update: We are currently investigating this issue.
Report: "Some accounts are behind"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Some accounts are currently behind. We're investigating the issue and monitoring progress as accounts begin to catch up. It may take a bit of time for all accounts to fully update—thanks for your patience!
Report: "Some accounts are behind"
Last update: This incident has been resolved.
We're experiencing some delays in account updates due to a backlog in our system processes. Things are recovering, but updates may take a little longer than usual.
Report: "Elevated Error Rates"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating elevated error rates in our app.
Report: "Some metrics are delayed"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Intermittent Delays Processing Metrics"
Last update: This incident has been resolved.
A fix has been implemented and we're monitoring the results.
We have identified the issue and are implementing a solution.
Report: "Issues with Apple metrics"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We have reported the issue to Apple and are waiting for resolution.
Report: "Intermittent connection issues"
Last update: This incident has been resolved.
We are currently investigating the issue.
Report: "Intermittent blank pages"
Last update: This incident has been resolved.
A fix has been implemented and we're monitoring the results.
We are currently working on resolving some issues with dashboards not loading. We will work on getting your data back up and running as quickly as possible.
Report: "Intermittent connection issues"
Last update: This incident has been resolved.
Thank you for your patience as we addressed intermittent problems. If you're still encountering errors, please contact our Customer Success team.
We are currently investigating this issue.
Report: "500 Errors"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We're having some issues. We're working on it! 🔨
Report: "Hosting provider outage"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are working to resolve the issue.
Report: "Dashboards Returning 500 Error Code For Some Customers"
Last update: Some customers have reported seeing a 500 error code on their dashboards. The impact for those customers is 'Major', while other customers are not experiencing issues.
Report: "Intermittent Timeouts"
Last update: We identified the issue, and accounts should be back to updating normally.
We are currently investigating why some accounts are experiencing timeouts and slow updates.
Report: "Customers & Segmentation downtime"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
Our Elasticsearch servers are having some trouble keeping up. This affects some Customers and Segmentation. Other parts of the app will keep working as usual.
Report: "Hosting instability"
Last update: Everything is back to normal. We're continuing to monitor as our systems pick up where they left off, but we don't expect any further issues.
We're back up and running on reduced capacity. Our top priority right now is getting everything back to full capacity, while ensuring stability of the whole system.
We've rebuilt a large proportion of our infrastructure and we're slowly migrating traffic to it. Access should be resumed shortly.
We're still working hard to bring our app back up after networking issues took everything offline. We will post updates as we have them.
We are aware of issues at our hosting partner's datacenter, which have taken us offline. We are working with them to identify a timeline for resolution.
Report: "Account Update Delays"
Last update: Most accounts are now caught up.
Some accounts are running behind on updates due to a large import. Accounts are in the process of catching up, but it will take time for every account to update. Thanks for your patience!
Report: "Failing Stripe webhooks"
Last update: Everything seems to be back in order, and we've confirmed all webhooks are now processing properly.
CloudFlare reports they've resolved the issue on their end. We're monitoring their fix now, but otherwise webhooks should be processing normally now and we're working our way through the backlog of previously failed webhooks.
We've deployed a temporary fix that should allow Stripe API calls to process while we work with CloudFlare.
We've confirmed the issue is with CloudFlare's rate limiting, which is incorrectly blocking IP addresses that we've whitelisted. We're working with CloudFlare now to address this.
We've detected that roughly half of all webhooks from Stripe are getting blocked. This appears to be a new issue with the way CloudFlare handles rate-limiting. In this case, they're incorrectly limiting calls from Stripe's IP addresses. We're working with them now to resolve this. All failing webhooks will automatically be reprocessed once this is resolved. We'll provide more updates as we have them.
Report: "Recover emails not sending"
Last update: This incident has been resolved.
We believe the outage has been resolved and email sending has been resumed. We'll keep monitoring the incident to make sure there are no further hiccups.
Our email service provider has identified the issue and is working on a fix now.
Our email service provider appears to be having a widespread outage. Working on a way to potentially reroute emails in the meantime.
Due to an issue with our email service provider, Recover emails are not sending. We're working with them now to resolve this. Any unsent emails are queued and will be sent once the issue is resolved. The exception: if you use a custom DNS setup with Recover, your emails are not affected. We'll update as we have more info.
Report: "Intermittent availability"
Last update: This incident has been resolved.
Our DNS provider appears to be having temporary issues preventing some users from accessing Baremetrics. We are monitoring for any changes and will update once resolved.
Report: "Processing Delays"
Last update: This incident has been resolved.
Background processing has been resumed and workers are catching up with the backlog of tasks.
As part of our ongoing maintenance, plan breakouts have been temporarily disabled.
Our background processing is currently paused. This will prevent your metrics from updating, and some other actions from updating across the API and dashboard.
Report: "Delayed Updates for Metrics"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
Several accounts are experiencing delays in updates for metrics. We are currently investigating the issue.
Report: "Network Issues"
Last update: This incident has been resolved.
Cloudflare has implemented a fix for this issue and is currently monitoring the results.
One of our network providers is experiencing issues, which is preventing some people from accessing our site and/or API.
Report: "Marketing Site Inaccessible"
Last update: This incident has been resolved.
Monitoring the situation. Site should be fully accessible to everyone.
Narrowing down the problem. Site should be mostly accessible at this point.
The marketing site is currently returning a 502 Bad Gateway error. Working with our providers to get to the bottom of the issue.
Report: "Database Issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Network Issues"
Last update: Everything should be golden now.
Network traffic seems to have improved. We're now monitoring this to be sure everything is working as it should.
Our network provider is currently performing emergency maintenance on a distribution switch, which is causing requests to randomly get dropped.
We are continuing to investigate this issue.
Our primary network provider is experiencing an outage so you may see some degraded performance. We're working to resolve as quickly as possible.
Report: "Server instability"
Last update: This incident has been resolved.
We've got a full scale outage. Servers aren't responding at all. Working to figure out the root cause.
A few servers in the main cluster are being problematic. We're digging in to figure out what's up. In the meantime, access to the dashboard may be intermittently problematic.
Report: "Database Maintenance"
Last update: This incident has been resolved.
All done. Keeping an eye on things.
We need to perform some database maintenance. Some parts of the API will be offline during this time, specifically creating/updating/deleting subscriptions.
Report: "Recover dashboard unavailable part 2"
Last update: The Recover dashboard should be back in business for everyone.
The fix is in place, but the roll-out isn't complete. Recover should be restored for everyone soon!
Report: "Recover dashboard unavailable"
Last update: All better!
We've identified the issue and are working on the fix!
Report: "Data not loading"
Last update: Fix deployed.
We're looking into an issue with dashboard data not loading.
Report: "Dashboard Errors"
Last update: Issue resolved.
Issue found. Deploying fix now.
Looking into an issue that's preventing the dashboard from loading.
Report: "Infrastructure Migration"
Last update: This incident has been resolved.
All systems are back up and running. We will continue to monitor everything to ensure the migration has gone smoothly.
We are continuing to monitor for any further issues.
Dashboard and API are back online. We still have some background systems down for maintenance, and will be bringing them back online over the next few hours.
The migration is going well; the dashboard should be back online in the next hour or so. Thank you for your patience.
We are working on a full infrastructure migration overnight. During this time the dashboard may be inaccessible for brief periods of time. We'll post status updates as necessary.
Report: "Database availability issues...again"
Last update: This incident has been resolved.
Making progress. Dashboard should be back for most.
Dealing with database issues again. As before, digging in to address this as quickly as possible.
Report: "Database availability issues"
Last update: The database seems to have stabilized; however, we'll be keeping an eye on things.
Working on resolving the issue now. Dashboard may be sporadically down a bit longer.
Looking into database issues.
Report: "Database maintenance"
Last update: We should be in the clear now!
Still working through stability issues, but the dashboard should be back for most.
Doing some quick database maintenance as we transition some data. Dashboard will be offline for a bit. Getting it back online as quickly as possible.
Report: "Networking issues"
Last update: We believe the issue is resolved, though we'll continue to monitor to be sure.
Still monitoring network activity to make sure everything has stabilized. We've resumed metric processing along with Recover and report emails. Most accounts are caught back up now.
We believe the networking issue has been resolved, though we're actively monitoring and working with network providers to verify and be sure everything has stabilized.
Still working through network issues. We're currently actively debugging issues with the network from multiple points within our infrastructure to narrow down the problem. Still appears to be a higher level network issue with our ISP outside of our direct control, but we're exhausting all options.
We're putting the dashboard in Maintenance Mode while we work with network providers to resolve the issue.
We've paused Recover and Report emails as we continue to investigate. These will be queued up and sent once the networking issue is resolved.
We're seeing issues across multiple services, which appears to be a wider networking issue with our ISP. We've paused metric updates while we investigate. You may encounter the occasional internal server error. We'll update as we know more.
Report: "Dashboard availability issues"
Last update: All sorted, everyone is caught up. Thanks for your patience, folks!
We're back up and stabilized. Working on the backlog of webhooks and data processing now.
We were back for a bit, but our fix didn't hold, so we're going to do some digging. Putting the dashboard back into maintenance mode.
Bringing servers back online now. Will be a few more minutes before we come out of maintenance mode.
We've found the issue and are working on rolling out the fix for it now.
We're having issues with availability and speed of the main metrics dashboard, potentially caused by a rogue database. Putting the app in maintenance mode while we investigate.
Report: "$0 Metrics"
Last update: Okay, sorted! Sorry about that, folks; it had to be done. 🛠
We're clearing the system that caches recent metrics, so some folks may see $0 metrics for a few minutes or possibly a bit longer. Appreciate your patience! We're doing some house cleaning in advance of a big launch tomorrow.
Report: "Updating Ruby"
Last update: Aaaand we're back. Sorry about that, folks!
This, plus our hosting provider having issues, is causing recovery to be a bit slower than we hoped.
Update went a bit sideways but we're working on it. Sorry folks! It's a very important security update so it had to be done.
We're updating to the most stable version of Ruby, Baremetrics will be back momentarily!
Report: "$0 Metrics"
Last update: This incident has been resolved.
We are rebuilding everyone's account. While it's being rebuilt, you may see "$0" for some metrics. Once your rebuild is done, things will be back to normal. Shouldn't be too long! 🙌
A subset of customers are seeing 0'd out metrics. We're on it! A big data migration that we were (trying) to do quietly went sideways for a few accounts.
Report: "Maintenance"
Last update: All done!
We are performing some needed maintenance.
Report: "Dashboard Outage"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Maintenance"
Last update: All done!
We are performing some quick maintenance on our DB. Most users will not notice anything, but for people that do, we should be back to normal very soon.
Report: "Customer Search"
Last update: This incident has been resolved.
All accounts should now be back online. We are continuing to monitor this.
Customer search is back for most accounts, with a few remaining.
Just an update on what is happening: we use Elasticsearch for our customer search and profile pages. This allows us to scale search out to millions of records while still delivering very fast search results. Yesterday, Elasticsearch crashed after hitting the JVM's 32 GB heap limit. However, we did not notice this error because it was buried in the logs among other errors. We attempted to restore the cluster, but it hit the limit again a few hours later. Right now, we are scaling up more machines and rebalancing the Elasticsearch cluster to resolve this issue. Unfortunately, the restore is slow, but we are making progress.
We have identified the issue, but it may take a while for this to recover. We will keep you updated on the progress.
The customer search section of the dashboard and API are currently unavailable.
Report: "Maintenance"
Last update: Done!
Doing another quick round of maintenance. Won't be long.
Report: "Maintenance"
Last update: This incident has been resolved.
Nearly done! Won't be long.
Doing some quick DB maintenance. Won't be long!
Report: "Maintenance"
Last update: This incident has been resolved.
Performing some quick emergency maintenance. We will be back soon!
Report: "Maintenance"
Last update: This incident has been resolved.
Performing some quick emergency maintenance. We will be back soon!
Report: "Database Outage"
Last update: The remaining accounts will be caught up shortly. A post-mortem report is on the way as well.
95% of accounts are now fully caught up. The rest will take a bit more time.
Most accounts are caught back up. Still working through the last few and monitoring overall stability.
During the night we managed to get most accounts caught up, but we have had to slow things down again this morning to do some more work on the DB cluster. We should see most accounts caught up today.
We believe we’ve found the issue and are working on rolling out a solution now.
We’re still trying to get to the bottom of the segmentation faults in the database cluster. We're working directly with our database provider to create a new binary that gives us more insight into the cause of the faults.
We are experiencing segmentation faults within our database cluster. These are causing nodes in the cluster to restart, which further delays our metric processing. We are working as fast as we can to find a solution.
We are still working through these issues, but accounts are processing (at a slower rate than normal) and should be caught up soon.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Database Outage"
Last update: This incident has been resolved.
Most accounts are caught up now. We will update this page when all accounts are back to normal.
We are continuing to monitor this, and ensure that accounts are caught up properly.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.