Simplero

Is Simplero Down Right Now? Check if there is a current outage ongoing.

Simplero is currently Operational

Last checked from Simplero's official status page

Historical record of incidents for Simplero

Report: "AI Chat Bot on sites not working"

Last update
resolved

AI Chat Bot on sites should be working again.

identified

Actually, it was our own boo-boo. We deleted a team member who no longer works here from our OpenAI account, but our servers were using credentials associated with their user. We are changing the credentials right now. Expect everything to be working in 20-25 minutes.

identified

The service we use for chat bot (OpenAI) is currently down. As a result the AI chat bot on sites is currently not working.

Report: "Issues with email deliveries"

Last update
resolved

This incident has been resolved.

identified

Our email provider SendGrid is dealing with an incident which may delay email deliveries.

Report: "All pages listing course lessons are currently broken -- a fix is being deployed and should be out in 8 minutes..."

Last update
resolved

All is working in the land of Simplero again.

investigating

We are currently investigating this issue.

Report: "Email Sending is Down"

Last update
resolved

Our emails have been unsuspended and they should be up and running again. Emails sent during the suspension have now been delivered.

monitoring

Our emails have been unsuspended and they should be up and running again. We are working to confirm if emails sent during the suspension will still be sent or if they will need to be resent.

identified

We're in touch with several people at Twilio, but no one is able to actually do anything because it's Christmas here in the US. It's pretty remarkable that a $17Bn market cap ~10,000 person company cannot find a single person who's able to flip a simple switch to rectify an obvious mistake. But that's where we're at. We've also switched over transactional email (login information, receipts/invoices, forgotten password, etc.) to use the channel that does let emails go through. More details in the community: https://simplero.community/forum/posts/193635-email-down

identified

We now know the reason for the suspension (a phishing email sent to one of our members, which our systems forwarded to the same member as a notification email). We are still waiting on our email delivery system to restore our account.

investigating

Emails are not being delivered. Our email delivery system suddenly suspended our email sending without a clear reason. We have asked for urgent support from them and are waiting for a response.

Report: "Simplero is down"

Last update
resolved

One of our web servers (out of 10) went down for ~45 minutes. We've restarted it, so the problem should be fixed. Weirdly enough, our automatic alerts didn't catch this downtime. We'll continue to monitor and figure out a way to set up automatic alerts for this case so we're alerted early on.

investigating

Some people are unable to access Simplero. We are investigating the issue.

Report: "Simplero is down"

Last update
resolved

We've resolved the issue and everything should be back to normal.

identified

Our engineers are working on a spam traffic attack that's bringing us down.

Report: "Simplero is down"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Background processing and API is down"

Last update
resolved

The email stats and other stuff are still catching up and will be updated very soon.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are investigating the issue.

Report: "Email Delivery Delays"

Last update
resolved

This incident has been resolved.

identified

We are currently experiencing an issue impacting our email delivery system. Users may notice delays in receiving emails sent through our platform. Current Status: Our engineering team has identified the root cause as an unexpected surge in email load, leading to a bottleneck in our processing queues. We are actively working on fixing it.

Report: "Simplero is down"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating the issue.

Report: "Errors on Site admin pages"

Last update
resolved

All done. So sorry about this.

identified

All admin pages for sites not on the new experience are throwing errors right now. A fix is going out. Should be all done within 15-20 minutes. (That's how long it takes to deploy and update.)

Report: "Course overview pages are currently broken"

Last update
resolved

All fixed. So sorry about that.

monitoring

A fix is going out right now. The courses themselves are fine, but the overview page is throwing a 500 server error.

Report: "Search Functionality Disruption"

Last update
resolved

This incident has been resolved.

monitoring

Search should be working as expected, we are monitoring for any issues.

identified

We are currently experiencing an issue with our search functionality. Our team is aware of the problem and is working diligently to resolve it as soon as possible. We apologize for any inconvenience this may cause and appreciate your patience.

Report: "Simplero is down"

Last update
resolved

We have restarted our database which got us back!

investigating

The issue seems to be affecting our database, which is causing all Simplero admin and user-facing pages to be down. The Engineering team is investigating.

Report: "FontAwesome is Down 👎"

Last update
resolved

FontAwesome is back as well as all the fabulous icons and texts 💃💪

investigating

FontAwesome is down 😞 This is affecting fonts and icons used in Simplero. Go here to see their status updates: https://status.fortawesome.com/ We'll do our best to update as we get more information 👷

Report: "Simplero is down right now.."

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Email Delivery Delay"

Last update
resolved

This incident has been resolved.

identified

We are currently experiencing issues with our email provider, which has resulted in delays in email delivery. Outgoing emails may be affected. Our technical team is actively working on resolving this issue and is in communication with the email provider.

Report: "Investigating issues accessing the platform"

Last update
resolved

This incident has been resolved. What happened? We created a new API endpoint, and it was used at a much higher rate than we had anticipated. This created a logjam in our backend processing, which spilled over to page loads. We are so sorry for that! We have now added rate limiting to this endpoint and are modifying it in a way that prevents this from happening again.
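For readers wondering what rate limiting an endpoint can look like, here is a minimal sketch. The update above does not say which algorithm Simplero chose; the token bucket below, and the names `TokenBucket`, `rate`, `capacity`, and `allow`, are assumptions made purely for illustration.

```python
import time


class TokenBucket:
    """Hypothetical limiter: about `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed now, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would typically keep one bucket per API key and answer with HTTP 429 whenever `allow()` returns False, so a burst on one endpoint cannot pile up work for the shared backend.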

identified

We've identified an issue that may be the cause of the downtime. We are deploying a fix and will continue to monitor.

investigating

We are currently investigating this issue.

Report: "We have a problem, we’re working on it, it seems to be affecting checkout pages and video assets"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our "pusher", which handles the "purchase processing" screen, video encoding, and transcript generation, is extremely busy at the moment. The following are expected to be affected: 1. The "purchase processing" screen will not automatically move on: the user purchasing from your site will need to click the link to force the redirect. 2. Video encoding status will not automatically update in your dashboard, but the encoding will still process: you'll just need to refresh the page to see it updated. 3. Video transcription status will not automatically update in your dashboard, but the transcript will still be generated in the background: you'll just need to refresh the page to see it updated.

Report: "Instagram feeds are down"

Last update
resolved

Instagram feeds were working again as of Feb 16th. Did we remember to update this? No. No we did not.

monitoring

Our integration with Instagram is currently being reviewed by Meta. We’ve submitted the information we need to submit and the Instagram feed section should start working again within 2-3 days. Please hide your Instagram sections for now.

Report: "Attachments (image/file uploads/mentions) on comments/forum posts uploaded 2 days ago not being displayed"

Last update
resolved

We've fixed attachments and mentions posted between February 3 and 5. All attachments and mentions should be functional again.

identified

We have fixed the issue for attachments and mentions posted before February 3 and all those posted going forward. We are working on a fix for those posted between February 3 and 5.

identified

The issue has been identified and a fix is being implemented.

Report: "Website degraded performance"

Last update
resolved

Website performance was degraded for about 30 minutes. It has gone back to normal. We subsequently found the root cause and fixed it.

Report: "All sites showing error code"

Last update
resolved

Fixed.

identified

Will show a message like "ERROR: undefined method `google?' for nil:NilClass" or show the site without any styles at all. A fix is currently being deployed. So sorry about this.

Report: "Database upgrade has stalled Broadcast and Email sendings"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating why Broadcasts and Emails are not sending after our Database upgrade. We will update you as soon as possible.

Report: "Looks like AWS is down"

Last update
resolved

AWS should be back online

monitoring

We have disabled a part of our logging service that depends on the affected AWS region. Everything on Simplero should be working again. We are continuously monitoring for other issues that may come up - none so far.

investigating

We are continuing to investigate this issue.

investigating

Looks like Amazon Web Services is having issues causing outages on Simplero. We are investigating further...

Report: "Emails may not be sending"

Last update
resolved

This issue appears to have been resolved. A small number of emails may not have been sent between 2:28 PM and 3:34 PM EST on October 12. If you sent messages around that time, please check to see if the broadcasts are marked as 'not delivered'.

monitoring

This issue appears to have been resolved, but we are monitoring to make sure no further issues occur.

investigating

We are experiencing an issue where some emails may not be delivered.

Report: "Instagram is having some problems with their API"

Last update
resolved

The virtual hugs worked! Facebook/Instagram have announced that their API is back up. If your account's feed disconnected during this outage you will now be able to reconnect it in Settings > Integrations. Please check yours! Some disconnected and some didn't...

monitoring

We are monitoring to see when Instagram resolves their issue. Let's all send them virtual hugs...

investigating

At around 1AM ET on July 24, we started experiencing problems with our Instagram integration. After much digging on our end (go Owais!), it turns out to be an issue with the Instagram API itself, and not with Simplero. As a result, your Instagram integrations might not work as expected until Instagram resolves these issues. We will monitor and you can also follow along here: Facebook's status page: https://status.fb.com/graph-api

Report: "Auto-response and Automation email delivery stats are not complete"

Last update
resolved

This issue is now resolved and emails sent during this issue should now show correct statistics.

monitoring

We have implemented a fix and new stats should now be recorded correctly. We are monitoring this fix and exploring methods to update the data on emails affected during the issue.

investigating

All emails are still being sent, don't worry! But we are currently investigating an issue where the email stats for these messages are zero or incomplete. The number of 'Delivered' emails is not correct, and thus the percentages of other things (like 'Opens') are also wonky. As far as we can tell, other number-based metrics like 'Opens' are accurate - only percentages are affected. The problem may lie with SendGrid, but we haven't fully identified the issue yet. We're on it, though! Our apologies for any inconvenience.

Report: "Saved cards not working for new purchases"

Last update
resolved

We believe we have this all straightened out, and saved cards are again available for new purchases.

identified

Previously-saved cards are temporarily not working for making new purchases. Complete fix expected within a few hours. In the meantime, saved cards do not show as a payment option, so the checkout process for repeat customers is somewhat worse than normal but fully functional. (Only cards processed via Stripe are affected, no other processors.)

Report: "Something is Amiss"

Last update
resolved

We had a backlog of running jobs. Jobs are running again and we are seeing emails and media files uploading again.

investigating

We are currently investigating an issue with emails not being sent and video encoding. We'll update as soon as we have the issue resolved.

Report: "Links in email briefly broken"

Last update
resolved

For five minutes or so, anyone clicking a link in an email got an error page. If they tried again in a few minutes, it worked correctly. No change is too small to go through the proper steps. Our head of engineering is having a stern talk about expectations and SOPs with...himself. Mea culpa. -Joshua

Report: "Certain email deliveries delayed"

Last update
resolved

Some deliveries failed last night between 9:20 PM and 2:20 AM Eastern. We corrected the problem (one of our mail-sending servers rapidly ran out of disk space due to an unrelated series of unfortunate events) and re-sent all the failed emails—except where we could tell that the account owner had already re-sent them. This really was quite the freak combination of problems, but we're taking steps to make sure similar processes can't use up the disk again.

Report: "The case of the unreported deliveries"

Last update
resolved

You may have noticed unusually low % delivered for mailings sent in the last day or so. For about 24 hours starting at 1:45 PM Eastern (18:45 UTC) on November 2, Simplero did not record deliveries or bounces for email addresses with capital letters in them. The mail still got delivered as it always does! Unfortunately we can't get those delivery events back, and affected mailings are going to have somewhat weird-looking reports. Opens and clicks were still tracked correctly.
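The report above does not say what went wrong internally. One common way this kind of bug arises is a case-sensitive lookup when matching the provider's delivery events back to stored addresses. The sketch below assumes that cause for illustration only; `normalize_email` and `record_events` are hypothetical names, not Simplero code.

```python
def normalize_email(address):
    """Return a comparison key for an address: trimmed and case-folded."""
    return address.strip().casefold()


def record_events(recipients, events):
    """Match provider events to stored recipients regardless of letter case.

    recipients: addresses as stored by the application.
    events: (address, status) pairs as reported by the mail provider.
    Returns a dict mapping each stored address to its latest status, or None.
    """
    recipients = list(recipients)
    by_key = {normalize_email(r): r for r in recipients}
    result = {r: None for r in recipients}
    for address, status in events:
        stored = by_key.get(normalize_email(address))
        if stored is not None:  # ignore events for unknown addresses
            result[stored] = status
    return result
```

Strictly speaking, the part of an address before the `@` may be case-sensitive under the SMTP standard, but mail systems almost universally treat it as case-insensitive, so comparing on a folded key is the usual practical choice.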

Report: "Database upgrade"

Last update
postmortem

Thursday we set in motion some infrastructure upgrades, very carefully, behind the scenes. But it turns out MariaDB has a bug that caused it to “leak” memory when using a certain kind of data compression, and over the course of several hours, it consumed all available memory, slowed down, and rebooted itself. That caused the few minutes of downtime on Thursday evening. We’ve never had a problem like that before, but there’s a first time for everything. Now we have alarms so we’ll be notified of any memory issues with the database long before they cause a problem. We also decided to upgrade MariaDB to a version that fixes the memory leak bug. It’s a so-called minor version upgrade, and Amazon even offers to do them for you automatically during a short regularly-scheduled maintenance window, so we expected a few minutes of downtime. Instead, as you know, there was over an hour Saturday night when the database (and hence the entire application) was inaccessible. And once the process started there was no stopping it: we were at the mercy of Amazon Web Services. Going forward, we’ll announce ahead of time on [status.simplero.com](http://status.simplero.com) and in our Facebook group any time we plan even a few minutes of downtime. And we’re implementing a plan to be able to upgrade the database with (for real) no more than a few minutes of downtime.

resolved

And, we're back! Sorry that took a bit longer than expected. All is safe and sound and operational.

identified

We're currently doing a database upgrade. We expect to be back online in a few minutes. Sorry for the wait.

Report: "More database upgrade"

Last update
resolved

This incident has been resolved.

monitoring

Upgrade process is completely finished. No data was harmed in the upgrading of this database.

monitoring

We've been back up for a while now, and we're fairly sure we're out of the woods. But given that we thought we were done 30 minutes ago and then we weren't, we'll leave this Status as Monitoring. To be on the safe side.

investigating

We are continuing to investigate this issue.

investigating

Apparently our database wasn't quite done updating. This is still expected downtime, it's just taking longer than we'd expected. We're very sorry this is taking so long.

Report: "Something is amiss"

Last update
resolved

And we're back in business! We'll post more details here about what happened after we do a full post-mortem.

investigating

As you've noticed, something is amiss in Simplero-land. We're on it and will get it fixed ASAP.

Report: "Site is down ... working on it"

Last update
resolved

All good now. Thanks for your patience.

identified

Most stuff is back online now. Looks like it's just our own website (simplero.com) that's still borked. Your sites and services are working fine, and you can log in to your account by going to youraccount.simplero.com/admin.

investigating

We made a boo-boo. We're working hard on restoring service. So sorry, guys. We know we screwed up.

Report: "Brief interruptions caused by maintenance"

Last update
resolved

It's all cleaned up now. We apologize for the inconvenience. A few times the site was offline and everything got paused for a minute, but it's all back to normal, and there should be no lasting effects.

identified

We're experiencing a few brief interruptions in service this morning due to some unexpected problems during system maintenance. We're working on getting it all cleaned up.

Report: "Mail delayed by SendGrid outage"

Last update
resolved

SendGrid is reporting that systems are back online. I still wouldn't be surprised if some inbound and outbound messages are delayed.

identified

According to https://status.sendgrid.com, SendGrid is having an outage across all capabilities. Mail sending will be delayed. Our architecture is designed so that mail will get delivered automatically as soon as SendGrid is back online.

Report: "Temporary network error caused downtime for sites"

Last update
postmortem

A temporary error with domain name resolution happened to coincide with our process that checks to see that domain names are still configured to point to Simplero, which caused our system to see many domains as no longer pointing to Simplero, which caused that process to mark them inactive. This kind of problem has never happened before in all the years we have supported custom domains, but the system design was still a mistake on our part: DNS systems _can_ fail, so we shouldn’t have had a system that deactivated sites based on a single check. We have improved the system so that an active domain must fail multiple checks over a couple days before it’s deactivated.
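The improved rule described above (several failed checks spread over a couple of days before deactivation) can be sketched roughly as follows. The thresholds of three failures and two days, and the `DomainHealth` class itself, are illustrative assumptions; the postmortem gives no exact numbers.

```python
from datetime import datetime, timedelta


class DomainHealth:
    """Hypothetical tracker: deactivate only after sustained, repeated failures."""

    def __init__(self, min_failures=3, min_span=timedelta(days=2)):
        self.min_failures = min_failures
        self.min_span = min_span
        self.failures = []  # timestamps of consecutive failed checks

    def record(self, when, points_to_us):
        """Record one check result; return True if the domain should be deactivated."""
        if points_to_us:
            # A single success resets the streak, so a transient DNS
            # outage can never deactivate a healthy domain.
            self.failures.clear()
            return False
        self.failures.append(when)
        span = self.failures[-1] - self.failures[0]
        return len(self.failures) >= self.min_failures and span >= self.min_span
```

Requiring both a failure count and a minimum time span matters: a burst of retries during one short resolver outage would satisfy a count-only rule within minutes.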

resolved

Our systems suffered temporary, partial errors with domain name resolution this morning, which resulted in a number of customer websites temporarily failing to display. We're still investigating to determine exactly what went wrong and what sequence of events may have caused sites to be offline any longer than necessary.

Report: "Mail sending down"

Last update
postmortem

Our mail sending partner decided to change all login credentials at 20 minutes past midnight US Eastern Time on a Saturday, without notice, in a way that broke all of our email sending completely. Emails just stopped going out. This is terrible on their part. We're going to reevaluate our business relationship with them, we're going to obviously do everything we can to make sure this won't happen again in the future, and we will create a system to catch a situation like this automatically, and immediately, going forward. I'm so sorry. This is absolutely horrific. Nothing like this has ever happened before in our 11+ year history, and I've never experienced a supplier behaving this irresponsibly before. We've definitely learned from this. With sincere apologies, –Calvin

resolved

Backlogged messages have been sent.

monitoring

Email sending is working again, and we are delivering all mail that should have been sent earlier today. We're monitoring to make sure everything gets sent.

identified

All mail sending from Simplero is currently failing. We have identified the problem and are working to correct it.

Report: "Brief downtime"

Last update
resolved

We had nine minutes of downtime from 12:25 AM to 12:34 AM US Eastern time. To support a new feature, a developer made a configuration change of a kind we rarely need to make, and it didn't go well. We're improving our internal documentation, and this won't happen again.

Report: "Notification emails delayed"

Last update
resolved

Notification emails and other one-at-a-time emails across Simplero were stalled from 11:43 PM US Eastern time last night until 8:52 AM this morning. All such emails were delivered starting at 8:52 AM. The problem was caused by a configuration error which is now fixed. Broadcasts and newsletters delivered normally and were not affected.

Report: "Some purchases failed during an hour due to network issues"

Last update
resolved

From 1:22 PM to 2:32 PM US Eastern today, some of our servers were unable to make connections to outside services, including payment gateways. Some payments attempted during this window failed. Full connectivity has been restored. We sincerely apologize for the outage.

Report: "Intermittent connectivity"

Last update
postmortem

This morning we had about 30 minutes of intermittent failures affecting the Simplero software and customer websites, including [simplero.com](http://simplero.com): we use our own stuff! That was followed by a few minutes of all services being down completely. We’re so sorry about that! Here’s what happened. We deploy a new version of Simplero every time we fix or improve something, typically several times a day. We keep a few previous versions around, and the oldest version gets cleaned up as a new version gets deployed. One of our deploys this morning failed, and the old version kept running. That’s as it should be, but the failure today was silent: we didn’t realize anything had gone wrong. A few more deploys later, the old version was still running, but it was old enough that the application files got cleaned up right out from under the running application on one of our servers. (All your media, images, text, customer data, and any other files you’ve added to your Simplero were just fine. Only the application itself was affected.) Another deploy trying to fix the problem meant the old, still-running version got cleaned up on every server, and we went from intermittently down to completely down. Finally, we realized the root cause and undid the changes that were causing new deploys to fail. Going forward, we’re changing our deploy process to make a silent failure like this visible so we can roll it back immediately. Sorry we let you down: we’ve learned from this error and we’ll make sure this kind of failure can’t happen again. Thank you for your patience and for your trust in Simplero.
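One guard against the failure described above is to make release cleanup aware of what is actually running. The sketch below is not Simplero's deploy tooling; `releases_to_delete` and its parameters are hypothetical, and how the set of running releases is discovered on each server is left out.

```python
def releases_to_delete(releases, running, keep=3):
    """Pick releases that are safe to remove.

    releases: release names ordered oldest to newest.
    running: names of releases that some server is still executing.
    keep: number of newest releases to retain regardless.
    """
    # Slice from an explicit index so keep=0 retains nothing
    # (releases[-0:] would wrongly return the whole list).
    newest = releases[max(len(releases) - keep, 0):]
    protected = set(newest) | set(running)
    return [r for r in releases if r not in protected]
```

With five releases, `keep=3`, and the oldest release still running because later deploys failed silently, only the second-oldest release is returned for deletion, so the files of the running version stay in place.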

resolved

We're back online.

investigating

We're completely down now. Deploying a fix we believe will solve it completely. Fingers crossed.

investigating

We've received reports of some sites experiencing intermittent connectivity issues. We're currently investigating the issue.

Report: "System Wide Outage"

Last update
postmortem

Here’s what happened yesterday with our longest downtime in 5 years. First, our background jobs got stuck, and we got a notification about it. It was strange, because there hadn’t been a recent deploy or any other recent event that would correlate to that. Then, in an attempt to get them unstuck, a team member made a quick decision to run a full deploy. That turned out to be a mistake, because that ended up taking down EVERYTHING, including our web servers, so now the site was completely down. To be fair, though, given what turned out to be the cause, the web servers would probably have stopped responding fairly soon after, anyway. As soon as the site was down, it was all hands on deck. We spent the majority of the time just trying to figure out what the heck was going on. There was nothing in the logs, no indications of what could be causing this. We tried the logical route: It started with background jobs, it spread to the web servers when they were redeployed. We also, of course, tried the good old “turn it off and back on” method, but, predictably, it didn’t do anything to fix it. Finally we got a clue. Some requests did go through, and they threw an error from our PostgreSQL database saying the connection was bad. That pointed us in the direction of the logging server running PostgreSQL. As soon as we validated that, it was an easy fix to turn off logging to PostgreSQL, which is safe to do since we only use it for internal debugging purposes. Then the site was back up. But what had gone wrong with our PostgreSQL database? We keep stuff there for a limited period of time, and then delete it. It looks like the way we deleted things wasn’t very efficient, and we also never VACUUM’d our database. It’s been many years since I last used PostgreSQL, and that was something I’d forgotten you should do every so often.
One thing that threw us was that our system is designed such that if logging to PostgreSQL fails for some reason, the application should be able to keep serving requests. Clearly something about that wasn’t working quite right. We’ve now changed our process for how we delete old rows, and implemented a system to VACUUM the database more regularly, as well as split this process out from some other processes it was lumped in with. Again, I’m super sorry about this. The big factor was just how long it took us to figure out what was going on here. It was completely mystifying for the longest time, until we finally got a clue that put us on the right track. Thank you for being here with us. We’re grateful every day.
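The design goal mentioned above, that a failing log store should never stop the application from serving requests, is often called failing open. Here is a minimal sketch of that idea, assuming a hypothetical `sink` callable that writes one record to the log database; it is not Simplero's implementation.

```python
import time


class FailOpenLogger:
    """Hypothetical wrapper: logging errors are swallowed, never raised to callers."""

    def __init__(self, sink, cooldown=30.0, clock=time.monotonic):
        self.sink = sink          # callable that writes one record
        self.cooldown = cooldown  # seconds to skip the sink after a failure
        self.clock = clock
        self.disabled_until = float("-inf")
        self.dropped = 0

    def log(self, record):
        """Try to write a record; return True if written, False if dropped."""
        if self.clock() < self.disabled_until:
            self.dropped += 1  # still cooling down: skip the sink entirely
            return False
        try:
            self.sink(record)
            return True
        except Exception:
            # Back off so a broken sink is not hit on every request.
            self.disabled_until = self.clock() + self.cooldown
            self.dropped += 1
            return False
```

Catching exceptions only helps when the sink fails fast. If the database connection hangs instead of erroring, the request still blocks, so a setup like this also needs short connect and statement timeouts on the sink itself.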

resolved

Everything's operational. Our specialized logging system is still offline, but that doesn't affect operations. We're doing some maintenance and cleanup on it, before putting it back in commission. This was the longest-running downtime in five years, and I'm terribly terribly sorry we let you down like this. We are, of course, fixing all of the issues that led to this downtime.

identified

Yup, that was it. We're back. Now on to figuring out what happened to our PostgreSQL installation. It seems like something's really screwed there.

investigating

We think we figured out what's going on. It's related to our logging infrastructure.

investigating

This is the strangest thing I've seen in my almost 40 years in software development. It's certainly the worst downtime we've had in over five years. We've got all hands on deck trying to figure this thing out, but at this point, we don't even know what's causing the processes to not respond correctly. I'm so so sorry. We take this stuff supremely seriously, and we're working as HARD as we can to bring everything back up.

investigating

We are currently experiencing a system-wide outage. We are looking into it and will update with details as soon as we can. Thanks for your patience!

Report: "Video Encoding issues at AWS"

Last update
resolved

Hallelujah! Media files are encoding effectively now. Join me in raising a glass to our team of coders who figured out several challenging problems today. Thank you all for your patience.

monitoring

We are continuing to monitor for any further issues.

monitoring

There's a new twist in today's media file encoding challenge. Dev team is investigating as fast as their fingers can take them. Thanks for your continued patience.

monitoring

We've figured out a solution and things are catching up. This will be a permanent improvement going forward. That's the good news. Thank you so much for your patience!

identified

Network issues at AWS are affecting video encoding.

Report: "Site is down"

Last update
resolved

We're back up and systems are operational again. Thanks all.

investigating

We are continuing to investigate this issue.

investigating

We implemented a change that resulted in an outage. We are working to resolve the situation and expect to be back up soon. Thank you for your patience.

Report: "Video encoding is backlogged at the moment"

Last update
resolved

Everything's humming along nicely now. Thank you for your patience.

monitoring

Looks like AWS is behaving again. We still have a little bit of a backlog, but everything is moving forward as it should.

investigating

Things are progressing, but the network issues are making it slow. We'll get through it, but we need your patience here.

investigating

It looks like a network issue with Amazon Web Services is affecting connections between our encoding servers and S3, where the video files are stored, which makes downloads and uploads very slow and unreliable.

investigating

Processing is stuck for a number of videos. We're working on getting it all cleared up as soon as we can.

Report: "Switching over Content Distribution Network"

Last update
resolved

Almost everything's switched over now, and things seem to be working well.

investigating

We're switching over our Content Distribution Network. There may be breakage in the app while we do this, but we're monitoring closely. If you notice something, please let us know, but most likely we're already on it.