ReadMe

Is ReadMe Down Right Now? Check whether an outage is currently ongoing.

ReadMe is currently Operational

Last checked from ReadMe's official status page

Historical record of incidents for ReadMe

Report: "Widespread service provider outage"

Last update
monitoring

While ReadMe services have not been affected, our team is monitoring the status of widespread outages with Cloudflare and GCS.

Report: "OAuth end-user authentication for Enterprise customers is down"

Last update
investigating

End-user authentication via OAuth is currently unavailable due to an availability incident with Heroku, our upstream provider. They are currently investigating a resolution for the issue.

Report: "ReadMe Hubs and Dashboard are down"

Last update
resolved

There was an issue with one of our upstream providers. Service remains stable and we're investigating the root cause with our provider.

monitoring

We're back up and continuing to monitor.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating.

Report: "ReadMe Hubs and Dashboard are down"

Last update
Resolved

There was an issue with one of our upstream providers. Service is remaining stable and we’re investigating the root cause with our provider.

Monitoring

We're back up and continuing to monitor.

Update

We are continuing to investigate this issue.

Investigating

We are currently investigating.

Report: "Slightly elevated error rates"

Last update
resolved

We have eliminated the slightly elevated error rates.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Search results not showing up in "All" tab on hubs"

Last update
resolved

This issue has now been resolved. A database setting was changed which conflicted with the way search works when searching all sections. This setting has now been reverted and search is functioning as before.

investigating

There is an ongoing issue where the "All" tab in search results is coming up empty. Searching specifically inside of each section is still working. We are investigating and will update when we have a fix.

Report: "Some customers are experiencing pages that are not loading correctly"

Last update
resolved

There was an issue with our upstream provider which is now resolved.

investigating

We are currently investigating

Report: "Slow performance in Refactored dashboard"

Last update
resolved

This incident has been resolved.

monitoring

The issue lasted a few minutes and the admin dashboard is currently online. We’re continuing to monitor.

Report: "Slow performance across hubs"

Last update
resolved

ReadMe Refactored project performance has been restored.

investigating

Customers on the ReadMe Refactored experience are currently experiencing performance issues.

Report: "ReadMe Hubs are down"

Last update
resolved

This incident has been resolved.

investigating

The hubs are currently operating as expected, but we are actively monitoring and investigating the root cause of this issue. If you are still experiencing any problems, please contact support@readme.io.

investigating

We are currently investigating this issue.

Report: "ReadMe Hubs are down"

Last update
resolved

We have added server capacity to deal with an increase in traffic and load. We will continue to monitor but all sites appear to be working as expected again. Please reach out to support@readme.io if you are still having problems.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue

Report: "SSL Certs sometimes fail to generate"

Last update
resolved

This incident has been resolved.

monitoring

We found the culprit on Friday! Our hosting provider (who we love!) was marking some domains in Cloudflare as theirs, and Cloudflare got confused. We're still waiting for a report from them to understand how that happened, but for now it seems to be an issue of the past.

investigating

We're currently seeing that for a very small percentage of our users, we're having trouble reissuing SSL certs with Cloudflare. We're working with Cloudflare to identify the issue and correct it. We're monitoring it, but please reach out to support@readme.io if you experience this! This is only happening for a small portion of customers.

Report: "Slow performance across all hubs and dashboard"

Last update
resolved

This incident has been resolved.

investigating

Currently investigating this incident

Report: "Intermittent dashboard issues"

Last update
resolved

The dashboard has remained up ever since. We're continuing to investigate to find a root cause.

monitoring

The dashboard located at https://dash.readme.com/ seems to be going in and out of healthy status. It is currently up but we're monitoring and investigating to find a root cause. Everything else seems unaffected at the present time.

Report: "Slow performance on ReadMe hubs and dashboard"

Last update
resolved

After monitoring, performance has improved significantly

investigating

We're investigating slow performance on both the dashboard and hub

Report: "Slow performance across hubs"

Last update
resolved

We added server capacity to deal with an increase in load. We’ll continue to monitor but all sites appear to be working as expected. Please reach out to support@readme.io if you are still having problems.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Customers on the ReadMe Refactored experience are currently experiencing performance issues.

Report: "ReadMe Hubs and Dashboard are down"

Last update
resolved

This incident has been resolved.

monitoring

Access to the ReadMe application has been restored and we are monitoring the results. Formal investigation and a post-mortem to follow. We apologize for the downtime.

identified

We're currently working on restoring our dash and hubs.

investigating

We are currently investigating the issue

Report: "We are seeing an increase in 503s and timeouts across our hubs"

Last update
resolved

We have added server capacity to deal with an increase in traffic and load. We will continue to monitor but all sites appear to be working as expected again. Please reach out to support@readme.io if you are still having problems.

investigating

We are currently investigating timeouts and 404s across hubs.

Report: "dash.readme.com is currently down"

Last update
resolved

Due to a deploy that went out, we had about 18 minutes of degraded performance.

monitoring

Due to a deploy that went out, we had about 18 minutes of degraded performance.

monitoring

We are continuing to monitor for any further issues.

monitoring

We have rolled back a recent deploy which has resolved the issue. Currently monitoring.

investigating

We are currently investigating an admin dashboard outage.

Report: "dash.readme.com is unresponsive"

Last update
resolved

This incident has been resolved.

monitoring

We've deployed changes successfully and our admin dash is stable. We're currently monitoring this to see if there are any issues.

investigating

We're continuing to investigate the admin dash, as there are periods of timeouts for some people on the platform.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

monitoring

We've rolled back our previous deploy and our admin dashboard is back up, but we're still investigating the cause of the issue.

investigating

After a deploy, our dashboard is currently down. We are working on bringing this back up shortly.

Report: "Slow API Explorer requests when using 'Try It'"

Last update
resolved

Requests should now be going through without issue. We'll continue to monitor requests to make sure no one is rate limited by accident.

monitoring

We've found the issue and are monitoring the changes. Requests should be able to go through once again.

investigating

We are currently investigating CORS errors when making API requests from ReadMe

Report: "Versions not appearing in hub"

Last update
resolved

We have resolved the issue of versions not appearing correctly in the public facing hubs. If you are still not seeing your list of versions, please contact support@readme.com and we will be happy to resolve this!

identified

We have identified an issue where other project versions are not appearing in the hub and are actively working toward a resolution

Report: "Owlbot is unavailable for some users"

Last update
resolved

This incident was created out of an abundance of caution. Since then our service provider has confirmed that their API was unaffected, so we can also now confirm that Owlbot is doing owl-right.

monitoring

Owlbot appears to be functioning correctly despite reported issues with our service provider.

investigating

Our AI-powered search tool, Owlbot, is down for some users.

Report: "Very high request volume as well as high error rate"

Last update
postmortem

On Monday, May 20, ReadMe users experienced issues accessing the ReadMe Dashboard ([dash.readme.com](http://dash.readme.com)) from 19:33 to 19:45 UTC. Users may have experienced errors or slow response times when accessing the Dash. Mitigations were applied by 19:45 UTC and the dashboard quickly resumed normal operation.

resolved

This incident has been resolved.

monitoring

We've discovered the culprit and blocked traffic to the offending IP. Our dashboard is now back up, and we are currently monitoring the situation.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Developer Dashboard data insertion issues"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Search results are slow to reindex"

Last update
resolved

Search indexing performance for enterprise customers using the staging feature has returned to normal.

monitoring

Search results for some enterprise production projects are slow to reindex. We are monitoring the situation as it's continuing to normalize.

Report: "Widespread outage"

Last update
postmortem

## **What Happened**

ReadMe experienced a significant outage on Tuesday, March 26 beginning at 16:06 UTC (9:06am Pacific). This outage affected all ReadMe services including our management dashboard and the developer hubs we host for our customers. We recovered the majority of our service by 16:42 UTC (9:42am Pacific), including most access to the Dash and the Hubs. The rest of the service fully recovered at 17:34 UTC (10:34am Pacific). Although the outage began with one of ReadMe's service providers, we take full responsibility and we're truly sorry for the inconvenience to our customers. We're working through ways to prevent the same issue from happening again and to reduce the impact from similar events in the future.

## **Root Cause**

ReadMe uses a number of third-party service providers to host our Internet-facing services, including our customer-facing dashboard ([dash.readme.com](https://dash.readme.com/)) and developer documentation hubs. One of our primary service providers is Render, a web application hosting platform. This outage began when Render experienced a broad range of outages. We're still learning more about what happened and we will update this document when those details are available. We have redundant systems running at Render and can handle a partial Render service outage. Further, it's usually very fast to replace cloud services on Render in a partial outage. But our infrastructure is not resilient to a full outage of the entire Render service, which is what happened on the 26th.

_**Update (April 1, 2024):**_ Render has confirmed that the issue began with an unintended restart of all customer and system workloads on their platform, which was caused by a faulty code change. Render has provided a [Root Cause Analysis](https://render.com/blog/root-cause-analysis-extended-service-disruption-3-26-24) for their underlying incident. Although the incident was triggered by our service provider, we're ultimately responsible for our own uptime and we are working on remediations to reduce the scope and severity of this class of incidents.

## **Resolution**

We host many services on Render, including our Node.js web application and our Redis data stores. Redis is an in-memory data store that we use for caches and queues. We don't use Redis for long-term (persistent) data storage, but many other companies do. Because of the unique challenges of restoring persistent data stores, Render's managed Redis services took significantly longer to recover. We implemented two temporary workarounds to restore ReadMe service: we removed Redis from the critical path in areas of our service where this was possible, and we launched temporary replacement Redis services until our managed Redis instances were recovered. After the managed Redis service was available and stable, we resumed normal operations on our managed Redis instances.

## **Timeline**

* 2024-03-26 16:06 UTC: All traffic to ReadMe's web services begins to fail with HTTP 503 server errors. The ReadMe team begins mobilizing at 16:08 and automated alerts fire at 16:10.
* 2024-03-26 16:12 UTC: Render confirms that they are experiencing a major outage. We begin troubleshooting and looking for paths forward. The [ReadMe Status](https://www.readmestatus.com/) site is updated at 16:13.
* 2024-03-26 16:35 UTC: Although Render reports that many services have already recovered, ReadMe's applications are still unavailable. We consult with our service provider and determine that Redis caches and queues will take longer to recover. We immediately begin efforts to work around the Redis services that have not yet recovered.
* 2024-03-26 16:42 UTC: We deploy a change to remove Redis from the critical path of many application flows. This restores most ReadMe functionality; from this point forward, 88% of requests to the Dash and the Hubs are successful. Some functionality that requires Redis is unavailable, like page view tracking and page quality voting. Further, our Developer Dashboard and its API are still offline. We continue attempting to restore remaining service by deploying alternate Redis servers outside the managed Redis infrastructure.
* 2024-03-26 17:34 UTC: With the temporary Redis servers in place, all remaining issues with our application are resolved, including the Developer Dashboard and its API. Error rates and response times immediately return to nominal levels. We note the full recovery on our status site at 17:53.

## **Path Forward**

ReadMe is committed to maintaining a high level of service availability; we sincerely apologize for letting our customers down. We will be holding an internal retrospective later this week to learn from this incident and improve our response to future incidents. This incident identified a number of tightly coupled services in our infrastructure: failures in some internal services caused unforeseen problems in other related services. Among other improvements, we'll look into ways to decouple those services. This incident alone isn't enough to reevaluate our relationship with Render, but we continually monitor our partners' performance relative to our service targets. If we are unable to meet our service targets with our current providers, we will engage additional providers for redundancy or look for replacements, depending on the situation. Finally, our close relationship with Render allowed us to get accurate technical details during the incident. This information allowed us to move quickly and take corrective action.

## **Final Note**

ReadMe takes our mission to make the API developer experience magical very seriously. We deeply regret this service outage and are using it as an opportunity to strengthen our processes, provide transparency, and improve our level of service going forward. We apologize for this disruption and thank you for being a valued customer.
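The "remove Redis from the critical path" workaround described in the resolution and timeline above essentially amounts to treating the cache as optional. The sketch below illustrates that pattern for a Node.js/TypeScript service; it is a hypothetical illustration only (the `Cache` interface, `getPage`, and `loadPageFromDb` names are made up), not ReadMe's actual code.

```typescript
// Hypothetical sketch: treat the Redis-backed cache as optional so that a
// full cache outage degrades to slower reads instead of failing requests.

interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getPage(
  cache: Cache | null,
  loadPageFromDb: (slug: string) => Promise<string>,
  slug: string
): Promise<string> {
  // Try the cache first, but never let a cache failure fail the request.
  if (cache) {
    try {
      const cached = await cache.get(`page:${slug}`);
      if (cached !== null) return cached;
    } catch {
      // Cache unreachable (e.g. provider outage): fall through to the database.
    }
  }

  // Authoritative read from the primary data store.
  const page = await loadPageFromDb(slug);

  // Best-effort write-back; again, ignore cache errors.
  if (cache) {
    cache.set(`page:${slug}`, page, 300).catch(() => {});
  }

  return page;
}
```

With this shape, a cache outage only costs the latency of a database read rather than failing the request outright, which is consistent with the partial recovery described at 16:42 UTC.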

resolved

This incident has been resolved.

monitoring

All metrics related products have returned to operational. We have also fixed the underlying issue blocking new edits from being saved. We will continue to monitor to ensure recovery is maintained.

identified

While the administrative dashboard is reachable, users are presently unable to save edits to existing documents. We are working to recover this and the metrics-related products.

identified

Documentation hubs have recovered and we will monitor recovery. The Developer Metrics API, dashboard metrics, and "Recent Requests" in documentation API reference pages are still unavailable. We will continue to update as we have more information

identified

We are beginning to see recovery on administrative dashboards, while our documentation hubs continue to experience very high latency. We are working with our partners to remedy this situation. We will provide an update as soon as we have more information.

investigating

We are aware of a widespread issue that has impacted all of ReadMe's products. We are working with our partners to identify and resolve this issue.

Report: "Hub navigation links direct to 404 pages"

Last update
resolved

We had a deploy earlier this morning that broke the navigation bar links resulting in 404s when going to 'Guides' and 'API Reference' sections. The deploy has been rolled back and things are now functioning as normal.

Report: "Some enterprise documentation hubs are displaying errors"

Last update
resolved

We have resolved the issue with errors displaying on some documentation hubs.

investigating

We are currently investigating alerts that some enterprise documentation hubs are displaying errors within the documentation.

Report: "Delay in API Metrics Processing"

Last update
resolved

This incident has been resolved.

monitoring

The delay in API metrics processing has been resolved. We will continue to monitor and investigate for any longer-term effects.

investigating

We are experiencing a delay in processing API Metrics. Metrics displayed in the "My Developers" tab of the administrative dashboard will be delayed.

Report: ""Too Many Requests" error message on ReadMe Hubs"

Last update
resolved

Health checks started firing, alerting us to a DDoS across Hubs and the display of a "Too Many Requests" error message. We took measures to block malicious traffic and sites appear to be functioning as expected now.

Report: "Owlbot AI Offline"

Last update
resolved

This incident has been resolved.

identified

Services that support Owlbot AI are currently unavailable.

Report: "Currently investigating downtime on our hub sites"

Last update
resolved

Response times have returned to normal and all health checks are returning as expected. We are considering this incident resolved but will continue to monitor.

monitoring

We have taken actions to block malicious traffic and sites are starting to respond again. We will continue to monitor.

investigating

We are currently investigating downtime and failed health checks for all documentation hub sites.

Report: "We are currently experiencing an unusually high server load on our platform"

Last update
resolved

This incident has been resolved.

monitoring

Performance has stabilized across ReadMe, but we are still monitoring

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Investigating outage across ReadMe Hubs"

Last update
resolved

Response times have remained stable since so we're considering this issue closed. Please reach out to support@readme.io if you have any more problems!

monitoring

We've taken measures to return the service to normal functionality. We will continue to monitor for the next 30 mins.

investigating

We are currently investigating an issue that is causing either a slowdown or a full outage on ReadMe hub sites. We will update this as we know more.

Report: "ReadMe Hard Down"

Last update
resolved

This incident has been resolved.

monitoring

Access to ReadMe's administrative dashboard and documentation hubs has returned to normal levels. We will continue to monitor the situation for the immediate future.

investigating

We are currently investigating this issue.

Report: "500 errors and slow response times"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue. Error rates and latencies have returned to normal but we continue to monitor the situation.

investigating

We are currently investigating intermittent degradation to our ReadMe hubs service.

Report: "500s on reference docs"

Last update
resolved

This incident has been resolved.

monitoring

We've taken steps to stabilize our application and are monitoring for further degradation.

monitoring

We're aware of reports of intermittent 500s on some reference documentation.

Report: "readme.io subdomains not redirecting properly"

Last update
resolved

This incident has been resolved.

investigating

Subdomains on the readme.io domain are currently not redirecting to the ReadMe service. All endpoints on the readme.com domain are unaffected. We are currently investigating and will post an update as soon as we have more information.

Report: "ReadMe documentation hubs and dashboards experiencing failures and large delays in load time"

Last update
resolved

This incident has been resolved.

monitoring

The ReadMe dash and documentation hubs are currently recovering. We will continue to monitor the situation.

investigating

We are currently investigating this issue and will report when we have more information.

Report: "Investigating Downtime"

Last update
resolved

This incident has been resolved.

monitoring

We have implemented a fix and are currently monitoring the situation carefully. We will provide an update when we have more information.

investigating

We are continuing to investigate this issue.

investigating

We're currently investigating downtime across our dashboard and hubs.

Report: "ReadMe API Metrics product experiencing major outage"

Last update
resolved

The backlog of API logs has been cleared and this incident has been resolved.

monitoring

We have identified the cause for the API Metrics outage previously reported. We have issued a fix and the API Metrics product is available once again. We are currently processing queued logs and will be monitoring the situation. An update will be provided when the queues have finished processing.

investigating

The ReadMe API Metrics product is currently experiencing a major outage. Requests to send API logs to ReadMe are currently failing or have very high latency, and the viewing of API metrics in the hub and administrative dashboard is currently disabled. We are actively investigating this issue and will provide an update when we know more information.

Report: "Degraded Performance for Developer Dashboard"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Codeblocks not rendering correctly on hubs"

Last update
resolved

We deployed a change that caused various code blocks to render as raw Markdown. We quickly reverted this change and things should be back to normal.

Report: "Issue with static assets not loading on initial page load (Google Chrome only)"

Last update
resolved

This incident has been resolved.

monitoring

We've identified a mitigation and operation has returned to normal. We will continue monitoring to ensure proper remediation.

investigating

We are continuing to investigate this issue with our CDN provider. We are simultaneously working on a potential alternative to hopefully mitigate this issue in the meantime. We will post more information as it becomes available.

investigating

When loading a Hub site for the first time in Google Chrome, the console is showing a lot of HTTP/2 errors which are resulting in an unstyled and unresponsive page. We're actively investigating this issue and suspect it may be something to do with our upstream CDN provider. We will update as we know more.

Report: "ReadMe API Metrics experiencing increased latencies and API failures"

Last update
resolved

This incident has been resolved.

monitoring

We have identified the root cause of the issue and have issued a fix. Impacts are returning to normal levels; we will continue to monitor to confirm the impact has subsided.

investigating

We are continuing to investigate this issue and will provide updates as they become available.

investigating

We are currently experiencing a degradation in ReadMe's API metrics product, resulting in an increase in 500-level errors when submitting API logs through ReadMe's metrics API. Additionally, metrics dashboards are intermittently experiencing a delay in loading for some customers. We are actively investigating and will post an update when available.

Report: "Regressions executing some custom JavaScript"

Last update
resolved

This incident has been resolved.

identified

We have identified a change that went out today at 11:35am PT that regressed the ability for custom JavaScript to perform DOM queries.
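For context, the kind of customer-supplied custom JavaScript affected here typically runs a DOM query against the rendered hub page. The snippet below is a hypothetical illustration only (the `.hero-banner` selector is made up); it is not code from the affected deploy.

```typescript
// Hypothetical example of custom JavaScript that performs a DOM query on a
// hub page; the ".hero-banner" selector is invented for illustration.
document.addEventListener('DOMContentLoaded', () => {
  const banner = document.querySelector<HTMLElement>('.hero-banner');
  if (banner) {
    banner.textContent = 'Welcome to our API docs!';
  }
});
```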

Report: "Search re-indexes are not completing successfully for non-page types"

Last update
resolved

Our search re-index queues are processing jobs once again. Please e-mail support if you're still seeing any issues!

monitoring

A fix has been implemented and we are now testing this on specific Enterprise projects with staging

identified

We have identified a solution to reindex non-page search results.

investigating

We are continuing to investigate this issue.

investigating

We are having issues re-indexing non-page search results, such as custom pages and discussion posts.

Report: "Search re-indexing is currently down for certain projects"

Last update
resolved

Projects are currently able to re-index their search results successfully.

monitoring

Our upstream provider has fixed their issue and our search queues are starting to re-index jobs again. We're monitoring this issue and will confirm shortly once jobs are successful.

identified

We are still seeing issues re-indexing projects with staging enabled due to our upstream provider. We will provide updates once we have more available.

investigating

We are continuing to investigate this issue.

investigating

We are currently having issues with our search re-indexing on specific projects. Our team is investigating the issue and we will provide an update when available.

Report: "Multiple issues impacting customer hubs"

Last update
resolved

The issue preventing emails from being sent and causing staged changes to be visible immediately in production has been resolved. Emails that we attempted to send during the impacted time period have been discarded. Changes made to projects with staging enabled are now correctly not visible on production until they have been promoted.

investigating

In addition to staged changes being immediately published to production, all emails from ReadMe's systems were failing to send. This impacted password reset emails, password-less logins, and other emails sent to end users.

investigating

We are continuing to investigate this issue.

investigating

Changes to enterprise projects that have staging enabled are currently being promoted to production immediately, skipping the normal staging step. This impacts any enterprise-plan customer with staging enabled. Our team is currently investigating and will provide updates as they become available.

Report: "ReadMe Micro Unavailable"

Last update
resolved

This incident has been resolved.

identified

The ReadMe Micro service is currently unavailable for all ReadMe Micro subscribers. Any visits to ReadMe Micro documentation sites will fail and display an Internal Server Error. There is an issue with the upstream provider that hosts ReadMe Micro; we are working to have this resolved as soon as possible. Other ReadMe services are not affected — ReadMe documentation hubs and the ReadMe admin dashboard are both fully functional.

Report: "Degraded performance for ReadMe Hubs"

Last update
resolved

The previous issue with intermittent failures on ReadMe Hubs has been resolved.

investigating

ReadMe Hubs for all public projects have been experiencing small intermittent failures for guides, API references, recipes, and discussions. End users are either being presented with a 500 error or a blank white screen for about 5% of requests. Our engineering team is working to identify a fix.

Report: "Owlbot AI Unavailable"

Last update
resolved

This incident has been resolved.

identified

We have resolved the issue with the Owlbot AI chatbot and all functionality has been restored.

identified

We are continuing to work on a fix for this issue.

identified

The Owlbot AI chatbot component is currently unavailable for all Owlbot AI subscribers. End users that attempt to issue a query receive no response or visual aid to identify that an error has occurred. We are aware of the cause and are working to have this resolved as soon as possible.