KnowledgeOwl

Is KnowledgeOwl Down Right Now? Check whether there is an ongoing outage.

KnowledgeOwl is currently Operational

Last checked from KnowledgeOwl's official status page

Historical record of incidents for KnowledgeOwl

Report: "Knowledge base outage"

Last update
resolved

This incident has been resolved. Thank you again for your patience!

monitoring

A fix has been implemented and knowledge bases are back up. We are monitoring.

investigating

We are continuing to investigate this issue.

investigating

We are currently experiencing an outage affecting knowledge bases. Our team is actively investigating the issue and working on a resolution. We will provide updates as soon as we have more information. Thank you for your patience!

Report: "Intermittent slowdowns and connection errors"

Last update
resolved

We are still investigating the underlying cause of the issue with Friday's release. However, no further issues have occurred since the rollback so we are marking this incident as resolved.

monitoring

We've narrowed down the issue to a release that went out on Friday, September 6th. The release was rolled back and we are monitoring to ensure the slowdowns and connection errors have been resolved. We will continue to investigate what went wrong with this release.

investigating

We are currently investigating the situation to find the cause of the intermittent issues we're seeing.

Report: "Intermittent Errors"

Last update
resolved

We have not seen any new errors in over 20 minutes and all customers who reported the issue have confirmed it appears to be resolved. Hoot!

monitoring

We have blocked traffic that appears to have been malicious and are monitoring to make sure things are okay now.

investigating

We are currently investigating errors accessing both the knowledge base and the web application.

Report: "Issue with images and files"

Last update
resolved

This incident appears to have been resolved. We've been monitoring for the past hour and are no longer seeing errors with uploading files and images. We've confirmed with multiple customers that the issues appear to be resolved on their end as well. We are still determining what caused the issue and we will send a postmortem with our findings and next steps. Thanks again to everyone who reported the issue and helped us get it resolved so quickly!

monitoring

We think we've fixed the issue. We are continuing to monitor and are confirming the fix with impacted customers. If you received an error while uploading or updating a file or image, you will need to upload it again for it to work properly.

investigating

We are investigating reports of images not loading and errors with newly uploaded files. Many thanks to everyone who reported the issue and so sorry for the trouble. We're working on getting this fixed as soon as possible!

Report: "Errors and slow loading in kbs and KO"

Last update
resolved

Everything seems back to normal so we are resolving. Let us know if you are still having any issues and thank you for your patience!

investigating

We are investigating reports of slowness and errors when logging into the application and knowledge bases.

Report: "Outage detected"

Last update
postmortem

One of our servers went into a failing state and began issuing a vast number of internal connection requests. Those requests overloaded our network. Rebooting the server solved the issue, and internal traffic patterns have returned to normal.

## Next steps

We are reviewing our alarms on the affected server to see whether we could have detected the failure sooner and prevented the downtime.

resolved

Our fix seems to have fully resolved the issue; we'll be issuing a postmortem shortly to explain the root cause in more detail. Thank you all for your patience today!

monitoring

We seem to be back to normal operations, but we're continuing to monitor performance and finish our investigation of the initial root cause.

investigating

We are continuing to investigate this issue.

investigating

We just started seeing reports and warnings that KnowledgeOwl is down. Our team is focusing on getting it sorted and back online ASAP.

Report: "Intermittent Outages and Slowness"

Last update
postmortem

# Summary

In the early morning of Friday, March 24th, our systems began to see large spikes of web traffic from various IP addresses. These traffic spikes were targeting our public website (www.knowledgeowl.com). This high traffic volume overwhelmed the reverse proxy system that we use to serve the public website, the KnowledgeOwl application, and customer knowledge bases.

As an immediate mitigation step, we changed the configuration of our public website so that it no longer used our primary proxy. This seemed to stabilize the application and customer knowledge bases. The traffic spikes continued until late morning.

Our investigations indicated that these connections were not initiated by customers, and the associated web requests did not appear to be legitimate. While we cannot say with certainty that these connections were malicious, we are treating this incident as a distributed denial of service (DDoS) attack.

# Next Steps

The risk from high-volume traffic spikes like these is almost impossible to remove completely. However, we are reviewing our systems and processes to better handle these kinds of traffic spikes. We have already identified some concrete next steps to reduce the overall risk:

## Short-term

In the short term, we are taking three steps:

1. We are exploring ways to further separate our public website from the KnowledgeOwl application and customer knowledge bases.
2. Friday's incident provided us with data on how to better identify these types of events. We are building that knowledge into our processes moving forward.
3. We have already begun modifying our proxy and web application firewall configuration to make our traffic management infrastructure more robust. We'll continue to monitor these changes and iterate on them as needed.

## Long-term

This event has highlighted some potential architectural improvements to our infrastructure, mainly around our traffic-management systems. We'll review the feasibility and effectiveness of these changes in the coming months.

# Thank you

Above all, we want to thank you for your patience during this incident. We know that we've had a higher number of downtime incidents in the last two months, and we know how integral your knowledge base can be to daily operations. Our team is working hard to learn from this experience and to make KnowledgeOwl stronger in the future.

resolved

We are officially marking this incident as resolved. All systems have been operating normally since approximately 12 pm EDT, and we have not seen any more malicious-looking traffic hitting our servers. We are continuing to investigate the source of the potential attack, and we are planning to share a postmortem on Monday. We are sorry for the trouble this caused and we appreciate the grace everyone has shown as we dealt with this problem. Let us know if we can help with anything in the meantime, and we will be in touch early next week with our postmortem.

monitoring

Thank you everyone for your patience today. Based on our investigation, it appears that KnowledgeOwl was attacked by malicious traffic which caused the intermittent outages and slowness. We have made some changes to mitigate the impact of the spikes, but we are not quite ready to mark this as resolved as we want to ensure that there are no more spikes of illegitimate traffic. We will be providing more information and a postmortem once we feel the issue is fully resolved. Please let us know if you are still having issues.

monitoring

Quick update from our end. The app and knowledge bases have come back online and seem to be stable now. We have a lead on potential root cause but we are still working through it. We are so sorry for the disruptions to your day and we will be sharing more information as we work to fully resolve the problem.

monitoring

The system is down again. We are dealing with an abnormal amount of traffic and are working to resolve it ASAP. We are so sorry for the trouble today.

monitoring

Our CTO confirmed that all systems are back to operational and we will continue to monitor. We have a possible lead as to the cause of today's issues and will post more information as it becomes available. Sorry again for the trouble and let us know if you are still having any issues!

investigating

We are continuing to investigate the intermittent outages and slowness. Since just before 7 am EDT, the system seems to have gone down and recovered on its own a few times. While it appears to be back up and working, we will leave this incident active as we continue to investigate and monitor the situation. We will continue to post updates here along with a postmortem once it is resolved. Thanks for your patience this morning and our apologies for the disruption to your day!

investigating

We are currently investigating and are working to get everything back up and running asap. We will be posting updates here!

Report: "Reports of slowness and intermittent 504 errors"

Last update
postmortem

During our investigation, we identified two processes that contributed to this week's slowness. Both have been on our list to update, but we're prioritizing them as a result of this week's incident. We're already testing a fix for the first process. The second is a larger change that will take longer to resolve. We'll include these fixes in our normal release notes, identified as "performance improvements". Thank you all for your patience this week and every week.

resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

This appears to be resolved. Thanks to everyone who reported the problem and sorry for the trouble today!

monitoring

The issues seem to have subsided around 11:45 ET. We are currently keeping an eye on it and will wait a bit before marking the issue as resolved.

investigating

We have received multiple reports of slowness and intermittent 504 errors. We are currently investigating and will report back asap. Sorry for the trouble and thanks to everyone for letting us know!

Report: "Site outage & reports of slowness"

Last update
postmortem

# Summary

Last night we released a set of changes to the table of contents that contained a bug. This bug caused display issues in tables of contents. We first tried to hotfix the issue, but the hotfix caused some processes to consume more memory than normal. The memory shortage built until it caused a slowdown and then an outage. At that point, we rolled the release back completely.

# Next Steps

We already have a new fix ready to work into the initial release. Thanks to today's issues, we've identified several opportunities for improvement:

## Short-term

We're updating our testing and release processes for changes to the table of contents. We'll use these revamped processes to test the fix and the full release before we take it live.

## Mid-term

We're reviewing our enterprise- and business-level account SLAs. We'll be issuing credits to customers whose uptime SLAs weren't met this month. If you're a customer in one of these tiers, you can expect to hear from a member of our team to discuss this in more detail.

## Long-term

We've also identified several possible improvements to our load-testing processes, and we'll be making changes to those processes too.

# What you can do

While the bug was live, it may have caused changes to your knowledge base's table of contents. Please review your table of contents. If you see duplicate articles or missing subcategories, please email [support@knowledgeowl.com](mailto:support@knowledgeowl.com) so we can fix it.

# Thank you

Thank you for your patience and grace with us through this month's issues. Outages are every software provider's worst nightmare, and we are very thankful to have such amazing customers.

resolved

Our monitoring has looked good and we're seeing continued normal performance across the board, so we're marking this as Resolved. We have noticed some issues with knowledge base tables of contents either missing some content or having duplicate articles. If your knowledge base is showing either of these issues, please reach out to our support team and we can get you back to a normal table of contents state. Thank you all for your patience and being so gracious with our team today through this whole outage. We'll post a full postmortem after we've fully fleshed out the root cause and next steps.

monitoring

We've rolled out a new fix and are monitoring its performance. So far we've seen some small performance spikes but nothing that should prevent access. We'll continue to monitor to be sure things are resolved.

identified

The fix we implemented isn't performing as well as we'd hoped. We're taking sites down briefly to implement an additional fix.

monitoring

It looks like things have stabilized due to our fix, but we are continuing to monitor performance.

identified

We have identified the issue and are testing a fix.

investigating

We've confirmed a full outage of the app and knowledge bases and are actively working to get it resolved. Sorry for the disruption to your day and we hope to be back online quickly!

investigating

We've had several reports of the KnowledgeOwl app and knowledge bases being slow or inaccessible this morning. We're investigating the root cause and will provide updates as we have them.

Report: "Outage"

Last update
postmortem

## Incident postmortem

A traffic distribution system failed and was not restarted by automation. We are auditing these systems to ensure this failure does not happen again.

Due to gaps in our overnight emergency alert tools, it took us longer than normal to resolve the issue. We are going to review our processes to make sure that more staff receive alerts when there is an outage. We will also look into training more staff to be able to handle incidents like this.

resolved

All systems are operational and we will continue investigating the root cause. We are going to close this incident as it appears to be resolved. Please contact support@knowledgeowl.com if you need any help!

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating and are working to get everything back up and running as quickly as possible. We will be posting updates here!

Report: "Outage"

Last update
resolved

All systems are operational and we will continue investigating the root cause. We are going to close this incident as it appears to be resolved. Please contact support@knowledgeowl.com if you need any help!

monitoring

We are back up and stable right now. We are still monitoring and investigating the root cause of the issue.

investigating

We are currently investigating and are working to get everything back up and running as quickly as possible. We will be posting updates here!

Report: "Slow processing of background jobs"

Last update
resolved

This incident has been resolved.

monitoring

The background job processor is now fully caught up. We're continuing to monitor performance.

identified

Background jobs are still experiencing some delays while the background queue recovers.

identified

Our background job processor is currently backed up. This means delays for processes that hit that worker, including things like bulk edit jobs and PDF generation. You may see these types of tasks taking longer to complete than normal. We're working to get the worker back to full bandwidth and appreciate your patience in the meantime.

Report: "AWS outage causing slow performance"

Last update
resolved

Resolving as no further incidents have occurred since the last update from AWS.

monitoring

Background jobs and file library issues appear mostly resolved. AWS is reporting that their services have mostly recovered.

identified

Like much of the internet, KnowledgeOwl is being impacted by issues with Amazon Web Services (AWS) ec2-us-east-1. This seems to be slowing KnowledgeOwl performance in:

1. Our Server Host
2. Our File Storage

So far, we believe this slower performance could be affecting you in two major areas:

File Library (images and files): You may have issues with adding new files, updating files, and/or deleting files from within articles or within the library itself.

Jobs we run in the background: These jobs are run from actions like generating/updating PDFs, reindexing content, reordering articles or subcategories within a category, and some CSV exports. We expect these jobs to be slow/backed up a little even once full AWS service is restored.

We'll keep you posted as we hear more from AWS, and we especially appreciate your patience today!

Report: "Reports of 504 Gateway Time-out Errors and Elevated Response Times"

Last update
resolved

Performance has been stable for the last hour. Please let us know if you notice any further issues. Thanks for your patience and quick reporting today!

monitoring

We are getting reports that things appear to be back to normal. We will continue to monitor the situation and investigate what happened.

investigating

We are currently investigating reports of 504 gateway time-out errors and elevated response times.

Report: "Site outage"

Last update
postmortem

## Summary

We faced an unsuccessful Denial of Service (DoS) attempt that caused degraded performance for many customers.

## Next steps

As part of our infrastructure upgrades, we’re testing and reviewing anti-DoS measures. Once implemented, these measures will prevent future DoS attempts like this.

resolved

We've resolved the issue. Thanks for bearing with us and we are so sorry for the trouble today. We appreciate everyone who reported the problem and helped to confirm the fix. If you are still having issues, please email us at support@knowledgeowl.com.

monitoring

Performance seems to be back to normal for all customers who've reported and responded to our follow-ups. We are continuing to monitor the solution.

monitoring

We've identified the issue and released a fix, which already seems to be working for most customers. We're continuing to monitor results.

investigating

We've had reports of 500 application errors and/or generalized slowness in the KnowledgeOwl application, knowledge bases, and API. Right now we're treating this as a major outage and are investigating; we'll provide updates as soon as we have more information.

Report: "Outage"

Last update
postmortem

## Incident postmortem

One of our internal server monitoring systems went down and failed to alert us to an issue this morning. Because we did not receive the alert, it took us longer than normal to diagnose and fix the problem. We've since found and fixed the issue to restore service.

## Next steps

To prevent this in the future, we are looking to improve our monitoring and notifications.

resolved

Sorry for the interruption today, and thank you to all the customers who reported issues and were patient while we got things sorted. One of our internal server monitoring systems went down and failed to alert us to an issue this morning. Because we did not receive the alert, it took us longer than normal to diagnose and fix the problem. We've since found and fixed the issue to restore service. To prevent this in the future, we are looking to improve our monitoring and notifications.

monitoring

We've rolled out an additional set of fixes that seem to have resolved the problems customers were reporting. We'll continue to monitor for further alerts or issues.

monitoring

We've implemented a fix. Most traffic to knowledge bases, the KnowledgeOwl web application, and the API should now be unaffected, but we are continuing to monitor and tweak things.

investigating

We are currently investigating an issue with KnowledgeOwl. This affects both the knowledge base software and published knowledge bases.

Report: "System Outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating an outage affecting all customers.

Report: "Outage"

Last update
resolved

This incident has been resolved. Sorry for the trouble, all!

monitoring

A fix has been implemented and we are monitoring the situation.

investigating

We are currently investigating and will report back as soon as possible.

Report: "504 Gateway Time-out Errors"

Last update
resolved

Resolution was successful. No new incidents reported.

monitoring

The errors appear to have stopped and users should be able to save again. We will continue to monitor as we investigate the root cause.

investigating

We are receiving reports of some users being unable to save articles. We are investigating and will report back asap.

Report: "Slowness in the app and kbs"

Last update
resolved

We have been monitoring the system and the issue appears to be resolved.

monitoring

The system appears to be recovering and speeds are coming back to normal. We are continuing to monitor the situation.

investigating

We are investigating reports of extreme slowness loading pages in the application and knowledge bases.

Report: "Slowness and intermittent connectivity issues"

Last update
resolved

This incident has been resolved.

monitoring

Performance has recovered and has remained stable for the last 20 minutes. We are continuing to monitor for any new anomalies.

investigating

Currently investigating reports of slowness and intermittent connectivity issues.

Report: "System outage"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating this issue.

Report: "Intermittent 500 errors"

Last update
resolved

No incidents in the past 3 hours. Closing issue.

monitoring

The problem has been identified. A fix is in place and we are monitoring the results. Things should be back to normal so please let us know if you continue to experience any problems.

investigating

We are monitoring the issue and will continue to post updates here. You can subscribe for notifications, and we will post a postmortem once the issue is fully resolved.

investigating

We are investigating intermittent 500 errors in the application and knowledge bases.

Report: "Database connectivity issues"

Last update
resolved

Knowledge bases and the application were experiencing intermittent 500 errors between 6:49 am and 7:02 am MT. Our servers were able to recover on their own. We tracked the issue to a database problem that has now been resolved. Sorry for the trouble!

Report: "System outage"

Last update
resolved

This incident has been resolved.

Report: "Issue saving articles"

Last update
resolved

This incident has been resolved.