Historical record of incidents for Hex
Report: "Degraded Kernel Acquisition"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue acquiring kernels for some customers.
Report: "Degraded Kernel Acquisition"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue acquiring kernels for some customers.
Report: "Degraded kernel acquisition"
Last update: After fully analyzing the timeline of events, **we’ve confirmed that the root cause of the incident was an issue in the AWS EBS storage system backing our primary database replica**. The issue began on May 12 11:00 PM PDT according to AWS, and the effect was gradual degradation of performance on the replica, eventually leading to a cascade where even our primary database got backpressured on critical write operations, which triggered the instability causing the incident. The AWS issue was resolved May 13 9:32 AM PDT, after we pushed a database configuration change that caused the EBS system software to reset. We then stabilized and cleaned up systems on our end before resolving the incident.

While the root cause of this incident was on the AWS side, we are not satisfied that our detection and response were fast enough. We are investing in the following mechanisms to improve:

* We have automated testing to monitor provisioning of kernels, but it is combined with a larger testing suite that makes the signal less immediately actionable. We will be extracting this out as its own monitor so we can more quickly identify and react.
* Our database monitoring around latencies will be made more comprehensive, including the replica database, so we do not miss this early signal.
* We are improving our incident runbook to cover some of these checks earlier in the process so we zero in on the root cause more quickly.
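As an illustration of the latency-monitoring idea described above, the following is a minimal sketch and not Hex's actual tooling. The class name `LatencyMonitor`, the role labels `"primary"` and `"replica"`, the 200 ms threshold, and the window size are all hypothetical. It keeps a short rolling window of latency samples per database role and flags any role whose median latency over a full window exceeds the threshold, so that a slowly degrading replica would surface even while the primary still looks healthy.

```python
from collections import deque
from statistics import median


class LatencyMonitor:
    """Hypothetical rolling-window latency check, one window per database role."""

    def __init__(self, threshold_ms, window=5):
        self.threshold_ms = threshold_ms  # assumed alerting threshold
        self.window = window              # number of recent samples kept per role
        self.samples = {}                 # role name -> deque of latencies (ms)

    def record(self, role, latency_ms):
        # Keep only the most recent `window` samples for each role.
        self.samples.setdefault(role, deque(maxlen=self.window)).append(latency_ms)

    def degraded_roles(self):
        # A role is flagged only once its window is full and its median
        # latency is above the threshold, which filters out single spikes.
        flagged = []
        for role, values in self.samples.items():
            if len(values) == self.window and median(values) > self.threshold_ms:
                flagged.append(role)
        return sorted(flagged)


monitor = LatencyMonitor(threshold_ms=200, window=3)
for latency in (20, 25, 22):
    monitor.record("primary", latency)
for latency in (180, 450, 600):
    monitor.record("replica", latency)

print(monitor.degraded_roles())
```

Using the median rather than the latest sample is one possible design choice here: it trades a slightly slower alert for fewer false alarms from isolated slow queries.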
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Some users are still experiencing issues as we implement a fix.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Degraded Kernel Acquisition"
Last update: This incident has been resolved.
The issue has been identified and a fix has been implemented.
We are currently investigating this issue.
Report: "Issues with building custom kernels"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We've identified an issue with custom kernels failing to build and are actively working on a fix.
Report: "Inputs not reacting on some published apps"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We've identified an issue with published apps using the pre-run cells in the background feature and are working on implementing a fix.
Report: "Copy to clipboard and download CSV issues"
Last update: This incident has been resolved.
When users copy to clipboard and/or download CSV they may see HTML rather than their data. The issue has been identified and a fix is being implemented. In the meantime, a refresh of the page should resolve the issue.
Report: "Viewers experiencing issues viewing projects"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Report: "Slack outage impacting ability to send notifications from Hex"
Last update: This incident has been resolved.
Slack is currently experiencing an outage which impacts the ability for Hex to send notifications to this channel. We are monitoring this outage at https://slack-status.com/ and encourage you to do the same if you rely on this functionality.
Report: "Instability in project loads"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Report: "Projects not loading for some users"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Projects not loading"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Web app outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Projects not loading"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently investigating this issue.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "HTML and styled dataframes not rendering"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Projects not loading"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Platform instability"
Last update: On November 12, 2024 from 8:06 AM to 10:21 PM PT, Hex customers experienced platform instability, including issues with kernel execution and site access. This disruption stemmed from an unintended effect of a recent database migration, which ultimately caused a backlog and impacted overall platform performance. Our engineering team promptly identified the root cause and took corrective actions by resetting kernels to alleviate database congestion. Once the database stabilized, we rolled back the migration change, fully restoring service by 10:21 PM PT. To prevent similar issues in the future, we are enhancing our automated tests to better validate database migrations before deployment, ensuring they don’t inadvertently impact platform reliability. Thank you for your patience and understanding.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Kernel Unavailability"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Hex is experiencing degraded performance; some users are unable to acquire kernels and run projects. We are actively investigating the root cause and will update as soon as possible on a resolution.
We are currently investigating this issue.
Report: "Workspace tiers downgraded"
Last update: Some organizations have been erroneously downgraded to the community tier.
Report: "SQL Query Instability"
Last update: Non-dataframe SQL queries were intermittently failing for all users.
Report: "Application Instability"
Last update: Users may have seen general application-wide instability and failures caused by partial database unavailability. The unavailability was due to users, likely malicious, performing a large number of operations in quick succession.
Report: "Application Instability"
Last update: Users may have seen general application-wide instability and failures caused by partial database unavailability. The unavailability was due to users, likely malicious, performing a large number of operations in quick succession.
Report: "Hex Toolkit Outage"
Last update: Calls to hextoolkit.get_data_connection() in any code cells were failing.
Report: "R Project Outage"
Last update: All SQL queries in R projects were failing.
Report: "DNS Outage"
Last update: Hex was completely inaccessible to many users.
Report: "Kernel Assignment Failure"
Last update: Users experienced long delays waiting for kernels to start when doing analysis in Logic view and when running apps.
Report: "Main site service interruption 11/7/22"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Some projects failing to acquire new kernels"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Login System Issue"
Last update: This incident has been resolved.
The issue has been identified and we are implementing a fix.
We are currently investigating this issue.
Report: "Slow kernel start up times for some projects"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Projects not loading"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Intermittent platform access issues"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Performance degradation in projects"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Main site unavailable"
Last update: This incident has been resolved.
We are currently investigating this issue.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Platform availability issue"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Platform availability issue"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Table display filter inconsistency"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Some projects failing to acquire new kernels"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
Report: "Platform availability issue"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Platform availability issue"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Platform degradation due to AWS outage"
Last update: This incident has been resolved.
The issue has been identified as being caused by an ongoing AWS outage.
Report: "Kernel instability 08/29/23"
Last update: This incident has been resolved.
The issue has been identified and we are working with our hosting provider to resolve the issue.
We are continuing to investigate the issue and are working with upstream providers to identify and resolve the outage.
We are currently investigating this issue.
Report: "Kernel Instability"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Degraded kernel execution"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Unable to load Hex"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Platform Instability"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Platform instability"
Last update: This incident has been resolved.
We have observed some platform instability due to a scheduled change to our system.
Report: "Kernels unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Platform instability"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Platform instability"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Intermittent 502s when accessing Hex"
Last update: This incident has been resolved.
We are following a Cloudflare incident where intermittent connectivity issues are being reported: https://www.cloudflarestatus.com/