PostHog

Is PostHog Down Right Now? Check whether there is an ongoing outage.

PostHog is currently Operational

Last checked from PostHog's official status page

Historical record of incidents for PostHog

Report: "Errors from an upstream provider outage"

Last update
investigating

We're experiencing elevated errors from an upstream provider. We're monitoring the issue and will post an update soon.

Report: "Cohort recalculations taking longer than expected"

Last update
monitoring

We've spotted that a small number of cohorts are stuck in a recalculating state, and a larger number are taking longer than 24 hours to recalculate automatically as they should. We've identified the issue and have deployed a fix.

Report: "Queries are slow to run"

Last update
investigating

We've been alerted to an increase in query times. We're currently investigating the issue, and will provide an update once we identify the root cause.

Report: "Elevated errors on us.posthog.com"

Last update
investigating

We're seeing elevated errors loading the PostHog interface. We're investigating and we'll update you as we know more.

Report: "Data Processing Delays - Reporting Tools Affected"

Last update
resolved

The ingestion delay incident has been resolved.

identified

Due to delays in a maintenance process, our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost and the system should be caught up shortly.

Report: "EU: elevated errors on web UI"

Last update
resolved

This incident has been resolved.

investigating

The situation is back to normal. We found the root cause to be in our networking stack. We're preparing a long-term fix for it. Thanks for your patience!

investigating

The situation seems to have calmed down; we're investigating the root cause.

investigating

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

Report: "US: Delayed event ingestion"

Last update
resolved

The backlog has been fully processed and event ingestion is back to normal. Thank you for bearing with us and apologies for the disruption.

monitoring

We are consuming the lagged backlog and still monitoring the progress.

monitoring

We have increased the consumer resources to speed up the resolution and keep monitoring the rate.

monitoring

We identified another related issue and rolled the appropriate fix. The lag should be down and we keep monitoring it.

monitoring

We identified the issue and rolled out a fix. The event lag is dropping, and we keep monitoring it.

investigating

We're currently falling behind on event ingestion. No data loss has occurred, and we're actively investigating the issue.

Report: "Data Processing Delays - Reporting Tools Affected"

Last update
resolved

This incident has been resolved.

monitoring

We identified the issue and the ingestion pipeline is catching up.

investigating

Our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost.

Report: "PostHog Cloud EU Database Maintenance"

Last update
Completed

The scheduled maintenance has been completed.

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Scheduled

We are performing scheduled maintenance on our EU Cloud ClickHouse database. We don't expect significant disruption, but there may be some slow queries or ingestion delays.

Report: "Increased parts count impacting performance"

Last update
resolved

Parts are back to normal and the cluster is responding normally again. We'll keep monitoring it and let you know if we find misbehavior again.

investigating

We are currently seeing increased parts counts on our datastore and are investigating why these parts are not being merged as they should. This will cause increased query times.

Report: "US: Delayed event ingestion"

Last update
resolved

We've caught up on our backlog of messages. Ingestion rates look optimal. Parts are being merged as they should. New nodes are fully online. Query latencies are looking great at 100ms avg. Should be smooth sailing from here on out. Enjoy your Friday!

monitoring

There were some recurring errors in our infrastructure that led us to restart ClickHouse nodes. We are falling behind on event ingestion, as we are replacing some nodes in our ClickHouse cluster. This will increase lag in our ingestion pipeline. Performance may be impacted during this time too. We are still working on this and monitoring it.

Report: "Elevated API Errors"

Last update
resolved

We've resolved the incident. This just affected querying and no data was lost. We're still working on finding the root cause of this issue (our ClickHouse nodes were segfaulting without warning) and will continue to monitor.

investigating

We're experiencing failures to load data across the entire app at the moment. We've identified the root cause and are working to resolve this asap.

Report: "US degraded performance"

Last update
resolved

Lag has recovered and the system is completely functional again. Sorry for any inconvenience caused by this incident.

monitoring

The cluster is now responsive and data ingestion has resumed. The app is responding better now. We are still monitoring a couple of fixes we have pushed. We identified a query that was flooding the cluster and may have been the root cause of this.

investigating

We have recovered a good part of the cluster, but we are still working to bring it back completely. Performance may still be degraded. We think some problematic queries may have been the root cause; we are still investigating.

investigating

We are trying to bring the cluster back. The app may be completely unresponsive, and lag is expected during this time; we'll provide an update as soon as possible.

investigating

We have detected a partial outage in our ClickHouse cluster, and it's impacting application responsiveness and the performance of insights. We are investigating the root cause.

Report: "Elevated API Errors"

Last update
resolved

We've not seen a recurrence of this issue, so we're closing this incident now.

monitoring

We're still investigating high load for some offline functionality (i.e. exports), but the vast majority of the app should work fine now.

investigating

Our US app instance is down and pods are unhealthy. We're figuring out why and are working on resolving it. Data ingestion and feature flags are not affected.

Report: "US degraded performance"

Last update
resolved

We rolled out a change that increased load, causing queries to be slower. We rolled back that change, so performance should be back to normal.

investigating

We have spotted that our data infrastructure is under heavy load, which is impacting the time the app takes to load insights or leading to errors when loading them. We are investigating the root cause.

Report: "Elevated API Errors"

Last update
resolved

The problem has been fixed.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "Intermittent API errors - API endpoints & feature flags"

Last update
resolved

Looking good, resolved!

monitoring

We scaled the infrastructure components and are monitoring this. First signs indicate recovery. We will come back with an update once this is verified.

identified

We identified intermittent 500s on several API endpoints, including feature flags. The cause seems to be an under-provisioned infrastructure component. We're working on a fix. Apologies for any inconvenience.

Report: "Elevated API Errors"

Last update
resolved

We identified undetected under-provisioning in one of our network components. We have scaled this up and are working on a fix to mitigate this long term. Thank you for your patience.

investigating

Performance and error rates are back to normal levels. We're still investigating the root cause of this issue.

investigating

We are continuing to investigate this issue. Note about the US: this incident never affected the US environment; the "partial outage" status was incorrect there, and we will correct it later. Apologies for the inconvenience.

investigating

The error rate has gone down, we're still looking for the root cause.

investigating

Elevated error rates are coming up again; we're investigating.

monitoring

We identified a surge in memory usage and workload eviction events. We scaled up feature flags and web app to mitigate. We're monitoring this.

investigating

The situation has calmed down after scaling up resources. We're still investigating the root cause. Note: an earlier message reported that this affected the US region. That was wrong; this only affects the EU region. Apologies for the initial incorrect report.

investigating

We are continuing to investigate this issue.

investigating

We're experiencing an elevated level of API errors, including feature flags, and are currently looking into the issue.

Report: "Data Processing Delays - Reporting Tools Affected"

Last update
resolved

This incident has been resolved.

monitoring

We're monitoring the ingestion pipeline, as it processes the delayed messages. We're estimating that the system will fully recover within an hour.

investigating

We are still investigating intermittent latency spikes in the event ingestion pipeline. Events are still being processed with a delay, which should decrease over time.

investigating

We are still investigating the root cause of the issue. Events are still delayed, but the delay is no longer increasing. We hope to have a resolution shortly.

investigating

Our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost and the system should be caught up shortly.

Report: "API Query endpoint intermittently 500'ing"

Last update
resolved

This incident was resolved over the weekend.

monitoring

We've shed load and haven't seen errors re-occur yet. We'll continue monitoring this over the weekend.

investigating

The API query endpoint is throwing intermittent 500 errors due to capacity limits on our end. We are working to fix this and to make the errors clearer. If known-valid queries are failing with 500s, we recommend retrying them with exponential backoff.
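
For context, a retry with exponential backoff can look roughly like the sketch below. This is a minimal Python illustration assuming a generic HTTP client (requests), a placeholder query URL, and a bearer-token auth header; the helper name and parameters are illustrative, not PostHog's documented API.

    import random
    import time

    import requests


    def query_with_backoff(url, payload, api_key, max_attempts=5):
        """Retry a query on intermittent 500s with exponential backoff plus jitter."""
        for attempt in range(max_attempts):
            resp = requests.post(
                url,  # placeholder query endpoint URL
                json=payload,
                headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
                timeout=60,
            )
            if resp.status_code < 500:
                # 2xx returns data; 4xx raises immediately since retrying won't help.
                resp.raise_for_status()
                return resp.json()
            # Server-side error: wait 1s, 2s, 4s, ... plus jitter, then retry.
            time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"Query still failing after {max_attempts} attempts")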

Report: "US Error Tracking Processing Delays"

Last update
resolved

Bug fixed, ingestion workers scaled back up and lag recovering rapidly. No data loss should be observable.

monitoring

We've identified the root cause of the issue. We are reprocessing exception events and continuing to monitor to make sure the pipeline fully recovers.

identified

We are currently experiencing downtime in our error tracking data pipeline while a bug is being resolved. No data loss has occurred.

Report: "Elevated API Errors Evaluating Feature Flags"

Last update
resolved

After adding more database capacity, feature flag evaluation has recovered to normal levels. We're closing this incident now but will keep monitoring. We're working on a long-term fix. Apologies for the inconvenience.

monitoring

We saw a surge in feature flag evaluations and increased backend and database capacity. Seeing first signs of recovery.

investigating

US: We're experiencing an elevated level of feature flags API errors and are currently investigating.

Report: "Processing Delays"

Last update
resolved

We've resolved the issue and ingestion has caught up to real time.

investigating

We're keeping a close eye on our ingestion delay. Events might take up to 35 minutes to show up inside PostHog in our EU Cloud. No data has been lost.

investigating

Our EU data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost, and the system should catch up shortly. We're monitoring it closely.

Report: "Data pipeline delivery delays in US Cloud"

Last update
resolved

We've identified the bottleneck and fixed it with improved alerting to avoid the issue in the future.

investigating

Pipeline destinations are currently experiencing delays in US Cloud - this means deliveries may be sent significantly later than the events that trigger them. No data has been lost, and the deliveries will happen as we catch up on processing.

Report: "Elevated API Errors for Feature Flag evaluation"

Last update
resolved

The load issue has been resolved.

investigating

We're experiencing an elevated level of API errors when evaluating feature flags, due to unexpected load. We're currently investigating.

Report: "Elevated feature flag and local evaluation API Errors"

Last update
resolved

Load spike identified and resolved; error rates and API latency have returned to normal.

investigating

We're seeing unexpected database load causing query timeouts and elevated latency on these endpoints.

Report: "Elevated API Errors - Feature Flags and Local Evaluation"

Last update
resolved

Load has dropped and our error rate has returned to normal levels.

investigating

We're experiencing an elevated level of API errors when evaluating feature flags, due to unexpected load. We're currently investigating.

Report: "Elevated capture errors in the US region"

Last update
resolved

A patch was applied and we are no longer seeing errors.

identified

We're experiencing elevated capture endpoint error rates due to unanticipated Kafka cluster patching. The vast, vast majority of requests are being retried successfully by our network edge routers, but some very high-volume customers may see a very small number of terminally failed requests.

Report: "Batch exports not making progress in US Cloud"

Last update
resolved

This incident has been resolved.

monitoring

We have narrowed down the problem to a very small set of Snowflake batch exports that we have manually cancelled. If you were affected we will be reaching out. All other batch exports are fully recovered or on the path to recovery. Performance of ongoing batch exports will soon be on pace with real time once again.

investigating

We were unable to make a full recovery, and the issue seems to persist. We are investigating new potential fixes. In the meantime, batch exports will be delayed.

monitoring

We are monitoring the backfill process for Snowflake batch exports and any pending large batch exports for other destinations. All backfills are progressing normally. On-going batch exports are operating normally, but users with pending backfills may see us lag behind real time until all the backfilling is done.

identified

We have deployed our fixes and have managed to resolve the concurrency issues with Snowflake batch exports. Most batch exports besides Snowflake should be fully recovered, with the exception of larger batch exports that will still need some time to work through the backlog. We will shortly begin backfilling any Snowflake batch exports that were cancelled due to this incident.

identified

We are continuing to work on a fix for this issue.

identified

We have reason to believe the cause of the problem is a deadlock happening while connecting to Snowflake. We are attempting to deploy a patch that would deal with the deadlock when it happens, leaving the investigation of what is causing the deadlock for later. Assuming the patch is successful in addressing the problem, we will begin backfilling any Snowflake batch exports that were cancelled.

investigating

We have reason to believe that the problem is related to Snowflake batch exports. As a consequence, most Snowflake batch exports are being cancelled to be retried at a later date. We are investigating how to remediate the problem with Snowflake. Users of other destinations should see batch exports recovering over time; depending on the size of the data exported, this recovery could take more or less time.

investigating

We have been making slow progress on batch exports and a backlog has built up, particularly on larger batch exports. It is taking us some time to work through the backlog, so users may see batch exports be delayed in delivering data. No data loss has happened nor is it expected.

Report: "US ingestion lag"

Last update
resolved

After monitoring, we have seen that all systems are working normally.

monitoring

We have identified the root cause and pushed a fix; event ingestion and latency are now back to normal. We'll keep monitoring the infrastructure.

investigating

Our ingestion infrastructure is processing slowly, causing delays in event ingestion. We are investigating the root cause.

Report: "EU: elevated feature flag evaluation errors"

Last update
resolved

EU: We observed elevated error rates in feature flag evaluation that may have led to some requests timing out between 15:00 UTC and 17:00 UTC. We apologize for the inconvenience and have started improving our alerting to catch this earlier.

Report: "EU: feature flags and surveys with elevated error rates"

Last update
resolved

EU: We were observing increased error rates for feature flags and surveys. While mitigating the initial feature flag issues, we restarted some internal components, which caused other issues. Surveys showed elevated error rates between ~15:22 UTC and 15:36 UTC; feature flags showed elevated error rates between ~15:00 UTC and 15:36 UTC. There was a large number of timeouts in our database in the EU region, causing a high feature flag error rate and service disruptions. Apologies for this disruption.

Report: "US: Increased person processing load is causing locks on the replica DB"

Last update
resolved

This incident has been resolved.

monitoring

We scaled up the processing to ease the spike. We are monitoring the situation.

identified

Performance on the PostHog read replica database is somewhat degraded due to a high load of person ingestion processing. This is occasionally affecting flag evaluation, since the feature flag service depends on the read replica database.

Report: "EU: Data Processing Delays - Reporting Tools Affected"

Last update
resolved

Issue resolved

identified

We have identified an issue with the service that builds the list of events and properties to search for when querying data. We are deploying a fix now and hope to see recovery in the coming hours. Until then, UI tools for querying your data may be missing information you would expect. No data loss has occurred and event ingestion itself is unaffected.

Report: "Taxonomy updates delayed in EU"

Last update
resolved

Issue resolved

monitoring

The issue is resolved, and we are catching up on newly seen events and properties.

identified

Our taxonomy generation system (for event and property definitions you use in filters and elsewhere) is currently delayed while we fix a minor schema bug. This means new event names or properties you just sent to PostHog won't be available for use in places like filters or insights. We have identified the bug and expect to resolve it shortly.

Report: "PostHog Cloud UI in EU is down"

Last update
resolved

Systems have been stable over the last few hours, and metrics are showing normal behaviour over a longer period for both flag evaluation and the web UI. Thank you for your patience!

monitoring

Flag evaluation is back to normal; we're monitoring the overall state.

investigating

The Cloud UI is back up; we're investigating the impact on flag evaluation.

investigating

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

Report: "PostHog app in EU is not loading"

Last update
resolved

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

identified

PostHog Cloud is unavailable; the surveys API and local_evaluation APIs are also affected. A fix is being worked on at the moment.

investigating

We're experiencing issues loading the PostHog app in EU. Data ingress does not appear to be affected.

Report: "US app has intermittent errors"

Last update
resolved

We identified a migration that had an unintended impact on our database. We've cleared the lock and are watching database health stabilize. Everything is looking back to normal at this time.

investigating

We're seeing intermittent errors with loading us.posthog.com, and we're investigating why. This isn't impacting the ingestion of data.

Report: "Elevated API Errors"

Last update
resolved

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

monitoring

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "Batch exports delayed in EU"

Last update
resolved

All pending batch export runs have completed and new batch export runs are progressing normally. The incident is resolved.

identified

We have identified the root cause of the problem and are in the process of deploying a fix.

identified

We have noticed batch exports experiencing a delay of several hours in PostHog Cloud EU. We are investigating the problem. Batch exports in PostHog Cloud US are not affected and operating normally.

Report: "[US] increased errors on feature flags, ingestion and app"

Last update
resolved

We briefly had a spike in errors on our US instance for various endpoints due to a rollout. We rolled back and error rates dropped.

Report: "Data Processing Delays - Reporting Tools Affected"

Last update
resolved

All processing is back to normal.

monitoring

We've identified some bottlenecks slowing down processing. We should be back to real time shortly.

investigating

Our data processing infrastructure is running behind, which is causing inaccuracies in the reporting tools. No data has been lost and the system should be caught up shortly.

Report: "Event taxonomy processing delays"

Last update
resolved

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.

monitoring

We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon. The system has recovered and we are continuing to monitor.

identified

We've spotted a problem in processing event taxonomy updates - property and event definitions. We're working to fix the problem, and while working on that fix, some updates to event taxonomy will be delayed. This impacts e.g. whether new events or properties are available for filtering.

Report: "EU: Elevated error rate on data capture"

Last update
resolved

We resolved the issue and everything is operational again. One of our reverse-proxy instances scaled in ungracefully, which caused routing errors. After manually terminating it, the services recovered. We saw elevated errors from 16:01 UTC to 16:48 UTC. A good part of the traffic was recovered by internal retries, but we can't be certain right now that we haven't lost some events. We will analyze this and provide a long-term fix so that it won't happen again. Our apologies, as we were not able to capture all data during this time.

identified

We found something in our networking layer and it seems to be recovering now. We're monitoring the situation.

investigating

We've spotted that something has gone wrong. We're seeing elevated error rates on capture on the web app. We're currently investigating the issue, and will provide an update soon.

Report: "EU Maintenance - Data Processing Delays"

Last update
resolved

Ingestion has caught up. All is good, as expected.

monitoring

The maintenance operations are done; we are monitoring and waiting for all ingestion and data processing to catch up. Again, no data has been lost during this standard procedure. Thank you for your patience!

monitoring

Due to a planned maintenance activity, we're expecting ingestion and data processing delays in EU. No data will be lost during this operation. Thank you for your patience!

Report: "EU ingestion lag"

Last update
resolved

We have fixed the underlying issue and ingestion latency is back to normal. All data is up to date now.

investigating

We have identified that there is lag in the events ingestion pipeline. We are investigating what could be the root cause. No data has been lost.

Report: "Web app down"

Last update
resolved

We rolled back and fixed the issue. After monitoring, we are now closing this incident.

monitoring

We've rolled back to a previous version and the web app has recovered. We're monitoring until the bug fix is merged and the latest web app version is deployed.

identified

The PostHog web app is down in all regions due to a bug in our HTML rendering. All data pipeline components are still fully functional, and no data will be lost.

Report: "Maintenance - Data Processing Delays"

Last update
resolved

This incident has been resolved.

monitoring

The main work of the maintenance operations is done. We're monitoring ingestion and data processing as they catch up. Thanks again for your patience!

monitoring

Due to a planned maintenance activity, we're expecting ingestion and data processing delays. No data will be lost during this operation. Thank you for your patience!

Report: "Event processing delays on EU Cloud"

Last update
resolved

This incident has been resolved.

investigating

We're investigating ingestion delays on EU Cloud.

Report: "Web app unavailable"

Last update
resolved

We improved our monitoring so we can catch similar issues before they affect production.

monitoring

The app is back now; we're investigating the root cause.

investigating

We've seen that the web app is unavailable and we're investigating. Data ingestion is not affected.

Report: "JS static assets not loading"

Last update
resolved

The incident was resolved an hour ago. We're blocking new deployments until we root-cause the issue.

identified

The issue was triggered again; we're rolling back quickly this time.

Report: "JS static assets not loading"

Last update
resolved

This incident is resolved. You may need to hard-refresh (CMD + Shift + R) in order for the page to load. For some reason, our GitHub workflows skipped the "upload static assets to s3" step but rolled out anyway. We're investigating why this happened.

investigating

We've again spotted that you can't load the PostHog app at the moment. We're investigating the cause. Data ingestion is not affected.

Report: "Issues loading the posthog site"

Last update
resolved

We've spotted and fixed the issue with our static asset pipeline and all environments are back online and available!

investigating

We've spotted that you can't load the PostHog app at the moment. We're investigating the cause. Data ingestion is not affected.

Report: "Live Stream service unavailable"

Last update
resolved

We've spotted and addressed the root cause and the service is back up and running. Sorry for the inconvenience and enjoy those fresh free range live events streaming to your browser!

investigating

Something has gone wrong with our livestream service, which is responsible for reporting live events to the activity page. We are investigating now and will report back once we have found the root cause!