PubNub

Is PubNub Down Right Now? Check whether there is an ongoing outage.

PubNub is currently Operational

Last checked from PubNub's official status page

Historical record of incidents for PubNub

Report: "Elevated latency and errors for History service in US-East"

Last update
resolved

Beginning at 18:33 UTC we observed increased latency and errors for our History service in one of our North America regions. Users in the affected region may have experienced errors or elevated latency when making calls to the History endpoint. The issue has been resolved as of 19:07 UTC. We will publish a root cause analysis in the coming days. We sincerely apologize for any impact caused to your PubNub services. If you believe you have been impacted by the issue and wish to report it, please contact support@pubnub.com.

Report: "Increased errors and latency for presence in FRA and BOM"

Last update
postmortem

### Problem Description, Impact, and Resolution

Starting at 12:20 PM UTC on June 2, 2025, we noticed an increase in errors and latency affecting the Presence service in multiple regions. After investigating, we identified the cause to be a higher-than-normal traffic pattern. In response, we scaled up resources to handle the request load, and the Presence service was fully restored in the affected regions by 1:51 PM UTC on June 2, 2025.

### Mitigation Steps and Recommended Future Preventative Measures

To prevent a similar issue from occurring in the future, we are tuning our autoscaling configuration, as well as analyzing system behavior and bottlenecks observed during the incident window to inform further adjustments.

resolved

Starting at 12:20 UTC, we noticed an increase in errors and latency affecting the Presence service in the BOM and FRA regions. Our engineers investigated the issue and successfully restored the service completely, which has remained stable since 13:51 UTC.

Report: "Increased errors and latency for presence in FRA and BOM"

Last update
Postmortem
Resolved

Starting at 12:20 UTC, we noticed an increase in errors and latency affecting the Presence service in the BOM and FRA regions. Our engineers investigated the issue and successfully restored the service completely, which has remained stable since 13:51 UTC.

Report: "Increased errors and latency in US East"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On May 23, 2025 at 20:51 UTC, PubNub experienced increased errors and latency for a subset of traffic within a single availability zone in the US East region. The root cause was an operational error during an infrastructure change. We unintentionally drained traffic from newly deployed load balancers. As a result, the system routed traffic incorrectly, causing elevated error rates and latency for some users. Once the issue was identified, traffic was rerouted, and service performance returned to normal levels. The incident was resolved by 21:04 UTC the same day.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent the issue from recurring in the future, the change procedure has been updated to reference isolated, non-production resources during preparation stages. If run prematurely, the script will now operate on an empty set, ensuring no production traffic is impacted.

resolved

Beginning at 20:51 UTC, we detected increased errors and latency for traffic within a single availability zone in the US East region. Our engineers investigated the issue and successfully restored service, which has remained stable since 21:04 UTC.

Report: "Increased errors and latency in US East"

Last update
Postmortem
Resolved

Beginning at 20:51 UTC, we detected increased errors and latency for traffic within a single availability zone in the US East region. Our engineers investigated the issue and successfully restored service, which has remained stable since 21:04 UTC.

Report: "Presence API errors & latency in US and AP North"

Last update
resolved

This incident has been resolved.

monitoring

At 09:05 UTC we detected increased errors and latency for the Presence service in the US and AP-North regions. The service has stabilized as of 09:30 UTC. Our engineers are monitoring the system.

Report: "Increase Latency and Errors in Presence in the US (East and West) and AP-North regions"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On May 12, 2025 at 23:22 UTC, we observed elevated errors and increased latency for customers using our Presence service in the Virginia (US-East), California (US-West), and Tokyo (AP-North) regions. After discovering the source of the errors and latency, a solution was prepared and ready to be implemented, but the incident resolved before it was deployed. The fix was held back because we could not be sure it would not negatively impact some functionality of the Presence system.

### **Mitigation Steps**

To prevent a similar issue from occurring in the future, we are ensuring that all Presence features work with channel sharding so the solution can be safely enabled in response to any sudden usage changes.

resolved

This incident has been resolved.

monitoring

The issue has been resolved and we are monitoring the system for stability.

investigating

Beginning at 23:22 UTC we detected increased errors and latency for the Presence service in the US (East and West) and AP-North regions. Our engineers are investigating the issue, and the service has stabilized as of 23:52 UTC.

Report: "Increased errors and latency for presence in FRA and BOM"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On May 2, 2025 at 15:40 UTC, we observed elevated errors and increased latency for customers using our Presence service in the Frankfurt (FRA) and Mumbai (BOM) regions. After observing the errors and latency, we increased the memory allocation and number of replicas for the affected services, and the issue was resolved on May 2, 2025 at 16:00 UTC. This issue occurred because we did not have adequate resource thresholds and alerting configured to proactively scale in response to a sudden spike in subscribe traffic, which led to resource exhaustion in key components of our Presence infrastructure.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we have permanently increased resource limits and replicas for the Presence service in impacted regions. In the next week, we will also improve our alerting and monitoring to detect abnormal traffic patterns earlier and trigger automated scaling where possible.

resolved

Beginning at 15:40 UTC we detected increased errors and latency for the Presence service in the BOM and FRA regions. Our engineers investigated the issue and were able to restore the service, which has remained stable since 15:52 UTC.

Report: "Server Errors and Elevated Latency"

Last update
postmortem

### Problem Description, Impact, and Resolution

At approximately 00:17 UTC on April 25, 2025, we observed elevated server errors and increased latency impacting multiple API endpoints, most notably the Presence service. Our engineering team immediately began investigating the issue. We identified that the root cause involved resource contention within a specific component of our backend infrastructure responsible for managing presence state. We implemented targeted configuration changes to better distribute this traffic and alleviate the resource contention. The issue was fully resolved by 02:32 UTC on April 25, 2025.

### Mitigation Steps and Recommended Future Preventative Measures

To prevent a similar issue from occurring in the future, we have already implemented specific configuration changes to ensure the responsible backend component can more effectively handle the type of traffic pattern encountered. Furthermore, we are actively working to enhance our monitoring and alerting systems.

resolved

With no further issues observed, the incident has been resolved. We will follow up soon with a root cause analysis. If you believe you experienced impact related to this incident, please report it to PubNub Support at support@pubnub.com.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

The PubNub Technical Staff continues to investigate. Errors and elevated latency are affecting some Presence customers. The real-time network services are operational.

investigating

The PubNub Technical Staff is investigating. More updates will follow once available.

investigating

At about 12:17 AM UTC, we started to experience elevated latencies and server errors in all PoPs. PubNub Technical Staff is currently investigating and more updates will follow once available. If you are experiencing issues and believe them to be related to this incident, please report them to PubNub Support at support@pubnub.com.

Report: "Presence Server Errors and Elevated Latency"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On March 10, 2025, we observed elevated latency and errors in the Presence service across our global points of presence from 18:26 to 18:40 UTC and again from 20:05 to 20:27 UTC. After investigating, we found that a short spike in traffic with an unusual pattern caused a downstream service provider to respond with increased latency, which affected our own response times. While this incident resolved quickly, we are working with the third-party provider to better handle this scenario in the future.

resolved

With no further issues observed, the incident has been resolved. We will follow up soon with a root cause analysis. If you believe you experienced impact related to this incident, please report it to PubNub Support at support@pubnub.com.

monitoring

We continue to monitor the situation to guarantee that stability is fully restored.

monitoring

We have identified the issue and have taken effective remediation actions, and our engineers are diligently monitoring the situation to guarantee that stability is fully restored.

investigating

At about 6:26 PM UTC, the Presence service started to experience elevated latencies and server errors in all PoPs. PubNub Technical Staff is currently investigating, and more updates will follow once available. If you are experiencing issues and believe them to be related to this incident, please report them to PubNub Support at support@pubnub.com.

Report: "Elevated History Error/Latency in US West Region"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 07:23 UTC on December 9th, we received alerts indicating high error levels related to storage writer operations in one of our data centers. Shortly after, one of our third-party service providers reported service disruption in their environment. The service provider began the process of replacing the affected nodes. Throughout the restoration process, we closely monitored our systems to assess how the issue impacted our environment. Once all nodes were successfully restored, error levels returned to normal, and all associated alerts were resolved.

### **Mitigation Steps and Recommended Future Preventative Measures**

We have worked with our vendor to ensure that nodes of this type will be on redundant infrastructure going forward, so that there is less exposure to this kind of incident.

resolved

This incident has been resolved with no errors observed for the last 30 minutes. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching out to PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. An RCA will be provided soon.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

Around 07:23 UTC we began to notice increasing errors and latency for History in the SJC region (US West).

Report: "Errors and Latency Across All Services in our US-East and US-West Points of Presence"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 21:50 UTC on 2024-12-11 we observed increased latencies and error rates across all services in our US-East point-of-presence and, a few minutes later, in US-West as well. We observed that the PubNub Access Manager (PAM) was at the center of the degradation, and an investigation noted that nodes in that service were highly memory constrained. We increased capacity and the issue was mitigated in both points-of-presence at 22:10 UTC, and declared resolved at 22:22 UTC. This issue occurred because a previously unseen pattern of customer behavior overwhelmed a cache in the PAM system, causing memory to become constrained and performance to degrade.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we changed the cache capacity and updated our monitoring to alert on this and similar patterns of behavior.

resolved

This incident has been resolved. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching out to PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. An RCA will be provided soon.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue that is causing requests in our US-East point-of-presence to fail or respond slowly.

Report: "Presence Server Errors and Elevated Latency"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On October 24, 2024 at 11:15 AM UTC, we observed elevated latency and errors in the Presence service across our global points of presence. Affected customers may have experienced a slowdown in Presence request responses and/or failures with 5XX server errors returned. After investigating, we identified the cause of the issue, blocked the source of traffic causing it, and the issue was resolved on October 24, 2024 at 2:00 PM UTC. This issue occurred because our services were not auto-scaled appropriately in response to a spike in unexpected traffic from non-standard usage of the Presence service.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we have addressed the source of the unexpected traffic spike directly, ensuring changes were made to align usage with our prescribed methods for Presence. Additionally, we are working in the coming days to deploy sharding in our Presence infrastructure to enhance scalability and better manage traffic surges like this, should they recur.

resolved

Beginning at around 11:00 UTC we observed elevated latency and server errors for our Presence service in all of our server endpoints. The issue has been resolved as of 14:11 UTC. We will continue to monitor the incident to ensure service stability has been fully restored. Your trust is our top priority, and we are committed to ensuring smooth operations.

monitoring

We have taken effective remediation actions, and our engineers are diligently monitoring the situation to guarantee that stability is fully restored.

identified

We have successfully identified the issue, and our dedicated engineers are actively working to resolve it. We are seeing positive trends, with both latency and error rates improving significantly.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

At about 11:00 AM UTC, Presence service started to experience elevated latencies and server errors in all PoPs. PubNub Technical Staff is currently investigating and more updates will follow once available. If you are experiencing issues and believe them to be related to this incident, please report them to PubNub Support at support@pubnub.com.

Report: "Elevated latency and errors for history service in us-east-1 region"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 7:35 UTC on October 5, 2024, we received a report of intermittent failures (5xx errors) for History API requests. The issue was triggered by an unexpectedly high volume of data requests processed through our shared infrastructure, overwhelming the shared history reader containers responsible for fetching this data from our storage nodes. As data was retrieved and processed by the history reader containers, we observed memory exhaustion (OOM-kills), even though the memory capacity had been significantly increased. This impacted the performance of our system, causing History API requests to fail when the memory overload occurred. We took action by isolating the requests responsible for the high data volume and deploying dedicated infrastructure for them. This ensured that the issue was resolved at 00:43 UTC on October 6, and no further impact was observed across the broader customer base.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent this issue from recurring, we deployed dedicated infrastructure for high-volume data requests, and we implemented dynamic data bucket creation to distribute large data volumes more efficiently, reducing strain on our nodes. These steps ensure that our system can handle sudden spikes in resource usage while maintaining stability for all customers.

resolved

Beginning at around 8:00 UTC we observed increased latency and errors for our History service in one of our North America regions. The issue has been resolved as of 14:05 UTC. We will continue to monitor the incident to ensure service stability has been fully restored.

monitoring

Remediation actions have been taken. Our engineers are currently monitoring the incident to ensure that stability has been restored.

identified

We are continuing to work on a fix for this issue.

identified

PubNub Technical Staff is still working on fixing the issue.

identified

The issue has been identified, and our engineers are engaged and continue to work on the issue. Latency and error rates are improving.

investigating

At about 8:00 AM UTC, the History service started to experience elevated latencies and errors in our North America PoP. PubNub Technical Staff is currently investigating, and more updates will follow once available. If you are experiencing issues and believe them to be related to this incident, please report it to PubNub Support at support@pubnub.com.

Report: "Presence Latency"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 22:42 UTC on August 22, 2024, we observed increased latency and errors for our Presence service. We found evidence of network issues from our own testing and monitoring, and we created a ticket with our network service provider for additional investigation. The issue was resolved as of 22:50 UTC. The root cause of this issue was a lack of monitoring and alerting around transient network issues within our network service provider's inter-VPC routing.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we have configured the proper monitoring and alerting to provide us with enough time to address this issue before it can affect the QoS of our services.

resolved

Beginning at 22:42 UTC some customers may have observed increased latency and errors for our Presence service in our Frankfurt and Mumbai regions. The issue has been resolved as of 22:50 UTC. We will continue to monitor the incident to ensure service stability has been fully restored.

Report: "Push Notifications Latency in Multiple PoPs"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 18:16 UTC on June 13, 2024 we observed increased latency for delivery of mobile push messages in our Frankfurt and US-East points of presence. In response, we increased the resources available to the services and redeployed the service. The issue was resolved at 21:21 UTC on June 13, 2024. Upon further investigation, we identified this issue occurred due to malformed message payloads creating a backlog in the message queue.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we increased the memory for the service to handle similar malformed payloads, as well as added additional monitoring.

resolved

This incident has been resolved, and mobile push notifications continue to be delivered normally. We will follow up with a root cause analysis soon. We sincerely apologize for any impact on our customers and their users. If you believe you have experienced production impact due to this issue and would like to discuss it, please reach out to support@pubnub.com.

monitoring

Push notifications are now being delivered normally. We are monitoring the system to ensure no further issues.

investigating

We are continuing to investigate this issue.

investigating

We continue investigating delayed push notifications in our Frankfurt point-of-presence. Push notifications are now being delivered normally in our other regions. We will continue to provide updates here.

investigating

Our investigation continues and we will continue to provide updates here.

investigating

Our Engineering teams continue actively investigating the issue. We will continue to provide updates here.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We have discovered that push notifications in our US-East point-of-presence are also affected, with push notifications being delivered latently. We continue to investigate and will provide updates here.

investigating

We are continuing to investigate this issue.

investigating

We have discovered that push notifications in our Mumbai point-of-presence are also affected, with push notifications being delivered latently. Push notifications in our Frankfurt point-of-presence are also being delivered latently. We continue to investigate and will provide updates here.

investigating

We have discovered an issue where push notifications have been delivered latently in our Frankfurt point-of-presence for the last 30 minutes. Our Engineering teams are actively investigating the issue and we will provide updates here. If you believe you have experienced production impact due to this issue and would like to discuss it, please report impact to support@pubnub.com.

Report: "Delayed Push Notifications in Frankfurt PoP"

Last update
postmortem

At 18:53 UTC on June 7, 2024, we observed excessively latent deliveries of mobile push messages in our Frankfurt point-of-presence. We discovered that a previously undetected bug was being triggered by malformed messages being sent to the service. We increased the resources available to that service, which allowed the system to catch up, and deliveries resumed normally. The issue was declared resolved at 19:59 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we are leaving the system running in the new configuration. We will also increase monitoring for this area, and will be modifying the push notification service to rectify the bug that triggered the scenario originally.

resolved

This incident has been resolved, and mobile push notifications are being delivered normally. We will follow up with a root cause analysis.

monitoring

We have processed all push messages in our backlog and stabilized the system in Frankfurt. We are monitoring the system to ensure no further issues.

investigating

We are still experiencing some delayed delivery of push messages in our Frankfurt point-of-presence. We continue to investigate the issue.

investigating

We have discovered an issue where push notifications were not being delivered in our Frankfurt point-of-presence. Those notifications were then delivered about twenty minutes late. We are investigating the issue.

Report: "Increased latency and errors across all services in US East"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On Sunday, May 12, 2024 at 14:16 UTC, customers using PubNub’s legacy Access Manager Version 2 may have experienced increased errors and latency across all services in our US-East point of presence. After investigation, we discovered that several database nodes were failing. We were prepared to fail out of that region when the nodes recovered and the errors stopped by 15:20 UTC. Customers using Access Manager Version 3 were unaffected because Version 3 does not leverage a database.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are increasing allocated resources in the affected infrastructure, as well as tuning the auto-scale threshold. Additionally, we continue to encourage and assist customers using Access Manager Version 2 to migrate to Version 3. For information on migrating to Access Manager Version 3, please refer to our [Migration Guide](https://www.pubnub.com/docs/general/resources/migration-guides/pam-v3-migration), or contact [support@pubnub.com](mailto:support@pubnub.com) for assistance.
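As a rough, hedged illustration of the Version 3 model referenced above (not part of the incident report): an Access Manager v3 grant issues a time-limited token from a server holding the secret key, rather than relying on database-backed v2 grants. The sketch below uses the PubNub JavaScript/TypeScript SDK; the key values are placeholders, and option names such as `authorized_uuid` and `userId` can vary between SDK versions, so defer to the linked Migration Guide for the authoritative steps.

```typescript
// Minimal server-side sketch of an Access Manager v3 token grant.
// Assumptions: the PubNub JS/TS SDK is installed (`npm install pubnub`);
// the keys below are placeholders and must be replaced with real keyset values.
import PubNub from "pubnub";

const pubnub = new PubNub({
  publishKey: "pub-c-...",   // placeholder
  subscribeKey: "sub-c-...", // placeholder
  secretKey: "sec-c-...",    // required to issue grants; keep server-side only
  userId: "token-server",
});

async function grantReadToken(clientId: string): Promise<string> {
  // Issue a token valid for 15 minutes that lets `clientId` read one channel.
  return pubnub.grantToken({
    ttl: 15,
    authorized_uuid: clientId,
    resources: {
      channels: {
        "status-updates": { read: true },
      },
    },
  });
}

grantReadToken("client-1")
  .then((token) => console.log("v3 token:", token))
  .catch((err) => console.error("grant failed:", err));
```

A client would then supply the returned token to its own PubNub instance (for example via `setToken`) instead of depending on a server-side v2 grant.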

resolved

From 14:16 UTC to 15:20 UTC on May 12, 2024, users may have experienced increased latency and errors across all PubNub services in our US East PoP. Our Engineering teams applied a fix and the issue has been resolved since 15:20 UTC. A root cause analysis will be posted soon.

Report: "Elevated History Error/Latency in Tokyo Region"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 00:06 UTC on March 17, 2024, we observed increased error rates and latency in our Tokyo region for History calls. We identified that the source of the latency and errors was our third-party storage provider. We alerted the third-party provider, which then restarted the impacted storage nodes, and the issue was resolved at 00:47 UTC on March 17, 2024.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we have added monitoring of the swap space level on our servers, so we will have better alerting if such issues with our third-party provider occur in the future.

resolved

This incident has been resolved.

monitoring

We have not seen any errors or increased latency for ~45 minutes. We will continue to monitor history to validate the resolution.

monitoring

History errors and latency have returned to normal levels. We will continue to monitor the incident.

investigating

Around 00:06 UTC we began to notice increasing errors and latency for History in Tokyo region. We are investigating this incident.

Report: "Failures for Presence Webhooks"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

Around 14:45 UTC, March 1, 2024, we observed that the Presence service started experiencing missed Presence webhook calls. The missed webhook calls were due to a migration of the Presence webhooks across multiple customers. We rolled back the migration for all customers in case the issue was broader and created a status page for transparency. Further investigation showed the missed webhook calls were isolated to a small subset of customers.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring, we added and deployed redirect functionality in our Events & Actions service and added monitoring for future webhook migrations.

resolved

This incident has been resolved with no errors observed for the last 30 minutes. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. An RCA will be provided soon.

monitoring

A fix was implemented at 01:45 UTC. We are monitoring the results for the next 30 minutes.

identified

The issue has been identified and a fix is being implemented.

investigating

Around 14:45 UTC (06:45 PT) March 01, the Presence service began to experience missed Presence webhook calls for some users. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "Elevated Subscribe Latency and Errors in US-East"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On October 30, 2023 at 17:52 UTC, we observed a high number of errors and increased latency in our Subscribe service, mostly in our US-East point of presence. This was a result of increased Subscribe traffic in the region growing since around 13:45 UTC. Customers making subscribe API calls at the time may have experienced increased response latency, or failures with errors. We increased service capacity and memory allocation in all regions, and the issue was resolved on October 30, 2023 at 18:23 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we updated our monitoring to provide early warning of increased Subscribe traffic, giving us sufficient time to scale before issues occur.

resolved

This incident has been resolved with no errors observed for the last 30 minutes. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. An RCA will be provided soon.

identified

The issue has been identified and a fix is being implemented. We will continue to provide regular updates.

investigating

Starting around 13:45 UTC, we began observing elevated latency and errors in calls made to our Subscribe endpoint in our US East point of presence. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

Report: "Proactive Reconfiguration of PubNub Services"

Last update
postmortem

We received a notification from an infrastructure provider that they were having issues that could affect our US-East Point of Presence. On that recommendation, we shifted our infrastructure to avoid any potential problems. We posted this incident in advance of shifting, in order to keep our customers informed of any issues that might arise. In the end, we were able to make the changes (and revert them once the provider was stable again) without any disruption to PubNub services. As always, please contact PubNub Support with any questions about this or any other concerns or issues.

resolved

We've had no customer impact from this configuration change, and are closing this incident. As always, contact PubNub support with any questions.

monitoring

We have completed our changes, with no customer impact reported. We will continue to monitor the situation closely.

identified

The issue has been identified and a fix is being implemented.

investigating

We are proactively reconfiguring our services, based on advice from an infrastructure provider that is experiencing issues in our US-East point-of-presence. There is no customer impact at this time, and we will update this page if we see any impact during this process. Please contact PubNub support if you experience any issues.

Report: "Elevated Channel Groups Subscribe latency in the US-West PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On Friday, August 11th at 23:45 UTC, we observed a delay in message delivery for subscribe requests using our Channel Groups service. After identifying the delay, we restarted the affected pods, and the issue was resolved at 01:44 UTC on Saturday, August 12th.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are improving the Channel Groups service communication, as well as exploring enhanced error handling and retries to ensure improved monitoring and alerting.

resolved

This incident has been resolved. The incident was declared with an overabundance of caution, and was determined to be limited to a handful of customers. Please contact PubNub Support (support@pubnub.com) if you wish to discuss the incident.

monitoring

We are continuing to monitor for any further issues for the next 30-60 minutes. We will continue to provide updates here.

monitoring

This issue has been resolved and latency has returned to normal levels. We will continue to monitor services for the next 30-60 minutes. We will continue to provide updates here.

identified

We believe the issue has been identified and a fix is being implemented. We will provide updates as they become available.

investigating

On August 11 at 23:45 UTC, we began to observe increased latency for channel group subscribes in our US-West PoP, which could result in delays in receiving messages. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

Report: "Some Services Across US East Are Suffering From Partial Outage"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 19:20 UTC on Tuesday, June 13th, 2023 we observed increased error rates and latency for our Authorization services at our US East facility. In response, we redirected authorization services from US East to US West, and the issue was mitigated at 20:59 UTC on Tuesday, June 13th, 2023. During this time, we identified that the root cause of the issue was a third-party service incident. After confirming the third-party service incident was resolved, we rerouted the Authorization traffic back to US East at 22:58 UTC on Tuesday, June 13th, 2023.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we are developing a comprehensive failover plan to more quickly move services from one region to others. In the next few weeks we will be implementing new processes to allow mitigation of regional service issues.

resolved

This incident has been resolved, and all services are fully restored.

monitoring

We are continuing to monitor the changes that we have implemented to mitigate this incident.

identified

We have taken steps to mitigate this issue. Error rates and latency have been reduced.

identified

This is particularly affecting Authorization and downstream Objects.

Report: "Elevated latencies in the EU PoP Subscribe Service"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On June 1, 2023, we observed two subscribe latency spikes in our Europe Point of Presence. The spikes occurred from 08:18 AM UTC through 09:06 AM UTC, and from 09:44 AM UTC through 09:54 AM UTC. During these times, users in the Europe region may have experienced slower than normal responses on subscribe calls. The higher than normal latency affected one of multiple access zones in the region. Shortly after detecting the increase in latency, the cause of the issue was identified, and a fix was deployed, restoring the region to normal operational status by 09:54 AM UTC on June 1. This issue occurred when a code deployment to the region overwrote a configuration previously deployed, resulting in a lack of resources in the access zone.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are applying a fix to all clusters. Additionally, we are improving alerting around publish-to-subscribe latency so we are quickly notified if a similar issue were to occur.

resolved

On June 1, 2023 between 08:18 and 09:06 UTC, customers using Subscribe service in our EU Central PoP may have experienced some errors, timeouts, and delays in message delivery. We will publish a root cause analysis soon.

Report: "Subscribers experiencing errors in all regions"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 14:35 UTC on May 18, 2023 we observed some errors being served to subscribers globally. We noted a large, unusual traffic pattern that was putting memory pressure on parts of our infrastructure faster than our normal autoscaling could handle. We resolved the issue by manually adding capacity to cover the newly observed pattern. The issue was resolved at 16:15 UTC the same day. This issue occurred because the system was not prepared to scale quickly enough on the combination of factors that were unique to this traffic.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we are adding new monitoring and alerting that can detect this scenario, as well as tuning scaling factors in our systems to allow our autoscaling to react more appropriately to it.

resolved

We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

We have been operating with all systems normal for more than an hour. We are monitoring the situation at this point and investigating the root cause.

identified

We have identified the issue, and are still investigating further. All systems are operational.

investigating

Subscribers were experiencing sporadic errors on May 18 between 2:35 PM and 4:11 PM UTC. We are investigating the cause.

Report: "Elevated rate of latency in India Point of Presence"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On April 21, 2023 at 12:04 UTC, we observed increased latency for Publish API calls to our Mumbai Point of Presence. The issue specifically affected some customers whose publish calls are routed through the PubNub Functions service. We scaled up capacity, and the issue was resolved on April 21, 2023 at 13:40 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are implementing changes to our scaling process to better address these traffic patterns and adding improved alerting and monitoring for earlier warning of such issues. This should enable faster responses when there are repeated intervals of high load followed by lower load.

resolved

This incident has been resolved with no spikes observed since around 13:30 UTC.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified, and we have not seen a latency spike in the past 30 minutes.

investigating

Starting at 12:05 UTC, customers using our India PoP might be experiencing spikes of increased latency across multiple services.

Report: "Elevated rate of latency in US East Point of Presence"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On Tuesday, May 2, 2023 at 14:26 UTC, we observed errors and latency on all services in our US East PoP. We modified the configuration to scale faster during high-load occurrences, and the issue was resolved on May 2, 2023 at 14:34 UTC. This issue occurred because the monitoring could not ramp up needed services quickly enough.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we modified our monitoring to increase resources more rapidly when needed. These changes were made during the incident and brought resolution to this incident.

resolved

On Tuesday, May 2, 2023 between 14:20 UTC and 14:28 UTC, customers in our US-East PoP may have experienced some errors, timeouts, or delays across multiple services. RCA will be provided soon.

Report: "Events and Actions Outage in US East PoP. Events are not Being Dispatched."

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On Tuesday, May 2, 2023 at 21:23 UTC, we observed that Events and Actions messages were not being processed in our US PoP or EU Central PoP. Shortly after observing the issue, we redeployed a processing schema, and the issue was resolved on May 2, 2023 at 22:00 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we added the needed monitoring to alert us when Events and Actions messages are not being processed properly so we can redeploy the processing schema as needed.

resolved

This service incident has been resolved as of 22:00 UTC.

monitoring

The Events and Actions service in US-East has been restored and any messages that would trigger actions have now been processed.

investigating

Events and Actions messages are not currently being dispatched for customers in our US-East PoP.

Report: "Elevated rate of latency and timeouts in US East Point of Presence (POP)"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On April 2, 2023 at 15:34 UTC we observed an increased rate of timeouts and errors in our Presence, Push, History, Subscribe, and Auth services in our US East Point of Presence. We identified that the issue was related to repair work being performed on this PoP by one of our third-party vendors. After identifying the issue, we rerouted traffic from US East to US West and moved the incident to Monitoring status since improvement was observed. Once we received an update from our third-party vendor that the repair work was complete, we routed traffic back to our US East PoP, and the issue was resolved at 17:13 UTC.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we will implement more granular monitoring and alerting on the affected infrastructure for early warning of such issues.

resolved

This incident has been resolved and we will follow up with a post-mortem soon. We apologize for any impact this may have had on your service. Don't hesitate to contact us by reaching PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

Around 16:30 UTC a fix was implemented, and we are monitoring the results for the next 30 minutes.

identified

Beginning at 15:34 UTC on 04/02, some customers may be experiencing a higher rate of timeouts and errors in Presence, Push, History, Subscribe, and Auth Services.

Report: "Subscribe is experiencing elevated latencies in the EU PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On Tuesday, March 21, 2023 at 09:08 UTC, we observed errors, timeouts, and delays in message delivery in our EU Central PoP. We rolled back the responsible configuration changes, and the issue was resolved at 09:38 UTC. This issue occurred due to a configuration change that allowed our subscribe service to use the existing resources better. Unfortunately, this caused us to hit a limit in open connection counts, leading to delays in creating new connections. This, in turn, led to delayed subscribe call connections and message delivery. This was the same issue that occurred on March 16th. Unfortunately, it is particularly difficult to measure the customer impact of the subscribe API.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, there is a metric we can use to approximate customer impact that we will monitor closely going forward, including during any further configuration changes.

resolved

Between 12:25 and 14:45 UTC, customers using the Subscribe service in our EU Central PoP may have experienced some errors, timeouts, and delays in message delivery.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Customers using the Subscribe service in our EU Central PoP may be experiencing errors, timeouts, and delays in message delivery.

Report: "Increased Access Manager error rates and latency in our US East PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On December 9, 2022 at 19:12 UTC, we observed an increase in errors and latency for Access Manager API calls in our US East PoP. The errors and latency occurred shortly after initiating a deployment. We responded by stopping the deployment and rerouting affected traffic. The issue was resolved at 19:20 UTC. This issue occurred because of a misconfiguration in an infrastructure deployment.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we will fix the misconfiguration and update our system to better handle failures.

resolved

Between 19:12 and 19:20 UTC, customers using PAM grant in our US East PoP may have experienced increased error rates and latency. They may have experienced a small impact to other services as well.

Report: "Admin Portal Slow to Respond"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

On August 30, 2022 at 8:50 PM UTC, we observed slow or failed logins to our Admin Portal, as well as errors using some functionality within the portal once logged in. We investigated, and found that a third-party provider we use for subscription management was having an operational incident. We did what we could to work around the provider’s issue, and escalated with them. The provider resolved their issue, and the incident was resolved at 5:20 AM UTC on August 31.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we have added work to our roadmap to reduce our dependency on the third-party for Admin Portal logins, preferring to allow a login and alert the user to the reduced functionality in case of an error at the provider. We also added work to our operations backlog to better alert us to errors of this kind from the service provider.

resolved

The incident has not resurfaced for the past hour. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

The admin portal has been operating as expected after a fix was deployed by our provider. We will continue to monitor this closely for at least the next 60 minutes before we determine that this incident is resolved. If you continue to experience issues that you believe are related to this incident, please report the details to PubNub Support (support@pubnub.com).

identified

We are continuing to work with the provider to fix the issue, and some improvements are already implemented and apparent.

identified

The engineering team has confirmed the issue is with a downstream provider, and is taking necessary steps to escalate with them and resolve as quickly as possible.

investigating

Due to an issue at a downstream provider, the admin portal at admin.pubnub.com is currently taking an abnormally long time to respond, and some functionality within the portal may time out. We are investigating the issue. Other PubNub services are unaffected, and are operating normally.

Report: "Storage is experiencing elevated latencies and errors"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 04:59 UTC on August 10, 2022, we observed an elevated error rate for History and Channel Groups services globally. We contacted our third-party vendor where the affected servers reside, and the issue was resolved at 05:57 UTC when the backlog of requests was processed. This issue occurred because of a faulty process with our vendor that unexpectedly restarted our servers while performing an update.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we are working with our vendor to ensure this process gap is addressed and fixed.

resolved

This incident has been resolved. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you'd like to discuss the impact on your service.

monitoring

A fix has been implemented, and most of the errors have subsided since 22:24 PST. We are monitoring the results for the next 30 minutes.

identified

The issue has been identified and a fix is being implemented.

investigating

At about 05:00 UTC on Aug 10 (Aug 09, 22:00 PST), the History service began to experience elevated latencies and errors. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "Channel Groups services are reporting errors in the Mumbai PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At about 02:45 UTC on August 10, 2022, we observed the Channel Groups service reporting errors in our Mumbai PoP, preventing users from making channel group related API calls there. We contacted our third-party vendor where the affected node resides, and the vendor applied a fix resolving the issue at 03:20 UTC. This issue occurred because of an error in our vendor's configuration that went undetected.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we have confirmed that our vendor has applied a fix which addressed the root cause.

resolved

This incident has been resolved. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you'd like to discuss the impact on your service.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

At about 02:45 UTC on Aug 10 (Aug 09, 19:45 PST), the Channel Groups service began to report errors in the Mumbai PoP. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "Elevated rate of 5XX errors Subscribe US West Point of Presence (POP)"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 18:36 UTC on August 9, 2022, we observed a small number of errors served across all PubNub services in our US-West PoP. We identified a configuration sync issue with a load balancer, removed it from service, and the issue was resolved at 19:29 UTC. This issue occurred because our alerting did not cover the specific configuration problem affecting the load balancer, which had to be identified through investigation.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future we are working to enhance server-level monitoring and alerting in the coming days. This will ensure we are immediately aware of any recurrence and can act quickly to resolve it.

resolved

This incident has been resolved. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you'd like to discuss the impact on your service.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Beginning at 18:36 UTC on 08/09, some customers may be experiencing a higher rate of timeouts and errors. We have applied a rollback and are investigating this issue.

Report: "Issue with messages missing in Storage & Playback, Push notifications in Europe PoP"

Last update
postmortem

At 01:08 UTC on August 3rd we observed that messages published to persistence from our Europe PoP from 23:32 August 2nd to 00:41 UTC August 3rd were not accessible by History API call, and push notifications sent during the same period were not delivered. We implemented a fix and the issue was resolved at 02:04 UTC on August 3rd, resulting in stored messages being backfilled and accessible as expected. Push notifications sent during the period in question remained undelivered due to expiry. This issue occurred because there was an error in our server code that went undetected, resulting in messages waiting in the queue unprocessed.

To prevent a similar issue from occurring in the future we expanded testing coverage to ensure messages are written to persistence as intended. In the coming days we will implement enhanced internal alerting to provide early warning of such an occurrence in the future.

resolved

This incident has been resolved.

monitoring

A fix has been implemented. Messages published to storage during the time period in question to our Europe PoP are now available for retrieval. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you'd like to discuss the impact on your service.

identified

Implementation of the fix remains underway. We will provide an update within 30 minutes.

identified

The issue has been identified and a fix is being implemented.

investigating

Users who published messages to our European PoP from ~23:32-00:41 GMT may be temporarily unable to access them. No errors are being returned from the service, despite messages not being returned correctly. Push messages sent during that time were not sent to their intended recipients. The other services, including PubSub, are working normally. While the issue has been resolved, we are working now to restore access to the stored messages.

Report: "Degraded performance of Push, Presence and Storage service in the US East PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 15:45 UTC on 2022-06-27, we observed delays in push notifications sent and messages written to history, as well as excess presence join & leave events. In response, we scaled the underlying systems supporting these services, and the issue was resolved at 16:05 UTC. This issue occurred because our third-party service provider experienced an outage in the US-East PoP.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we are updating our processes to ensure that malfunctioning nodes are restarted in a way that will preserve their state for analysis, as well as updating our runbook for scaling the system.

resolved

We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for any impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

Between 15:45 and 16:05 UTC on 2022-06-27, a small percentage of traffic experienced delayed Push notifications, Presence events, and Storage writes that were published from a single PoP in the US East region. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you'd like to discuss the impact on your service.

Report: "Issue with messages missing in Storage & Playback in US West PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**

At 12:00 UTC on May 3, 2022 we observed latency in our Storage & Playback service in the US West PoP, which manifested itself as missing messages to clients who used that service to look up messages in that location. Publish and Subscribes were unaffected. We identified the cause as an issue with a downstream data storage service provider in that region, and took steps to have other regions assist in processing the message backlog. This caused the secondary effects of temporarily increasing latency and error rates in the Storage & Playback and Push services in the US East and AP Northeast PoPs from 13:42 to 13:54, after which all services were operating nominally. There was an additional secondary effect which manifested as increased latency to the Push service from 14:25 to 14:36. All systems were then performing within normal bounds, and the incident was considered resolved at 14:36 UTC the same day. This issue occurred because of a failure of 2 of 3 database nodes at a database provider in the US West region. The provider completed the replacement of the failed nodes at 18:00 UTC the same day, after which we returned to our normal operating posture for the affected services.

### **Mitigation Steps and Recommended Future Preventative Measures**

To help minimize the impact of a similar issue in the future, we updated our operational runbooks for dealing with a regional database failure based on some of our observations during this incident. We noted the secondary effects to the Push system that were caused by the runbook used to route around the issue by bringing other regions’ capacity to assist, and have scheduled work to prevent that kind of effect in the case of another similar procedure. We are continuing to work with our database provider to analyze the root cause in their service, and mitigate that going forward.

resolved

We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for any impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

A fix has been implemented and we are monitoring the results.

identified

A workaround for Storage & Playback has been implemented. We are monitoring for additional issues as the downstream database provider works on the issue.

identified

Other PoPs have been seeing increased errors and latency for the past few minutes as they assist in processing the backlog from US West.

identified

The Storage & Playback service is degraded in the US West PoP due to a database issue at a downstream service provider. We are working to use resources in other PoPs to help US West.

investigating

Users in our US West PoP are currently experiencing an issue with our Storage & Playback service that is causing some messages to not appear in calls to that service. No errors are being returned from the service, despite messages not being returned correctly. The other services, including PubSub, are working normally. We are investigating the issue.

Report: "Degraded performance of Functions service in the Mumbai PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution** At 15:44 UTC on 2022-02-04, we observed degraded performance with our Functions service in the Mumbai region. When we detected the drop in service, we doubled our internal capacity to compensate for the degraded performance. The issue was resolved for most customers at 19:11 UTC on 2022-02-04, but a small subset of affected customers continued to see issues through 04:00 UTC on 2022-02-05. The degraded performance was due to a service failure at our vendor. When the service failed, we did not have alerting in place to proactively detect the failure. ### **Mitigation Steps and Recommended Future Preventative Measures** To prevent a similar issue from occurring in the future, we have set up the necessary alerting on our service and continue to work with the vendor to understand the underlying cause.

resolved

We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. This incident has been resolved, and we will follow up with a post-mortem soon.

monitoring

A fix has been implemented, and we are monitoring the results for the next 30 mins.

investigating

We are experiencing degraded regional Functions performance in the Mumbai PoP. The issue is currently under investigation.

Report: "Channel Groups services are reporting errors in the EU PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 18:42 UTC on 2022-02-16, we observed an increase in subscribe (to channel groups) errors due to issues with channel group registrations (add/remove channels to/from channel groups) in our EU PoP. We notified our storage provider and began rerouting storage traffic to our Mumbai PoP to mitigate the issues. At 21:56 UTC on 2022-02-16, the issue was resolved. A new usage pattern at scale exposed some sub-optimal behavior, which required us to scale our storage services on short notice to mitigate the issue. ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue from occurring in the future, we are fixing the bottleneck so we are able to scale our storage service more quickly.

resolved

The incident has not resurfaced for 60 minutes. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

Engineering is still working with our storage provider to completely resolve the underlying issue. We will continue to provide timely updates to report any changes or progress.

monitoring

The Channel Groups service has been operating as expected for the past 30 minutes. We will continue to monitor this closely for at least the next 60 minutes before we determine that this incident is resolved. If you continue to experience issues that you believe are related to this incident, please report the details to PubNub Support (support@pubnub.com).

identified

The engineering team has identified the issue and is taking the necessary steps to mitigate and resolve it as quickly as possible. At this point, latency should be returning to acceptable levels.

investigating

At about 21:25 UTC on Feb 16 (Feb 16, 13:25 PST), the Channel Groups service began to report errors in the EU PoP. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).
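For readers unfamiliar with the calls involved, channel group registration (the add/remove operations referenced above) and subscribing to a group look roughly like the sketch below with the JavaScript/TypeScript SDK. Keys, group, and channel names are placeholders; treat it as illustrative rather than prescriptive.

```typescript
import PubNub from 'pubnub';

const pubnub = new PubNub({
  publishKey: 'demo',  // placeholder keys
  subscribeKey: 'demo',
  userId: 'cg-demo',   // older SDK versions use `uuid`
});

async function registerAndSubscribe() {
  // Register channels into a channel group (the calls that were returning errors).
  await pubnub.channelGroups.addChannels({
    channelGroup: 'family',
    channels: ['alice', 'bob'],
  });

  // Optionally confirm the registration before relying on it.
  const { channels } = await pubnub.channelGroups.listChannels({ channelGroup: 'family' });
  console.log('Registered channels:', channels);

  // Subscribe to the group instead of enumerating individual channels.
  pubnub.subscribe({ channelGroups: ['family'] });
}

registerAndSubscribe().catch((err) => {
  // Registration calls can fail transiently during incidents like this;
  // production code should retry with backoff.
  console.error('Channel group registration failed:', err);
});
```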

Report: "The Subscriber and Channel Groups services are reporting errors in the AP South PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 03:27 UTC on 2022-02-16, we observed errors with Channel Group registrations (add/remove channels to/from channel groups) in the AP South PoP (Mumbai). We immediately escalated to our storage provider while we routed storage traffic to US West to mitigate the issue. The issue was resolved at 04:30 UTC on 2022-02-16. This issue occurred due to a bug in our storage provider's platform. Our internal monitoring detected the errors, which allowed us to take immediate action by rerouting traffic and escalating to our provider. ### **Mitigation Steps and Recommended Future Preventative Measures**  We plan to migrate our Mumbai PoP to Kubernetes, which will allow us to more efficiently reroute traffic to another PoP when needed. Our storage provider will resolve the existing bug.

resolved

The incident has not resurfaced for 30 minutes. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

Our storage vendor has identified the issue and taken steps to mitigate it. We will provide further details once we have done a full investigation.

investigating

Our storage vendor has advised that the affected PoP is currently recovering. Our engineering staff continues to investigate and mitigate where possible, in parallel with our vendor's efforts.

investigating

Engineering is still working to resolve the underlying issue. We will continue to provide timely updates to report any changes or progress.

investigating

At about 03:25 UTC on Feb 16 (Feb 15, 19:25 PST), the Subscriber and Channel Groups services began to report errors in the AP South PoP. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "The Presence service reported errors globally"

Last update
postmortem

At 18:08 UTC on 10/6/2021, we observed some data inconsistency in our Presence service. We determined that it was due to an error during maintenance of the Presence service, and the issue was resolved at 18:55 UTC. The process for performing this type of maintenance has been updated to correct the erroneous step so that it does not happen again.

resolved

Between 01:08 UTC and 01:55 UTC, customers using our Presence APIs may have seen devices falsely time out and rejoin. The issues have not resurfaced since 01:55 UTC. We will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.
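Applications that drive logic from presence events would have seen the false timeouts above as 'timeout' events followed by fresh 'join' events. A minimal, illustrative listener with the JavaScript/TypeScript SDK (placeholder keys and channel) that distinguishes the two:

```typescript
import PubNub from 'pubnub';

const pubnub = new PubNub({
  publishKey: 'demo',       // placeholder keys
  subscribeKey: 'demo',
  userId: 'presence-demo',  // older SDK versions use `uuid`
});

pubnub.addListener({
  presence: (event) => {
    // `action` is one of 'join', 'leave', 'timeout', 'state-change', or 'interval'.
    switch (event.action) {
      case 'timeout':
        // A timeout means the server stopped hearing from the client without an
        // explicit leave; treat it more cautiously than a deliberate 'leave'.
        console.warn(`${event.uuid} timed out on ${event.channel} (occupancy ${event.occupancy})`);
        break;
      case 'join':
      case 'leave':
        console.log(`${event.uuid} ${event.action} on ${event.channel}`);
        break;
      default:
        console.log('Presence event:', event.action);
    }
  },
});

// Subscribing with presence enabled delivers join/leave/timeout events for the channel.
pubnub.subscribe({ channels: ['my-channel'], withPresence: true });
```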

Report: "Elevated Latency and timeouts to the Subscribe service"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 06:28 UTC on 2022-01-29, we received a small number of customer reports about higher-than-normal rates of premature disconnection of TCP connections for specific TLS sessions, resulting in client connection resets in our US-West (California) and AP-South (Mumbai) points of presence. These initially presented as expected connection and reconnection behavior within our SDKs. After a number of additional customer reports of error messages, further investigation led us, at 06:42 UTC on 2022-02-02, to identify that the connection resets were caused by a provider's service we use at our network edge, which had started sending these TCP connection resets. We immediately routed traffic around these issues to minimize the impact while the provider rectified its issues. Once the infrastructure was stable, we were able to return traffic to its normal flow, and the issue was resolved at 19:00 UTC on 2022-02-04. ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue, we have implemented monitoring to alert our on-call staff of similar issues with our infrastructure provider and have implemented a runbook to promptly resolve these errors. We will also investigate ways to make our SDKs automatically react to these errors in the future.

resolved

We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. This incident has been resolved, and we will follow up with a post-mortem soon.

monitoring

A fix has been implemented, and we are monitoring the results for the next 30 mins.

identified

We are continuing to work on a fix for this issue.

identified

We've started rolling out the fixes in our US West PoP first, and customers should start seeing the improvements in the timeouts over the next few hours.

identified

We are continuing to work on a fix for this issue.

identified

Beginning at 06:28 UTC on 01/29, some customers may be experiencing a higher rate of premature disconnection of TCP connections for certain TLS sessions, which could appear as client timeouts. This particularly affects customers using a pndsn.com origin and primarily affects the subscribe API call, though other APIs may be affected to a lesser degree.
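Until SDK-side handling of this kind of reset is in place, client applications can listen for the SDK's status events and reconnect explicitly when they see network trouble. The sketch below is illustrative only (JavaScript/TypeScript SDK, placeholder keys and channel; status category names can vary between SDK versions):

```typescript
import PubNub from 'pubnub';

const pubnub = new PubNub({
  publishKey: 'demo',        // placeholder keys
  subscribeKey: 'demo',
  userId: 'reconnect-demo',  // older SDK versions use `uuid`
  restore: true,             // try to resume the subscription after a reconnect
});

pubnub.addListener({
  status: (status) => {
    switch (status.category) {
      case 'PNNetworkIssuesCategory':
      case 'PNTimeoutCategory':
        // Premature TCP resets tend to surface here; force a reconnect attempt.
        console.warn('Subscribe connection dropped, reconnecting...');
        pubnub.reconnect();
        break;
      case 'PNReconnectedCategory':
        console.log('Subscribe connection restored.');
        break;
      default:
        break;
    }
  },
  message: (msg) => console.log('Message:', msg.message),
});

pubnub.subscribe({ channels: ['my-channel'] });
```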

Report: "Presence, Subscribe and Publish service experienced elevated latencies and errors in US East region"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 10:20 UTC on 2022-01-21, we observed elevated latencies and errors for the Subscribe, Publish, and Presence services in the US East region. This issue occurred during the migration of another service. The issue was quickly diagnosed and resolved, with services returning to normal at approximately 10:30 UTC.  ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue from occurring, we have established new runbook items and the necessary monitoring to alert our on-call staff.

resolved

Presence, Subscribe and Publish services were impacted briefly by maintenance work related to the admin portal. As a result, service was degraded in US East PoP from approximately 10:21 UTC to 10:29 UTC. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. This issue is resolved, and we will follow up with a post-mortem soon.
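For short windows of elevated publish errors like this one, a bounded client-side retry is usually all that is needed. An illustrative sketch (JavaScript/TypeScript SDK, placeholder keys and channel; the backoff policy is our assumption, not PubNub guidance):

```typescript
import PubNub from 'pubnub';

const pubnub = new PubNub({
  publishKey: 'demo',        // placeholder keys
  subscribeKey: 'demo',
  userId: 'publisher-demo',  // older SDK versions use `uuid`
});

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Publish with a small number of retries and exponential backoff to ride out
// brief periods of elevated errors or latency.
async function publishWithRetry(channel: string, message: unknown, attempts = 3): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const { timetoken } = await pubnub.publish({ channel, message });
      return String(timetoken);
    } catch (err) {
      lastError = err;
      await sleep(250 * 2 ** attempt); // 250 ms, 500 ms, 1 s, ...
    }
  }
  throw lastError;
}

publishWithRetry('my-channel', { text: 'hello' })
  .then((tt) => console.log('Published at timetoken', tt))
  .catch((err) => console.error('Publish failed after retries:', err));
```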

Report: "Continuation to premature client timeouts to the Subscribe service in Mumbai PoP"

Last update
resolved

We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. This incident has been resolved, and we will follow up with a post-mortem soon.

monitoring

A fix has been implemented, and we are monitoring the results for the next 30 mins.

identified

We are continuing to work on a fix for this issue.

identified

At 06:28 UTC on 01/29, we observed that some customers may be experiencing a higher rate of premature disconnection of TCP connections for certain TLS sessions, which could appear as client timeouts. This particularly affects customers using a pndsn.com origin and primarily affects the subscribe API call, though other APIs may be affected to a lesser degree. This has been partially mitigated, but we continue to work towards a complete solution.

Report: "Subscribe service is experiencing elevated latencies and errors in the US East PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution** At about 12:09 UTC, the Subscribe service began to report errors and experience elevated latencies in the US East point-of-presence due to a data center failure of one of our cloud infrastructure providers. We routed traffic around these issues to minimize latency and impact while the provider rectified its issues. Once the infrastructure was stable, we were able to return traffic to its normal flow and the latency resolved itself by 14:20 UTC. ### **Mitigation Steps and Recommended Future Preventative Measures** To prevent a similar issue from occurring we have implemented the necessary monitoring to alert our on-call staff and updated the runbook to resolve such issues in a timely manner.

resolved

The incident has not resurfaced in the past 30 mins. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

A fix has been implemented and we are monitoring the results.

identified

At about 12:07 UTC, the Subscribe API began to experience elevated latencies and errors in the US East PoP. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "Failures for Presence Webhooks and elevated latencies for Mobile Push Notifications"

Last update
postmortem

### **Problem Description, Impact, and Resolution** At 15:26 UTC on 2021-12-17, we observed failures for Presence Webhooks and elevated latencies for Mobile Push Notifications. There were also issues for some users logging into our hosted customer support portal and our own PubNub Account Dashboard. This issue occurred because our service provider experienced an outage in the US-West PoP; it was resolved by them at 16:10 UTC.  ### **Mitigation Steps and Recommended Future Preventative Measures**  The proper action was taken by our service provider in a timely manner.

resolved

Between 15:15 UTC (07:15 PST) and 16:00 UTC (08:00 PST) on 2021-12-15, a small percentage of traffic experienced failed Push notifications published to devices from a single PoP in the US West region. The PubNub Account Dashboard and the customer support portal were also affected, which prevented successful login. The root cause was an outage at our hosting provider, which they were able to resolve. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service. This issue is resolved, and we will follow up with a post-mortem soon.

Report: "Storage service is experiencing elevated latencies in the US-East PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 08:57 UTC (12:57 PST) on 2021-11-25, we observed elevated latencies and errors for the Storage service in the US-East PoP. The elevated latency meant that messages could not be retrieved from Storage within the expected timeframe. This issue occurred because our storage vendor was running deployments in the US-East region, which resulted in higher storage write latencies. We contacted our storage vendor to postpone the deployments to later in the day when traffic levels were lower. The issue was resolved at 14:08 UTC. ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue from occurring in the future, we will coordinate optimal times for off-peak deployments.

resolved

We are in communication with our Storage vendor who is performing rolling deployments to update their infrastructure. They have put a hold on US-East updates until later during non-peak hours. Deployments to the AP-Northeast (Tokyo) PoP are scheduled to begin at 16:00 UTC (08:00 PST). The deployment is estimated to run for 3 to 4 hours. There could be similar latencies in that PoP for Storage during that time period. We are resolving this incident. If there is any further impact to this (US-East) PoP or others, we will post a new incident at that time. We apologize for the impact this may have had on your service. Please feel free to report any issues or questions about this incident to the PubNub Support Team (support@pubnub.com).

monitoring

We have seen latency return to normal, and the service has been operating as expected for the past 45 minutes (since 11:49 UTC or 03:49 PST). We will continue to monitor this closely for at least the next 30 minutes before we determine that this incident is resolved. If you continue to experience issues that you believe are related to this incident, please report the details to PubNub Support (support@pubnub.com).

identified

At 08:57 UTC (12:57 PST) on 2021-11-25, we observed elevated latencies and errors for the Storage service in the US-East PoP. This is directly related to a maintenance deployment being performed by our storage vendor. The estimation from the storage vendor is that this will continue for about another 90 minutes. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "The Objects service is reporting errors and high latency in the EU PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution** At about 17:33 UTC, the Objects service began to report errors and experience elevated latencies in the EU PoP. We began implementing rate limits on the offending Objects query, and during that process the latency resolved itself by 19:10 UTC. The issue was due to a lack of monitoring to detect runaway query processes within the Objects service. ### **Mitigation Steps and Recommended Future Preventative Measures** To prevent a similar issue from occurring, we have implemented the necessary monitoring to alert our on-call staff and established default rate limits to prevent incorrect queries from consuming resources unnecessarily.

resolved

The incident has not resurfaced since 19:10 UTC. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

A fix has been implemented and we are monitoring the results.

identified

At about 17:33 UTC, the Objects service began to report errors and experience elevated latencies in the EU PoP. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).
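On the client side, the Objects (App Context) queries most prone to becoming expensive are unbounded listings. A hedged sketch of paging through channel metadata with an explicit limit (JavaScript/TypeScript SDK, placeholder keys; the option and cursor field names follow recent SDK versions and may differ in older ones):

```typescript
import PubNub from 'pubnub';

const pubnub = new PubNub({
  publishKey: 'demo',      // placeholder keys
  subscribeKey: 'demo',
  userId: 'objects-demo',  // older SDK versions use `uuid`
});

// Page through channel metadata in bounded chunks rather than issuing one
// large, expensive query.
async function listAllChannelMetadata() {
  const all: unknown[] = [];
  let next: string | undefined;
  do {
    const response = await pubnub.objects.getAllChannelMetadata({
      limit: 100,                         // bounded page size
      page: next ? { next } : undefined,  // cursor from the previous page
    });
    all.push(...response.data);
    next = response.next;
  } while (next);
  console.log(`Fetched metadata for ${all.length} channel(s).`);
}

listAllChannelMetadata().catch(console.error);
```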

Report: "The Presence service is experiencing elevated latencies in the all PoPs"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 14:02 UTC on 2021/11/12, we observed increased latency with the Presence service in all of our PoPs. This was a repeat of the prior day's issue, so we already knew the quickest path to mitigating its effects. We contacted our service provider for insights while we restarted our Presence services to clear the backlog of requests, and the issue was resolved at 15:01 UTC. This issue occurred because there wasn't sufficient alerting between our system and our service provider for a rapid increase in Presence requests. ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue from occurring in the future, our service provider has upgraded the infrastructure to handle higher network throughput.

resolved

The incident has not resurfaced in the past hour. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

The Presence service latency has returned to expected levels and has been operating as expected for the past 10 minutes. We will continue to monitor this closely for at least the next 60 minutes before we determine that this incident is resolved. If you continue to experience issues that you believe are related to this incident, please report the details to PubNub Support (support@pubnub.com).

investigating

Around 14:00 UTC (06:00 PST), the Presence service began to experience elevated latencies in all PoPs. PubNub Technical Staff is investigating, and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "The Presence service is experiencing elevated latencies in the Frankfurt PoP"

Last update
postmortem

### **Problem Description, Impact, and Resolution**  At 03:50 UTC on 2021/11/11, we observed increased latency with the Presence service in our Frankfurt PoP. We contacted our service provider for insights while we added capacity and restarted our Presence services to handle the backlog. Once the services were restarted and our service provider increased the network throughput, the issue was resolved at 20:23 UTC. This issue occurred because there wasn't sufficient alerting between our system and our service provider for a rapid increase in Presence requests. ### **Mitigation Steps and Recommended Future Preventative Measures**  To prevent a similar issue from occurring in the future, our service provider will upgrade the infrastructure to handle higher network throughput.

resolved

The incident has not resurfaced for over 2 hours. We are resolving this issue, and we will follow up with a post-mortem soon. We apologize for the impact this may have had on your service. Please reach out to us by contacting PubNub Support (support@pubnub.com) if you wish to discuss the impact on your service.

monitoring

We are continuing to monitor for any further issues.

monitoring

The Presence service has been operating as expected for the past 30 minutes and we are confident that we are in a stable state with expected low latencies. We will continue to monitor this closely for at least the next 60 minutes before we determine that this incident is resolved. If you continue to experience issues that you believe are related to this incident, please report the details to PubNub Support (support@pubnub.com).

identified

Presence join latency has mostly recovered but is still elevated and there is some increase in Presence Webhook errors. We are continuing to work on a fix for this issue. As always, if you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

identified

The engineering team has identified the issue and is taking the necessary steps to mitigate and resolve as quickly as possible. Engineering is focused on adding capacity and restarting processes as necessary.

investigating

Around 03:50 UTC (06:50 PST), the Presence service began to experience elevated latencies in the Frankfurt PoP. PubNub Technical Staff is investigating and more information will be posted as it becomes available. If you are experiencing issues that you believe to be related to this incident, please report the details to PubNub Support (support@pubnub.com).

Report: "Degraded capacity in the Functions service"

Last update
postmortem

### **Problem Description, Impact, and Resolution** At 01:02 UTC on 10/6/2021, we observed that some new Functions could not be scheduled for execution. We started new server nodes to replace a group of failed server nodes, and the issue was resolved at 01:37 UTC. This issue occurred because of an outage with the infrastructure provider that runs the nodes. ### **Mitigation Steps and Recommended Future Preventative Measures** While our capacity for new executions was briefly reduced, we were able to service the existing demand effectively. We continue to follow up with our provider to look at the root cause from their end.

resolved

This incident has been resolved.

monitoring

A failure in a downstream infrastructure provider reduced our capacity to start new Functions. We have brought up new capacity to compensate.
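For context, the Functions referred to here are small serverless handlers scheduled onto those worker nodes. Even a trivial before-publish handler like the hedged sketch below depends on that scheduling capacity; the handler shape follows PubNub's published Functions examples, but treat it as illustrative only.

```typescript
// A minimal "before publish" Function handler. It runs inside PubNub's
// Functions runtime, not in your own application, and annotates each message
// on its bound channel before delivery.
export default (request: any) => {
  // Assumes the published payload is a JSON object.
  request.message.processedAt = new Date().toISOString();
  return request.ok(); // allow the (modified) message through
};
```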