Historical record of incidents for PlayFab
Report: "Audit Logs are not being collected"
Last update: We are investigating an issue with Audit Log persistence.
Report: "PlayStream Processing Delay"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently experiencing delayed processing of events for a subset of players. No data has been lost and engineers are working to resolve this as soon as possible.
Report: "Increased latency in user api's"
Last updateThis incident has been resolved.
Api latency has returned to normal, we'll continue monitor while investigations continue.
We have observed increased API latency and error rates starting at approximately 11 am pacific today for the following APIs: UpdateUserData, GetUserData, GetPlayerCombinedInfo and are investigating.
Report: "PlayStream Processing Delay"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently experiencing delayed processing of events for a subset of players. No data has been lost and engineers are working to resolve this as soon as possible.
Report: "Increased latency in user api's"
Last updateThis incident has been resolved.
Api latency has returned to normal, we'll continue monitor while investigations continue.
We have observed increased API latency and error rates starting at approximately 11 am pacific today for the following APIs: UpdateUserData, GetUserData, GetPlayerCombinedInfo and are investigating.
Report: "Reports are delayed for May 21st"
Last update: This incident has been resolved.
We are currently investigating an issue with processing analytics reports and daily title report emails for May 21st.
Report: "Reports are delayed for May 21st"
Last updateThis incident has been resolved.
We are currently investigating an issue with processing analytics reports and daily title report emails for May 21st.
Report: "Partial outage for Economy V2 affecting Catalog APIs and Inventory APIs"
Last update: Migration has completed and incident mitigation is complete.
The issue has been mitigated; we're monitoring for complete recovery.
The issue has been identified; it was caused by migrations for service improvements. We are currently working on a fix.
Report: "Partial outage for Economy V2 affecting Catalog APIs and Inventory APIs"
Last updateMigration is completed and incident mitigation is complete
Issue has been mitigated, we're monitoring for complete recovery
The issue has been identified, it was due to migrations for service improvements, we are currently working on a fix
Report: "Scheduled Task Failures"
Last update: We have identified an issue that impacted a subset of scheduled tasks. From 5/19 10:30 AM - 5/20 12:00 AM PDT, there were failures for scheduled tasks that run actions on each player in a segment. This issue has been resolved.
Report: "Scheduled Task Failures"
Last updateWe have identified an issue that impacted a subset of scheduled tasks. From 5/19 10:30AM - 5/20 12:00AM PDT, there were failures for scheduled tasks that run actions on each player in a segment. This issue has been resolved.
Report: "Issues related to saving Title Data on Game Manager"
Last update: We are currently investigating this issue.
Report: "Economy V2 APIs availability issue"
Last update: This incident has been resolved.
We have identified an issue that affected a subset of the Economy V2 APIs. The issue is fixed and we are monitoring the status of the affected APIs.
Report: "Service Degradation across APIs"
Last update: This incident has been resolved.
We are continuing to investigate this issue.
We are observing recovery in error rates.
We are continuing to investigate this issue.
We are currently investigating increased API errors across APIs.
Report: "May 6th Data Connections and Playstream Actions are delayed"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Economy v1 Game Manager pages not loading"
Last update: This incident has been resolved.
We're aware of the problem and are working on a fix.
Report: "PlayFab api limits exceeded"
Last updateThis incident has been resolved.
A fix has been implemented and we're monitoring the results.
We've identified the cause of unexpected limits exceptions, and are attempting to mitigate.
We're investigating an issue where some titles are reporting unexpected errors related to limits being exceeded.
Report: "Character API instability"
Last update: A delay in database index updates resulted in inconsistent behavior related to characters; for example, after a Character was successfully created, it could not be found. While all of the operations eventually succeeded, the delay affected multiple games. The issue has now been corrected.
Report: "Service Degradation on Login APIs"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating increased errors with our Login* APIs.
Report: "Reports have been delayed since March 28th."
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue with processing analytics reports and daily title report emails from March 28th.
Report: "MPS Build and Allocation Failures"
Last update: On March 24th, 2025, between 9:00 PM and 12:00 AM PST, customers intermittently encountered failures with PlayFab Multiplayer Server (MPS) APIs, such as build or allocation calls. The incident was caused by the unhealthy state of a cluster, which was triggered by an experimental feature enabled by a high load customer, resulting in pod restarts due to high CPU usage. We resolved the issue by deploying a hotfix to address the bug.
### Impact
During the incident, customers experienced intermittent failures when using MPS APIs. The issue was isolated to titles leased to a specific cluster. Titles on other clusters were not affected.
### Root Cause Analysis
The root cause of the incident was pod restarts triggered by unnecessary recurring calls from a new experimental feature enabled by a high load customer. The feature caused grains to initialize with leases in stamps not associated with the title, leading to delays in processing heartbeat requests. This filled the message queue on the grain, resulting in excessive CPU usage and pod restart events.
### Action Items
To prevent similar incidents from occurring in the future, we have implemented the following actions:
* Verified the functionality of the experimental feature.
* Checked flags in Cosmos DB and API usage for predictive standby.
* Initiated a re-evaluation of the design and test coverage for the feature.
One of our clusters experienced issues processing calls between 4 AM and 7 AM UTC on March 25th, causing some calls to fail. The issue was resolved after deploying a fix, but customers may have intermittently encountered failures with MPS APIs, such as build or allocation calls, during that time.
Report: "Experimentation Service Degradation"
Last update: Between 2025-03-18 19:00 UTC and 2025-03-19 17:00 UTC, some customers saw InternalServerErrors being returned from the Experimentation/GetExperiments API, or a blank page when loading the Experiments page in Game Manager. The incident was caused by the deployment of a bad configuration used by the experimentation service. We resolved the issue by fixing the configuration and redeploying the impacted service.
### Impact
Any title that attempted to retrieve their experiment information via Game Manager or API saw InternalServerErrors or a blank Experiments page for the duration of the incident. There was no impact to any other Experimentation APIs or to the operation of the experiments themselves.
### Root Cause Analysis
The root cause of the incident was the rollout of an incorrect configuration used by the Experimentation service.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
· Enhanced monitoring and alerting systems to detect and report errors in service-to-service communication.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently experiencing an issue with the experimentation page not loading correctly. Our team is actively investigating and working to resolve the problem as quickly as possible.
Report: "5XX error rate and higher latency across many PlayFab APIs"
Last update: This incident has been resolved.
We experienced an increased number of 5XX errors; we have identified the issue and mitigated the cause. We're currently monitoring the recovery.
Report: "Game Manager Login Issues"
Last update: This incident has been resolved.
We are monitoring Game Manager account issues. Customers who are experiencing login issues should be assured that their data is unaffected and will be available once their account access has been restored. If your account has been locked, please use the Contact Us form located at https://playfab.com/contact to have your account unlocked. For now, we recommend that customers seeing the PlayFab Account to Microsoft Account migration flow select "Migrate Later".
We are continuing to troubleshoot Game Manager account issues. Customers who are experiencing login issues should be assured that their data is unaffected and will be available once their account access has been restored. If your account has been locked, please use the Contact Us form located at https://playfab.com/contact to have your account unlocked. For now, we recommend that customers seeing the PlayFab Account to Microsoft Account migration flow select "Migrate Later".
We have resolved the Game Manager login and logout issues. We are investigating an issue with PlayFab account to Microsoft Account (MSA) migration. For now, we recommend that customers seeing the account migration flow select "Migrate Later".
We have identified the issue and are working on deploying a mitigation.
Some customers are currently experiencing issues logging into or out of Game Manager. We are currently investigating the issue. In the interim, we recommend that customers migrating to MSA select "Migrate Later" in the MSA migration flow.
Report: "Delay in event delivery to Data Connections and Data Explorer"
Last update: PlayFab engineers identified an issue with event delivery that resulted in a maximum delay of 50 minutes across all titles. This issue has been fixed, and all delayed data has been processed.
Report: "Errors when creating players in new titles in Game Manager"
Last update: We have reprocessed the titles whose namespaces were not properly initialized. All titles are expected to be fully functional.
A fix has been deployed. New titles created now are able to create players. We are now looking for titles that have been created since the issue was introduced.
When a new title is created in a new namespace, player creation fails. We are working on a fix.
Report: "Authentication with Microsoft account failing for some users"
Last update: The incident has been resolved.
A fix has been deployed; we're monitoring the service status and rate of errors.
Authentication is failing for some users authenticating with Microsoft accounts. We're currently investigating.
Report: "Increased Error Rates for PSN APIs"
Last update: Between 3:15 PM PST on February 7th, 2025, and 3:00 PM PST on February 8th, 2025, some customers experienced increased latency and intermittent failures when making PlayStation Network-related calls (e.g., LoginWithPSN, RedeemPlayStationStoreInventoryItems) on PlayFab's service. The incident was caused by a PlayStation Network outage leading to a high rate of InternalServerErrors for API calls dependent on the PlayStation Network. We monitored the situation and the issue was resolved once the PlayStation Network recovered.
### Impact
All PlayStation Network-related API calls experienced increased latency and an intermittent failure rate over the course of 24 hours. This incident impacted the overall availability SLA for PlayStation-related services during this period.
### Root Cause Analysis
The root cause of this incident was an external issue with the PlayStation Network. The PlayStation Network outage was outside of our direct control and required intervention from PlayStation to resolve.
### Action Items
• Improve our communication protocols with partners to receive timely updates on outages and recovery status.
• Enhance monitoring and alerting systems to detect and report anomalies in a more granular manner for external dependencies.
This incident has been resolved.
We are observing increased errors for PSN APIs because of a third party outage and are currently monitoring.
We are observing increased errors for the LoginWithPSN APIs because of a third party outage and are currently monitoring.
Report: "PlayStream Processing Delay"
Last update: On February 12th, 2025, between 11:09 AM and 12:14 PM PST, some customers experienced delays in the updating of leaderboard dashboards due to an issue with the PlayStream processor. The incident was caused by a failed authentication error from a network configuration change which was not correctly assigned to the managed identity. We resolved the issue by deleting the stats processor pods in the partially created cluster and ensuring the monitor reported healthy status.
### Impact
The delay in updating leaderboard dashboards lasted 1 hour and 4 minutes, affecting the PlayStream processor's ability to update its processing status.
### Root Cause Analysis
The root cause of this incident was a human error in configuration. A new rollout was initiated earlier in the day, but the cluster was not fully created and the deployment should have been for an earlier version. This incomplete status led to missing role assignments and managed identities, resulting in authentication errors in the stats processor.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
· We have improved our testing and validation procedures for network configuration changes to catch such bugs before they reach production.
· We have enhanced our monitoring and alerting systems to detect and report any anomalies in the load balancer's behavior and performance.
· We investigated and fixed the health probe for the PlayStream processors to ensure proper assignment of managed identities.
There was a PlayStream incident between 10:50 AM and noon, which caused a delay in updating the leaderboard dashboards. The issue was resolved at noon, and processing has returned to normal.
Report: "Reduced API availability"
Last update: On January 22, 2025, between 10:44 AM and 11:15 AM PST, some customers experienced increased latency in PlayFab's API. The incident was caused by a network configuration issue during the migration to new Redis instances, which resulted in ports being blocked. We resolved the issue by rolling back to the previous Redis cluster and restarting the pods.
### Impact
The APIs experienced increased latency; however, the availability remained above the Service Level Objective (SLO).
### Root Cause Analysis
The issue was caused by the migration to new Redis instances, which resulted in the use of ports that were not included in the exclusion list. The issue was not detected sooner because the alert was set as severity 4 and was not noticed immediately. Availability numbers were not impacted by the change.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
· Exclude the full range of Redis ports.
· Improved our testing and validation procedures for network configuration changes to catch such issues before they reach production.
· Improved deployment process of infrastructure changes by rolling out updates to a subset of users.
This incident was resolved yesterday afternoon. Apologies for the late update.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
Customers have been experiencing a reduction in API availability and increased latency since an infrastructure upgrade. We are investigating and preparing to roll back the infrastructure change.
Report: "Reduced availability of Group Apis"
Last updateThere was a dip in availability of the following apis from 8.46am PST to 9 am PST. - Group/ListMembershipOpportunities - Group/ListGroupInvitations
Report: "Legacy Multiplayer APIs unavailable"
Last update: This incident was resolved at 23:51 UTC.
Affected APIs:
- Matchmaker/RegisterGame
- GameServer/SetGameServerInstanceTags
- GameServer/RefreshGameServerInstanceHeartbeat
- GameAcquisition/Matchmake
Report: "API execution times degraded"
Last update: This incident has been resolved; API response times have returned to normal.
A fix has been deployed; we're monitoring the service.
We are aware of and investigating an issue that is causing API execution times to be slowed.
Report: "Increased error rates for multple Economy V2 APIs, increased delayed for PlayStream events, and increased delay for maketplace APIs"
Last updateOn January 8th, 2025, between 5:30 PM and 9:02 PM PST, some customers experienced delayed playstream processing and increased error rates for economy requests when accessing PlayFab's API. The incident was caused by network connectivity issues from backend services. We resolved the issue by migrating traffic to a health backend. ### Impact The incident resulted in delayed playstream processing and increased error rates for economy requests. Additionally, the Economy V2 SLA dipped to 99% reliability during the impact period. ### Root Cause Analysis The root cause of the incident was identified as network connectivity errors from a specific backend. The connectivity issues were not caused by any recent changes. ### Action Items To prevent similar incidents from happening again, we have taken the following actions: · Enhanced our monitoring and alerting systems to detect and report any anomalies in the load balancer's behavior and performance. · Automatic failure when backend services are impacted
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue is under investigation. Multiple Economy APIs have degraded performance, including Inventory APIs, redemption APIs, and PlayStream for Economy.
Report: "Increased ConcurrentEditError Response Rate When Accessing User Title Data"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Some titles have been experiencing an increased ConcurrentEditError response rate when accessing user title data since 1/13. We have identified the cause of this issue and are in the process of resolving it.
Report: "Errors when creating or editing Data Connections"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating an issue where customers may receive an error when using the PlayFab portal to create or edit a Data Connection for their title. Existing Data Connections are not impacted.
Report: "Player data unavailable for a small subset of players"
Last update: For five days, between January 10, 2025 and January 15, 2025, accessing data for player profiles with custom data attachments that were "null" and set to public permissions would fail with an "InternalServerError". These now return successfully.
Report: "January 13 reports are delayed"
Last update: This incident has been resolved. We identified the issue and reprocessed the impacted reports.
We are currently investigating an issue with processing analytics reports for January 13.
Report: "Increased Economy V2 Search Latency"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Processing delays impacting scheduled tasks."
Last update: On December 11th, 2024, between 9:20 AM and 11:20 AM PST, some customers experienced processing delays with PlayFab's scheduled tasks. The incident was caused by a bad configuration change. The issue was resolved by reverting the configuration change.
### Impact
During the incident, the scheduled task processor failed to process any messages for approximately 2 hours. Scheduled tasks queued to run during this time were delayed or did not trigger, but no customer data was lost.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
* Created a repair item to update our mock unit tests to catch regressions related to this configuration change.
* Investigated why end-to-end tests and integration deployment did not catch this issue.
* Added a new production alert for no messages processed within a specified time frame.
* Adjusted existing production alerts to trigger faster when no scheduled tasks/messages are processed.
We observed delays in processing scheduled tasks between 9:30 AM - 11:20 AM PST. During this time, scheduled tasks that were scheduled to execute may have not executed. This incident is now resolved.
We are investigating an issue impacting message processing related to scheduled tasks.
Report: "Event Export jobs to AWS S3 are paused."
Last update: On December 9th, 2024, between 3:00 PM and 11:00 AM PT the next day, some customers experienced issues with corrupted export files when accessing PlayFab's event data export to S3 buckets. The incident was caused by a bug introduced during an update to legacy code. We resolved the issue by deploying a hotfix and reprocessing the corrupted data.
### Impact
There were 58 titles affected by this incident, specifically those configured for event exports to S3 buckets. The exports contained invalid characters, causing downstream parsing and decompression issues. The affected data was backfilled successfully by December 11th, 2024, at 7:00 PM PT. During the mitigation, exports to S3 were paused to prevent further impact.
### Root Cause Analysis
The bug in the export process was introduced during an update to legacy code, which led to additional padding bytes being included in the export data. The codebase had not been actively maintained and lacked end-to-end tests, leaving the bug undetected during manual testing.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
* Enhanced our monitoring and alerting systems to detect anomalies in export data.
* Refactored the code for downloading and uploading blobs to S3.
* Added end-to-end tests for exports to blob and S3.
* Created tools for backfilling corrupted data.
Reprocessing of S3 Event Exports for the period between Dec 9th, 1:00 PM PST, and Dec 10th, 6:00 PM PST has been completed. Customers are advised to check their S3 buckets for the updated data.
A fix has been deployed and we have resumed processing S3 Event Exports. The engineering team is working on going back to reprocess exports that may have had missing or corrupted data between Dec 9th, 1pm PST and Dec 10th, 6pm PST. We will post additional updates when reprocessing is completed.
The issue has been identified and a fix is being implemented.
We have identified an issue with Event Export jobs to S3 where some uploads contain invalid characters that may cause issues with parsing or decompressing the contents. S3 Event Export jobs are being paused and data is queued while we investigate and deploy a fix, at which time jobs will resume.
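To illustrate the kind of downstream decompression failure described in this incident, the sketch below shows how trailing padding bytes appended after a compressed payload can be detected and tolerated on the consumer side. It assumes gzip-compressed export objects and NUL-byte padding; neither detail is confirmed by the incident record, so treat this as a hypothetical illustration rather than PlayFab's actual export format or tooling.

```python
# Hypothetical illustration (not PlayFab tooling): detect and skip trailing
# padding bytes that were appended after a gzip-compressed export object.
import gzip
import zlib


def decompress_tolerating_padding(blob: bytes) -> bytes:
    """Decompress a gzip blob even if extra bytes follow the gzip stream."""
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 => gzip wrapper
    payload = decomp.decompress(blob) + decomp.flush()
    if decomp.unused_data:
        # Bytes left over after the gzip stream ended; strict readers treat
        # these as corruption, which matches the parsing failures described.
        print(f"ignored {len(decomp.unused_data)} trailing byte(s) of padding")
    return payload


if __name__ == "__main__":
    clean = gzip.compress(b'{"EventName": "player_logged_in"}\n')  # sample record
    padded = clean + b"\x00" * 8  # simulate extra padding bytes after the stream
    print(decompress_tolerating_padding(padded).decode())
```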
Report: "Economy V2 increase in Service Unavailable errors"
Last update: On December 5th, 2024, between 5:48 AM and 8:30 AM PST, some customers experienced intermittent errors and delays when accessing PlayFab's Catalog API. The incident was caused by an isolated issue which resulted in connection failures to the Catalog APIs. We resolved the issue by restarting the problematic instance and applying an ad-hoc fix.
### Impact
During the incident, customers faced reduced Catalog API availability.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
* Enhanced our monitoring and alerting systems to detect and report any network anomalies.
Economy V2 Catalog APIs are experiencing an increase in 503 Service Unavailable errors.
Report: "Economy V2 Service Availability"
Last update: The issue has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "5XX error rate and higher latency across many PF APIs"
Last updateHigh number of 5XX errors and higher latency across multiple PlayFab APIs
Report: "Economy V2 APIs timing out"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Data Explorer advanced not working for some titles"
Last update: Between 13:07 PST on November 14th, 2024, and 02:48 PST on November 15th, 2024, some customers experienced issues accessing player data from the Data Explorer pages in PlayFab's Game Manager. The incident was caused by a misconfigured internal URL in the service powering the Data Explorer page. The issue was resolved by a configuration update that corrected the URL.
### Root Cause Analysis
The incident was triggered by a recent service update that changed how Game Manager sends requests to the PlayFab Insights databases. A bug in the code caused the incorrect domain to be used for some titles, resulting in errors when users tried to query past events.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
* Fixed the bug in the code to ensure the correct domain configuration.
Between 2024-11-13 06:40 UTC and 2024-11-15 00:45 UTC, new titles and titles that reset their query connections may have been unable to query data using Data Explorer advanced. Data Explorer basic was not impacted by this issue. There was no impact to the underlying data or data ingestion during this time.
Report: "Developer authentication to PlayFab UI"
Last update: The incident has been resolved.
A fix has been implemented; we're monitoring now.
We're aware of an issue with authentication to the PlayFab web interface. We believe the source of the issue is understood and are attempting to mitigate.
Report: "Increased API Error Response Rate for Redemption"
Last update: On November 11th, 2024, between 4:30 PM and 6:30 PM PST, some customers experienced a high rate of internal server errors when accessing PlayFab's Inventory/Redeem APIs. The incident was caused by a new deployment that introduced a code defect, resulting in increased 500 error responses. We resolved the issue by rolling back the deployment.
### Root Cause Analysis
The root cause of the incident was a human error in the code, where an incorrect config was deployed leading to new Redeem API errors.
### Action Items
· To prevent similar incidents from happening again, we improved our testing and validation procedures.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Most Redeem APIs for Economy V2 have been affected due to a recent deployment and present degraded performance. We have identified the issue and are currently rolling out a fix.
Report: "Errors in Data Explorer Search"
Last update: This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are experiencing errors in Data Explorer Search. We are investigating.
Report: "Delay in PlayStream Entity Events"
Last update: The issue has been resolved. PlayStream services should be operating normally.
We have deployed the fix for the identified issue. Engineers are continuing to monitor to ensure there are no further issues.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue that is causing delays in PlayStream entity events since 2024-11-12 23:40 PST (2024-11-13 07:40 UTC)
Report: "Delayed transaction history for Economy V2"
Last update: This incident has been resolved.
We are currently investigating this issue.
Report: "Reports and Trends reporting incorrect metrics for 11/2/2024"
Last update: The issue has been resolved and all services should be operating normally.
We are currently investigating this issue.
Report: "Data from Insights S3 Exports is delayed"
Last update: From 2024-10-29 19:15 UTC until 2024-10-30 18:15 UTC, some customers experienced delays in data exports to S3. This incident was caused by a misconfiguration in API routing that impacted the export service. The issue was resolved by correcting the configuration used for API routing.
### Impact
All Insights data exports to S3 were delayed due to errors in the export service. There was no data loss; the export service continued to retry until success. Data caught up in less than an hour after the configuration was fixed.
### Root Cause Analysis
The root cause of the incident was a human error in configuration.
### Action Items
To prevent similar incidents from happening again, we have taken the following actions:
· Improved our monitoring and alerting systems to detect and report anomalies in the export service's API requests.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating delays exporting Insights data to S3. Impact started around 2024-10-29 19:00 UTC.
Report: "Performance Issues for Economy V1 Catalog Game Manager Page"
Last update: Slow performance on the Economy v1 catalog page has been resolved.
A fix has been implemented and we're monitoring the results.
We have identified a performance issue on the Economy V1 catalog Game Manager page, where catalogs with more than 300 items have difficulty loading.
Report: "Incomplete Title Overview data in Game Manager"
Last update: The incident has been resolved; dashboards now load as expected.
The issue has been identified and a fix is being implemented.
We are currently investigating an issue that is preventing some data from loading in the Title Overview page in Game Manager.
Report: "Multiplayer servers are experiencing issues - [Region: South Central US]"
Last update: The issue has been resolved and all services should be operating normally.
The issue is now resolved.
Quick update: this issue still persists, and we recommend that customers use other regions (such as East US) instead of South Central US at this time.
This issue continues to persist. We recommend that customers use other regions (such as East US) instead of South Central US at this time.
We are continuing to work on a fix for this issue.
We are currently experiencing service issues due to limited compute capacity availability in the ‘South Central US’ Azure region. The team is actively working to resolve this issue and restore full service as quickly as possible. We apologize for any inconvenience this may cause and appreciate your patience.