Historical record of incidents for Pandium
Report: "Services inaccessible"
Last update: All services are inaccessible or responding slowly due to an issue with our cloud hosting provider. We will update as soon as we know more.
Report: "Some runs firing with delays"
Last update: This issue has been resolved and runs are processing normally.
We have identified the issue and deployed a fix.
Some runs in the US region are firing with delays. We are investigating.
Report: "Runs processing with delays"
Last update: This issue has been resolved.
A fix has been implemented and we are monitoring recovery.
The issue has been identified and a fix is being implemented.
The issue with runs has been identified and we are implementing a fix. No other services are affected.
Runs for some US-based accounts are processing with delays. We are investigating.
Report: "Runs slowed"
Last update: This incident has been resolved.
We are currently investigating an issue in which some runs are firing with delays.
Report: "Some runs processing with delays"
Last update: This incident has been resolved and we are continuing to monitor.
The issue has been identified and we are working on a fix.
We are investigating.
Report: "Runs slowed"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring.
Runs are starting with delays as we investigate the underlying cause.
Report: "Runs degraded"
Last update: The underlying service issue causing instability has been resolved and runs are firing. We are continuing to monitor for the next few hours and will open a new incident if necessary.
A fix has been implemented and runs are recovering. We are closely monitoring and will update when fully resolved.
The Pandium Integration Hub is fully operational; however, jobs are still running with delays. We will provide an update as soon as possible.
We are continuing to work on a fix for this issue.
We have identified that there is an issue with our underlying cloud hosting provider and we are working with them to implement a fix. They have escalated this issue and we will provide an update as soon as possible.
A fix has been implemented and we are monitoring recovery.
After recovery, we are experiencing a different issue with our underlying platform. We are investigating.
We are continuing to monitor for any further issues.
The released fix was effective and we are monitoring platform recovery.
The issue has been identified and a fix has been released.
We have identified the issue and are investigating a fix.
We are continuing to investigate this issue.
Some runs are firing slowly due to an intermittent issue with token management. We are investigating.
Report: "API responses exceeding error limits."
Last update: We will continue to monitor. All systems green.
We have increased replicas and are seeing a reduction in error counts. Monitoring.
API responses exceeding error limits. We are investigating.
Report: "Degraded Performance"
Last update: Scheduled runs have caught up, and we have not experienced any more issues with the control plane.
A fix has been implemented. We are now monitoring recovery.
The underlying issue has been identified. The etcd cluster of the control plane is malfunctioning. We are working on mitigation.
The issue is with the control plane of our underlying hosting provider. We are working with them on mitigation.
The underlying control plane is under high load. We are investigating. Runs and the API are running slowly.
Report: "Runs were failing due to inability to access secrets"
Last update: Runs were failing across the platform due to a service that was stuck in a terminating state.
Report: "Runs Degradation"
Last update: This incident has been resolved and we're monitoring to ensure performance.
Activity reports are slow to load via the UI. There is no issue with runs or integrations.
Runs are currently processing slower than normal. We are currently addressing the issue and will update shortly.
Report: "Runs degraded after core system restart"
Last update: The underlying issue that caused webhook runs to be queued is resolved. New manual, scheduled and webhook runs are all functioning normally while queued webhook payloads are being processed.
While a core system restarted, runs were saved and queued, including webhook payloads. They are currently processing and we expect this issue to resolve shortly. We are continuing to monitor.
Report: "Some APIs degraded"
Last update: This incident has been resolved.
We are monitoring a fix.
We have identified an issue and are implementing a fix.
Some front-ends are unavailable and we are investigating. Runs are not affected.
We are investigating this issue.
Report: "Degraded APIs"
Last update: This issue has been resolved.
A fix has been implemented and we are monitoring.
We have identified an issue we believe is the cause for the API instability.
Update: All jobs are continuing to run.
We are experiencing intermittent issues with our APIs. We are investigating the cause.
Report: "Some marketplaces degraded"
Last update: This issue is resolved.
A fix has been implemented and we are monitoring.
We are currently investigating this issue. Runs are not affected.
Report: "Degraded APIs"
Last update: This incident has been resolved.
The API is recovering and we will continue to monitor.
Some pages may be slow to load. We are working to recover.
Report: "Some sites inaccessible"
Last update: This issue has been resolved.
We identified an issue where some sites were inaccessible. We implemented a fix and are monitoring.
Report: "Marketplace and admin unavailable"
Last update: This issue has been resolved.
We are continuing to investigate this issue.
Some instances of the admin dashboard and marketplace are unavailable. We are investigating this issue.
Report: "API Slowness"
Last update: This incident has been resolved.
After an earlier incident, we are continuing to monitor as we are experiencing intermittent API slowness.
Report: "Degraded APIs"
Last update: This issue is resolved.
We are continuing to monitor for any further issues.
All front-ends have recovered. We are continuing to monitor.
We have identified the issue and have implemented a fix. APIs are recovering.
We have identified an issue with a core service and are investigating. Scheduled jobs are continuing to run. We will update when we have more information.
Report: "Marketplace unavailable"
Last update: This issue has been resolved. Apologies for the inconvenience.
A fix has been implemented and we are monitoring.
We have identified the issue and are deploying a fix.
The in-app marketplace is unavailable for some customers. We are actively investigating. Runs are continuing but the UI is unavailable.
Report: "Runs failing"
Last update: Runs are running successfully. We will continue to investigate root causes, but the immediate issue has been resolved.
A fix has been deployed that has mitigated the issue. There seems to have been an issue where stale configs were applied. We are continuing to monitor as runs come back online.
This issue has been identified and we are working on a fix.
We have restored a key service and are continuing to investigate the underlying issue.
We are continuing to investigate this issue.
We are currently investigating an issue that is causing some runs to fail.
Report: "API is down"
Last update: This incident has been resolved.
Some scheduled syncs may have been skipped.
We are continuing to monitor for any further issues.
We are continuing to monitor for any further issues.
We have resolved all issues with the API. We are currently monitoring.
We are continuing to work on a fix for this issue.
We are experiencing issues with our API. We have identified the issue and are working on a mitigation.
Report: "Frontend Outage"
Last update: The issue seems to be resolved. We will keep an eye on it.
Major outage of Admin and Marketplace Dashboards caused by a Google Networking issue. We are actively monitoring.
Report: "Degraded API and Dashboard Experience"
Last update: We were experiencing higher than normal error rates, but all is back to normal.
Report: "This is an example incident"
Last update: When your product or service isn’t functioning as expected, let your customers know by creating an incident. Communicate early, even if you don’t know exactly what’s going on.
Empathize with those affected and let them know everything is operating as normal.
As you continue to work through the incident, update your customers frequently.
Let your users know once a fix is in place, and keep communication clear and precise.