Frontegg

Is Frontegg Down Right Now? Check whether there is a current outage.

Frontegg is currently Operational

Last checked from Frontegg's official status page

Historical record of incidents for Frontegg

Report: "Service Outage"

Last update
postmortem

**Root Cause Analysis (RCA): DDoS Attack Incident**

**Incident Summary**
On May 23, 2025, between 16:53 and 17:16 UTC, our service in the Europe region experienced a temporary outage due to a sophisticated DDoS attack. Despite mitigation efforts by Cloudflare, the scale and speed of the attack overwhelmed our system's autoscaling capabilities, leading to service unavailability for a short period.

**Timeline of Events:**
* **16:53 UTC:** DDoS attack begins.
* **16:54 UTC:** Monitoring system alerts the on-call team.
* **17:03 UTC:** On-call team identifies the DDoS attack.
* **17:10 UTC:** Attack characteristics scoped.
* **17:15 UTC:** Blocking and rate limit rules applied.
* **17:16 UTC:** Service recovers.

**Root Cause**
The attack's high volume and rapid escalation exceeded our system's ability to scale automatically in time, causing service disruption.

**Incident Resolution & Next Steps:**
To resolve the incident, we took the following actions:
* We successfully blocked the malicious traffic and hardened our defenses.
* Preventive measures are being implemented, including enhancing our CDN and infrastructure autoscaling, automated tools to identify attacks faster, and DDoS protection in collaboration with Cloudflare.
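The mitigation applied blocking and rate-limit rules at the Cloudflare edge. As a rough illustration of the rate-limiting idea only (not Frontegg's or Cloudflare's actual rules; the window length, per-IP budget, and function names below are assumptions), here is a minimal fixed-window limiter sketch in TypeScript:

```typescript
// Minimal fixed-window rate limiter keyed by client IP. All names and limits
// are hypothetical; the real mitigation in this incident used Cloudflare
// blocking and rate-limit rules, not application code.
type WindowState = { windowStart: number; count: number };

const WINDOW_MS = 60_000;   // 1-minute window (assumed value)
const MAX_REQUESTS = 300;   // per-IP budget per window (assumed value)

const windows = new Map<string, WindowState>();

export function allowRequest(clientIp: string, now: number = Date.now()): boolean {
  const state = windows.get(clientIp);

  // Start a fresh window if none exists or the previous one expired.
  if (!state || now - state.windowStart >= WINDOW_MS) {
    windows.set(clientIp, { windowStart: now, count: 1 });
    return true;
  }

  // Reject once the per-window budget is exhausted.
  if (state.count >= MAX_REQUESTS) {
    return false;
  }

  state.count += 1;
  return true;
}

// Example: a caller would reject over-budget requests with HTTP 429, e.g.
// if (!allowRequest(req.ip)) { res.statusCode = 429; res.end("Too Many Requests"); }
```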

resolved

This incident has been resolved.

Timeline of Events:
* 16:53 UTC: DDoS attack begins.
* 16:54 UTC: Monitoring system detects service degradation and alerts the on-call team.
* 17:03 UTC: On-call team identifies that a DDoS attack is ongoing.
* 17:10 UTC: On-call team scopes the characteristics of the attack (volume, source IPs, and traffic patterns).
* 17:15 UTC: The on-call team applies blocking and rate limit rules on Cloudflare to mitigate the attack.
* 17:16 UTC: System recovers and service is restored.

Report: "Service Outage"

Last update
Monitoring

A fix has been implemented and we are monitoring the results.

Investigating

We're currently investigating an issue affecting some users. Our team is working to identify the cause and will provide updates as we learn more.

Report: "Service Degradation"

Last update
Monitoring

A fix has been implemented and we are monitoring the results.

Investigating

We're currently investigating an issue affecting some users. Our team is working to identify the cause and will provide updates as we learn more.

Report: "EU environment issues"

Last update
resolved

This incident has been resolved.

monitoring

The fix has been rolled out, and all indications are positive. Backoffice sync will be delayed

identified

The fix has been implemented and is being rolled out.

identified

We are continuing to work on a fix for this issue.

identified

We have identified the issue and implemented a fix.

investigating

We are currently investigating this issue.

Report: "EU environment issues"

Last update
Investigating

We are currently investigating this issue.

Report: "Infrastructure Upgrades"

Last update
Scheduled

We will be performing scheduled maintenance on our infrastructure during this time.

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Report: "US Environment Degradation - Potential 504 Errors"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are currently implementing a patch to improve system performance. Some services may experience temporary disruptions

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented

investigating

We are currently investigating the issue.

Report: "Increased Latency in EU region"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Monitoring

investigating

We are currently investigating this issue.

Report: "Email service"

Last update
resolved

The incident is resolved. Email should be sent now.

investigating

We are working with our email provider on a solution at the moment

investigating

Some emails are not being sent, for example Magic code and Magic link emails. We are investigating with our email provider.

Report: "[EU Region] - Sporadic System Latency for Traffic Originating from IL"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Increased reports in issues loading Hosted Login Page"

Last update
resolved

This incident has been resolved.

monitoring

We are monitoring the issue and in contact with Azure

investigating

We are receiving reports of sporadic issues loading the hosted login page for some users. It does not appear to be widely affecting usage, and the team is currently investigating. The issue appears to be caused by an Azure incident affecting our CDN service.

Report: "EU Degraded State - Partial Outage"

Last update
postmortem

# **Root Cause Analysis (RCA) Report**

**Date and Time**: July 24, 2024
**Duration**: 22 minutes
**Affected Services**: Authentication and core services
**Impact**: Requests from customers in the EU region were hanging and returned as 504 timeouts
**Reported By**: Internal monitoring systems and customers

**Executive summary:**
On Wednesday, July 24th, at 08:43 GMT, Frontegg's internal monitoring systems indicated that the API Gateway encountered an issue following the deployment of a new OpenTelemetry propagator (OTEL instrumentation), causing service disruptions in the EU. As a result, some of our customers experienced timeout errors (HTTP status 504) returned by Frontegg. During the upgrade of our API Gateway, Frontegg also updated the OpenTelemetry library. Due to a misconfiguration in the data handling settings, this update inadvertently caused the system to send data one piece at a time instead of in efficient batches: OTEL transmitted millions of traces individually rather than in aggregated batches. Although our system was rigorously tested under various conditions, the high load in the EU environment caused our auto-scaling mechanism to lag behind the incoming traffic. This led to the API Gateway being overwhelmed by the volume of client requests.

**Cause Analysis:**
The primary cause of the incident was the deployment of new OTEL instrumentation in the API Gateway, which led to a significant increase in trace data volume. Contributing factors included:
* The API Gateway's OTEL was configured with the BasicPropagator instead of a BatchPropagator, sending each trace individually as part of the request flow.
* The rapid rise in HTTP requests to the OTEL collector overloaded the API Gateway's ability to handle incoming requests. Although it autoscaled, it could not keep up with the number of requests.
* With the increase in traces being sent, the OTEL Collector failed to handle millions of traces at such a rate, increasing request handling time, which in turn caused a further increase in API Gateway HTTP requests.

**Customer Impact**
During the incident, customers in the European region experienced significant service degradation. Specific issues included failures in hosted login monitors and general service instability.

**Mitigation and resolution:**
Upon receiving the initial alerts, the Frontegg team began investigating the issue promptly. After identifying the problem with the OTEL propagator and collector, we increased the allocated resources and reverted to the latest working version. Following this change, the systems returned to normal operations.

**Mitigation**:
* Increased the CPU allocation for the OTEL Gateway to handle the increased workload.
* Reverted to the latest working API Gateway version.

**Resolution**:
* Restarted the API Gateway to clear hanging requests and stabilize the OTEL Gateway.
* Deployed a new version of the API Gateway with the correct configuration.

**Prevention and Future steps:**
* **Enhance OTEL Propagator**: Implement batch processing, asynchronous handling, and strict timeouts.
* **Upgrade OTEL Gateway**: Allocate additional resources to the OTEL Gateway and implement autoscaling to handle increased workloads effectively.
* **Implement Aggressive Timeouts**: Implement stringent timeout policies for all HTTP requests that are not customer-related. This measure will proactively prevent delays and mitigate the risk of unresponsive requests.
* **Stress tests**: Change the deployment pipeline to include stress testing instead of the nightly testing suite.

**Communication:**
**Enhance Status Page Communication**: Ensure the status page provides clear and timely updates during incidents. Develop and maintain standardized templates for incident communication to facilitate prompt and consistent information, even if the root cause is not immediately identified.
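The root cause above comes down to spans being exported one at a time instead of in batches. As a general illustration of batched trace export with the OpenTelemetry JS SDK (1.x API; this is not Frontegg's gateway configuration, and the collector endpoint and tuning values are assumptions), a minimal sketch:

```typescript
// Illustration only: export traces in batches with the OpenTelemetry JS SDK
// (1.x API). Endpoint, service wiring, and tuning values are assumptions.
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import {
  BatchSpanProcessor,
  // SimpleSpanProcessor would export every span individually, which is the
  // failure mode described in the RCA, so it is deliberately not used here.
} from "@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const exporter = new OTLPTraceExporter({
  url: "http://otel-collector:4318/v1/traces", // assumed collector endpoint
});

const provider = new NodeTracerProvider();

// Batch spans before export instead of sending each one as its own HTTP request.
provider.addSpanProcessor(
  new BatchSpanProcessor(exporter, {
    maxQueueSize: 4096,         // spans buffered before dropping (assumed)
    maxExportBatchSize: 512,    // spans per export request (assumed)
    scheduledDelayMillis: 5000, // flush interval in ms (assumed)
  })
);

provider.register();
```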

resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "US Degraded State - Partial Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "US region services partial outage"

Last update
postmortem

## **Executive summary:**
On June 3rd, at 12:06 GMT, the Frontegg team received an indication from our monitoring system of increased latency for refresh token requests (average greater than 750 ms) in our US region. Starting at 12:12 GMT, the first customer reached out to Frontegg noting request timeouts. At 12:13 GMT, we updated our status page and officially began the investigation. As a preliminary measure, the team began a number of different mitigation actions in an attempt to remedy the situation as quickly as possible. After seeing no improvement, at 12:30 GMT the team began a full cross-regional disaster recovery protocol. At 12:40 GMT we also began a same-region disaster recovery protocol (starting a new same-region cluster) as part of the escalation to ensure a successful recovery. At 13:25 GMT we began to divert traffic to the new same-region cluster, and by 13:30 GMT we saw a stabilization of traffic to Frontegg. Upon further investigation, we discovered the root cause to be a networking issue inside our main cluster, which caused a chain reaction affecting the general latency of the cluster. Additionally, we are working with our cloud provider to gather further details on the event from their side.

## **Effect:**
From 12:06 GMT to 13:30 GMT on June 3rd, Frontegg accounts hosted in our US region experienced substantial latency on a significant portion of identity-based requests to Frontegg. This meant many requests timed out, causing users to be unable to log in or refresh their tokens. Additionally, access to the Frontegg Portal was partially blocked due to this issue.

## **Mitigation and resolution:**
Once the Frontegg team received the initial alert on refresh latency, we began an investigation into our traffic, request latency, workload, hanging requests, and database latency. Upon finding inconclusive results, the team initiated a handful of mitigation efforts:
* At 12:14 GMT, we increased our cluster workload.
* At 12:30 GMT, the team began a full cross-regional disaster recovery protocol.
* At 12:40 GMT, we also began a same-region disaster recovery protocol (starting a new same-region cluster).
* By 13:00 GMT, we increased the number of Kafka brokers as an additional mitigation measure.

After a preliminary check on the new same-region cluster, we began diverting traffic to it. By 13:30 GMT we saw a stabilization of traffic to this cluster and moved the incident to monitoring. We continued to monitor traffic for the next hour before resolving the incident.

## **Preventive steps:**
* We are adding a same-region hot failover cluster for quick mitigation of P0 issues.
* We are applying finer-grained rate limits on all routes within the system to add additional protection for our cluster health.
* We are working closely with our cloud provider to gather additional information on the event in order to increase the predictability of future events.

At Frontegg, we take any downtime incident very seriously. We understand that Frontegg is an essential service, and when we are down, our customers are down. To prevent further incidents, Frontegg is focusing all efforts on a zero-downtime delivery model. We apologize for any issues caused by this incident.

resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "[US Instance] - Authentication Service in degraded state"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Email Service in Degraded State"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

Frontegg's email sending service is experiencing issues. We've identified the issue and are working with our service provider on a fix.

Report: "CA region - Portal access"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

The portal in the Canada region is inaccessible; we are investigating the issue.

Report: "Degraded performance in the US region"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "EU region - System degraded performance"

Last update
resolved

This incident has been resolved.

investigating

We are experiencing a system degradation, user login flows might be affected

Report: "[US Environment] - Backoffice opperating with degraded performance"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified and a fix is being implemented.

Report: "US region - Management APIs and MFA service degradation"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are experiencing degradation in the MFA service.

Report: "Webhooks performance degradation"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Entitlements Service is in a degraded state"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Management portal is partially available"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Latency to Partial Traffic on US Cluster"

Last update
resolved

Some traffic on the US cluster experienced high latency for roughly 1 hour, resulting in some users being unable to log in or receiving 504 (timeout) responses to Frontegg identity calls.

Report: "Webhooks & Backoffice services in US cluster are in a degraded state"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Frontegg services in EU cluster are in a degraded state"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

Report: "Partial Service Degradation in EU Cluster"

Last update
resolved

This incident has been resolved. RCA Investigation is ongoing at the moment.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "DB upgrade on U.S. region"

Last update
resolved

The maintenance has been successfully completed

monitoring

The maintenance was completed and we are monitoring the results.

identified

The maintenance is still in progress

identified

The maintenance is still in progress

identified

The maintenance is still in progress

identified

A maintenance procedure on our DB may result in intermittent 500s in identity flows. We applied a caching mechanism to make sure that no data is lost during the flow in case of an error. Note: the issue should not affect active refresh tokens.
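The update above describes a caching mechanism that keeps identity-flow data from being lost if the database errors during maintenance. A very rough sketch of that fallback pattern follows (all types, names, and the replay strategy are assumptions, not Frontegg's implementation):

```typescript
// Illustrative only: buffer writes when the database errors during maintenance,
// then replay them once the database recovers. Types and names are hypothetical.
type IdentityRecord = { userId: string; payload: unknown };

interface Database {
  save(record: IdentityRecord): Promise<void>;
}

const pending: IdentityRecord[] = []; // in-memory buffer acting as the cache

export async function saveWithFallback(db: Database, record: IdentityRecord): Promise<void> {
  try {
    await db.save(record);
  } catch {
    // DB is unavailable (e.g. mid-maintenance): keep the record so the
    // identity flow can complete without losing data.
    pending.push(record);
  }
}

export async function replayPending(db: Database): Promise<void> {
  // Re-attempt buffered writes once maintenance is over.
  while (pending.length > 0) {
    const record = pending[0];
    await db.save(record); // if this throws, remaining records stay buffered
    pending.shift();
  }
}
```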

Report: "Partial Outage on Hosted Login Service"

Last update
resolved

From 15:55 to 16:02 there was a partial outage on our hosted login service. The root cause is currently under investigation, but the issue has been mitigated.

Report: "Degraded performance on custom domains in the EU region"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Frontegg Services are showing Degraded Performance in EU & US"

Last update
postmortem

### **Executive summary:**
On Wednesday, May 31st, 2023, at 12:55 GMT we deployed a minor version to one of our services. Shortly after, at 12:56 GMT, Frontegg's US monitoring system started sending alerts for an authentication service that was not performing as expected, and the team immediately began investigating the issue. At 13:01 GMT we started getting alerts from Frontegg's EU monitoring as well regarding the same service, and shortly after we started to get complaints from customers. At 13:04 GMT, 8 minutes after we started getting the alerts, the team concluded that the issue was caused by a recently deployed change. As part of the change, there was a database migration for one of our primary services. However, the migration job didn't run due to an edge race condition in our CD infrastructure, causing the service to remain in a schema mismatch state. At this point we immediately started a rollback process for both the EU and US regions, which was completed by 13:16 GMT. Once the rollback completed, we confirmed that our services were working as expected again, and customers also reported that they were no longer experiencing issues.

**Impact:**
Most requests to customers' custom Frontegg domains resulted in 401/404 responses or an inability to authenticate.
For the EU region: between 12:59 and 13:16 GMT.
For the US region: between 12:56 and 13:14 GMT.

### **Mitigation and resolution:**
Following the monitoring alerts, the incident response team immediately identified the potentially corrupted service and started a rollback to the previous successful deployment.

### **Preventive steps:**
* We defined a gated process for deploying DB migration changes.
* A schema validation on service init was added to prevent schema mismatch cases.
* We will add deployment validation that fails the deployment if the migration didn't run.
* We will remove the high dependency on that specific service as a single point of failure for the main system flows.
* We will reduce service rollback time by running only the relevant part of the CD pipeline.
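One of the preventive steps above is schema validation on service init. A minimal sketch of that idea, assuming a node-postgres client and a hypothetical schema_migrations table (this is not Frontegg's actual mechanism, and the table, column, and migration names are assumptions):

```typescript
// Sketch only: verify on startup that all expected DB migrations have been
// applied, and refuse to serve traffic otherwise. Table/column names and the
// expected version list are assumptions.
import { Client } from "pg";

const EXPECTED_MIGRATIONS = ["20230510_add_tenants", "20230531_alter_users"]; // hypothetical

export async function assertSchemaUpToDate(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const result = await client.query("SELECT version FROM schema_migrations");
    const applied = new Set(result.rows.map((row) => row.version as string));

    const missing = EXPECTED_MIGRATIONS.filter((m) => !applied.has(m));
    if (missing.length > 0) {
      // Failing fast keeps the service from running against a mismatched schema.
      throw new Error(`Schema mismatch: missing migrations ${missing.join(", ")}`);
    }
  } finally {
    await client.end();
  }
}
```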

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Degradation in Tenant API Token authentication"

Last update
resolved

Users with tenant API tokens created with client credentials had an issue authenticating the API token: the authenticate API token route was returning a 400 response. Access-token API tokens were fully operational.

Report: "Networking issue for US region working with Frontegg EU cluster"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are working with our external services and providers on this issue.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Degraded performance in US region"

Last update
resolved

This incident has been resolved.

monitoring

There might be delays in responding to some requests in the US region

Report: "Api are having Performance degradation"

Last update
postmortem

### **Executive summary:**
On August 15th, 2022 at 02:01 IST (UTC+2), Frontegg underwent a sophisticated, organized DDoS attack against its subdomains. The attackers used multiple servers spread across a variety of Digital Ocean IPs. Each server executed a low number of requests per second, so our WAF did not trigger rate-limiting rules, yet we recognized that many of the targeted paths were related to known weaknesses of the WordPress engine. By 03:21 the attack had been successfully mitigated. At 04:46 a second organized attack began. The restrictions put in place during the first attack helped mitigate the second one, and by 05:30 all traffic had returned to normal.

### **Impact:**
The incident caused degraded performance of our API gateway. As a result, our API returned 504 and 524 errors for part of the traffic over the course of the incident. The majority of these errors were returned between 02:01 IST and 02:30 IST, when our mitigation efforts began to take effect. A majority of traffic was still able to go through without error during this time.

### **Mitigation and resolution:**
Our initial response to the attack was to tighten our rate limiting and WAF constraints. This initial step was implemented at 02:30 IST. Once we understood the level of sophistication and distribution of the attack, we implemented changes at the application level, including a different routing mechanism, and added more specific WAF constraints based on the origins of the attacking traffic, which took effect by 03:21 IST.

### **Preventive steps:**
In order to prevent attacks like this in the future, we are implementing a more sophisticated route-blocking mechanism in our API gateway. Additionally, we have reported the incident to the cloud provider which hosted the majority of the attacking traffic, and we are consulting with our WAF provider for further guidance on preventing such attacks.
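The preventive steps mention a more sophisticated route-blocking mechanism for probe paths such as the WordPress ones seen in this attack. A minimal sketch of the general idea in TypeScript (the blocked-path list and wiring are assumptions, not Frontegg's gateway code):

```typescript
// Illustration only: reject requests to well-known CMS probe paths before they
// reach application services. Path list and handler wiring are assumptions.
import { createServer } from "node:http";

const BLOCKED_PATH_PREFIXES = [
  "/wp-login.php",
  "/wp-admin",
  "/xmlrpc.php",
  "/wp-content",
];

export function isBlockedPath(path: string): boolean {
  const normalized = path.toLowerCase();
  return BLOCKED_PATH_PREFIXES.some((prefix) => normalized.startsWith(prefix));
}

// Example wiring with Node's built-in HTTP server:
createServer((req, res) => {
  if (isBlockedPath(req.url ?? "")) {
    res.statusCode = 403; // drop probe traffic early
    res.end("Forbidden");
    return;
  }
  res.statusCode = 200;
  res.end("OK"); // placeholder for real routing
}).listen(8080);
```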

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Frontegg services are in a Degraded State"

Last update
resolved

Frontegg services were in a degraded state, causing some users to experience login issues. The problem was fixed and is now under close monitoring on our side.

Report: "Partial outage in Frontegg services for some regions due to Cloudflare major outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix was implemented and we have bypassed Cloudflare services

identified

Our DNS and WAF provider Cloudflare is partially down in some regions.

investigating

We are currently investigating this issue.

Report: "Management Portal in degraded state"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "US Region in partial outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "US region - Degraded performance in Frontegg services"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Degraded performance on Frontegg services"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Delay with sending webhooks"

Last update
resolved

This incident has been resolved.

identified

We are working on a fix to reduce the delay. Will continue to update

identified

We are currently investigating reports of delays in sending webhooks.

Report: "We have identified an issue with processing user operational emails"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Delay with audit logs"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating reports from some of our customers of delays with audit logs.

Report: "Frontegg Portal is in a Degraded State"

Last update
resolved

The incident has been resolved, all systems operational.

monitoring

A fix has been implemented and we are monitoring the results

Report: "Frontegg Portal is in a Degraded State"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

Report: "Connectivity issues"

Last update
resolved

We are currently investigating this issue.

Report: "Performance degradation in Portal"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Issues with hosted login"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

We are working on a fix. This does not affect customers using the embedded version of the Frontegg login

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "US Server In a Degraded State"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.

Report: "Frontegg services where in a Degraded Performance"

Last update
resolved

During a maintenance operation on the database, secure access services were in a degraded performance state.

Report: "Frontegg services are in a Degraded State"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating this issue.

Report: "Frontegg services are in a Degraded State"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating this issue.