Historical record of incidents for Factorial
Report: "Scheduled Job Disruption"
Last update: We are currently experiencing an unexpected event that has resulted in the loss of some scheduled jobs. We are actively investigating the issue and working to restore normal operations as quickly as possible. We have identified the following impact:
- Automatic Breaks: users may notice inconsistent information regarding automatic breaks.
- Time Off and Expenses Reports: users may need to retrigger their time off and expenses reports to ensure accurate processing.
- Other minor side-effects, such as missed notifications.
We apologize for any inconvenience this may cause and appreciate your understanding as we work to resolve the issue. We will provide updates as more information becomes available. Thank you for your patience. Next Update: before 17/06/2025 12:00 CEST
Report: "Major outage"
Last update: All systems are now fully operational. Thank you for your patience, and we apologize for any inconvenience caused.
All systems are now back online. We're continuing to actively monitor the platform to ensure ongoing stability.
The issue has been identified, and our Platform team is actively working on implementing a resolution.
We are currently experiencing a major outage affecting the Factorial application and its related services. As a result, the platform is temporarily unavailable. Our engineering team is actively investigating the issue and working to restore services as quickly as possible. We will provide updates as soon as more information becomes available. We sincerely apologize for the inconvenience this may cause and appreciate your patience and understanding.
Report: "Major outage"
Last update: All systems are now fully operational. Thank you for your patience, and we apologize for any inconvenience caused.
All systems are now back online. We're continuing to actively monitor the platform to ensure ongoing stability.
The issue has been identified, and our Platform team is actively working on implementing a resolution.
We are currently experiencing a major outage affecting the Factorial application and its related services. As a result, the platform is temporarily unavailable. Our engineering team is actively investigating the issue and working to restore services as quickly as possible. We will provide updates as soon as more information becomes available. We sincerely apologize for the inconvenience this may cause and appreciate your patience and understanding.
Report: "Small amount of errors during a planned test operation"
Last update: Today between 04:39 and 04:50 UTC there was an incident that resulted in a temporary increase in error rates. This happened during a planned test operation. Our team promptly identified and resolved the issue, and is investigating the root cause to ensure it is not repeated in upcoming tests. We appreciate your understanding.
Report: "Degraded latency and increased error rates - failing cache system"
Last update: We have identified the root of the problem and have deployed fixes to restore the service performance. We apologize for the inconvenience caused.
We are investigating an issue where one of our cache nodes has become unresponsive, leading to a global performance degradation. We are working on restoring the service as soon as possible.
Report: "Demo environment temporarily unavailable"
Last update: We experienced a temporary outage in our demo environment between 17:52 and 18:04 UTC today, following a maintenance operation. During this time, users may have encountered issues accessing the demo environment. Our team identified the root cause of the issue and quickly implemented a fix. The demo system is now fully operational. We are conducting a thorough review of the incident to prevent future occurrences.
Report: "Incorrect avatars displayed"
Last update: On February 6, 2025, between 12:20 and 15:09, some components of the Factorial application may have displayed incorrect avatars due to an issue in our last release. The issue was reported at 14:48, and our internal team immediately resolved it. If you’re still experiencing issues, please refresh the page in your browser. If the issue persists you can reach us via your Customer Support Portal.
Report: "Slower response times and timeouts - advanced reports"
Last update: The performance issue we started experiencing yesterday has been resolved. Please keep in mind that advanced reports are still unavailable; we will re-enable them in the next 24h. Apologies for the inconvenience. Update: Reports were re-enabled on February 5th at 8:15 UTC.
We have restored the service, although the Reports functionality is currently unavailable.
We have identified the source of the issue and we are applying a mitigation to restore service as soon as possible.
This seems to be an unrelated issue, but we are currently facing a major outage in the application. We are investigating and will update as soon as we have more details on the cause and the estimated time of recovery.
A fix has been implemented and we are monitoring the results.
Our application is experiencing slower response times than usual. This will be most noticeable at peak hours like 13:00 and 14:00 UTC, when some requests could even time out and require re-submission. We are currently working on several fixes at both the application and infrastructure layers to restore our usual quality of service. We will update this incident as soon as they are deployed. Apologies for the inconvenience.
Report: "Unusual web traffic volume altering performance of the app"
Last update: Traffic has been stable and back to usual levels for a while.
Our actions have stabilized the system and the application is now performing normally. While the number of requests has dropped significantly, we are still investigating to understand the origin of the spike.
As of 11:00 UTC / 12:00 CET, we have observed a significant increase in requests to our application, resulting in performance degradation. We are implementing mitigation measures to restore service quality while we keep investigating the root cause of the issue. We appreciate your patience and will provide further updates as we work to resolve this matter.
Report: "Service outage caused by unresponsive database"
Last update: The lock has been cleared and the service has been restored. We will perform a more thorough investigation next week and set up the mechanisms and processes required to prevent this from happening again.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Sporadic errors using the application"
Last update: The fix deployed in the previous hour has resolved the issue. The application should work normally now. We are still investigating why these errors started appearing today and how they relate to a change that was published last Thursday. Apologies for the inconvenience.
Some users are experiencing sporadic errors loading or moving around different features of the application. We have identified the potential root cause of the issue and have deployed a fix that should resolve these errors.
Report: "German domain factorialhr.de unavailable"
Last update: The issue has been resolved; factorialhr.de is now serving our website again.
The issue is still ongoing, but we expect more information today when the German DNS registry resolves our open request.
There is an issue with the German domain name (factorialhr.de). The public website and customer pages using it are currently unreachable. We have identified the problem and are working on a fix. Meanwhile, users can still access the application at https://app.factorialhr.com/. We apologize for the inconvenience caused. Next update: before Monday 25th Nov. 12:00 UTC
Report: "Increased latency, response times a lot higher than usual"
Last update: Due to an issue with under-provisioned capacity, the Factorial web application experienced significantly increased latency, resulting in very long loading times from 6:00 AM to 7:00 AM (UTC). Apologies for the inconvenience. - The Factorial Team
Report: "Service outage caused by an unresponsive database"
Last update: We experienced a brief service interruption from 03:30 PM to 03:36 PM (UTC). During this time, our database became unresponsive, which impacted the availability of our service. To mitigate the issue, our team promptly killed the suspicious process, restoring database functionality immediately. All systems are now operating normally, and we'll apply measures to prevent similar occurrences in the future. We appreciate your understanding and patience as we work through this process.
Report: "Sidebar visibility issue in Factorial web application"
Last update: Since around 11:15 CET, the sidebar in the Factorial app hasn’t been visible, making it unusable. We identified the root cause but chose to quickly reduce the impact by rolling back to a previous version at 12:00 CET. Thanks for your patience, and sorry for the inconvenience! - The Factorial Team
Report: "Increased latency and error rates"
Last update: For now the mitigation is holding and the app is working normally. We continue investigating to find a permanent fix. Update: as of 9/11 midday UTC, the root cause has been addressed. We don't expect any further side-effects from this incident. Apologies for the inconvenience. - Factorial team
We have identified the root cause of the issue and are currently working on a fix.
We are continuing to investigate this issue.
We are experiencing increased latency and higher error rates since 18:55 UTC. Our team is investigating the origin of the issue. We will provide an update as soon as we have more information.
Report: "Performance regression"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have been experiencing slow response times, leading to the app being unusable at times. We have identified the root of the issue and are now working on a solution. We expect it to be deployed during the next 2 hours.
Report: "Service outage caused by an unresponsive database"
Last update: The fix has now been deployed.
We experienced another service interruption today from 06:43 to 06:51 (UTC). The symptoms were consistent with those observed on Tuesday; however, this time we had access to additional information that enabled us to identify the root cause of the issue. We are currently working on a resolution and are committed to deploying a fix promptly. We appreciate your understanding and support during this time. If you have any questions or require further assistance, please do not hesitate to reach out through our support channels. Thank you for your continued partnership. Best regards, Factorial Infrastructure Team
Report: "Service outage caused by an unresponsive database"
Last update: We experienced a brief service interruption from 09:00 AM to 09:15 AM (UTC). During this time, our database experienced unresponsiveness, which impacted the availability of our service. To mitigate the issue, our team promptly initiated a failover to the secondary database instance. This action successfully restored service functionality, and we are pleased to report that all systems are now operating normally. We are currently investigating the root cause of the incident to ensure that we can prevent similar occurrences in the future. We appreciate your understanding and patience as we work through this process. If you have any questions or require further assistance, please do not hesitate to reach out through our channels. Thank you for your continued support. Best regards, Factorial Infrastructure team
Report: "Degraded performance on Factorial"
Last update: We are pleased to inform you that the fix has been successfully deployed. The application is now operating at its normal performance levels. Thank you for your patience during this incident.
A fix has been tested and is currently being deployed to all environments.
The application suffered a major performance degradation at 15:25 CEST; the cause has been identified and is being tackled by our Engineering team. A fix should be deployed in the next 1-2 hours. The fix to the earlier issue, which only happened under certain conditions, has produced the desired results.
We keep seeing intermittent service interruptions. Our team is hard at work to identify and fix the root cause of the issue. Apologies for the inconvenience.
We are currently investigating reports of laggy performance affecting some customers when using our application and API. Our teams have determined that this issue is unrelated to the earlier infrastructure problem encountered today. We have identified the specific conditions that lead to this performance degradation and are actively working on a resolution. We appreciate your patience as we address this matter and will provide updates as soon as more information becomes available. Thank you for your understanding.
Report: "Degraded performance on Factorial app"
Last update: We are pleased to inform you that the performance degradation issue has been successfully resolved. Our team has conducted a thorough investigation and identified enhancements to our monitoring systems. These improvements will enable us to detect and address similar situations more effectively in the future. We appreciate your patience and understanding during this incident. Thank you for your continued support.
The fail-over to the secondary database has improved the situation as expected. We are monitoring the recovery and looking at side-effects of the situation before marking the incident fully resolved.
We have decided to fail over the database to the secondary instance in another availability zone. This should resolve the issue in a matter of minutes.
The performance of the application is heavily degraded since 9:00 CEST. We are investigating the source of the issue to restore the service as soon as possible.
Report: "Delay in time tracking calculations"
Last update: The issue has been resolved. There may be some inconsistencies in the calculation that will eventually be reconciled with the shifts in the database.
The system in charge of computing the time tracking totals displayed in the application has been experiencing unusual delays since 12:00 UTC. While shifts are still being registered, today's totals may not include the most recent ones. Our team has submitted a fix and we hope to see a recovery in a matter of minutes.
Report: "Core component failure made Factorial application unavailable"
Last update: An incident occurred today, between 14:22 and 14:49 CEST, which affected the performance of our application and website. During this time, a failure in a core component of our infrastructure resulted in slower response times, increased error rates, and, ultimately, service unavailability. Our incident response team acted swiftly to identify the issue and successfully replaced the failing component, restoring full service shortly thereafter. Following the incident, our infrastructure team conducted an investigation to understand the root cause of the failure. We have since implemented improvements to our configuration to prevent similar issues from occurring in the future. We sincerely apologize for any inconvenience this may have caused and appreciate your understanding as we continue to enhance the reliability of our services. Thank you for your continued support.
Report: "Factorial application unavailable"
Last update: The service has been restored; app.factorialhr.com is fully operational again. We apologize for the inconvenience caused and will perform an investigation to ensure such errors don't happen again in the future.
A fix is underway - we expect the service to be restored in the next hour.
Due to an error introduced in our latest release, the Factorial application is currently unavailable or partially loading. Our teams have identified the source of the problem and are investigating a fix to be deployed as soon as possible.
Report: "Elevated error rates"
Last update: The affected system has been replaced. This incident is now resolved.
Our monitoring systems have detected higher error rates than usual. In most cases these are timeouts caused by a malfunctioning system. Our engineers have applied remediation and we are confirming that service levels have recovered back to normal.
Report: "Brief service interruption during Database migration"
Last update: As part of our continuous efforts to improve the application and its performance, a short, unanticipated service interruption occurred while we were upgrading our database services. We apologize for the inconvenience this event may have caused our customers and will improve our protocols to ensure this kind of interruption does not reoccur.
Report: "Full outage after routing misconfiguration"
Last update: Our team introduced a misconfiguration at 09:20 CEST with an automated deployment; immediate action was taken and the service was restored at 09:28. Despite our validation processes, this introduced an unwanted change that triggered a second downtime at 10:00 CEST. Our emergency procedure was launched and we restored our services at 10:27 CEST. We are committed to delivering exceptional services and are constantly reviewing all our processes to avoid similar inconveniences in the future.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Factorial Backend Error"
Last update: We’re back! The Backend Service should be up and running. Thanks for bearing with us.
We are investigating a Backend Service issue that might be affecting some users. We are making every effort to find a solution as soon as possible. We'll soon provide another update.
Report: "Degraded performance and requests timing out"
Last update: Due to a configuration error in an instance of our server cluster, requests to Factorial servers that hit that machine were timing out or had very slow response times. The issue has since been resolved.
Report: "Major outage"
Last update:
# What happened?
At 14:39 new content for our public pages was deployed, causing our cache to hit its maximum capacity limit. This event triggered a fallback strategy: we started requesting a third-party service to serve us the content for our public pages. This third-party service quickly became overwhelmed with requests and started applying an exponential backoff strategy, forcing our backend services to wait long periods of time to get a response, and thus making our API unresponsive.
# How did we solve it?
Increasing the maximum capacity limit of our cache fixed the issue.
# How are we going to make sure it does not happen again?
We are going to review our cache strategy, so that our whole infrastructure does not depend on it in order to function properly.
Major outage of all our services except the blog from 14:39 to 15:43.
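The postmortem above describes backend workers left waiting on a third-party fallback that applied exponential backoff. As a rough sketch of the direction the cache-strategy review points at, the snippet below caps how long a cache miss can block a worker, using hard per-attempt timeouts and a small, fixed retry budget; all names, URLs, and numbers are hypothetical, not Factorial's implementation.

```python
# Hypothetical sketch (not Factorial's actual code): a cache read with a
# bounded fallback so a struggling upstream can't stall API workers.
import time
import requests  # assumed available; any HTTP client with timeouts works

CACHE = {}                                            # stand-in for the real cache layer
FALLBACK_URL = "https://example-cdn.invalid/pages/"   # placeholder upstream
HARD_TIMEOUT_S = 2.0                                  # never wait longer than this per attempt
MAX_ATTEMPTS = 2                                      # small, fixed retry budget

def get_public_page(slug: str) -> str | None:
    """Return page content, preferring cache; degrade gracefully on a miss."""
    if slug in CACHE:
        return CACHE[slug]

    delay = 0.1
    for attempt in range(MAX_ATTEMPTS):
        try:
            resp = requests.get(FALLBACK_URL + slug, timeout=HARD_TIMEOUT_S)
            resp.raise_for_status()
            CACHE[slug] = resp.text
            return resp.text
        except requests.RequestException:
            if attempt + 1 == MAX_ATTEMPTS:
                break
            time.sleep(delay)                 # short, capped pause instead of unbounded backoff
            delay = min(delay * 2, 0.5)

    return None  # caller serves a stale or minimal page rather than blocking the API
```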
Report: "Increased latency, response times a lot higher than usual"
Last update: A recent change in one of our API endpoints made it much slower than usual, which after some time created a snowball effect in our application servers: requests from all endpoints were queued up and not served in a timely manner.
We're seeing high response times across all our features. We're currently investigating the issue.
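As an illustration of the snowball effect described in this report (not Factorial's code), the toy simulation below shows how one regressed endpoint can tie up a small, fixed worker pool so that unrelated fast requests queue behind it; the pool size and timings are invented.

```python
# Hypothetical illustration: with a fixed pool of workers, a handful of slow
# requests delays every other request that shares the pool.
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 4                # stand-in for a small application-server worker pool
FAST_S, SLOW_S = 0.01, 1.0

def handle(is_slow: bool) -> None:
    time.sleep(SLOW_S if is_slow else FAST_S)   # simulated endpoint work

if __name__ == "__main__":
    # 1 in 10 requests hits the regressed endpoint; the rest are cheap.
    workload = [(i % 10 == 0) for i in range(100)]
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(handle, workload))
    print(f"total wall time: {time.monotonic() - t0:.1f}s")
    # Ten 1-second requests consume about 2.5s of this 4-worker pool's time,
    # so the cheap requests end up waiting in the queue behind them -- the
    # "snowball" the update describes.
```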
Report: "Time Tracking service outage"
Last update:
# What happened?
Today we made an upgrade to the Time Tracking service that involves migrating each company's settings to the new implementation. This process took longer than expected, disabling the service for up to 1 hour in some companies.
# How are we going to prevent similar issues in the future?
When an upgrade like this requires big data migrations, we will aim to implement it in a backward-compatible way (if possible) and run it at low-impact times (nights/weekends).
The Time Tracking service was disabled for some companies, preventing employees from clocking in and out in the desktop and mobile apps.
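The prevention note in this postmortem points at backward-compatible data migrations. Below is a minimal sketch of one common shape for that (often called expand/contract), assuming a hypothetical companies table with legacy and new settings columns rather than Factorial's real schema.

```python
# Hypothetical sketch of a backward-compatible ("expand/contract") migration:
# keep serving the old settings while new ones are backfilled in small batches,
# then switch reads over once the backfill is complete. Names are illustrative.
import time
import sqlite3  # stand-in for the real database

BATCH_SIZE = 500
PAUSE_S = 0.2   # brief pause between batches to avoid saturating the database

def backfill_new_settings(conn: sqlite3.Connection) -> None:
    while True:
        rows = conn.execute(
            "SELECT id, legacy_settings FROM companies "
            "WHERE new_settings IS NULL LIMIT ?",
            (BATCH_SIZE,),
        ).fetchall()
        if not rows:
            break
        for company_id, legacy in rows:
            conn.execute(
                "UPDATE companies SET new_settings = ? WHERE id = ?",
                (convert(legacy), company_id),
            )
        conn.commit()
        time.sleep(PAUSE_S)

def read_settings(conn: sqlite3.Connection, company_id: int):
    # Reads fall back to the legacy column, so the service keeps working
    # even while the backfill is still in progress.
    new, legacy = conn.execute(
        "SELECT new_settings, legacy_settings FROM companies WHERE id = ?",
        (company_id,),
    ).fetchone()
    return new if new is not None else legacy

def convert(legacy_settings):
    return legacy_settings  # placeholder for the real conversion logic
```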
Report: "Performance degraded on our API request time"
Last update: We fixed the main performance issue and the system is now stable. The custom fields feature has been enabled again. We will keep monitoring our system to detect possible regressions.
The issue has been identified and a fix is being implemented.
We found the culprit of this issue. Yesterday we deployed a change to our custom fields system with a non-performant endpoint; this kept our Puma workers busy serving these requests for about 60 seconds, which caused other requests to be delayed and eventually time out. We partially disabled the custom fields feature in order to keep other parts of the app working. We're fixing the performance regression and will re-enable the full custom fields feature once we get decent performance on the affected endpoint. We'll keep updating this incident with further steps.
We are currently investigating the issue.
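The update above describes partially disabling the custom fields feature to protect the rest of the app. Below is a minimal sketch of that kind of runtime kill switch, with invented names and an environment variable standing in for a real feature-flag store.

```python
# Hypothetical sketch of a feature kill switch: an expensive feature is gated
# behind a flag that can be flipped off at runtime so the rest of the
# application keeps serving requests. Names are illustrative only.
import os

def custom_fields_enabled() -> bool:
    # In practice this would come from a feature-flag service or config store;
    # an environment variable keeps the sketch self-contained.
    return os.environ.get("CUSTOM_FIELDS_ENABLED", "true").lower() == "true"

def list_custom_fields(company_id: int) -> list[dict]:
    if not custom_fields_enabled():
        # Degrade gracefully: return an empty result instead of tying up a
        # worker on the slow query while a fix is being prepared.
        return []
    return expensive_custom_fields_query(company_id)

def expensive_custom_fields_query(company_id: int) -> list[dict]:
    # Placeholder for the slow endpoint's real work.
    return [{"company_id": company_id, "field": "example"}]
```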
Report: "DNS change produced downtime"
Last update:
# What happened?
Today we made a change in our DNS (Domain Name System) that produced downtime by making our main domains ([factorialhr.com](http://factorialhr.com), [factorialhr.es](http://factorialhr.es), [factorialhr.fr](http://factorialhr.fr), ...) unable to be resolved. Our infrastructure change first destroyed and then recreated an existing DNS record, while our SOA retry-time was too high. That produced a downtime of about 7 minutes for our public sites.
# How are we going to prevent similar issues in the future?
We will re-think the way we apply DNS changes in our infrastructure. We will lower the retry-time in our SOAs to more manageable values. We'll also apply some changes manually first and then port those changes to our infrastructure code.
This incident has been resolved.
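The prevention plan in this postmortem includes lowering the SOA retry-time. As a small helper for inspecting those timers, here is a sketch that assumes the third-party dnspython package; it is not part of Factorial's tooling.

```python
# Sketch for checking the SOA timers mentioned in the postmortem, assuming the
# third-party dnspython package is installed (pip install dnspython).
import dns.resolver

def print_soa_timers(domain: str) -> None:
    answer = dns.resolver.resolve(domain, "SOA")
    soa = answer[0]
    # retry: how long secondaries wait before retrying a failed zone refresh;
    # minimum: also used as the negative-caching TTL for NXDOMAIN answers.
    print(f"{domain}: refresh={soa.refresh}s retry={soa.retry}s "
          f"expire={soa.expire}s minimum={soa.minimum}s")

if __name__ == "__main__":
    print_soa_timers("factorialhr.com")
```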
Report: "Overall degraded service"
Last update:
# What happened?
On May 23rd, 2020, we added a new Scheduler service to our infrastructure. The Scheduler is in charge, among other things, of distributing load between all our service instances to ensure the best possible performance and make Factorial more resilient to potential failures. Unfortunately, the Scheduler was misconfigured in such a way that it started distributing work unevenly among our services. On May 24th, as traffic to our websites and applications started increasing, the uneven distribution of work overloaded some of Factorial's services, causing a partial outage of our home page, web, and mobile applications that lasted for approximately one hour.
# How are we going to prevent similar issues in the future?
We noticed the problem immediately but took longer than expected to fix it, in part because some key personnel were not available at the time. We acknowledge this is due, in part, to a poorly implemented outage procedure at Factorial, and we commit to improving this procedure.
A technical incident caused by changes to our infrastructure resulted in intermittent outages of our Home Page, web, and mobile applications. The issue has been resolved and we are currently working to provide a more detailed postmortem.
Report: "Domain Name resolution interrupted"
Last update:
# What happened?
Last week we introduced a regression in the way we manage DNS in our infrastructure. This change meant that for about an hour our main domains [factorialhr.com](http://factorialhr.com) and [factorialhr.es](http://factorialhr.es) did not respond and some users couldn't access our site.
# How are we going to prevent similar issues in the future?
Our infrastructure tools have a way to preview the changes that are going to be applied. In the future we'll double-check that we're not introducing unexpected changes.
This incident has been resolved.
A technical incident caused by DNS changes to our infrastructure resulted in outages of our Home Page, web, and mobile applications. Due to this change, the main domains factorialhr.com and factorialhr.es stopped responding. We identified the issue and applied a change to restore the previous working behaviour.
Report: "Employees were not able to clock in"
Last update: We had an issue with our queueing system on Sunday night and the system didn't generate the December periods. This meant that employees couldn't clock in until ~9:00 when the issue was resolved.
Report: "All customers experienced a downtime for an hour"
Last update:
# What happened?
We tried to issue the rollback through the CI, but this relies on production machines to do the builds. There is already a task to fix this issue, but in the meantime we learned that the way to go should have been to issue the rollback directly on the machines, with Capistrano. Also, we managed to do the rollback quite quickly, but at that point Redis was already down. We kept looking for the culprit even though we had already fixed it. This lack of visibility obscured the real problem. If we had checked our monitors/logs more attentively, we would have seen that we were dealing with a different problem (Redis being down). We acknowledge that the response was not quick enough. Next time, we are going to roll back more quickly and make sure to review all the monitors before taking next steps.
We launched a new version of the product that had a performance regression. This new version caused the machines to slow down until they could not handle any load. At that point the engineers rolled back the faulty release, but the machines were still unresponsive and one of our database services (Redis) went down with them. In the end we managed to bring all the machines and their services back to life and restore service to all customers.
Report: "Factorial was down due to slow migration"
Last update:
**What happened**
A slower-than-usual database migration resulted in the Factorial backend service getting stuck in an inconsistent state, trying to query nonexistent attributes from the database. A service restart quickly fixed the issue.
**What are we going to do to prevent it in the future?**
We implemented a system to enforce that migrations are coded in a safe manner. We are going to start enforcing the use of this system from now on.
Factorial was down for 7 minutes. This incident has been resolved.
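The postmortem mentions a system that enforces safe migrations. One common shape for such a guard is a pre-deploy check that flags risky operations in migration files; the sketch below is hypothetical and its patterns are only examples, not Factorial's actual rules.

```python
# Hypothetical pre-deploy check that flags Rails-style migration files
# containing operations that tend to be unsafe during rolling deploys.
# The patterns and paths are illustrative only.
import pathlib
import re
import sys

RISKY_PATTERNS = [
    r"\bchange_column\b",   # can rewrite the table on some databases
    r"\bremove_column\b",   # breaks code still reading the old column
    r"\brename_column\b",   # not backward compatible during deploys
]

def unsafe_migrations(migrations_dir: str) -> list[str]:
    findings = []
    for path in sorted(pathlib.Path(migrations_dir).glob("*.rb")):
        text = path.read_text()
        for pattern in RISKY_PATTERNS:
            if re.search(pattern, text):
                findings.append(f"{path.name}: matches {pattern}")
    return findings

if __name__ == "__main__":
    problems = unsafe_migrations("db/migrate")
    if problems:
        print("\n".join(problems))
        sys.exit(1)  # fail the pipeline so the migration gets reviewed first
```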
Report: "Factorial was down due a database high load"
Last update: There was a high MySQL load due to a badly performing migration.
Report: "Factorial was down due to a broken reference"
Last update:
What happened?
There was a broken reference in the production environment that wasn't detected by our deployment process, which crashed the whole backend application.
What are we going to do to prevent it in the future?
We are going to add a step to our deployment process that will be able to detect this kind of broken reference.
We already fixed the issue. Service should become operational soon.
Report: "Factorial being very slow and not loading sometimes"
Last update:
# Factorial very slow and unresponsive
## What happened?
We released a new version of Factorial with the new “Upgrade button”. In order to show/hide the button we needed to request more information from our API, which for some reason was hitting a third party (Stripe). This made the application unbearably slow.
## What are we going to do to prevent it in the future?
This kind of regression is very difficult to catch during development. We are going to keep investing in monitoring and alerting so we can catch these issues earlier and roll back fast so as not to affect our customers.
We already fixed the issue. Service should become operational soon.
Our engineers have identified the issue and are working on a fix.
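The postmortem above traces the slowdown to a per-request call that ended up hitting Stripe. One standard way to take such a call off the hot path is to cache its result with a short TTL; the sketch below is illustrative, with invented names, and is not Factorial's implementation.

```python
# Hypothetical sketch: cache a slow third-party lookup with a TTL so most
# requests never touch the external API on the hot path. Names are invented.
import time

_CACHE: dict[int, tuple[float, bool]] = {}   # company_id -> (expires_at, value)
TTL_S = 300.0

def show_upgrade_button(company_id: int) -> bool:
    now = time.monotonic()
    cached = _CACHE.get(company_id)
    if cached and cached[0] > now:
        return cached[1]
    value = fetch_subscription_status(company_id)   # the slow external call
    _CACHE[company_id] = (now + TTL_S, value)
    return value

def fetch_subscription_status(company_id: int) -> bool:
    # Placeholder for the real call to the billing provider.
    return True
```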
Report: "Application unresponsive"
Last update:
## 🤷♂️ What happened?
The application became very slow and some users reported that they couldn't work for 5 minutes.
## 🕵️♀️ Why did it happen?
We released a new version of Factorial to improve how we resize and serve user avatars and company images. During the migration the machines were overloaded and couldn't handle the traffic.
## 👮♀️ What are we going to do to avoid it in the future?
The team has learned – the hard way – that our infrastructure can't handle this kind of migration during peak working hours. We will run these migrations more progressively and, if that's not possible, choose the migration times more wisely.
Factorial became very slow and some users couldn't use the service.
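The postmortem above commits to running migrations more progressively and outside peak hours. A minimal sketch of what that can look like, with invented batch sizes, pauses, and peak-hour windows; this is not Factorial's actual job code.

```python
# Hypothetical sketch of a "progressive" migration: process records in small
# batches, pause between batches, and only run outside peak hours.
import datetime
import time

BATCH_SIZE = 100
PAUSE_S = 5.0
PEAK_HOURS_UTC = range(8, 18)   # skip the working day

def off_peak() -> bool:
    return datetime.datetime.now(datetime.timezone.utc).hour not in PEAK_HOURS_UTC

def migrate_avatars(pending_ids: list[int]) -> None:
    for start in range(0, len(pending_ids), BATCH_SIZE):
        while not off_peak():
            time.sleep(60)            # wait for a low-traffic window
        batch = pending_ids[start:start + BATCH_SIZE]
        for avatar_id in batch:
            resize_avatar(avatar_id)  # placeholder for the real resize job
        time.sleep(PAUSE_S)           # give the machines room to serve traffic

def resize_avatar(avatar_id: int) -> None:
    pass  # placeholder
```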