Sema4.ai (Robocorp)

Is Sema4.ai (Robocorp) Down Right Now? Check whether there is an ongoing outage.

Sema4.ai (Robocorp) is currently Operational

Last checked from Sema4.ai (Robocorp)'s official status page

Historical record of incidents for Sema4.ai (Robocorp)

Report: "Database engine upgrade on SSO authentication database, US based deployments"

Last update
resolved

The upgrade is complete and we are monitoring performance.

investigating

Today's database upgrade 3/3: the SSO integration database engine upgrade for US-based deployments is in progress. SSO-based users may encounter failures in login or session refresh. Backend services and API access are not impacted.

Report: "Database engine upgrade on SSO authentication database, EU based deployments"

Last update
resolved

This incident has been resolved.

identified

The upgrade is complete and we are monitoring performance.

identified

Today's database upgrade 2/3: the SSO integration database engine upgrade for EU-based deployments is in progress. SSO-based users may encounter failures in login or session refresh. Backend services and API access are not impacted.

Report: "Database engine upgrade on eu1 (Ireland) Control Room instance"

Last update
resolved

The upgrade is complete and we are monitoring performance.

investigating

Today's database upgrade 1/3: the core database engine upgrade for the eu1 (Ireland) Control Room instance is in progress. We expect slightly increased error rates across all Control Room services during a 10-minute window.

Report: "Database engine update on Control Room us1 (AWS us-east-1) instance"

Last update
resolved

The upgrade is complete. We apologize for any inconvenience caused by the maintenance.

investigating

As previously announced, we are performing a database update on the us1 Control Room instance hosted in N. Virginia. Occasional API call failures or other minor service disruptions may occur during the update. Our team is closely monitoring the progress, and we will keep this page up to date.

Report: "Database engine update on Control Room eu2 (AWS eu-central-1) instance"

Last update
resolved

The upgrade on eu2 is complete. We apologize for the inconvenience.

investigating

As previously announced, we are performing a database update on the eu2 Control Room instance hosted in Frankfurt, Germany. Occasional API call failures or other minor service disruptions may occur during the update. Our team is closely monitoring the progress, and we will keep this page up to date.

Report: "Increased error rates"

Last update
resolved

We observed increased error rates in Robocorp Cloud between 3:00 and 3:14 UTC today.

Report: "Issues with robot executions"

Last update
resolved

This incident has been resolved. We are still monitoring the situation for any additional robot execution errors.

monitoring

We are continuing to monitor for any further issues.

monitoring

Fastly (https://status.fastly.com) has applied a fix, which should address the issue. Some robots may still experience problems such as slow dependency downloads, but service should now be restored.

investigating

This appears to be related to the Fastly outage (https://status.fastly.com/), which is currently preventing operations such as robot dependency downloads.

investigating

We are currently investigating this issue.

Report: "Process run listing fails"

Last update
resolved

This incident has been resolved.

investigating

We are investigating an issue causing process run listing to fail. Initial analysis suggests the issue is cosmetic and affects only the list view.

Report: "Container start failures"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been deployed and we are monitoring the situation. Current findings suggest that only an early-access test environment was impacted by this issue.

investigating

We are observing an increased failure rate when starting Robocorp-hosted cloud containers. The issue is being investigated.

Report: "Increased error rates"

Last update
resolved

The root cause has been fixed by AWS, and our services have been stable again since around 3 PM PDT (Oct 5) / 01:00 EEST (Oct 6).

identified

This situation is still active, with minor practical impact on users. Our hosting provider AWS has confirmed increased invoke timeout rates in AWS Lambda. We expect AWS to resolve the root cause in a timely fashion. Throughout the day we have implemented resiliency improvements to the core services powering our platform. Meanwhile, we continue to monitor the situation closely and are prepared to make additional resiliency improvements as opportunities are identified.

investigating

We are currently observing slightly increased error rates around one of our databases in our monitoring. Recovery from most errors happens automatically, and overall the platform is operating as usual. However, if you encounter any unusual problems, please contact us at support@robocorp.com.

Report: "Issues with process start"

Last update
resolved

The issue has been resolved. We rolled back the part of the changes that caused it.

investigating

We are currently investigating this issue.

Report: "Increased error rates"

Last update
resolved

The situation is back to normal. Our analysis indicates the issues were caused by intermittent failures at our infrastructure provider, but the service remained operational at all times thanks to retries.

investigating

We are currently investigating increased error rates in the platform.

Report: "Increased error rates"

Last update
resolved

This incident has been resolved. A backend service was experiencing database connection issues; a small subset of workloads may have seen minor impact.

investigating

We are currently observing increased error rates on the platform. All services are still up and running, and we are investigating and monitoring the situation.

Report: "Artifact and Work Item File metadata inconsistency"

Last update
resolved

We discovered a failure in a subsystem responsible for updating the metadata of Step Run Artifacts and Work Item Files. File upload completion detection did not run as expected, and the associated metadata is out of date, showing files as "not uploaded". The issue impacted file uploads between 17:00 and 18:00 EEST today. The practical impact of this issue is negligible: files are served regardless of the metadata, so there is no problem unless the metadata is explicitly relied on. No data loss has occurred. If you encounter any practical issues, please let us know and we can take action on a case-by-case basis.

Report: "Process API failures"

Last update
resolved

The bulk of the errors was caused by a change that misreported 4xx status codes as 5xx. This was visible to clients of the process API, but impacted only requests that would have failed regardless. Additionally, between 11:49 and 12:05 EEST a subset of requests were failing. We will notify all impacted paid-tier customers directly today.

investigating

We are investigating increased error rates in process API.

Report: "Multiple runs started by Control Room scheduler"

Last update
resolved

We have fixed an issue that caused the Control Room scheduler to incorrectly start multiple runs for a subset of scheduled runs between Oct 19 16:15 EEST (6:15 AM PT) and Oct 20 10:20 EEST (1:20 AM PT). The root cause was an insufficient timeout configuration, which in some cases caused our resiliency mechanism to trigger a retry before the processing of the original request had completed. We apologize for any inconvenience.

Report: "Slack integration failures in Control Room"

Last update
resolved

We encountered errors with the Slack integration in Control Room this week, on Wednesday the 19th and Thursday the 20th. Customers impacted by failures to deliver Slack notifications have been contacted directly with details of the notifications that were not delivered successfully. The situation has been restored to normal.

Report: "Increased error rates"

Last update
resolved

We encountered a significant spike in errors from AWS services between 14:35 and 14:37 EET (5:35-5:37 Pacific Time). The situation is back to normal. Based on inspection of the dead-letter queues, all failures in customer workloads were automatically recovered via retry mechanisms.

investigating

We are currently investigating increased error rates across various AWS services on eu-west-1, where our EU-based Control Room is hosted.

Report: "Work Item system failures"

Last update
resolved

The situation is back to normal. Approximately 10% of requests to the task data subsystem between 6:54 and 7:54 UTC were failing with HTTP status code 500. Impacted step runs may have failed depending on which actions the robot had taken. Run failure notifications were operational throughout, surfacing the issue to robot operators. The failed work items can be retried from Control Room. We apologize for the inconvenience.

investigating

We are currently investigating internal server error responses from the work item system.

Report: "Increased error rate on process related features"

Last update
resolved

We encountered a spike in errors in our rate-limiting system between 05:30 and 07:30 EET (8:30 PM - 10:30 PM PDT). The situation is back to normal. The issue may have impacted process-related operations such as starting, stopping, and listing processes. The system has been patched to prevent similar issues in the future.

Report: "Login failures on Control Room US based instance"

Last update
resolved

This incident has been resolved. The root cause was a managed database update that resulted in some database requests failing during a time window of around 5 minutes.

investigating

We are currently investigating failures to login via web browser to the US based Control Room.

Report: "US instance service degradation"

Last update
resolved

Most AWS services have recovered and Control Room is returning to normal operation. We continue monitoring the situation.

identified

The issue is confirmed to be caused by a broader problem with AWS services in the us-east-1 region. The AWS team is working on getting things back to normal, and we are already seeing significant improvements in performance. Currently Control Room is operational with occasional hiccups.

investigating

We are investigating an issue affecting Control Room services in the AWS us-east-1 region. The service is severely impacted by increased error rates. This incident appears to be connected to an operational issue with AWS services, as reported on https://health.aws.amazon.com/health/status.

Report: "Video feed view failing"

Last update
resolved

This issue has been fixed. Please note that the issue impacted only the serving of videos; any run videos recorded earlier are now available for viewing.

identified

There is a known issue with the run video feed view failing. Only the viewing infrastructure is experiencing problems; recording is functioning correctly, and recorded videos will be available once the issue is addressed. We aim to roll out a fix by the end of the day.

Report: "Increased error rates in indexing service"

Last update
resolved

The situation is back to normal. A subset of messages in our indexing service failed to process between 14:25 and 15:25 UTC today. The indexing service powers various listing and search features, but it is not a system of record for any data, nor is it part of the operational path of robots. In practice, there may be discrepancies between the actual data and the data shown in various searchable listings. We will continue to analyze the impact and are considering either replaying the messages or contacting impacted customers, depending on the findings.

investigating

We are currently observing increased error rates in the indexing service of the eu1 Control Room (https://cloud.robocorp.com), and our team is investigating the situation.

Report: "Increased error rates in indexing service"

Last update
resolved

The issue has been resolved, and we have deployed additional measures to avoid this problem in the future. Some data may still be out of sync in the indexes, and we are continuing to re-sync it.

monitoring

The situation appears to be back to normal. We are still monitoring it and starting work on measures to ensure this does not happen again.

investigating

We are currently observing increased error rates in the indexing service of the eu1 Control Room (https://cloud.robocorp.com), and our team is investigating the situation as a top priority. The indexing service powers various listing and search features, but it is not a system of record for any data, nor is it part of the operational path of robots. In practice, there may be discrepancies between the actual data and the data shown in various searchable listings.

Report: "Increased error rates on eu1 self-service instance"

Last update
resolved

We encountered increased error rates in database access during a roughly 10-minute window around noon today on the eu1 self-service instance at https://cloud.robocorp.com. A small number of runs may be affected. The situation is back to normal. We apologize for the inconvenience; if you have any questions, please get in touch with us.

Report: "Control Room login failures"

Last update
resolved

This issue has been resolved. The issue impacted only end-user-facing features; any automations running on the platform were not impacted. Our apologies for the inconvenience.

investigating

We are investigating issues with failure to log in to Control Room.