SchemeServe

Is SchemeServe Down Right Now? Check if there is a current outage ongoing.

SchemeServe is currently Operational

Last checked from SchemeServe's official status page

Historical record of incidents for SchemeServe

Report: "Update to AuthApi"

Last update
Postmortem
Completed

An update to the AuthApi has been released

Report: "Update to XeroApi"

Last update
Postmortem
Completed

An update to the XeroApi has been released

Report: "Update to TemplateTransformApi"

Last update
Postmortem
Completed

An update to the TemplateTransformApi has been released

Report: "Update to AgentStatementsApi"

Last update
Postmortem
Completed

An update to the AgentStatementsApi has been released

Report: "Update to AccountingApi"

Last update
Postmortem
Completed

An update to the AccountingApi has been released

Report: "Update to OpenBankingApi"

Last update
Postmortem
Completed

An update to the OpenBankingApi has been released

Report: "Update to SchemesApi"

Last update
Postmortem
Completed

An update to the SchemesApi has been released

Report: "Update to BankAccountsApi"

Last update
Postmortem
Completed

An update to the BankAccountsApi has been released

Report: "Update to RatesApi"

Last update
Postmortem
Completed

An update to the RatesApi has been released

Report: "Update to QuestionSetsApi"

Last update
Postmortem
Completed

An update to the QuestionSetsApi has been released

Report: "Update to RiskGroupsApi"

Last update
Postmortem
Completed

An update to the RiskGroupsApi has been released

Report: "Update to PostcoderApi"

Last update
Postmortem
Completed

An update to the PostcoderApi has been released

Report: "Update to EPostcodeApi"

Last update
Postmortem
Completed

An update to the EPostcodeApi has been released

Report: "Update to EndorsementsApi"

Last update
Postmortem
Completed

An update to the EndorsementsApi has been released

Report: "Update to DataRequestsApi"

Last update
Postmortem
Completed

An update to the DataRequestsApi has been released

Report: "Update to DocumentTemplatesApi"

Last update
Postmortem
Completed

An update to the DocumentTemplatesApi has been released

Report: "Update to DataLookupsApi"

Last update
Postmortem
Completed

An update to the DataLookupsApi has been released

Report: "Update to UsersApi"

Last update
Postmortem
Completed

An update to the UsersApi has been released

Report: "Update to TranslationsApi"

Last update
Postmortem
Completed

An update to the TranslationsApi has been released

Report: "Update to ReportsApi"

Last update
Postmortem
Completed

An update to the ReportsApi has been released

Report: "Update to AutomationsApi"

Last update
Postmortem
Completed

An update to the AutomationsApi has been released

Report: "Update to AddressCloudApi"

Last update
Postmortem
Completed

An update to the AddressCloudApi has been released

Report: "Update to SmsApi"

Last update
Postmortem
Completed

An update to the SmsApi has been released

Report: "Update to TagsApi"

Last update
Postmortem
Completed

An update to the TagsApi has been released

Report: "Update to StatsApi"

Last update
Postmortem
Completed

An update to the StatsApi has been released

Report: "Update to PaymentsApi"

Last update
Postmortem
Completed

An update to the PaymentsApi has been released

Report: "Update to SitesApi"

Last update
Postmortem
Completed

An update to the SitesApi has been released

Report: "Update to ScheduledTasksApi"

Last update
Postmortem
Completed

An update to the ScheduledTasksApi has been released

Report: "Update to LogsApi"

Last update
Postmortem
Completed

An update to the LogsApi has been released

Report: "Update to QuoteApi"

Last update
Postmortem
Completed

An update to the QuoteApi has been released

Report: "Update to OpenApiCombinerApi"

Last update
Postmortem
Completed

An update to the OpenApiCombinerApi has been released

Report: "Update to IntegrationsApi"

Last update
Postmortem
Completed

An update to the IntegrationsApi has been released

Report: "Update to HtmlToPdfApi"

Last update
Postmortem
Completed

An update to the HtmlToPdfApi has been released

Report: "Update to ImportsApi"

Last update
Postmortem
Completed

An update to the ImportsApi has been released

Report: "Update to InsurersApi"

Last update
Postmortem
Completed

An update to the InsurersApi has been released

Report: "Update to EmailsApi"

Last update
Postmortem
Completed

An update to the EmailsApi has been released

Report: "Update to ClientsApi"

Last update
Postmortem
Completed

An update to the ClientsApi has been released

Report: "Update to DocumentsApi"

Last update
Postmortem
Completed

An update to the DocumentsApi has been released

Report: "Update to FileTransferApi"

Last update
Postmortem
Completed

An update to the FileTransferApi has been released

Report: "Update to CasesApi"

Last update
Postmortem
Completed

An update to the CasesApi has been released

Report: "Update to ClaimsApi"

Last update
Postmortem
Completed

An update to the ClaimsApi has been released

Report: "Update to BillingApi"

Last update
Postmortem
Completed

An update to the BillingApi has been released

Report: "Update to AgentsApi"

Last update
Postmortem
Completed

An update to the AgentsApi has been released

Report: "Update to SchemeServe front end"

Last update
Postmortem
Completed

An update to front end interfaces has been released to the live environment

Report: "Update to Frontend.Custom"

Last update
Postmortem
Completed

An update to the Frontend.Custom has been released

Report: "Update to SchemeServe Application"

Last update
Completed

An update to the SchemeServe Application has been released

Report: "Issue with Multi factor authentication"

Last update
postmortem

**Root Cause**

SchemeServe’s authentication system relies on short-lived access tokens known as JWTs (JSON Web Tokens), each with a lifespan of 15 minutes before requiring renewal. Historically, SchemeServe has used **BST (British Summer Time)** across its codebase and storage layers. As part of an ongoing effort to align with international standards, newer services have been transitioned to operate in **UTC**. However, the JWT generation process had not yet been updated and was still using BST.

As a result, when the UK clocks moved forward at **01:00 on Sunday, 30th March**, newly generated tokens contained timestamps based on BST. These tokens appeared to be issued in the future relative to the UTC-based verification process, placing them immediately outside the valid lifetime window. Consequently, the tokens were rejected and users were unable to authenticate.

**Impact**

This issue prevented users from authenticating into **SchemeServe**, effectively blocking access to the platform.

**Detection**

Unfortunately, this event did not trigger any health alerts and, as such, the issue was not detected in real time. We became aware of the problem at **07:50 on Monday, 31st March**, following user reports. Further investigation quickly identified the root cause. The incident led to prolonged authentication failures from **01:00 Sunday** until resolution.

**Resolution**

The JWT generation process was updated to explicitly use **UTC**, ensuring that newly issued tokens fall within the expected validity window. This change was deployed, and the issue was fully resolved by **09:34 on Monday, 31st March**.

We sincerely regret the disruption caused by this incident and appreciate your understanding as we continue to improve the resilience and reliability of the SchemeServe platform.
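The failure mode described above can be illustrated with a minimal Python sketch. It is not SchemeServe's code: the function names, the fixed-offset stand-in for BST and the sample instant are all assumptions made for illustration. It shows how a BST wall-clock time mislabelled as UTC produces an issued-at claim one hour in the future, and how deriving the claims from an explicit UTC instant avoids that.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(minutes=15)   # lifetime stated in the postmortem
BST = timezone(timedelta(hours=1))       # fixed-offset stand-in for British Summer Time


def claims_from_local_wall_clock(now_utc: datetime) -> dict:
    """Buggy pattern (hypothetical): read the BST wall clock, then treat it as UTC."""
    wall_clock = now_utc.astimezone(BST).replace(tzinfo=None)  # naive BST time
    issued = wall_clock.replace(tzinfo=timezone.utc)           # mislabelled as UTC
    return {
        "iat": int(issued.timestamp()),
        "exp": int((issued + TOKEN_LIFETIME).timestamp()),
    }


def claims_from_utc(now_utc: datetime) -> dict:
    """Fixed pattern: derive both claims from an explicit UTC instant."""
    return {
        "iat": int(now_utc.timestamp()),
        "exp": int((now_utc + TOKEN_LIFETIME).timestamp()),
    }


def is_within_lifetime(claims: dict, now_utc: datetime) -> bool:
    """Verifier-side check: reject tokens issued in the future or already expired."""
    now = int(now_utc.timestamp())
    return claims["iat"] <= now < claims["exp"]


if __name__ == "__main__":
    # Arbitrary illustrative instant during summer time, not the incident time.
    now = datetime(2000, 6, 1, 12, 0, tzinfo=timezone.utc)
    print(is_within_lifetime(claims_from_local_wall_clock(now), now))
    print(is_within_lifetime(claims_from_utc(now), now))
```

In this sketch the verifier allows no clock-skew tolerance. Real verifiers often allow a small leeway, but a one-hour offset would exceed any typical setting.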

resolved

The incident has been resolved, and MFA is now working correctly again. A post-mortem will follow shortly.

investigating

SchemeServe is currently experiencing issues with multi-factor authentication. The issue is under investigation.

Report: "Reports of slow speeds"

Last update
postmortem

**Root Cause**

On Wednesday 5th March, between 10:00 PM and 2:00 AM, we carried out an upgrade of our primary database. Following this upgrade, a subset of complex queries began executing with suboptimal query plans and ran more than 100 times slower than before. This issue was not identified during extensive pre-upgrade testing in a separate environment, where query performance remained stable over several months. However, in the live environment, the database optimiser selected different execution plans, leading to the unexpected performance degradation.

**Impact**

The issue affected all customers when performing case searches, significantly increasing query execution times.

**Detection**

We became aware of the issue on Thursday, 6th March, at 6:30 AM, when reports of slow performance were received. The excessive query execution times caused a knock-on effect across all usage of SchemeServe, as the additional load exceeded 100 times the normal peak load, further impacting overall system performance.

**Resolution**

Following our investigation, we determined that reverting the affected queries to SQL 2019 compatibility mode would restore their performance. This change was implemented immediately, and normal service levels were fully restored.

We sincerely regret any disruption caused and appreciate our customers’ patience while we worked to resolve the issue.

resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating reports of slow speeds affecting some users. Our team is working to resolve this as quickly as possible. Thank you for your patience.

Report: "Service Interruption on Sunday, March 2nd"

Last update
postmortem

#### **Root Cause**

The issue was caused by a new database backup and replication process that was initiated on Saturday, March 1st, at 14:32. While the process initially appeared to be functioning as expected, the issue did not manifest until 08:33 on Sunday, when it began preventing quote submissions.

When replication was enabled, each column value had a maximum size limit of 65,536 bytes. The `PolicyRecordDebugLogs` exceeded this limit, leading to errors that blocked new case records from completing.

Despite thorough testing in multiple test environments prior to deployment, a misconfiguration in the live environment led to the issue. Unfortunately, this was due to human error in applying the necessary settings during implementation.

#### **Resolution**

Once the issue was identified, corrective action was taken, and normal service was restored by 13:48 on Sunday.

#### **Preventative Measures**

To prevent similar issues in the future, we have implemented additional safeguards, including:

* Enhanced deployment checks to ensure live configurations match tested environments.
* Additional validation steps in our change management process.
* Improved monitoring and alerting to detect similar issues earlier and prevent service disruptions.

We sincerely regret the disruption caused and take full responsibility for the error. We are committed to improving our processes to ensure that such incidents do not occur again.
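As a rough illustration of the size limit described in this postmortem, the Python sketch below flags column values that would exceed 65,536 bytes before a row is written. The function name, the row shape and the assumption that sizes are measured on UTF-8 encoded text are hypothetical. How the real replication process measures column size is not stated in the postmortem.

```python
from typing import Dict, List

MAX_REPLICATED_VALUE_BYTES = 65_536  # per-column limit quoted in the postmortem


def oversized_columns(row: Dict[str, str],
                      limit: int = MAX_REPLICATED_VALUE_BYTES) -> List[str]:
    """Return the names of columns whose UTF-8 encoded value exceeds `limit` bytes."""
    return [name for name, value in row.items()
            if len(value.encode("utf-8")) > limit]


if __name__ == "__main__":
    # Hypothetical row: one large debug log and one short reference value.
    row = {"PolicyRecordDebugLogs": "x" * 70_000, "Reference": "ABC-123"}
    print(oversized_columns(row))
```

A check along these lines could run in a test environment against production-sized data, so that values over the limit surface before replication is enabled.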

resolved

On Sunday, March 2nd, between 08:33 and 13:48, SchemeServe experienced an issue that prevented new case records from completing fully, meaning they could not be placed on cover.

Report: "Multifactor authentication is currently triggering for every login"

Last update
resolved

This incident has been resolved.

investigating

We are currently investigating an issue where multifactor authentication is triggering for each login, even when the "remember this device" option has been selected. For further updates please see https://status.schemeserve.com/

Report: "Some users unable to view cases"

Last update
postmortem

During the deployment, part of the process did not execute correctly, resulting in a badly formatted database column. This has now been rectified, and the database column is now in place and verified.

resolved

Following the release of an update, some users may have been unable to access some areas of SchemeServe. The update has been rolled back while we investigate.

Report: "Intermittent timeout errors for some users"

Last update
resolved

This incident has been resolved.

investigating

We are aware of an issue where some users are experiencing intermittent timeout errors, 2FA delays, or being returned to the login screen. This is being investigated as a matter of urgency. For further updates please see https://status.schemeserve.com/

Report: "Login issues"

Last update
postmortem

This morning we had issues with our Authentication API failing to serve login requests. The cause was resource exhaustion, which was caused by a condition whereby scaling up the resource itself triggered a scale up event. This loop continued until the available resources were exhausted.

In light of this we have:

1. Changed the scale event triggers so that one scale event does not trigger another.
2. Set up alerts to notify us when the symptoms begin to present themselves so we are able to take action before issues occur.
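One common way to stop a scale event from immediately triggering another is a cooldown window. The Python sketch below shows that general technique only. The postmortem does not say how the triggers were changed, and the class and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownScaler:
    """Scale up on a trigger, but ignore triggers that arrive during the cooldown."""
    cooldown_seconds: float
    replicas: int = 1
    last_scale_at: Optional[float] = None

    def on_trigger(self, now: float) -> bool:
        """Return True if a scale-up happened, False if the trigger was suppressed."""
        in_cooldown = (self.last_scale_at is not None
                       and now - self.last_scale_at < self.cooldown_seconds)
        if in_cooldown:
            return False
        self.replicas += 1
        self.last_scale_at = now
        return True


if __name__ == "__main__":
    scaler = CooldownScaler(cooldown_seconds=300)
    # The second trigger arrives 10 seconds after the first and is suppressed.
    print(scaler.on_trigger(now=0.0), scaler.on_trigger(now=10.0), scaler.replicas)
```

With a cooldown in place, load caused by the scale-up itself cannot start a new scale event until the window has passed, which breaks the feedback loop.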

resolved

This incident has been resolved.

investigating

We are investigating a problem with logging into SchemeServe, where some users may experience error messages and be unable to log in.

Report: "Some multipage cases displaying blank address fields"

Last update
postmortem

Over the last few days we have been seeing repeated issues of postcode lookups failing to populate addresses when selected. We cache address lists for a given postcode and it appears that some of them had become "stale" and the actual properties had changed. Clearing the cache resolved the issue of addresses not populating when selected.
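A cache like the one described can limit staleness by giving each entry a time-to-live. The Python sketch below shows that general approach. It is not SchemeServe's implementation, and the class name, `lookup` callback and `ttl_seconds` parameter are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class PostcodeAddressCache:
    """Cache address lists per postcode and refresh entries older than `ttl_seconds`."""

    def __init__(self, lookup: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # hypothetical upstream postcode lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, postcode: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(postcode)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                      # entry is still fresh
        addresses = self._lookup(postcode)       # missing or stale: fetch again
        self._entries[postcode] = (now, addresses)
        return addresses

    def clear(self) -> None:
        """Drop every entry, equivalent to the manual cache clear described above."""
        self._entries.clear()
```

Choosing the time-to-live is a trade-off between lookup cost and how long a changed address list can remain out of date.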

resolved

We have identified and fixed the issue. A post-mortem will follow.

investigating

We are currently investigating reports of some multipage cases displaying blank address fields within the case record.

Report: "Errors when updating existing cases"

Last update
postmortem

For some time SchemeServe has been approaching the limit of 32-bit integers for IDs in its main answers storage. Some time ago, we successfully implemented a zero-downtime change to the main storage that enabled us to store 64-bit integers, and this was done in advance of actually requiring the extra space. Yesterday evening, however, the IDs crossed the 32-bit threshold and unfortunately highlighted an instance of archive storage that had not been converted from 32-bit to 64-bit integers. This meant that operations that used the archive storage failed. Due to the diverse and varying workloads that SchemeServe processes, our tests did not catch this beforehand. We quickly realised the issue and implemented the storage change to the archive.
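The threshold involved can be shown with a short Python sketch. It uses the standard `struct` module to mimic fixed-width storage. The helper names are hypothetical, and the postmortem does not specify whether the IDs are signed, so a signed 32-bit column is assumed here.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
INT32_MIN = -2**31


def fits_int32(value: int) -> bool:
    """True if `value` can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def pack_id(value: int, wide: bool) -> bytes:
    """Pack an ID as a signed 64-bit ('q') or 32-bit ('i') little-endian integer."""
    return struct.pack("<q" if wide else "<i", value)


if __name__ == "__main__":
    next_id = INT32_MAX + 1
    print(fits_int32(next_id))               # the ID no longer fits in 32 bits
    print(len(pack_id(next_id, wide=True)))  # the 64-bit store accepts it
    try:
        pack_id(next_id, wide=False)         # the unconverted 32-bit store rejects it
    except struct.error as exc:
        print("32-bit store failed:", exc)
```

The sketch mirrors the incident: the main storage had already been widened, while one remaining 32-bit store failed as soon as an ID passed the limit.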

resolved

We have identified and fixed the issue. A post-mortem will follow.

investigating

We are aware of an issue that may result in some users experiencing errors when saving changes to existing cases - we are investigating as a matter of urgency.

Report: "Some users intermittently returned to login screen"

Last update
resolved

This incident has been resolved.

investigating

We are continuing to investigate this issue.

investigating

We are aware of an issue where some users are intermittently being returned to the login screen. This is being investigated as a matter of urgency. For further updates please see https://status.schemeserve.com/

Report: "DNS Record Updates Required for DMARC"

Last update
resolved

This incident has been resolved.

monitoring

A new DNS record is required by some email providers in order to validate DMARC. This change helps prevent spoofing by verifying the sender’s identity. If an email fails DMARC validation, it often means that the sender is not who they claim to be, and the email could be fraudulent.

If this record is not added, emails sent from SchemeServe, but not from a SchemeServe email address, will likely be rejected by these email providers because DMARC cannot be validated. Potential impact if no action is taken: your customers and brokers may stop receiving emails from you.

To check your site's DNS records, please ask an administrator within your SchemeServe site to navigate to Admin/Website/Domain & Email. If your DNS records require updating, you will see a message at the bottom of that tab stating 'You have chosen to send email from "[EMAIL]" instead of schemeserve.com'. To update and validate your DNS records, please follow the instructions on that page. If you need any assistance, please contact our Obsessive Support desk.

Report: "SchemeServe connectivity issues"

Last update
postmortem

SchemeServe’s primary edge SSL certificates were due to enter their renewal period this evening, 30 days before expiry. The renewal process did not function correctly and was unable to renew the certificate. When this occurs, the process should log the event and retry 24 hours later. Instead, the certificate was automatically removed from circulation. Upon identification of the issue, a new certificate was issued and all affected services were accessible again. We will be investigating why the process did not execute correctly.

**Issue first occurred:** 00:00:17
**Issue identified:** 00:38:10
**Issue remediated:** 01:09:47
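The intended behaviour described above (enter a renewal window 30 days before expiry and, on failure, log the event and retry 24 hours later while the existing certificate stays in service) can be sketched in Python as follows. The function names and the `renew` callback are hypothetical, and this is not the actual renewal process.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

RENEWAL_WINDOW = timedelta(days=30)  # renewal period stated in the postmortem
RETRY_DELAY = timedelta(hours=24)    # retry interval stated in the postmortem


def in_renewal_window(expires_at: datetime, now: datetime) -> bool:
    """True once the certificate is within 30 days of expiry."""
    return now >= expires_at - RENEWAL_WINDOW


def attempt_renewal(current_cert: str, now: datetime,
                    renew: Callable[[], str]) -> Tuple[str, Optional[datetime]]:
    """Try to renew. On failure, keep the current certificate and schedule a retry."""
    try:
        return renew(), None
    except Exception as exc:  # broad on purpose: any failure must keep the old cert
        logging.warning("certificate renewal failed, will retry: %s", exc)
        return current_cert, now + RETRY_DELAY


if __name__ == "__main__":
    now = datetime(2000, 1, 1, tzinfo=timezone.utc)  # arbitrary illustrative instant

    def failing_renew() -> str:
        raise RuntimeError("issuer unavailable")

    print(attempt_renewal("current-cert", now, failing_renew))
```

The key property is that a failed renewal never removes the certificate that is still valid and in use.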

resolved

This incident has been fully resolved and SchemeServe is fully functioning again. A full post mortem will follow shortly.

investigating

SchemeServe is currently experiencing connectivity issues affecting all services.

Report: "users logged out and redirect loop"

Last update
postmortem

An update was released to address an issue with authentication where users' access tokens were not correctly refreshing, causing them to be logged out and redirected to the login screen when accessing certain pages. Upon release, users who were affected by this issue were instantly logged out and redirected to the login screen regardless of what page they were on. The number of users that were affected by this was significantly greater than anticipated and as such the decision was made to roll back the release to investigate further.

During the rollback process, an error was made that caused a redirect loop for anyone attempting to log in again to SchemeServe. This was subsequently rectified to allow users to log in correctly again.

We have assessed the initial issue and have deemed that the release was functioning correctly. As the number of users that were impacted was higher than expected, SchemeServe will rerelease this update outside of business hours.

* Initial release: 12:13
* Logouts reported: 12:30
* Rollback issued: 12:34
* Rollback corrected: 12:50
* Total time: 37 minutes

resolved

This incident has been resolved.

Report: "Issue placing new quotes in SchemeServe"

Last update
postmortem

SchemeServe runs several performance monitoring scripts and services on our Primary Database to aggregate metrics to help identify performance issues, bad query plans and long-running queries. One of these scripts crossed a tipping point where the resources required to run the script jumped significantly, causing a huge load on the database. This in turn caused other queries to run slowly. As some queries within SchemeServe lock tables, the increased query time caused some additional write queries to those tables to time out.

The tracking script is only one of many performance points that we use to monitor the database and so has been removed to stop further performance issues. SchemeServe aims to maintain an average headroom of at least 50% processing power on our databases during peak operational hours to allow for surges in demand.

* Issue start time: 09:14
* Issue first identified: 09:21
* Issue resolved: 10:38

resolved

The issue is now fully resolved. A post-mortem will be posted in due course.

monitoring

We have identified the source of the load on the database and have applied mitigation to reduce the load. This has had an immediate effect reducing load by 60%. We will continue to monitor the changes made.

investigating

We have identified the issue as a massively increased load on one of our databases and are currently working to reduce this.

investigating

We are currently investigating an issue with placing new quotes in SchemeServe. We will update as soon as we have more information.

Report: "Issue when creating MTA and Renewal case records"

Last update
postmortem

In order to future-proof SchemeServe we are proactively increasing the datatype of an identity column in the database. An additional column needed to be added to the table to begin the migration from INT to BIGINT over the coming months. This change was tested successfully within development environments. However, because the table is mirrored onto a second table for archive purposes, the two schemas were out of sync, causing an error when attempting to move values from one to the other.

As soon as this was identified the additional column was removed, stopping the error while additional testing was carried out around the specific circumstances that caused it. Once the update to both tables had been verified, the additional column was added again.

This change only affected MTAs and renewals and not first premium caserecords as originally believed, and so the title of this incident has been updated to reflect this. All cases affected would have created a referral stating the error, which can now be fixed by clearing the referral and re-running the caserecord.

* Change made after working hours: Wednesday 25th at approximately 17:36
* First reported: Thursday 26th at 09:15
* Identified and rolled back: 09:26

resolved

The issue has been identified and remediation applied. A full post-mortem will be provided soon.

investigating

We are currently investigating an issue affecting the ability to create new quotations within SchemeServe.

Report: "Some users reporting access issues"

Last update
postmortem

SchemeServe uses the Kubernetes framework to auto-scale to meet the demand at any given time. Sometimes during this scaling process some of the additional services created fail, and another attempt is made to deploy the specific service. However, this morning, due to a significant increase in the number of failures, the allocatable space for new services reached the limit of available IP addresses. To mitigate this, the service auto-scales additional server resources ahead of time. In this instance, however, this did not happen fast enough due to the wave of errors, which caused a backlog. Once the additional server resource for the services to be deployed to was available and we had cleared a number of failed services, full service was resumed.

We will be updating this Kubernetes auto-scale framework to prevent this from happening again.
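The IP address limit described above comes down to simple subnet arithmetic, which the Python sketch below illustrates with the standard `ipaddress` module. The subnet, the counts and the `reserved` parameter are hypothetical values chosen for illustration, not SchemeServe's network configuration.

```python
import ipaddress


def remaining_addresses(cidr: str, in_use: int, reserved: int = 0) -> int:
    """Addresses still assignable in `cidr` after reserved and in-use ones are removed."""
    total = ipaddress.ip_network(cidr).num_addresses
    return total - reserved - in_use


if __name__ == "__main__":
    # Hypothetical /24 subnet with 5 platform-reserved addresses and 200 in use,
    # some of them held by failed services that have not yet been cleaned up.
    print(remaining_addresses("10.0.0.0/24", in_use=200, reserved=5))
```

Because failed services can keep holding addresses until they are cleared, a burst of failures can exhaust a subnet even when the number of healthy services is well below its capacity.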

resolved

This incident has been resolved.

investigating

We are investigating, as a matter of urgency, reports that some users are receiving error messages when logging in.

Report: "SchemeServe Company Retreat"

Last update
resolved

This incident has been resolved.

monitoring

SchemeServe are away at our bi-annual company staff retreat from Monday 2nd until Wednesday 4th October. Our Obsessive Support desk is being monitored but is running at a reduced capacity during this period, so we may not respond as quickly as usual to your requests.

Report: "MFA - Some SMS codes not being received"

Last update
resolved

The issue has been resolved and users can use SMS as an MFA login successfully again

identified

The issue has been identified and we are currently working on a resolution

investigating

Some users are reporting that SMS codes for MFA are not currently being received. To access SchemeServe whilst we resolve this issue, please ask your site administrator to temporarily change your MFA method to Email, or contact our Obsessive Support team for assistance.

Report: "Intermittent errors across SchemeServe services"

Last update
postmortem

During a routine scale-up of SchemeServe services at 07:30 this morning the designated subnet used had an issue with assigning available IP addresses, and as such some of the scaled services could not be contacted correctly. We applied mitigating actions, and by 08:10 all services could be contacted correctly again. We have redistributed services across other subnets to ensure that this does not happen again.

resolved

This issue is now resolved. A post mortem will follow shortly.

monitoring

We have identified the issue and have applied mitigating action to resolve it. We will continue to monitor to ensure there are no reoccurrences.

investigating

We are seeing intermittent connectivity issues across SchemeServe services this morning. We are currently investigating and will update when more information is available

Report: "Multi-Factor Authentication"

Last update
resolved

This incident has been resolved.

monitoring

Multi-Factor Authentication (MFA) has now been rolled out across SchemeServe as a requirement for all main site users, and optionally for agent & broker users. Remember, the next time you log in you will be required to select a preferred authentication method, either through SMS or email. If SMS is selected, you'll need to enter a mobile number in your user settings. If the email option is selected, an email address will need to be entered. You won't receive an email or a code this time, but each time you log in in future a code will be sent to the chosen phone number or email address, which will be valid for 5 minutes. You'll need to enter this code in addition to your username and password in order to log in. For more information, see https://view.genial.ly/64aea8436f1d4d00114d7d75

Report: "Multi-Factor Authentication"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

This is a quick note to tell you about Multi-Factor Authentication (MFA), which is being rolled out across SchemeServe on the 1st August as a requirement for all main site users, and optionally for agent & broker users. The SchemeServe process is very similar to MFA you will go through on platforms such as Google, so I’m sure this won’t be news to you!

Once we switch this on, the next time you log in you will be required to select a preferred authentication method, either through SMS or email. If SMS is selected, you'll need to enter a mobile number in your user settings. If the email option is selected, an email address will need to be entered. Then, each time you log in in future a code will be sent to the chosen phone number or email address, which will be valid for 5 minutes. You'll need to enter this code in addition to your username and password in order to log in. For more information, see https://view.genial.ly/64aea8436f1d4d00114d7d75

If you would like to use MFA for your individual login before the 1st August, you can simply turn it on in your user settings. Alternatively, if you would like to turn on MFA for your whole site before August 1st, please contact our Obsessive Support team.

Report: "SchemeServe Company Retreat"

Last update
resolved

This incident has been resolved.

monitoring

SchemeServe are away at our bi-annual company staff retreat until Wednesday 26th April. Our Obsessive Support desk is being monitored but is running at a reduced capacity on Monday 24th and Tuesday 25th April, so we may not respond as quickly as usual to your requests.

Report: "Intermittent "Services aren't available right now" error message"

Last update
resolved

We are now satisfied that the issue is resolved. We will update with a post mortem once a full report has been completed.

monitoring

Following an update from our application front door service we are no longer seeing the issue occurring. We will continue to monitor for any recurrences today.

identified

The issue has been identified within the front door routing used by SchemeServe. We are working to remediate the problem now.

investigating

We are continuing to investigate this issue as our top priority but have no further news to share at this time.

investigating

We are continuing to investigate an intermittent issue where some clients are receiving an error page stating "Services aren't available right now". If you receive this error, refreshing the window will resolve it whilst we continue to investigate.

Report: "Intermittent connectivity"

Last update
resolved

This incident has been resolved.

monitoring

The issue has been patched at our data centre and SchemeServe has returned to normal service. We will continue to monitor the situation to ensure that no further issues are encountered

identified

The issue has been identified as a networking issue at our hosting data centre. The issue is being actively investigated and we will provide an update as soon as we have more information.

investigating

We are continuing to investigate this issue.

investigating

We are currently experiencing intermittent connectivity issues with SchemeServe. We are investigating the issue and will update you with more details in due course.

Report: "Christmas Opening Hours"

Last update
resolved

This incident has been resolved.

monitoring

From all of us at SchemeServe, we wish you a very Merry Christmas and a Happy New Year! We are closed over the bank holidays, but our Obsessive Support team will still be on hand throughout the rest of the Christmas period. If you need us, give us a call or pop us a ticket as normal. However, between 28 and 30 December there will be a reduced number of staff, so please do be patient with us.

Report: "SchemeServe currently experiencing slowdown"

Last update
resolved

The issue has now been resolved

investigating

We are currently investigating reports of a slowdown affecting cases and reports on SchemeServe

Report: "SchemeServe slowdown"

Last update
resolved

The slowdown has been fully remediated and SchemeServe is working at full capacity.

identified

We have identified the issue and are applying mitigation.

investigating

We are currently investigating reports of slow page loads and timeouts across SchemeServe

Report: "Agent Statements"

Last update
postmortem

When new cases were added to SchemeServe, in certain circumstances the calculation question data from the questionset was not being correctly aggregated to the risk groups. A recent update changed agent statements to use the risk groups as opposed to just the questionset data. As a result, SchemeServe was finding cases that it did not think had been fully settled and so was including these cases on agent statement runs. This was never historically a problem and has only come to light with the update needed to agent statements to account for the new Digit accounting module. The error has now been corrected and agent statements should only include the correct cases again. Please note that SchemeServe has already contacted all clients that this may have affected, and that the issue would not have caused any incorrect premiums or collected funds in any way.

resolved

This incident has been fully resolved. A post-mortem will follow soon.

monitoring

A fix has been applied and we are monitoring the results.

identified

We are aware of an issue with running Agent Statements, which we have identified and are currently resolving. We recommend waiting until this issue is resolved before running any Agent Statements.

Report: "SchemeServe Company Retreat"

Last update
resolved

This incident has been resolved.

monitoring

SchemeServe are currently away at our bi-annual company staff retreat until Wednesday 12th October. Our Obsessive Support desk is being monitored but is running at a reduced capacity so we may not respond as quickly as usual to your requests.

Report: "State funeral of HM Queen Elizabeth II"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Please note that SchemeServe Obsessive Support is closed for the state funeral of HM Queen Elizabeth II on the 19th of September.

Report: "Connectivity issues to SchemeServe"

Last update
postmortem

**What went wrong, and why?**

SchemeServe uses the Azure Front Door (AFD) service to route all traffic to the relevant backend services. Between 16:10 and 16:45 UTC, Azure observed an unusual spike in traffic, and the AFD service attempted to load balance that traffic for optimal use and minimal latency for customers. In this instance, the load balancing that occurred during the traffic spike caused multiple environments managing this traffic to go offline. Azure has auto-mitigations that cause the affected environments to recover in such an event. By design, these environments recover and, once they are in a healthy state, resume managing traffic. In this instance, as users and Azure systems retried their requests, the retries exacerbated the situation: Azure had a build-up of requests that did not allow time for the environments to fully recover.

**How did Azure respond?**

Azure manually intervened in the AFD load balancing process by expediting the auto-recovery system and performing more efficient load distribution in regions where there was a large build-up of traffic. Once the environments recovered, Azure began to gradually bring them back online to resume traffic management in the normal way.
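The retry build-up described above is commonly reduced on the client side by spacing retries out with exponential backoff and random jitter. The following is a minimal sketch of that general technique, not a description of how Azure or SchemeServe handle retries; the function name `backoff_delays` and its parameters are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized wait time (in seconds) per retry attempt.

    Hypothetical helper: the ceiling doubles on every attempt up to `cap`,
    and the actual wait is drawn uniformly below that ceiling ("full jitter")
    so that many clients do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(5), start=1):
        print(f"retry {attempt}: wait up to {delay:.2f}s before trying again")
```

Spreading retries out in this way gives a recovering service time to become healthy instead of receiving every retried request at once.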

resolved

Full service of SchemeServe has been resumed. A full post-mortem will be posted shortly.

monitoring

Service to Microsoft Azure has been partially restored and SchemeServe is back online. We are continuing to monitor the situation. Please be aware that connectivity to SchemeServe may still be intermittent while all services recover.

investigating

The Microsoft Azure hosting platform used for hosting SchemeServe is currently offline

investigating

We are currently investigating an issue where we are unable to connect to SchemeServe

Report: "SchemeServe Slow Loading"

Last update
postmortem

SchemeServe uses a session state to check whether a user is authenticated to access the platform. This session state is temporarily locked during requests to stop the same request from being submitted multiple times. Most requests complete in milliseconds, although larger requests can sometimes take multiple seconds to run. The slowdown that affected a small subset of customers yesterday occurred because the session state was being locked as expected but was not being released correctly at the end of the request. As a result, subsequent requests using the same logged-in user session were timing out, as they could not validate whether the user was authenticated. This issue affected about 7% of SchemeServe logged-in users. We are currently working to remove the session state block to make sure that this doesn't happen in the future.
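To illustrate the failure mode, here is a minimal sketch of a per-session lock that is always released at the end of a request, even when the request fails. It is an illustration under assumptions, not SchemeServe's actual session code; the `SessionLocks` class and its `hold` method are hypothetical names.

```python
import threading
from contextlib import contextmanager


class SessionLocks:
    """Hypothetical registry holding one lock per logged-in session."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary itself

    def _lock_for(self, session_id):
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())

    @contextmanager
    def hold(self, session_id, timeout=5.0):
        """Lock the session for one request and always release it afterwards."""
        lock = self._lock_for(session_id)
        if not lock.acquire(timeout=timeout):
            # A later request gives up instead of waiting forever.
            raise TimeoutError(f"session {session_id} is still locked")
        try:
            yield
        finally:
            # Runs on success and on error, so the lock cannot be left held.
            lock.release()


if __name__ == "__main__":
    locks = SessionLocks()
    with locks.hold("example-session"):
        print("handling request while the session is locked")
    print("session lock released")
```

Releasing in a `finally` block means a request that raises an error cannot leave the session locked for the requests that follow it.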

resolved

This incident has been resolved.

investigating

We are currently investigating reports from some clients of slow loading speeds.

Report: "SchemeServe slow loading"

Last update
resolved

This incident has been resolved.

identified

The issue has been identified as increased load on the servers caused by a problem with filtering by client when searching cases.

investigating

We are currently investigating reports from some clients of slow loading speeds.

Report: "Intermittent white screen when logged into SchemeServe"

Last update
resolved

This incident has been resolved.

monitoring

We have identified what we believe to be the cause of the white screen issue and have applied remediation. We will continue to monitor for any recurrence.

investigating

We are currently investigating the cause of an intermittent white screen issue affecting some users sporadically.

Report: "SchemeServe intermittent service"

Last update
postmortem

SchemeServe employs auto-scaling across its application servers to accommodate more traffic load as required. As the server nodes scaled up for Monday morning, one of the nodes did not build correctly, meaning the application pods were unable to deploy to it. This had a knock-on effect where the pods were caught in a continual startup loop. The architecture is designed to cope with issues such as this and, given time, it would have recovered by itself, but this should not have caused gateway timeouts in the way it did. We will be adding additional remediation measures to ensure that this does not happen again.
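One general way to avoid deploying onto a badly built node is to place pods only on nodes that report themselves as ready. The sketch below shows that idea in simplified form. It is not the remediation SchemeServe applied and not how any particular orchestrator works; `Node`, `ready` and `place_pods` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical server node; `ready` is False if the node built incorrectly."""
    name: str
    ready: bool


def place_pods(pods, nodes):
    """Assign pods round-robin across ready nodes only, skipping unhealthy ones."""
    ready = [node for node in nodes if node.ready]
    if not ready:
        raise RuntimeError("no ready nodes available")
    return {pod: ready[i % len(ready)].name for i, pod in enumerate(pods)}


if __name__ == "__main__":
    nodes = [Node("node-a", True), Node("node-b", False), Node("node-c", True)]
    for pod, node_name in place_pods(["pod-1", "pod-2", "pod-3"], nodes).items():
        print(f"{pod} -> {node_name}")
```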

resolved

We have identified the root cause of the issue and have applied remedial action. We have confirmation that this has been resolved. We thank you for your patience and appreciate your continued partnership.

investigating

We are currently investigating reports of intermittent service across SchemeServe this morning. We will update as soon as we have more information.

Report: "Release Rollback"

Last update
resolved

This incident has been resolved.

investigating

Following the release last night, an issue has been encountered when accessing certain cases. As a result, we are rolling back the release whilst we investigate further.

Report: "SchemeServe slowdown"

Last update
resolved

The incident has now been fully mitigated and SchemeServe is running at normal speed.

investigating

SchemeServe is currently experiencing a slowdown across the core application. We are currently investigating the issue.

Report: "Rollback of release from 2022-03-31"

Last update
resolved

The update has now been successfully re-released this evening

monitoring

A fix has been implemented and we are monitoring the results.

identified

We have now identified the cause of the NULL reference exceptions and have applied a patch. This will be re-released this evening.

investigating

Following last night's release, an issue has been found where NULL reference exceptions may occur when running quotes. As a result, the release will be rolled back while we investigate.

Report: "SchemeServe company retreat"

Last update
resolved

This incident has been resolved.

investigating

SchemeServe are currently away at our bi-annual company retreat until Wednesday 23rd March. Our Obsessive Support desk is running at a reduced capacity so we may not respond as quickly as usual to your requests until then.

Report: "Incident slow load times for SchemeServe"

Last update
resolved

This incident has been resolved.

monitoring

The cause of the slowdown has been identified in the authentication refresh and remediation has been applied. We are already seeing response times return to normal for all services. We will continue to monitor SchemeServe for this issue.

investigating

We are currently investigating an issue that affected the login and refresh of access tokens within SchemeServe that caused increased load times and timeouts on page loads.

Report: "Incident from release on 26th October 2021"

Last update
resolved

It has been determined that the issue encountered earlier was due to an error occurring during the release process, rather than an error in SchemeServe's code. As such the deployment was rolled back and redeployment has been scheduled for this evening at 10pm.

identified

Following on from the release last night we have discovered an issue that is affecting the editing capabilities of various aspects of SchemeServe and the aggregator. We have identified the issue and will provide more information as soon as we have it.

Report: "SchemeServe SSL expiration"

Last update
resolved

This incident has been resolved.

monitoring

For most users, this will not cause any issue or noticeable impact, but at 3 pm today one of the root SSL signing certificates for SchemeServe will expire. In some cases, you may receive an error screen stating that the connection is not secure. Simply refresh the page a few times and you will be able to continue using SchemeServe as before.

Some older devices and operating systems may be more seriously affected. These include:

OpenSSL <= 1.0.2
Windows < XP SP3
macOS < 10.12.1
iOS < 10 (iPhone 5 is the lowest model that can get to iOS 10)
Android < 7.1.1 (but >= 2.3.6 will work if served ISRG Root X1 cross-sign)
Mozilla Firefox < 50
Ubuntu < 16.04
Debian < 8
Java 8 < 8u141
Java 7 < 7u151
NSS < 3.26
Amazon FireOS (Silk Browser)

Unfortunately, this is a change at our certificate issuer that we have no control over, and it will affect 265 million internet sites. For more information, you can read about the update from Let's Encrypt here: https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/

SchemeServe is not responsible for the content of externally linked sites.
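For readers who want to track an expiry date like this themselves, the sketch below works out how many days remain before a certificate's expiry timestamp, using Python's standard `ssl.cert_time_to_seconds`. The function name `days_until_expiry` and the timestamp in the example are illustrative assumptions, not values taken from SchemeServe's certificates.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Return the days left before a certificate expiry time (negative if past).

    `not_after` uses the format found in certificate fields,
    for example "Sep 30 14:01:15 2021 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    return (expiry_seconds - now.timestamp()) / 86400


if __name__ == "__main__":
    # Illustrative expiry timestamp, assumed for this example only.
    remaining = days_until_expiry("Sep 30 14:01:15 2021 GMT")
    if remaining < 0:
        print(f"certificate expired {-remaining:.0f} days ago")
    else:
        print(f"certificate expires in {remaining:.0f} days")
```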

Report: "Possible issue regarding agents adding notes to cases"

Last update
resolved

Following an investigation, we have identified the issue as due to a very specific circumstance. Remediation has been applied and no further action will be required.

investigating

Following last night's release, we are aware of a possible issue with regard to agents adding notes to cases. We are currently investigating and will provide more details in due course.

Report: "Issues with speed and/or crashing"

Last update
postmortem

Feedback from Microsoft Azure:

Summary of Impact: Between 10:48 UTC and 16:56 UTC on 06 Apr 2021, customers using Azure Front Door and Azure CDN Standard from Microsoft may have encountered intermittent connection failures and service degradation. Impacted customers were primarily located in European geographies; however, a subset of customers in other geographies may have encountered intermittent impact at a lower rate.

Root Cause: Azure Front Door and Azure CDN Standard from Microsoft experienced a significant traffic spike at our London edge site. This was detected by our monitoring and resulted in alerts sent to the service team. The traffic increase initiated traffic management processes which started distributing traffic to all nearby sites in Europe, as expected.

Mitigation: The service team came on board to mitigate the incident by further adjusting the traffic distribution to nearby sites to help contain the issue. By that time, the traffic volume had dropped back to normal levels.

As part of our continuous efforts to better the Azure platform we have undertaken the following repair items. These will be released to production at a future release date.

Improve automated load management to improve responsiveness to increased traffic events.

resolved

This incident has been resolved.

investigating

This issue is now resolved.

investigating

SchemeServe is hosted through Microsoft Azure which has been having ongoing outage issues across Europe since approximately 12pm today. We have critical tickets raised with their support team and hope to have full functionality restored shortly. Please accept our apologies for any inconvenience during this time.

Report: "Intermittent connection issues"

Last update
resolved

SchemeServe is operating normally again at full capacity

monitoring

The immediate issue has been overcome and we are continuing to monitor the situation.

investigating

We are currently investigating intermittent connection issues with SchemeServe

Report: "SchemeServe availability issues"

Last update
postmortem

Following the incident, it has been concluded that the issue was the result of a bug caused by a change on the Microsoft Azure hosting platform. The initial report provided by them is as follows:

**Summary of Impact**: Between 18:20 UTC and approximately 21:45 UTC on 07 Oct 2020, a subset of customers may have experienced issues connecting to resources that leverage Azure Network infrastructure across regions. Resources with local dependencies in the same region should not have been impacted.

**Preliminary Root Cause**: A change was made to an internal service that controls routing across the Azure Wide Area Network (WAN). A bug in the new version of the service caused traffic to route non-optimally across the WAN, causing network congestion and packet loss.

**Mitigation**: At 18:42 UTC the service self-recovered and all packet loss was resolved. To ensure the bug does not repeat, the change was rolled back at 19:30 UTC. By approximately 21:45 UTC, all Azure services reported full recovery.

resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

SchemeServe is fully operational again. We will continue to monitor for any further issues.

investigating

We are currently investigating an availability issue with the core SchemeServe application.