Historical record of incidents for Elium
Report: "Corrupt/unavailable file upload"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
We've rolled back the affected version and uploads are working again. We are monitoring the issue and assessing the corrupted files.
Our file provider is experiencing file corruption on upload. Since Feb 04, 2025 - 14:30 CET, all uploads are blocked to avoid further issues. We are currently investigating possible fixes.
Report: "DNS issues on our private Cloud provider (Outscale)"
Last update: Our private cloud provider has now told us that the incident is closed on their side. All services seem to be working correctly again.
As of 12:20, we are no longer detecting any errors on the services hosted by our private cloud provider. However, we have not received confirmation from them that the incident has been resolved. We are therefore continuing to monitor the platforms.
We received feedback from our private cloud provider: "We have an incident on the internal network of our orchestrator. Our technical teams are working on a resolution. We will get back to you as soon as the problem is resolved."
The issue has been identified and a fix is being implemented.
Our private Cloud provider has informed us that they are currently experiencing problems with their DNS service. This is having an impact on certain features of the solution. In particular, services requiring external access, such as integrations with services like Google, Facebook, etc., as well as file storage and sending and receiving emails, are currently experiencing intermittent problems.
Report: "Issue on private (Outscale) hosting"
Last update: We identified the root cause and fixed it. The platforms on the private (Outscale) hosting are now completely available again.
The platforms on the private (Outscale) hosting are intermittently unavailable. We are continuing to investigate the root cause of this incident and working on a permanent fix.
One of the servers hosting private (Outscale) platforms experienced an unexpected increase in memory load. We switched traffic to a backup server. We are still investigating the root cause of this incident.
Report: "Issue on private (Outscale) hosting"
Last update: We deployed a fix to ensure that this problem will not occur in the future.
The services of our private cloud provider are back. We are still monitoring and investigating to make sure the problem does not reoccur.
We are detecting problems with connections to platforms hosted by our private cloud (Outscale). We are investigating the problem.
Report: "Issue on private (Outscale) hosting"
Last update: This incident has been resolved.
All platforms hosted on our private cloud (Outscale) have been restored.
Services hosted on our private cloud (Outscale) are now restored and the platforms are available, but the storage system still appears to be broken and files remain unavailable. We are awaiting further information.
Our cloud provider (Outscale) notified us that a fix has been implemented and that they are still monitoring the results, but we still have issues with files.
Services and virtual machines appear to be accessible again. Storage is still not working and files are not accessible.
Our cloud provider (Outscale) has extended the reported problem to cover their persistent storage, public network and API. We are awaiting further information.
Our Cloud provider (Outscale) reported a problem with their internal network and data hosting. We are awaiting further information.
Some virtual machines hosted in our private cloud (Outscale) are not responding. We have contacted our cloud provider and are awaiting further information.
We are continuing to investigate this issue.
We are detecting problems with connections to platforms hosted by our private cloud (Outscale). We are investigating the problem.
Report: "Unavailable Files on Private Hosting"
Last update: This incident has been resolved.
At around 3:30pm, our Cloud provider indicated that the incident was over and that the system was stable. For our part, we have not detected any errors since 2:20 pm. We continue to monitor the service.
This Friday morning at around 8:30, we detected an increase in errors accessing files hosted on certain platforms. These errors are due to an incident currently being experienced by our Cloud provider hosting the files. This is making the upload and download of certain files temporarily unavailable. We are monitoring the progress of this incident with our supplier.
We have detected an abnormal number of 502 errors on requests to our storage provider (for private hosting). For the moment, there is no critical impact on the platforms (only a few requests are failing). Our storage provider has reported an outage on its status page.
Report: "Unavailable Files"
Last update: The problem was identified and corrective actions were taken. We have not observed any 502 errors since 7:00 p.m. However, we are keeping the matter under close surveillance.
We have detected an abnormal number of 502 errors on requests to our storage provider (for private hosting). For the moment, there is no critical impact on the platforms (only a few requests are failing). Our storage provider has not reported any outage on its status page.
Report: "Unavailable Files"
Last update: There do not seem to be any more 502 errors since 11:52.
We have detected an abnormal number of 502 errors on requests to our storage provider (for private hosting). For the moment, there is no critical impact on the platforms (only a few requests are failing). Our storage provider has not reported any outage on its status page.
Report: "Unavailable Files"
Last update: This incident has been resolved.
There do not seem to be any more 502 errors since 12:15.
We have detected an abnormal number of 502 errors on requests to our storage provider (for private hosting). For the moment, there is no critical impact on the platforms (only a few requests are failing). Our storage provider has not reported any outage on its status page.
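Detection of an "abnormal number of 502 errors" like the one reported above can be sketched as a sliding-window error-rate check. This is purely illustrative: the class name, window size, and 5% threshold are assumptions, not Elium's actual monitoring configuration.

```python
from collections import deque
import time


class ErrorRateMonitor:
    """Sliding-window monitor that flags an abnormal rate of 502 responses.

    Hypothetical sketch: window and threshold values are illustrative,
    not the values used by the actual monitoring system.
    """

    def __init__(self, window_seconds=300, threshold=0.05):
        self.window = window_seconds
        self.threshold = threshold   # alert above 5% failing requests
        self.samples = deque()       # (timestamp, status_code) pairs

    def record(self, status_code, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, status_code))
        # Drop samples that fell out of the observation window.
        while self.samples and self.samples[0][0] < now - self.window:
            self.samples.popleft()

    def is_abnormal(self):
        if not self.samples:
            return False
        errors = sum(1 for _, code in self.samples if code == 502)
        return errors / len(self.samples) > self.threshold
```

With a 5% threshold, a burst of 502s pushes `is_abnormal()` to True while a handful of failures among mostly successful requests does not, matching the "only a few requests are failing" wording above.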
Report: "Unavailable Files"
Last update: This incident has been resolved.
Our upstream provider informed us that the maintenance was completed and the situation is being monitored. We no longer observe any errors coming from the storage system at this point.
Our upstream provider informed us that the recovery has been delayed and is now expected no earlier than 15:30.
Our storage provider informed us of an unplanned operation on the file storage service; the service will be unavailable at least until 14:00 while it is being worked on. Data integrity is not affected. We are in close phone contact with our provider and will keep updating the status of the operation. During this time, all file uploads and downloads will be unavailable.
The issue has been identified by our provider; we are waiting for a fix on their side.
Report: "File access issues"
Last update: This incident has been resolved.
Uploads and thumbnails are affected. The issue lies with the file storage provider.
Report: "unavailable instance"
Last update: This incident has been resolved.
The situation appears to be back to normal; we are still investigating and monitoring.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Increased issues to access files on private hosting"
Last update: This incident has been resolved by the provider. We will continue to monitor the system.
As Outscale has marked the incident as fixed, we are continuing to monitor the situation.
A temporary service interruption is currently impacting our file storage functionality. Unfortunately, our Cloud Storage provider is experiencing an issue, which is affecting the download and upload of files within our system. Our technical team is in close communication with their support team to identify the root cause and implement a solution swiftly. During this period, end-users may encounter difficulties in accessing or uploading files. We sincerely apologize for any inconvenience this may cause. Our team will provide regular updates on the incident, including any estimated timelines for resolution or any workarounds available in the meantime.
Some files are inaccessible: PDFs, thumbnails, ...
Report: "Unavailable frontend"
Last update: We switched back to the original hosting for static resources (JavaScript, CSS). We are still monitoring for errors.
We identified random failures serving static resources (JavaScript, CSS) for some platforms and for specific end-users. We traced these failures to this morning's incident at one of Google's data centers, which is impacting the global CDN that serves these static resources. We implemented a temporary work-around while waiting for Google to solve the root cause: the static resources were deployed to another cloud provider (Amazon) and are being served from there.
We are continuing to investigate this issue.
We are investigating the issue
Report: "File storage unavailable"
Last update: Our upstream provider reports that a fix has been deployed.
Since 12:05 we have no longer seen errors from the storage provider. We are now waiting for confirmation that the problem is fixed.
The issue has been acknowledged by our provider and they are working on a solution
We detected an elevated error rate when using our private hosting provider file storage
Report: "File storage unavailability"
Last update: Our provider confirms the fix is operational.
We are no longer seeing errors from our provider. We are waiting for their confirmation that the problem is solved on their end
Our upstream provider has updated its status page (https://status.outscale.com/). We are investigating the time to resolution with them.
We are continuing to investigate this issue.
We are currently noticing an increased error rate on file storage operations on our private hosting provider.
Report: "Storage system is unavailable"
Last update: Our provider informs us that the issue is now resolved.
More information can be found here: https://status.outscale.com/
Our storage provider is reporting an outage with the system; images and file uploads are unavailable for the moment.
Report: "3DS Outscale Objects Storage issue"
Last update: This incident has been resolved.
The incident was fixed by Outscale at 19:10.
We are continuing to work on a fix for this issue.
A fix is being implemented by 3DS Outscale, and emergency maintenance is ongoing on the 3DS Outscale Objects Storage service.
Report: "Incident with indexing system."
Last update: This morning, we detected a problem with our system for indexing platform content. It was no longer working properly and some content was temporarily not visible on the platforms. We have since corrected the problem and restarted the indexing process. This incident also had an impact on other platform features, such as mentions and search.
Report: "Partial service unavailability"
Last update: A random portion of requests made during the affected period failed with a 404 or 500 error. This only affected a small part of our clients running in our Google datacenter. Requests started failing at 17:00 and service was restored at 17:20.
Report: "Service unavailability"
Last update: This issue is resolved.
This incident will be closed when the maintenance window from our provider closes at 14:00 CET. No further interruption should happen until then
Service has been restored upstream
Our infrastructure provider is performing urgent maintenance on their network, causing temporary drops in connectivity.
We detected an interruption in network connectivity on our private datacenter causing instances to be unresponsive
We are currently investigating this issue.
Report: "Load on one of our Outscale K8S cluster node"
Last update: We performed several tests (including the deployment of a new version of the Elium services) to validate that the new node is stable.
We have created a new node using different hardware specifications (CPU type). After several tests, we found that the abnormal load problem no longer occurs on this type of machine. We continue to monitor the behaviour of this node. At the same time, we are reporting our findings to 3DS Outscale support in order to validate that the problem comes from the type of machine used for this node.
We are still testing different configurations for the faulty node (different kernel versions, creating another node).
We completely recreated the node and redeployed the services. The load continues to increase abnormally and this impacts the customer instances. We have therefore, once again, disabled the services on this node.
We are trying to solve the node load problem. It causes slowness on client instances hosted on our private hosting (Outscale) when the services restart on the node.
During rolling updates, restarting containers on the node produces timeouts
Restarting the node solved the load problem. We are still checking why this load occurred. Currently, the services are working properly again.
We have detected an abnormal load on one of the nodes of our Outscale Kubernetes cluster. We had to restart it.
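The "abnormal load" detection described in this report can be sketched as a simple heuristic comparing a node's 1-minute load average to its CPU count. The function names and the 2x factor are illustrative assumptions, not the thresholds actually used on the cluster.

```python
def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from the contents of /proc/loadavg,
    e.g. "34.81 30.02 25.70 3/412 12345"."""
    return float(text.split()[0])


def node_needs_attention(load_1m: float, cpu_count: int, factor: float = 2.0) -> bool:
    """Heuristic: flag the node when its 1-minute load average is far
    above its CPU count (here, more than `factor` times the core count).

    Illustrative only; real alerting would also look at trends and
    per-container metrics.
    """
    return load_1m > cpu_count * factor
```

For an 8-core node, a load average of 34 would be flagged, while a load of 3 would not; a flagged node could then be cordoned and drained before being restarted, as described above.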
Report: "3DS Outscale issue"
Last update: The new network configuration has been applied.
We have reverted to the previous network configuration and are monitoring the behaviour. The new network configuration will be tested again after further impact analysis.
A maintenance on the network was the root cause of the issue. We restored the previous configuration and continue our investigation.
We have an issue involving 3DS Outscale hosting. We are currently investigating it.
Report: "Memory overload on services storage system."
Last update: Our service storage system is unavailable due to an overload. The platforms are currently inaccessible.
Report: "Queue processing issues"
Last update: This incident has been resolved.
A larger queue processor is now live; some delays are expected while the backlog is processed.
Our queue processor ran out of capacity; a larger one is being provisioned.
We are investigating an issue in processing background tasks
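Spotting that a queue processor has run out of capacity, as in this incident, usually comes down to watching the backlog grow faster than it drains. A minimal sketch of such a check, with assumed names and sample counts (the real Elium queue system is not described here):

```python
def backlog_growing(depths, min_samples=3):
    """Return True when queue-depth samples (oldest first) have been
    strictly increasing, i.e. work arrives faster than it is processed.

    Illustrative heuristic: real capacity alerts would also consider
    processing rate and absolute depth, not just monotonic growth.
    """
    if len(depths) < min_samples:
        return False
    return all(b > a for a, b in zip(depths, depths[1:]))
```

A strictly rising series of depth samples signals the need for a larger processor, while a fluctuating backlog that drains between spikes does not.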
Report: "Service Unavailable for private hosting"
Last update: From 11:46 until 11:55, service for our private hosting customers was unavailable due to a misconfigured background task. The task consumed too many database resources, leaving the web service unable to reach the database and causing it to fail to respond. Once identified, the background task was cancelled; it will be rescheduled later with better resource management.
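The "better resource management" mentioned above often means running the background task in small batches with pauses in between, so foreground database queries are never starved. A minimal sketch under assumptions (the function, batch size, and pause are illustrative, not the actual fix):

```python
import time


def process_in_batches(items, handle, batch_size=100, pause=0.05):
    """Process work items in small batches, pausing between batches so a
    background task does not monopolise database resources.

    `handle` is any per-item callable; batch size and pause duration are
    illustrative placeholders, not Elium's actual settings.
    """
    done = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for item in batch:
            handle(item)
        done += len(batch)
        # Yield between batches so foreground queries keep their share
        # of database capacity.
        time.sleep(pause)
    return done
```

Compared with one monolithic run, the batched version spreads the same work over time, trading total runtime for a bounded impact on concurrent web requests.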
Report: "Bug in the production frontend version"
Last update: A new version, 1.67.10, that fixes the bug in release 1.67.9 has been released to production.
We have reverted to the previous release, 1.67.8, in production.
An undetected bug was deployed in production release 1.67.9 of the frontend. We will revert to release 1.67.8 as soon as possible.
Report: "Storage system outage"
Last update: System is fully operational.
The storage system is now up and running and service should resume. We are still seeing some errors when serving thumbnails.
The memory issue has been fixed and the storage system is rebooting.
Our storage system is experiencing a memory issue, which is affecting the general availability of the service.
We are currently investigating this issue.
Report: "Loss of internet connectivity"
Last update: Friday 11/12/2020 – 14:25: backbone alarm raised for switch B19B4530WIN0 and a few other downstream devices.
Friday 11/12/2020 – 14:30: basic troubleshooting – suspected power failure.
Friday 11/12/2020 – 15:10: engineer arrived at the WDC – a few tests performed on the power supply and fans of B19B4530WIN0.
Friday 11/12/2020 – 15:20: tests inconclusive – we decided to replace the chassis of B19B4530WIN0. B19B4530WIN0 consists of 2 stacked chassis, and the faulty chassis was identified as the C3750-X – available as a backbone spare in stock at Wierde.
Friday 11/12/2020 – 15:30: spare CAT3750-X taken out of stock and transferred to the WDC.
Friday 11/12/2020 – 15:30–16:15: untangling and labelling of the UTP connections terminating on B19B4530WIN0 to prepare the migration.
Friday 11/12/2020 – 16:10: spare switch arrived at the WDC.
Friday 11/12/2020 – 16:15: configuration of the spare switch.
Friday 11/12/2020 – 16:40: replacement of the faulty switch.
Friday 11/12/2020 – 17:00: stack formed between the 2 switch members and start of reconnecting the UTP cables.
Friday 11/12/2020 – 17:04: switch rebooted to configure the system MTU.
Friday 11/12/2020 – 17:07: reconnection of the UTP cables on the spare switch completed.
Friday 11/12/2020 – 17:07: end of the intervention.
ROOT CAUSE: hardware failure of the B19B4530WIN0 chassis.
The upstream provider connectivity has been resumed in our datacenter
We identified another issue related to serving thumbnail/file content, which should be resolved as soon as the new DNS record propagates.
Our internal DNS resolver was still set to the failing primary internet line, and has been switched to use our backup line DNS provider
We have been having DNS issues in part of our private hosting facility since the switch to the upstream backup.
The datacenter has confirmed they have a problem with one of their internet provider, our backup provider is unaffected
We had to update our DNS records to point to our backup external IP addresses; depending on cached values, this may take a few minutes to propagate.
We switched our internet connectivity to our backup provider
Instances hosted in our private hosting facility are unreachable because our internet connectivity is down
We are currently investigating this issue.
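The propagation delay mentioned in the DNS failover updates above is bounded by the record's TTL: a resolver that cached the old answer just before the change keeps serving it for up to one full TTL. A small sketch of that worst-case bound (function name and values are illustrative assumptions):

```python
def seconds_until_propagated(ttl_seconds: int, elapsed_seconds: float) -> float:
    """Worst-case seconds until every resolver sees an updated DNS record.

    A resolver that cached the old answer immediately before the change
    keeps it for a full TTL, so the worst-case remaining wait is the TTL
    minus the time already elapsed since the record was updated.
    """
    return max(0.0, ttl_seconds - elapsed_seconds)
```

For example, with a 300-second TTL, two minutes after the record change some resolvers may still serve the old address for up to another three minutes; once a full TTL has elapsed, no compliant resolver should hold the stale answer.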
Report: "Loss of connectivity"
Last update: Connectivity is restored.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Connectivity issues"
Last update: We have not detected any remaining connectivity issues.
The correct routing configuration is now deployed, and service is stable
The previous configuration has been deployed and request serving has resumed. There may be some failing requests while the new configuration is corrected and redeployed.
We identified a routing issue in our private hosting facility, and restored a previous working configuration
We detected issues serving requests on our private hosting facility, service may be unavailable
We are currently investigating this issue.
Report: "Update memory allocated to distributed storage system."
Last update: All systems have been updated.
More memory was allocated to the system.
We are continuing to work on a fix for this issue.
We detected memory pressure on one of our systems part of our distributed storage. We allocated more memory and rebooted this system.
Report: "Degraded performances"
Last update: This incident has been resolved.
We are currently experiencing degraded performance due to high load on our background tasks.