Proemion

Is Proemion Down Right Now? Discover if there is an ongoing service outage.

Proemion is currently Operational

Last checked from Proemion's official status page

Historical record of incidents for Proemion

Report: "Maintenance of the device activation process and contract overview"

Last update
Completed

The scheduled maintenance has been completed.

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Scheduled

We will perform maintenance on our ERP system. This system is linked to the Data Platform for the device activation process, Go Live, Tariff Change, Contract Renewal and the hosting contract overview. During the update, the device activation functions (Go Live, Tariff Change, Renewals, Cancellation) on DataPortal and REST API will not be available. The hosting contract overview on DataPortal and REST API may also not show all details.

Report: "Go-Live is failing"

Last update
resolved

This incident has been resolved.

monitoring

We have identified the root cause and deployed a work-around. We are monitoring the situation.

investigating

New device activations (go-live) currently have a problem. We are recovering the services.

Report: "Delays during device activations"

Last update
resolved

The work-around is active. Go Live requests are being handled normally again. Previously incomplete requests will be completed manually by the Proemion team.

monitoring

A work-around has been implemented and we are monitoring the Go Live requests.

identified

We are experiencing delays during the Go Live on DataPortal and REST API. We still have a record of all the requests and we will make sure that all of them are addressed. But due to a problem with one of our partners we are experiencing delays.

Report: "Delay during the device activation process"

Last update
resolved

We are experiencing delays during the Go Live on DataPortal and REST API. We still have a record of all the requests and we will make sure that all of them are addressed. But due to a problem with one of our partners we are experiencing delays.

Report: "New telematic unit SIM activation is delayed"

Last update
resolved

This incident has been resolved.

monitoring

The work-around has been active since last night. We are monitoring the SIM card activations.

identified

The third party has acknowledged the issue and suggested a workaround, which we are considering.

identified

The SIM provider has identified the problem and is working on fixing it.

investigating

We have noticed a problem with one of our SIM providers. We are monitoring the new telematic unit activation requests from our customers and are working with the SIM provider to activate them as soon as possible. Already activated telematics units are fully operational.

Report: "Data Platform and Data Ingestion outage"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

Report: "Elevated API Errors"

Last update
resolved

The incident has been resolved and we have identified the root cause.

monitoring

We have restored service and it is fully operational. We are monitoring for any further adverse impact.

investigating

We are continuing to investigate this issue.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.

Report: "DigiCert Revocation Incident"

Last update
resolved

This incident has been resolved.

monitoring

Certificates have been renewed and are being deployed. We continue to monitor all systems for any issues.

identified

At 00:41 CEST DigiCert notified us of the following incident: https://www.digicert.com/support/certificate-revocation-incident We promptly verified that we were affected and are taking the necessary steps as instructed by DigiCert to reissue and deploy new certificates. Based on previous experience with certificate renewals and rollouts, we are confident that we can deploy new certificates before the revocation goes into effect. We expect no interruptions to services. We will post regular updates on this incident during the day.

Report: "Partial Fulda Data Center Power Outage"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We are currently investigating this issue.

Report: "Telematics Communication Unit activation (provisioning, go live) not possible"

Last update
resolved

This incident has been resolved.

monitoring

All services should now be operational. We are closely monitoring them.

identified

We found that the power outage also affects the availability of our real-time connectivity solution. We are continuing our work to restore service.

identified

Due to a power outage the new TCU activation functionality is currently not working. We have identified the root cause and are actively working to restore service.

Report: "DataPlatform full system outage"

Last update
resolved

The metrics are stable and back at normal levels. The incident is resolved.

monitoring

DataPlatform is back online. We are monitoring the situation.

investigating

We are continuing to investigate this issue.

investigating

We are experiencing a full system outage of the DataPlatform after our deployment today. Our engineers are working on a solution. We will update this notification as soon as we have more information.

Report: "Data Processing Delays"

Last update
resolved

There have been no issues since yesterday afternoon. We also have confirmation from our internet providers that they are no longer experiencing any issues. In total we experienced a bit under 12 hours of degraded performance which may have caused slowness accessing our Data Portal and the RealTime Dashboard. No data was lost at any time during the incident.

monitoring

We are no longer suffering packet loss when accessing our services. We are still monitoring the situation and will continue to work to minimize any issues that may surface.

identified

The issue is still ongoing. We continue to work with our internet providers to resolve the issue while monitoring the health of our services.

identified

One of our internet providers is currently affected by a large-scale disturbance. We continue to work together with them to resolve the issue and to minimize the impact.

investigating

We are suffering packet loss in the connectivity of one of our data centers, which might affect or slow down realtime connections. Data import from third-party systems may be delayed, too. We are working on it and will give feedback as soon as possible.

Report: "Standard Portal not accessible"

Last update
resolved

This incident has been resolved.

monitoring

Our engineers implemented a fix. We are constantly monitoring availability.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

Our Standard Portal is currently not accessible. Our engineers are investigating the issue.

Report: "Data ingestion is disrupted"

Last update
resolved

Backlogs have cleared and all functionality is now restored.

monitoring

We have fixed the problem and are now monitoring our backlogs.

investigating

We are currently investigating this issue.

Report: "Elevated REST API Errors"

Last update
resolved

The issue has been identified and a fix deployed half an hour ago. We apologize for the lack of updates since opening the incident and we will try to do better next time.

investigating

We're experiencing an elevated level of API errors and are currently looking into the issue. As a consequence of the REST API errors, users might experience issues loading https://dataportal.proemion.com.

Report: "Incident with device activation process and contract overview"

Last update
resolved

This incident has been resolved.

monitoring

A fix has been implemented and we are monitoring the results.

identified

The issue has been identified and a fix is being implemented.

investigating

We have detected an issue in our backend systems that is impacting the device activation process and the contract overview. We are working to resolve it.

Report: "Data ingestion is interrupted"

Last update
resolved

This incident has been resolved.

monitoring

The backlog has been processed and the realtime feature is operational.

investigating

We had a physical server failure, which impacted our data ingestion and realtime feature. We don't expect data loss and are investigating.

Report: "Portals and APIs down"

Last update
resolved

The incident is fully resolved.

monitoring

All our products are fully operational. We will monitor the systems further.

identified

Our portals and APIs are available again. Most essential functions are restored. We are working on the rest.

investigating

During a routine hardware replacement our portals and APIs have become unreachable unexpectedly.

Report: "Slow processing of incoming data"

Last update
resolved

The incident has been resolved and the backlog has been cleared.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are currently investigating an issue with slow processing of incoming data. Machine raw data files are sent to the DataPlatform and stored as usual. The processing of the raw data files currently is delayed. No data will be lost. We are investigating the issue and will post updates as soon as available.

Report: "Connectivity issues to our portals"

Last update
resolved

The connectivity issue is now fully resolved. We experienced no data loss, and fast access to our portals and APIs is fully restored.

identified

Our engineers are actively working on resolving the issue, and all our services are available again. However, some customers might currently still experience degraded performance.

investigating

We are still investigating the incident. Since our engineers need to be physically present in one of our data centers, this will take at least one more hour.

investigating

We are experiencing connectivity and performance issues on our portals. We are investigating the problem.

Report: "New data and diagnostics (realtime) functionality unavailable for"

Last update
resolved

Our internal messaging infrastructure had an incident between 05:20 and 09:02 UTC on 21 June 2020. During this time, new data coming from communication units was delayed and diagnostics (realtime) connectivity wasn't possible. The incident was caused by a misconfiguration of our messaging infrastructure. After the incident was resolved, the backlogged data was processed, so we see no data loss, just a delay.

Report: "Data Platform unavailable due to a problem with our release"

Last update
resolved

At 11:00 CEST we identified a problem with our production release. All data platform services were affected. We have rolled back to the last good version and all services are back to normal.

Report: "Data processing outage"

Last update
resolved

We have identified and fixed an issue in our data processing pipeline. The backlog has now been processed, no data has been lost. All systems are operating normally.

investigating

We are currently experiencing an issue that impacts the PROEMION data platform. We will send updates as they become available. We apologize for the inconvenience.

Report: "Service interruption"

Last update
resolved

The incident has been resolved. All systems are operating normally.

monitoring

We identified and solved an issue on our virtual host infrastructure. Currently, we are monitoring the affected hosts very closely. All data is being processed normally again.

monitoring

We experienced a short service interruption during a standard maintenance procedure. Services are back up, but performance of some parts of the system will be degraded while services return to a clean state in the background.

Report: "DataPortal not accessible"

Last update
resolved

We have resolved the issue and DataPortal is fully accessible again.

identified

We have identified a misconfiguration causing the DataPortal not to load via the web browser and showing a blank page instead. We are currently working on resolving the issue. All other services are fully operational. We advise customers to use the Standard Portal in the meantime.

Report: "Problem with maps component on DataPortal"

Last update
resolved

This incident has been resolved.

monitoring

We have rolled back to a previously working version of the DataPortal. We will investigate and fix the root cause before updating the DataPortal version again.

investigating

We are currently experiencing some issues with our maps component in our DataPortal product. We are working on resolving the problem so that maps are displayed correctly again.

Report: "Partial outage of Real-Time connectivity"

Last update
resolved

We have identified an error in the message handling that caused the connection to abort. However, since this is an edge case that only affects three devices, we're marking this issue as resolved. We will deliver a fix within the next 24 hours.

investigating

According to our observations, only a small number of devices are affected by this problem. We continue to investigate the root cause of the problem.

investigating

We are continuing to investigate this issue.

investigating

We currently observe an issue with our Real-Time connectivity service. For some Real-Time Client software (e.g. Proemion Dashboard), the Real-Time connection to the device is established but immediately breaks down when the first messages from the remote CAN bus are received. We are currently investigating the root cause of this issue with a high priority. We will inform you as soon as we have updated information.

Report: "Partial mobile network connection problems"

Last update
resolved

The mobile network provider informed us that the problem is resolved. Our monitoring shows that the number of online communication units is normal again. Logged datasets buffered in the non-volatile memory of the communication units have been transmitted to our system. No data loss has been detected for our monitoring devices. We consider this issue resolved. Our monitoring will continue to check the status.

monitoring

We are continuing to monitor for any further issues.

monitoring

Our monitoring tools show that the number of connected communication units is within the normal range again. Until now the provider has not indicated that the problem is solved. We will continue to monitor the situation.

monitoring

We have noticed that our monitoring setups are back online again. Until now the provider has not indicated that the problem is solved. We will continue to monitor the situation.

identified

One of our mobile network providers is currently facing infrastructure problems. Our partner is actively working on resolving the problem. At the moment we are monitoring the impact on our customers. Currently, we estimate that 5% of devices across our customers are affected. Unfortunately, it still might impact all devices of individual customers. The devices might not be able to establish a proper online connection. In that case, Real-Time connections are not possible. Logging data is buffered on the devices, which mitigates the problem. We don't expect any data loss. The affected mobile network provider and their signaling vendor have already started to activate traffic. This process is done with caution in order to prevent potential overload of the mobile network provider's infrastructure. We are sorry for the inconvenience and are in active contact with the affected mobile network provider.

Report: "Partial mobile network connection problems"

Last update
postmortem

### Course of events

On Friday, August 3rd, 2018, after 11:00 CEST, some of our end-to-end monitoring devices did not go online after a reset. The automatic monitoring infrastructure detected this and alerted the PROEMION on-call service. The staff tried to identify the root cause of the problem: it turned out that other communication units with SIM cards from the affected provider also no longer went online after a power-on reset. Devices with SIM cards from this provider that already had an active online connection continued their sessions without problems.

At 14:00 CEST our support contacted the SIM provider. The contact confirmed that there was a problem with the online connectivity of their SIM cards in roaming environments and that no action was required on the PROEMION side. At this point we also checked the number of affected devices by comparing the number of connected communication units to the average number for the same time on Fridays. At this time about 5% fewer communication units were connected to our data platform.

At 14:30 CEST we announced the outage on our status page. During the outage we:

* continuously monitored the situation,
* frequently contacted the SIM provider to get updates,
* posted updates to the status page to keep our customers informed.

At the peak of the outage, about 17% fewer communication units were connected compared to normal weekend traffic.

On Saturday, August 4th, 2018, between 16:35 and 16:40 CEST we saw a significant step up in the number of connected communication units. Afterwards the number of connected communication units was in the normal range of the average value for this time on Sundays. The SIM provider also informed us that the problem should be resolved and that connections should be possible again. We updated the information on our status page after confirming that connectivity was normal again and the outage had been resolved. Nevertheless, we continued to monitor the situation to get early information about a potential recurrence of the problem.

### Review

No log data was lost during the connectivity outage. The data was stored in the internal non-volatile buffer of our communication units. It was transferred to our system after the successful reconnection.

In general, we see minor possibilities for improving our monitoring setup to reduce the identification time for such third-party outages. We lost some of our end-to-end monitoring capabilities during the outage. We are currently implementing a fall-back solution for our end-to-end monitoring for the case of a full provider outage. The core improvement will be to push our partners to provide better communication and information flow towards us.

### Conclusion

We believe that it is very important to inform our customers about issues with our system as early and as comprehensively as possible. In the case of third-party systems such as a telecommunications provider's systems, we have no control over the containment and recovery activities at the third-party provider. But especially in these cases we continuously monitor the situation together with the provider and regularly post updates to our status page.

We apologize for potential issues that may have occurred on our customers' side during the outage. Even though our influence on the resolution of this issue was low, we did our best to keep you informed.
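The baseline comparison described in the course of events (connected communication units versus the average for the same time on the same weekday) can be illustrated with a minimal sketch. The function names, the 5% threshold default, and the sample counts below are assumptions made for illustration; they are not taken from PROEMION's monitoring setup.

```python
from statistics import mean


def connection_deficit(current: int, baseline_counts: list[int]) -> float:
    """Fraction by which `current` falls below the average of `baseline_counts`.

    `baseline_counts` stands for connected-unit counts observed at the same
    time on previous occurrences of the same weekday (hypothetical input).
    """
    baseline = mean(baseline_counts)
    if baseline <= 0:
        raise ValueError("baseline average must be positive")
    return (baseline - current) / baseline


def is_degraded(current: int, baseline_counts: list[int],
                threshold: float = 0.05) -> bool:
    """True if the deficit reaches the (assumed) alerting threshold."""
    return connection_deficit(current, baseline_counts) >= threshold


# Illustrative numbers only, not data from the incident.
previous_fridays = [10_000, 10_200, 9_800]
deficit = connection_deficit(9_500, previous_fridays)
print(f"{deficit:.1%} fewer units than the Friday average")
```

Comparing against the same weekday and time of day, rather than a global average, avoids mistaking the normal weekend low for an outage.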

resolved

The incident has been resolved. All affected communication units are back online and no data has been lost. We are continuing to monitor the situation.

monitoring

We are continuing to monitor for any further issues.

monitoring

The affected mobile network provider and their signaling vendor have started to activate traffic. This process is done slowly and with caution in order to prevent potential overload of the mobile network provider's infrastructure. At Proemion we can see successful connections being made to our servers, and the first affected communication units are coming online. We will keep you updated on the progress.

monitoring

The affected mobile network provider and their signaling vendor have started to activate traffic. This process is done with caution in order to prevent potential overload of the mobile network provider's infrastructure.

monitoring

We still have not received a "green light" from the affected mobile network provider. Currently, we observe 17% of machines being impacted by the mobile network provider issue. We will continue monitoring the problem and keep you informed.

monitoring

We have not received a "green light" from the affected mobile network provider yet. Currently we are observing more affected customer machines, mainly in North America. For Asia and Europe the weekend-low has already arrived and effects are less visible. We'll proceed with the monitoring of the problem and we will keep you informed.

identified

We are continuing to work on a fix for this issue.

identified

We are continuing to work on a fix for this issue.

identified

One of our mobile network providers is currently facing infrastructure problems. Our partner is actively working on resolving the problem. At the moment we are monitoring the impact on our customers. Currently, we estimate that 5% of devices across our customers are affected. Unfortunately, it still might impact all devices of individual customers. The devices might not be able to establish a proper online connection. In that case, Real-Time connections are not possible. Logging data is buffered on the devices, which mitigates the problem. We don't expect any data loss. We are sorry for the inconvenience and are in active contact with the affected mobile network provider.

Report: "Dataportal unavailable"

Last update
postmortem

Today we introduced eager statistics collection about the structure of our datasets. Using these statistics we can run more targeted and hence faster queries. Unfortunately, the statistics collection put significant load on our systems. As Dataportal usage picked up, the backend could not keep up with reprocessing and delivering services to the Dataportal at the same time. We did not catch this problem ahead of time because of a different usage pattern on the non-production environments that the backend change went through first. We are evaluating options to provide a more comparable usage pattern on the non-production environments. We are sorry for the interruption of the service delivery.

resolved

All services are up and running again. Sorry for the inconvenience. The Dataportal team is looking into the case right now to prevent it in the future.

identified

During an early morning release of a new version, we ran into a problem with the Dataportal. We are fixing it right now.

Report: "Web service performance degradation"

Last update
postmortem

# Facts

This morning, from 3 to 5 AM UTC, we experienced server outages that affected the Data Platform. The platform handled the outages gracefully. The only direct impact for our customers was a latency spike in the web service. All requests succeeded, but four percent of requests were delayed. Device connections were lost, but devices were immediately able to reconnect and then worked as usual. The user interfaces, data processing and alerting continued to work fine.

# Ongoing measures

We have improved monitoring for the web services, and we will continue to look into options for avoiding such latency spikes for similar failures in the future.

resolved

The server has been restored and everything is working normally again.

identified

One of our server nodes is experiencing an outage. The PROEMION Data Platform is fully available, but you may experience degraded performance.

Report: "Full Outage on PSP"

Last update
postmortem

# Outage on February 13th

Last week we had a series of problems caused by a failing hardware system. We decided to replace it. During the replacement process we ran into an issue that caused yesterday's outage. We know that our customers rely on our service daily. We're continually improving our setup and we're sorry that we went offline.

# Facts

On Feb 13, we experienced a full outage from 13:40 to 13:57 CET. All PROEMION Data Platform services were affected and were restored after 17 minutes. No data was lost and no customer action is required.

# What happened

The PROEMION data platform is hosted on a set of hardware clusters, one of which has been experiencing hardware problems. We are currently in the process of replacing that system with a new one. One of the key components of this system is a cluster manager, which is responsible for resource placement. After we added the replacement node, the cluster manager ran into an issue and failed to allocate resources. This erroneously caused a full service shutdown. Since we were actively working on those components, we immediately recognized the outage and started restore procedures.

# Mitigation

We are addressing the hardware issues by completing the system replacement. Analyzing the cluster manager's failure and evaluating alternatives is scheduled for this week.

# In Conclusion

We're aware of how important the availability of our data platform is to our customers. We're taking these issues absolutely seriously and are committed not only to resolving them quickly, but also to preventing them in the future.

resolved

All services are back up and running. The outage started at 13:40 and was resolved at 13:57 CET. We had no data loss. No action is required from customers. We will follow up on the incident with a postmortem soon.

identified

We encountered a major hardware failure during a hardware replacement. The tech team is restoring the infrastructure at the moment.

Report: "Elevated API Errors"

Last update
resolved

The issue has been resolved and we monitored the system closely during the last hour.

monitoring

The affected components have been successfully recovered. We are monitoring the system.

identified

We have identified the cause and are in the process of recovering the affected components.

investigating

We're experiencing an elevated level of errors (Portal and API) and are currently looking into the issue.

Report: "Webportal not available for some users"

Last update
resolved

We have fixed problems with the Webportal that were hitting some users. The problem has been resolved and we monitored it closely during the last hours.

monitoring

We have rolled back the affected Webportal instances. Currently we are monitoring the performance and tracing the cause of the problem.

identified

We have problems with the Webportal for some of the users. For some users the Webportal instances run fine; for others we noticed significant problems. We are currently rolling back the problematic instances.

Report: "Data Platform unavailable"

Last update
resolved

The issue is resolved and everything is back to normal.

investigating

We're currently investigating issues that cause the data platform to be unavailable or experience serious performance degradation.

Report: "Proemion Portal not reachable"

Last update
resolved

We had a problem with public access to the Proemion Portal from 09:54 to 10:04. The problem has been resolved and we are tracing the exact cause to prevent it from happening again. We will monitor it closely over the next hours.

Report: "Administration - Manage Machine - Devices are removed from machines"

Last update
resolved

The issue is resolved and the bugfix has been deployed to the production system. The problem can no longer be reproduced.

identified

We have identified an issue where device entity assignments are removed from machine entities when using the manage machine administration function. As a result, the devices cannot go online or send data anymore after they have been unassigned from the machine entity. We have identified the root cause of the problem and are working on a fix. We suggest not using the manage machine administration function until this issue is resolved. WORKAROUND: If you need to use the manage machine functionality, re-assign the device to the machine entity after you have updated the machine entity. To do so, open the Administration -> Entities -> Device -> Manage function, select the details of the device that was unassigned by the manage machine function, and re-assign the machine in the Machine selector of the manage device details.

Report: "Degraded Performance in Data Processing"

Last update
resolved

We are back to the normal processing latency.

monitoring

The tech team has fixed the problem. The processing latency is decreasing back to normal. The problem was related to the situation we had on the 15th. The resulting code changes from the 15th have been deployed today to solve the problem. We will continue to monitor the processing latency. Thanks for your patience.

investigating

We are currently experiencing delays in our data processing. Our tech team is working to resolve this issue.

Report: "Problems with slow data processing/interpretation"

Last update
resolved

The entire data processing backlog has been processed and the data is ready to use. We are proceeding with the analysis of the problems we encountered to make sure we don't run into them again.

monitoring

We have addressed the cause of the slow data processing. All services are up and running again. The system is catching up with the delay in data processing. Soon we should have the latest data ready to use.

investigating

We have brought the majority of the services back up. We still don't accept new data from the devices. Therefore, the interpreted data in the DataPlatform is outdated. The tech team is continuing to address the problem.

investigating

We are experiencing a problem with the data processing/interpretation. This mainly affects the WebPortal and introduces a delay in the data processing time. The tech team is looking into the problem.

Report: "Portal might not provide content for some users"

Last update
resolved

The problem is resolved. During the monitoring period we documented the working condition.

monitoring

We have addressed the issue. Portal access is working properly again. The defective system has been fixed. We are monitoring the system closely now. Adjustments for the Portal monitoring have been started to avoid this problem in the future.

identified

We have identified a problem with the Dataplatform Web Portal. For some users, no content is provided. We have tracked the problem to only one of the systems serving the Portal. The tech crew is addressing the issue right now.

Report: "Portal very slow / down"

Last update
resolved

The Portal is running as expected again. We will follow up with a postmortem soon.

monitoring

The Proemion Data Platform team has addressed the problem and we are currently monitoring the situation in detail.

investigating

We measured that the Portal is very slow / down. The tech team is investigating the problem right now.

Report: "status.proemion.com integration problem"

Last update
resolved

We had a problem with the integration of our monitoring stack and the status.proemion.com page. The gap in the provided public metrics was caused by this integration problem. The Proemion Data Platform was not affected at any time.

Report: "Full Outage on Proemion Data Platform"

Last update
resolved

# Outage on March 14th - 15th

# Facts

Yesterday we had a major file system cluster failure. This failure caused the entire PSP to stall at 23:08 GMT+1. We mitigated the failure at 06:51 GMT+1 and started to parse the backlog that had accumulated in the meantime. Everything was running normally again around 7:30 GMT+1. Our focus during the restore was on the analysis of the problem and 100% recovery of the data. No data was lost and no customer action is required. The automatic update of the status.proemion.com page was affected by the outage.

# Mitigation

During the analysis of the problem we found problematic incompatibilities of the file system cluster with our setup. To avoid further problems we have moved the Proemion Data Platform onto a different file system environment. The failure of the status page update will be addressed during the next days.

# In Conclusion

To deliver the highly available Data Platform that our customers deserve, we have simplified the file system stack. Our team is committed to following up on problems like this one and we take them absolutely seriously. Besides the push to get new features, APIs and web frontends to our customers, we are constantly working on the infrastructure of our platform.

Report: "Processing latency spike"

Last update
resolved

We had a performance degradation that caused our "latest data" display fields to be outdated for about half an hour. The main data flow with all the historic data was unharmed. Reporting was also not affected. The degraded components are scheduled for replacement on Sunday.