Kindly

Last checked from Kindly's official status page

Historical record of incidents for Kindly

Report: "Handover to human chat is failing"

Last update
resolved

This incident has been resolved.

monitoring

The issue appears to be solved. We are monitoring the situation and investigating what caused it.

investigating

We are continuing to investigate this issue.

investigating

We are currently investigating the issue.

Report: "Kindly Inbox (Live chat) responds slowly"

Last update
resolved

This incident has been resolved.

investigating

No clear cause has been found yet; we are still investigating.

investigating

Currently investigating the issue.

Report: "Handover to human chat is failing"

Last update
resolved

This incident has been resolved. Handover to humans had been failing for 14 minutes.

Report: "Upgrading analytics database"

Last update
resolved

After a full business day we have reviewed the numbers, and they appear normal. We will continue to monitor the session numbers and correct any discrepancies that may have occurred because of the database update issue.

monitoring

It has been determined that there is a bug in the analytics database that affects how Kindly calculates sessions. We have an open ticket with the database provider, which has acknowledged the bug and is working on a fix, but does not know how long it will take. To restore most of the database's functionality, we have disabled part of the sessions calculation before resuming normal analytics operations. This will affect dashboard stats that measure sessions. We will resume ingesting analytics events so that statistics about messages, fallbacks, labels and all other stats not derived from sessions remain accurate. Once a bugfix is available, we plan to use the accurate data to recreate the sessions data for the affected period and restore accurate sessions statistics. In the meantime, charts that show sessions may not reflect accurate numbers.

identified

During a routine maintenance upgrade of the analytics database, an unexpected event occurred that requires us to reboot the database service. While this is ongoing, analytics dashboards may be unavailable for some time. All chatbot operations work as normal, and all analytics events are recorded in a queue, so no data is lost. Only the ability to view the data is temporarily unavailable.

Report: "Minor degradation in analytics pages"

Last update
resolved

Minor degradation of the Kindly analytics pages due to a partial outage at Kindly's analytics vendor.

Report: "Analytics database unplanned maintenance"

Last update
resolved

Database is online, dashboards and APIs function as normal, all queued analytics events have been written to the database.

monitoring

The database is now back online and it is possible to make queries again. We will now start writing the queued events to the database.

identified

We are continuing to work on a fix for this issue.

identified

While testing out some new features of the analytics systems we accidentally triggered some downtime for the analytics database. Chats are unaffected, and the platform app is working except you can't currently view analytics. APIs for analytics are also temporarily unavailable. No data is lost, all analytical data is being recorded and waiting in a queue to be written to the analytics database once it is online again.
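The queue-and-replay behavior described in these updates (events keep being recorded while the database is offline and are written once it is back) can be sketched roughly as follows. This is a minimal in-memory illustration, not Kindly's implementation: `FlakyStore`, `BufferedRecorder` and `UnavailableError` are hypothetical names, and a production system would use a durable message queue rather than a Python `deque`.

```python
from collections import deque


class UnavailableError(Exception):
    """Raised by the hypothetical store while the database is offline."""


class FlakyStore:
    """Hypothetical stand-in for the analytics database."""

    def __init__(self):
        self.online = True
        self.rows = []

    def write(self, event):
        if not self.online:
            raise UnavailableError("analytics database offline")
        self.rows.append(event)


class BufferedRecorder:
    """Records events, keeping them queued while the store is unavailable."""

    def __init__(self, store):
        self.store = store
        self.pending = deque()

    def record(self, event):
        # Enqueue first so the event is never lost, then try to drain the queue.
        self.pending.append(event)
        self.flush()

    def flush(self):
        # Write events in their original order; stop at the first failure
        # and leave the remaining events queued for a later attempt.
        while self.pending:
            try:
                self.store.write(self.pending[0])
            except UnavailableError:
                return False
            self.pending.popleft()
        return True
```

Callers such as the chatbot only ever call `record()`, which does not raise when the database is offline; events wait in `pending` until a later `flush()` succeeds, and are then written in their original order.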

Report: "Degraded performance of AI services"

Last update
resolved

We experienced degraded performance in our AI services following a faulty deployment, which led to less accurate dialogue matching and AI predictions, such as those from Kindly GPT. We relied on our fallback AI engine system during this period. The issue was identified somewhat late but was resolved shortly after discovery. We will implement measures to detect this kind of degraded performance earlier and reduce its impact.

Report: "Issue with service response times"

Last update
resolved

All systems seem stable.

monitoring

The issue might be related to high traffic, but this is not yet clear. Services have been scaled up and we will monitor the effect of the changes.

investigating

We are still investigating and trying different approaches to find the root cause.

investigating

We are seeing issues where some users experience longer response times from the bot than normal. The issue is under investigation. Most users should not be affected.

Report: "Issues with access to the Kindly Platform"

Last update
resolved

Some users were not able to access the Kindly Platform. Affected users would have experienced the outage for 5 minutes.

Report: "Applying changes to dialogues"

Last update
resolved

The issue is now resolved. All bots using Intents/Kindly Plus should no longer experience any delays. Bots still using samples might experience some delay if many changes are applied at the same time.

identified

We are currently testing a fix in our staging environments.

identified

We are experiencing some issues when applying changes to newly saved dialogues. Your edit will be saved, but it might not take effect in the bot as quickly as you are used to. The queue that applies the changes gets clogged, so some changes take up the resources available to others (also known as an indexing issue). We know what is wrong and are working on both short-term and long-term fixes.

Report: "Chat logs not visible"

Last update
resolved

The issue has been resolved and all chat logs should be visible as normal.

identified

The issue has been identified and a fix is being worked on.

investigating

Users in the platform cannot see chat history prior to 1 July. We are working on a solution.

Report: "Problems with handover to live chat"

Last update
postmortem

In the morning hours of June 28th, as traffic started picking up, one of our services started having issues serving web requests. This service handles handover between the main chatbot service and external customer service systems, letting users talk to human agents. The main chatbot was unaffected and continued to respond as normal; only communication between end users and agents was affected. We received an automatic alert about errors soon after the service started to fail and began to investigate. One of the error messages displayed by the webserver suggested that it might be out of memory, so we started by investigating this and trying to increase the memory for the service. Unfortunately, this error message was misleading and led us to spend our time on the wrong solution. The actual problem was latency: the webserver _was_ responding to requests, but too _slowly_, and the connections were timing out. Once we figured this out, we scaled up the service to give it more capacity, and it recovered quickly and went back to handling requests in milliseconds instead of minutes. To avoid this happening again, we have increased the capacity that the service can automatically scale up to, so that it can respond dynamically to increased traffic. We’re also looking into whether we can add alerts that tell us explicitly when server response time increases significantly, so that we get an accurate error message that points to the actual problem. Finally, we are looking into whether it’s possible to change the misleading error message from the webserver, so that we are not misled again if something similar happens in the future.
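The postmortem mentions adding alerts that fire explicitly when response time rises. A minimal sketch of such a check, assuming response times are collected in milliseconds and that a nearest-rank 95th percentile above a hypothetical 1000 ms threshold should raise an alert (neither the percentile nor the threshold comes from the postmortem, and `latency_alert` is a hypothetical helper):

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based nearest rank; multiply before dividing to avoid float rounding.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_alert(samples_ms, threshold_ms=1000.0, pct=95):
    """Return an alert message if the pct-th percentile exceeds the threshold, else None."""
    p = percentile(samples_ms, pct)
    if p > threshold_ms:
        return f"p{pct} response time {p:.0f} ms exceeds {threshold_ms:.0f} ms"
    return None
```

Alerting on a percentile rather than on individual requests means a single slow request does not trigger an alert, while a sustained slowdown like the one described, with requests taking minutes, does.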

resolved

This incident has been resolved.

monitoring

This morning there was an incident with a sub-system handling handover of chats from bot to human agents. The issue has been handled and everything appears to be working as intended now, but we are still monitoring to make sure. The incident only affected handover for certain clients using this particular service, not all clients. Conversations with chatbots were unaffected, only users attempting to contact a human would have noticed the problem.

Report: "Kindly GPT slow or not responding"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

Bots using Kindly GPT will experience degraded performance linked to issues on OpenAI servers.

Report: "Kindly GPT slow or not responding"

Last update
resolved

OpenAI is back up and bots using Kindly GPT are back in full operation.

identified

Bots using Kindly GPT will experience degraded performance linked to issues on OpenAI servers.

Report: "Issues with DNS provider"

Last update
resolved

All services are globally available as normal again.

monitoring

DNS updates are still being propagated. As this progresses, our services are becoming available again for users who were not able to connect previously.

identified

DNS servers are progressively coming back up. The outage currently seems to affect about 50% of users.

identified

We are currently experiencing issues with our DNS provider. Our services might be unreachable for some users.

Report: "Platform not responding"

Last update
resolved

The problem should now be fixed and resolved.

identified

The issue has been identified. We are trying out potential fixes to the problem.

investigating

We are currently having issues with loading content in the platform. We are working on solving the issue. All chatbots are running as normal.

Report: "Bots not responding with Greeting"

Last update
resolved

This incident has been resolved.

identified

We are experiencing issues with Greetings and performance. We have a solution to the problem and are working on getting it out.

Report: "Analytics service availability"

Last update
resolved

We have deployed several changes to fix a performance regression after a database upgrade. We are continuing to implement improvements even after the performance of the statistics component has stabilized.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

The analytics component is having database related issues. Analytics in the platform will be unavailable until this is resolved.

Report: "Issues with custom handover functionality"

Last update
resolved

We experienced problems relating to the custom handover functionality. The cause was identified as an issue related to message event listeners in Kindly Chat and a fix was rolled out.

Report: "Analytics availability"

Last update
resolved

This incident has been resolved.

identified

The analytics component is having database related issues. Analytics views in the platform are currently unavailable or slow to load. We are working on fixing the performance issue.

Report: "Bot is not updated when making changes in Build"

Last update
resolved

This incident has been resolved.

monitoring

The queue looks healthy now and updates seem to work. We will continue to monitor the queue for a while.

identified

We have a working theory on why the update queue is struggling and are adjusting some infrastructure to reduce the pressure.

investigating

We have received reports that bots aren't registering new dialogues that are added in Build.

Report: "Facebook app deactivated"

Last update
resolved

This incident has been resolved.

monitoring

The app has been reactivated by Facebook and bot owners need to reconnect their Facebook pages to start receiving messages again.

identified

The Facebook app used to connect a Facebook page to a Kindly bot or Inbox has been temporarily deactivated. This means messages from users are not forwarded to bots/Inbox at the moment. Once the app has been reactivated by Meta, the messages will start flowing again. We are waiting for the app to be re-reviewed and reactivated by Meta after we have supplied the necessary information required to perform a review.

Report: "Plugin: Zendesk ticket"

Last update
resolved

We have identified an issue that affected users of the Zendesk ticket plugin, and a fix has been deployed. Please review chats after ~10:30 CET to make sure your customers received the proper response. If you are inspecting webhook responses in the debug console to verify a proper response, look in the _request body_ for `exchange_slug: zendesk_succcess`. Please note that even if the plugin returns HTTP 200 OK, this does not mean the ticket has been successfully submitted.
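For anyone automating the check described in this notice, a sketch might look like the following. It assumes the body is JSON with a top-level `exchange_slug` field (an assumption; the notice does not describe the body's structure), `ticket_submitted` is a hypothetical helper, and the slug value is copied verbatim from the notice:

```python
import json

# Slug value copied verbatim from the incident notice.
SUCCESS_SLUG = "zendesk_succcess"


def ticket_submitted(status_code: int, request_body: str) -> bool:
    """Hypothetical check: HTTP 200 alone is not enough, the body must carry the success slug.

    Assumes the body is JSON with a top-level "exchange_slug" field.
    """
    if status_code != 200:
        return False
    try:
        body = json.loads(request_body)
    except json.JSONDecodeError:
        return False
    return isinstance(body, dict) and body.get("exchange_slug") == SUCCESS_SLUG
```

The status-code check only acts as a first filter; as the notice points out, a 200 by itself does not confirm that the ticket was submitted.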

Report: "Google Cloud major outage"

Last update
resolved

Our cloud provider had a major network outage in its EU and US regions. As a result, all our bot delivery services and the platform interface were unavailable and unresponsive for about 20 minutes. Their incident report is available here: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5REh

Report: "Message queue provider network issue causing spurious errors"

Last update
resolved

This incident has been resolved.

monitoring

The network issue with our message queue provider should be fixed and we are monitoring the situation.

identified

We are experiencing a network-related issue with our message queue provider. This is causing spurious errors on app.kindly.ai when performing actions like creating or updating a dialogue, logging in, etc. (these all write to the message queue). We are working on remediating the issue.

Report: "Bot not responding in some cases"

Last update
resolved

Immediate issues with bots that have Draft & Publish enabled are resolved. We are continuing to improve this feature's operational reliability going forward.

identified

The issue has been identified and relates to bots that had Draft & Publish enabled. The affected bots are now operational again.

investigating

Some users are experiencing a lack of response from the bot. We are currently investigating the cause of this.

Report: "Issue accessing bots"

Last update
resolved

The incident has been resolved. The cause of failed requests was a non-production database migration unintentionally run in the production environment.

monitoring

The issue has been resolved and app.kindly.ai is accessible again.

identified

The issue has been identified and we are rolling out a fix. app.kindly.ai is currently set in maintenance mode to avoid data inconsistencies.

investigating

We are currently investigating this issue.

Report: "Incomplete statistics"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

New events are now being processed and fresh statistics should be available again. We are working on backfilling unprocessed events between approximately 2020-10-18 21:00 and 2020-10-19 03:00 (UTC+2).

investigating

After migrating our services, we are experiencing issues with inserting events into our statistics database, resulting in incomplete data from around 2020-10-18 21:00 onward. We are working on restoring these events and making fresh data visible on app.kindly.ai dashboards and directly from sage.kindly.ai.

Report: "Database outage"

Last update
resolved

At 14:36 CEST we were alerted that chatbots were unresponsive. We found that one of our critical databases (Heroku Redis on AWS) had become unavailable. While we were still investigating, at 14:52 the service became available again and chatbots resumed working. We are monitoring the corresponding incident at Heroku (https://status.heroku.com/incidents/2084) to determine what happened to the database and how it can be mitigated in the future.

Report: "Bots answering with fallback"

Last update
resolved

This incident has been resolved.

monitoring

We are continuing to monitor for any further issues.

monitoring

A fix has been deployed to production; we will continue to monitor the situation.

investigating

We are continuing to investigate this issue.

investigating

This morning there were reports of bots answering with fallback even though there were dialogues with confidence scores above the set threshold.
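The expected behavior that this incident violated can be sketched as a simple rule: answer with the highest-scoring dialogue when its confidence score is above the bot's threshold, and fall back only otherwise. The function name, the score dictionary and the `"fallback"` marker are hypothetical, and Kindly's real matching is likely more involved:

```python
from __future__ import annotations


def choose_reply(scores: dict[str, float], threshold: float, fallback: str = "fallback") -> str:
    """Return the best-matching dialogue id, or `fallback` if no score is above `threshold`.

    Hypothetical helper: `scores` maps dialogue ids to confidence scores.
    """
    if not scores:
        return fallback
    best_id, best_score = max(scores.items(), key=lambda item: item[1])
    # Strictly "above" the threshold, matching the wording of the report.
    return best_id if best_score > threshold else fallback
```

During the incident, bots behaved as if the fallback branch were taken even when a dialogue's score cleared the threshold.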

Report: "Datastore outage"

Last update
resolved

All bots and skill indices have now been rebuilt and we are monitoring the situation.

identified

We are currently rebuilding message indices and continuing to work on getting all bots back online.

identified

We are experiencing an issue with our message datastore. We are working on mitigating the issue.

Report: "Database outage"

Last update
resolved

This incident has been resolved.

monitoring

We have restored the database from a backup taken a few minutes before the incident (14:40 UTC+1).

identified

We are continuing to work on a fix for this issue.

identified

The issue has been identified and a fix is being implemented.

Report: "Chatbubble issues"

Last update
resolved

This incident has been resolved.

identified

We are continuing to work on a fix for this issue.

identified

We are currently experiencing performance issues with the systems relating to the chat bubble. We are deploying a fix to mitigate the issue.

Report: "Chatbubble issues"

Last update
resolved

A memory issue has been identified and patched. This incident is resolved, but we will continue to monitor the morning traffic peaks.

monitoring

A fix has been implemented and we are monitoring the results.

investigating

We are continuing to investigate this issue.

investigating

This morning we experienced performance issues with the systems relating to the chatbubble. At some point these issues also caused the chatbubble not to load. Systems are back up now, and we are investigating.

Report: "Analytics service maintenance"

Last update
resolved

We saw analytics events being dropped after a minor version upgrade of the analytics storage backend. The problem has been resolved and data from the period is being recovered.

Report: "Platform performance"

Last update
resolved

This incident has been resolved.

monitoring

We have identified and resolved the issue, and are monitoring the situation.

investigating

We are currently experiencing performance issues with the platform; a fix is expected shortly. This does not affect end-user chats.

Report: "Analytics service availability"

Last update
resolved

The underlying issue with the analytics service has been fixed. Further fixes to improve platform reliability are being implemented.

monitoring

The issue has been identified as a disruption in the analytics service. This affected the overall availability of bots. Analytics gathering was disabled from 14:47 until the service was restored at 15:13. We are working on multiple fixes to remove the dependency on the analytics service for general bot operation and allow for graceful degradation.

Report: "Message delivery"

Last update
resolved

We experienced a scaling issue resulting in general issues such as message delivery and failing chat bubble initialisation.

Report: "Analytics: Cache connection error"

Last update
resolved

This incident has been resolved.

monitoring

A version upgrade to one of our dependencies has caused our cache to sometimes experience a timeout error when gathering analytic data. We have rolled back the upgrade, and are monitoring the issue.

Report: "Message storage issues"

Last update
resolved

This incident has been resolved.

monitoring

All systems are back to normal. We are continuing to monitor the situation.

monitoring

We have seen increased load on our chat message storage service, and the service's capacity has been upgraded to handle it. All systems are back to normal. We are continuing to monitor the situation.

Report: "Elasticsearch networking issues"

Last update
resolved

Our Elasticsearch provider reports that the networking issue has been resolved.

monitoring

Our Elasticsearch provider has resolved its networking issue. We are still monitoring the situation.

identified

We are continuing to work on a fix for this issue.

identified

Our Elasticsearch provider is currently experiencing networking issues. This will result in delayed response times for all bots. For detailed updates, please follow http://cloud-status.elastic.co