Historical record of incidents for Close
Report: "Delayed Webhook Processing"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
This issue has been identified and a fix is being implemented.
We've identified an issue with our webhook processing system. Webhooks may be delayed up to twenty minutes. Updates will be posted as they are available.
Report: "Custom Field Event Interruption"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Between April 22nd 14:25 UTC and April 23rd 8:50 UTC our search functionality operated on outdated data for some users, which resulted in potentially incorrect search results. In addition, some webhooks with custom field updates were not delivered. No data was lost during the incident.

## Root Cause and Resolution

A software update introduced a bug where some events for custom field updates were malformed. As a result, changes to custom fields weren’t reflected in search, and webhooks for these events were not sent. Close Engineering reverted the change and reindexed all affected leads. We are adding guardrails that will prevent us from sending such malformed events in the future.

## Timeline

* 22 April 14:25 UTC - A software update is deployed that sends malformed events for some Custom Field updates
* 22 April 21:20 UTC - Close Engineering is notified of leads not populating correctly in smart views
* 23 April 04:36 UTC - Root cause is identified
* 23 April 05:49 UTC - Measures applied to stop further impact
* 23 April 08:46 UTC - Completed reindexing of affected leads
This incident has been resolved.
Report: "OpenAI - Increased Error Rates"
Last update: This incident has been resolved.
OpenAI is currently experiencing increased error rates on their side, which is impacting Lead Summaries and AI Search in Close. You can follow updates on their status page here: https://status.openai.com/
Report: "Degraded Dialer Performance"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Dialer functionality was impaired for 58 minutes from 15:20 UTC to 16:18 UTC on March 10th 2025. During this time the Dialer feature could get stuck in a “connecting” state.

## Root Cause and Resolution

The issue was triggered at 15:20 UTC by a service rebalance that caused a number of client connections to close simultaneously. When these clients attempted to reconnect, the sudden spike in traffic, arriving during peak traffic conditions, exceeded system limits, leading to service disruptions. Our team quickly identified the cause and worked to stabilize the system. We restored normal operations by 16:18 UTC. To prevent similar incidents in the future, we are reviewing system thresholds and improving our ability to handle sudden increases in demand.

## Timeline

* 15:20 UTC - a service rebalance occurs, starting a wave of new connections being established
* 15:21 UTC - a portion of requests starts getting dropped due to rate limits
* 15:28 UTC - alerts trigger and our response team begins identifying the root cause
* 15:36 UTC - the rate of dropped requests subsides, but then increases again soon due to a wave-like pattern of retries
* 16:18 UTC - the final wave of increased errors finishes and the situation returns to normal operational levels
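The wave-like retry pattern described above is a classic thundering-herd failure: every client that loses its connection retries at roughly the same moment. One common client-side mitigation is exponential backoff with full jitter, sketched below in Python. This is a generic illustration rather than Close's actual Dialer client; the `connect` callable and the delay parameters are hypothetical.

```python
import random
import time

def reconnect_with_backoff(connect, max_attempts=8, base_delay=0.5, max_delay=30.0):
    """Attempt to reconnect using exponential backoff with full jitter.

    Spreading retries out randomly avoids synchronized "waves" of
    reconnection traffic. `connect` is any callable that raises
    ConnectionError on failure (a hypothetical placeholder for a real client).
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Full jitter: sleep a random amount between 0 and the capped
            # exponential delay so clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    raise ConnectionError("unable to reconnect after %d attempts" % max_attempts)
```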
This incident has been resolved.
Our Dialer system is now functioning normally and we are monitoring performance. We are continuing to investigate the cause of the degraded performance.
We've become aware of degraded performance of our Dialer service. We are investigating the issue. Updates will be posted as they become available.
Report: "Search Indexing Delays"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

On February 14th, 2025, for two and a half hours (between 19:50 UTC and 22:00 UTC), some users saw outdated data in search results and smart views. No data was lost during the outage.

## Root Cause and Resolution

A software update introduced malformed data to our Search platform that disrupted updates to our Search database. Close Engineering reverted the change and deleted the malformed data, which allowed the platform to resume processing updates to our Search database. The processing backlog was cleared by 22:00 UTC when full functionality was restored. We are implementing a new set of metrics and alerts that will help us identify this kind of failure more quickly. We are also enhancing the Search platform to better identify and discard malformed data to prevent similar incidents from occurring.

## Timeline

All times are in UTC and events take place on 2025-02-14.

* 19:50: A software update is deployed that sends malformed data to our search platform.
* 20:33: Close Engineering is notified of an elevated error rate on the search platform and starts investigating.
* 21:04: Root cause is identified and work starts on a fix.
* 21:40: The issue is resolved and all data is up-to-date.
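One of the follow-ups above is having the Search platform identify and discard malformed data before it can stall indexing. The sketch below shows what such a guard might look like in Python, assuming incoming documents are plain dicts; the required field names are hypothetical and not taken from Close's actual schema.

```python
import logging

logger = logging.getLogger("search.indexer")

# Hypothetical minimal schema a document must satisfy before indexing.
REQUIRED_FIELDS = {"lead_id", "organization_id", "date_updated"}

def filter_malformed(documents):
    """Yield only well-formed documents; log and skip the rest.

    Discarding bad records up front keeps one malformed event from
    stalling the whole indexing pipeline.
    """
    for doc in documents:
        missing = REQUIRED_FIELDS - doc.keys()
        if missing or not isinstance(doc.get("lead_id"), str):
            logger.warning("Discarding malformed document, missing=%s", sorted(missing))
            continue
        yield doc
```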
This incident has been resolved.
A fix has been implemented and we are monitoring the results. Search indexing may be delayed for some users.
The issue has been identified and a fix is being implemented. Search indexing may be delayed for some customers.
We have become aware of degraded performance of the Close system. We are actively triaging the issue. Updates will be posted as they become available.
Report: "Degraded Performance"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

From 22:00 UTC until 23:03 UTC on Tuesday February 11, 2025, users of the Close App experienced significantly degraded performance and an elevated error rate. This included slow page loads, long API response times, delayed email syncing, and delayed calendar syncing for all customers.

## Root Cause and Resolution

At 22:00 UTC on Tuesday February 11, 2025 a change was deployed to the production environment of the Close system. This change introduced a poorly optimized database query, which caused our primary back end database to become overloaded and unstable. Close Engineering identified and removed the offending database query by 22:47 UTC. Performance began to gradually return to baseline, fully recovering by 23:03 UTC.

## Timeline

* 22:00 UTC - A change is deployed to production introducing a poorly optimized database query
* 22:01 UTC - Load on our primary back end database begins to climb
* 22:06 UTC - Load on our primary back end database breaches critical thresholds, alerting Close Engineering
* 22:17 UTC - Close Engineering isolates the poorly optimized database query and begins to remove it from production
* 22:47 UTC - Removal of the poorly optimized database query completes
* 23:03 UTC - Close system performance returns to baseline
This incident has been resolved.
Our App UI and API have returned to normal performance. Our email and calendar syncing systems are operational but delayed. We are continuing to monitor for further issues.
We are continuing to monitor for any further issues.
A fix has been implemented. We are monitoring the results.
The issue has been identified. A fix is being rolled out to our production system.
We have become aware of degraded performance of the Close system. We are actively triaging the issue. Updates will be posted as they become available.
Report: "Elevated Response Times"
Last update: This incident has been resolved.
Response times should now be back to normal; we are monitoring the situation.
We are taking mitigation steps to reduce high database load.
Users may currently experience elevated response times. We are investigating the situation.
Report: "App UI unavailable"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

The Close Web UI was unavailable for 15 minutes from 2025-01-06 17:33 to 2025-01-06 17:48.

## Root Cause and Resolution

A change to our deployment process introduced a build where JavaScript file contents were updated without corresponding filename changes, leading to caching issues that caused outdated files to be served. We quickly rolled back to a previous build to restore services and then resolved the underlying issue by updating the affected assets.

## Timeline

* 17:33 UTC - Deployment of the broken Close Web UI
* 17:48 UTC - Restored prior version
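The root cause here is the classic cache-busting problem: if a file's contents change but its name does not, clients keep serving the stale cached copy. Build tools usually solve this by embedding a hash of each file's contents in its filename. The Python sketch below illustrates the idea; it is not Close's build pipeline, and the directory layout and manifest format are assumptions.

```python
import hashlib
import shutil
from pathlib import Path

def fingerprint_assets(src_dir: str, out_dir: str) -> dict:
    """Copy each built asset to a name that includes a hash of its contents.

    Because the filename changes whenever the contents change, browsers and
    CDNs can cache aggressively without ever serving a stale file.
    Returns a manifest mapping original names to hashed names.
    """
    manifest = {}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for asset in Path(src_dir).glob("*.js"):
        digest = hashlib.sha256(asset.read_bytes()).hexdigest()[:12]
        hashed_name = f"{asset.stem}.{digest}{asset.suffix}"
        shutil.copyfile(asset, out / hashed_name)
        manifest[asset.name] = hashed_name
    return manifest
```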
This incident has been resolved.
Issue was identified and reverted. Monitoring to confirm no further issues.
We are currently investigating the issue.
Report: "Search Indexer Delays"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Changes to Lead data in Close were not reflected on search-based pages, including Leads, Inbox, and Reporting pages, from 13:40 UTC to 15:05 UTC on Nov 18th 2024.

## Root Cause and Resolution

Due to a change in a recent release, we sent bad data to the system that powers our search tools. As a result, this system failed to process any new incoming information and search results became outdated. Once we determined the root cause, we updated the search system to ignore such bad data in the future. Search results are now up-to-date.

## Timeline

* 13:40 UTC - bad data gets sent to the system that powers our search
* 14:15 UTC - engineers identify the issue
* 15:05 UTC - rollout of the fix is complete
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating an issue where our Search Indexer's ingestion of new events is delayed. This may result in the latest changes not being reflected on many pages across the application – Leads Search, Contacts Search, Inbox, Opportunities, Reporting, etc.
Report: "Search Indexer Delays"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

For some clients, changes performed in Close took longer than usual to be reflected in our search functionality on November 4th from 8:06 to 9:31 UTC.

## Root Cause and Resolution

We are in the process of moving data to a new search infrastructure. Because we encountered a number of unexpectedly large objects during this process, the indexing of new changes was delayed. We adjusted the indexing process to truncate the overly large objects to avoid encountering this issue in the future.
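The fix described above truncates unexpectedly large objects before they reach the indexer. A minimal sketch of that kind of safeguard is shown below; the per-field size limit and the assumption that documents are dicts of strings are illustrative, not details of Close's indexing pipeline.

```python
MAX_FIELD_BYTES = 64 * 1024  # hypothetical per-field limit

def truncate_large_fields(doc: dict) -> dict:
    """Return a copy of `doc` with oversized string fields truncated.

    Keeping documents below a known size bound prevents a handful of
    unexpectedly large objects from stalling the indexer.
    """
    safe = {}
    for key, value in doc.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > MAX_FIELD_BYTES:
            safe[key] = value.encode("utf-8")[:MAX_FIELD_BYTES].decode("utf-8", "ignore")
        else:
            safe[key] = value
    return safe
```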
The issue is resolved now.
The failover succeeded. The Indexer is still catching up with the backlog of events. We are monitoring the situation and estimate that everything will be back to normal within 3-5 minutes.
We have identified a problem with one of our search databases and have initiated a failover.
We are investigating an issue where our Search Indexer's ingestion of new events is delayed. This may result in the latest changes not being reflected on many pages across the application – Leads Search, Contacts Search, Inbox, Opportunities, Reporting, etc.
Report: "Issues syncing with Calendly"
Last update: This incident has been resolved.
We are currently investigating issues with calendar events syncing via our Calendly integration.
Report: "Application Loading Issues"
Last update: This incident has been resolved.
We have remediated the performance issue affecting some Close customers. The Close App & API should be functioning normally for all users. We will continue to monitor our system's performance.
We are continuing to investigate the issue. We will post updates as they are available.
We are currently seeing intermittent issues with Close loading. We will let you know as soon as this is fixed.
Report: "High database load causing app performance issues"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Between 13:40 and 16:55 UTC on Wednesday September 28, 2024 the Close App and API experienced degraded performance. Some users may have noticed the App UI & API responding sluggishly. Concurrently, between 14:33 and 19:30 UTC, background task processing inside of the Close app was disrupted. During this time Workflows and Email sending may not have occurred on schedule.

## Root Cause and Resolution

At 13:14 UTC on Wednesday September 28, 2024 Close Engineering deployed an updated version of our browser application. A bug in this new version caused a large increase in resource-intensive requests being sent to our back end system. By 14:00 UTC the number of additional requests had grown such that our back end database was overloaded, causing poor application performance. Close Engineering was able to revert the change to our browser application by 14:51 UTC. While waiting for all of our clients’ browsers to update to the fixed version of our app, Close Engineering took several steps to reduce the load on the overloaded database between 14:30 UTC and 17:00 UTC.

Disruption during this time also degraded our ability to collect runtime metrics on our background task processing system. This caused the background task processing system to think that it was not under load and to scale down. Close Engineering fixed the issue with metrics gathering by 18:20 UTC, at which point background task processing returned to normal operation.

To prevent another incident like this from occurring, Close Engineering will audit our growing data stores for opportunities to better distribute load and prevent the database from becoming overloaded. We will also implement a training regimen for our incident responders to ensure more timely and consistent communication during future incidents.

## Timeline

* 13:14 UTC - Close Engineering deploys an updated version of our browser application
* 13:59 UTC - Close Engineering is alerted to degraded performance of our system
* 14:30 UTC - Close Engineering identifies our back end database as overloaded
* 14:30 UTC - Close Engineering begins load shedding operations to preserve system performance
* 14:33 UTC - Disruption to background task processing begins
* 14:51 UTC - Close Engineering reverts the change to our browser application
* 17:31 UTC - Close Engineering begins to undo load shedding to restore normal operation
* 18:20 UTC - Close Engineering begins manual operations to restore background task processing
* 18:50 UTC - The back end database becomes overloaded once more
* 19:30 UTC - Close Engineering scales up the back end database
* 19:30 UTC - Background task processing returns to normal. All Close systems are functioning normally
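One contributing factor above was that the task-processing system scaled down when its metrics feed was disrupted, because missing data looked like an idle queue. A common safeguard is to treat missing or stale metrics as "hold current capacity" rather than "scale to zero." The Python sketch below illustrates that rule; the freshness threshold and per-worker capacity are hypothetical values, not Close's configuration.

```python
import time

METRICS_MAX_AGE_SECONDS = 120  # hypothetical freshness threshold

def desired_workers(queue_depth_sample, current_workers, per_worker_capacity=50):
    """Pick a worker count from a queue-depth sample, holding steady on stale data.

    `queue_depth_sample` is a (timestamp, depth) tuple from the metrics system.
    If the sample is missing or too old, keep the current worker count
    instead of assuming the queue is empty and scaling down.
    """
    if queue_depth_sample is None:
        return current_workers
    sampled_at, depth = queue_depth_sample
    if time.time() - sampled_at > METRICS_MAX_AGE_SECONDS:
        return current_workers
    return max(1, -(-depth // per_worker_capacity))  # ceiling division
```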
This incident has been resolved.
The Close app is functioning normally. Some background task processing may be delayed. We are continuing to monitor for further issues.
We are continuing to monitor for any further issues.
The Close app is functioning normally. Some background task processing may be delayed. We are continuing to monitor for further issues.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The app is recovering and we are continuing to monitor the situation. Some background tasks could still be delayed.
We are currently investigating this issue
Report: "Application Loading Issue"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Between 19:00 and 20:00 UTC on September 12, 2024 the Close app and API were severely degraded due to a partial outage of our back end MongoDB database. The database was restored to normal operation by 20:00 UTC without data loss. The Close app and API returned to normal operation by 20:00 UTC.

## Root Cause and Resolution

At 19:00 UTC on September 12, 2024 components of our back end MongoDB database in one of our data centers came under anomalous load and became unresponsive. This resulted in widespread intermittent disruption to the Close app and API. Once the affected components were identified and restarted, performance of the Close app and API returned to normal. We are in the process of deploying a new architecture for this part of our system that will be more resilient to this class of failure. In the meantime we are deploying additional monitoring that will reduce the amount of time required to identify and mitigate such issues going forward.

## Timeline

* 19:00 UTC: The Close app and API begin to experience elevated error rates
* 19:04 UTC: Close Engineering is alerted of the elevated error rate by automatic monitoring
* 19:09 UTC: Close Engineering identifies a network issue affecting one of our data centers
* 19:45 UTC: Close Engineering identifies our back end MongoDB database as being critically impaired
* 19:54 UTC: Close Engineering begins operating on the affected database to restore normal operation
* 20:01 UTC: Close Engineering completes operations on the affected database
* 20:01 UTC: The Close app and API return to normal operation
This incident has been resolved, and Close is fully operational.
The issue has been identified, and our team is currently monitoring performance.
We are currently seeing intermittent issues with Close loading. We will let you know as soon as this is fixed.
Report: "Delayed indexing"
Last update: This incident has been resolved.
This incident has been resolved.
Indexing is currently delayed. Data in search results, smart views, reporting, and inbox may be delayed. We expect this issue to resolve within the hour.
Report: "Application Loading Issue"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

API endpoints and application areas backed by Search infrastructure - leads, contacts, and opportunities list pages, and certain reporting pages - were inaccessible for 20 minutes.

## Root Cause and Resolution

As part of an ongoing effort to increase the security of the Close application, we shipped a change to how authentication is handled between the application and internal search clusters. However, a certain kind of authentication exchange wasn’t handled correctly, which caused most requests to the Search infrastructure to fail in production. We have reverted the change, and we will be improving our testing setup to catch these issues during development, as well as our deployment and monitoring systems to catch production issues earlier.

## Timeline

* 10:20 UTC - An engineer rolls out an incorrect change to how authentication is handled between the application and internal search clusters.
* 10:21 UTC - Monitoring systems alert the on-call engineer
* 10:33 UTC - The change is reverted
* 10:40 UTC - Application functionality fully recovers
API endpoints and application areas backed by Search infrastructure - leads, contacts, opportunities list pages, and certain reporting pages were inaccessible for 20 minutes. This has been resolved.
Report: "Calling outage"
Last update: This incident has been resolved.
A fix has been implemented by Twilio and baseline call functionality has been restored.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. We will update this page as soon as we have more information from Twilio.
Report: "Delayed Webhook Processing"
Last update: Webhook processing is stable.
Webhook processing is no longer delayed. Monitoring will continue and updates will be provided.
We've identified an issue with our webhook processing system. Webhooks may be delayed up to twenty minutes. Updates will be posted as they are available.
Report: "Background Processing Delayed"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Our processing of background tasks was delayed between 13:30 and 15:30 UTC on September 15, 2023. Some customers may have noticed the Close Application taking longer than normal to sync data from third parties like Zoom or Calendly. Inbox notifications may have been delayed. Some Power Dialer sessions may also have been impacted. No data was lost during this incident.

## Root Cause and Resolution

At 13:25 UTC on September 15, 2023 a new version of the Close Application was deployed to production. This change introduced a bug into our background task processing system that prevented it from processing new work. By 14:35 UTC Close Engineering deployed a fix and performance began to recover. The Close Application completely recovered by 15:30 UTC. To prevent similar issues from happening in the future, Close Engineering will upgrade our background task processing system to better isolate workloads that could potentially break.

## Timeline

* 13:25 UTC - A new version of the Close app is published.
* 13:39 UTC - Close Engineering becomes aware of a problem with background task processing and takes steps to remediate.
* 14:35 UTC - Close Engineering deploys a fixed version of the Close application. Performance begins to recover.
* 15:30 UTC - The Close Application is fully recovered and performing normally.
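The stated follow-up is to better isolate background workloads so one broken task type cannot block the rest. In Python task systems this is often done by routing each class of work to its own queue with dedicated workers. The sketch below uses Celery purely as an example; we do not know which framework Close uses, and the task names, queue names, and broker URL are hypothetical.

```python
from celery import Celery

# Hypothetical app and broker URL; the point is the routing, not the names.
app = Celery("background_tasks", broker="redis://localhost:6379/0")

# Route each class of work to its own queue so that a bug in, say,
# third-party sync tasks cannot back up notification delivery.
app.conf.task_routes = {
    "tasks.sync_calendar_events": {"queue": "third_party_sync"},
    "tasks.send_inbox_notification": {"queue": "notifications"},
    "tasks.dial_power_dialer_number": {"queue": "dialer"},
}

# Workers are then started per queue, for example:
#   celery -A background_tasks worker -Q third_party_sync
#   celery -A background_tasks worker -Q notifications
```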
This incident has been resolved.
A fix has been implemented. The Close application and API are fully functional. We expect full recovery of background task processing performance in the next 15 minutes.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Elevated Error Rate"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Close users experienced elevated error rates from 16:00 until 16:16 UTC on September 5, 2023. During this time requests may have presented an error or timed out.

## Root Cause and Resolution

At 16:00 UTC on September 5, 2023 a version of the Close app with a major performance issue was deployed to production. This resulted in degraded performance for all customers from 16:00 until 16:16 UTC. Close Engineering deployed a new version of the Close app that did not have the performance issue at 16:16 UTC. Application performance then returned to normal. Close Engineering is working on a new set of pre-deployment tests that will prevent similar performance issues from being introduced in the future.

## Timeline

* 2023-09-05 16:00 - A new version of the Close app is published.
* 2023-09-05 16:03 - Close Engineering becomes aware of degraded performance of the Close application.
* 2023-09-05 16:05 - Close Engineering reverts to the last working version of the Close application.
* 2023-09-05 16:16 - The working version of the Close application is deployed. Performance recovers.
This incident has been resolved.
The fix has been deployed. The Close application is now functioning normally. Close Engineering is continuing to monitor application performance.
Customers may be experiencing elevated error rates. The root cause has been identified. A fix is being deployed.
Report: "Limit function not working for bulk actions"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Bulk actions performed in Close between 2023-08-14 05:36 UTC and 2023-08-14 20:16 UTC were potentially delayed, executed on too many entities, or not executed at all. If a bulk action using the “limit” search option was executed, the limit was not observed and the bulk action was performed on all search results. Once we uncovered the issue we cancelled the execution of queued bulk actions to prevent action being taken on unintended entities such as leads or contacts.

## Root Cause and Resolution

A bug was introduced in our API that caused the “limit” parameter to be discarded when bulk actions were being performed. Upon finding the bug we deployed an updated version of our API that solved the issue. We have increased the number of test cases covering this functionality to prevent similar issues from happening again. Improvements were also put in place regarding our escalation procedures to ensure issues like these get resolved more promptly in the future.

## Timeline

* 2023-08-14 05:36 UTC - A bug is introduced in our API that discards the “limit” parameter when creating bulk actions
* 2023-08-14 16:13 UTC - Close Engineering becomes aware of an issue with Bulk Actions
* 2023-08-14 19:49 UTC - Close Engineering deploys a change fixing the issue with Bulk Actions
* 2023-08-14 20:16 UTC - Bulk actions that weren’t yet executed are paused to avoid unintended action
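For context, the sketch below shows the behavior the fix restores: the "limit" option must be applied while collecting the targets of a bulk action, before anything is queued. This is an illustration only; the function and field names are hypothetical and not Close's actual API.

```python
def collect_bulk_action_targets(search_results, limit=None):
    """Return the lead IDs a bulk action should operate on.

    `search_results` is an iterable of matching leads. When the caller
    supplied a `limit`, only that many results may be targeted; dropping
    this parameter was the root cause of the incident described above.
    """
    targets = []
    for lead in search_results:
        if limit is not None and len(targets) >= limit:
            break
        targets.append(lead["id"])
    return targets
```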
This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are currently investigating an issue whereby the limit function in our search filters is not being applied to bulk actions.
Report: "Elevated Error Rate"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Between 15:55 UTC and 16:05 UTC on August 1, 2023 Close users may have experienced degraded performance and an elevated error rate when using the Close application or API.

## Root Cause and Resolution

This incident began after a routine deployment of changes to the Close backend. A subtle error in a query to the internal metrics service caused our compute platform to intermittently and erroneously underestimate the true demand for resources after a deployment. This query to the metrics service has been corrected. Additionally, Close has implemented automatic safeguards that will prevent our compute platform from deprovisioning too many resources during peak usage periods.

## Timeline

* 15:24 UTC - A routine deployment of changes to the Close application occurs.
* 15:31 UTC - Our compute platform begins deprovisioning resources from the Close application.
* 15:55 UTC - The Close application becomes starved of resources. Customers begin to notice elevated error rates.
* 16:01 UTC - Another routine deployment of changes to the Close application occurs.
* 16:05 UTC - Our compute platform begins provisioning the correct amount of resources for the Close application. Performance recovers.
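The safeguard mentioned above, preventing the platform from deprovisioning too many resources at once, can be as simple as a floor on replica count plus a cap on how much capacity a single scale-down step may remove. A sketch of such a clamp follows; the specific numbers are assumptions, not Close's configuration.

```python
MIN_REPLICAS = 4               # hypothetical floor
MAX_STEP_DOWN_FRACTION = 0.2   # never remove more than 20% of replicas at once

def apply_scale_down_guard(current_replicas: int, proposed_replicas: int) -> int:
    """Clamp a proposed scale-down so one bad metrics reading cannot gut capacity."""
    if proposed_replicas >= current_replicas:
        return proposed_replicas  # scaling up is not restricted
    lowest_allowed = max(
        MIN_REPLICAS,
        int(current_replicas * (1 - MAX_STEP_DOWN_FRACTION)),
    )
    return max(proposed_replicas, lowest_allowed)
```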
This incident has been resolved.
Close has become aware of degraded performance of our API. This resulted in a brief window between 15:55 and 16:05 UTC when users may have seen elevated error rates when using the Close App and API. Performance has recovered and users should no longer be impacted. Close Engineering will continue to monitor and investigate the root cause.
Report: "Elevated Error Rate"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Between 16:47 UTC and 17:18 UTC on July 26, 2023 Close users may have experienced degraded performance and an elevated error rate when using the Close application or API.

## Root Cause and Resolution

This incident began when our internal metrics service became degraded at 16:47 UTC on July 26, 2023. Our internal metrics service is used by our compute platform to automatically determine the appropriate amount of resources needed to run the Close application. When the metrics service became degraded, our compute platform was unable to determine the correct amount of resources needed to run the Close application. Over the next several minutes our compute platform de-provisioned resources, causing the system to become overloaded and perform poorly. When the internal metrics service was restored at 17:18 UTC our compute platform provisioned the correct amount of resources and performance returned to normal.

Our compute platform has a fail-safe built in to avoid this exact situation, but it did not function. We are investigating why this fail-safe did not function. We are also deploying improvements to our internal metrics service to make it more stable and alert sooner if it becomes unstable.

## Timeline

* 16:47 UTC - Our internal metrics service becomes degraded.
* 17:00 UTC - Without metrics available, our system assumed it was not under load and deprovisioned resources. This caused performance to degrade.
* 17:09 UTC - The Engineering Team becomes aware of an issue with application performance.
* 17:18 UTC - Our internal metrics system is restored. This causes our system to provision the appropriate amount of resources. Application performance recovers.
Between 16:47 UTC and 17:18 UTC on July 26, 2023 Close users may have experienced degraded performance and an elevated error rate when using the Close application or API.
Report: "Application loading issues"
Last update: This incident has been resolved.
Performance in Close has returned to normal. Our engineers are still closely monitoring the situation.
We're currently investigating a slowdown in application performance. You may experience delays in email syncing, report updates, smartviews / search results, calling, and bulk actions. The engineering team is currently looking into resolving this as quickly as possible.
Report: "SMS provider issues"
Last update: This incident has been resolved.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. We're in contact with them about these issues and they are currently working on a fix. You can follow along with Twilio's status updates at: https://status.twilio.com/ We will update this page as soon as we have more information from Twilio.
Report: "Delayed Webhook events"
Last update: The event backlog has been processed.
The issue has been identified, and a fix has been implemented. The event backlog is being processed.
Webhook events are being delayed. We are currently investigating this issue.
Report: "Some users cannot log into Close"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

Close CRM was not loading for a subset of our customers for 187 minutes, from 12:56 to 16:03 UTC. Users that had a bad version of our application cached were unable to load Close. As a workaround, they could use a different browser or clear their native app (or browser) cache in order to immediately fix the issue during this period.

## Root Cause and Resolution

One day before the downtime, we published a large change to the way we load our app in order to improve the app's performance for our customers. For approximately 17 hours, nothing else was published and the change was live. When a new, small change was published the next day, some customers were unable to open the app and were taken directly to an error screen.

The issue happened due to a misconfiguration of the software we use to generate the files that run the UI of our app. Files with different content should always have a different name, but due to this misconfiguration, some new files that were published had the exact same name, so customers that had the previous version in cache weren’t able to load the app.

In order to resolve the issue, we reverted both changes to a previous working version, and we also took the following steps to ensure this type of issue won’t occur anymore:

* We compared each file of each release in order to get to the bottom of the issue.
* Afterwards, we fixed the wrong configuration, and in our test environment we:
  * Published the first release once again and thoroughly tested the app. We also made sure that our browser / native app were holding the files in cache.
  * Published the second release and once again thoroughly tested the app.
* Finally, just to be 100% sure that things were fixed, we compared all files once again and came to the conclusion that the problem was indeed solved.
* As an extra step, we also released a few more different types of changes and compared files every time in order to guarantee that this issue wouldn’t repeat.

## Timeline

* 2022-11-07 15:53 UTC - Published a large release for improving the UI performance of the app.
* 2022-11-08 12:55 UTC - Published a very small unrelated bug fix.
* 2022-11-08 12:56 UTC - Received the first support ticket of a customer not being able to log in.
* 2022-11-08 12:56 UTC - Started investigating the issue and checking if any coworker was facing the same issue in order to facilitate the debugging.
* 2022-11-08 13:12 UTC - After debugging the error, we figured out that clearing the browser / native app cache fixed the issue. We started asking customers to clear their browser / native app cache in order to regain access to Close in the meantime as a workaround.
* 2022-11-08 13:16 UTC - Since we figured out that the main issue was related to cache, we decided to invalidate the cache on our CDN.
* 2022-11-08 13:31 UTC - Cache invalidation completed, but the problem persisted.
* 2022-11-08 14:08 UTC - After trying many different configuration changes in our CDN with no success, we decided to get back to debugging the error directly in the browser.
* 2022-11-08 14:47 UTC - By comparing files from someone who was facing the issue with the files of another person who was able to use the app, we saw that they both had the same file, but the contents of those files were different for them.
* 2022-11-08 15:13 UTC - Uncovered the underlying issue with the configuration of the software that generates those files and started implementing a fix.
* 2022-11-08 15:31 UTC - Implemented the fix and started publishing the release.
* 2022-11-08 16:03 UTC - Published the release and the fix was live.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Partial system outage"
Last update: This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are aware that Close is currently not loading. Our engineering team is investigating this issue, and we will update as soon as possible.
Report: "Delayed Email Sync"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

On October 20 from 5:30 AM until 6:40 AM EDT, email syncing into the Close application was delayed for some customers. No data was lost during this incident. Email sync functionality was restored by 6:40 AM EDT.

## Root Cause and Resolution

The root cause of this incident was a bug in a vendor’s software introduced during scheduled maintenance on 10/18/2022. The impact of this bug was detected on 10/20 and the Close engineering team rolled the affected component back to a known working version to resolve this incident. To prevent this from happening in the future, the Close engineering team has added additional monitoring to alert the Close engineering team when container workloads are not being scheduled properly.

## Timeline

* 10/18 10:56 PM EDT: The Close engineering team upgrades components of our container platform during a scheduled maintenance window. Error rates for scheduling containers begin to slowly increase.
* 10/20 5:30 AM EDT: Close support escalates issues with email syncing to the engineering team.
* 10/20 6:10 AM EDT: Close engineering identifies an issue with scheduling container workloads.
* 10/20 6:22 AM EDT: The Close engineering team begins manual intervention to lessen customer impact.
* 10/20 6:38 AM EDT: Email sync functionality is largely restored.
* 10/20 6:39 AM EDT: The Close engineering team isolates the issue to the interface between our secret store and containers which consume secrets.
* 10/20 9:19 AM EDT: The root cause is identified as a bug introduced in the version of our secret store connector during the 10/18 scheduled maintenance.
* 10/20 9:25 AM EDT: The Close engineering team rolls back the secret store connector to a known working version, resolving the incident.
Email sync into the Close application was delayed between 5:30 AM and 6:40 AM EDT.
Report: "Calling currently unavailable"
Last update: This incident has been resolved.
A fix has been implemented. To start calling again, please quit and restart Close.
We are currently investigating this issue.
Report: "Calling outage"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. You can follow along with Twilio's status updates at: https://status.twilio.com/ We will update this page as soon as we have more information from Twilio.
We are currently investigating this issue.
Report: "Calling currently unavailable"
Last update: This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. We're in contact with them about these issues and they are currently working on a fix. You can follow along with Twilio's status updates at: https://status.twilio.com/ We will update this page as soon as we have more information from Twilio.
We are currently investigating this issue.
Report: "Delays in indexing"
Last update: This incident has been resolved.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
Report: "Telephony issues with Twilio"
Last update: We have processed the backlog of requests to Twilio and everything should be functioning normally.
Twilio has pushed a fix for the issue. Now that the fix is in place, we're working on processing the backlog of requests we needed to send to Twilio during their outage. We will update this page as soon as this processing is completed.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. We're in contact with them about these issues and they are currently working on a fix. You can follow along with Twilio's status updates at: https://status.twilio.com/ We will update this page as soon as we have more information from Twilio.
Report: "Telephony issues with Twilio"
Last update: Twilio has fully resolved this issue.
Twilio has resolved the issue on their end and is currently seeing recovery of their services. We are continuing to monitor the situation to make sure the impact has been mitigated.
Our telephony provider, Twilio, is currently experiencing issues on their side that are impacting Close users. We're in contact with them about these issues and they are currently working on a fix. You can follow along with Twilio's status updates at: https://status.twilio.com/ We will update this page as soon as we have more information from Twilio.
Report: "Application loading issues"
Last update: This incident has been resolved.
Our hosting provider has remediated the issue with networking performance. We are monitoring the performance of Close for complete recovery.
Our hosting provider is experiencing degraded networking performance. They have identified the underlying issue and are working towards a resolution.
We are aware that Close is currently not loading or loading slowly. Our engineering team is investigating this issue, and we will update as soon as possible.
Report: "Telephony Connection Issues"
Last update: Twilio has resolved the issue on their end.
Our telephony provider, Twilio, is currently experiencing connection issues that are impacting Close users. We're investigating this with their team and will update this incident with more information as it becomes available.
Report: "Delayed Indexing"
Last update: Indexing and search are now functioning normally.
Indexing is currently delayed. Data in search results and Smart Views may be slightly out of date for the next few minutes.
Report: "Delayed indexing"
Last update: This incident has been resolved.
Our engineers have uncovered an issue with indexing that may cause data updates to be slow to appear in Close. Our engineers have pushed a fix and indexing should return to normal after processing the backlog.
Report: "Application loading issues"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

All Close systems were disrupted for approximately 30 minutes between 2021-10-30 23:40 and 2021-10-31 00:11 UTC.

## Root Cause

The disk volume performance for two shards of our primary database cluster became severely degraded due to issues in our hosting provider's network. Unfortunately, the cluster did not fail over to healthy instances because this failure only caused significantly degraded performance and not a complete failure of the instances.

## Timeline

* Oct 30 23:40 UTC - Instances for two shards of the primary database experience severely degraded disk volume performance
* Oct 30 23:51 UTC - Engineering team begins investigating system alerts
* Oct 31 00:03 UTC - Impacted instances are stopped to trigger election of new primaries for the affected shards
* Oct 31 00:07 UTC - Restarted app-facing services that didn’t automatically recover after the database cluster became healthy
* Oct 31 00:24 UTC - Finished restarting all backend services to ensure they all had healthy database connections

## Next Steps

* Investigate options to detect when parts of the database cluster are in a degraded state and trigger failing over to healthy instances if they are available.
* Work with our hosting provider to better understand the failure to determine if there are better ways to detect this type of issue.
* Fix the issue that required some services to be restarted to reconnect to the healthy cluster.
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating this issue.
Report: "Network Degradation"
Last update: Our hosting provider's network has returned to baseline performance. All Close systems should be functioning normally.
The network of our hosting provider remains degraded. We are redeploying our workloads around the degraded zone.
We have identified a degradation in our hosting provider's network that is adversely affecting the Close application's performance. We are investigating the issue and will post updates as they are available.
Report: "SMS and Email Sending Degraded"
Last update: This issue has been resolved.
We're currently investigating an issue with sending processes in Close that is impacting SMS and Email.
Report: "Problems sending and receiving SMS"
Last update: Twilio has fully resolved the issue, and SMS is back to normal.
Twilio has pushed a fix on their side and everything is returning to normal. As Twilio processes through its message backlog, you may see a bit of a delay when sending or receiving SMS.
Our telephony provider, Twilio, is currently experiencing issues with their SMS platform. During this time, SMS in Close may have difficulty sending or being received. Twilio is currently working on a fix for this issue.
Report: "Slow app performance"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

Close systems suffered from degraded performance for approximately 3 hours, between 12:20 and 15:25 UTC on December 21, 2020.

## Root Cause

The primary Close App suffered from performance issues due to an issue with our backend database starting at 12:20 UTC. Close Engineering identified the issue at 15:07 UTC and had a fix deployed at 15:25 UTC.

## Timeline

* Dec 21 12:06 UTC - The first signs of inconsistent query execution occur on our MongoDB database.
* Dec 21 12:20 UTC - Alerts begin firing indicating degraded performance
* Dec 21 12:32 UTC - Close Engineering identifies the affected database shard and triggers a failover
* Dec 21 12:57 UTC - The issue reoccurs after the failover. Troubleshooting continues.
* Dec 21 13:47 UTC - Close Engineering identifies the email sync service as the source of the issue
* Dec 21 15:07 UTC - Close Engineering identifies a MongoDB query intermittently using an inappropriate index
* Dec 21 15:25 UTC - Close Engineering deploys a fix to production
* Dec 21 15:25 UTC - Close systems return to normal performance

## Next Steps

To make sure this doesn’t happen again, Close is taking the following steps:

* Specifying hints for our highest-impact database queries to ensure consistent performance
* Implementing additional monitoring to detect inconsistent query execution
* Implementing additional monitoring to more aggressively alert on slow queries
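The first next step, specifying hints for high-impact queries, pins a query to a known-good index so the planner cannot intermittently pick a worse one. The sketch below shows how that looks with PyMongo, assuming a Python client; the connection string, collection, and index fields are hypothetical, not Close's actual schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
emails = client.close_db.email_messages            # hypothetical collection

# Force the query to use a specific compound index instead of letting the
# planner intermittently choose a less selective one.
cursor = (
    emails.find({"organization_id": "org_123", "date_created": {"$gte": "2020-12-01"}})
    .hint([("organization_id", 1), ("date_created", -1)])
)
for message in cursor:
    ...  # process each matching document
```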
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have isolated the issue to email syncing and are continuing to investigate.
Some customers may be experiencing slow app performance. We are investigating a database issue.
Report: "Premature Sending Of Sequence Emails"
Last update: Close sincerely apologizes for the incorrect behavior of our service. We take the reliability and correctness of our platform very seriously, and the issues you have experienced are not representative of the level of service we aim to provide. Below is an explanation of what happened and how we will prevent such mistakes from occurring in the future.

## Impact

On 11/24/2020, during a period between 2:48pm UTC and 3:39pm UTC (51 minutes), Sequence Subscriptions that were due to send a single email sent all of the remaining emails with only a 5-minute delay between each step (instead of the user-defined delay specified in days).

## Root Cause & Resolution

During the period of the incident, the Close application successfully sent Sequence emails but – due to a code change in how we persist data in our database – failed to update the relevant Sequence records with the date & time when the Sequence should be processed next. As a result, during the next processing iteration for a given Subscription (5 minutes later), the next step in the Sequence would be sent prematurely.

The issue was caused by an attempt to fix some of the session management issues which caused [the outage on 11/23](https://status.close.com/incidents/1r0x2kk00s48). The new code prematurely detached Sequence records from the database session, causing any changes to these records to not be persisted. The date & time when the Sequence should be processed next was among the data that failed to be updated.

## Timeline

* Nov 24 14:48 UTC – Deployed change that caused data persistence issues.
* Nov 24 14:56 UTC – Engineers became aware of the persistence problem (though not its end-user impact) and started reverting the problematic code change.
* Nov 24 15:30 UTC – The reverted code was deployed.
* Nov 24 15:46 UTC – We received the first (delayed) reports of this issue causing Sequences to send some steps prematurely. We investigated these reports and connected them with this incident.

## Next Steps

* Prioritize refactoring our older code to use the new, safer session management logic.
* Add additional warnings and protections to our Sequence Scheduling Service with regards to scheduling the next step in a Sequence.
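The root cause, records detached from the database session so their updates were silently dropped, is easy to reproduce with an ORM. The SQLAlchemy sketch below shows the pitfall and the safe path of merging a detached object back before committing; the model and field names are hypothetical, and this illustrates the failure mode rather than Close's actual Sequence code.

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SequenceSubscription(Base):  # hypothetical model
    __tablename__ = "sequence_subscription"
    id = Column(Integer, primary_key=True)
    next_send_at = Column(DateTime)

engine = create_engine("sqlite://")  # in-memory database for the illustration
Base.metadata.create_all(engine)

with Session(engine, expire_on_commit=False) as session:
    sub = SequenceSubscription(next_send_at=datetime.utcnow())
    session.add(sub)
    session.commit()

    # The pitfall: once the object is detached, attribute changes are
    # never flushed, so the "process next at" update is silently lost.
    session.expunge(sub)
    sub.next_send_at = datetime.utcnow() + timedelta(days=3)
    session.commit()  # persists nothing for the detached object

    # The safe path: merge the detached object back before committing.
    sub = session.merge(sub)
    session.commit()  # now the new next_send_at is written
```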
We are currently investigating this issue.
Report: "Application loading issues"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

All Close systems were disrupted for 11 minutes between 19:56 and 20:07 UTC on November 23rd, 2020.

## Root Cause & Resolution

The Close application exhausted all available connections to a backend database, preventing the application from functioning normally. This was due to a code change that resulted in our indexing processes not releasing connections to the database.

The issue was caused by session management differences between some of our older and newer code. A change was deployed at 12:30 UTC that caused connections to be left open in certain situations where call paths crossed into older session management code. At 19:25 UTC, a combination of increased load and an accumulation of stale connections caused the database to run out of connections and trigger alerts. Temporary actions were taken to free connections until the problematic code was reverted.

## Timeline

* Nov 23 09:30 UTC - Deployed change that left connections open
* Nov 23 19:25 UTC - Engineering team begins investigating system alerts
* Nov 23 19:56 UTC - Close begins receiving reports of application instability
* Nov 23 19:59 UTC - Engineering team begins taking temporary actions to free DB resources
* Nov 23 20:07 UTC - Close application behavior returns to normal
* Nov 23 20:30 UTC - Revert of problematic code deployed

## Next Steps

* Implement a warning system for the new and old code interacting in an incorrect manner (already done).
* Continue refactoring our older code to use the new session management logic.
* Implement alert rules that can give early warnings for this particular issue.
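A common way to make leaked connections like this structurally impossible is to scope every unit of work in a context manager that always returns the session's connection to the pool, even when the work fails. The SQLAlchemy-flavored sketch below illustrates the pattern; the engine URL and the indexing query are placeholders, not Close's real code.

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("postgresql://localhost/example")  # hypothetical URL

@contextmanager
def unit_of_work():
    """Yield a session and always close it so pooled connections are released."""
    session = Session(engine)
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()  # returns the connection to the pool even on failure

def reindex_lead(lead_id: int) -> None:
    with unit_of_work() as session:
        session.execute(text("SELECT 1"))  # placeholder for real indexing queries
```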
We are currently investigating this issue.
Report: "Application loading issues"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

All Close systems were disrupted for most customers for 23 minutes between 16:48 and 17:11 UTC on November 9th, 2020.

## Root Cause & Resolution

The Close application exhausted the available connections to our backend database, preventing the app from functioning normally. This was due to a setting on our new database proxy being set too low to accommodate peak traffic. Due to a combination of routine internal operations and traffic volume on Monday morning, our application stack consumed the maximum connection limits that were configured on the database proxy. At 17:00, the problem was diagnosed and we increased the connection limits. It took approximately another 10 minutes to restore the application to a completely healthy state.

## Timeline

* Nov 09 16:48 UTC - Close begins responding to reports of application instability
* Nov 09 17:00 UTC - The misconfigured setting on our database proxy is identified and corrected
* Nov 09 17:11 UTC - Close application behavior returns to normal

## Next Steps

* Review the best practices for our new database proxy, adjusting the Close app to leverage them
* Make adjustments to how the Close app manages the lifecycle of database connections
Our engineering team has deployed a fix for the issue.
We're currently investigating an issue causing our app not to load properly for some users.
Report: "Issues with inbound calls and inbound + outbound SMS for some phone numbers"
Last update: We have worked with our carriers to implement a fix for this issue. If you continue to have issues with inbound calls or inbound/outbound SMS, please reach out to support@close.com
We have identified an issue causing some phone numbers in Close to have issues with receiving calls and sending/receiving SMS. We are working closely with our carriers to resolve this issue.
Report: "Application not loading"
Last update: Close sincerely apologizes for the interruption of our service. We take the stability of our platform very seriously. Below is an explanation of what happened and how we will prevent another such interruption from occurring.

## Impact

The Close App and API were unavailable to all customers for 43 minutes, from 17:00 to 17:43 UTC on Wednesday July 22, 2019, due to the failure of a backend database. Severe degradation began at 16:53 UTC. The database was recovered and all services were restored by 17:43 UTC.

## Root Cause & Resolution

One of our backend PostgreSQL databases became starved of available memory. This prevented the database from accepting new work, resulting in an interruption of service to the Close system. The issue was resolved by increasing the amount of memory available to the database.

## Timeline

* 15:57: First alarms begin to fire about delays in Email Sequences and PostgreSQL CPU usage
* 16:00: Close Infrastructure begins investigation
* 16:30: Memory pressure identified as the cause of alarms on the affected database
* 16:53: The affected database failed, causing the Close app and API to become unavailable
* 16:53: Decision made to scale the database to an instance class with more memory
* 17:04: The maintenance page is posted in preparation for the database scaling operation
* 17:04: Scaling operation begins on the affected database
* 17:32: Scaling operation completes
* 17:43: Application services are restored

## Next Steps

To ensure that events such as this do not occur in the future we are taking the following actions:

* Enhance our monitoring so memory starvation can be proactively avoided
* Enhance our incident response procedures to lessen the effect of database performance issues
* Enhance our deployment automation so our systems recover more quickly from scaling operations
* Enhance our database systems to be more robust during scaling operations
After the database upgrade, the system is performing well and the problem is resolved. We'll post more details about this incident in an upcoming postmortem.
App is back up. We are closely monitoring and verifying all app components.
App is back up. We are closely monitoring.
We performed an emergency database upgrade, which has finished, and we are beginning to redeploy the app
Our engineers have identified a fix and are in the process of deploying it.
Our engineers have identified the issue and are currently working on a fix.
We are continuing to investigate this issue.
We are currently investigating an issue causing the Close Application not to load properly for users.
Report: "Issues adding or editing Close users via the Team Management Settings page"
Last update: A fix for this has been pushed by our engineering team and the "Team Management" page now works as expected.
We're currently experiencing issues with our "Team Management" settings page when trying to add or edit users in your Close organization. Our engineering team is currently working on a fix. In the meantime, if you need to add or edit users in your organization, please reach out to support@close.com