Historical record of incidents for Files.com
Report: "FTP and SFTP services only: Elevated error rates"
Last update: We have resolved an incident that caused elevated error rates on the SFTP and FTP services on Files.com in all regions. The incident began at 19:27 UTC and was resolved at 19:49 UTC. It did not impact other network services such as API, WebDAV, AS2, and others. We will post an incident report, including a full postmortem, as soon as one is available.
FTP and SFTP only: We are investigating elevated error rates on the FTP and SFTP services on Files.com in all regions. This incident does not impact other network services such as API, WebDAV, AS2, and others.
Report: "FTP and SFTP Services Only: Degraded Transfer Performance in the USA Region"
Last update: We have resolved an incident causing degraded performance on FTP and SFTP transfers in our USA region, and transfer performance is now back to normal. The incident began to affect customer transfer performance as early as 6am Pacific Time on January 21st, and was resolved at 9:53 AM Pacific Time on January 24th. This issue was intermittent and affected a small set of customers. We are still investigating the root cause of this incident. When we have more information to share we will post a detailed postmortem here.
We are continuing to investigate this issue.
FTP and SFTP only: We are investigating reports of intermittent degraded performance on FTP and SFTP services on Files.com in our primary USA region. This incident appears to only impact a small number of customers. This incident does not impact other network services such as API, WebDAV, AS2, and others. If you are experiencing degraded performance on FTP or SFTP transfers and you have an urgent need to access Files.com, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "London Region Only: Elevated Error Rates and/or Performance Degradation To Network Services"
Last update: All services have been restored and are operating normally. London only: We have resolved a performance degradation affecting Files.com services in the London region. This incident occurred between 6:07 a.m. PST and 7:07 a.m. PST, and only in the London region. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the London region. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
London only: We are investigating a partial network outage affecting Files.com services in the London region. This outage is only affecting our network gateways in the London region, and our Core Services are running correctly. At this time, we believe that all network services are currently up in our other regional locations. If you have an urgent need to access Files.com via FTP, SFTP, or WebDAV, you should be able to immediately connect (and access your existing files and account) using the hostname of our USA region, which is app-us-east-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "FTP: Elevated Error Rates in USA and Germany regions only"
Last update: We have resolved an incident causing elevated error rates on the FTP service on Files.com in the USA and Germany regions. From 3:05pm to 3:15pm Pacific Time, some FTP uploads and downloads failed to transfer. This incident only impacted a portion of new connections that were initiated between 3:05pm and 3:15pm. FTP connections established before the start of the incident were not impacted. This incident only impacted FTP, and none of our other interfaces (API, Web, SFTP, AS2, etc). This incident only impacted the USA and Germany regions. The incident is fully resolved. We will post a full RCA to this Status Page when it is available. If you have any questions, please contact our Support department at support@files.com or call (800) 286-8372 x 2.
Report: "FTP Degraded – All Regions"
Last update: On November 19 from 8:49am to 10:30am PST we experienced an issue with our FTP services that resulted in elevated error rates and, for some but not all customers, a complete inability to connect to FTP. This issue did not affect any other network services at Files.com, such as SFTP, WebDAV, AS2, API, etc. It was specific to FTP. This issue affected approximately 12.5% (1/8th) of all source IP addresses connecting to Files.com FTP.
To explain how this issue affected only 1/8th of source IP addresses, and to give a sense of the scale we operate at regarding FTP, some background on Files.com’s FTP services is helpful.
Files.com serves over 4,000 customers across a variety of file transfer protocols and paradigms from a globally distributed multi-tenant architecture. Each file transfer service is implemented and deployed individually so that it can be updated, monitored, and restarted on an individual basis without affecting other network services. For example, our FTP service is not co-located in any way with our SFTP service.
For FTP specifically, we operate a total of 16 server machines in 7 global regions, and on each server machine we operate two different FTP daemons in a Blue/Green configuration. This allows us to deploy software changes to FTP without disrupting existing connections. And we do in fact regularly deploy software changes to FTP, most of which go totally unnoticed by our customers. In front of those 32 FTP servers (16 machines x 2 instances per machine), we also operate 20 front-end proxy servers which serve as the termination point for the nearly 2,000 dedicated IP addresses that we host on behalf of our customers.
**Background**
This incident affected only certain combinations of connectivity between the front-end proxy servers and the back-end FTP servers, and that’s why it only affected 12.5% of source IP addresses.
To explain this further, you first need to understand a little about the FTP protocol. FTP, originally developed in 1971, is a legacy protocol. It was designed long before the idea of load balancing existed, and its design makes it difficult to load balance reliably. This is one of the main reasons we generally recommend against the use of FTP entirely. Instead, we recommend that you prefer modern connectivity methods such as our CLI, SDKs, API, Files.com apps, and direct integrations with cloud providers such as Boomi, MuleSoft, Zapier, and more. Nevertheless, we understand that a lot of customers have legacy business processes that are built on legacy technologies such as FTP, and therefore, we do our best to support it.
The specific reason that FTP is hard to load balance is that FTP does not use a single encrypted TCP connection between the client and server. Instead, FTP creates a control connection, which is one TCP connection between the client and server, and then it also creates an individual data connection for each file being transferred. Incidentally, this design also hurts the performance of FTP because it results in separate TLS and TCP connection initiations for every single file in a batch of transfers. More modern protocols like the ones mentioned above are able to multiplex multiple transfers over the same TCP connection.
The fact that FTP maintains separate data connections is also what creates a major challenge for load balancing. All of those data connections need to end up at the same backend FTP server as the control connection, so the load balancing strategy must guarantee that. Most load balancers don’t do this by default, because it isn’t what you’d usually want in a load balancing arrangement.
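To make the control/data split concrete, here is a minimal sketch of how a client learns where to open a passive-mode data connection. It assumes the standard `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)` reply from the FTP specification; the function name and the sample address are illustrative only and do not come from the incident report.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract the data-connection host and port from an FTP 227 reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


# Sample reply using a documentation-range address (illustrative only).
host, port = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80)")
print(host, port)  # 192.0.2.10 50000
```

Each transfer triggers a new reply like this and a new TCP connection to the advertised port, which is why a load balancer has to send that connection to the same backend that issued the reply.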
We implement FTP load balancing using a capability called balance_source in the popular HAProxy software. It computes a hash of your source IP address and uses that hash to target FTP requests to specific backend servers that are maintained in a list of backends. We have a sophisticated health check apparatus that maintains that list of backend servers using health metrics about each backend server. This ensures that our load balancers generally only send traffic to FTP servers which are healthy.
**Explanation of this Incident**
In this incident, one eighth of our backend FTP servers were marked as healthy when, in fact, they were not healthy. This caused about 1/8 of FTP traffic to be directed to those servers. Unlike on other protocols where this would have applied to 1/8th of all requests from all customers, due to the load balancing design of FTP this ended up applying to all FTP connections from 1/8th of source IP addresses.
How did the processes mistakenly report as healthy when they were not? This was a software bug in our health check software, related to an invalid assumption about which FTP server was running in the “most recent” deployment position of the Blue/Green deployment. When we wrote the software, we made the mistaken assumption in the code that in Linux, server processes with higher process ID numbers would always be newer than processes with lower process ID numbers. In fact, that's not true. A Linux system only has about 65,000 process ID numbers available, and in a long-running system such as our servers, which often stay up for months at a time, the process IDs can wrap back around. So our health check software misinterpreted which of the two FTP services on each server was the newest, and this occurred on 4 out of our 32 FTP servers. We corrected this problem in our health check software and we do not expect this issue to recur.
**Monitoring and Incident Resolution Time**
This incident lasted one hour and 41 minutes.
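The all-or-nothing impact per source IP follows directly from source-hash balancing, and can be sketched in a few lines. This is an illustration under stated assumptions, not the production configuration: SHA-256 stands in for whatever hash HAProxy uses (the algorithm is written `balance source` in HAProxy's configuration language), and the backend names and client addresses are hypothetical.

```python
import hashlib
import ipaddress


def pick_backend(source_ip: str, backends: list[str]) -> str:
    """Deterministically map a client source IP to one backend."""
    packed = ipaddress.ip_address(source_ip).packed
    digest = hashlib.sha256(packed).digest()  # stand-in hash, not HAProxy's
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Hypothetical names for the 32 FTP daemons (16 machines x 2 instances).
backends = [f"ftp-{n:02d}" for n in range(32)]
# 4 of the 32 daemons are broken but still listed as healthy.
broken = set(backends[:4])

# Sample client addresses from a documentation-only range.
clients = [str(ip) for ip in ipaddress.ip_network("198.51.100.0/24").hosts()]
affected = [ip for ip in clients if pick_backend(ip, backends) in broken]

# Every source IP is either always affected or never affected.
print(f"{len(affected)} of {len(clients)} sample source IPs always fail")
```

With a reasonably uniform hash, about one eighth of source IPs land on the four broken daemons, and a probe running from any single unaffected address would succeed on every attempt.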
The selective impact of this issue unfortunately also caused it to escape our monitoring systems. Because our monitoring systems experienced 100% success during this time, we did not detect that one out of eight source IP addresses connecting to FTP were unable to connect. It is not clear how we could have better detected this situation through monitoring. We ultimately became aware of this issue through our regular customer support channel. Unfortunately, it took our customer support team over 30 minutes to confirm and escalate this incident after first being alerted to it, and this increased the time to resolution of the issue.
**Similarity to the Incident from November 13**
Although this incident seems similar to the incident on November 13, which also affected FTP, this incident has a different root cause. Please read that incident’s Root Cause Analysis document to learn more.
**Stability and Uptime, Generally**
Any time we have two separate incidents affecting the same service within a short period of time, we are often asked by customers whether there is some systemic problem with our business, infrastructure, software, or processes that makes our service unstable. One of the reasons that we write such long root cause analysis reports is to convince you that these two incidents did not share a root cause. Additionally, we hope that these reports help to convince you that neither incident stems from a cultural or systemic problem at Files.com.
When comparing Files.com to other systems, it’s important to compare apples to apples. Files.com is a continuously updated multi-tenant cloud service that does not ever schedule downtime. We are designed for 24/7/365 operation all while performing regular performance, feature, and security updates.
It is not a fair comparison to compare the uptime of Files.com to the uptime of an on-premises system where updates are never installed. We do our own independent monitoring of many of the cloud providers for file transfer, and according to our statistics, Files.com exceeds these other providers in many of the uptime, speed, and performance metrics that we measure.
We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
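The health check bug described in this RCA, treating the higher process ID as the newer process, can be reproduced with a small sketch. The report does not say how the fix was implemented; comparing process start times is one common alternative and is an assumption here, as are the `Daemon` type and the sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Daemon:
    name: str          # hypothetical label for the Blue or Green instance
    pid: int           # process ID assigned by the operating system
    started_at: float  # process start time, seconds since the epoch


def newest_by_pid(daemons: list[Daemon]) -> Daemon:
    """Buggy heuristic: assumes a higher PID means a newer process."""
    return max(daemons, key=lambda d: d.pid)


def newest_by_start_time(daemons: list[Daemon]) -> Daemon:
    """Safer heuristic: compares when each process actually started."""
    return max(daemons, key=lambda d: d.started_at)


# Blue started first with a high PID; Green was deployed ten minutes later,
# after the PID counter had wrapped around to a low number.
blue = Daemon("blue", pid=64990, started_at=1_000_000.0)
green = Daemon("green", pid=212, started_at=1_000_600.0)

print(newest_by_pid([blue, green]).name)         # blue (wrong)
print(newest_by_start_time([blue, green]).name)  # green
```

The PID comparison only fails after a wraparound, which is consistent with the bug staying hidden until servers had been up long enough to recycle their process IDs.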
We have resolved elevated error rates on the FTP service on Files.com in all regions. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others. This incident occurred between the times of 8:49am PST and 10:30am PST. We are compiling a Root Cause Analysis for this incident, which we will post here.
Report: "FTP: Elevated error rates in USA region only"
Last update: On November 13 from 7:25am to 8:10am PST we experienced an issue with our FTP services that resulted in elevated error rates and, for some but not all customers, a complete inability to connect to FTP. This issue did not affect any other network services at Files.com, such as SFTP, WebDAV, AS2, API, etc. It was specific to FTP in our USA region. According to our logs, this issue affected approximately half of all traffic connecting to FTP in our USA region during the impacted time window. All other regions were unaffected.
This incident was related to the deployment of a critical software update to our production proxy servers in the USA region. We have 20 proxy servers worldwide, and they are the toughest devices in our fleet to update. These servers often need to be updated in place because they route traffic for all of our network services. Of those 20, 8 are located in our USA region. In this incident, 4 of the 8 proxy servers failed to restart the FTP proxy service after the critical software update. Because we experienced more than one proxy server failure at once, the incident caused total failure of FTP for some customers.
The impact of this incident was mainly caused by our deployment process not detecting the failure after the first proxy server failed, which led to a larger failure. We have revised our deployment process to reduce the chance of this sort of cascade in the future.
We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
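The RCA does not describe the revised deployment process. As one hedged illustration of the general idea, a rolling restart can verify each server before touching the next and halt on the first failure. The function names, server names, and callbacks below are all hypothetical.

```python
from typing import Callable, Iterable


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Restart servers one at a time and stop at the first failed check.

    Returns the servers that were restarted and verified healthy.
    Raises RuntimeError on the first failure, leaving the rest untouched.
    """
    verified: list[str] = []
    for server in servers:
        restart(server)
        if not is_healthy(server):
            raise RuntimeError(
                f"{server} unhealthy after restart; "
                f"halting with {len(verified)} verified"
            )
        verified.append(server)
    return verified


# Simulate an update that breaks the service on every server it touches.
proxies = [f"proxy-{n}" for n in range(1, 9)]  # hypothetical names
touched: list[str] = []

try:
    rolling_restart(proxies, touched.append, lambda server: False)
except RuntimeError as exc:
    print(exc)

print(touched)  # ['proxy-1']
```

With this shape, a bad update takes out one of eight proxies instead of four, at the cost of a slower rollout.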
We have resolved elevated error rates on the FTP service on Files.com in our primary USA region. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others. This incident occurred between the times of 7:25am PST and 8:12am PST. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region. We are compiling a Root Cause Analysis for this incident, which we will post here.
FTP only: We are investigating elevated error rates on the FTP service on Files.com in our primary USA region. This incident does not impact other network services such as API, SFTP, WebDAV, AS2, and others. This incident does not impact FTP in any other regions. If you are affected by this incident and have an urgent need to access Files.com, we recommend using SFTP in lieu of FTP. If you must connect via FTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com.
Report: "Web Interface Only: Failures downloading more than 1 file at a time"
Last update: From October 10th at 5:14 PM PST through October 11th at 6:42 AM PST, Files.com customers experienced failures when downloading multiple files at once through the web interface. Downloads of more than one file use a code path that zips the files together through a ZIP generation process. This ZIP process was impacted during this incident.
This incident began when we deployed a configuration change to our production environment that was intended to improve the security of our HTTP headers. While we did extensively test this change, our testing did not thoroughly cover the ZIP download function. After being notified about this incident through our customer support channel, we identified the issue and rolled back the change on October 11th at 6:42 AM PST. We are disappointed that this issue took so long to resolve and we’d like to provide some detail about the multiple causes of the delay.
**Customer Support Hours**
This issue began at 5:14PM PST, while our customer support department was closed. Files.com staffs our customer support department from 6am-5pm PST Monday through Friday. Although the issue was reported to us immediately by multiple customers, these reports were all received while our customer support department was closed, and none of the reports were escalated via our after hours support services. As a result, we did not become aware of the issue until our support department reopened the following morning. As soon as we became aware of the issue, we fixed it promptly.
Files.com also offers a 24/7 Enterprise Support product that about 100 of our customers subscribe to; however, none of those customers alerted us about this issue. If you rely on Files.com for business-critical needs, please consider subscribing to our Enterprise Support service so that you have the ability to guarantee resolution of issues 24/7.
Learn more at https://www.files.com/enterprise-support.
This is the first incident in recent memory where Files.com’s lack of 24/7 support for non-Enterprise customers has been implicated in the impact profile of a major incident. We are considering adding additional after-hours support resources; however, we are not making any official changes at this time due to the still limited impact of this incident. If you have opinions about this topic, we’d love to hear from you.
**Monitoring Deficiency**
Additionally, this incident exposed a major deficiency in our monitoring as it relates to ZIP downloads. While we do have monitoring that covers ZIP downloads, it did not catch this issue because our monitor did not actually inspect the generated ZIP for correctness. We have developed an improved monitoring tool that tests the ZIP download function more extensively and would have detected this situation. We expect to deploy that improved monitoring tool within the next several weeks.
We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
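The RCA does not describe the improved monitoring tool. The gap it names, a monitor that does not inspect the generated ZIP, can be illustrated with a minimal sketch that treats a download as successful only if the payload opens as a ZIP, passes CRC checks, and contains the expected members. The function name and sample file names are hypothetical.

```python
import io
import zipfile


def verify_zip(payload: bytes, expected_names: set[str]) -> list[str]:
    """Return the problems found in a downloaded ZIP; empty means OK."""
    problems: list[str] = []
    try:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            bad_member = archive.testzip()  # reads every member, checks CRCs
            if bad_member is not None:
                problems.append(f"corrupt member: {bad_member}")
            missing = expected_names - set(archive.namelist())
            if missing:
                problems.append(f"missing members: {sorted(missing)}")
    except zipfile.BadZipFile as exc:
        problems.append(f"not a valid ZIP: {exc}")
    return problems


# Build a known-good archive in memory to stand in for a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("b.txt", "beta")
good = buffer.getvalue()

print(verify_zip(good, {"a.txt", "b.txt"}))  # []
# An HTTP 200 response whose body is an error page is not a valid archive.
print(verify_zip(b"<html><body>500 Internal Server Error</body></html>", {"a.txt"}))
```

A check based only on HTTP status or response size would pass the second case, which is the kind of failure a content-level check is meant to catch.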
We have resolved an incident related to download failures of more than one file at a time via the web interface. This incident occurred between 5:14 PM PT on October 10 and 6:42 AM PT on October 11. We are compiling a final Root Cause Analysis for this incident, which we will post here when it is complete.
Report: "Web interface only: Reports of elevated errors"
Last update: From 11:37 AM PST to 1:24PM PST, a small number of Files.com customers experienced errors in the web interface when viewing some pages. This was caused by a JavaScript bug introduced during a routine update, related to rendering pages with “tab-like” navigation. Because the buggy behavior impacted only a small number of pages, it was not detected during our routine testing. Our monitoring systems did record an increase in frontend error rates; however, the increase was not high enough to raise an alert, and instead we were alerted to this bug through our customer support channel. As a result of this incident, we have lowered the alerting threshold for similar frontend errors. We expect that we will now detect and resolve similar incidents more quickly in the future.
Files.com has resolved an incident that caused elevated errors on the web interface. No other services were impacted by this incident. Certain pages in the web interface failed to load between 11:37am and 1:24pm Pacific time. We are still investigating this incident and compiling a Root Cause Analysis, which we will post here.
Report: "SFTP, FTP, and WebDAV services only: Elevated error rates"
Last update: On October 29th from 10:29 am PST until 10:36am PST, Files.com customers experienced considerably higher error rates for FTP, SFTP, and WebDAV requests. No other Files.com services, including the Files.com API, were impacted during this window. This incident began when we deployed a new version of the software package we use internally that contains our FTP, SFTP, and WebDAV software. The root cause of this incident was that the internal tool that we use for deploying software had an edge case bug that resulted in deployment failure if a large number of deployments were performed during the same day. We quickly identified the issue and deployed a fix. We have added additional testing around this scenario and do not expect this issue to recur. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
Files.com is aware of elevated error rates on the FTP, SFTP, and WebDAV services on Files.com in all regions from 10:29am to 10:36am Pacific Time. We believe this incident impacted less than half of total traffic across these services during this time period. This incident did not impact any other network services such as API, AS2, HTTP, or others. We are compiling a Root Cause Analysis for this incident, which we will post here as soon as it is ready.
Report: "Singapore Region Only: FTP Service Outage"
Last updateAll services have been restored and are operating normally. We have resolved a major outage of the FTP service on Files.com in the Singapore region. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others. The FTP service was down from 10:59 PM PST on November 24th to 12:46 AM PST November 25th, with a total downtime of 1 hour 47 minutes, but only in the Singapore region. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the Singapore region. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Singapore FTP only: We are investigating a major outage of the FTP service on Files.com in the Singapore region. This incident does not impact other network services such as API, SFTP, WebDAV, AS2, and others, nor does it impact regions other than Singapore. At this time, we believe that all network services are currently up in our other regional locations. If you have an urgent need to access Files.com, we recommend using SFTP in lieu of FTP. If you must connect via FTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our USA region, which is app-us-east-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Slower than usual API response times, affecting downstream services"
Last update: From 8:55 AM PST through 9:08 AM PST, Files.com customers experienced much slower than normal response times from our core API, which affected other downstream services. Most API requests still completed successfully, albeit slower than normal.
The root cause of the slower than normal response times was that one of our databases was running at a dramatically higher load than normal. Upon investigation, Files.com determined that database queries were performing many orders of magnitude slower than intended due to the misconfiguration of an index in the database. This misconfiguration, it turns out, has existed for over a decade, but the particular query pattern had never been seen before in production.
Files.com immediately reacted to this situation by first disabling the problematic jobs that were generating the unoptimized queries. This returned the system to normal performance. Files.com then fixed the database index configuration and re-enabled the problematic jobs, which then ran to completion quickly with no further impact on system performance.
As part of our incident post-mortem process, we discovered and remedied a few deficiencies that contributed to this incident taking 13 minutes to resolve. First, we discovered a 5 minute delay in importing the relevant time series data from one of our monitoring systems (Amazon CloudWatch) into another of our monitoring systems (InfluxDB), the latter of which is used to trigger our internal alerting. We have made configuration changes to remedy this delay. Second, in addition to the delay in importing the time series data, we also had a poorly configured alert threshold that introduced an additional 6 minutes of delay before an on-call engineer was paged. We have made configuration changes to remove this delay, ensuring that an on-call engineer will be paged immediately in the event of a similar situation in the future.
Additionally, as part of the post-mortem process for this incident, we implemented much stricter controls to detect and reject slow queries at the database itself. We conducted a simulated recreation of this incident in our staging environment and determined that our new controls are sufficient to prevent a recurrence of this incident. Additionally, after reviewing this incident, we built a new tool for our on-call engineers that implements a much faster, one-click action to quarantine a problematic job type once it has been flagged as problematic. This will improve our ability to react quickly to newly discovered performance deficiencies in the future. We will begin incorporating training on this new tool into our training for on-call engineers in our next recurrent training cycle. We greatly appreciate your patience and understanding as we resolved this issue.
We have resolved an incident causing slower than usual API responses for 13 minutes on September 19th, 2024. From 8:55 AM PST through 9:08 AM PST, Files.com customers experienced much slower than normal response times to our core API, which affected other downstream services. Expand this incident to read a full postmortem.
Report: "FTP, SFTP, and WebDAV Only: Elevated Error Rates"
Last update: At 8:31 AM PST on September 5th, Files.com made a routine code deployment which introduced a bug preventing SFTP, FTP, and WebDAV operations from completing. Files.com detected this issue at 8:36 AM and reverted the change, restoring the SFTP, FTP, and WebDAV services at 8:40 AM PST. Additionally, a small number of customers continued to experience authentication failures for sessions that were incorrectly cached as failures. Files.com received an escalated report of this problem and resolved it at 9:36 AM PST by clearing the login caches for SFTP, FTP, and WebDAV connections.
The elevated error rates during this period were caused by an update to our internal service authentication to add authentication to a new service. While this update was needed to provide new connectivity for services, it introduced a regression for the internal authentication methods used by SFTP, FTP, and WebDAV. This problem was promptly detected by the Files.com monitoring and alerting services and we immediately began remediation.
The root cause of this incident was Files.com’s insufficient testing of a change that was deployed. We have updated the tests that failed to identify this issue.
We promise a system that works perfectly, all of the time, and today we failed to deliver that to you. Our entire engineering team is working hard to prevent issues like this one from occurring in the future. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
We have resolved an incident causing elevated error rates on the FTP, SFTP, and WebDAV services on Files.com in all regions. This incident did not impact other network services such as our API, AS2, or any others. This incident occurred between the times of 8:31am and 8:40am Pacific Time. We are compiling a final Root Cause Analysis for this incident, which we will post here when it is complete.
Report: "Major Partial Outage – All Regions - FTP, SFTP, WebDAV, and the legacy ExaVault API"
Last update: From 3:08 AM PST through 4:09 AM PST, Files.com customers experienced elevated error rates when connecting via certain protocols. These included SFTP, FTP, and WebDAV, as well as our support for the legacy ExaVault API (which is only applicable to a small number of customers). Although this incident may seem similar to the incident that occurred on September 5th, the root cause was not related.
The elevated error rates during this period were caused when an internal SSL certificate expired, disrupting internal system communication between certain servers at Files.com. Files.com has a sophisticated system for certificate management and this sort of failure is embarrassing and unacceptable.
Files.com constantly and automatically checks the certificate of every service we operate. However, the internal service impacted by this issue was inadvertently left out of our Service Catalog. We have reviewed and updated our procedures to ensure this does not happen in the future, and performed an audit to ensure there are no other unregistered services.
Files.com uses a service called Consul Template to ensure certificates are up to date. A bug was identified where this particular certificate was not correctly configured for updates. A project has been initiated to standardize and improve the handling of certificates to prevent this issue in the future.
The root cause of this issue was that Files.com’s configuration management did not update a certificate on an internal service in a timely manner. A secondary cause was a failure to properly monitor that service, which prevented us from detecting the expiring certificate in advance.
We promise a system that works perfectly, all of the time, and today we failed to deliver that to you. Our entire engineering team is working hard to prevent issues like this one from occurring in the future.
If you need additional assistance or continue to experience issues, please contact our Customer Support team.
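The two failure modes named in this RCA, a certificate nearing expiry and a service missing from the catalog that drives monitoring, can each be expressed as a small check. This is a sketch under assumptions: the function names, service names, and dates are hypothetical, and the timestamp format is the one Python's `ssl` module reports for a certificate's `notAfter` field.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days remaining before a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". `now` is seconds since the epoch.
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def unmonitored(running: set[str], catalog: set[str]) -> set[str]:
    """Services that are running but absent from the monitoring catalog."""
    return running - catalog


# Fixed "current time" so the example is reproducible.
now = ssl.cert_time_to_seconds("Dec  6 09:34:43 2017 GMT")
print(days_until_expiry("Jan  5 09:34:43 2018 GMT", now))  # 30.0

# Hypothetical service names: one internal service was never registered.
print(unmonitored({"sftp", "ftp", "internal-auth"}, {"sftp", "ftp"}))
```

An expiry check only helps for services it knows about, so the catalog audit is the more important of the two in a situation like the one described.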
We have resolved a major partial outage of SFTP, FTP, WebDAV, and the legacy ExaVault API in all regions. This outage only affected SFTP, FTP, WebDAV, and the minimally implemented ExaVault API. Other services were not impacted. This incident occurred between the times of 3:08am and 4:09am Pacific Time. We are compiling a Root Cause Analysis that we will post here.
We have identified the issue that is causing a major partial outage of SFTP, FTP, and WebDAV in all regions, and we are working to resolve it.
We are investigating a major partial outage of the Files.com service in all regions.
Report: "SFTP Service Only: Elevated Error Rates"
Last updateOn August 6th, 2024, at 3:05 PM PST, [Files.com](http://Files.com) received multiple monitoring alerts indicating _‘SFTP Service Only: Elevated Error Rates’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘SFTP Service Only: Elevated Error Rates’_ issue was resolved on August 6th, 2024, at 4:06 PM PST, returning the platform to full functionality. From 3:01 PM PST through 4:06 PM PST, [Files.com](http://Files.com) customers experienced elevated error rates when connecting via the SFTP protocol. Although this incident seems similar to the incident which occurred on August 2, it was a completely distinct situation. The elevated error rates during this period were actually caused by a denial-of-service \(“DoS”\) attack against [Files.com](http://Files.com)’s SFTP service. Like all large providers of services on the Internet, we are under constant attack from a variety of threat actors. [Files.com](http://Files.com) uses a variety of sophisticated tools to defend its infrastructure against attacks. There are no commercial providers \(that we know of\) who produce DoS mitigation tools which work specifically for SFTP, and so we’ve had to invest heavily in developing our own protection and mitigation tools specifically for SFTP. One of our mitigation strategies is to completely block connections from SFTP counterparties who appear to be abusive. A very hard challenge associated with this is correctly determining whether a counterparty is being intentionally abusive as opposed to being a misconfigured script or automation from an otherwise legitimate customer. Accidentally blocking a legitimate customer can take down a major workflow for a customer, and we try very hard to never have that happen. It’s a delicate balance and we spend a lot of time and engineering resources trying to get this right. 
About 4 weeks ago, [Files.com](http://Files.com) released an update to our internal security tools to add more logic to the part of our code where we try to determine abusive connections via SFTP. This was done with the hope of making it even less likely for a legitimate customer to ever be blocked inadvertently. While this improvement was good overall, it turns out that this update introduced a regression that allowed a particular type of malicious counterparty to open up SFTP connections and leave them hanging in an idle state. That’s what happened on August 6. A malicious counterparty “used up” a number of our connection pool slots by opening them and letting them hang idle, leaving them unavailable for legitimate use. After fixing the logic error in our security software, the malicious counterparty was automatically blocked and full SFTP functionality was restored. We want to be very clear about two things: 1. This was not a full outage of SFTP, rather it was a degradation due to partial inability to connect. If you operated SFTP software which used retries, it is likely that your connections worked on retry. 2. The \*only\* thing that this malicious actor was able to do was hold open connection pool slots so that legitimate customers weren’t able to connect to them. That’s what a denial-of-service attack is: they denied service to you, the legitimate customer. There was absolutely no access to our systems at all beyond the denial-of-service. Even denial-of-service attacks cause real economic impact, and we work hard to defend against them. The root cause of this issue was [Files.com](http://Files.com)'s incomplete testing of the security software change from 4 weeks ago. It is hard to produce synthetic testing that simulates anything that a malicious actor might do, but we learned from this incident and have updated our testing accordingly. 
We promise a system that works perfectly, all of the time, and we are disappointed that you may have experienced issues today that were caused by a malicious actor. Defense against the ever-present threat environment is one of the main reasons you chose to use [Files.com](http://Files.com) as opposed to operating your own on-premise server, and it is absolutely our job to prevent these sorts of things from ever affecting your workflows. We take that mission seriously. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
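To make the attack pattern concrete, the sketch below models a fixed-size connection pool in which idle connections hold slots, plus one generic defense: reclaiming slots whose connections have been idle too long. This is an illustration only. The postmortem says the actual fix was a correction to the abusive-counterparty detection logic, not an idle timeout, and the `ConnectionPool` class, its method names, and the timeout value are all hypothetical.

```python
class ConnectionPool:
    """Toy model of a slot-limited pool with idle-connection reaping."""

    def __init__(self, max_slots, idle_timeout):
        self.max_slots = max_slots
        self.idle_timeout = idle_timeout  # seconds a connection may sit idle
        self._last_activity = {}          # connection id -> last activity time

    def reap_idle(self, now):
        """Free every slot whose connection has been idle longer than the timeout."""
        stale = [c for c, t in self._last_activity.items()
                 if now - t > self.idle_timeout]
        for conn_id in stale:
            del self._last_activity[conn_id]
        return stale

    def open(self, conn_id, now):
        """Try to take a slot; returns False when the pool is exhausted."""
        self.reap_idle(now)
        if len(self._last_activity) >= self.max_slots:
            return False
        self._last_activity[conn_id] = now
        return True

    def touch(self, conn_id, now):
        """Record activity so that a busy connection is never reaped."""
        if conn_id in self._last_activity:
            self._last_activity[conn_id] = now
```

Without the `reap_idle` call, a counterparty that opens `max_slots` connections and leaves them idle would keep every later `open` returning `False`, which is the partial inability to connect described above.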
We have resolved elevated error rates on the SFTP service on Files.com in all regions. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. This incident occurred between the times of 3:01 PM PT to 4:06 PM PT on August 6th, 2024.
SFTP only: We are investigating elevated error rates on the SFTP service on Files.com in all regions. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others. If you are experiencing problems with SFTP, we recommend using FTP in lieu of SFTP.
Report: "Connection failures over SFTP, FTP, and WebDAV for recently logged in users attempting new connections"
Last updateOn August 2nd, 2024, at 7:35 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘Connection failures over SFTP, FTP, and WebDAV for recently logged in users attempting new connections’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘Connection failures over SFTP, FTP, and WebDAV for recently logged in users attempting new connections’_ issue was resolved on August 2nd, 2024, at 7:41 AM PST, returning the platform to full functionality. At 6:52 AM PST on August 2, [Files.com](http://Files.com) made a routine code deployment which introduced a bug that prevented more than one session from being opened via the SFTP, FTP, or WebDAV protocols. [Files.com](http://Files.com) reverted this deployment at 7:41 AM PST, restoring proper functionality. This resulted in 49 minutes of degraded performance for many customer use cases. It is common for many automated and ad-hoc processes to use several connections at once when communicating via SFTP or FTP. During the degraded period, only one of those connections was likely to work. Depending on the exact software in use, this might have resulted in failures of your process to run, or it might have worked with only a single connection. In either case, the situation was clearly unacceptable because it likely broke a number of critical customer workflows. The fix to the bug was simple and involved a one line change. The bug was not caught originally because we did not consider testing multiple simultaneous connections in our testing environment. While we are disappointed by the original bug making it past our testing pipeline, our true disappointment relates to our systems that monitor and alert on the status of our production environment. If our monitoring had operated perfectly, we would have solved the original bug in 2 minutes, not 49 minutes. 
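The missing test case, several sessions held open at the same time with the same credentials, can be sketched as a small probe. This is a minimal illustration rather than Files.com's test suite: `probe_simultaneous` and the `open_session` callable are hypothetical, and a real check would pass in a function that opens an SFTP, FTP, or WebDAV session and returns an object with a `close()` method.

```python
def probe_simultaneous(open_session, count=3):
    """Open `count` sessions without closing any, then report how many succeeded.

    open_session: zero-argument callable returning an object with close().
    Sessions are only closed at the end, so they all overlap in time.
    """
    sessions = []
    failures = 0
    try:
        for _ in range(count):
            try:
                sessions.append(open_session())
            except Exception:
                # A refused or failed login counts against the probe.
                failures += 1
    finally:
        for session in sessions:
            session.close()
    return count - failures
```

A single-connection probe would have passed throughout this incident; only a result lower than `count` from a probe like this one exposes the bug.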
This incident revealed an interesting set of weaknesses in our monitoring systems. First, our automated testing platform which tests our production environment did not attempt multiple simultaneous connections when testing SFTP. In this incident, the issue/downtime only occurred when attempting multiple simultaneous connections. We will update our automated testing platform to attempt multiple simultaneous connections in the future. Additionally, when responding to this incident we discovered that the original bug occurred in a section of server-side code which was excluded from reporting to Sentry, a platform we use for exception tracking and real time alerting. This exclusion was in error. As a result, our on-call team was not immediately paged like we should have been. We have updated our code to ensure that future bugs in this part of the code result in immediate reporting to Sentry, which would result in immediate notification to our on-call team in a future similar incident. To cover the possibility of Sentry alerts failing to fire in the future, we have added additional belt-and-suspenders alerting to look for spikes in 5xx HTTP error codes from our web proxy layer which don’t have a corresponding alert in Sentry. This provides a backup mechanism to ensure that our on-call team will be paged in the future in a situation like this one. The root cause of this issue was [Files.com](http://Files.com)’s failure to have a robust, multi-layered monitoring system to detect production failures and alert our on-call team. We have already implemented multiple mitigations at different layers to reduce the odds of a similar issue occurring in the future. We promise a system that works perfectly, all of the time, and today we failed to deliver that to you. Our entire engineering team is working hard to prevent issues like this one from occurring in the future. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
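The backup alert described above, paging on a spike in 5xx responses at the proxy layer when no matching Sentry alert exists, might look roughly like the following. This is a sketch under assumptions: the function name, the 5% error-ratio threshold, and the minimum sample size are illustrative and are not taken from Files.com's configuration.

```python
def needs_backup_page(status_codes, sentry_alert_open,
                      max_error_ratio=0.05, min_requests=100):
    """Decide whether the backup alert should page the on-call team.

    status_codes:      HTTP status codes seen at the proxy in the current window.
    sentry_alert_open: True if exception tracking has already raised an alert.
    """
    # If the primary alert fired, the backup has nothing to add.
    if sentry_alert_open:
        return False
    # Too few requests in the window to call it a spike.
    if len(status_codes) < min_requests:
        return False
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes) > max_error_ratio
```

The point of the design is that the backup path depends only on proxy logs, so it still fires when the failing code path is excluded from exception reporting.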
We have resolved an issue causing connection failures over SFTP, FTP, and WebDAV for a small segment of users who had very recently logged in successfully using those protocols. This incident occurred between the times of 6:52 AM PT to 7:41 AM PT on August 2nd, 2024, and was resolved by reverting a recent code deploy. During this time window, only users who tried to log in to SFTP, FTP, and WebDAV with the same credentials multiple times would have experienced login failures when trying to open new connections. We are still compiling a final Root Cause Analysis for this incident, which we will post here when it is complete.
Report: "Issues Impacting Logins via SSO (Single Sign On)"
Last updateWe have resolved an issue that was preventing certain logins on Files.com via Single Sign On (SSO). This incident occurred between the times of 10:57 AM PST to 11:12 AM PST. This issue did not impact logins which do not use SSO (Single Sign On).
Report: "SFTP and Google Drive remote server failures on certain sites"
Last updateThis incident affected SFTP and Google Drive remote server syncs and mounts on a portion of customer sites, and has been resolved at the time of this posting. We have resolved an incident causing failures in remote server connections from Files.com to SFTP and Google Drive remote servers. The incident did not impact connections to or from Files.com that do not involve SFTP or Google Drive remote servers. This incident occurred between the times of 6:10am Pacific to 7:05am Pacific on June 11th.
Report: "HTTPS outage for sites using the ExaVault host key without a Custom Domain for users in non-US regions"
Last updateOn May 1st, 2024, at 11:17 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘HTTPS outage for sites using the ExaVault host key without a Custom Domain for users in non-US regions’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘HTTPS outage for sites using the ExaVault host key without a Custom Domain for users in non-US regions’_ issue was resolved on May 1st, 2024, at 11:39 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released an initial posting to the [Status Page](https://status.files.com/) on May 1st, 2024, at 11:42 AM PST stating: _‘ExaVault is a service that was acquired by_ [_Files.com_](http://Files.com)_. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to_ [_Files.com_](http://Files.com)_. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident does not affect you._ _This incident only affects users connecting to_ [_Files.com_](http://Files.com) _from non-US regions._ _We are currently investigating an incident causing an HTTPS outage for sites that use the ExaVault host key without a Custom Domain for users in non-US regions. This incident does not affect other services like unencrypted HTTP, FTP or SFTP. At this time, we believe that all network services are up in our US region.’_ [Files.com](http://Files.com) released a resolution posting to the [Status Page](https://status.files.com/) on May 1st, 2024, at 11:49 AM PST stating: _‘ExaVault is a service that was acquired by_ [_Files.com_](http://Files.com)_. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to_ [_Files.com_](http://Files.com)_. 
If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you._ _As of 11:39am Pacific Time, we have resolved an incident causing an HTTPS outage for sites that use the ExaVault host key without a Custom Domain for users in non-US regions._ _This incident did not affect sites with Custom Domains, sites that do not use the ExaVault host key, or users connecting from our US region.’_ [Files.com](http://Files.com) acquired ExaVault, another MFT service, in 2022. Although all former ExaVault customers have been migrated to the mainline [Files.com](http://Files.com) platform, we do still support and maintain connectivity options which use legacy former ExaVault SFTP Host Key. This allows former ExaVault customers to continue their connections which were established using that host key. We recently deployed a change to expand our support for the ExaVault host key from beyond just our USA region service to all 7 of our global [Files.com](http://Files.com) service regions. This was done at customer request and to improve performance for non-US based customers. As part of this deployment, we replaced several customer-facing systems, and we updated our automatic DNS management system to use regional “latency-based” routing for the ExaVault Host Key domain. There was [another incident](https://status.files.com/incidents/dknqmws6cvr4) which occurred as part of that process. In addition to that incident, we also discovered that in some regions, customers were not made aware of the new IP addresses that we intended to use for this connectivity. We generally promise to alert customers to new IP addresses in advance of their use so they can update their firewalls. While we did publish these IP addresses to our documentation and APIs, we didn’t send an email notification of the new IPs. 
This happened because we maintain a separate list of IPs for ExaVault, and that list was not fully integrated into our operational procedures. We have since integrated that list and updated our documentation to make it very clear that there is a separate list for ExaVault IPs. We have reverted the global support for the ExaVault host key until we can provide adequate notice of the IP addresses which will be used. We will make that announcement soon. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you. As of 11:39am Pacific Time, we have resolved an incident causing an HTTPS outage for sites that use the ExaVault host key without a Custom Domain for users in non-US regions. This incident did not affect sites with Custom Domains, sites that do not use the ExaVault host key, or users connecting from our US region.
ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident does not affect you. This incident only affects users connecting to Files.com from non-US regions. We are currently investigating an incident causing an HTTPS outage for sites that use the ExaVault host key without a Custom Domain for users in non-US regions. This incident does not affect other services like unencrypted HTTP, FTP or SFTP. At this time, we believe that all network services are up in our US region.
Report: "Intermittent connection failures, followed by a brief outage of all services for sites using the ExaVault Host Key without a Custom Domain"
Last updateOn May 1st, 2024, at 5:23 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘SFTP Connection failures with the ExaVault SFTP host key and host name’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘Intermittent connection failures, followed by a brief outage of all services for sites using the ExaVault Host Key without a Custom Domain’_ issues were resolved on May 1st, 2024, at 5:51 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution posting to the [Status Page](https://status.files.com/) on May 1st, 2024, at 5:51 AM PST stating: _‘ExaVault is a service that was acquired by_ [_Files.com_](http://Files.com)_. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to_ [_Files.com_](http://Files.com)_. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you._ _We have resolved an incident that caused intermittent connection failures, followed by a brief outage of all_ [_Files.com_](http://Files.com) _core and auxiliary services in all regions for sites that use the ExaVault Host Key without a Custom Domain._ _Sites with a Custom Domain and those that use the default_ [_Files.com_](http://Files.com) _Host Key were not affected by this outage._ _From 6:30PM Pacific Time on 4/30 until 5:35AM Pacific Time on 5/1, connection attempts failed intermittently on affected sites. From 5:35AM to 5:51AM, services were entirely down for the affected sites._ _All services were restored and operational at 5:51AM._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. 
Thanks for your support while we resolved this issue.’_ [Files.com](http://Files.com) acquired ExaVault, another MFT service, in 2022. Although all former ExaVault customers have been migrated to the mainline [Files.com](http://Files.com) platform, we do still support and maintain connectivity options which use legacy former ExaVault SFTP Host Key. This allows former ExaVault customers to continue their connections which were established using that host key. We recently deployed a change to expand our support for the ExaVault host key from beyond just our USA region service to all 7 of our global [Files.com](http://Files.com) service regions. This was done at customer request and to improve performance for non-US based customers. As part of this deployment, we replaced several customer-facing systems and we updated our automatic DNS management system to use regional “latency-based” routing for the ExaVault Host Key domain. Unfortunately, a configuration issue occurred which resulted in incorrect DNS records being published to DNS, but only for customers configured to use the ExaVault SFTP Host Key. This resulted in some IPs being returned by DNS no longer representing valid, active servers. This resulted in intermittent connection failures for customers using the ExaVault Host Key. After becoming aware of this problem, we immediately moved to correct the DNS using manual intervention. This caused a short but complete outage of the ExaVault Host Key domain as we removed the automated entry and repopulated it with the correct IP addresses. We subsequently identified the automated DNS configuration issue and resolved it, moving the DNS back into automation. While this sort of change is rare, we regret the impact on our customers and we are committed to perfecting our processes. We have already implemented new monitoring to alert us to cross check all IPs that are published to DNS against other internal resources which list active servers. 
We’ve also improved logging to make the requested public IP address more visible to customers and our Customer Support team. Additionally, we have started a project to improve visibility into the DNS management logs, so that similar future bugs will be readily apparent. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
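The new monitoring described above, cross-checking every IP published to DNS against internal lists of active servers, reduces to a set difference. The sketch below is illustrative only: the `stale_dns_records` function and both input lists are hypothetical, and a real check would read them from the DNS provider and an internal server inventory.

```python
import ipaddress


def stale_dns_records(published, active_servers):
    """Return published IPs that do not correspond to any active server.

    Both arguments are iterables of IP address strings. Parsing them with
    ipaddress normalizes formatting differences before the comparison.
    """
    active = {ipaddress.ip_address(ip) for ip in active_servers}
    in_dns = {ipaddress.ip_address(ip) for ip in published}
    return sorted(str(ip) for ip in in_dns - active)
```

Any non-empty result means DNS is handing out an address with no server behind it, which is the condition that caused the intermittent connection failures in this incident.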
ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident did not affect you. We have resolved an incident that caused intermittent connection failures, followed by a brief outage of all Files.com core and auxiliary services in all regions for sites that use the ExaVault Host Key without a Custom Domain. Sites with a Custom Domain and those that use the default Files.com Host Key were not affected by this outage. From 6:30PM Pacific Time on 4/30 until 5:35AM Pacific Time on 5/1, connection attempts failed intermittently on affected sites. From 5:35AM to 5:51AM, services were entirely down for the affected sites. All services were restored and operational at 5:51AM. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
ExaVault is a service that was acquired by Files.com. A small percentage of former ExaVault customers are still using the ExaVault host key after their migration to Files.com. If your site was not migrated from ExaVault, or if you no longer use the ExaVault host key, this incident does not affect you. We are investigating a major outage on Files.com affecting all Files.com core and auxiliary services in all regions for sites that use the ExaVault SFTP Host Key without a Custom Domain. Sites with a Custom Domain and those that use the default Files.com Host Key are not affected by this outage. We will provide updates on this situation as they become available. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.
Report: "Emails from Files.com failing to send"
Last updateOn March 25th, 2024, at 7:43 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘elevated error rates related to outbound emails sent by_ [_Files.com_](http://Files.com)_’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘elevated error rates related to outbound emails sent by_ [_Files.com_](http://Files.com)_’_ issue was resolved on March 25th, 2024, at 8:48 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released an initial investigation posting to the [Status Page](https://status.files.com/) on March 25th, 2024, at 8:01 AM PST stating: _‘We are investigating elevated error rates related to outbound emails sent by_ [_Files.com_](http://Files.com)_. This impacts any emails, including password reset requests, email 2-FA codes, and activity notifications.’_ [Files.com](http://Files.com) released a resolution posting to the [Status Page](https://status.files.com/) on March 25th, 2024, at 9:16 AM PST stating: _‘We have resolved the issue causing elevated error rates on outbound emails sent from_ [_Files.com_](http://Files.com)_. This incident occurred between the times of 3:32 AM Pacific Time on March 23rd, 2024 and 8:48 AM Pacific Time on March 25th, 2024._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause._ _Unfortunately, emails that failed to send during this incident cannot be recovered and resent. 
We apologize for the impact that this may have had on your operations._ _If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.’_ [Files.com](http://Files.com) released a more accurate resolution posting to the [Status Page](https://status.files.com/) on March 25th, 2024, at 9:25 AM PST stating: _‘Emails from_ [_Files.com_](http://Files.com) _failing to send_ _We have resolved the issue causing a total failure of outbound emails sent from_ [_Files.com_](http://Files.com)_. Emails from_ [_Files.com_](http://Files.com) _failed to deliver between the times of 3:32 AM Pacific Time on March 23rd, 2024 and 8:48 AM Pacific Time on March 25th, 2024._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause._ _Unfortunately, the emails that failed to send during this incident cannot be recovered and resent. We apologize for the impact that this may have had on your operations._ _If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.’_ In this incident, outbound E-Mail sending from the [Files.com](http://Files.com) application did not work for a period of time. [Files.com](http://Files.com) sends outbound E-Mail using Amazon Web Services. This incident began when Amazon sent two messages to [Files.com](http://Files.com) about our E-Mail delivery at times outside of business hours. Those messages went to an inbox that is only monitored during regular business hours. Because [Files.com](http://Files.com) did not reply to the message from Amazon, Amazon disabled our access to their E-Mail service. The original message from Amazon was submitted to [Files.com](http://Files.com) on Saturday, March 23rd, 2024, at 3:32 AM PST, and then Amazon disabled our access on Saturday, March 23rd, at 2:24 PM PST. 
As soon as [Files.com](http://Files.com) became aware of the issue, [Files.com](http://Files.com) switched E-Mail delivery to a separate vendor \(Postmark\) that we also have a relationship with. This returned the platform to full functionality on March 25th, 2024, at 8:48 AM PST. We switched back to Amazon on March 27th, 2024, at 6:37 AM PST. The root cause of this incident is that [Files.com](http://Files.com) did not have an automatic means of escalating the emails from Amazon to an after-hours operational pager address. [Files.com](http://Files.com) has since implemented this automated escalation tool. Additionally, [Files.com](http://Files.com) has added support for more quickly switching to our backup E-Mail vendor \(Postmark\) and we will maintain this capability for any future events. Finally, [Files.com](http://Files.com) is also adding additional E-Mail logging which would have made it easier for our monitoring systems to detect the actual bounced E-Mails themselves. To the extent possible, we intend to extend access to these logs to you, our customers, so that you have direct visibility into outbound email deliverability as well in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
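One way to picture switching to the backup email vendor more quickly is a sender that falls back when the primary vendor rejects a message. This is only a sketch: the postmortem does not say whether the switch is automatic or operator-triggered, and the `FailoverMailer` class and its callables are hypothetical stand-ins for real vendor clients.

```python
class FailoverMailer:
    """Send through a primary vendor, falling back to a backup on failure."""

    def __init__(self, primary, backup):
        # Each argument is a callable that takes a message and raises on failure.
        self.primary = primary
        self.backup = backup

    def send(self, message):
        """Return the name of the path that accepted the message."""
        try:
            self.primary(message)
            return "primary"
        except Exception:
            # Primary vendor refused or is unavailable; try the backup.
            # If the backup also fails, its exception propagates to the caller.
            self.backup(message)
            return "backup"
```

With a per-message fallback like this, a disabled primary account degrades delivery to the backup path instead of dropping mail. It does not replace the after-hours escalation of vendor notices described above.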
We have resolved the issue causing a total failure of outbound emails sent from Files.com. Emails from Files.com failed to deliver between the times of 3:32 AM Pacific Time on March 23rd, 2024 and 8:48 AM Pacific Time on March 25th, 2024. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. Unfortunately, the emails that failed to send during this incident cannot be recovered and resent. We apologize for the impact that this may have had on your operations. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.
We are investigating elevated error rates related to outbound emails sent by Files.com. This impacts any emails, including password reset requests, email 2-FA codes, and activity notifications.
Report: "Some Larger Automations and Bulk File Actions Failed to Fully Complete"
Last updateWe have identified a coding error that caused Automations and manual bulk file move/copy/delete jobs to fail to fully complete in certain situations. This coding error was caused by a race condition in the code that manages our orchestration layer for parallelized file operations. The race condition was in our production environment from 2:22 PM PST on April 13 through 2:18 PM PST on April 16, when we deployed a fix. This issue was more pronounced in larger jobs. The logging output from the jobs is correct. We understand how important it is for your automated jobs to run on time and complete as expected, and we are absolutely committed to ensuring reliability on our platform. We sincerely apologize for any impact that this error had on your operations. We have the capability to re-submit the affected Automation Runs and File Migrations for processing again. We elected not to do this by default because of the potential for unexpected outcomes, but we are happy to manually re-submit your jobs if you would like us to. Please get in touch with our support team, and we can perform this task for you, as well as answer any other questions you may have.
Report: "RESOLVED - FTP Authentication Issues for Customers with "SSL/TLS Required" Sitewide, but with Individual Users Overriding that Setting"
Last updateFrom 9:17am through 9:19pm EST, users on sites with the "SSL/TLS Required" setting set on a sitewide level, but overridden on a per-user basis were unable to log in via insecure FTP (i.e. FTP without SSL/TLS). This was due to a configuration error in a security update that was rolled out on Sunday morning EST.
Report: "RESOLVED - Failed Uploads to Mounts"
Last updateAll services have been restored and are operating normally. From 3:08 p.m. PT to 3:22 p.m. PT, some uploads to mounts failed. In these cases, the affected files will need to be uploaded again. The majority of services were unaffected during this time. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Report: "RESOLVED - Performance Degradation and/or High Error Rates"
Last updateAll services have been restored and are operating normally. We have resolved a performance degradation on Files.com which affected all Files.com core and auxiliary services in all regions. Performance was slightly degraded from 11:33am to 12:06pm PT, then highly degraded from 12:07pm to 12:14pm PT. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Report: "Performance Degradation and/or High Error Rates"
Last updateAll services have been restored and are operating normally. We have resolved a performance degradation on Files.com which affected all Files.com core and auxiliary services in all regions. This incident occurred between 12:17 pm PT and 1:01 pm PT. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are investigating a major performance degradation and/or elevated error rates that are affecting all Files.com core and auxiliary services in all regions. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "SFTP Service in the USA Region Service Outage"
Last updateOn December 5th, 2023, at 4:06 AM PST, [Files.com](http://Files.com) correlated multiple customer tickets indicating _‘authentication errors when logging into SFTP’_, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The _‘authentication errors when logging into SFTP’_ issue was resolved on December 5th, 2023 at 6:29 AM PST, returning the platform to full functionality. In this incident, our SFTP servers became unstable and failed to process requests for certain customers due to a bad configuration file that was applied on 12-04-2023 at 10:41 PM to our SFTP servers via our automated configuration management system. This failure only affected a small number of customers. Specifically, it only affected customers where our API was required to authenticate the provenance of the origin IP of the connecting SFTP user. This includes customers who use IP Whitelisting or IP Geolocation \(such as country whitelist/blacklisting\). We use a sophisticated system to cryptographically authenticate the origin IP of the connecting SFTP user when making upstream calls to our internal API, and it was a configuration related to this system that was inadvertently misapplied. The reason for the bad configuration file being deployed is as follows: A separate configuration change was correctly and successfully made to another system \(our HTTP servers\) via our configuration management systems. Due to a logic error in the code of the change, the change also inadvertently targeted our SFTP systems. This change should not have been deployed to our SFTP systems, but was inadvertently deployed to them anyway. Internally, [Files.com](http://Files.com) runs SFTP services on several dedicated servers in each service region.
Our configuration management system deploys changes to servers one at a time, checking to ensure correct operation prior to continuing forward with the rollout of configuration changes. Unfortunately, while this check did validate proper operation of SFTP in general, it did not specifically validate proper operation of the subsystem that provides for cryptographic authentication of IP addresses. Upon discovery of the incident, [Files.com](http://Files.com) reverted the inappropriate configuration change on the SFTP servers. The root cause of this incident is twofold. Firstly, [Files.com](http://Files.com) failed to automatically monitor and validate the correct operation of the subsystem that provides for cryptographic authentication of IP addresses on SFTP servers. While a downtime of this system doesn’t cause a full downtime of SFTP, it causes a functional equivalent of that if customers require IP Whitelisting or IP Geolocation. Secondly, [Files.com](http://Files.com) failed to provide feedback to the engineers who developed and deployed the original configuration change targeted at the HTTP servers to let them know that the change would also be applied to SFTP servers. [Files.com](http://Files.com) will be developing two major improvements to its processes as a result of this incident. First, [Files.com](http://Files.com) will implement additional detection and monitoring around the subsystem that provides for cryptographic authentication of IP addresses on SFTP servers. Second, [Files.com](http://Files.com) will develop a system to provide feedback to its infrastructure engineers about exactly which servers will be affected by a configuration change before that change will be approved. Both of these improvements will require substantial engineering work and are not completed yet. We look forward to completing them in the coming quarter.
We are hugely disappointed by the downtime, and we will work hard to implement the additional layers of protection needed to avoid similar incidents in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
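To illustrate the rollout behavior described in this report, the following is a minimal sketch of a one-server-at-a-time rollout that stops and reverts as soon as any named health check fails. It is not Files.com's actual configuration management code. The function `staged_rollout`, the `RolloutResult` type, and the check names are all hypothetical, and the sketch assumes each check is a simple callable that returns `True` on success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical type: a health check takes a server name and returns True on success.
HealthCheck = Callable[[str], bool]


@dataclass
class RolloutResult:
    applied: List[str] = field(default_factory=list)  # servers kept on the new config
    failed_on: Optional[str] = None                    # server where the rollout halted
    failed_check: Optional[str] = None                 # name of the check that failed


def staged_rollout(
    servers: List[str],
    apply: Callable[[str], None],
    revert: Callable[[str], None],
    checks: Dict[str, HealthCheck],
) -> RolloutResult:
    """Apply a change to one server at a time and halt on the first failed check."""
    result = RolloutResult()
    for server in servers:
        apply(server)
        for name, check in checks.items():
            if not check(server):
                # Undo the change on this server and stop before touching the rest.
                revert(server)
                result.failed_on = server
                result.failed_check = name
                return result
        result.applied.append(server)
    return result
```

The rollout is only as safe as the checks passed to it. In this sketch, a check that exercises only a general SFTP login would let a change through that breaks a dependent subsystem, while adding a dedicated check for that subsystem halts the rollout at the first server.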
All services have been restored and are operating normally. We have resolved a major outage of the SFTP service on Files.com in all regions. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was partially down from 10:40 p.m. PST 12/04/23 to 4:20 a.m. PST on 12/05/23. A more extensive SFTP interruption occurred from 4:20 a.m. to 6:29 a.m. for a total of 129 minutes impacting some, but not all, customers. Customers with certain region, IP, custom namespace, or other requirements were most likely to be impacted. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are continuing to investigate this issue. We identified a configuration error. We have made a change that we believe has solved this configuration error. SFTP issues may be resolved for some connections. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
SFTP only: We are investigating a major outage of the SFTP service on Files.com in all regions. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others. If you have an urgent need to access Files.com, we recommend using FTP in lieu of SFTP. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Delays To Batch and Scheduled Operations"
Last updateWe have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning around 5:00 am PST and ending by 6:45 am PST. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "EU (Germany) Region Only: Delays and Errors Related to Certain Background Processing"
Last updateAll services have been restored and are operating normally. EU (Germany) Region only: We have resolved an issue with certain background processing performed as part of the core Files.com file transfer pipeline in the EU (Germany) region. Impacted functions of Files.com included file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. The issue with background processing began at 10:13 a.m. PST and was resolved completely by 10:57 a.m. PST. Resolution means that any background jobs that were previously delayed have now been processed successfully. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
EU (Germany) only: We are investigating elevated error rates related to certain background processing performed as part of the core Files.com file transfer pipeline in the EU (Germany) region. Impacted functions of Files.com include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the EU (Germany) region. This situation should not affect real-time operations such as the Files.com API, FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Delays To Batch and Scheduled Operations; Performance Degradation and/or High Error Rates"
Last updateAll services have been restored and are operating normally. We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. We have also resolved a performance degradation on Files.com which affected all Files.com core and auxiliary services in all regions. This incident occurred during the late evening hours of October 5th until approximately 9:00 a.m. Pacific on 10/6. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We have identified the cause of the operational delays for batch and scheduled operations and are working to implement a fix. At this time, the vast majority of batch and scheduled operations are caught up, but there is a small minority of jobs that are still impacted. We have resolved a real-time operations performance degradation on Files.com which affected all Files.com core and auxiliary services in all regions. Real-time operations, such as FTP, SFTP, AS2, and other operations where Files.com acts as a server, should be back to normal performance. We will post an update as soon as all services are restored to normal, and continue to monitor to confirm resolution. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your patience.
We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. We are also investigating a major performance degradation and/or elevated error rates that are affecting all Files.com core and auxiliary services in all regions. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "RESOLVED - Web UI Case Sensitivity Issues in File/Folder Names"
Last updateAll services have been restored and are operating normally. Web UI Only: We have resolved an issue where permissions for non-site-admins were being interpreted in a case-sensitive manner where they should have been interpreted in a case-insensitive manner. As a result, certain action buttons (such as the button to Upload a file) may not have been displayed when they should have been, due to the mismatch in case. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
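As an illustration of the class of bug described in this report, the following is a minimal sketch of a case-insensitive folder permission check. It is not the Files.com Web UI code. The function name `has_permission` and the idea of representing permissions as a list of granted folder paths are assumptions made for this example.

```python
from typing import Iterable


def has_permission(granted_paths: Iterable[str], requested_path: str) -> bool:
    """Return True if requested_path is at or below any granted folder, ignoring case."""
    # casefold() normalizes case more aggressively than lower() for non-ASCII names.
    wanted = requested_path.strip("/").casefold()
    for granted in granted_paths:
        prefix = granted.strip("/").casefold()
        # An empty prefix means the permission was granted on the root folder.
        if prefix == "" or wanted == prefix or wanted.startswith(prefix + "/"):
            return True
    return False
```

With a case-sensitive comparison, a permission granted on `Reports/2023` would not match a request for `reports/2023/q1.pdf`, which is the kind of mismatch that can hide an action button.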
Report: "RESOLVED - Delays To Batch and Scheduled Operations"
Last updateAll services have been restored and are operating normally. We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 4:20 a.m. Pacific and ending by 5:43 a.m. Pacific. All operations did successfully complete despite delays. Only a small number of customers were impacted by these delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Report: "RESOLVED - 7 Minute Outage"
Last updateAll services have been restored and are operating normally. We have resolved a major outage on Files.com which affected all Files.com core and auxiliary services in all regions. Services were down from 11:18 a.m. to 11:25 a.m., with a total downtime of 7 minutes. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Report: "Five Minute Outage"
Last updateWe have resolved a major outage on Files.com which affected all Files.com core and auxiliary services in all regions. Services were down from 10:22 a.m. to 10:27 a.m., with a total downtime of 5 minutes. All services have been restored and are operating normally. Our engineers have confirmed that all systems have returned to normal. If you continue to experience any issues, please contact our Customer Support team by email or phone. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. Thanks for your support while we resolved this issue.
Report: "USA Region Only: Elevated Error Rates and/or Performance Degradation To Network Services"
Last updateAll services have been restored and are operating normally. We have resolved a performance degradation on Files.com which affected FTP services in the US region, causing transfers to be slower than normal. Other services on the platform may have seen minor performance effects during this timeframe. This incident occurred between 8:52 a.m. and 11:08 a.m. PST. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We have isolated the performance degradation to FTP transfers (SFTP is no longer impacted). FTP transactions should still complete, however the performance may be slower than normal. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
We are investigating a major performance degradation and/or elevated error rates that are affecting certain Files.com network services in our primary USA region. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Performance Degradation and/or High Error Rates"
Last updateAll services have been restored and are operating normally. We have resolved a situation causing slight error rates and delays to batch and scheduled operations, including moves, copies, and batch file deletion operations. Operations were delayed beginning at 10:00 a.m. PST and ending by 5:30 p.m. PST. All operations did successfully complete despite delays. Preview jobs and some sync jobs were affected for less than 20 minutes during this timeframe. This situation impacted a small proportion of customers. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are investigating reports of delays to batch and scheduled operations, including some moves, copies, and deletes. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
We are investigating a major performance degradation and/or elevated error rates that are affecting all Files.com core and auxiliary services in all regions.
Report: "Reports of Elevated Errors"
Last updateWe have resolved an issue that was causing elevated errors. This incident occurred between 1:59 AM PST and 4:35 AM PST.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating reports of errors on the Files.com service. This is affecting various elements of the service. Overall error rates are low and recovering. Our engineering team has identified the issue and is implementing further fixes at this time.
Report: "Australia Region Only: Issues Affecting WebDAV, Public Hosting, and ZIP Downloads"
Last updateWe have resolved a major outage impacting WebDAV, publicly hosted files (hosted-by-files.com), and ZIP downloads in the Australia region. This incident did not impact our primary network services such as the Files.com API, FTP, SFTP, WebDAV, AS2, and others. The relevant service was down from 1:12pm PST to 1:24pm PST, with a total downtime of 12 minutes, but only in the Australia region.
Report: "US Region Only: Web Service Elevated Error Rates"
Last updateOn May 2nd, 2023, at 12:40 PM PST, [Files.com](http://Files.com) received automated alerting of elevated error rates on web services which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 2nd, 2023, at 1:11 PM PST stating: “**US Region Only: Web Service Elevated Error Rates:** _US Web services only: We are investigating elevated error rates on the web service on_ [_Files.com_](http://Files.com) _in the US region. This is causing preview delays in the web interface. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others, nor does it impact regions other than US. At this time, we believe that all network services are currently up in our other regional locations.”_ The incident was resolved on May 2nd, 2023 at 1:04 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 2nd, 2023, at 1:18 PM PST stating: _“All services have been restored and are operating normally. All web services should be operating as normal. The issue with preview processing began at 12:35 PDT and was resolved completely by 1:04 PDT.”_ This incident was started when a deadlock occurred in one of [Files.com](http://Files.com)’s backend job processing systems, specifically the system that generates image and PDF previews of large images and documents for web viewing. A recent code change resulted in the system getting into a state where it locked up and did not process preview generation on 1 out of 6 backend servers. As a result of “backflow” caused by very high error rates, other jobs such as syncs were delayed by 5 minutes on two separate occasions.
The root cause of this incident was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed preview worker and prevent its failure from causing broader impact. Ultimately this was caused by a design failure in our internal job scheduling system, which we have now redesigned to avoid this type of issue. \(See next paragraph.\) A contributing cause was the failure of the preview worker itself, which was caused by [Files.com](http://Files.com)’s failure to properly test the recent code change in a high load situation. As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
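To make the idea of isolating job types more concrete, the following is a minimal sketch of one common protection mechanism: a separate bounded queue per job type, with dispatch that skips any job type marked as stalled. This report does not describe the actual redesign in detail, so this is an assumption about one possible approach and not the Files.com implementation. The class `IsolatedScheduler` and all of its method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class IsolatedScheduler:
    """Keeps one bounded queue per job type so a stalled type cannot block the others."""

    def __init__(self, max_pending_per_type: int) -> None:
        self.max_pending = max_pending_per_type
        self.queues: Dict[str, Deque[str]] = {}
        self.stalled: Set[str] = set()

    def submit(self, job_type: str, job_id: str) -> bool:
        """Queue a job; returns False when that job type's own queue is full."""
        queue = self.queues.setdefault(job_type, deque())
        if len(queue) >= self.max_pending:
            return False  # only this job type is refused; other types still accept work
        queue.append(job_id)
        return True

    def mark_stalled(self, job_type: str) -> None:
        self.stalled.add(job_type)

    def mark_healthy(self, job_type: str) -> None:
        self.stalled.discard(job_type)

    def next_batch(self) -> List[Tuple[str, str]]:
        """Take at most one job from each healthy, non-empty queue."""
        batch: List[Tuple[str, str]] = []
        for job_type, queue in self.queues.items():
            if job_type in self.stalled or not queue:
                continue
            batch.append((job_type, queue.popleft()))
        return batch
```

In this sketch, a locked-up preview worker leaves preview jobs waiting in their own queue, while sync jobs continue to be dispatched. Once the preview type is marked healthy again, its queued jobs resume in order.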
All services have been restored and are operating normally. All web services should be operating as normal. The issue with preview processing began at 12:35 PDT and was resolved completely by 1:04 PDT. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
US Web services only: We are investigating elevated error rates on the web service on Files.com in the US region. This is causing preview delays in the web interface. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others, nor does it impact regions other than US. At this time, we believe that all network services are currently up in our other regional locations. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Delays To Batch and Scheduled Operations"
Last updateOn May 2nd, 2023, at 12:40 AM PST, [Files.com](http://Files.com) received automated alerts of delays in batch and scheduled operations which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 2nd, 2023, at 1:07 PM PST stating: “**Delays To Batch and Scheduled Operations:** _We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ The delays in batch and scheduled operations were resolved on May 2nd, 2023, at 12:55 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 2nd, 2023, at 1:15 PM PST stating: _“We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 12:35 PDT and ending by 12:55 PDT. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ This incident was started when a deadlock occurred in one of [Files.com](http://Files.com)’s backend job processing systems, specifically the system that generates image and PDF previews of large images and documents for web viewing. 
A recent code change resulted in the system getting into a state where it locked up and did not process preview generation on 1 out of 6 backend servers. As a result of “backflow” caused by very high error rates, other jobs such as syncs were delayed by 5 minutes on two separate occasions. The root cause of this incident was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed preview worker and prevent its failure from causing broader impact. Ultimately this was caused by a design failure in our internal job scheduling system, which we have now redesigned to avoid this type of issue. \(See next paragraph.\) A contributing cause was the failure of the preview worker itself, which was caused by [Files.com](http://Files.com)’s failure to properly test the recent code change in a high load situation. As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 12:35 PDT and ending by 12:55 PDT. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Canada Region Only: Delays and Errors Related to Certain Background Processing"
Last updateOn May 1st, 2023, at 7:39 AM PST, [Files.com](http://Files.com) received automated alerts of delays and errors related to certain background processing in the Canada region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on May 1st, 2023, at 8:10 AM PST, stating: “**Canada Region Only: Delays and Errors Related to Certain Background Processing:** _Canada only: We are investigating elevated error rates related to certain background processing performed as part of the core_ [_Files.com_](http://Files.com) _file transfer pipeline in the \[REGION\] region. Impacted functions of_ [_Files.com_](http://Files.com) _include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the Canada region. This situation should not affect real-time operations such as the_ [_Files.com_](http://Files.com) _API, FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ The delays and errors related to certain background processing in the Canada region were resolved on May 1st, 2023, at 11:46 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 1st, 2023, at 12:18 PM PST stating: _“All services have been restored and are operating normally. Canada only: We have resolved an issue with certain background processing performed as part of the core_ [_Files.com_](http://Files.com) _file transfer pipeline in all regions. Impacted functions of_ [_Files.com_](http://Files.com) _included file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5.
The issue with background processing began at 6:46 AM PST and was resolved completely by 11:46 AM PST. Resolution means that any background jobs that were previously delayed have now been processed successfully.”_ This incident started when a customer uploaded a large amount of data via our web interface to our Canada region. Due to the exact nature of the files uploaded, our Canada region’s worker servers became overloaded and unresponsive to any type of communication. As a result, all regional background jobs in Canada began failing for customers using our Canada region. Upon investigation, we determined the overload to be caused by a design flaw in our checksum calculation code which failed to properly use all available CPU cores on the machine, and instead only attempted to use a single CPU core. Basically, the machine locked up because dozens of jobs were attempting to use the same core, rather than spreading out to all available cores. As part of the incident resolution, [Files.com](http://Files.com) pushed an update to introduce more parallelism to this calculation and allowed all available CPU cores to be used. Additionally, one CPU core is now reserved for communication with our job scheduling system, which will prevent the communication problems in high load situations in the future. As a result of “back pressure” caused by very high error rates, other jobs on [Files.com](http://Files.com)’s background job scheduling system outside of the Canada region were also impacted with delays. The root cause of the broader delays \(non-Canada\) was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed Canada workers and prevent their failure from causing broader impact. Ultimately this was caused by a design failure in our internal job scheduling system, which we have now redesigned to avoid this type of issue.
\(See next paragraph.\) As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
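For readers who want a concrete picture of the checksum fix described above, the following is a minimal Python sketch, not Files.com's actual code. It assumes checksums are computed over local files and uses hypothetical names (`worker_count`, `md5_of_file`, `checksum_many`). It spreads the work across a pool sized to all but one CPU core, so that one core stays free for other duties such as talking to a job scheduler. Threads are used here because Python's `hashlib` releases the GIL while hashing large buffers; a real system might use separate processes instead.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(reserved_cores: int = 1) -> int:
    """Number of workers to run, leaving `reserved_cores` free (assumed policy)."""
    total = os.cpu_count() or 1
    return max(1, total - reserved_cores)


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through MD5 so large files are never held fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_many(paths: list[str], reserved_cores: int = 1) -> dict[str, str]:
    """Checksum many files concurrently instead of queueing them on one core."""
    with ThreadPoolExecutor(max_workers=worker_count(reserved_cores)) as pool:
        return dict(zip(paths, pool.map(md5_of_file, paths)))
```

Capping the pool below the core count is the design choice that matters here: throughput drops slightly, but the machine stays responsive to its control plane under load.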
All services have been restored and are operating normally. Canada only: We have resolved an issue with certain background processing performed as part of the core Files.com file transfer pipeline in Canada. Impacted functions of Files.com included file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. The issue with background processing began at 6:46 AM PST and was resolved completely by 11:46 AM PST. Resolution means that any background jobs that were previously delayed have now been processed successfully. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are continuing to investigate this issue. Most delays and errors have been corrected. We continue to work on a small number of transactions that continue to provide delayed errors. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
We are continuing to investigate this issue with Canada region background processing. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
We are continuing to investigate this issue with Canada region background processing. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
Canada only: We are investigating elevated error rates related to certain background processing performed as part of the core Files.com file transfer pipeline in the Canada region. Impacted functions of Files.com include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the Canada region. This situation should not affect real-time operations such as the Files.com API, FTP, SFTP, AS2, and other operations where Files.com acts as a server. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "SFTP, FTP/FTPS, WebDAV Service Degraded"
Last update: On May 8th and May 9th, 2023, [Files.com](http://Files.com) received multiple automated alerts and customer reports of intermittent issues with the [Files.com](http://Files.com) platform, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 5:12 PM PST stating: _**“SFTP, FTP/FTPS, WebDAV Service Degraded:** FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on_ [_Files.com_](http://Files.com) _in all regions._ _This incident does not impact other network services such as API, AS2, and others._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 5:37 PM PST stating _“All services have been restored and are operating normally._ _Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ Customers continued reporting other intermittent issues with the platform, which resulted in a second incident being declared on May 9th, 2023, at 6:47 AM PST. 
The IMT convened and immediately began investigating. The intermittent issues with the [Files.com](http://Files.com) platform were resolved on May 9th, 2023, at 8:07 AM PST, returning the platform to full functionality. This incident occurred due to a complex set of circumstances with times that vary by region. This narrative will focus on the overall story of what happened. On May 5, [Files.com](http://Files.com) experienced an incident that resulted in a 3\+ hour service outage. Prior to that, on May 3, [Files.com](http://Files.com) conducted a successful upgrade of certain regional proxy servers in certain regions from Intel architecture to ARM architecture as part of our overall transition from Intel to ARM across all of our services. As we explained in the RCA of the May 5 incident, our Incident Management Team originally misidentified the root cause of that incident as being related to the new ARM servers and made the decision to roll back from our new ARM servers to the old Intel servers in certain regions on May 5. Unfortunately, that rollback was not correctly performed. We make use of AWS \(Amazon Web Services\) EC2 \(Elastic Compute Cloud\) for all of our compute resources on [Files.com](http://Files.com). Both the Intel and ARM servers being discussed run inside AWS EC2. The EC2 networking backplane suffers from a long-standing bug that we have long been aware of where migrating an IP from one server to another can result in erroneous data reported by EC2 to our instances. In short, if you live migrate an IP on EC2 from one server to another, EC2 can report to both servers that they still “own” the IP. Because of this bug, we have a complicated procedure for migrating IPs from one server to another. This procedure is highly automated and ensures that we always fully shut down servers after IPs are moved off of them. This procedure works around the EC2 bug. 
When we performed the rollback from ARM to Intel servers on May 5, we failed to fully follow our procedure and fully shut down the ARM servers. They were “disabled” using a softer disabling mechanism, but at some point they rebooted and once they rebooted, EC2 began to report conflicting information about which server “owned” the IPs related to this incident. In our architecture, servers report their internal and external IP list to our central routing system on a regular schedule. As a result of the two sets of servers reporting conflicting information, our routing systems began to oscillate routing traffic between the Intel and ARM servers every few minutes, and only one set of servers would work at a given time. The root cause of this incident was our failure to follow our own procedure during the transition between ARM and Intel servers. A major contributing factor was our failure to detect a situation where IP addresses appear to oscillate between multiple servers. Another contributing factor is the AWS EC2 bug that results in incorrect IP address information being reported to instances. As a result of this incident, we have conducted remedial training with all of our Infrastructure team to re-train them on the procedure to migrate IPs from one server to another. We have additionally added new protection to our routing system that will detect a situation where IP addresses oscillate between servers and raise an alarm when that happens in the future. Furthermore, we have improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and treat it as a failure. On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. 
These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We greatly appreciate your patience and understanding as we resolved these issues. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
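The oscillation protection mentioned above can be illustrated with a short Python sketch. This is a hypothetical illustration under stated assumptions, not the routing system's real logic: it assumes each server periodically reports the IPs it believes it owns, and the class name `OwnershipFlapDetector`, the window size, and the change threshold are all invented for the example. An IP is flagged when its reported owner changes more often within a recent window than a single legitimate migration would explain.

```python
from collections import defaultdict, deque


class OwnershipFlapDetector:
    """Flags IPs whose reported owner keeps changing between servers."""

    def __init__(self, window: int = 6, max_changes: int = 2) -> None:
        # Assumed tuning values: keep the last `window` reports per IP and
        # tolerate up to `max_changes` owner changes inside that window.
        self.max_changes = max_changes
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, ip: str, server_id: str) -> bool:
        """Record one ownership report; return True if the IP is flapping."""
        history = self._history[ip]
        history.append(server_id)
        reports = list(history)
        changes = sum(1 for prev, cur in zip(reports, reports[1:]) if prev != cur)
        return changes > self.max_changes


detector = OwnershipFlapDetector()
for server in ["intel-1", "arm-1", "intel-1", "arm-1"]:
    if detector.record("203.0.113.10", server):
        print("ALERT: ownership of 203.0.113.10 is oscillating")
```

A clean migration produces exactly one owner change and stays below the threshold, while two servers that both claim the same IP produce alternating reports and trip the alarm.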
All services have been restored and are operating normally. Users connecting to accounts with a custom namespace, an ExaVault host key, a custom host key, or an enforced IP whitelist experienced authentication errors. Logins were impacted between 1:34 p.m. PST and 5:33 p.m. PST. Other users may have experienced elevated error rates as well. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
FTP/FTPS, SFTP, WebDAV only: We are investigating elevated error rates on these services on Files.com in all regions. This incident does not impact other network services such as API, AS2, and others. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "SFTP Entirely Down – US East Region (Primary)"
Last update: On May 8th, 2023, at 1:39 PM PST, [Files.com](http://Files.com) received automated alerts that SFTP was entirely down in the US East region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 8th, 2023, at 1:47 PM PST stating: _**“SFTP Entirely Down – US East Region \(Primary\):** SFTP only: We are investigating a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region._ _This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others._ _If you have an urgent need to access_ [_Files.com_](http://Files.com)_, we recommend using FTP in lieu of SFTP. If you must connect via SFTP, you should be able to immediately connect \(and access your existing files and account\) using the hostname of our Canada region, which is_ [_app-ca-central-1.files.com_](http://app-ca-central-1.files.com)_._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ The SFTP outage in the US East region was resolved on May 8th, 2023, at 1:47 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 8th, 2023, at 1:51 PM PST stating _“All services have been restored and are operating normally._ _We have resolved a major outage of the SFTP service on_ [_Files.com_](http://Files.com) _in our primary USA region. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was down from 1:34 p.m. 
to 1:47 p.m., with a total downtime of 13 minutes, but only in the primary USA region._ _If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ This incident occurred during a time period that also contained multiple other incidents, some of which overlapped. This report focuses specifically on the symptoms described here, but many customers who experienced this incident also experienced one of the other incidents. This incident had two distinct parts and root causes. First, [Files.com](http://Files.com) deployed a change to its SFTP server as part of our overall project to dramatically improve the logging and handling of errors on SFTP. The deployment of that change crashed our SFTP servers in several of our smaller regions due to an “out of memory” condition. Our SFTP server is developed in Java, and anyone familiar with Java can tell you how sensitive Java can be to memory configuration settings. We immediately identified the issue with the Java memory settings and pushed a change to Chef, our infrastructure configuration management system, to tweak the SFTP memory settings and resolve the initial crash. The root cause of this first part was [Files.com](http://Files.com)’s failure to monitor Java runtime parameters such as memory usage to defend against an out of memory condition. We have added additional monitoring around Java memory usage and are optimistic that this situation will be avoided in the future. 
One benefit of the [Files.com](http://Files.com) architecture as compared with many of our peers is that on [Files.com](http://Files.com), SFTP is a completely isolated subsystem, so this incident did not impact other network services such as FTP, AS2, WebDAV, or API. Unfortunately, when we deployed the configuration change via Chef, we inadvertently deployed an unrelated configuration change at the same time that had been previously merged but not deployed to the SFTP servers. This is due to the fact that we use one unified Chef repository for server configuration where certain recipes can be shared by different server types. That configuration change introduced an error into the upstream communication with our API, resulting in inability to connect via SFTP for certain customers. After investigating the issue, we were able to identify the bad configuration change and revert it. The root cause of the second part is [Files.com](http://Files.com)’s failure to operate adequate change management procedures to prevent an unintended change from being deployed. Our incident management team was quite disappointed to learn about the chain of events that led to this incident. We have already improved our internal synthetic monitoring systems with the ability to detect the situation that occurred during this incident and alert on it immediately. Additionally, as a result of this incident, we are implementing major changes to our change management procedures designed to prevent this sort of configuration management error from happening again. Those changes are fairly complicated and will require a great deal of internal development. As such, they will likely not be deployed until the middle of Q3. It is our goal to have them implemented before our next SOC 2 Type II observation period \(which runs from Q2-Q3 2023\) and documented in our next SOC 2 Type II report. 
On a more general note, we have added a considerable amount of sophistication to our monitoring and routing systems as a result of the several incidents that occurred in May, and we are adding more. These improvements amount to over 5,000 lines of code and we are optimistic that they will reduce the frequency and impact of incidents in the future. We hope to share more about the improvements in our next SOC 2 Type II report. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
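The second root cause, an unrelated change riding along in a shared configuration repository, lends itself to a simple pre-deploy guard. The Python sketch below is purely illustrative and assumes a hypothetical model in which each merged-but-undeployed change lists the recipes it touches; `Change`, `unexpected_changes`, and the recipe names are invented for the example and are not part of Chef or of Files.com's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A merged but not yet deployed configuration change (hypothetical model)."""
    change_id: str
    recipes: frozenset  # names of the recipes this change touches


def unexpected_changes(pending, intended_ids, server_recipes):
    """Return pending changes that would reach this server type unintentionally.

    A change is unexpected when it was not named in the deploy request
    but touches at least one recipe that this server type uses.
    """
    return [
        change
        for change in pending
        if change.change_id not in intended_ids
        and change.recipes & server_recipes
    ]


pending = [
    Change("sftp-memory-tuning", frozenset({"sftp_jvm"})),
    Change("api-upstream-tweak", frozenset({"shared_upstream"})),
]
blocked = unexpected_changes(
    pending,
    intended_ids={"sftp-memory-tuning"},
    server_recipes={"sftp_jvm", "shared_upstream"},
)
for change in blocked:
    print(f"Refusing to deploy: {change.change_id} was not part of this request")
```

A guard of this kind turns a silent side effect into an explicit decision: the operator must either include the extra change deliberately or hold it back.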
All services have been restored and are operating normally. We have resolved a major outage of the SFTP service on Files.com in our primary USA region. This incident did not impact other network services such as API, FTP, WebDAV, AS2, and others. The SFTP service was down from 1:34 p.m. to 1:47 p.m., with a total downtime of 13 minutes, but only in the primary USA region. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA region. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
SFTP only: We are investigating a major outage of the SFTP service on Files.com in our primary USA region. This incident does not impact other network services such as API, FTP, WebDAV, AS2, and others. If you have an urgent need to access Files.com, we recommend using FTP in lieu of SFTP. If you must connect via SFTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Reports of Elevated DNS Errors"
Last update: On May 12th, 2023, at AM/PM PST, [Files.com](http://Files.com) received customer reports of elevated DNS errors, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 12th, 2023, at 3:45 PM PST stating: _**“Reports of Elevated DNS Errors:** We are investigating reports of DNS errors on the_ [_Files.com_](http://Files.com) _service._ _This is intermittently affecting some logins for all services._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.”_ The elevated DNS errors were resolved on May 12th, 2023, at 4:16 PM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 12th, 2023, at 4:23 PM PST stating _“All services have been restored and are operating normally._ _We resolved a DNS issue resulting in some intermittent errors on accessing_ [_Files.com_](http://Files.com) _sites. Users without the site name cached were potentially affected from approximately 2:25 p.m. PST to 4:16 p.m. PST. This issue did not affect anyone with dedicated IP addresses._ _We will follow up with an Incident Report within ten \(10\) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.”_ This incident occurred during the deployment of changes to our corporate domain registrations as part of the post-mortem/resolution process for the incident that occurred on May 5. 
As discussed in the RCA for that incident, we moved the registration records for all domain names owned by [Files.com](http://Files.com) to CSC Domains, an enterprise and security-focused domain name registrar, for the purpose of mitigating domain name registrar risk. During the process of the domain transfer, the nameservers for one of our domain names were inadvertently entered incorrectly into the new registrar. As a result, DNS lookups for certain domains resulted in failure. This issue only affected a subset of our customers, and did not affect any customers using custom domain names or custom IP addresses. Once we diagnosed the problem, we were able to call CSC Domains and get the matter resolved immediately. As of now, all domains owned by [Files.com](http://Files.com) are managed by CSC Domains, and we do not expect any further registrar-related incidents to occur in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
All services have been restored and are operating normally. We resolved a DNS issue resulting in some intermittent errors on accessing Files.com sites. Users without the site name cached were potentially affected from approximately 2:25 p.m. PST to 4:16 p.m. PST. This issue did not affect anyone with dedicated IP addresses. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are investigating reports of DNS errors on the Files.com service. This is intermittently affecting some logins for all services. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Delays To Scheduled Operations"
Last update: On May 15th, 2023, at 1:41 AM PST, [Files.com](http://Files.com) received customer reports of delays to batch and scheduled operations, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on May 15th, 2023, at 1:54 AM PST stating: _**“Delays To Batch and Scheduled Operations:** We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations._ _This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ [Files.com](http://Files.com) released an updated Status Page posting on May 15th, 2023 at 2:16 AM PST stating: _“We are continuing to investigate this issue.”_ The delays to batch and scheduled operations were resolved on May 15th, 2023, at 2:16 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on May 15th, 2023, at 2:20 AM PST stating: _“We have resolved a situation causing delays to scheduled operations around syncs._ _Operations were delayed beginning at 17:45 PDT and ending by 2:16 PDT. All operations did successfully complete despite delays._ _This situation was only a delay of scheduled syncs and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ As part of the incident postmortem, the root cause was identified: [Files.com](http://Files.com) uses internally developed job scheduling software to manage certain background tasks such as syncs. 
Due to a bug in the software, a value larger than 32-bit was inadvertently stored into a database column only able to hold 32-bit values. The software correctly identified the discrepancy and stopped job processing for Sync jobs specifically until the issue could be manually resolved. After becoming aware of the issue, our team pushed several fixes to improve the robustness of this part of the scheduling software. The root cause of this issue was insufficient testing of the job scheduling software against edge cases such as bad data in a database. This incident was further complicated by the fact that it was not detected by our internal monitoring systems at all. We responded to this incident after being alerted by one of our Enterprise Support customers via our 24/7 contact line for Enterprise Support customers. Frankly, we are embarrassed about this. We conducted a full investigation into the alerting situation and determined that although we did have monitoring about Sync job processing, our alerts for Sync jobs were based on Error Count, not Success Count. So a situation like this one where 0 Errors occurred and 0 Successes occurred during a monitoring interval did not result in an alert. We have updated our alerting rules to look at Success Count in addition to Error count. Additionally we will soon be rolling out a brand new internal monitoring system that will provide an independent source of alerting and monitoring for the entire Sync process. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
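The alerting gap described above, where an interval with 0 errors and 0 successes raised no alarm, can be shown in a few lines of Python. This is a minimal sketch under stated assumptions rather than the actual alerting rules: the `IntervalStats` fields, the `expected_jobs` count, and the `max_errors` threshold are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalStats:
    """Job counts for one monitoring interval (hypothetical shape)."""
    successes: int
    errors: int
    expected_jobs: int  # assumed input: jobs scheduled to run in this interval


def should_alert(stats: IntervalStats, max_errors: int = 0) -> bool:
    """Alert on too many errors OR on a stall where nothing succeeded."""
    too_many_errors = stats.errors > max_errors
    # The stall check is the part an error-count-only rule misses:
    # work was expected, yet no job finished successfully.
    stalled = stats.expected_jobs > 0 and stats.successes == 0
    return too_many_errors or stalled


# An error-only rule would stay silent for this interval; this rule fires.
print(should_alert(IntervalStats(successes=0, errors=0, expected_jobs=25)))
```

Checking for the presence of success, not only the absence of errors, is what lets a monitor notice a pipeline that has quietly stopped processing.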
We have resolved a situation causing delays to scheduled operations around syncs. Operations were delayed beginning at 17:45 PDT and ending by 2:16 PDT. All operations did successfully complete despite delays. This situation was only a delay of scheduled syncs and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
Report: "Delays To Batch and Scheduled Operations"
Last update: On April 22nd, 2023, at 10:30 AM PST, [Files.com](http://Files.com) received automated alerts of delays on batch and scheduled operations in the Canada region, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigating. [Files.com](http://Files.com) released an initial Status Page posting on April 22nd, 2023, at 11:04 AM stating: _“**Canada Region Only: Delays and Errors Related to Certain Background Processing**: Canada only: We are investigating elevated error rates related to certain background processing performed as part of the core_ [_Files.com_](http://Files.com) _file transfer pipeline in the Canada region. Impacted functions of_ [_Files.com_](http://Files.com) _include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the Canada region. This situation should not affect real-time operations such as the_ [_Files.com_](http://Files.com) _API, FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ [Files.com](http://Files.com) released an updated Status Page posting on April 22nd, 2023, at 11:31 AM PST stating: _“We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ The delays on batch and scheduled operations in the Canada region were resolved on April 22nd, 2023, at 11:32 AM PST, returning the platform to full functionality. 
[Files.com](http://Files.com) released a resolution Status Page posting on April 22nd, 2023, at 11:38 AM PST stating: _“We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 10:30 AM PST and ending by 11:32 AM PST. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where_ [_Files.com_](http://Files.com) _acts as a server.”_ This incident was triggered by a customer-initiated regional migration of a fairly large amount of data from our USA region to our Canada region. We typically process hundreds or thousands of such migrations daily without incident. During this process our Canada region worker servers became overloaded. As a result, all regional background jobs in Canada began failing for customers using our Canada region. At the time the incident actually occurred, we incorrectly identified the root cause, but were able to resolve the issue anyway by resubmitting the failed jobs. On May 2nd, another similar incident occurred also in Canada where we discovered the true root cause of this incident. In that investigation, we determined the overload to be caused by a design flaw in our checksum calculation code which failed to properly use all available CPU cores on the machine, and instead only attempted to use a single CPU core. Basically, the machine locked up because dozens of jobs were attempting to use the same core, rather than spreading out to all available cores. As part of that incident’s resolution, [Files.com](http://Files.com) pushed an update to introduce more parallelism to this calculation and allowed all available CPU cores to be used. 
Additionally, one CPU core is now reserved for communication with our job scheduling system, which will prevent the communication problems in high load situations in the future. As a result of “back pressure” caused by very high error rates, other jobs on [Files.com](http://Files.com)’s background job scheduling system outside of the Canada region were also impacted with delays. The root cause of the broader delays \(non-Canada\) was a failure of [Files.com](http://Files.com)’s internal job scheduling system to properly route around the failed Canada workers and prevent their failure from causing broader impact. Ultimately this was caused by a design failure in our internal job scheduling system, which we have now redesigned to avoid this type of issue. \(See next paragraph.\) As a result of this incident and several other recent incidents, [Files.com](http://Files.com) worked on dramatic improvements to its internal job scheduling code during the last week of April and first week of May, and those improvements have been tested in staging and are now in production. These improvements provide multiple new protection mechanisms to prevent issues with specific customers, job types, or regions from “backflowing” and impacting other customers, job types, or regions. Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
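One common way to keep a failing region from “backflowing” into others is a per-region circuit breaker. The Python sketch below illustrates that general technique only; the source does not say which mechanisms the redesigned scheduler uses, and the class name `RegionBreaker`, the failure threshold, and the cooldown are assumptions made for the example.

```python
import time


class RegionBreaker:
    """Stops dispatching to a region after repeated failures (illustrative)."""

    def __init__(self, failure_threshold=5, cooldown_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = {}   # region -> consecutive failure count
        self._opened_at = {}  # region -> time the breaker last opened

    def allow(self, region: str) -> bool:
        """True if jobs may be dispatched to `region` right now."""
        opened = self._opened_at.get(region)
        if opened is None:
            return True
        # After the cooldown, let a trial job through to probe for recovery.
        return self._clock() - opened >= self.cooldown_seconds

    def record_success(self, region: str) -> None:
        """A job succeeded: close the breaker and reset the failure count."""
        self._failures.pop(region, None)
        self._opened_at.pop(region, None)

    def record_failure(self, region: str) -> None:
        """A job failed: open (or re-open) the breaker once the threshold is hit."""
        count = self._failures.get(region, 0) + 1
        self._failures[region] = count
        if count >= self.failure_threshold:
            self._opened_at[region] = self._clock()
```

Because state is tracked per region, a burst of failures in one region pauses dispatch there without touching the queues of any other region.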
We have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 10:30 AM PST and ending by 11:32 AM PST. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
We are investigating reports of delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. This situation should not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.
Canada only: We are investigating elevated error rates related to certain background processing performed as part of the core Files.com file transfer pipeline in the Canada region. Impacted functions of Files.com include file transformations such as zip/unzip, GPG, preview generation, and file statistics calculation such as MD5. This situation should not impact customers at all unless they have files stored in the Canada region. This situation should not affect real-time operations such as the Files.com API, FTP, SFTP, AS2, and other operations where Files.com acts as a server.
Report: "RESOLVED - FTP Service Only: Elevated Error Rates On Wildcard Searches"
Last updateOn April 21st, 2023, at 10:26 AM PST, [Files.com](http://Files.com) received customer reports of elevated error rates with wildcard searches on FTP, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The elevated error rates with wildcard searches on FTP were resolved on April 21st, 2023, at 10:36 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on April 21st, 2023, at 11:00 AM PST, stating: _“**FTP Service Only: Elevated Error Rates On Wildcard Searches:** FTP only: We identified elevated error rates on the FTP service on_ [_Files.com_](http://Files.com) _in all regions. Attempts to list files using \* \(wildcard\) searches were not returning results. This issue affected wildcard search results from 11:00 a.m. PDT on 4/20 until 10:36 a.m. PDT on 4/21. All services have been restored and are operating normally. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others.”_ This incident began when a code change was deployed to the [Files.com](http://Files.com) API to make filtering behave consistently with the API documentation at [http://developers.files.com/](http://developers.files.com/). However, FPS-FTP, [Files.com](http://Files.com)’s FTP server, was relying on undocumented API behavior to implement filtering, and therefore broke when the API change was deployed. The incident was resolved by updating FPS-FTP's use of the API to follow the documented API interface. The root cause of this issue was the [Files.com](http://Files.com) FTP team’s failure to rely on only documented API behavior. As an additional note, although the [Files.com](http://Files.com) FTP interface uses the [Files.com](http://Files.com) API internally, it does not use a [Files.com](http://Files.com) SDK as a wrapper around that API connection. 
We recommend that our customers always use a [Files.com](http://Files.com) SDK because it provides protection against incorrect API usage, and if we had used it ourselves for FTP, it might have prevented this incident. We are tracking a long-term project to have FPS always use a [Files.com](http://Files.com) SDK rather than implementing API methods directly. We hope to complete this project this year. Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
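As a general illustration of the failure mode described above, an integration can avoid depending on undocumented server-side filter semantics by requesting a plain listing and applying the wildcard locally. The sketch below is hypothetical and is not the fix described in the postmortem, which was to follow the documented API interface. The postmortem does not show the API call or its parameters, so `entries` simply stands in for the names returned by a documented listing call, and `filter_listing` is an assumed helper name.

```python
from fnmatch import fnmatchcase


def filter_listing(entries, pattern):
    """Return the entry names that match a shell-style wildcard pattern.

    fnmatchcase is case-sensitive on every platform, so results do not
    vary with the operating system the code runs on.
    """
    return [name for name in entries if fnmatchcase(name, pattern)]


# Example: a wildcard listing request such as "*.csv".
names = ["report.csv", "notes.txt", "data.csv"]
print(filter_listing(names, "*.csv"))  # ['report.csv', 'data.csv']
```

Filtering locally costs an unfiltered listing per request, so it is a trade-off rather than a recommendation; its appeal is that the matching rules live in one place the client controls.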
FTP only: We identified elevated error rates on the FTP service on Files.com in all regions. Attempts to list files using * (wildcard) searches were not returning results. This issue affected wildcard search results from 11:00 a.m. PDT on 4/20 until 10:36 a.m. PDT on 4/21. All services have been restored and are operating normally. This incident did not impact other network services such as API, SFTP, WebDAV, AS2, and others. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
Report: "Reports of Elevated Errors - 2FA and SSO Logins"
Last updateOn April 20th, 2023, at 11:57 AM PST, [Files.com](http://Files.com) received customer reports of 2FA login errors which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. [Files.com](http://Files.com) released an initial Status Page posting on April 20th, 2023, at 11:57 AM PST, stating: _“**Reports of Elevated Errors - 2FA and SSO Logins:** We are investigating reports of errors on the_ [_Files.com_](http://Files.com) _service. This is affecting user logins on the web portal that use two-factor authentication or single-sign on.”_ The 2FA login errors were resolved on April 20th, 2023, at 11:58 AM PST, returning the platform to full functionality. [Files.com](http://Files.com) released a resolution Status Page posting on April 20th, 2023, at 12:08 PM PST stating: _“All services have been restored and are operating normally. We have resolved an issue that was causing web logins to fail when using single sign-on or two-factor authentication. This incident occurred between the times of 11:28 to 11:58 a.m.”_ This incident began when our frontend team deployed a code change designed to fix a bug in the login process in the [Files.com](http://Files.com) web interface. Unfortunately, the code change actually made things worse and prevented 2FA or SSO logins from working at all. This change was rolled back as soon as the incident was identified. The root cause of this incident was [Files.com](http://Files.com)’s failure to properly identify a change to the login page as a high-risk change and implement a sufficient level of automated and manual testing prior to deployment. [Files.com](http://Files.com) will be conducting remedial training on change risk identification at its upcoming Engineering meeting. 
Extensive review and testing was conducted by [Files.com](http://Files.com) staff to ensure this resolution, and we have already taken steps internally to prevent this issue from recurring in the future. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
All services have been restored and are operating normally. We have resolved an issue that was causing web logins to fail when using single sign-on or two-factor authentication. This incident occurred between the times of 11:28 to 11:58 a.m. We will follow up with an Incident Report within ten (10) business days including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are investigating reports of errors on the Files.com service. This is affecting user logins on the web portal that use two-factor authentication or single-sign on. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "All Regions: Major Network Outage"
Last updateOn May 5th, 2023 at 11:39 AM, [Files.com](http://Files.com) received internal monitoring alerts of issues related to DNS, which resulted in an incident being declared. The Incident Management Team \(IMT\) convened and immediately began investigation. The DNS resolution issue was resolved on May 5th, 2023, at 3:20 PM, returning the platform to full functionality. [Files.com](http://Files.com) released an initial investigation Status Page posting on May 5th, 2023, at 12:00 PM PST stating: _“All regions: We are investigating a major network outage on_ [_Files.com_](http://files.com/) _affecting_ [_Files.com_](http://files.com/) _services in all regions. This outage is affecting our gateway networking in regions other than USA, and our Core Services are running correctly._ _At this time, we are also investigating elevated error rates in our primary USA region._ _We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience_.” [Files.com](http://Files.com) released an updated investigation Status Page posting on May 5th, 2023, at 12:58 PM PST stating: _“We are continuing to investigate this issue. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.”_ [Files.com](http://Files.com) released a second updated investigation Status Page posting on May 5th, 2023, at 2:03 PM PST stating: _“We are continuing to investigate the DNS issue causing_ [_Files.com_](http://files.com/) _sites to be inaccessible for some users. We will post an update as soon as the issue has been identified and a fix is being implemented._ _If you need additional assistance, please do not hesitate to contact our Customer Support team by email. 
Thank you for your continued patience.”_ [Files.com](http://Files.com) released a resolution Status Page posting on May 5th, 2023, at 3:30 PM PST stating: _“All services have been restored and are operating normally. We have identified and resolved the root cause underlying these DNS issues._ _We will follow up with an Incident Report within one business day including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone.”_ This incident started when a prominent FinTech company sent a fraudulent and erroneous report accusing [Files.com](http://Files.com) of cybercrime to GoDaddy, the .COM domain name registrar where the [Files.com](http://Files.com) domain name is registered. This FinTech company is a [Files.com](http://Files.com) customer, and the report was sent in error. Upon receipt of the accusation, GoDaddy chose to immediately suspend the [Files.com](http://Files.com) domain without providing [Files.com](http://Files.com) any advance notice or warning. Once the domain suspension was identified, we worked with GoDaddy and the source of the erroneous cybercrime report to escalate and respond to the erroneous report, which ultimately resulted in GoDaddy removing the suspension from the [Files.com](http://Files.com) domain, returning the services back to functionality. Unfortunately, escalating this situation within GoDaddy took a disappointingly long time and we ended up with a 3\+ hour downtime. We are thoroughly disappointed with the way they handled the situation. You may be questioning why we use GoDaddy as our domain registrar, and we want to speak to that briefly. They're actually a customer of ours and neighbor of ours in Scottsdale, AZ, and we have a generally good relationship with the company. 
With that said, we were already in the process of moving all of our domain registrations to CSC Domains, an enterprise-focused and security-focused domain registrar whose business is structured to prevent exactly these sorts of mishaps. Although we have our GoDaddy account secured as strongly as possible, including with two-factor authentication, CSC Domains offers a much stronger level of enterprise security and protections, and domain registrar risks are something we identified in a previous meeting of our risk committee. Unfortunately, we missed the mark on timing, because we ended up having an incident with GoDaddy prior to completing that migration. As of Monday, May 8th, 2023, we are actively migrating the [Files.com](http://Files.com) domain \(and all other domains that we own\) to CSC. The transfer process, which is controlled by GoDaddy, can take up to a week to complete. Ultimately, the root cause of this incident was [Files.com](http://Files.com)’s use of GoDaddy as its domain registrar and our failure to complete the project to switch to CSC Domains in a more timely manner. We recognize the impact that this incident has had on our customers. We greatly appreciate your patience and understanding as we resolved this issue. If you need additional assistance or continue to experience issues, please contact our Customer Support team.
All services have been restored and are operating normally. We have identified and resolved the root cause underlying these DNS issues. We will follow up with an Incident Report within one business day including the root cause and steps taken to address the root cause. If you need additional support, please do not hesitate to contact our Customer Support team by email or phone. Thanks for your support while we resolved this issue.
We are continuing to investigate the DNS issue causing Files.com sites to be inaccessible for some users. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
We are continuing to investigate this issue. We will post an update as soon as the issue has been identified and a fix is being implemented. If you need additional assistance, please do not hesitate to contact our Customer Support team by email. Thank you for your continued patience.
All regions: We are investigating a major network outage on Files.com affecting Files.com services in all regions. This outage is affecting our gateway networking in regions other than USA, and our Core Services are running correctly. At this time, we are also investigating elevated error rates in our primary USA region. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "USA and Australia Regions Only: Elevated Error Rates and/or Performance Degradation To Network Services"
Last updateWe have resolved a performance degradation on Files.com affecting Files.com services in our primary USA and Australia regions. This incident occurred between 11:51am PST and 3:04pm PST, and only in the primary USA and Australia regions. If you previously moved any workloads to another region in response to this incident, you are cleared to move those regional workloads back to the USA or Australia regions.
We are investigating a major performance degradation and/or elevated error rates affecting all Files.com network services in our primary USA and Australia regions. If you have an urgent need to access Files.com, we recommend using SFTP in lieu of FTP. If you must connect via FTP, you should be able to immediately connect (and access your existing files and account) using the hostname of our Canada region, which is app-ca-central-1.files.com. We will provide additional details as they become available. Customers with urgent questions are encouraged to contact our Customer Support team by email. Thank you for your patience.
Report: "Delays To Batch and Scheduled Operations"
Last updateWe have resolved a situation causing delays to batch and scheduled operations, including but not limited to scheduled and ad-hoc syncs, moves, copies, file transformations, automations, webhooks, batch deletes, email delivery, previews, and similar batch operations. Operations were delayed beginning at 3:58 AM PST and ending by 4:22 AM PST. All operations did successfully complete despite delays. This situation was only a delay of scheduled and batch processing and did not affect real-time operations such as FTP, SFTP, AS2, and other operations where Files.com acts as a server.