Historical record of incidents for Campfire HQ
Report: "Ranking System"
Last updateWe have now resolved this issue. Ranking is now operational, though many cookies may have been expired due to a change pushed by Roblox.
Campfire/Hyra ranking is currently not performing as expected due to a change in how Roblox manages cookies. We are working on a solution.
Report: "Issues with bots"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified the issue and are rolling out a fix.
The issue has been identified and a fix is being implemented.
A fix has been implemented and we are monitoring the results.
We're currently investigating an issue with our bot network.
Report: "Delayed or no response on website"
Last updateThis incident has been resolved.
This issue is caused by an upstream internet provider issue. We will provide more information as we receive it.
Report: "Degraded ranking performance"
Last updateRoblox has recovered and Campfire Ranking requests are now operational.
A Roblox issue causing decreased performance on the platform is impacting performance of Campfire Ranking.
We are currently investigating this issue.
Report: "Degraded ranking performance"
Last updateRoblox has recovered and Campfire Ranking requests are now operational.
We're experiencing an elevated level of API errors due to a Roblox outage. The majority of requests are still going through; however, a small percentage are failing.
Ranking performance is currently degraded. We are looking into this issue and will provide an update shortly.
Report: "Security vulnerability"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
A fix has been implemented and these changes are now deploying across the CFHQ network.
We are working hard to roll out a fix as soon as possible for this security bug.
Report: "Member Counters"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
An issue has automatically been identified with member counters. A team is now investigating the issue.
Report: "All systems unavailable"
Last updateWe're marking this as resolved. Further information can be found at https://www.cloudflarestatus.com/
There is currently an outage across Campfire products and services due to an upstream internet issue.
Report: "Ranking services down"
Last updateThis incident has been resolved.
We are currently investigating this issue.
Report: "Member Counters not operating as expected."
Last updateThis incident has been resolved.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue where member counters are not sending correctly, or are sending at a reduced interval. As this product continues to grow in size, we are finding it increasingly difficult to manage the 1,728,000+ requests per day we send to Roblox. We are working closely with our DevOps Team to resolve the situation as soon as possible and we will update you as required. We apologise for the inconvenience.
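To put the figure above in context, the daily total can be converted to an average per-second rate. The snippet below is only a rough illustration: the function name is hypothetical, and it assumes requests are spread evenly across the day, which the update does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_requests_per_second(requests_per_day: int) -> float:
    """Average rate, assuming requests are spread evenly over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1,728,000 is the daily figure quoted in the update above.
    print(average_requests_per_second(1_728_000))
```

At 1,728,000 requests per day this works out to an average of 20 requests per second; real traffic may be burstier than the average suggests.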
Report: "Roblox.com down"
Last updateThis incident has been resolved.
The site has now resumed service. We are monitoring and will update if the situation changes.
Issues are still occurring. We will update you as required.
The site is now back up. Our team are continuing to monitor and will keep you updated should issues occur again.
The site is now coming back online slowly.
Roblox.com is currently down.
Report: "Member Counters"
Last updateThis incident has been resolved.
We are working hard on a fix.
A fix has been implemented and we are monitoring the results.
We're going to test a fix in production now. The only caveat is that there will be an increased delay between counts, i.e. a longer amount of time will pass before we check for new members.
There is currently an issue where member counters may jump back and forth between two numbers. This is caused by a problem in the Roblox group count system, and we are looking into fixes for this problem. This service continues to operate as normal.
Report: "Member counters not operating"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
A fix will be rolling out very shortly.
Issue has been identified and a fix is being implemented.
Our DevOps Team are continuing to investigate the issue.
DevOps Team are now working on a solution to this problem.
All hooks, and the ability to create and manage hooks, have been disabled until the DevOps Team becomes available. Apologies for the inconvenience.
All DevOps Team members are currently unavailable. No resolution is expected until the DevOps Team becomes available again in 5-10 hours.
We are currently investigating this issue.
Report: "Issues with caching CDN"
Last updateThis incident has been resolved.
This issue may still be present for some users. We advise clearing your cache or using Ctrl + Shift + R / ⌘ + Shift + R.
We expect this to be resolved for all users by 00:00 UTC.
A fix has been implemented and we are monitoring the results.
Report: "Billing System Not Activating Bots"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue was identified as a type error. We're now rolling out a patch.
We are currently investigating this issue.
Report: "Ranking issues"
Last updateThis incident has been resolved.
A fix has been deployed. We are now monitoring the fix.
Our team are working on implementing rate limiting technology. We won't ever hard rate limit, we'll simply slow down your requests in transit. This is to prevent us from sending too many requests to Roblox's servers. We will publish documentation to our Developer Docs when this has been implemented. Service will resume once this has been implemented. Thank you for your patience and cooperation.
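For readers unfamiliar with the idea of slowing requests down in transit rather than rejecting them, the following is a minimal sketch of one way such a throttle can work. It is not Campfire's implementation: the class name `SoftThrottle`, its parameters, and the fixed-interval spacing strategy are all assumptions made for illustration.

```python
import time


class SoftThrottle:
    """Hypothetical helper: delays callers instead of rejecting them."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        # Minimum spacing between two forwarded requests, in seconds.
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        # Earliest time at which the next request may be forwarded.
        self.next_slot = clock()

    def wait(self):
        """Block until the next free slot; return the delay applied."""
        now = self.clock()
        delay = max(0.0, self.next_slot - now)
        # Reserve the slot after this one for the next caller.
        self.next_slot = max(now, self.next_slot) + self.interval
        if delay > 0:
            self.sleep(delay)
        return delay


if __name__ == "__main__":
    throttle = SoftThrottle(max_per_second=5)
    for i in range(3):
        waited = throttle.wait()
        print(f"request {i} forwarded after waiting {waited:.2f}s")
```

Every call to `wait()` eventually returns, so no request is refused with an error; callers that exceed the configured rate simply see higher latency.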
We've identified the root cause as a denial of service attack on our ranking service. We are taking security precautions to resolve the issue.
We have no ETA for when this can be resolved. The issues are beyond the scope of Campfire, and are with our providers.
We are continuing to investigate this issue.
We've temporarily paused this service as we are facing issues with our backend database. We are working with our team to resolve the issue.
We are continuing to investigate this issue.
We're currently investigating an issue with Campfire ranking and 429 errors. We're working closely with our DevOps Team to resolve the issues as quickly as possible.
Report: "API Timeouts"
Last updateResolved on our end.
We've directed all traffic to our AMS3 server.
There are currently issues with our DigitalOcean NYC1 server.
Report: "Error accessing portal"
Last updateThis incident has been resolved.
A fix has been implemented and is now rolling out over our CDN.
This issue has been identified as an SSL error. A fix is being implemented.
Report: "Outage"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are now going to be putting our public bot and all other client bots onto a different host, as we're becoming more and more reliant on another hosting provider instead of AWS.
The root cause is our AWS EC2 Instance. It has stopped unexpectedly.
We are continuing to investigate this issue.
We are continuing to investigate this issue.
We are currently investigating this issue.
Report: "Bot Offline"
Last updateThis incident has been resolved.
We're still working on a fix, as this appears to be caused by a bug with prefixes. We're going to be changing the prefixes feature to improve stability and reliability, as it has caused the majority of our downtime over the last month.
Bot is offline and we are working on a fix.
Report: "Bot not responding"
Last updateThis incident has been resolved.
We are continuing to work on a fix for this issue.
This issue has been identified as high CPU utilisation.
We are currently investigating this issue.
Report: "Community Forum"
Last updateThis incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We're currently investigating issues with all users accessing our community forum.
Report: "Typeform 500 Error"
Last updateThis looks to have been resolved now!
There's currently an issue with Typeform meaning that no users can access our Typeforms. We believe this is across the whole of Typeform. We will update you with more information soon.
Report: "Downtime"
Last updateThis incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are now monitoring our services to ensure that everything is working. For some customers it may take additional time due to a change in DNS.
We've identified this issue as a change in our IP address. We are now working to resolve this.
We're continuing to have issues with Campfire. Campfire apologises and is working to resolve the issue.
We are currently investigating this issue.
Report: "Bot not responding to commands"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We believe this was a localised issue. We've identified the problem and are working towards a resolution.
We're currently investigating an issue where the bot is not responding to commands.
Report: "Elevated API response time"
Last updateThis incident has been resolved.
This issue looks to have fixed itself. We're monitoring the bot to ensure quality and performance.
We've checked our whitelabel bots and they're running a-ok, therefore we're ruling out a connection problem.
We are currently investigating an issue affecting the latency of the bot. We'll update you when we have more information. Live updates available via Twitter and https://cmpf.ml/status
Report: "Issue with Cronjobs"
Last updateThis incident has been resolved.
We've now restored all client bots. We'll monitor this over the next 24 hours and see what we get back.
This issue has been identified as an issue with one of our dependencies. We've restarted the server and are in the process of restoring all client bots.
We are continuing to investigate this issue.
We're restarting our main server to see if it resolves the issue.
We are currently investigating this issue.
Report: "Elevated Response Time"
Last updateThis incident has been resolved.
We're currently experiencing elevated response times across all Campfire services whilst Discord experiences issues. Live updates available on https://status.discordapp.com
Report: "Major Outage"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
We believe we have identified the issue and a fix is now being implemented.
This issue appears to have been present since approximately 02:55 UTC. We're extremely sorry and are still looking into solutions.
We're currently investigating the issue.
Report: "Downtime to main server"
Last updateThis incident has been resolved.
This issue has been identified as a lack of free disk space on the server's hard drive. We're now working to resolve the problem.
We are currently investigating this issue.
Report: "Issue with API."
Last updateWe believe this has now been resolved.
This issue has now been identified and we are working with our partners to resolve it.
Report: "Campfire Downtime"
Last updateRecovered; suspected to have been a Discord API issue.
We are currently investigating this issue.
Report: "Discord - Server Outages and Increased API Errors"
Last updateThis incident has been resolved.
Campfire is experiencing issues due to a Discord outage. Please see further information here: https://status.discordapp.com/
Report: "Elevated response time"
Last updateThis incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are currently investigating this issue.
Report: "Bot Downtime"
Last update
# Postmortem: Downtime

Hi all, Campfire's Development Operations Manager, Sam here with a postmortem for the issues experienced here.

## What went wrong?

We've been looking at our history of downtime like this and we've come to the conclusion that we're simply putting a little too much load on our London server. This is likely causing the server to "burst" (run out of CPU credits) and therefore powering down our instance. We'd have no issue believing this, but the issue is it's not actually powering down. It's becoming entirely unresponsive. We've taken screenshots via the AWS EC2 manager and the instance looks perfectly normal.

## What do we 100% know?

We know that the server became overloaded (at about 99% CPU utilisation) and then went completely silent. This therefore leads us to believe it's a performance issue.

## What are we doing about this?

We're doing a few things.

1. We're moving our EC2 instance to America, so it has a quicker response time with the Discord gateway (this therefore means less CPU time and better performance).
2. We're investing more money into compute power to increase our cloud footprint by at least 200%.
3. We're trying out a different process manager. We previously were using forever. We're now trialing PM2 based on a recommendation from a third party who develops bots and services in our space.

We don't anticipate having any more issues like this, but naturally we may have some again in the future. We are trying our best and training staff on procedures for our downtime so we can handle it in the best way possible.

Many thanks for your continued support.
All looks healthy! We're closing this incident now. So sorry about the issues this caused. A post mortem may be published soon.
We are continuing to monitor for any further issues.
All bots have been restored and we are now monitoring the issue to ensure that things run smoothly.
We're now beginning to restore all bots running on this instance.
The issue has been identified as a large amount of stress on our CPU causing it to terminate our instance. We're now looking into what caused this large CPU spike. CPU Graph: https://cdn.discordapp.com/attachments/552944213768011816/693491730477088858/unknown.png
This issue appears to have been present since 13:55 UTC. We're now launching an investigation with our Engineering Ops Team. We'll keep you updated as this case develops. Our main priority is to restore service.
We are currently investigating this issue.
Report: "Bot not responding"
Last updateThis incident has been resolved.
We're continuing to investigate this issue and will update you as required.
We're currently investigating an issue that's making the bot not respond to commands.