Historical record of incidents for CocoaPods
Report: "Podspec deploys down"
Last update: Hey folks, I was exploring ways to handle the occasional errors we get on new Pod deploys to trunk ( #11621 ), which involved trying to ease the API pressure on trunk (lots of folks are asking for Podspec info via the trunk API), and tried putting Cloudflare's DNS-level CDN in front of the API. This, while helping a bit with the pressure, also took down deploying a podspec for 2 hours - I disabled the DNS-level CDN and am currently trying a more fine-grained approach that only catches API calls like `/api/v1/pods/MessagePack.swift/specs/latest` - ./orta
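For anyone curious what "fine-grained" could mean here, this is a rough sketch of the idea only, not the actual Cloudflare configuration. It assumes the rule boils down to "cache read-only requests for a pod's latest spec, pass everything else straight through to trunk". The `is_cacheable` helper and the regex are hypothetical names made up for illustration.

```python
import re

# Hypothetical pattern: matches only the "latest spec" endpoint for a single pod,
# e.g. /api/v1/pods/MessagePack.swift/specs/latest
LATEST_SPEC = re.compile(r"/api/v1/pods/[^/]+/specs/latest")


def is_cacheable(method: str, path: str) -> bool:
    """Return True only for GET requests to the latest-spec endpoint.

    Assumption: anything else (including podspec deploys) should bypass
    the CDN and reach trunk directly.
    """
    return method.upper() == "GET" and LATEST_SPEC.fullmatch(path) is not None


if __name__ == "__main__":
    print(is_cacheable("GET", "/api/v1/pods/MessagePack.swift/specs/latest"))
    print(is_cacheable("POST", "/api/v1/pods"))
```

The point of matching the full path and the method is that a broad rule is what broke deploys in the first place, so the sketch errs on the side of letting requests through uncached.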
Report: "CDN Unreliable"
Last update: Calling it good, we've had no reports of anyone having issues for the last 4 hours.
We think this is fixed, and it is now in monitoring. The DDoS detection was triggered, which can happen occasionally because CDN activity is still going up overall. It looks like traffic now bounces over a threshold for stricter rules. DDoS detection in Cloudflare works through a series of rules/heuristics which can be individually tuned (which is what I looked at in https://github.com/CocoaPods/CocoaPods/issues/11355#issuecomment-1123465704 ). After tuning down the one people were hitting, we paused a bit to see if it made a difference. After seeing no changes, we flipped every rule/heuristic to off ( https://github.com/CocoaPods/CocoaPods/issues/11355#issuecomment-1123499704 ). What we were still seeing was similar CDN traffic patterns, but with some traffic occasionally getting through correctly. This was when we reached out to Cloudflare support. With their help we determined that it was likely that, during the migration of our rules from the central settings repo to the different CDN edges (for simplicity, think of the servers closest to users), some edges were still using the cached (older) settings. This meant some regions didn't have the new rules saying to ignore the rule about 'allow the custom user-agent'. The Cloudflare support folks cleared the cache and now it's looking like everything is working fine. I'll probably give it until the weekend before I turn some of the DDoS settings back on (but not the main culprit) - giving me time now to go do some [baking with my wife](https://mastodon.social/@orta/108243829603470408) on our day off :P
We've been in conversation with Cloudflare, still figuring it out, no hard ETA yet but things are happening
Cloudflare, our DNS provider and front-end to the CDN, has triggered a DDoS detection on normal CocoaPods traffic to the CDN. We have raised a support request with them and are figuring out how to bypass it. https://github.com/CocoaPods/CocoaPods/issues/11355
Report: "CDN down due to DNS issues"
Last update: We shipped a revised version of the CDN this morning, which we think should address everything, and have received no reports of issues.
We're getting reports suggesting it's possible that we've fixed this with some DNS changes. We're not certain of the original trigger/cause though.
We're investigating in https://github.com/CocoaPods/CocoaPods/issues/10078
Report: "Maintenance work on Trunk"
Last update: This incident has been resolved.
This should now be done, and trunk pushes are allowed again
We'll briefly disable uploads to trunk on the 13th of April at 12EST to ensure a smooth migration for CDN-powered pods. See https://github.com/CocoaPods/Specs/pull/14409 for details.
Report: "CocoaPods Search is Unresponsive"
Last update: Search has been re-created from scratch using Algolia's index by @haroen and @jugutier - we've added the simplest version of the search on cocoapods.org, and doing so has removed a lot of the features we used to have. How many of those features come back is still up in the air, but I'm moving this to being operational :+1:
We are currently investigating this issue.
Report: "GitHub returning 500 for Trunk Commit Pushes"
Last update: Looks like everything has been back to normal for the last hour 👍
Not too sure if there's anything we can do here. I've been seeing some hiccups just browsing GitHub, so I assume it's a system-wide thing that hasn't been raised on their status pages. ./
Report: "Trunk pushing to GitHub is down"
Last update: Reasonably sure this was just a one-off GitHub API issue that was fixed on their side. In the meantime, I added some more debugging tools for trunk admins in case this happens again. ./
We're currently seeing issues creating Podspecs on the CocoaPods/Specs repo, so for the moment uploading new Podspecs is disabled.
Report: "CocoaDocs Update to Xcode 7.3"
Last update: Well, OK, that was quicker than expected. Go me. Some interesting findings too, https://twitter.com/orta/status/721342807538069504
Hello there, it had to happen sometime, and that sometime is this weekend. Honestly, last time for 7.2, this took a week, and I'm really, really, really hoping it will take less time this time. Hah, sure.
Report: "CocoaDocs Upgrading"
Last update: Alright, the CocoaDocs server is working as well on the server as it is on my machine, should be ok. :+1:
I've started to update the installed Xcode on CocoaDocs and do the usual system maintenance. This is never a simple operation. Hoping to have everything back up and running today, so expect delays on seeing the READMEs/QIs on CocoaPods.org ./
Report: "CocoaDocs software update"
Last update: Alright, by the morning the MAS had updated Xcode - now it has 6.4, 7.0.1 and 7.1 available for Swift projects.
I updated the Mac, and got the server back up and running but not on 7.1 yet. So moving to monitoring but not fully fixed.
Installing Xcode 7.1 for Swift 2.1 support, and daring to hit the "update button" for the OS install. Don't expect this to be a long one.
Report: "Work on CocoaDocs"
Last update: We're now running on El Cap and Xcode 7 is the default Xcode.
Looks like El Cap's System Integrity Protection is really making this difficult, could be another few days. :-/
We're upgrading the CocoaDocs server to Swift 2.0 and El Cap, hard to estimate the time this could take. We will re-run all pods that are submitted to trunk during the outage.
Report: "Search Engine down & resolved"
Last update: This incident has been resolved.
Report: "Upstream issues at Heroku & GitHub."
Last update: This incident has been resolved.
We are currently investigating this issue.
A fix has been implemented and we are monitoring the results.
Report: "Updating to Xcode 6.3"
Last update: Looking good.
Got Swift 1.2 running (turns out installing Xcode was non-trivial), and now re-running all docs from the downtime. Will check in periodically for a few hours.
OK, time to upgrade the Xcode on CocoaDocs. This is one of those "could be 10 minutes, could be 10 hours" kind of jobs. Hoping for the former. After this, Swift 1.2 will be the only supported option for documentation. Blame Apple for making it hard to run multiple Xcodes.
Report: "Trunk authentication issues"
Last update: This incident has been resolved.
Looks like it's over, we'll be looking at ways to protect against this kind of API usage so it doesn't cause timeouts.
We're experiencing an unusual amount of HTTP traffic to trunk which is causing timeouts.
Report: "Moved to HTTPS"
Last update: Not strictly an incident per se, but we moved the majority of the CocoaPods websites to HTTPS yesterday, so if you are experiencing problems, please try clearing your web cache.
Report: "CocoaPods.org down"
Last update: We use Heroku web hooks on master to auto-deploy a lot of our web tools. CocoaPods.org had it enabled but it never worked, until last night at 4am NYC time, when it rolled back to a commit from a few days ago, twice. Glad to have the webhook working, but not strictly sure what triggered it yet. It was resolved by Florian Hanke redeploying master. For now the webhook is turned off, until it can be tested correctly.
Report: "Trunk Offline"
Last update: This issue was identified and a fix has been deployed.
We're experiencing an outage for Trunk and are currently looking into the issue.
Report: "CocoaDocs down for Maintenance"
Last update: The software updates are complete and the CocoaDocs server is once again operational.
The CocoaDocs server is down for scheduled maintenance, including software upgrades.
Report: "CocoaDocs - New Pod Documentation Affected"
Last update: Generation of documentation for new pods has been restored.
There is currently a failure with CocoaDocs uploading new Pod documentation to the website. This will cause newly released Pods to not have any documentation until the service is restored.
Report: "DNS Issues via DNSimple"
Last update: If you're still experiencing this you may need to flush your DNS cache.
On December 2nd our DNS host went down, making the search, guides and blog inaccessible - http://dnsimplestatus.com