Your API calls are failing. The instinct is immediate: open the logs, check the recent deploy, stare at the request payload. Twenty minutes later you find out the vendor had an incident. Their status page has been showing a degraded state for the past half hour.
This is one of the most common and most expensive debugging loops in software development. The fix isn't faster debugging — it's knowing the answer before you start.
The Core Diagnostic Question
Every third-party API failure comes down to a binary: is the problem in your system or theirs? Everything else flows from that question. If it's yours, debug your code. If it's theirs, stop debugging entirely — route around it, communicate with your users, and wait.
The mistake is assuming it's you by default. That assumption is natural (you just deployed, after all), but it costs time every time it's wrong.
Step 1: Read the Error Code
HTTP status codes are the first triage signal, and they're reliable enough to act on:
| Code | What it means | Who to blame |
|---|---|---|
| 400 Bad Request | Malformed request — bad parameters, missing fields | Your code |
| 401 Unauthorized | Auth failure — expired token, wrong key | Your config |
| 403 Forbidden | Access denied — wrong permissions or plan | Your account |
| 429 Too Many Requests | Rate limit hit | Your usage pattern (or a traffic spike) |
| 500 Internal Server Error | Something broke on their end | Probably them |
| 502 Bad Gateway | Upstream failure behind their load balancer | Probably them |
| 503 Service Unavailable | Their service is explicitly unavailable | Them |
| 504 Gateway Timeout | Their gateway gave up waiting on a backend server | Them |
The 4xx/5xx boundary is your first split. A 400 or 401 points inward — check your request construction and credentials. A 500 or 503 points outward — stop debugging your code and check their status.
There are edge cases. A vendor with bad error handling might return a 500 for a malformed request. A rate limit might be presented as a 503. But the boundary is correct often enough to be your first move.
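The table above can be turned into a first-pass triage helper. This is a minimal sketch, not a complete classifier: the function name `triage_status` and the wording of the returned hints are illustrative, and the edge cases just mentioned (a vendor returning a 500 for a bad request, a rate limit surfaced as a 503) will still fool it.

```python
def triage_status(code: int) -> str:
    """Return a first guess at where to look, based only on the HTTP status code."""
    if code == 429:
        # Rate limiting is a 4xx, but the cause is usage volume rather than a bad request.
        return "your usage pattern (or a traffic spike)"
    if 400 <= code <= 499:
        return "your request, credentials, or account"
    if 500 <= code <= 599:
        return "probably the vendor: check their status page"
    return "not an error status"


for code in (400, 401, 429, 500, 503):
    print(code, "->", triage_status(code))
```

Treat the result as a starting direction, not a verdict. It tells you which step to run next.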
Step 2: Check the Vendor Status Page
If you're getting 5xx errors and nothing changed on your end, go to the vendor's status page immediately. Don't file a ticket. Don't dig deeper into logs. Go look.
Most major services break their status into components. Stripe separates payment intents from webhooks from the dashboard. AWS breaks down by service and region. GitHub separates Actions from the API from Git operations. This matters: "Stripe is operational" doesn't tell you much if their payment intents component is degraded.
The limitation is that status pages require you to go look. Unless you have subscribed to updates for each vendor, nothing alerts you. They also have publication lag: vendors typically post an incident 5–15 minutes after it starts, sometimes longer. And they're written by the vendor, so the framing is sometimes optimistic.
Use status pages as confirmation, not as your primary detection mechanism.
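Component-level status can also be read programmatically. Many vendor status pages are hosted on Atlassian Statuspage, which publishes a JSON summary at `/api/v2/summary.json`. The sketch below assumes the vendor's page follows that format, which you should verify per vendor. The helper names and the sample payload are made up for illustration.

```python
import json
import urllib.request


def fetch_summary(base_url: str, timeout: float = 5.0) -> dict:
    """Download the Statuspage-style summary document from a status page."""
    with urllib.request.urlopen(f"{base_url}/api/v2/summary.json", timeout=timeout) as resp:
        return json.load(resp)


def degraded_components(summary: dict) -> list[tuple[str, str]]:
    """Return (name, status) for every component that is not operational."""
    return [
        (component["name"], component["status"])
        for component in summary.get("components", [])
        if component.get("status") != "operational"
    ]


# Illustrative payload only; real component names and statuses come from the vendor.
sample = {
    "status": {"indicator": "minor", "description": "Partially Degraded Service"},
    "components": [
        {"name": "API", "status": "operational"},
        {"name": "Webhooks", "status": "degraded_performance"},
    ],
}
print(degraded_components(sample))
```

An empty list means the page reports every component as operational. Given publication lag, that is not the same as the vendor being healthy.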
Step 3: Check If It's Specific to You
Some failures look like vendor outages but are account-specific. A few things to verify before concluding it's a broad incident:
- Is your API key valid? Regenerated keys, rotated secrets, and permissions changes can look like a service failure.
- Are you in a restricted region? Some APIs have geographic restrictions or regional routing that affects only certain users.
- Are you on a plan that includes this feature? Hitting a plan limitation often returns a 403 or a vendor-specific error that looks like a failure.
- Is it affecting all customers or just some? If you can reproduce from multiple environments and the error is consistent, it's more likely a vendor issue.
A quick test: try the same API call with a known-good key from your staging environment or from a curl command outside your application stack. If it fails there too, the problem is almost certainly upstream.
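If curl isn't handy, the same check can be scripted with the standard library. This is a rough sketch under assumptions: the `probe` helper is hypothetical, and Bearer authentication is a guess, so substitute whatever scheme the vendor documents. The `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def probe(url: str, api_key: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> str:
    """Call the endpoint once, outside the application stack, and describe the outcome."""
    # Assumption: the vendor accepts a Bearer token in the Authorization header.
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with opener(request, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return f"HTTP {err.code}"
    except OSError as err:
        # DNS failures, refused or reset connections, and timeouts end up here.
        return f"network error: {err}"


# Usage (placeholder URL and keys):
#   probe("https://api.example.com/v1/ping", STAGING_KEY)
#   probe("https://api.example.com/v1/ping", PRODUCTION_KEY)
```

If the staging key succeeds and the production key gets a 401 or 403, the problem is specific to your account. If both fail with the same 5xx, go back to the status page.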
The Decision Flowchart
When an API call fails, run this sequence:
- Is the error code 4xx? → Check your request construction, credentials, and account state. This is your code or config.
- Is the error code 5xx? → Go to their status page immediately.
- Is there an active incident on their status page? → Stop debugging. Document the incident time, communicate with affected users if needed, and wait for resolution.
- No incident on the status page, but still getting 5xx? → The incident may not be posted yet (lag is common). Try from a different environment. If it reproduces, file a support ticket with your request IDs.
- Error is ambiguous (network timeout, connection reset)? → Check if others are reporting issues. Try from a different network or region.
Running this sequence takes about two minutes. Skipping it can cost an hour.
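The same sequence can be written down as a small function, which is handy for a runbook or an on-call bot. This is a sketch of the flowchart as described here, not a standard algorithm: `next_step` and its parameters are hypothetical names, and the two boolean inputs are things a person (or another check) has to determine first.

```python
from typing import Optional


def next_step(status: Optional[int], incident_posted: bool = False,
              reproduces_elsewhere: bool = False) -> str:
    """Map a failed call to the next action in the flowchart.

    status is the HTTP status code, or None for a timeout or connection reset.
    """
    if status is None:
        return "Ambiguous: check whether others report issues; retry from another network or region."
    if 400 <= status < 500:
        return "Yours: check request construction, credentials, and account state."
    if status >= 500:
        if incident_posted:
            return "Theirs: stop debugging, record the incident time, notify users, wait."
        if reproduces_elsewhere:
            return "Likely theirs, not yet posted: file a support ticket with request IDs."
        return "Unconfirmed: re-check the status page and try from a different environment."
    return "Not a failure status."


print(next_step(401))
print(next_step(503, incident_posted=True))
print(next_step(None))
```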
Where Statusfield Fits
The status page step is the bottleneck. You have to know to check it, remember the URL, and navigate to the right component. During an incident — when you're under pressure and everything is happening at once — that's friction you don't need.
Statusfield monitors official vendor status pages continuously and delivers the signal the moment it matters. When Stripe posts a degraded payment intents incident, Statusfield picks it up and routes the alert to whoever needs to know — before you've opened your log viewer.
For teams that rely on third-party APIs in production, this changes the diagnostic sequence: step 2 (check vendor status) becomes a push notification rather than a manual lookup. You know before you start debugging whether the problem is yours to fix.
Statusfield's free plan monitors up to 3 services. The Pro plan ($29/month) covers up to 20 services with unlimited email and Slack notifications.
FAQ
What's the difference between a 500 and a 503? Both indicate server-side errors, but they carry different signals. A 500 (Internal Server Error) means the server encountered an unexpected condition: a bug, a dependency failure, or resource exhaustion. A 503 (Service Unavailable) is more explicit: the server is reporting that it temporarily cannot handle requests, usually because of maintenance or overload. Both point to the vendor, but a 503 is a clearer signal that the service knows it's unavailable.
When should I trust a vendor's status page that shows "All Systems Operational"? Status pages have publication lag. A vendor may be investigating an incident internally before posting publicly — this window can be 5–30 minutes. If you're consistently getting 5xx errors and the status page shows all green, don't take it at face value. Try reproducing from a different environment, check developer forums or social channels, and file a support ticket with your request IDs. The absence of a status page entry is not proof that the vendor is healthy.
What if the error is a network timeout rather than an HTTP status code? Connection timeouts and network resets are harder to attribute. They can come from your network, an intermediate proxy, the vendor's load balancer, or the vendor's backend. Start by testing from a different network or environment. If the timeout is consistent from multiple origins, check the vendor status page for connectivity or networking incidents. Some vendors post these under component names like "API Gateway" or "Edge Network."
How does Statusfield help with this diagnostic process? Statusfield monitors official vendor status pages continuously and delivers alerts when incidents are posted. In practice, this means that when you're looking at a failing API call, you can check your Statusfield dashboard or alert history to immediately see whether there's a known incident — instead of navigating to each vendor's status page manually. Statusfield monitors component-level status, so you can see whether it's a specific endpoint or the whole service.
Should I alert my users when a third-party vendor has an outage? Yes, if the outage affects user-facing functionality. The alternative — letting users hit errors without explanation — is worse for trust than a clear status update. A short in-app banner ("We're aware of an issue with payment processing. We'll update as soon as it's resolved.") is almost always the right call. The faster you know about the incident (via monitoring), the faster you can show this message rather than a cryptic error.
What do I do while waiting for a vendor to resolve an incident? Document the incident start time and the error you're seeing. If you have a fallback (a different payment processor for non-critical transactions, cached data for read endpoints), enable it. Communicate status to your users on a schedule — "we're monitoring and will update in 30 minutes" sets expectations. When the vendor resolves the incident, verify recovery from your end before removing the status message.
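For read endpoints, the cached-data fallback can be sketched in a few lines. The example assumes a hypothetical `fetch` function that calls the vendor and raises on failure. The class name, the in-memory cache, and the broad `except Exception` are simplifications for illustration, and real code would catch only the vendor client's error types.

```python
import time
from typing import Callable, Optional


class CachedFallback:
    """Serve the last good response when the vendor call fails."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # hypothetical function that calls the vendor
        self._cache: dict[str, dict] = {}
        self.incident_started: Optional[float] = None  # first failure seen, epoch seconds

    def get(self, key: str) -> tuple[dict, bool]:
        """Return (data, is_stale). Re-raises if the call fails and nothing is cached."""
        try:
            data = self._fetch(key)
        except Exception:
            if self.incident_started is None:
                self.incident_started = time.time()  # record when failures began
            if key in self._cache:
                return self._cache[key], True
            raise
        self._cache[key] = data
        self.incident_started = None  # a successful call confirms recovery from our side
        return data, False
```

The `is_stale` flag is what lets the UI show the status banner alongside cached data, and `incident_started` gives you the timestamp to put in the incident notes.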