How to Debug Slow API Responses: Yours vs. Theirs

Slow API responses are harder to diagnose than failures. The service appears to be up, but something is wrong. Here's a systematic approach to finding the bottleneck, whether it's in your code, your infrastructure, or theirs.


A slow API is worse than a down API. When a service is down, the failure is obvious: you get a 503, your monitoring fires, and you know where to look. When a service is slow, you keep investigating. The page loads in 12 seconds instead of 1. Is it your database? Your network? Their CDN? A single slow vendor endpoint pulling the whole request down?

This guide is about resolving that question fast.

Why Latency Is Harder to Diagnose Than Failures

Hard failures have binary signals: the request fails, the error code tells you something, the stack trace points somewhere. Latency is analog. A vendor that normally responds in 80ms is now responding in 2 seconds. That could mean:

  • Their service is degraded but not down (partial outage, specific region, single component)
  • Their infrastructure is healthy but you're hitting a rate limit or throttle
  • The problem is somewhere between you and them — a proxy, a DNS resolver, a CDN node
  • Your code is making more requests than it did before (N+1, missing cache, new code path)
  • Your infrastructure changed (new deployment, different region, cloud provider routing update)

None of these shows up as an error in your code, and each one requires a different fix.

Step 1: Isolate Whether the Slowness Is Endpoint-Specific

Before anything else, narrow the scope. One slow endpoint is different from all requests being slow.

Test the same endpoint from:

  • Your production environment
  • Your local machine via curl
  • A separate region or cloud provider (use a VPN or a Lambda function in a different region)
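# Time a specific vendor API call, broken down by phase
# (each value is cumulative, in seconds, from the start of the request)
curl -s -o /dev/null \
  -w "dns: %{time_namelookup}  connect: %{time_connect}  tls: %{time_appconnect}  first byte: %{time_starttransfer}  total: %{time_total}\n" \
  -H "Authorization: Bearer $API_KEY" \
  "https://api.vendor.com/v1/endpoint"

# To test from a different location, run the same command on a host in
# another region (for example, a small VM or a Lambda in a different AWS region)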

If the request is fast from your local machine but slow from production, the problem is in your infrastructure or routing. If it's slow from everywhere, the problem is most likely on the vendor's side, or in a CDN or network layer in front of them.
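A single curl is one sample, and the first request from any host also pays for DNS, TCP, and TLS setup. Below is a minimal sketch for taking several sequential samples per environment, assuming Node 18+ (global `fetch`). `sampleLatency` and `LatencySample` are hypothetical names, not a library API, and the URL is the same placeholder used above. Run it from each environment in the list and compare medians rather than single readings.

```typescript
// Summary of repeated timings taken from one environment.
interface LatencySample {
  count: number;
  minMs: number;
  medianMs: number;
  maxMs: number;
}

// Runs `request` sequentially `n` times and summarizes the wall-clock durations.
// `now` is injectable so the function can be tested without real network calls.
async function sampleLatency(
  request: () => Promise<unknown>,
  n = 10,
  now: () => number = () => Date.now(),
): Promise<LatencySample> {
  if (!Number.isInteger(n) || n < 1) throw new RangeError('n must be a positive integer');
  const durations: number[] = [];
  for (let i = 0; i < n; i++) {
    const start = now();
    await request();
    durations.push(now() - start);
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  const medianMs =
    durations.length % 2 === 0 ? (durations[mid - 1] + durations[mid]) / 2 : durations[mid];
  return {
    count: durations.length,
    minMs: durations[0],
    medianMs,
    maxMs: durations[durations.length - 1],
  };
}

// Example usage (placeholder URL and environment variable):
// const stats = await sampleLatency(() =>
//   fetch('https://api.vendor.com/v1/endpoint', {
//     headers: { Authorization: `Bearer ${process.env.VENDOR_API_KEY}` },
//   }).then((r) => r.arrayBuffer()), // read the body so timing includes the download
// );
// console.log(JSON.stringify(stats));
```

The median is the number to compare across environments; the maximum mostly reflects the first, connection-establishing request.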

Step 2: Separate Your P50 From Your P99

Average response times hide the real problem. If a vendor serves 90% of requests in 80ms and 10% in 8 seconds, the mean is about 870ms, a number no request actually experiences: most of your users get a fast response, and one in ten waits 8 seconds.

Instrument your vendor calls to log per-request latency, so you can compute percentiles from the logs:

// TypeScript — log percentile-friendly timing data
async function callVendorApi(endpoint: string): Promise<Response> {
  const start = performance.now();
  try {
    const response = await fetch(`https://api.vendor.com${endpoint}`, {
      headers: { Authorization: `Bearer ${process.env.VENDOR_API_KEY}` },
      signal: AbortSignal.timeout(10_000), // hard timeout
    });
    const duration = performance.now() - start;
    console.log(JSON.stringify({
      type: 'vendor_latency',
      endpoint,
      durationMs: Math.round(duration),
      status: response.status,
      ts: new Date().toISOString(),
    }));
    return response;
  } catch (err) {
    const duration = performance.now() - start;
    console.error(JSON.stringify({
      type: 'vendor_error',
      endpoint,
      durationMs: Math.round(duration),
      error: err instanceof Error ? err.message : String(err),
      ts: new Date().toISOString(),
    }));
    throw err;
  }
}

With structured logs, you can query: what's the p95 latency for this vendor over the last hour? If p50 is normal but p95 spiked, it's affecting a subset of requests — often a specific server-side condition on their end.
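Most log platforms can compute percentiles over a numeric field directly. If yours can't, or you want to sanity-check a log export by hand, a minimal nearest-rank calculation looks like this. It assumes newline-delimited JSON in the `vendor_latency` format logged above; `parseVendorLatency` and `percentile` are illustrative helpers, not a library API.

```typescript
// Shape of the success record emitted by callVendorApi above.
interface VendorLatencyLog {
  type: string;
  endpoint: string;
  durationMs: number;
  status: number;
  ts: string;
}

// Extracts durations for one endpoint from newline-delimited JSON logs,
// skipping blank lines, non-JSON lines, and non-latency records.
function parseVendorLatency(ndjson: string, endpoint: string): number[] {
  const durations: number[] = [];
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue;
    try {
      const rec = JSON.parse(line) as VendorLatencyLog;
      if (rec.type === 'vendor_latency' && rec.endpoint === endpoint) {
        durations.push(rec.durationMs);
      }
    } catch {
      // Not a JSON object (e.g. an unrelated log line); ignore it.
    }
  }
  return durations;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Expects p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}
```

With the 90/10 split from the start of this step (90 requests at 80ms, 10 at 8,000ms), `percentile(samples, 50)` and `percentile(samples, 90)` both return 80 while `percentile(samples, 95)` returns 8000. One design caveat: this sketch counts only successful `vendor_latency` records. Timed-out calls are logged as `vendor_error` and are usually your slowest requests, so leaving them out makes p99 look better than what users saw; count them separately or include them at the timeout value.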

Step 3: Check the Vendor's Status Page for Degradation

Most status pages distinguish between "down" and "degraded." A service can be processing requests but responding slowly — latency incident, partial outage, database degradation, CDN issue in a specific region.

When you see unexplained latency, check the vendor's status page immediately. Look for:

  • Degraded performance incidents (not just outages)
  • Specific component issues (their API might be healthy but their CDN or auth layer is slow)
  • Regional impact (if your production runs in us-east-1 and they're reporting EU issues, you may be fine — or you might share a backend dependency)
  • Historical pattern — did this happen at the same time last week or month? Some vendors have predictable maintenance windows.

Status pages lag behind the actual incident. A vendor might be internally investigating for 10–20 minutes before posting. If your latency spiked 15 minutes ago and their status page is green, don't stop checking — the incident notice may be in draft.
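Many vendors host their status page on Atlassian Statuspage, which exposes a public JSON endpoint at `/api/v2/components.json` on the status page's own domain. Assuming your vendor is one of them (try the URL against their status page), a small check can list every component that isn't fully operational, which catches a degraded component that a green headline banner might hide. The types below cover only a subset of the response, and `nonOperationalComponents`, `checkVendorComponents`, and the status URL are hypothetical names; verify the shape against your vendor's page.

```typescript
// Component status values used by Atlassian Statuspage.
type ComponentStatus =
  | 'operational'
  | 'degraded_performance'
  | 'partial_outage'
  | 'major_outage'
  | 'under_maintenance';

// Subset of one entry in the components.json response.
interface StatuspageComponent {
  name: string;
  status: ComponentStatus;
  group?: boolean; // true for group headers that contain other components
}

interface ComponentsResponse {
  components: StatuspageComponent[];
}

// Returns "name: status" for every non-group component that is not operational.
function nonOperationalComponents(body: ComponentsResponse): string[] {
  return body.components
    .filter((c) => !c.group && c.status !== 'operational')
    .map((c) => `${c.name}: ${c.status}`);
}

// Fetches the component list from a Statuspage-hosted status page.
// statusPageBase is a placeholder, e.g. 'https://status.vendor.com'.
async function checkVendorComponents(statusPageBase: string): Promise<string[]> {
  const res = await fetch(`${statusPageBase}/api/v2/components.json`, {
    signal: AbortSignal.timeout(5_000), // don't let the status check itself hang
  });
  if (!res.ok) throw new Error(`status page returned HTTP ${res.status}`);
  return nonOperationalComponents((await res.json()) as ComponentsResponse);
}
```

Skipping group entries avoids double-reporting, since a group's status aggregates the components beneath it. A check like this is still bounded by the lag described above: it only sees what the vendor has already posted.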

Step 4: Check for Rate Limiting

Slow responses are sometimes throttled responses. When a vendor starts rate limiting your account, they may not return a 429 immediately — some rate-limit implementations slow down responses before rejecting them entirely.

Check the response headers from the vendor:

curl -s -o /dev/null -D - -H "Authorization: Bearer $API_KEY" \
  "https://api.vendor.com/v1/endpoint" | grep -i "ratelimit\|rate-limit\|retry-after"

Common rate-limit headers:

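| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Total requests allowed in the window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | When the window resets (often a Unix timestamp; some vendors send seconds remaining instead) |
| Retry-After | How long to wait before retrying, in seconds (HTTP also allows an HTTP date here) |
| X-RateLimit-Used | Requests already used in the current window (GitHub, for example) |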

If X-RateLimit-Remaining is near zero, you're about to be throttled; check whether your latency spike started as the remaining budget ran low. If you're not getting these headers, check the vendor documentation: some vendors rate-limit at the account level and don't surface headers on every request.
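When the headers are present, logging them next to each call's latency lets you see whether slow responses line up with a shrinking budget. A minimal sketch, assuming Node 18+ (where `Headers` is global) and the `X-RateLimit-*` names from the table above. Vendors differ on whether `X-RateLimit-Reset` is an epoch timestamp or a count of seconds, so this sketch assumes epoch seconds; `Retry-After` can be either delay-seconds or an HTTP date, and both forms are handled. `readRateLimit` and `nearLimit` are illustrative names.

```typescript
// Rate-limit state read from one response. Fields are undefined when absent.
interface RateLimitInfo {
  limit?: number;
  remaining?: number;
  resetAt?: Date; // ASSUMPTION: X-RateLimit-Reset is Unix epoch seconds
  retryAfterMs?: number;
}

// Parses a numeric header, returning undefined if it is missing or malformed.
function numHeader(headers: Headers, name: string): number | undefined {
  const raw = headers.get(name); // Headers.get is case-insensitive
  if (raw === null || raw.trim() === '') return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// Retry-After is either delay-seconds ("120") or an HTTP date.
function parseRetryAfter(raw: string | null, nowMs: number): number | undefined {
  if (raw === null) return undefined;
  const value = raw.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

function readRateLimit(headers: Headers, nowMs: number = Date.now()): RateLimitInfo {
  const reset = numHeader(headers, 'x-ratelimit-reset');
  return {
    limit: numHeader(headers, 'x-ratelimit-limit'),
    remaining: numHeader(headers, 'x-ratelimit-remaining'),
    resetAt: reset === undefined ? undefined : new Date(reset * 1000),
    retryAfterMs: parseRetryAfter(headers.get('retry-after'), nowMs),
  };
}

// True when less than `fraction` of the window's budget remains.
function nearLimit(info: RateLimitInfo, fraction = 0.1): boolean {
  if (info.limit === undefined || info.remaining === undefined || info.limit <= 0) {
    return false;
  }
  return info.remaining / info.limit < fraction;
}
```

In a wrapper like `callVendorApi`, you could call `readRateLimit(response.headers)` and add the result to the same structured log line as `durationMs`.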

Step 5: Profile Your Own Request Path

If the vendor looks healthy and you're not hitting rate limits, the slowness might be in your request construction — not the vendor's response.

Things to check on your end:

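// Where is time actually being spent?
// getAuthToken and buildRequestPayload stand in for your own helpers.
declare function getAuthToken(): Promise<string>;
declare function buildRequestPayload(): unknown;

async function timedVendorCall() {
  const t0 = performance.now();

  const authToken = await getAuthToken(); // Is this slow? Is it cached?
  const t1 = performance.now();

  // Serialize here so buildMs includes JSON.stringify, not just object construction
  const body = JSON.stringify(buildRequestPayload()); // Any expensive serialization?
  const t2 = performance.now();

  // fetch() resolves when response headers arrive, so fetchMs includes
  // DNS, TCP, and TLS setup (on a new connection), the upload, and the
  // vendor's processing time up to the first byte of the response.
  const response = await fetch('https://api.vendor.com/v1/data', {
    method: 'POST',
    body,
    headers: {
      Authorization: `Bearer ${authToken}`,
      'Content-Type': 'application/json',
    },
  });
  const t3 = performance.now();

  // Reading the body covers download time plus JSON parsing
  const data = await response.json(); // Large response body?
  const t4 = performance.now();

  console.log({
    authMs: t1 - t0,
    buildMs: t2 - t1,
    fetchMs: t3 - t2,
    bodyMs: t4 - t3,
    totalMs: t4 - t0,
  });

  return data;
}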

Common culprits in your own code:

  • Auth token generation on every request instead of caching
  • Creating a new HTTP client per request, or otherwise skipping connection pooling, which forces a new TCP and TLS handshake on every call
  • Large payloads that take time to serialize/deserialize
  • Synchronous, blocking work (CPU-heavy computation or sync I/O) that delays the request before it is sent
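The first culprit, fetching an auth token on every request, is usually the cheapest to fix. Below is a minimal token-cache sketch, assuming your token endpoint reports a lifetime (OAuth 2.0 token responses do this with `expires_in`). `TokenCache`, `IssuedToken`, and the `fetchToken` callback are hypothetical names. The cache refreshes a little before expiry and shares one in-flight refresh between concurrent callers.

```typescript
// What a token endpoint returns, reduced to the fields the cache needs.
interface IssuedToken {
  accessToken: string;
  expiresInSec: number; // lifetime reported by the token endpoint
}

class TokenCache {
  private token?: string;
  private expiresAt = 0; // epoch ms
  private inFlight?: Promise<string>;
  private readonly fetchToken: () => Promise<IssuedToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<IssuedToken>, // hypothetical: calls your token endpoint
    refreshMarginMs = 30_000, // refresh this long before the token expires
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  async get(): Promise<string> {
    // Serve the cached token while it is comfortably inside its lifetime.
    if (this.token !== undefined && this.now() < this.expiresAt - this.refreshMarginMs) {
      return this.token;
    }
    // Share one refresh between concurrent callers instead of starting several.
    if (this.inFlight === undefined) {
      this.inFlight = this.fetchToken()
        .then(({ accessToken, expiresInSec }) => {
          this.token = accessToken;
          this.expiresAt = this.now() + expiresInSec * 1000;
          return accessToken;
        })
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }
}
```

Sharing the in-flight promise is the important detail: without it, a cold start under load still fires one token request per concurrent caller. In the profiling example above, `await tokens.get()` on a long-lived `TokenCache` instance would take the place of `getAuthToken()`.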

The Latency Diagnostic Decision Tree

When you see unexplained slow responses from a vendor:

  1. Is it all endpoints or one specific endpoint?

    • One endpoint → likely their side (or your payload for that endpoint)
    • All endpoints → could be network, auth layer, or account-level issue
  2. Is it slow from your infrastructure but fast from your local machine?

    • Yes → check your network path, CDN config, or regional routing
    • No → it's likely the vendor
  3. Is the vendor's status page showing degraded performance?

    • Yes → wait for resolution, enable fallback if you have one
    • No → check rate limit headers, file a support ticket with timestamps
  4. Are you near your rate limit?

    • Yes → implement backoff, reduce request frequency, or request a limit increase
    • No → check your own request construction and connection management
  5. Is the latency percentile distribution skewed (good p50, bad p99)?

    • Yes → likely a specific failure mode on their end affecting a subset of requests
    • No → more consistent issue — look at connection overhead and payload size
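For the "near your rate limit" branch of step 4, a common first fix is retrying with exponential backoff. Below is a minimal sketch using "full jitter" (a random delay between zero and an exponentially growing cap) that defers to a server-provided `Retry-After` when it asks for a longer wait. `retryWithBackoff`, `RetryableError`, and the default limits are illustrative choices, not any vendor's recommendation. The caller decides which responses are retryable (for example 429 or 503) by throwing `RetryableError`.

```typescript
// Thrown by the caller's request function for responses worth retrying.
class RetryableError extends Error {
  readonly retryAfterMs?: number;
  constructor(message: string, retryAfterMs?: number) {
    super(message);
    this.name = 'RetryableError';
    this.retryAfterMs = retryAfterMs;
  }
}

interface BackoffOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number;
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number; // injectable for tests
}

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions = {}): Promise<T> {
  const {
    maxAttempts = 5,
    baseDelayMs = 200,
    maxDelayMs = 10_000,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    random = Math.random,
  } = opts;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once attempts are exhausted.
      if (!(err instanceof RetryableError) || attempt >= maxAttempts) throw err;
      // Full jitter: a random delay in [0, min(cap, base * 2^(attempt - 1))).
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      const jittered = Math.floor(random() * ceiling);
      // Honor Retry-After when the server asks for a longer wait.
      await sleep(Math.max(jittered, err.retryAfterMs ?? 0));
    }
  }
}
```

Jitter spreads retries from many clients across the window instead of having them all retry at the same instant, which would otherwise keep a recovering vendor pinned at its limit.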

Where Statusfield Fits

A vendor can be responding slowly for 20 minutes before they post a status update. During that window, your team is debugging — checking logs, profiling code, reviewing recent deploys — when the real answer is already known on the vendor's side.

Statusfield monitors official vendor status pages and surfaces degradation incidents, not just outages. When a vendor posts a "degraded performance" notice, Statusfield alerts your team as soon as it appears, so no one has to keep refreshing status pages while digging through error logs.

For teams with production dependencies on third-party APIs, this shortens the time between noticing a slowdown and routing around it.

Statusfield's free plan monitors up to 3 services. The Pro plan ($29/month) covers up to 20 services with unlimited email and Slack notifications.

FAQ

How do I tell if a slow response is a network issue between me and the vendor? Run the same request from multiple origins — your production server, your local machine, and a cloud function in a different region. If latency is high from your server but low from your local machine, the issue is in your network path or infrastructure. If it's high from everywhere, the problem is on the vendor's side or a shared CDN layer.

What is a reasonable timeout to set for third-party API calls? It depends on the vendor's documented SLO. For most REST APIs, set a hard timeout at 2–3× the vendor's stated p99 latency. If the vendor documents a p99 of 200ms, a 600ms timeout catches genuine problems while allowing headroom. Never rely on your HTTP client's default timeout: many clients wait minutes by default or have no overall timeout at all, which is far too long for user-facing requests.

What's the difference between a degraded performance incident and an outage? An outage means the service is returning errors or is completely unreachable. A degraded performance incident means the service is responding but more slowly than normal — often affecting only some users, some regions, or some request types. Degraded incidents are more common than full outages and harder to detect without monitoring, because your error rate may not change, only your latency distribution.

Should I implement a circuit breaker for vendor APIs? Yes, for production systems that depend on third-party APIs for user-facing functionality. A circuit breaker tracks the error rate and latency for a vendor call and opens (stops sending requests) when it exceeds a threshold — which prevents your system from queuing up requests to a degraded vendor. Libraries like cockatiel (Node.js) or resilience4j (JVM) implement this pattern. The circuit should close again after a probe request succeeds.
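To make the pattern concrete, here is a minimal consecutive-failure breaker that also counts calls slower than a latency budget as failures. It is a simplified illustration, not the cockatiel or resilience4j API; those libraries support sliding-window error rates and far more configuration. `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are hypothetical.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Thrown instead of calling the vendor while the circuit is open.
class CircuitOpenError extends Error {
  constructor() {
    super('circuit open: skipping vendor call');
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly slowCallMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 5, // consecutive failures before opening
    cooldownMs = 30_000, // how long to stay open before sending a probe
    slowCallMs = 2_000, // successful calls slower than this count as failures
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.slowCallMs = slowCallMs;
    this.now = now;
  }

  get currentState(): CircuitState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) throw new CircuitOpenError();
      this.state = 'half-open'; // cooldown over: this call becomes the probe
    } else if (this.state === 'half-open') {
      throw new CircuitOpenError(); // a probe is already in flight
    }

    const start = this.now();
    try {
      const result = await fn();
      if (this.now() - start > this.slowCallMs) this.recordFailure();
      else this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  private recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = 'closed'; // a successful probe closes the circuit
  }

  private recordFailure(): void {
    this.consecutiveFailures++;
    // A failed probe, or too many consecutive failures, opens the circuit.
    if (this.state === 'half-open' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

While the circuit is open, callers get a `CircuitOpenError` immediately, which is the natural point to serve cached data or a fallback instead of waiting on the vendor.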

How do I tell whether fast responses come from a warm cache or from the vendor itself? Run requests that bypass your cache (a cache miss or a cache-busted request) and compare their latency with direct calls to the vendor. If cache misses are consistently slow even when direct vendor calls are fast, the problem is in your code path. If cache misses are fast when the vendor is healthy and slow when the vendor reports degradation, that correlation points to the vendor as the bottleneck.

What should I include in a vendor latency support ticket? Include: the exact endpoint URL, the time range in UTC when you observed the slowness, representative request and response logs (with timing and status codes), your account identifier, and whether the slowness was isolated to specific regions or present across all your environments. Vendors investigate latency tickets faster when you give them a precise time range and a request ID or trace ID from your logging system.
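If your vendor calls are logged in the Step 2 format, assembling the evidence for a ticket is mostly filtering. A minimal sketch, assuming newline-delimited JSON with the `vendor_latency` and `vendor_error` records written by `callVendorApi`; `slowCallsForTicket` and its threshold are illustrative names. The timestamps are already in UTC, because `toISOString()` always emits UTC.

```typescript
// Records written by callVendorApi in Step 2.
interface VendorCallLog {
  type: 'vendor_latency' | 'vendor_error';
  endpoint: string;
  durationMs: number;
  status?: number; // present on vendor_latency records
  error?: string; // present on vendor_error records
  ts: string; // ISO 8601 in UTC (toISOString output)
}

// Parses one log line, returning undefined for blank, non-JSON, or unrelated lines.
function parseVendorLine(line: string): VendorCallLog | undefined {
  try {
    const rec = JSON.parse(line);
    return rec && (rec.type === 'vendor_latency' || rec.type === 'vendor_error') ? rec : undefined;
  } catch {
    return undefined;
  }
}

// Failed calls, plus successful calls slower than thresholdMs, within
// [fromIso, toIso], oldest first, formatted as lines to paste into a ticket.
function slowCallsForTicket(ndjson: string, fromIso: string, toIso: string, thresholdMs: number): string[] {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  const rows: VendorCallLog[] = [];
  for (const line of ndjson.split('\n')) {
    const rec = parseVendorLine(line);
    if (!rec) continue;
    const t = Date.parse(rec.ts);
    if (Number.isNaN(t) || t < from || t > to) continue;
    if (rec.type === 'vendor_error' || rec.durationMs > thresholdMs) rows.push(rec);
  }
  rows.sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));
  return rows.map((r) => {
    const outcome = r.type === 'vendor_error' ? `error: ${r.error}` : `HTTP ${r.status}`;
    return `${r.ts} ${r.endpoint} ${r.durationMs}ms ${outcome}`;
  });
}
```

If the vendor returns a request ID in its responses, adding it to the same log record makes these lines much easier for their support team to trace; check the vendor's documentation for where it appears.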
