How to Handle Rate Limiting From Third-Party APIs in Production

Rate limits are one of the most common causes of production failures in third-party API integrations. Here's how to detect them early, implement proper backoff, and build systems that degrade gracefully when you hit the ceiling.


Rate limits don't announce themselves. One minute your API integration works fine. The next, requests start failing with 429s or, worse, the vendor silently throttles you, slowing responses without returning an error code. By then, something has already broken for your users.

Rate limiting is a design decision by the vendor: they're protecting their infrastructure from overuse, and your account is responsible for staying within bounds. The question is whether you find out when a limit is hit or before.

How Rate Limiting Works

Most APIs implement rate limits at one or more of these levels:

| Level | What it limits | Example |
| --- | --- | --- |
| Per-minute | Requests per minute per account | 60 req/min |
| Per-day | Total requests in a 24-hour window | 10,000 req/day |
| Per-endpoint | Specific endpoints with tighter limits | 5 req/sec on auth |
| Per-user | Limits applied to end-users via your key | OAuth token limits |
| Concurrent | Simultaneous in-flight requests | 10 concurrent |
| Burst | Short burst allowed above sustained rate | 100 req in first 10s |

Most APIs return a 429 Too Many Requests when you exceed a limit, but some return 503 (service unavailable) or a vendor-specific status code. Some don't return an error at all — they just slow responses until you back off.
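Silent throttling never shows up as a status code, so the only place to see it is in response latency. One way to catch it is to compare each response time against a rolling baseline. The sketch below is a minimal illustration, not a vendor-specific detector: `LatencyTracker`, `isThrottleStatus`, the window size, and the 3x threshold are all assumptions you would tune for your own traffic.

```typescript
// Hypothetical helper: flags responses much slower than the recent median latency.
class LatencyTracker {
  private samples: number[] = [];

  constructor(
    private readonly windowSize = 50, // assumed: number of recent samples kept
    private readonly slowFactor = 3,  // assumed: "slow" means 3x the median
    private readonly minSamples = 10, // don't judge until a baseline exists
  ) {}

  /** Records a latency sample and returns true if it looks throttled. */
  record(latencyMs: number): boolean {
    const baseline = this.median();
    const suspicious =
      baseline !== null &&
      this.samples.length >= this.minSamples &&
      latencyMs > baseline * this.slowFactor;

    this.samples.push(latencyMs);
    if (this.samples.length > this.windowSize) this.samples.shift();
    return suspicious;
  }

  private median(): number | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}

// Status codes that may indicate throttling: 429, plus 503 for vendors that use it that way.
function isThrottleStatus(status: number): boolean {
  return status === 429 || status === 503;
}
```

Slow samples are still added to the window, so if throttling lasts long enough the median itself drifts upward. Treat a flagged response as a prompt to investigate rather than as proof of throttling.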

Reading Rate Limit Headers

Most APIs include response headers that tell you where you stand. Check these on every response:

async function callApiWithRateLimitTracking(url: string, options: RequestInit = {}) {
  const response = await fetch(url, options);
 
  // Log rate limit state from response headers
  const remaining = response.headers.get('x-ratelimit-remaining');
  const limit = response.headers.get('x-ratelimit-limit');
  const reset = response.headers.get('x-ratelimit-reset');
  const retryAfter = response.headers.get('retry-after');
 
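  if (remaining !== null && limit !== null) {
    const limitNum = parseInt(limit, 10);
    const remainingNum = parseInt(remaining, 10);
    // Skip the check if either header is non-numeric or the limit is zero
    if (limitNum > 0 && !Number.isNaN(remainingNum)) {
      const pctUsed = ((limitNum - remainingNum) / limitNum) * 100;
      if (pctUsed > 80) {
        console.warn(`[rate-limit] ${url}: ${pctUsed.toFixed(0)}% of limit used (${remaining}/${limit} remaining)`);
      }
    }
  }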
 
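  if (response.status === 429) {
    // Retry-After is usually a number of seconds, but HTTP also allows a date; a date parses to NaN here
    const retryAfterSeconds = retryAfter !== null ? parseInt(retryAfter, 10) : NaN;
    // Assumes the reset header is a Unix timestamp in seconds (true for GitHub, not for every vendor)
    const resetSeconds = reset !== null ? parseInt(reset, 10) - Math.floor(Date.now() / 1000) : NaN;
    // Prefer Retry-After, then the reset timestamp, then a 60-second default
    const waitSeconds = !Number.isNaN(retryAfterSeconds)
      ? Math.max(0, retryAfterSeconds)
      : !Number.isNaN(resetSeconds) ? Math.max(0, resetSeconds) : 60;
    throw new RateLimitError(`Rate limited. Retry after ${waitSeconds}s`, waitSeconds);
  }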
 
  return response;
}
 
class RateLimitError extends Error {
  constructor(message: string, public retryAfterSeconds: number) {
    super(message);
    this.name = 'RateLimitError';
  }
}

Common rate-limit headers across major APIs:

| API | Remaining | Reset | Limit |
| --- | --- | --- | --- |
| GitHub | x-ratelimit-remaining | x-ratelimit-reset (Unix) | x-ratelimit-limit |
| Stripe | ratelimit-remaining | ratelimit-reset | ratelimit-limit |
| Twilio | x-rate-limit-remaining | - | x-rate-limit-limit |
| Slack | - | retry-after on 429 | x-ratelimit-limit |
| OpenAI | x-ratelimit-remaining-requests | x-ratelimit-reset-requests | x-ratelimit-limit-requests |

Not every API documents these headers. If you don't see them, test by sending a burst of requests and watching when you start getting 429s.
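Because header names and formats differ between vendors, it helps to normalize them in one place instead of hard-coding a single vendor's names throughout your code. The sketch below is a minimal illustration: `HeaderNames`, `readRateLimitState`, and `parseRetryAfter` are hypothetical helpers, and `GITHUB_HEADERS` uses the GitHub names from the table above. The reset value is kept as a raw string because its format varies (a Unix timestamp for GitHub, something else elsewhere). `parseRetryAfter` handles both forms HTTP allows for Retry-After: a number of seconds or an HTTP date.

```typescript
// Anything with a get(name) method works here, including fetch's Headers object.
interface HeaderReader { get(name: string): string | null; }

// Per-vendor header names (hypothetical config shape). Omit what the vendor doesn't send.
interface HeaderNames { remaining?: string; reset?: string; limit?: string; }

interface RateLimitState {
  remaining: number | null;
  limit: number | null;
  resetRaw: string | null; // left unparsed: the format is vendor-specific
}

const GITHUB_HEADERS: HeaderNames = {
  remaining: 'x-ratelimit-remaining',
  reset: 'x-ratelimit-reset',
  limit: 'x-ratelimit-limit',
};

function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === '') return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readRateLimitState(headers: HeaderReader, names: HeaderNames): RateLimitState {
  const read = (name?: string): string | null => (name ? headers.get(name) : null);
  return {
    remaining: toNumber(read(names.remaining)),
    limit: toNumber(read(names.limit)),
    resetRaw: read(names.reset),
  };
}

// Returns seconds to wait, or null if the header is missing or unparseable.
function parseRetryAfter(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null || value.trim() === '') return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10); // delay in seconds
  const dateMs = Date.parse(trimmed);                       // HTTP date form
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}
```

In use, you would call `readRateLimitState(response.headers, GITHUB_HEADERS)` after each request and swap in a different `HeaderNames` object per vendor.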

Implementing Exponential Backoff

Never retry immediately after a rate limit. Immediate retries make the situation worse: you hit the limit again, retry again, and, multiplied across every instance of your service, create a thundering herd against the vendor's throttle.

The standard approach is exponential backoff with jitter:

async function withRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelayMs?: number;
    maxDelayMs?: number;
    jitter?: boolean;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30_000,
    jitter = true,
  } = options;
 
  let attempt = 0;
 
  while (true) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
 
      let delayMs: number;
 
      if (err instanceof RateLimitError) {
        // Treat the vendor's retry-after as a minimum: jitter is added on top, never subtracted
        delayMs = err.retryAfterSeconds * 1000;
        if (jitter) delayMs += Math.random() * baseDelayMs;
      } else if (isRetryable(err)) {
        // Exponential backoff: 1s, 2s, 4s, 8s... capped at maxDelayMs
        delayMs = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
        // Scale to 50-100% of the delay to spread retries across multiple instances
        if (jitter) delayMs *= 0.5 + Math.random() * 0.5;
      } else {
        throw err; // Non-retryable error: don't retry
      }
 
      console.warn(`[retry] Attempt ${attempt + 1}/${maxRetries} failed. Retrying in ${Math.round(delayMs)}ms.`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
      attempt++;
    }
  }
}
 
function isRetryable(err: unknown): boolean {
  if (err instanceof RateLimitError) return true;
  if (err instanceof TypeError && err.message.includes('fetch')) return true; // Network error
  // Add other retryable conditions specific to your vendor
  return false;
}
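To see what the schedule looks like in practice, it can help to pull the delay calculation out into pure functions. This is a small sketch that uses the same defaults as the retry helper above (1s base, 30s cap); `backoffDelayMs` and `withJitter` are illustrative names, and the random source is injectable only so the jitter can be tested deterministically.

```typescript
// Deterministic part of the backoff: base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 1000, maxDelayMs = 30_000): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// Jitter that scales a delay to between 50% and 100% of its value.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return delayMs * (0.5 + random() * 0.5);
}

// Delays for attempts 0 through 5, before jitter.
const schedule = [0, 1, 2, 3, 4, 5].map(attempt => backoffDelayMs(attempt));
// schedule is [1000, 2000, 4000, 8000, 16000, 30000]: the sixth delay hits the 30s cap
```

The cap matters more than it looks: without it, ten consecutive failures would push the wait past 17 minutes.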

Building a Request Queue

For high-volume integrations, individual retry logic isn't enough. You need a queue that smooths your request rate to stay within the vendor's limit.

A simple token bucket implementation:

class RateLimiter {
  private tokens: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond
  private lastRefill: number;
 
  constructor({ requestsPerSecond }: { requestsPerSecond: number }) {
    this.maxTokens = requestsPerSecond;
    this.tokens = requestsPerSecond;
    this.refillRate = requestsPerSecond / 1000;
    this.lastRefill = Date.now();
  }
 
  async acquire(): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Wait until we can refill at least one token
      const waitMs = (1 - this.tokens) / this.refillRate;
      await new Promise(resolve => setTimeout(resolve, Math.ceil(waitMs)));
    }
  }
 
  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}
 
// Usage: limit to 10 requests per second
const limiter = new RateLimiter({ requestsPerSecond: 10 });
 
async function rateLimitedCall(url: string) {
  await limiter.acquire(); // Waits if we're at the limit
  return fetch(url, { headers: { Authorization: `Bearer ${process.env.API_KEY}` } });
}

For production workloads, consider using a battle-tested library like bottleneck (Node.js) rather than a hand-rolled implementation. It handles edge cases around concurrent requests, queued work, and priority levels.
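A token bucket caps how fast you send requests, but not how many are in flight at once. If the vendor also enforces a concurrent limit, you need something like a semaphore alongside it. The sketch below is a minimal hand-written illustration, not bottleneck's API: `ConcurrencyLimiter` and its members are hypothetical names.

```typescript
// Hypothetical semaphore: at most `maxConcurrent` tasks run at the same time.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // No free slot: park until a finishing task hands its slot to us.
      await new Promise<void>(resolve => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next(); // Pass the slot straight to the next waiter; `active` stays the same.
      } else {
        this.active--;
      }
    }
  }

  get inFlight(): number { return this.active; }
  get queued(): number { return this.waiters.length; }
}
```

To respect both limits, acquire a rate-limiter token inside the task you pass to `run`, so a request only counts against the rate once it has a concurrency slot.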

Graceful Degradation When Rate Limited

Being rate limited shouldn't mean your users see an error. Build fallback behavior for when the vendor isn't available:

async function getVendorData(id: string): Promise<VendorData | null> {
  try {
    return await callVendorApi(`/data/${id}`);
  } catch (err) {
    if (err instanceof RateLimitError) {
      // Return cached data if available, or null to indicate unavailable
      const cached = await cache.get(`vendor:${id}`);
      if (cached) {
        console.warn(`[fallback] Rate limited. Returning cached data for ${id}`);
        return cached;
      }
      // Log and surface a graceful degradation to the caller
      console.error(`[fallback] Rate limited and no cache. Feature unavailable for ${id}`);
      return null;
    }
    throw err; // Re-throw non-rate-limit errors
  }
}

Design the calling code to handle null — show a "temporarily unavailable" state rather than a hard error. This is the difference between a degraded user experience and a broken one.
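On the calling side, one way to make the null case hard to forget is to map it into an explicit view state. This is a sketch under assumptions: `VendorData` here is a placeholder shape, and `WidgetView` and `toWidgetView` are hypothetical names for whatever your UI layer consumes.

```typescript
// Placeholder shape for illustration; the real VendorData type is whatever your vendor returns.
interface VendorData { name: string; }

// The UI renders from this union, so "unavailable" is a first-class state, not an exception.
type WidgetView =
  | { kind: 'ready'; data: VendorData }
  | { kind: 'unavailable'; message: string };

function toWidgetView(data: VendorData | null): WidgetView {
  if (data === null) {
    return {
      kind: 'unavailable',
      message: 'This information is temporarily unavailable. Please try again shortly.',
    };
  }
  return { kind: 'ready', data };
}
```

Because the union forces a check on `kind`, the compiler flags any rendering code that assumes data is always present.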

Monitoring Rate Limit Health

You shouldn't find out you hit a rate limit when users start reporting errors. Set up proactive monitoring:

  1. Alert at 80% of limit — log a warning when x-ratelimit-remaining / x-ratelimit-limit < 0.2. This gives you time to react before hitting the ceiling.

  2. Track 429 rate — if your 429 error rate rises above 0 in a 5-minute window, something changed. Either you're sending more requests or the vendor lowered your limit.

  3. Monitor vendor status pages — rate limit changes are sometimes announced as incidents ("degraded performance" or "API rate limiting in effect"). Getting this signal immediately means you know what's happening before your error logs tell you.

// Simple 429 rate tracking
class RateLimitMonitor {
  private hitCount = 0;
  private windowStart = Date.now();
 
  record429(): void {
    this.hitCount++;
    const elapsed = (Date.now() - this.windowStart) / 1000;
 
    if (elapsed > 300) { // Reset every 5 minutes
      this.hitCount = 1;
      this.windowStart = Date.now();
    }
 
    if (this.hitCount > 5) {
      console.error(`[alert] ${this.hitCount} rate limit hits in last ${Math.round(elapsed)}s`);
      // Hook into your alerting system here
    }
  }
}
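The monitor above counts hits in a fixed window. If you want an actual 429 rate (throttled requests as a fraction of all requests), a sliding window over every response is one option. This is a sketch, not a replacement for a metrics system: `ThrottleRateWindow` is a hypothetical name, and the clock is passed in as a parameter only to keep the class testable.

```typescript
// Hypothetical sliding-window tracker: what fraction of recent requests returned 429?
class ThrottleRateWindow {
  private events: Array<{ at: number; throttled: boolean }> = [];

  constructor(private readonly windowMs = 5 * 60 * 1000) {}

  /** Call once per response, with its HTTP status. */
  record(status: number, nowMs: number = Date.now()): void {
    this.events.push({ at: nowMs, throttled: status === 429 });
    this.prune(nowMs);
  }

  /** Fraction of requests in the window that were throttled (0 when there was no traffic). */
  rate(nowMs: number = Date.now()): number {
    this.prune(nowMs);
    if (this.events.length === 0) return 0;
    const throttled = this.events.filter(e => e.throttled).length;
    return throttled / this.events.length;
  }

  // Drop events older than the window. Events arrive in time order, so trim from the front.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    while (this.events.length > 0 && this.events[0].at < cutoff) this.events.shift();
  }
}
```

This keeps one entry per request, which is fine at modest volume. At high volume, bucket the counts per second or hand the numbers to your metrics system instead.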

Where Statusfield Fits

Rate limiting behavior sometimes changes due to vendor incidents — they may throttle more aggressively during degraded states, or raise limits temporarily to clear a backlog. The signal that explains why your rate limit behavior changed is often on their status page.

Statusfield monitors official vendor status pages and delivers alerts when incidents are posted — including degradation events that affect rate limiting behavior. When a vendor tightens their throttles due to infrastructure pressure, Statusfield picks up the incident notice before your 429 rate climbs.

Statusfield's free plan monitors up to 3 services. The Pro plan ($29/month) covers up to 20 services with unlimited email and Slack notifications.

FAQ

What's the difference between a rate limit and a quota?

Rate limits restrict how many requests you can make within a short time window, such as per second or per minute. Quotas restrict total usage over a longer period, such as per day or per month. A rate limit prevents bursts; a quota prevents sustained high volume. You can be within your quota but still hit a rate limit if you send requests too fast.

Should I use a single API key or separate keys for different parts of my application?

If the vendor supports multiple API keys, use separate keys per service or per logical function. This isolates rate limit consumption: a batch job that exhausts the limit doesn't take down your real-time API. It also makes it easier to track which part of your application is consuming the most requests. Check first whether the vendor counts limits per key or per account, because separate keys only isolate consumption in the per-key case.

What happens if I keep retrying after a 429?

Repeated retries after a 429 without backing off can escalate the problem. Some vendors will temporarily ban your account or lower your limit further if they see persistent hammering. Always respect the Retry-After header if present. If the vendor doesn't provide one, wait at least 60 seconds before retrying.

How do I handle rate limits in a background job that processes thousands of records?

Implement a request queue with a controlled rate (token bucket or leaky bucket pattern), process records in batches, and add delays between batches. Calculate your throughput: if you can make 100 requests per minute and need to process 10,000 records at one request per record, you need at least 100 minutes of processing time. Design the job to be resumable: save progress so it can restart where it left off if interrupted.

Can I request a higher rate limit from the vendor?

Yes, most vendors offer higher limits on paid plans or by request. If you consistently operate near your limit, contact vendor support and explain your use case. Many vendors will grant temporary or permanent increases for legitimate production workloads. Budget for the time this takes: approval can take a week or more.
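The throughput arithmetic and the resumable design from the background-job answer above can be sketched in a few lines. Everything here is illustrative: `minProcessingMinutes`, `Checkpoint`, and `processResumable` are hypothetical names, the checkpoint is synchronous for brevity, and the `handle` callback is where your rate-limited API call would go.

```typescript
// Lower bound on job duration, assuming each record costs `requestsPerRecord` API calls.
function minProcessingMinutes(records: number, requestsPerMinute: number, requestsPerRecord = 1): number {
  return Math.ceil((records * requestsPerRecord) / requestsPerMinute);
}

// Hypothetical checkpoint store; in production this would be a database row or a file.
interface Checkpoint {
  load(): number;                // index of the next unprocessed item
  save(nextIndex: number): void;
}

// Processes items in order, persisting progress after each success so a restart resumes mid-list.
async function processResumable<T>(
  items: T[],
  checkpoint: Checkpoint,
  handle: (item: T) => Promise<void>,
): Promise<void> {
  for (let i = checkpoint.load(); i < items.length; i++) {
    await handle(items[i]); // if this throws, the checkpoint still points at item i
    checkpoint.save(i + 1);
  }
}

const minutes = minProcessingMinutes(10_000, 100);
// minutes is 100, matching the example in the answer
```

Saving the checkpoint only after `handle` succeeds means a crash can reprocess one item but never skip one, so the handler should be safe to run twice for the same record.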
