Rate limits don't announce themselves. One minute your API integration works fine. The next, requests start failing with 429s — or worse, silent throttling that slows responses without returning an error code. By then, something has already broken for your users.
Rate limiting is a design decision by the vendor: they're protecting their infrastructure from overuse, and your account is responsible for staying within bounds. The question is whether you find out when a limit is hit or before.
How Rate Limiting Works
Most APIs implement rate limits at one or more of these levels:
| Level | What it limits | Example |
|---|---|---|
| Per-minute | Requests per minute per account | 60 req/min |
| Per-day | Total requests in a 24-hour window | 10,000 req/day |
| Per-endpoint | Specific endpoints with tighter limits | 5 req/sec on auth |
| Per-user | Limits applied to end-users via your key | OAuth token limits |
| Concurrent | Simultaneous in-flight requests | 10 concurrent |
| Burst | Short burst allowed above sustained rate | 100 req in first 10s |
Most APIs return a 429 Too Many Requests when you exceed a limit, but some return 503 (service unavailable) or a vendor-specific status code. Some don't return an error at all — they just slow responses until you back off.
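The three outcomes described above (an explicit 429, a 503 or vendor-specific code, and a silent slowdown) can be told apart with a small classifier. This is a minimal sketch rather than a vendor rule: the name `classifyResponse`, the 3x latency multiplier, and the baseline latency you would have to measure yourself are all assumptions made for illustration.

```typescript
type ThrottleSignal = 'ok' | 'rate_limited' | 'possibly_throttled';

interface ResponseSample {
  status: number;         // HTTP status code
  latencyMs: number;      // measured round-trip time for this request
  hasRetryAfter: boolean; // whether a Retry-After header was present
}

// Hypothetical helper. baselineMs is the typical latency you measured for this endpoint.
function classifyResponse(
  sample: ResponseSample,
  baselineMs: number,
  slowFactor: number = 3
): ThrottleSignal {
  // Explicit rate limit
  if (sample.status === 429) return 'rate_limited';
  // Assumption: a 503 that carries Retry-After is treated as a throttle signal
  if (sample.status === 503 && sample.hasRetryAfter) return 'rate_limited';
  // No error, but far slower than usual: may be silent throttling, or just a slow vendor
  if (sample.status < 400 && sample.latencyMs > baselineMs * slowFactor) {
    return 'possibly_throttled';
  }
  return 'ok';
}

console.log(classifyResponse({ status: 200, latencyMs: 1200, hasRetryAfter: false }, 200));
```

A `possibly_throttled` result is only a hint. Treat it as a reason to slow down and look closer, not as proof that a limit was hit.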
Reading Rate Limit Headers
Most APIs include response headers that tell you where you stand. Check these on every response:
async function callApiWithRateLimitTracking(url: string, options: RequestInit = {}) {
  const response = await fetch(url, options);

  // Header names vary by vendor; these are the GitHub-style names.
  // A missing or non-numeric header parses to NaN and is ignored below.
  const remaining = Number.parseInt(response.headers.get('x-ratelimit-remaining') ?? '', 10);
  const limit = Number.parseInt(response.headers.get('x-ratelimit-limit') ?? '', 10);
  const reset = Number.parseInt(response.headers.get('x-ratelimit-reset') ?? '', 10);
  const retryAfter = Number.parseInt(response.headers.get('retry-after') ?? '', 10);

  // Warn once more than 80% of the window has been used
  if (Number.isFinite(remaining) && Number.isFinite(limit) && limit > 0) {
    const pctUsed = ((limit - remaining) / limit) * 100;
    if (pctUsed > 80) {
      console.warn(`[rate-limit] ${url}: ${pctUsed.toFixed(0)}% of limit used (${remaining}/${limit} remaining)`);
    }
  }

  if (response.status === 429) {
    // Prefer Retry-After in seconds. Otherwise treat the reset header as a
    // Unix timestamp in seconds. If neither is usable, wait 60 seconds.
    let waitSeconds = 60;
    if (Number.isFinite(retryAfter)) {
      waitSeconds = Math.max(0, retryAfter);
    } else if (Number.isFinite(reset)) {
      waitSeconds = Math.max(0, reset - Math.floor(Date.now() / 1000));
    }
    throw new RateLimitError(`Rate limited. Retry after ${waitSeconds}s`, waitSeconds);
  }
  return response;
}

// Thrown on HTTP 429 so callers can see how long the vendor asked us to wait
class RateLimitError extends Error {
  constructor(message: string, public retryAfterSeconds: number) {
    super(message);
    this.name = 'RateLimitError';
  }
}

Common rate-limit headers across major APIs:
| API | Remaining | Reset | Limit |
|---|---|---|---|
| GitHub | x-ratelimit-remaining | x-ratelimit-reset (Unix) | x-ratelimit-limit |
| Stripe | ratelimit-remaining | ratelimit-reset | ratelimit-limit |
| Twilio | x-rate-limit-remaining | — | x-rate-limit-limit |
| Slack | — | retry-after (on 429) | x-ratelimit-limit |
| OpenAI | x-ratelimit-remaining-requests | x-ratelimit-reset-requests | x-ratelimit-limit-requests |
Not every API documents these headers. If you don't see them, you can find the limit empirically: send a controlled burst of requests, ideally against a sandbox or test key, and note when 429s begin.
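Because the header names differ by vendor, it can help to normalize them in one place. The sketch below tries each name from the table in turn. The names `HeaderReader`, `RateLimitState`, and `readRateLimitState` are hypothetical, and the reset value is deliberately left as a raw string because its format is not the same everywhere (the table lists GitHub's as a Unix timestamp).

```typescript
// Minimal shape of a headers object. The Fetch API's Headers class satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitState {
  remaining: number | null;
  limit: number | null;
  reset: string | null; // kept raw: the format differs by vendor
}

// Candidate names copied from the table above. Extend these for your vendor.
const REMAINING_NAMES = [
  'x-ratelimit-remaining',
  'ratelimit-remaining',
  'x-rate-limit-remaining',
  'x-ratelimit-remaining-requests',
];
const LIMIT_NAMES = [
  'x-ratelimit-limit',
  'ratelimit-limit',
  'x-rate-limit-limit',
  'x-ratelimit-limit-requests',
];
const RESET_NAMES = ['x-ratelimit-reset', 'ratelimit-reset', 'x-ratelimit-reset-requests'];

// Returns the first header in `names` that is present, or null.
function firstHeader(headers: HeaderReader, names: string[]): string | null {
  for (let i = 0; i < names.length; i++) {
    const value = headers.get(names[i]);
    if (value !== null) return value;
  }
  return null;
}

// Parses a base-10 integer, returning null for missing or malformed values.
function toInt(value: string | null): number | null {
  if (value === null) return null;
  const n = parseInt(value, 10);
  return isNaN(n) ? null : n;
}

function readRateLimitState(headers: HeaderReader): RateLimitState {
  return {
    remaining: toInt(firstHeader(headers, REMAINING_NAMES)),
    limit: toInt(firstHeader(headers, LIMIT_NAMES)),
    reset: firstHeader(headers, RESET_NAMES),
  };
}
```

With a real response you would call `readRateLimitState(response.headers)`. A result with `remaining` and `limit` both `null` suggests the vendor does not send these headers at all.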
Implementing Exponential Backoff
Never retry immediately after a rate limit. Immediate retries make the situation worse — you hit the limit again, retry again, and create a thundering herd against the vendor's throttle.
The standard approach is exponential backoff with jitter:
async function withRetry<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    baseDelayMs?: number;
    maxDelayMs?: number;
    jitter?: boolean;
  } = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelayMs = 1000,
    maxDelayMs = 30_000,
    jitter = true,
  } = options;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      let delayMs: number;
      if (err instanceof RateLimitError) {
        // Treat the vendor's retry-after as a minimum: jitter only adds to it,
        // so we never retry earlier than the vendor asked
        delayMs = err.retryAfterSeconds * 1000;
        if (jitter) delayMs += Math.random() * baseDelayMs;
      } else if (isRetryable(err)) {
        // Exponential backoff: 1s, 2s, 4s, 8s... capped at maxDelayMs
        delayMs = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
        // Jitter scales the delay to 50-100% so multiple instances don't retry in lockstep
        if (jitter) delayMs *= 0.5 + Math.random() * 0.5;
      } else {
        throw err; // Non-retryable error: don't retry
      }
      // maxRetries retries means maxRetries + 1 attempts in total
      console.warn(`[retry] Attempt ${attempt + 1}/${maxRetries + 1} failed. Retrying in ${Math.round(delayMs)}ms.`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
      attempt++;
    }
  }
}

function isRetryable(err: unknown): boolean {
  if (err instanceof RateLimitError) return true;
  if (err instanceof TypeError && err.message.includes('fetch')) return true; // Network error
  // Add other retryable conditions specific to your vendor
  return false;
}

Building a Request Queue
For high-volume integrations, individual retry logic isn't enough. You need a queue that smooths your request rate to stay within the vendor's limit.
A simple token bucket implementation:
class RateLimiter {
  private tokens: number;
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens per millisecond
  private lastRefill: number;

  constructor({ requestsPerSecond }: { requestsPerSecond: number }) {
    if (!(requestsPerSecond > 0)) {
      throw new RangeError('requestsPerSecond must be a positive number');
    }
    // Capacity is at least 1; otherwise a rate below 1/s could never
    // accumulate a whole token and acquire() would wait forever
    this.maxTokens = Math.max(1, requestsPerSecond);
    this.tokens = this.maxTokens;
    this.refillRate = requestsPerSecond / 1000;
    this.lastRefill = Date.now();
  }

  async acquire(): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Wait until we can refill at least one token
      const waitMs = (1 - this.tokens) / this.refillRate;
      await new Promise(resolve => setTimeout(resolve, Math.ceil(waitMs)));
    }
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// Usage: limit to 10 requests per second
const limiter = new RateLimiter({ requestsPerSecond: 10 });

async function rateLimitedCall(url: string) {
  await limiter.acquire(); // Waits if we're at the limit
  return fetch(url, { headers: { Authorization: `Bearer ${process.env.API_KEY}` } });
}

For production workloads, consider using a battle-tested library like bottleneck (Node.js) rather than a home-rolled implementation. It handles edge cases around concurrent requests, queued work, and priority levels.
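If you do keep a home-rolled limiter, it helps to be able to test it without real timers. One option, sketched here on the assumption that the caller can pass the current time in explicitly, is a synchronous variant of the token bucket. The class name `TestableTokenBucket` and its methods are hypothetical; the refill arithmetic mirrors the token bucket described above.

```typescript
class TestableTokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly tokensPerMs: number;

  constructor(requestsPerSecond: number, startMs: number) {
    if (!(requestsPerSecond > 0)) {
      throw new Error('requestsPerSecond must be positive');
    }
    this.capacity = Math.max(1, requestsPerSecond);
    this.tokensPerMs = requestsPerSecond / 1000;
    this.tokens = this.capacity; // start with a full bucket
    this.lastRefillMs = startMs;
  }

  // Consumes a token and returns true if one is available at time nowMs.
  tryAcquire(nowMs: number): boolean {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next token is available (0 if one is available now).
  msUntilNextToken(nowMs: number): number {
    this.refill(nowMs);
    if (this.tokens >= 1) return 0;
    return Math.ceil((1 - this.tokens) / this.tokensPerMs);
  }

  // Adds tokens for the time elapsed since the last refill, up to capacity.
  private refill(nowMs: number): void {
    const elapsed = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.tokensPerMs);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
  }
}

// Usage: 2 requests per second, with the clock starting at 0
const bucket = new TestableTokenBucket(2, 0);
console.log(bucket.tryAcquire(0), bucket.tryAcquire(0), bucket.tryAcquire(0));
```

Passing the clock in keeps the logic deterministic: a test can jump forward a second and check that the bucket has refilled, without sleeping.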
Graceful Degradation When Rate Limited
Being rate limited shouldn't mean your users see an error. Build fallback behavior for when the vendor isn't available:
// callVendorApi, cache, and VendorData are assumed to be defined elsewhere in your codebase
async function getVendorData(id: string): Promise<VendorData | null> {
  try {
    return await callVendorApi(`/data/${id}`);
  } catch (err) {
    if (err instanceof RateLimitError) {
      // Return cached data if available, or null to indicate unavailable
      const cached = await cache.get(`vendor:${id}`);
      if (cached) {
        console.warn(`[fallback] Rate limited. Returning cached data for ${id}`);
        return cached;
      }
      // Log and surface a graceful degradation to the caller
      console.error(`[fallback] Rate limited and no cache. Feature unavailable for ${id}`);
      return null;
    }
    throw err; // Re-throw non-rate-limit errors
  }
}

Design the calling code to handle null: show a "temporarily unavailable" state rather than a hard error. This is the difference between a degraded user experience and a broken one.
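On the calling side, handling null can be as small as mapping the result to an explicit view state. The following is a sketch only: the article does not define `VendorData`, so the two-field shape here is a placeholder, and `PanelState` and `toPanelState` are hypothetical names.

```typescript
// Placeholder shape; substitute your real vendor payload type.
interface VendorData {
  id: string;
  name: string;
}

type PanelState =
  | { kind: 'ready'; title: string }
  | { kind: 'unavailable'; message: string };

// Maps the nullable result to something the UI can render without throwing.
function toPanelState(data: VendorData | null): PanelState {
  if (data === null) {
    return {
      kind: 'unavailable',
      message: 'This information is temporarily unavailable. Please try again shortly.',
    };
  }
  return { kind: 'ready', title: data.name };
}

console.log(toPanelState(null).kind);
```

Making "unavailable" a first-class state means the rendering code has to handle it, instead of discovering a null at runtime.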
Monitoring Rate Limit Health
You shouldn't find out you hit a rate limit when users start reporting errors. Set up proactive monitoring:
- Alert at 80% of limit: log a warning when x-ratelimit-remaining / x-ratelimit-limit < 0.2. This gives you time to react before hitting the ceiling.
- Track 429 rate: if your 429 error rate rises above 0 in a 5-minute window, something changed. Either you're sending more requests or the vendor lowered your limit.
- Monitor vendor status pages: rate limit changes are sometimes announced as incidents ("degraded performance" or "API rate limiting in effect"). Getting this signal immediately means you know what's happening before your error logs tell you.
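A 429 rate, as opposed to a raw count, needs the total number of responses as well as the number of 429s. One way to get it is a sliding window over recent responses. This is a sketch with hypothetical names (`ErrorRateWindow`, `record`, `rate`), and timestamps are passed in explicitly so the behavior is easy to test.

```typescript
interface ResponseRecord {
  atMs: number;    // when the response was observed
  was429: boolean; // whether it was a rate-limit response
}

class ErrorRateWindow {
  private samples: ResponseRecord[] = [];
  private readonly windowMs: number;

  // Defaults to the 5-minute window mentioned above
  constructor(windowMs: number = 5 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Call once per response, whatever its status.
  record(status: number, atMs: number): void {
    this.samples.push({ atMs: atMs, was429: status === 429 });
    this.prune(atMs);
  }

  // Fraction of responses inside the window that were 429s (0 when there are none).
  rate(nowMs: number): number {
    this.prune(nowMs);
    if (this.samples.length === 0) return 0;
    let hits = 0;
    for (let i = 0; i < this.samples.length; i++) {
      if (this.samples[i].was429) hits++;
    }
    return hits / this.samples.length;
  }

  // Drops samples older than the window.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    this.samples = this.samples.filter(function (s) {
      return s.atMs >= cutoff;
    });
  }
}
```

Alerting when `rate(Date.now()) > 0` matches the rule in the list above. A higher threshold trades sensitivity for fewer false alarms.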
// Simple 429 rate tracking
class RateLimitMonitor {
  private hitCount = 0;
  private windowStart = Date.now();

  record429(): void {
    const now = Date.now();
    // Start a fresh 5-minute window before counting this hit
    if (now - this.windowStart > 300_000) {
      this.hitCount = 0;
      this.windowStart = now;
    }
    this.hitCount++;
    // A threshold of 5 tolerates isolated hits; lower it to alert on any 429
    if (this.hitCount > 5) {
      const elapsed = Math.round((now - this.windowStart) / 1000);
      console.error(`[alert] ${this.hitCount} rate limit hits in last ${elapsed}s`);
      // Hook into your alerting system here
    }
  }
}

Where Statusfield Fits
Rate limiting behavior sometimes changes due to vendor incidents — they may throttle more aggressively during degraded states, or raise limits temporarily to clear a backlog. The signal that explains why your rate limit behavior changed is often on their status page.
Statusfield monitors official vendor status pages and delivers alerts when incidents are posted — including degradation events that affect rate limiting behavior. When a vendor tightens their throttles due to infrastructure pressure, Statusfield picks up the incident notice before your 429 rate climbs.
Statusfield's free plan monitors up to 3 services. The Pro plan ($29/month) covers up to 20 services with unlimited email and Slack notifications.
FAQ
What's the difference between a rate limit and a quota? Rate limits restrict how many requests you can make within a short time window — per second or per minute. Quotas restrict total usage over a longer period — per day or per month. A rate limit prevents bursts; a quota prevents sustained high volume. You can be within your quota but still hit a rate limit if you send requests too fast.
Should I use a single API key or separate keys for different parts of my application? If the vendor supports multiple API keys, use separate keys per service or per logical function. This isolates rate limit consumption — a batch job that exhausts the limit doesn't take down your real-time API. It also makes it easier to track which part of your application is consuming the most requests.
What happens if I keep retrying after a 429? Repeated retries after a 429 without backing off can escalate the problem. Some vendors will temporarily ban your account or lower your limit further if they see persistent hammering. Always respect the Retry-After header if present. If the vendor doesn't provide one, wait at least 60 seconds before retrying.
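Respecting Retry-After takes a little care, because the HTTP specification allows the value to be either a number of seconds or an HTTP date. The sketch below handles both and falls back to the 60 seconds suggested above. The function name `parseRetryAfter` and its parameters are hypothetical.

```typescript
// Returns how many seconds to wait, given a raw Retry-After header value.
// nowMs is the current time in milliseconds, passed in so the result is testable.
function parseRetryAfter(
  value: string | null,
  nowMs: number,
  fallbackSeconds: number = 60
): number {
  if (value === null) return fallbackSeconds;
  const trimmed = value.trim();
  if (trimmed === '') return fallbackSeconds;

  // Case 1: a delay in whole seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10);

  // Case 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (!isNaN(dateMs)) {
    // A date in the past means we can retry immediately
    return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
  }

  // Unrecognized format: be conservative
  return fallbackSeconds;
}

console.log(parseRetryAfter('120', Date.now()));
```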
How do I handle rate limits in a background job that processes thousands of records? Implement a request queue with a controlled rate (token bucket or leaky bucket pattern), process records in batches, and add delays between batches. Calculate your throughput: if you can make 100 requests per minute and need to process 10,000 records, you need at least 100 minutes of processing time. Design the job to be resumable — save progress so it can restart where it left off if interrupted.
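The throughput arithmetic and the resumable-batch idea from that answer can be sketched as two small functions. Both names, `estimateJobMinutes` and `nextBatch`, are hypothetical, and so is the `Checkpoint` shape. The estimate assumes one request per record and no retries, so treat it as a lower bound.

```typescript
// Minimum wall-clock minutes for the job, assuming one request per record and no retries.
function estimateJobMinutes(recordCount: number, requestsPerMinute: number): number {
  if (requestsPerMinute <= 0) throw new Error('requestsPerMinute must be positive');
  return Math.ceil(recordCount / requestsPerMinute);
}

// Hypothetical checkpoint: the index of the next record to process.
interface Checkpoint {
  nextIndex: number;
}

// Returns the records for the next batch, plus the checkpoint to persist once the batch succeeds.
function nextBatch<T>(
  records: T[],
  checkpoint: Checkpoint,
  batchSize: number
): { batch: T[]; next: Checkpoint } {
  const start = checkpoint.nextIndex;
  const batch = records.slice(start, start + batchSize);
  return { batch: batch, next: { nextIndex: start + batch.length } };
}

console.log(estimateJobMinutes(10000, 100)); // 100, matching the figure above
```

Persist `next` only after the batch has been processed. If the job is interrupted, it restarts from the last saved checkpoint and repeats at most one batch.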
Can I request a higher rate limit from the vendor? Yes, most vendors offer higher limits on paid plans or by request. If you consistently operate near your limit, contact vendor support and explain your use case. Many vendors will grant temporary or permanent increases for legitimate production workloads. Budget for the time this takes — it can be a week or more for approval.