A third-party API failure doesn't have to mean your app fails. Depending on the vendor and the feature, you have options — degraded modes, cached responses, alternative providers, or graceful error states. Most teams don't implement these until after a painful outage teaches them they should.
This guide covers the practical patterns for building fallbacks before you need them.
Classify Your Third-Party Dependencies First
Not all vendor failures have the same impact. Before designing fallbacks, classify each dependency:
| Dependency type | Example | Fallback viability |
|---|---|---|
| Critical-blocking | Payment processor | Hard (need alternative provider or hold queue) |
| Critical-degraded | Authentication | Medium (cache sessions, allow read-only) |
| Non-critical | Analytics | Easy (drop silently) |
| Enhancement | AI features | Easy (show original content, skip enrichment) |
Start with the critical-blocking category. These are the failures that cause complete feature loss. Everything else matters less.
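Keeping this classification in code, next to the integrations, makes it easy to review and to reuse later when deciding which features to degrade. A minimal sketch; the vendor keys, feature names, and fallbacks below are a hypothetical inventory that mirrors the table, not a recommended set.

```typescript
type DependencyTier = 'critical-blocking' | 'critical-degraded' | 'non-critical' | 'enhancement';

interface Dependency {
  vendor: string;
  tier: DependencyTier;
  features: string[]; // product features that break if this vendor fails
  fallback: string; // the planned response, in plain words
}

// Lower number = design a fallback sooner
const tierPriority: Record<DependencyTier, number> = {
  'critical-blocking': 0,
  'critical-degraded': 1,
  'non-critical': 2,
  enhancement: 3,
};

// Hypothetical inventory for an example app (vendor keys are illustrative)
const dependencies: Dependency[] = [
  { vendor: 'analytics', tier: 'non-critical', features: ['event-tracking'], fallback: 'drop events silently' },
  { vendor: 'llm-api', tier: 'enhancement', features: ['ai-summaries'], fallback: 'show original content' },
  { vendor: 'auth-provider', tier: 'critical-degraded', features: ['login'], fallback: 'honor cached sessions, read-only mode' },
  { vendor: 'stripe', tier: 'critical-blocking', features: ['checkout', 'subscription-changes'], fallback: 'hold queue or secondary processor' },
];

// Orders dependencies so the highest-impact ones get fallbacks first
function byPriority(deps: Dependency[]): Dependency[] {
  return [...deps].sort((a, b) => tierPriority[a.tier] - tierPriority[b.tier]);
}

for (const dep of byPriority(dependencies)) {
  console.log(`${dep.tier} ${dep.vendor}: ${dep.fallback}`);
}
```

The features list is what later lets an incident on one vendor map directly to the flags that need degrading.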
Pattern 1: Circuit Breaker
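A circuit breaker wraps API calls and stops calling the vendor after repeated failures, preventing cascading load and giving the vendor time to recover. After a cooldown it lets a trial request through (the "half-open" state). If the trial succeeds, the circuit closes again. If it fails, the circuit reopens.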
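```typescript
import Stripe from 'stripe';

type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failureCount = 0;
  private lastFailureTime = 0;
  private trialInFlight = false;

  constructor(
    private readonly threshold: number = 5, // consecutive failures before the circuit opens
    private readonly resetTimeoutMs: number = 30_000, // cooldown before a trial request is allowed
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        this.state = 'half-open'; // cooldown elapsed: probe the upstream
      } else {
        throw new Error('Circuit open: upstream unavailable');
      }
    }

    // While half-open, let exactly one trial request through and fail fast for the rest
    const isTrial = this.state === 'half-open';
    if (isTrial) {
      if (this.trialInFlight) {
        throw new Error('Circuit half-open: trial request in progress');
      }
      this.trialInFlight = true;
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    } finally {
      if (isTrial) this.trialInFlight = false;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed trial reopens the circuit immediately
    if (this.state === 'half-open' || this.failureCount >= this.threshold) {
      this.state = 'open';
    }
  }
}

// Usage (assumes STRIPE_SECRET_KEY is set in the environment)
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY ?? '');
const stripeBreaker = new CircuitBreaker(5, 30_000);

// amount is in the smallest currency unit (cents for USD)
async function createPaymentIntent(amount: number) {
  return stripeBreaker.call(() =>
    stripe.paymentIntents.create({ amount, currency: 'usd' }),
  );
}
```

The circuit breaker doesn't provide a fallback; it provides fast failure. When the circuit is open, calls fail immediately instead of each one waiting for the upstream request to time out. This protects your users from long waits and reduces retry storms during incidents.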
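The breaker only reacts to failures that actually happen. A request that hangs never fails, so it never counts toward the threshold, and callers still wait on it. Pairing the breaker with a timeout closes that gap. A minimal sketch using a hypothetical withTimeout helper built on Promise.race:

```typescript
// Rejects if `promise` hasn't settled within `ms` milliseconds.
// Promise.race only stops waiting; it does not cancel the underlying request.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer running once the race is decided
  }
}
```

Put the timeout inside the breaker so that a timeout counts as a failure, for example stripeBreaker.call(() => withTimeout(stripe.paymentIntents.create({ amount, currency: 'usd' }), 5_000)). The 5-second budget is illustrative; pick one from the latency you normally see for that vendor.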
Pattern 2: Stale Cache Fallback
For read operations, serving stale cached data is often better than serving an error. Pricing, product data, user preferences — these change infrequently enough that showing 5-minute-old data is acceptable.
```typescript
interface CacheEntry<T> {
  data: T;
  cachedAt: number;
}

// Serves fresh data when possible and falls back to stale data when the
// upstream fetch fails ("stale-if-error"). It does not refresh in the background.
class StaleIfErrorCache<T> {
  private cache: CacheEntry<T> | null = null;

  constructor(
    private readonly freshForMs: number = 60_000, // 1 minute: serve from cache without refetching
    private readonly maxStaleAgeMs: number = 300_000, // 5 minutes: oldest data served on failure
  ) {}

  async get(fetchFn: () => Promise<T>): Promise<{ data: T; stale: boolean }> {
    const now = Date.now();

    // Cache is fresh: return immediately
    if (this.cache && now - this.cache.cachedAt < this.freshForMs) {
      return { data: this.cache.data, stale: false };
    }

    try {
      const data = await fetchFn();
      this.cache = { data, cachedAt: Date.now() };
      return { data, stale: false };
    } catch (error) {
      // Fetch failed: return stale data if available and not too old
      if (this.cache && Date.now() - this.cache.cachedAt < this.maxStaleAgeMs) {
        return { data: this.cache.data, stale: true };
      }
      throw error; // No usable cache: let the error propagate
    }
  }
}
```

When serving stale data, consider surfacing it to users. A subtle "Prices last updated 4 minutes ago" is better than silently serving outdated information.
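The class above caches a single value. Real pages usually need one entry per key (a price list per currency, a product per ID), and the UI needs the entry's age to show a notice like the one described above. A sketch of a hypothetical keyed variant that also reports ageMs; the injectable clock exists only to make it testable.

```typescript
interface Entry<T> {
  data: T;
  cachedAt: number;
}

interface CacheResult<T> {
  data: T;
  stale: boolean;
  ageMs: number; // how old the returned data is
}

// Hypothetical keyed stale-if-error cache: one entry per key
class KeyedStaleCache<T> {
  private entries = new Map<string, Entry<T>>();

  constructor(
    private readonly freshForMs = 60_000,
    private readonly maxStaleAgeMs = 300_000,
    private readonly now: () => number = () => Date.now(), // injectable clock for tests
  ) {}

  async get(key: string, fetchFn: () => Promise<T>): Promise<CacheResult<T>> {
    const entry = this.entries.get(key);
    const t = this.now();

    if (entry && t - entry.cachedAt < this.freshForMs) {
      return { data: entry.data, stale: false, ageMs: t - entry.cachedAt };
    }

    try {
      const data = await fetchFn();
      this.entries.set(key, { data, cachedAt: this.now() });
      return { data, stale: false, ageMs: 0 };
    } catch (error) {
      // Upstream failed: fall back to this key's entry if it isn't too old
      if (entry && t - entry.cachedAt < this.maxStaleAgeMs) {
        return { data: entry.data, stale: true, ageMs: t - entry.cachedAt };
      }
      throw error;
    }
  }
}

// Builds the user-facing notice; returns null when the data is current
function stalenessNotice(result: CacheResult<unknown>): string | null {
  if (!result.stale) return null;
  const minutes = Math.max(1, Math.round(result.ageMs / 60_000));
  return `Prices last updated ${minutes} minute${minutes === 1 ? '' : 's'} ago`;
}
```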
Pattern 3: Feature Flag Degradation
When a vendor is down, automatically disable the features that depend on it rather than showing errors:
```tsx
interface FeatureState {
  enabled: boolean;
  degradedAt?: number;
  reason?: string;
}

class FeatureFlags {
  private flags: Map<string, FeatureState> = new Map();

  degrade(feature: string, reason: string) {
    this.flags.set(feature, {
      enabled: false,
      degradedAt: Date.now(),
      reason,
    });
  }

  restore(feature: string) {
    this.flags.set(feature, { enabled: true });
  }

  // Features with no recorded state default to enabled
  isEnabled(feature: string): boolean {
    return this.flags.get(feature)?.enabled ?? true;
  }
}

// This in-memory instance lives in a single process. In production the flag state
// usually comes from a shared store or flag service so servers and browsers agree,
// and components need a subscription so they re-render when a flag changes.
const flags = new FeatureFlags();

// When a Stripe incident is detected
flags.degrade('checkout', 'Payment provider incident in progress');

// In your React component
function CheckoutButton() {
  if (!flags.isEnabled('checkout')) {
    return (
      <div className="checkout-degraded">
        Checkout is temporarily unavailable due to a payment provider issue.
        We'll update this page when it resolves.
      </div>
    );
  }
  return <button>Proceed to checkout</button>;
}
```

The key is connecting your monitoring to your feature flags. When an incident is detected, degrade the affected features automatically rather than waiting for an engineer to do it manually.
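Automatic degradation needs a signal. Besides vendor status alerts, the outcomes of your own calls to the vendor can drive the flag. A minimal sketch of a hypothetical VendorHealthWatcher that degrades a feature when the failure ratio in a sliding window crosses a threshold, and restores it only once the ratio falls well below that (hysteresis, so the flag doesn't flap). It works with any object that has the degrade and restore methods of the FeatureFlags class above; the thresholds are illustrative.

```typescript
// Same shape as the degrade/restore methods on the FeatureFlags class
interface FlagController {
  degrade(feature: string, reason: string): void;
  restore(feature: string): void;
}

interface Outcome {
  at: number;
  ok: boolean;
}

// Hypothetical helper: flips one feature flag based on recent call outcomes
class VendorHealthWatcher {
  private outcomes: Outcome[] = [];
  private degraded = false;

  constructor(
    private readonly flags: FlagController,
    private readonly feature: string,
    private readonly windowMs = 60_000, // how far back to look
    private readonly minSamples = 10, // don't judge on too little data
    private readonly degradeAbove = 0.5, // failure ratio that disables the feature
    private readonly restoreBelow = 0.1, // lower ratio required to re-enable it (hysteresis)
    private readonly now: () => number = () => Date.now(),
  ) {}

  record(ok: boolean): void {
    const t = this.now();
    this.outcomes.push({ at: t, ok });
    // Keep only outcomes inside the sliding window
    this.outcomes = this.outcomes.filter((o) => t - o.at <= this.windowMs);
    if (this.outcomes.length < this.minSamples) return;

    const failures = this.outcomes.filter((o) => !o.ok).length;
    const ratio = failures / this.outcomes.length;

    if (!this.degraded && ratio > this.degradeAbove) {
      this.degraded = true;
      this.flags.degrade(this.feature, `Failure rate ${Math.round(ratio * 100)}% in the last ${this.windowMs / 1000}s`);
    } else if (this.degraded && ratio < this.restoreBelow) {
      this.degraded = false;
      this.flags.restore(this.feature);
    }
  }
}
```

One caveat: once the feature is degraded, user traffic to that vendor stops, so the watcher stops receiving outcomes. While degraded, feed it results from a periodic health-check request (or a vendor recovery alert), or it will never restore the feature.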
Pattern 4: Queue and Retry for Write Operations
For write operations that can tolerate delay, queue them during outages rather than failing:
```typescript
import { randomUUID } from 'node:crypto';

interface QueuedOperation {
  id: string;
  type: string;
  payload: unknown;
  createdAt: number;
  attempts: number;
}

const MAX_ATTEMPTS = 3;

// In-memory only: anything queued is lost if the process restarts.
// Back this with a durable store (a database table or message queue) in production.
class OutageQueue {
  private queue: QueuedOperation[] = [];
  readonly deadLetters: QueuedOperation[] = [];

  async enqueue(type: string, payload: unknown): Promise<string> {
    const id = randomUUID();
    this.queue.push({ id, type, payload, createdAt: Date.now(), attempts: 0 });
    return id;
  }

  // Call once the vendor has recovered
  async drain(processor: (op: QueuedOperation) => Promise<void>) {
    const pending = [...this.queue];
    this.queue = [];
    for (const op of pending) {
      try {
        await processor(op);
      } catch {
        op.attempts++;
        if (op.attempts < MAX_ATTEMPTS) {
          this.queue.push(op); // Re-queue for the next drain
        } else {
          this.deadLetters.push(op); // Keep for inspection rather than dropping silently
        }
      }
    }
  }
}
```

This pattern works well for analytics events, audit logs, and other non-transactional writes. It doesn't work for payments or auth, because users are blocked until those complete.
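The drain loop above retries a failed operation on the very next drain. If the vendor has only partly recovered, replaying the whole backlog at once can push it back over. A common refinement is exponential backoff with full jitter: each failure doubles the maximum wait, and the actual wait is a random point below that ceiling, so queued operations don't retry in lockstep. A sketch with hypothetical helper names, assuming each queued operation gains a nextAttemptAt timestamp:

```typescript
// Full-jitter exponential backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the result can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface RetryableOp {
  id: string;
  attempts: number;
  nextAttemptAt: number; // epoch ms before which the op should not be retried
}

// Records a failed attempt and schedules the next one
function scheduleRetry<T extends RetryableOp>(op: T, now: number, random: () => number = Math.random): T {
  const attempts = op.attempts + 1;
  return { ...op, attempts, nextAttemptAt: now + backoffDelayMs(attempts, 1_000, 60_000, random) };
}

// Returns only the operations whose backoff has elapsed
function dueForRetry<T extends RetryableOp>(ops: T[], now: number): T[] {
  return ops.filter((op) => op.nextAttemptAt <= now);
}
```

With these helpers, drain would process only dueForRetry(queue, Date.now()) and leave the rest queued. The base and cap values are illustrative.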
Pattern 5: Multi-Provider Failover
For payment processing and transactional email, having a secondary provider is the most robust fallback. The tradeoff is cost and integration maintenance.
```typescript
import sgMail from '@sendgrid/mail';
import { SES } from '@aws-sdk/client-ses';

// Placeholder: use a sender address verified with both providers
const FROM_ADDRESS = 'noreply@example.com';

sgMail.setApiKey(process.env.SENDGRID_API_KEY ?? '');
const ses = new SES({ region: process.env.AWS_REGION ?? 'us-east-1' });

async function sendEmail(to: string, subject: string, html: string) {
  // Try primary provider (SendGrid requires a sender address)
  try {
    return await sgMail.send({ to, from: FROM_ADDRESS, subject, html });
  } catch (primaryError) {
    console.error('SendGrid failed, falling back to SES:', primaryError);
  }

  // Fall back to secondary
  try {
    return await ses.sendEmail({
      Destination: { ToAddresses: [to] },
      Message: {
        Subject: { Data: subject },
        Body: { Html: { Data: html } },
      },
      Source: FROM_ADDRESS,
    });
  } catch (secondaryError) {
    // Both failed: queue for later or alert
    throw new Error(`All email providers failed: ${secondaryError}`);
  }
}
```

Multi-provider failover requires maintaining integrations, credentials, and feature parity across providers. It's worth it for email (one integration, few features) but harder to justify for payments (significant integration complexity, regulatory considerations).
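With more than two providers, or several operations that need failover, the try/catch chain above generalizes into a helper. A sketch with hypothetical names (Provider, withFailover, AllProvidersFailedError) that tries providers in order and keeps every error for the final report, instead of only the last one:

```typescript
interface Provider<TArgs, TResult> {
  name: string; // used only for logging and reporting
  send: (args: TArgs) => Promise<TResult>;
}

class AllProvidersFailedError extends Error {
  constructor(readonly errors: unknown[]) {
    super(`All providers failed (${errors.length} attempted)`);
    this.name = 'AllProvidersFailedError';
  }
}

// Tries each provider in order and returns the first success.
// Every failure is recorded so the final error explains what went wrong everywhere.
async function withFailover<TArgs, TResult>(
  providers: Provider<TArgs, TResult>[],
  args: TArgs,
): Promise<{ provider: string; result: TResult }> {
  const errors: unknown[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, result: await p.send(args) };
    } catch (error) {
      errors.push(error);
      console.error(`${p.name} failed, trying the next provider:`, error);
    }
  }
  throw new AllProvidersFailedError(errors);
}
```

sendEmail could then pass one provider wrapping the SendGrid call and one wrapping the SES call, and catch AllProvidersFailedError to queue the message or alert. One caveat: if the primary times out after it has actually accepted the message, the fallback sends a second copy. For most transactional email that duplicate is an acceptable cost; operations that must never repeat need more care than a simple try-the-next-one loop.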
Connecting Monitoring to Fallback Activation
The gap between incident detection and fallback activation is where outages hurt most. Your monitoring and your fallback logic need to be connected:
```typescript
// `flags` is the FeatureFlags instance from Pattern 3.
// `pagerduty` is a placeholder for your alerting client, not a specific SDK.

// When your monitoring detects an incident
async function onVendorIncident(vendor: string, severity: string) {
  if (vendor === 'stripe' && severity === 'critical') {
    // Activate payment fallbacks
    flags.degrade('checkout', 'Payment provider incident');
    flags.degrade('subscription-changes', 'Payment provider incident');

    // Alert on-call
    await pagerduty.triggerIncident({
      title: 'Stripe incident detected: checkout degraded',
      severity: 'high',
    });
  }
}

// When the incident resolves
async function onVendorRecovery(vendor: string) {
  if (vendor === 'stripe') {
    flags.restore('checkout');
    flags.restore('subscription-changes');
  }
}
```

This loop (detect, degrade, recover, restore) should be automatic. Manual intervention adds minutes of delay during incidents, when every minute matters.
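The handlers above hard-code one vendor and call a placeholder alerting client. A sketch of a more general version, assuming a hypothetical featuresByVendor map and using the PagerDuty Events API v2 (a POST to https://events.pagerduty.com/v2/enqueue) for paging. The Events API accepts critical, error, warning, or info as the severity, and reusing the same dedup_key lets the recovery path resolve the alert that the incident path opened.

```typescript
interface FlagController {
  degrade(feature: string, reason: string): void;
  restore(feature: string): void;
}

// Hypothetical map of which product features depend on which vendor
const featuresByVendor: Record<string, string[]> = {
  stripe: ['checkout', 'subscription-changes'],
  sendgrid: ['email-notifications'],
};

interface PagerDutyEvent {
  routing_key: string;
  event_action: 'trigger' | 'resolve';
  dedup_key: string;
  payload?: {
    summary: string;
    source: string;
    severity: 'critical' | 'error' | 'warning' | 'info';
  };
}

// Degrades (or restores) every feature mapped to the vendor; returns the features touched
function applyVendorState(flags: FlagController, vendor: string, down: boolean): string[] {
  const features = featuresByVendor[vendor] ?? [];
  for (const feature of features) {
    if (down) flags.degrade(feature, `${vendor} incident in progress`);
    else flags.restore(feature);
  }
  return features;
}

// Same dedup_key for trigger and resolve, so recovery closes the alert the incident opened
function pagerDutyEvent(routingKey: string, vendor: string, action: 'trigger' | 'resolve'): PagerDutyEvent {
  const event: PagerDutyEvent = {
    routing_key: routingKey,
    event_action: action,
    dedup_key: `vendor-incident-${vendor}`,
  };
  if (action === 'trigger') {
    event.payload = {
      summary: `${vendor} incident detected: dependent features degraded`,
      source: vendor,
      severity: 'critical',
    };
  }
  return event;
}

async function sendPagerDutyEvent(event: PagerDutyEvent): Promise<void> {
  const res = await fetch('https://events.pagerduty.com/v2/enqueue', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`PagerDuty Events API returned ${res.status}`);
}

// One handler for both directions of the loop: down=true degrades and pages, false restores and resolves
async function handleVendorEvent(flags: FlagController, routingKey: string, vendor: string, down: boolean) {
  const features = applyVendorState(flags, vendor, down);
  if (features.length === 0) return; // no mapped features: nothing to degrade or page about
  await sendPagerDutyEvent(pagerDutyEvent(routingKey, vendor, down ? 'trigger' : 'resolve'));
}
```

Adding a vendor then means adding one line to the map rather than another if-branch in each handler.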
Where Statusfield Fits
The trigger for all of this is early incident detection. The faster you know a vendor is degraded, the faster you can activate fallbacks and communicate with users.
Statusfield monitors official vendor status pages continuously and delivers alerts the moment an incident is posted. When Stripe posts a payment intents degradation, Statusfield routes the alert to your on-call channel, often before your error rates have climbed enough to trigger your internal monitors. That lead time is what lets you activate fallbacks before users encounter errors.
The pattern works best when your monitoring, your feature flags, and your incident communication are all connected. Statusfield handles the detection layer; your fallback patterns handle the response.