How to Build Reliable Fallbacks for Third-Party API Failures

When a vendor your app depends on goes down, what happens? If the answer is 'everything breaks,' this guide covers the patterns for building fallbacks that keep your app functional during third-party outages.

A third-party API failure doesn't have to mean your app fails. Depending on the vendor and the feature, you have options — degraded modes, cached responses, alternative providers, or graceful error states. Most teams don't implement these until after a painful outage teaches them they should.

This guide covers the practical patterns for building fallbacks before you need them.

Classify Your Third-Party Dependencies First

Not all vendor failures have the same impact. Before designing fallbacks, classify each dependency:

Dependency type    | Example           | Fallback viability
Critical-blocking  | Payment processor | Hard (need alternative provider or hold queue)
Critical-degraded  | Authentication    | Medium (cache sessions, allow read-only)
Non-critical       | Analytics         | Easy (drop silently)
Enhancement        | AI features       | Easy (show original content, skip enrichment)

Start with the critical-blocking category. These are the failures that cause complete feature loss. Everything else matters less.
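One way to keep this classification from living only in a wiki page is to encode it next to the code. The sketch below is illustrative: the type names, the `PRIORITY` map, and the sample entries are assumptions that mirror the table, not part of any library.

```typescript
// Hypothetical names throughout; the four categories mirror the table above.
type DependencyClass =
  | 'critical-blocking'
  | 'critical-degraded'
  | 'non-critical'
  | 'enhancement';

interface Dependency {
  name: string;
  classification: DependencyClass;
  fallback: string; // short description of the planned fallback
}

// Lower number = design its fallback sooner.
const PRIORITY: Record<DependencyClass, number> = {
  'critical-blocking': 0,
  'critical-degraded': 1,
  'non-critical': 2,
  'enhancement': 3,
};

// Returns a new array sorted so critical-blocking dependencies come first.
function byFallbackPriority(deps: Dependency[]): Dependency[] {
  return [...deps].sort(
    (a, b) => PRIORITY[a.classification] - PRIORITY[b.classification],
  );
}

const dependencies: Dependency[] = [
  { name: 'analytics', classification: 'non-critical', fallback: 'drop silently' },
  { name: 'payments', classification: 'critical-blocking', fallback: 'alternative provider or hold queue' },
  { name: 'ai-features', classification: 'enhancement', fallback: 'show original content' },
  { name: 'auth', classification: 'critical-degraded', fallback: 'cache sessions, allow read-only' },
];

const ordered = byFallbackPriority(dependencies);
```

Sorting by class gives you a work order for the rest of this guide: whatever lands at the top of `ordered` is where fallback design should start.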

Pattern 1: Circuit Breaker

A circuit breaker wraps API calls and stops trying after repeated failures, preventing cascading load and giving the vendor time to recover.

type CircuitState = 'closed' | 'open' | 'half-open';
 
class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failureCount = 0;
  private lastFailureTime = 0;
 
  constructor(
    private readonly threshold: number = 5,
    private readonly resetTimeoutMs: number = 30_000,
  ) {}
 
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit open: upstream unavailable');
      }
    }
 
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
 
  private onSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }
 
  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.threshold) {
      this.state = 'open';
    }
  }
}
 
// Usage
const stripeBreaker = new CircuitBreaker(5, 30_000);
 
async function createPaymentIntent(amount: number) {
  return stripeBreaker.call(() =>
    stripe.paymentIntents.create({ amount, currency: 'usd' }),
  );
}

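The circuit breaker doesn't provide a fallback; it provides fast failure. When the circuit is open, calls fail immediately rather than each one waiting out a network timeout, which can run to 30 seconds or more. Once resetTimeoutMs has passed (30 seconds here), the next call is let through as a trial: a success closes the circuit, and a failure reopens it. This protects your users from long waits and reduces retry storms during incidents.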
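A breaker like this only counts failures it can see, so a request that hangs without rejecting never trips it. A minimal sketch of a timeout wrapper to pair with it follows; `withTimeout` is a hypothetical helper, and the 5-second value in the usage comment is an illustrative assumption, not a recommendation.

```typescript
// Rejects if `fn` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying request; it only stops waiting for it.
function withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms,
    );
    fn().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Usage with the breaker from the listing above (5_000 is illustrative):
// stripeBreaker.call(() =>
//   withTimeout(() => stripe.paymentIntents.create({ amount, currency: 'usd' }), 5_000),
// );
```

With the wrapper inside the breaker call, a slow vendor counts as a failure, so the circuit can open on hangs as well as on hard errors.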

Pattern 2: Stale Cache Fallback

For read operations, serving stale cached data is often better than serving an error. Pricing, product data, user preferences — these change infrequently enough that showing 5-minute-old data is acceptable.

interface CacheEntry<T> {
  data: T;
  cachedAt: number;
  staleAfterMs: number;
}
 
class StaleWhileRevalidate<T> {
  private cache: CacheEntry<T> | null = null;
 
  constructor(private readonly maxStaleAgeMs: number = 300_000) {} // 5 minutes
 
  async get(fetchFn: () => Promise<T>): Promise<{ data: T; stale: boolean }> {
    const now = Date.now();
 
    // Cache is fresh — return immediately
    if (this.cache && now - this.cache.cachedAt < this.cache.staleAfterMs) {
      return { data: this.cache.data, stale: false };
    }
 
    try {
      const data = await fetchFn();
      // Stamp the entry when the fetch completes; entries count as fresh for 60 seconds.
      this.cache = { data, cachedAt: Date.now(), staleAfterMs: 60_000 };
      return { data, stale: false };
    } catch (error) {
      // Fetch failed — return stale data if available and not too old
      if (this.cache && now - this.cache.cachedAt < this.maxStaleAgeMs) {
        return { data: this.cache.data, stale: true };
      }
      throw error; // No usable cache — let the error propagate
    }
  }
}

When serving stale data, consider surfacing it to users. A subtle "Prices last updated 4 minutes ago" is better than silently serving outdated information.
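A small formatter is usually enough to produce that kind of label. The sketch below assumes you expose the entry's `cachedAt` timestamp to the UI alongside the `stale` flag (the class above does not return it as written); `formatStaleness` and its wording are hypothetical.

```typescript
// Hypothetical helper: turns a cache timestamp into a short user-facing label.
// `now` is a parameter so the function is easy to test.
function formatStaleness(cachedAt: number, now: number = Date.now()): string {
  const ageMs = Math.max(0, now - cachedAt); // guard against clock skew
  const minutes = Math.floor(ageMs / 60_000);
  if (minutes < 1) return 'Updated less than a minute ago';
  if (minutes === 1) return 'Last updated 1 minute ago';
  return `Last updated ${minutes} minutes ago`;
}
```

You would typically render this only when `stale` is true, so fresh responses stay uncluttered.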

Pattern 3: Feature Flag Degradation

When a vendor is down, automatically disable the features that depend on it rather than showing errors:

interface FeatureState {
  enabled: boolean;
  degradedAt?: number;
  reason?: string;
}
 
class FeatureFlags {
  private flags: Map<string, FeatureState> = new Map();
 
  degrade(feature: string, reason: string) {
    this.flags.set(feature, {
      enabled: false,
      degradedAt: Date.now(),
      reason,
    });
  }
 
  restore(feature: string) {
    this.flags.set(feature, { enabled: true });
  }
 
  isEnabled(feature: string): boolean {
    return this.flags.get(feature)?.enabled ?? true;
  }
}
 
const flags = new FeatureFlags();
 
// When Stripe incident detected
flags.degrade('checkout', 'Payment provider incident in progress');
 
// In your React component
function CheckoutButton() {
  if (!flags.isEnabled('checkout')) {
    return (
      <div className="checkout-degraded">
        Checkout is temporarily unavailable due to a payment provider issue.
        We'll update when it resolves.
      </div>
    );
  }
  return <button>Proceed to checkout</button>;
}

The key is connecting your monitoring to your feature flags. When an incident is detected, degrade the affected features automatically — don't wait for an engineer to do it manually.

Pattern 4: Queue and Retry for Write Operations

For write operations that can tolerate delay, queue them during outages rather than failing:

interface QueuedOperation {
  id: string;
  type: string;
  payload: unknown;
  createdAt: number;
  attempts: number;
}
 
class OutageQueue {
  private queue: QueuedOperation[] = [];
 
  async enqueue(type: string, payload: unknown): Promise<string> {
    const id = crypto.randomUUID();
    this.queue.push({
      id,
      type,
      payload,
      createdAt: Date.now(),
      attempts: 0,
    });
    return id;
  }
 
  async drain(processor: (op: QueuedOperation) => Promise<void>) {
    const pending = [...this.queue];
    this.queue = [];
 
    for (const op of pending) {
      try {
        await processor(op);
      } catch (error) {
        op.attempts++;
        if (op.attempts < 3) {
          this.queue.push(op); // Re-queue for retry
        }
        // After 3 attempts, drop or dead-letter
      }
    }
  }
}

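This pattern works well for analytics events, audit logs, and other non-transactional writes. It doesn't work for payments or auth, because users can't wait for those. Keep in mind that the queue above lives in memory, so anything still queued is lost if the process restarts; back it with durable storage if the writes matter.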
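The queue above retries as soon as `drain` is called again. If you schedule drains yourself, spacing them out avoids hammering a vendor that is still recovering. Here is a minimal sketch of exponential backoff with full jitter; the function name and the default base and cap values are assumptions chosen for illustration.

```typescript
// Exponential backoff with "full jitter": the delay is drawn uniformly from
// [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for testing.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1_000,
  capMs: number = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Usage idea (not wired into OutageQueue above): wait backoffDelayMs(op.attempts)
// milliseconds before the next drain that includes a re-queued operation.
```

The jitter matters as much as the exponent: without it, every client that failed at the same moment retries at the same moment too.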

Pattern 5: Multi-Provider Failover

For payment processing and transactional email, having a secondary provider is the most robust fallback. The tradeoff is cost and integration maintenance.

// Placeholder sender address; use one that is verified with both providers.
const FROM_ADDRESS = 'noreply@example.com';

async function sendEmail(to: string, subject: string, html: string) {
  let primaryError: unknown;

  // Try primary provider
  try {
    return await sendgrid.send({ to, from: FROM_ADDRESS, subject, html });
  } catch (error) {
    primaryError = error;
    console.error('SendGrid failed, falling back to SES:', error);
  }

  // Fall back to secondary
  try {
    return await ses.sendEmail({
      Destination: { ToAddresses: [to] },
      Message: {
        Subject: { Data: subject },
        Body: { Html: { Data: html } },
      },
      Source: FROM_ADDRESS,
    });
  } catch (secondaryError) {
    // Both failed: queue for later or alert, and keep both errors for debugging
    throw new Error(
      `All email providers failed: primary=${primaryError}; secondary=${secondaryError}`,
    );
  }
}

Multi-provider failover requires maintaining integrations, credentials, and feature parity across providers. It's worth it for email (one integration, few features) but harder to justify for payments (significant integration complexity, regulatory considerations).
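If you end up with more than two providers, or want the same behavior for more than email, the try-then-fall-back logic can be pulled into a small generic helper. This is a sketch under assumptions: the `Provider` interface and `sendWithFailover` are hypothetical names, and each real SDK call would need a thin adapter to fit the `send` signature.

```typescript
interface Provider<TInput, TResult> {
  name: string;
  send: (input: TInput) => Promise<TResult>;
}

// Tries providers in order and returns the first success along with the name
// of the provider that handled it. If all fail, throws one error listing each failure.
async function sendWithFailover<TInput, TResult>(
  providers: Provider<TInput, TResult>[],
  input: TInput,
): Promise<{ provider: string; result: TResult }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const result = await provider.send(input);
      return { provider: provider.name, result };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}
```

Returning the provider name lets you log or count how often the secondary is actually carrying traffic, which is a useful signal in its own right.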

Connecting Monitoring to Fallback Activation

The gap between incident detection and fallback activation is where outages hurt most. Your monitoring and your fallback logic need to be connected:

// When your monitoring detects an incident
async function onVendorIncident(vendor: string, severity: string) {
  if (vendor === 'stripe' && severity === 'critical') {
    // Activate payment fallbacks
    flags.degrade('checkout', 'Payment provider incident');
    flags.degrade('subscription-changes', 'Payment provider incident');
 
    // Alert on-call
    await pagerduty.triggerIncident({
      title: `Stripe incident detected — checkout degraded`,
      severity: 'high',
    });
  }
}
 
// When the incident resolves
async function onVendorRecovery(vendor: string) {
  if (vendor === 'stripe') {
    flags.restore('checkout');
    flags.restore('subscription-changes');
  }
}

This loop — detect, degrade, recover, restore — should be automatic. Manual intervention adds minutes of delay during incidents when every minute matters.
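When the signal comes from your own health checks rather than a vendor alert, a little hysteresis keeps the loop from flapping between degraded and restored. The sketch below is one way to do that; `VendorHealthTracker` and its default thresholds are assumptions for illustration, and the callbacks are where `flags.degrade` and `flags.restore` from earlier would plug in.

```typescript
// Hypothetical helper: turns a stream of health-check results into
// degrade/restore callbacks, with thresholds to avoid flapping.
class VendorHealthTracker {
  private degraded = false;
  private consecutiveFailures = 0;
  private consecutiveSuccesses = 0;

  constructor(
    private readonly onDegrade: () => void,
    private readonly onRestore: () => void,
    private readonly failureThreshold: number = 3,
    private readonly recoveryThreshold: number = 5,
  ) {}

  // Record the outcome of one health check or one real request.
  record(ok: boolean): void {
    if (ok) {
      this.consecutiveSuccesses++;
      this.consecutiveFailures = 0;
      if (this.degraded && this.consecutiveSuccesses >= this.recoveryThreshold) {
        this.degraded = false;
        this.onRestore();
      }
    } else {
      this.consecutiveFailures++;
      this.consecutiveSuccesses = 0;
      if (!this.degraded && this.consecutiveFailures >= this.failureThreshold) {
        this.degraded = true;
        this.onDegrade();
      }
    }
  }

  isDegraded(): boolean {
    return this.degraded;
  }
}

// Usage idea with the flags object from Pattern 3:
// const stripeHealth = new VendorHealthTracker(
//   () => flags.degrade('checkout', 'Payment provider incident'),
//   () => flags.restore('checkout'),
// );
```

Requiring more consecutive successes to restore than failures to degrade is a deliberate bias: re-enabling checkout a little late costs less than re-enabling it into a vendor that is still failing.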

Where Statusfield Fits

The trigger for all of this is early incident detection. The faster you know a vendor is degraded, the faster you can activate fallbacks and communicate with users.

Statusfield monitors official vendor status pages continuously and delivers alerts the moment an incident is posted. When Stripe posts a payment intents degradation, Statusfield routes the alert to your on-call channel — before your error rates have climbed enough to trigger your internal monitors. That lead time is what lets you activate fallbacks before users encounter errors.

The pattern works best when your monitoring, your feature flags, and your incident communication are all connected. Statusfield handles the detection layer; your fallback patterns handle the response.
