What is a circuit breaker pattern and when should I use it?

A circuit breaker wraps API calls and stops trying after repeated failures, failing fast instead of waiting for timeouts. Use it for any third-party API in your critical path. It prevents cascading failures and protects your users from long waits during vendor incidents. It is not a fallback by itself — it is fast failure, which you combine with other patterns like caching or feature degradation.

Should I build multi-provider failover for my payment processor?

It depends on your outage tolerance and integration capacity. Multi-provider failover for payments adds significant complexity: two integrations to maintain, regulatory considerations, and feature parity challenges. For most SaaS apps, the better first investment is a hold queue that defers checkout attempts during outages rather than full failover to a secondary processor.
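A hold queue can be much simpler than a second processor integration. The sketch below is a minimal, in-memory illustration with hypothetical names (`CheckoutHoldQueue`, `ChargeFn`). It records which orders are waiting and charges them once the processor recovers. A production version would persist held orders and notify customers whose deferred charge fails.

```typescript
// Minimal checkout hold queue sketch. All names here are hypothetical.
// Store only references (order ID, customer, amount); never raw card data.
interface HeldCheckout {
  orderId: string;
  customerId: string;
  amountCents: number;
  currency: string;
  heldAt: number;
}

// Your function that charges one held order against the processor.
type ChargeFn = (checkout: HeldCheckout) => Promise<void>;

class CheckoutHoldQueue {
  // In-memory for illustration; persist this in production so holds survive restarts.
  private held = new Map<string, HeldCheckout>();

  // Records the order and returns the message to show the customer.
  hold(checkout: Omit<HeldCheckout, 'heldAt'>): string {
    // Keyed by order ID, so holding the same order twice does not duplicate it.
    this.held.set(checkout.orderId, { ...checkout, heldAt: Date.now() });
    return 'Your order is saved. We will complete payment once our payment provider recovers.';
  }

  // Call once the processor is healthy again. If your processor supports
  // idempotency keys, derive one from orderId so a retried release cannot double-charge.
  async release(charge: ChargeFn): Promise<{ charged: string[]; failed: string[] }> {
    const charged: string[] = [];
    const failed: string[] = [];
    for (const checkout of Array.from(this.held.values())) {
      try {
        await charge(checkout);
        this.held.delete(checkout.orderId);
        charged.push(checkout.orderId);
      } catch {
        failed.push(checkout.orderId); // stays held for the next release attempt
      }
    }
    return { charged, failed };
  }

  get size(): number {
    return this.held.size;
  }
}
```

The customer sees a clear "pending" state instead of an error, and the same recovery signal that restores feature flags can trigger `release`.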

Is it safe to serve stale cached data during a vendor outage?

For read-only data that changes infrequently — pricing, product catalog, user preferences — yes, stale data is almost always better than an error. Set a maximum stale age (typically 5–30 minutes depending on how frequently the data changes) and surface the staleness to users when it matters. Never serve stale data for security-sensitive reads like authentication tokens or permission checks.
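A per-data-type policy makes those limits explicit. In this sketch the data kinds and the specific limits are assumptions picked from the 5 to 30 minute range above (substitute your own), and security-sensitive reads are marked so they never fall back to cache.

```typescript
// Illustrative staleness policy; the kinds and limits are assumptions to tune.
type DataKind = 'pricing' | 'catalog' | 'preferences' | 'authToken' | 'permissions';

type StalePolicy =
  | { kind: 'serve-stale'; maxStaleMs: number }
  | { kind: 'never-stale' };

const MINUTE = 60_000;

const stalePolicies: Record<DataKind, StalePolicy> = {
  pricing: { kind: 'serve-stale', maxStaleMs: 5 * MINUTE }, // changes more often
  catalog: { kind: 'serve-stale', maxStaleMs: 30 * MINUTE },
  preferences: { kind: 'serve-stale', maxStaleMs: 30 * MINUTE },
  authToken: { kind: 'never-stale' }, // security-sensitive
  permissions: { kind: 'never-stale' }, // security-sensitive
};

// Decides whether a cached value of the given age may be served during an outage.
function canServeStale(kind: DataKind, ageMs: number): boolean {
  const policy = stalePolicies[kind];
  // Security-sensitive reads never fall back to cached data.
  if (policy.kind === 'never-stale') return false;
  return ageMs <= policy.maxStaleMs;
}
```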

How do I automatically activate fallbacks when a vendor incident is detected?

Connect your vendor monitoring alerts to your feature flag system. When an incident is detected for a critical vendor, trigger an automated degradation event that disables the affected features and shows users a clear status message. When the incident resolves, restore the flags automatically. The key is removing the manual step — waiting for an engineer to notice and act adds minutes of unnecessary user impact.

What should I show users when a third-party feature is degraded?

Be direct and specific. 'Checkout is temporarily unavailable due to a payment provider issue. We're monitoring and will update as soon as it resolves.' is better than a generic error. Avoid blaming the vendor by name if you are not certain of the cause. Update the message when you have a resolution time. Remove it promptly when the issue resolves.
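If the message is generated from incident data, a small builder keeps the wording consistent. In this hypothetical sketch, a cause is only included when you pass one (so you don't name a vendor you aren't sure of), and the follow-up sentence switches once a resolution time is known.

```typescript
// Hypothetical inputs supplied by your incident tooling.
interface DegradationNotice {
  feature: string; // e.g. "Checkout"
  cause?: string; // set only when the cause is confirmed, e.g. "a payment provider issue"
  eta?: string; // set only once you have a resolution estimate
}

// Builds the user-facing degradation message.
function degradationMessage({ feature, cause, eta }: DegradationNotice): string {
  const reason = cause ? ` due to ${cause}` : '';
  const followUp = eta
    ? ` We expect it to be back by ${eta}.`
    : " We're monitoring and will update as soon as it resolves.";
  return `${feature} is temporarily unavailable${reason}.${followUp}`;
}
```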

How to Build Reliable Fallbacks for Third-Party API Failures

A third-party API failure doesn't have to mean your app fails. Depending on the vendor and the feature, you have options — degraded modes, cached responses, alternative providers, or graceful error states. Most teams don't implement these until after a painful outage teaches them they should.

This guide covers the practical patterns for building fallbacks before you need them.

Classify Your Third-Party Dependencies First

Not all vendor failures have the same impact. Before designing fallbacks, classify each dependency:

Dependency type	Example	Fallback viability
Critical-blocking	Payment processor	Hard (need alternative provider or hold queue)
Critical-degraded	Authentication	Medium (cache sessions, allow read-only)
Non-critical	Analytics	Easy (drop silently)
Enhancement	AI features	Easy (show original content, skip enrichment)

Start with the critical-blocking category. These are the failures that cause complete feature loss. Everything else matters less.
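One way to keep this classification in code rather than only in a document is a small typed registry. The sketch below uses hypothetical dependency names and orders them so critical-blocking dependencies come first when you plan fallback work.

```typescript
// The four tiers from the table above.
type DependencyTier = 'critical-blocking' | 'critical-degraded' | 'non-critical' | 'enhancement';

interface Dependency {
  name: string;
  tier: DependencyTier;
  fallback: string; // planned fallback, in plain words
}

// Lower number = design its fallback first.
const tierPriority: Record<DependencyTier, number> = {
  'critical-blocking': 0,
  'critical-degraded': 1,
  'non-critical': 2,
  enhancement: 3,
};

// Returns a copy sorted by tier, leaving the caller's array untouched.
function fallbackWorkOrder(deps: Dependency[]): Dependency[] {
  return [...deps].sort((a, b) => tierPriority[a.tier] - tierPriority[b.tier]);
}

// Hypothetical registry mirroring the table.
const dependencies: Dependency[] = [
  { name: 'analytics', tier: 'non-critical', fallback: 'drop events silently' },
  { name: 'ai-enrichment', tier: 'enhancement', fallback: 'show original content' },
  { name: 'payments', tier: 'critical-blocking', fallback: 'hold queue or secondary provider' },
  { name: 'auth', tier: 'critical-degraded', fallback: 'cached sessions, read-only mode' },
];
```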

Pattern 1: Circuit Breaker

A circuit breaker wraps API calls and stops trying after repeated failures, preventing cascading load and giving the vendor time to recover.

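import Stripe from 'stripe';

type CircuitState = 'closed' | 'open' | 'half-open';

// Thrown without calling the vendor while the circuit is open, so callers can
// tell "we skipped the call" apart from "the vendor returned an error".
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit open: upstream unavailable');
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failureCount = 0;
  private lastFailureTime = 0;
  private probeInFlight = false;

  constructor(
    private readonly threshold: number = 5, // consecutive failures before opening
    private readonly resetTimeoutMs: number = 30_000, // time to stay open before probing
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime < this.resetTimeoutMs) {
        throw new CircuitOpenError();
      }
      this.state = 'half-open';
    }

    // Half-open: allow a single probe request; reject everything else until it settles.
    const isProbe = this.state === 'half-open';
    if (isProbe) {
      if (this.probeInFlight) throw new CircuitOpenError();
      this.probeInFlight = true;
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    } finally {
      if (isProbe) this.probeInFlight = false;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed probe reopens the circuit immediately.
    if (this.state === 'half-open' || this.failureCount >= this.threshold) {
      this.state = 'open';
    }
  }
}

// Usage (assumes STRIPE_SECRET_KEY is set in the environment)
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const stripeBreaker = new CircuitBreaker(5, 30_000);

async function createPaymentIntent(amount: number) {
  // amount is in the smallest currency unit, e.g. cents for USD
  return stripeBreaker.call(() =>
    stripe.paymentIntents.create({ amount, currency: 'usd' }),
  );
}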

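The circuit breaker doesn't provide a fallback; it provides fast failure. When the circuit is open, calls fail immediately instead of waiting on a request that may take many seconds to time out. This protects your users from long waits and reduces retry storms during incidents. One caveat: the breaker only counts calls that reject, so a request that hangs never registers as a failure. Give every wrapped call a timeout, either through the vendor SDK's own timeout option or a wrapper.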
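A minimal timeout wrapper, built only on `Promise.race` and `setTimeout`, makes slow calls reject so they count as failures. `withTimeout` and `TimeoutError` are illustrative names, not part of any library.

```typescript
// Distinguishes "we gave up waiting" from errors the vendor returned.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

// Rejects with TimeoutError if `promise` has not settled within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Clear the timer whichever promise wins so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

With the breaker above, a call might look like `stripeBreaker.call(() => withTimeout(stripe.paymentIntents.create({ amount, currency: 'usd' }), 10_000))`, where the 10-second limit is an arbitrary example. `Promise.race` only stops waiting; it does not cancel the underlying HTTP request, so prefer the SDK's own timeout setting or an `AbortSignal` where the client supports one.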

Pattern 2: Stale Cache Fallback

For read operations, serving stale cached data is often better than serving an error. Pricing, product data, user preferences — these change infrequently enough that showing 5-minute-old data is acceptable.

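interface CacheEntry<T> {
  data: T;
  cachedAt: number;
}

// This is the "stale-if-error" pattern: serve fresh data when the fetch works and
// fall back to the last good value only when it fails. (True stale-while-revalidate
// would return stale data immediately and refresh in the background.)
// One instance caches a single value; create one per resource.
class StaleIfError<T> {
  private cache: CacheEntry<T> | null = null;

  constructor(
    private readonly freshForMs: number = 60_000, // 1 minute: skip the fetch entirely
    private readonly maxStaleAgeMs: number = 300_000, // 5 minutes: oldest data served on error
  ) {}

  async get(fetchFn: () => Promise<T>): Promise<{ data: T; stale: boolean; cachedAt: number }> {
    // Cache is fresh: return immediately without calling the vendor
    if (this.cache && Date.now() - this.cache.cachedAt < this.freshForMs) {
      return { data: this.cache.data, stale: false, cachedAt: this.cache.cachedAt };
    }

    try {
      const data = await fetchFn();
      this.cache = { data, cachedAt: Date.now() };
      return { data, stale: false, cachedAt: this.cache.cachedAt };
    } catch (error) {
      // Fetch failed: return stale data if available and not too old
      if (this.cache && Date.now() - this.cache.cachedAt < this.maxStaleAgeMs) {
        return { data: this.cache.data, stale: true, cachedAt: this.cache.cachedAt };
      }
      throw error; // No usable cache: let the error propagate
    }
  }
}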

When serving stale data, consider surfacing it to users. A subtle "Prices last updated 4 minutes ago" is better than silently serving outdated information.
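A minimal sketch of that label, assuming you have the timestamp at which the cached value was fetched. `stalenessLabel` is a hypothetical helper, and "Prices" is just the example noun from the sentence above.

```typescript
// Formats a human-readable freshness label from a cache timestamp (epoch ms).
function stalenessLabel(subject: string, cachedAt: number, now: number = Date.now()): string {
  const minutes = Math.floor((now - cachedAt) / 60_000);
  if (minutes < 1) return `${subject} last updated less than a minute ago`;
  const unit = minutes === 1 ? 'minute' : 'minutes';
  return `${subject} last updated ${minutes} ${unit} ago`;
}
```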

Pattern 3: Feature Flag Degradation

When a vendor is down, automatically disable the features that depend on it rather than showing errors:

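interface FeatureState {
  enabled: boolean;
  degradedAt?: number;
  reason?: string;
}

// In-memory and process-local: fine for a single server, but in a multi-instance
// deployment keep this state in a shared store or your feature flag service so
// every instance (and every client) sees the same value.
class FeatureFlags {
  private flags: Map<string, FeatureState> = new Map();

  degrade(feature: string, reason: string) {
    this.flags.set(feature, {
      enabled: false,
      degradedAt: Date.now(),
      reason,
    });
  }

  restore(feature: string) {
    this.flags.set(feature, { enabled: true });
  }

  // Features are enabled unless they have been explicitly degraded.
  isEnabled(feature: string): boolean {
    return this.flags.get(feature)?.enabled ?? true;
  }

  // When and why a feature was degraded, e.g. for an internal status dashboard.
  getState(feature: string): FeatureState | undefined {
    return this.flags.get(feature);
  }
}

const flags = new FeatureFlags();

// When a Stripe incident is detected
flags.degrade('checkout', 'Payment provider incident in progress');

// In your React component. Reading a plain module-level object will not re-render
// the component when the flag changes; in a real app, read the flag through your
// flag service's client SDK or a store the component subscribes to.
function CheckoutButton() {
  if (!flags.isEnabled('checkout')) {
    return (
      <div className="checkout-degraded">
        Checkout is temporarily unavailable due to a payment provider issue.
        We'll update when it resolves.
      </div>
    );
  }
  return <button>Proceed to checkout</button>;
}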

The key is connecting your monitoring to your feature flags. When an incident is detected, degrade the affected features automatically — don't wait for an engineer to do it manually.
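The `FeatureFlags` class has no way to tell a component that a flag changed. One option, sketched here under the assumption of a single client-side store, is to notify subscribers on every change. `ObservableFlags` and its methods are illustrative names, not a library API.

```typescript
type Listener = () => void;

// Flag store that notifies subscribers whenever a feature is degraded or restored.
class ObservableFlags {
  private disabled = new Map<string, string>(); // feature -> reason
  private listeners = new Set<Listener>();

  // Returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  degrade(feature: string, reason: string): void {
    this.disabled.set(feature, reason);
    this.notify();
  }

  restore(feature: string): void {
    // Only notify if the feature was actually degraded.
    if (this.disabled.delete(feature)) this.notify();
  }

  isEnabled(feature: string): boolean {
    return !this.disabled.has(feature);
  }

  reason(feature: string): string | undefined {
    return this.disabled.get(feature);
  }

  private notify(): void {
    this.listeners.forEach((listener) => listener());
  }
}
```

In React 18 or later, a component could read it with `useSyncExternalStore(subscribe, () => store.isEnabled('checkout'))`, where `subscribe` is a module-level `(onChange) => store.subscribe(onChange)`; the component then re-renders whenever `degrade` or `restore` runs. Getting flag state from your servers into this store (polling, server-sent events, or your flag vendor's SDK) is a separate concern.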

Pattern 4: Queue and Retry for Write Operations

For write operations that can tolerate delay, queue them during outages rather than failing:

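import { randomUUID } from 'node:crypto';

interface QueuedOperation {
  id: string;
  type: string;
  payload: unknown;
  createdAt: number;
  attempts: number;
}

const MAX_ATTEMPTS = 3;

// In-memory for illustration: anything queued is lost if the process restarts.
// Back this with a durable store (a database table, Redis, SQS) for audit logs
// or anything else you can't afford to lose.
class OutageQueue {
  private queue: QueuedOperation[] = [];
  readonly deadLetter: QueuedOperation[] = [];

  // async so a durable store can be swapped in without changing callers
  async enqueue(type: string, payload: unknown): Promise<string> {
    const id = randomUUID();
    this.queue.push({
      id,
      type,
      payload,
      createdAt: Date.now(),
      attempts: 0,
    });
    return id;
  }

  // Call after the vendor recovers (or on a timer). Operations enqueued while
  // draining land in the fresh queue and are picked up on the next drain.
  async drain(processor: (op: QueuedOperation) => Promise<void>) {
    const pending = this.queue;
    this.queue = [];

    for (const op of pending) {
      try {
        await processor(op);
      } catch {
        op.attempts++;
        if (op.attempts < MAX_ATTEMPTS) {
          this.queue.push(op); // Re-queue for the next drain
        } else {
          this.deadLetter.push(op); // Keep for inspection instead of dropping silently
        }
      }
    }
  }
}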

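This pattern works well for analytics events, audit logs, and other non-transactional writes. It does not fit payments or auth as a silent background retry, because the user is waiting on the result. A checkout hold queue only works when the user is told the order is pending and will be charged once the provider recovers.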
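`drain` retries immediately, which during an ongoing outage just hits the vendor again. A common refinement is exponential backoff with full jitter between drain attempts. The sketch below assumes a `drain` callback that resolves `true` once nothing is left pending (for `OutageQueue`, that would mean also exposing the queue length); the function names and default delays are assumptions.

```typescript
// Exponential backoff with "full jitter": a random delay between 0 and
// min(capMs, baseMs * 2^attempt). The randomness spreads out retries from many workers.
function backoffDelayMs(
  attempt: number,
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical drain loop: `drain` flushes the queue and resolves true when it is empty.
async function drainWithBackoff(
  drain: () => Promise<boolean>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await drain()) return true;
    await sleep(backoffDelayMs(attempt));
  }
  return false; // still pending; alert or leave it for the next scheduled run
}
```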

Pattern 5: Multi-Provider Failover

For payment processing and transactional email, having a secondary provider is the most robust fallback. The tradeoff is cost and integration maintenance.

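import sgMail from '@sendgrid/mail';
import { SESClient, SendEmailCommand } from '@aws-sdk/client-ses';

sgMail.setApiKey(process.env.SENDGRID_API_KEY!);
const ses = new SESClient({}); // region and credentials come from the standard AWS environment

// Must be a sender address verified with both providers.
const FROM_ADDRESS = process.env.EMAIL_FROM!;

async function sendEmail(to: string, subject: string, html: string) {
  // Try primary provider
  try {
    return await sgMail.send({ to, from: FROM_ADDRESS, subject, html });
  } catch (primaryError) {
    // Caution: if SendGrid accepted the message before the error (e.g. a timeout),
    // failing over can deliver it twice.
    console.error('SendGrid failed, falling back to SES:', primaryError);
  }

  // Fall back to secondary
  try {
    return await ses.send(
      new SendEmailCommand({
        Destination: { ToAddresses: [to] },
        Message: {
          Subject: { Data: subject },
          Body: { Html: { Data: html } },
        },
        Source: FROM_ADDRESS,
      }),
    );
  } catch (secondaryError) {
    // Both failed: queue for later or alert
    throw new Error(`All email providers failed: ${String(secondaryError)}`);
  }
}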

Multi-provider failover requires maintaining integrations, credentials, and feature parity across providers. It's usually worth it for email, where providers expose small, similar APIs, but harder to justify for payments (significant integration complexity, regulatory considerations).
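The two-provider `sendEmail` above fails over on every error, including permanent ones such as an invalid recipient, which will fail on the secondary as well. A generic version, sketched here with hypothetical names, walks an ordered provider list and fails over only when a caller-supplied `isRetryable` check says the error is transient.

```typescript
// One provider in the failover chain; `send` wraps that provider's SDK call.
interface Provider<TArgs, TResult> {
  name: string;
  send: (args: TArgs) => Promise<TResult>;
}

async function withFailover<TArgs, TResult>(
  providers: Provider<TArgs, TResult>[],
  args: TArgs,
  isRetryable: (error: unknown) => boolean, // your per-SDK transient-error check
): Promise<{ provider: string; result: TResult }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, result: await provider.send(args) };
    } catch (error) {
      errors.push(`${provider.name}: ${String(error)}`);
      // A permanent error (bad address, rejected content) would fail on every
      // provider, so stop instead of failing over.
      if (!isRetryable(error)) break;
    }
  }
  throw new Error(`Delivery failed (${errors.join('; ')})`);
}
```

Failing over after a timeout carries a duplicate risk: the primary may have accepted the request before the connection dropped. For email that usually means an occasional duplicate; for payments it is one more reason failover is harder to get right.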

Connecting Monitoring to Fallback Activation

The gap between incident detection and fallback activation is where outages hurt most. Your monitoring and your fallback logic need to be connected:

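// Sends an event to the PagerDuty Events API v2. PAGERDUTY_ROUTING_KEY is the
// integration key of the service that should be paged.
async function sendPagerDutyEvent(action: 'trigger' | 'resolve', dedupKey: string, summary: string) {
  const res = await fetch('https://events.pagerduty.com/v2/enqueue', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      routing_key: process.env.PAGERDUTY_ROUTING_KEY,
      event_action: action,
      dedup_key: dedupKey, // the same key on trigger and resolve closes the same alert
      // payload is required for trigger events; undefined is dropped by JSON.stringify
      payload: action === 'trigger' ? { summary, source: 'vendor-monitor', severity: 'critical' } : undefined,
    }),
  });
  if (!res.ok) console.error('PagerDuty event failed:', res.status);
}

// When your monitoring detects an incident
async function onVendorIncident(vendor: string, severity: string) {
  if (vendor === 'stripe' && severity === 'critical') {
    // Activate payment fallbacks
    flags.degrade('checkout', 'Payment provider incident');
    flags.degrade('subscription-changes', 'Payment provider incident');

    // Alert on-call
    await sendPagerDutyEvent('trigger', 'vendor-stripe', 'Stripe incident detected, checkout degraded');
  }
}

// When the incident resolves
async function onVendorRecovery(vendor: string) {
  if (vendor === 'stripe') {
    flags.restore('checkout');
    flags.restore('subscription-changes');
    await sendPagerDutyEvent('resolve', 'vendor-stripe', 'Stripe incident resolved');
  }
}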

This loop — detect, degrade, recover, restore — should be automatic. Manual intervention adds minutes of delay during incidents when every minute matters.
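The handlers above hard-code one vendor. A data-driven variant, sketched below with hypothetical names, maps each vendor to the features it affects and restores a feature only once every vendor incident touching it has cleared, so one vendor's recovery can't re-enable checkout while another of checkout's dependencies is still down. `FlagSink` matches the `degrade`/`restore` shape of the `FeatureFlags` class.

```typescript
// Anything with degrade/restore, such as the FeatureFlags class.
interface FlagSink {
  degrade(feature: string, reason: string): void;
  restore(feature: string): void;
}

class DegradationController {
  // feature -> vendors whose incidents currently affect it
  private activeVendors = new Map<string, Set<string>>();

  constructor(
    private readonly flags: FlagSink,
    private readonly featuresByVendor: Record<string, string[]>,
  ) {}

  onIncident(vendor: string): void {
    for (const feature of this.featuresByVendor[vendor] ?? []) {
      const vendors = this.activeVendors.get(feature) ?? new Set<string>();
      vendors.add(vendor);
      this.activeVendors.set(feature, vendors);
      this.flags.degrade(feature, `${vendor} incident in progress`); // internal reason
    }
  }

  onRecovery(vendor: string): void {
    for (const feature of this.featuresByVendor[vendor] ?? []) {
      const vendors = this.activeVendors.get(feature);
      if (!vendors) continue;
      vendors.delete(vendor);
      // Restore only when no other vendor incident still affects this feature.
      if (vendors.size === 0) {
        this.activeVendors.delete(feature);
        this.flags.restore(feature);
      }
    }
  }
}
```

`new DegradationController(flags, { stripe: ['checkout', 'subscription-changes'] })` would reproduce the Stripe example, with paging left in the monitoring handler.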

Where Statusfield Fits

The trigger for all of this is early incident detection. The faster you know a vendor is degraded, the faster you can activate fallbacks and communicate with users.

Statusfield monitors official vendor status pages continuously and delivers alerts the moment an incident is posted. When Stripe posts a Payment Intents degradation, Statusfield routes the alert to your on-call channel, often before your error rates have climbed enough to trigger your internal monitors. That lead time is what lets you activate fallbacks before users encounter errors.

The pattern works best when your monitoring, your feature flags, and your incident communication are all connected. Statusfield handles the detection layer; your fallback patterns handle the response.