🎉 svc-infra v1.0 is here — Production-ready backend infrastructure

© 2026 nfrax. All rights reserved.


Resilience Patterns


This guide covers retry logic and circuit breakers for building robust services that handle failures gracefully. Request and handler timeouts are covered in Timeouts & Resource Limits.

Why Resilience?

  • Retry logic handles transient failures (network blips, temporary unavailability)
  • Circuit breakers prevent cascading failures when dependencies are down
  • Timeouts ensure operations don't hang indefinitely

Quick Start

python
from svc_infra.resilience import CircuitBreaker, with_retry

# Retry with exponential backoff
@with_retry(max_attempts=3, base_delay=0.1)
async def fetch_data():
    return await external_api.get("/data")  # external_api: your async client

# Circuit breaker for failing dependencies
breaker = CircuitBreaker("payment-service", failure_threshold=5)

async def charge(amount: int):
    # `async with` must run inside a coroutine
    async with breaker:
        return await payment_service.charge(amount)

Retry with Exponential Backoff

The with_retry decorator automatically retries failed async operations with configurable backoff.

Basic Usage

python
from svc_infra.resilience import with_retry

@with_retry(max_attempts=3)
async def fetch_user(user_id: str):
    return await api.get(f"/users/{user_id}")

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| max_attempts | 3 | Total attempts, including the first try |
| base_delay | 0.1 | Initial delay in seconds |
| max_delay | 60.0 | Maximum delay cap in seconds |
| exponential_base | 2.0 | Backoff multiplier |
| jitter | 0.1 | Random jitter factor (0.0-1.0) |
| retry_on | (Exception,) | Exception types to retry |
| on_retry | None | Callback (attempt, exception) -> None |
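
The table does not state how base_delay, exponential_base, max_delay, and jitter combine. A common scheme, assumed here rather than taken from svc-infra, multiplies base_delay by exponential_base for each retry, caps the result at max_delay, and then applies a random plus-or-minus jitter fraction. backoff_delay is a hypothetical helper that makes the arithmetic concrete:

```python
import random
from typing import Optional


def backoff_delay(
    retry_number: int,
    base_delay: float = 0.1,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before the given retry (retry_number starts at 1).

    Assumed scheme, not svc-infra's code: exponential growth capped at
    max_delay, then scaled by a random factor in [1 - jitter, 1 + jitter].
    """
    rng = rng or random.Random()
    capped = min(base_delay * exponential_base ** (retry_number - 1), max_delay)
    # Jitter spreads out retries from many clients so they don't fire in lockstep.
    return capped * (1 + rng.uniform(-jitter, jitter))


# With jitter disabled the schedule is deterministic.
print([round(backoff_delay(n, jitter=0.0), 3) for n in range(1, 5)])
# prints [0.1, 0.2, 0.4, 0.8]
```

Under this assumption the 60 s cap only takes effect from the eleventh retry onward (0.1 × 2^10 = 102.4 s), so with the default max_attempts=3 the delays stay well below it.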

Retry Only on Specific Exceptions

python
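import asyncio

from svc_infra.resilience import with_retry

@with_retry(
    max_attempts=5,
    retry_on=(TimeoutError, ConnectionError),
)
async def connect_to_service():
    # Connection failures raise ConnectionError subclasses (e.g. ConnectionRefusedError)
    # and are retried; any other exception propagates immediately.
    # "service.internal" and 8080 are placeholder host/port values.
    return await asyncio.open_connection("service.internal", 8080)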
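
To show what retry_on changes, here is a simplified stand-in for an async retry decorator. It is not svc-infra's implementation: simple_retry and GaveUp are hypothetical names, jitter and max_delay are left out, and it assumes (as the options table suggests) that exceptions outside retry_on propagate immediately while matching ones are retried until max_attempts is reached.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class GaveUp(Exception):
    """Hypothetical stand-in for RetryExhaustedError."""

    def __init__(self, attempts: int, last_exception: BaseException) -> None:
        super().__init__(f"gave up after {attempts} attempts: {last_exception!r}")
        self.attempts = attempts
        self.last_exception = last_exception


def simple_retry(
    max_attempts: int = 3,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on as exc:  # types outside retry_on propagate at once
                    if attempt == max_attempts:
                        raise GaveUp(attempt, exc) from exc
                    # Plain exponential backoff; jitter and max_delay omitted.
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            raise AssertionError("unreachable")

        return wrapper

    return decorator


calls = 0


@simple_retry(max_attempts=3, base_delay=0.0, retry_on=(TimeoutError, ConnectionError))
async def flaky() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("connection reset")  # transient: retried
    return "ok"


print(asyncio.run(flaky()), calls)  # prints: ok 3
```

A ValueError raised from the same decorated function would surface on the first attempt, which is the behavior you want for validation errors.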

Retry Callback for Logging/Metrics

python
import logging

from svc_infra.resilience import with_retry

logger = logging.getLogger(__name__)

def on_retry(attempt: int, exc: Exception) -> None:
    logger.warning("Attempt %d failed: %s", attempt, exc)
    # metrics and api stand in for your own metrics client and API client
    metrics.increment("api.retry", tags={"attempt": str(attempt)})

@with_retry(max_attempts=3, on_retry=on_retry)
async def fetch_data():
    return await api.get("/data")

Sync Function Retry

For synchronous functions, use retry_sync:

python
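import requests

from svc_infra.resilience import retry_sync

@retry_sync(max_attempts=3, base_delay=0.5)
def fetch_config():
    # requests needs an absolute URL; the host here is a placeholder.
    response = requests.get("https://config.example.com/config", timeout=5)
    response.raise_for_status()
    return response.json()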

Using RetryConfig

For reusable configuration across multiple functions:

python
from svc_infra.resilience import RetryConfig, with_retry

api_retry = RetryConfig(
    max_attempts=5,
    base_delay=0.2,
    max_delay=30.0,
    retry_on=(TimeoutError, ConnectionError),
)

# Use config values in decorator
@with_retry(
    max_attempts=api_retry.max_attempts,
    base_delay=api_retry.base_delay,
    max_delay=api_retry.max_delay,
    retry_on=api_retry.retry_on,
)
async def fetch():
    ...

RetryExhaustedError

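When every attempt fails, with_retry raises RetryExhaustedError, which records the number of attempts and the last exception:

python
import asyncio

from svc_infra.resilience import RetryExhaustedError, with_retry

# ValueError is retried here only because the default retry_on is (Exception,)
@with_retry(max_attempts=3)
async def flaky_operation():
    raise ValueError("Always fails")

async def main():
    try:
        await flaky_operation()
    except RetryExhaustedError as e:
        print(f"Failed after {e.attempts} attempts")
        print(f"Last error: {e.last_exception}")

asyncio.run(main())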

Circuit Breaker

The circuit breaker pattern prevents repeated calls to a failing service, giving it time to recover.

States

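code
    ┌──────────┐  failure_threshold reached  ┌──────────┐
    │  CLOSED  │ ──────────────────────────► │   OPEN   │
    │ (normal) │                             │  (fail)  │
    └──────────┘                             └──────────┘
         ▲                                      ▲    │
         │                              failure │    │ recovery_timeout
         │                                      │    ▼ elapsed
         │   success_threshold reached   ┌─────────────┐
         └────────────────────────────── │  HALF_OPEN  │
                                         │   (test)    │
                                         └─────────────┘

  • CLOSED: Normal operation; calls pass through and failures are counted toward failure_threshold.
  • OPEN: Calls are blocked and CircuitBreakerError is raised immediately until recovery_timeout elapses.
  • HALF_OPEN: A limited number of trial calls (half_open_max_calls) test whether the service has recovered; success_threshold successes close the circuit, and a failure reopens it.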
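
The transitions above can be modeled as a small single-threaded state machine. This is a hypothetical sketch to illustrate the states, not svc-infra's CircuitBreaker: TinyBreaker, allow(), and record() are invented names, the clock is injectable so time can be faked, and two details are assumptions: failures are counted consecutively, and half_open_max_calls is treated as a cap on trial calls in flight.

```python
import enum
import time
from typing import Callable


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class TinyBreaker:
    """Hypothetical, single-threaded model of the CLOSED/OPEN/HALF_OPEN cycle."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
        half_open_max_calls: int = 3,
        success_threshold: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self.success_threshold = success_threshold
        self.clock = clock  # injectable so tests can fake the passage of time
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.in_flight = 0  # trial calls currently running in HALF_OPEN
        self.opened_at = 0.0

    def _enter(self, new_state: State) -> None:
        self.state = new_state
        self.failures = self.successes = self.in_flight = 0
        if new_state is State.OPEN:
            self.opened_at = self.clock()

    def allow(self) -> bool:
        """Return True if a call may proceed now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                return False  # still cooling down: reject immediately
            self._enter(State.HALF_OPEN)
        if self.state is State.HALF_OPEN:
            if self.in_flight >= self.half_open_max_calls:
                return False  # trial budget in use
            self.in_flight += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a call that allow() let through."""
        if self.state is State.HALF_OPEN:
            self.in_flight -= 1
            if not success:
                self._enter(State.OPEN)  # any trial failure reopens the circuit
            else:
                self.successes += 1
                if self.successes >= self.success_threshold:
                    self._enter(State.CLOSED)
        elif success:
            self.failures = 0  # assumption: only consecutive failures count
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._enter(State.OPEN)
```

A caller would check allow() before each call, use a fallback when it returns False, and otherwise report the outcome with record().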

Basic Usage

python
from svc_infra.resilience import CircuitBreaker

breaker = CircuitBreaker(
    name="payment-api",
    failure_threshold=5,      # Open after 5 failures
    recovery_timeout=30.0,    # Wait 30s before trying again
)

async def charge(amount: int):
    # payment_api stands in for your client; `async with` needs a coroutine
    async with breaker:
        return await payment_api.charge(amount)

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| name | "default" | Name for logging/metrics |
| failure_threshold | 5 | Failures before the circuit opens |
| recovery_timeout | 30.0 | Seconds in OPEN before moving to HALF_OPEN |
| half_open_max_calls | 3 | Max calls allowed in HALF_OPEN |
| success_threshold | 2 | Successes in HALF_OPEN needed to close the circuit |
| failure_exceptions | (Exception,) | Exception types that count as failures |

Using as Decorator

python
from svc_infra.resilience import CircuitBreaker

breaker = CircuitBreaker("external-api", failure_threshold=5)

@breaker.protect
async def call_external():
    return await external_api.get("/data")

Checking Circuit State

python
from svc_infra.resilience import CircuitBreaker, CircuitState

breaker = CircuitBreaker("api")

async def get_data():
    if breaker.state == CircuitState.OPEN:
        # Fall back to cache or default
        return cached_value

    # Normal path
    async with breaker:
        return await api.get("/data")

Handling CircuitBreakerError

python
import logging

from svc_infra.resilience import CircuitBreaker, CircuitBreakerError

logger = logging.getLogger(__name__)
breaker = CircuitBreaker("api")

async def call_api():
    try:
        async with breaker:
            return await api.call()
    except CircuitBreakerError as e:
        logger.warning(f"Circuit {e.name} is {e.state.value}")
        if e.remaining_timeout:
            logger.info(f"Retry in {e.remaining_timeout:.1f}s")
        return fallback_value

Circuit Breaker Statistics

python
from svc_infra.resilience import CircuitBreaker

breaker = CircuitBreaker("api")

# After some operations...
stats = breaker.stats

print(f"Total calls: {stats.total_calls}")
print(f"Successful: {stats.successful_calls}")
print(f"Failed: {stats.failed_calls}")
print(f"Rejected: {stats.rejected_calls}")
print(f"State changes: {stats.state_changes}")

Manual Reset

For testing or manual intervention, you can force-reset the circuit:

python
breaker.reset()  # Force circuit to CLOSED state, clear counters

Warning: Use reset() sparingly in production. It's mainly for testing or emergency recovery.

Combining Retry and Circuit Breaker

For robust external API calls, combine both patterns:

python
from svc_infra.resilience import (
    CircuitBreaker,
    CircuitBreakerError,
    RetryExhaustedError,
    with_retry,
)

# Circuit breaker at the service level
api_breaker = CircuitBreaker("external-api", failure_threshold=5)

# Retry for transient failures, but not when circuit is open
@with_retry(
    max_attempts=3,
    retry_on=(TimeoutError, ConnectionError),  # NOT CircuitBreakerError
)
async def fetch_with_retry():
    async with api_breaker:
        return await external_api.get("/data")

async def fetch_data():
    try:
        return await fetch_with_retry()
    except (CircuitBreakerError, RetryExhaustedError):
        # Circuit is open, or every attempt failed: use fallback
        return get_cached_data()
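
Because the retry wraps the breaker, the breaker records every failed attempt, so a single request that exhausts max_attempts=3 adds three failures toward failure_threshold. The synchronous simulation below uses hypothetical stand-ins (CountingBreaker, Open, retry) rather than svc-infra's classes, and it assumes, as the example above does, that an exception outside retry_on is not retried and propagates unchanged.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class Open(Exception):
    """Hypothetical stand-in for CircuitBreakerError."""


class CountingBreaker:
    """Bare-bones breaker: rejects calls once `threshold` failures are recorded."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.failures = 0

    def call(self, fn: Callable[[], T]) -> T:
        if self.failures >= self.threshold:
            raise Open("circuit open")
        try:
            return fn()
        except Exception:
            self.failures += 1  # every failed attempt is counted
            raise


def retry(fn: Callable[[], T], attempts: int, retry_on: Tuple[Type[BaseException], ...]) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
    raise ValueError("attempts must be >= 1")


def always_times_out() -> None:
    raise TimeoutError("upstream timed out")


breaker = CountingBreaker(threshold=5)
outcomes = []
for _ in range(3):
    try:
        # Retry wraps the breaker, as in fetch_with_retry above.
        retry(lambda: breaker.call(always_times_out), attempts=3,
              retry_on=(TimeoutError, ConnectionError))
    except (TimeoutError, Open) as exc:
        outcomes.append((type(exc).__name__, breaker.failures))

print(outcomes)
# prints [('TimeoutError', 3), ('Open', 5), ('Open', 5)]
```

The second request trips the circuit on its third attempt, and that Open error is not retried because it is not in retry_on. When sizing failure_threshold for this layout, it can help to count attempts rather than requests.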

Best Practices

1. Set Appropriate Thresholds

python
# High-traffic, latency-sensitive
breaker = CircuitBreaker(
    failure_threshold=10,     # Need more data points
    recovery_timeout=10.0,    # Recover faster
)

# Low-traffic, can wait
breaker = CircuitBreaker(
    failure_threshold=3,      # Open quickly
    recovery_timeout=60.0,    # Longer recovery
)

2. Use Specific Exception Types

python
# Bad: retries on ALL exceptions (including validation errors)
@with_retry(max_attempts=3)
async def fetch():
    ...

# Good: only retry on transient failures
@with_retry(max_attempts=3, retry_on=(TimeoutError, ConnectionError))
async def fetch():
    ...

3. Add Observability

python
def on_retry(attempt: int, exc: Exception) -> None:
    metrics.increment("retry", tags={"attempt": str(attempt)})
    logger.warning(f"Retry {attempt}: {exc}")

@with_retry(max_attempts=3, on_retry=on_retry)
async def fetch():
    ...

4. Don't Retry Non-Idempotent Operations

python
# Bad: dangerous, may charge multiple times
@with_retry(max_attempts=3)
async def charge_card(amount):
    ...

# Good: use idempotency keys instead
async def charge_card(amount, idempotency_key: str):
    return await payment_api.charge(
        amount=amount,
        idempotency_key=idempotency_key,
    )
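
What matters is when the key is created: once per logical operation, before the first attempt, and reused on every retry. A key generated inside the retried function would change on each attempt and defeat the deduplication. The sketch below is self-contained; FakePaymentAPI and charge_with_retries are hypothetical names, and the fake API simulates a charge whose response is lost on the first two attempts.

```python
import asyncio
import uuid


class FakePaymentAPI:
    """Hypothetical payment API that deduplicates charges by idempotency key."""

    def __init__(self, fail_first: int = 0) -> None:
        self.charges: dict[str, int] = {}
        self.fail_first = fail_first  # number of responses to "lose"

    async def charge(self, amount: int, idempotency_key: str) -> str:
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount  # money moves once per key
        if self.fail_first > 0:
            self.fail_first -= 1
            # The charge went through, but the client never sees the response.
            raise ConnectionError("response lost")
        return f"charge-{idempotency_key[:8]}"


async def charge_with_retries(api: FakePaymentAPI, amount: int, attempts: int = 3) -> str:
    key = str(uuid.uuid4())  # one key for the whole logical operation
    for attempt in range(1, attempts + 1):
        try:
            return await api.charge(amount=amount, idempotency_key=key)
        except ConnectionError:
            if attempt == attempts:
                raise
            await asyncio.sleep(0)  # real code would back off here
    raise AssertionError("unreachable")


api = FakePaymentAPI(fail_first=2)
receipt = asyncio.run(charge_with_retries(api, amount=1999))
print(len(api.charges))  # prints 1: three attempts, one charge
```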

5. Implement Fallbacks

python
from svc_infra.resilience import CircuitBreaker, CircuitBreakerError

breaker = CircuitBreaker("recommendations")

async def get_recommendations(user_id: str):
    try:
        async with breaker:
            return await recommendation_service.get(user_id)
    except CircuitBreakerError:
        # Graceful degradation
        return get_default_recommendations()

API Reference

Retry

| Export | Type | Description |
|---|---|---|
| with_retry | Decorator | Async function retry with backoff |
| retry_sync | Decorator | Sync function retry with backoff |
| RetryConfig | Dataclass | Reusable retry configuration |
| RetryExhaustedError | Exception | Raised when all retries fail |

Circuit Breaker

| Export | Type | Description |
|---|---|---|
| CircuitBreaker | Class | Circuit breaker implementation |
| CircuitBreakerError | Exception | Raised when circuit is open |
| CircuitBreakerStats | Dataclass | Statistics about circuit usage |
| CircuitState | Enum | CLOSED, OPEN, HALF_OPEN |

See Also

  • Timeouts & Resource Limits - Request and handler timeouts
  • Error Handling - Exception hierarchy and patterns
  • Idempotency - Safe retries for non-idempotent operations
