Model Failover automatically switches between LLM providers when one fails, ensuring your agents remain operational even during API outages or rate limits.

Quick Start

1. Configure Auth Profiles

from praisonaiagents import AuthProfile, FailoverManager

# Create profiles for different providers
openai = AuthProfile(
    name="openai",
    provider="openai",
    api_key="sk-...",
    priority=1
)

anthropic = AuthProfile(
    name="anthropic", 
    provider="anthropic",
    api_key="sk-ant-...",
    priority=2
)

2. Set Up Failover Manager

from praisonaiagents import FailoverConfig, FailoverManager

config = FailoverConfig(
    max_retries=3,
    retry_delay=1.0,
    exponential_backoff=True
)

manager = FailoverManager(config)
manager.add_profile(openai)
manager.add_profile(anthropic)

3. Use with Agent


How failover activates during retries

Failover now drives LLM retries through direct integration with the retry mechanism:
  • On every LLM call, the system first gets the current profile via get_next_profile() and applies its api_key, base_url, and model settings
  • On success, mark_success(profile) is called to track the working provider
  • On failure, mark_failure(profile, error, is_rate_limit=...) marks the provider as failed, then get_next_profile() fetches the next available provider
  • A profile switch overrides a non-retryable classification: one extra attempt is always granted after switching providers, even if the error would otherwise end the retry loop
  • The LLM automatically updates request parameters (api_key, base_url, model) when switching between profiles
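
The loop described above can be sketched with small stand-in classes. This is a toy model, not the library's implementation: `Profile`, `ToyManager`, `RateLimited`, `fake_send`, and `call_with_failover` are hypothetical names introduced for illustration, and only the three method names (`get_next_profile`, `mark_success`, `mark_failure`) come from the description. The extra attempt granted after a profile switch is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class RateLimited(Exception):
    """Hypothetical error type standing in for an HTTP 429 response."""


@dataclass
class Profile:
    """Hypothetical stand-in for AuthProfile (not the library class)."""
    name: str
    api_key: str
    model: str
    priority: int
    failed: bool = False


class ToyManager:
    """Hypothetical stand-in exposing the three calls named above."""

    def __init__(self, profiles: List[Profile]) -> None:
        # Lower priority number is tried first.
        self.profiles = sorted(profiles, key=lambda p: p.priority)

    def get_next_profile(self) -> Optional[Profile]:
        # First profile, in priority order, that is not marked as failed.
        return next((p for p in self.profiles if not p.failed), None)

    def mark_success(self, profile: Profile) -> None:
        profile.failed = False

    def mark_failure(self, profile: Profile, error: Exception,
                     is_rate_limit: bool = False) -> None:
        profile.failed = True


def call_with_failover(manager: ToyManager,
                       send: Callable[[Profile], str],
                       max_retries: int = 3) -> str:
    """Try the current profile; on failure, mark it and move to the next."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        profile = manager.get_next_profile()
        if profile is None:
            break  # every profile is marked as failed
        try:
            # `send` receives the profile so it can apply api_key and model.
            result = send(profile)
        except Exception as error:
            last_error = error
            manager.mark_failure(
                profile, error, is_rate_limit=isinstance(error, RateLimited)
            )
            continue
        manager.mark_success(profile)
        return result
    raise RuntimeError("all providers failed") from last_error


def fake_send(profile: Profile) -> str:
    # Simulated provider call: the "openai" profile is rate limited.
    if profile.name == "openai":
        raise RateLimited("429")
    return f"answered by {profile.name}"


manager = ToyManager([
    Profile("anthropic", "sk-ant-...", "model-b", priority=2),
    Profile("openai", "sk-...", "model-a", priority=1),
])
result = call_with_failover(manager, fake_send)
print(result)
```

In this run the first attempt uses the priority-1 profile, fails, and the second attempt succeeds on the priority-2 profile.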

How It Works

| Component | Role |
|---|---|
| AuthProfile | Credentials for a single provider |
| FailoverManager | Orchestrates failover logic |
| FailoverConfig | Retry and backoff settings |
| ProviderStatus | Tracks provider health |

Configuration Options

from praisonaiagents import FailoverConfig

config = FailoverConfig(
    max_retries=3,              # Max retry attempts
    retry_delay=1.0,            # Initial delay (seconds)
    exponential_backoff=True,   # Enable exponential backoff
    max_retry_delay=60.0,       # Max delay between retries
    failover_on_rate_limit=True,  # Failover on 429 errors
    failover_on_timeout=True,   # Failover on timeouts
    failover_on_error=True,     # Failover on other errors
)

| Option | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 3 | Maximum retry attempts |
| retry_delay | float | 1.0 | Initial retry delay (seconds) |
| exponential_backoff | bool | True | Use exponential backoff |
| max_retry_delay | float | 60.0 | Maximum retry delay (seconds) |
| cooldown_on_rate_limit | float | 60.0 | Rate limit cooldown (seconds) |
| cooldown_on_error | float | 30.0 | Error cooldown (seconds) |
| rotate_on_success | bool | False | Rotate profiles on success |
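
To make the interaction between `retry_delay`, `exponential_backoff`, and `max_retry_delay` concrete, here is a minimal sketch of a delay schedule. It assumes the delay doubles on each attempt and is capped at `max_retry_delay`; the doubling factor and the helper name `backoff_delay` are assumptions for illustration, not confirmed library behavior.

```python
def backoff_delay(attempt: int,
                  retry_delay: float = 1.0,
                  max_retry_delay: float = 60.0,
                  exponential_backoff: bool = True) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    Assumes a doubling factor per attempt; the real multiplier may differ.
    """
    if not exponential_backoff:
        # Fixed delay, still bounded by the cap.
        return min(retry_delay, max_retry_delay)
    return min(retry_delay * (2 ** attempt), max_retry_delay)


# Schedule for the default values shown in the table.
delays = [backoff_delay(n) for n in range(8)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Under these assumptions the cap takes effect from the seventh retry onward, so `max_retries=3` with the defaults never reaches it.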

Auth Profiles

Configure credentials for each provider:
from praisonaiagents import AuthProfile

profile = AuthProfile(
    name="openai-primary",
    provider="openai",
    api_key="sk-...",
    base_url=None,           # Custom endpoint (optional)
    priority=1,              # Lower = higher priority
    weight=1.0,              # For load balancing
    rate_limit_rpm=100,      # Requests per minute
    metadata={}              # Custom metadata
)

| Field | Type | Description |
|---|---|---|
| name | str | Unique profile identifier |
| provider | str | Provider: openai, anthropic, etc. |
| api_key | str | API key (masked in logs) |
| base_url | str | Custom API endpoint |
| model | str | Default model for this profile |
| priority | int | Failover priority (lower = higher priority) |
| rate_limit_rpm | int | Requests per minute limit |
| rate_limit_tpm | int | Tokens per minute limit |
| metadata | dict | Additional provider-specific config |
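
The "lower = higher priority" rule can be illustrated with a small selection sketch. It is an assumption-laden toy: `ToyProfile`, `pick_profile`, and the `cooldown_until` bookkeeping field are hypothetical and do not appear in the field table. The sketch only shows one plausible way that priority and a cooldown (such as `cooldown_on_rate_limit`) could combine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyProfile:
    """Hypothetical stand-in carrying only the fields used for selection."""
    name: str
    priority: int
    cooldown_until: float = 0.0  # assumed bookkeeping field, in seconds


def pick_profile(profiles: List[ToyProfile], now: float) -> Optional[ToyProfile]:
    """Return the available profile with the lowest priority number."""
    available = [p for p in profiles if p.cooldown_until <= now]
    return min(available, key=lambda p: p.priority, default=None)


profiles = [
    ToyProfile("groq", priority=3),
    ToyProfile("openai", priority=1),
    ToyProfile("anthropic", priority=2),
]

first = pick_profile(profiles, now=0.0)    # openai: lowest priority number

# Simulate a rate limit on openai with a 60-second cooldown.
profiles[1].cooldown_until = 60.0
second = pick_profile(profiles, now=10.0)  # anthropic: openai is cooling down
third = pick_profile(profiles, now=61.0)   # openai again: cooldown expired

print(first.name, second.name, third.name)
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would more likely read a clock.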

Common Patterns

from praisonaiagents import AuthProfile, FailoverManager

manager = FailoverManager()

# Add multiple providers
manager.add_profile(AuthProfile(
    name="openai",
    provider="openai",
    api_key="sk-...",
    priority=1
))

manager.add_profile(AuthProfile(
    name="anthropic",
    provider="anthropic", 
    api_key="sk-ant-...",
    priority=2
))

manager.add_profile(AuthProfile(
    name="groq",
    provider="groq",
    api_key="gsk-...",
    priority=3
))

Failover Callbacks

React to failover events:
from praisonaiagents import FailoverManager, FailoverConfig

def on_failover(from_profile, to_profile, error):
    print(f"Failing over from {from_profile} to {to_profile}")
    print(f"Reason: {error}")
    # Log to monitoring system
    
config = FailoverConfig(
    on_failover=on_failover
)

manager = FailoverManager(config)
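
As an illustration of the "log to monitoring system" step, the following sketch records each event and counts failovers per source provider. It reuses the callback signature shown above and calls the callback by hand to simulate events; `failover_log` and `failovers_from` are hypothetical names. Whether the library passes profile objects or profile names is not stated here, so the sketch converts both arguments with `str()`.

```python
from collections import Counter
from typing import List, Tuple

# In-memory stand-ins for a monitoring backend (hypothetical).
failover_log: List[Tuple[str, str, str]] = []
failovers_from: Counter = Counter()


def on_failover(from_profile, to_profile, error):
    """Record the event and count failovers per source provider."""
    source, target = str(from_profile), str(to_profile)
    failover_log.append((source, target, str(error)))
    failovers_from[source] += 1


# Simulated events; in real use the failover manager would invoke the callback.
on_failover("openai", "anthropic", TimeoutError("request timed out"))
on_failover("openai", "anthropic", RuntimeError("429 rate limit"))
on_failover("anthropic", "groq", TimeoutError("request timed out"))

# The provider that triggered the most failovers.
worst, count = failovers_from.most_common(1)[0]
print(worst, count)
```

A counter like this makes it easy to alert when one provider accounts for most failovers over a time window.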

Provider Status

Monitor provider health:
from praisonaiagents import FailoverManager

manager = FailoverManager()

# Get status of all providers
status = manager.status()
for name, info in status.items():
    print(f"{name}: {info['status']}")
    print(f"  Failures: {info['failure_count']}")
    print(f"  Last success: {info['last_success']}")

# Reset a provider after recovery
manager.mark_success("openai")

# Reset all profiles
manager.reset_all()

Best Practices

  • Always have at least 2-3 providers configured. This ensures availability even during major outages.
  • Enable exponential_backoff=True to avoid hammering providers during issues. This helps you stay within rate limits.
  • Order providers by cost and reliability. Put cheaper/faster providers first, with premium providers as fallback.
  • Use the on_failover callback to track when failovers occur. This helps identify provider issues early.
