Load Balancing

Onwards supports load balancing across multiple providers for a single alias, with automatic failover, weighted distribution, and configurable retry behavior.

Configuration

{
  "targets": {
    "gpt-4": {
      "strategy": "weighted_random",
      "fallback": {
        "enabled": true,
        "on_status": [429, 5],
        "on_rate_limit": true
      },
      "providers": [
        { "url": "https://api.openai.com", "onwards_key": "sk-key-1", "weight": 3 },
        { "url": "https://api.openai.com", "onwards_key": "sk-key-2", "weight": 1 }
      ]
    }
  }
}

Strategy

weighted_random (default): Distributes traffic randomly in proportion to each provider's weight. A provider with weight: 3 receives roughly three times the traffic of a provider with weight: 1.
priority: Always routes to the first provider in the list. Later providers are used, in definition order, only when fallback is triggered.
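
As a rough illustration of the two strategies, the selection step might look like the Python sketch below. This is not Onwards' actual implementation: `pick_provider` and the plain-dict provider shape are hypothetical, and a missing weight is assumed to default to 1, as listed under provider-level options.

```python
import random

def pick_provider(providers, strategy="weighted_random", rng=random):
    """Return the provider that would receive the next request (illustrative only)."""
    if not providers:
        raise ValueError("no providers configured")
    if strategy == "priority":
        # priority: the first provider in definition order is always chosen.
        return providers[0]
    if strategy == "weighted_random":
        # weighted_random: selection probability is proportional to weight.
        weights = [p.get("weight", 1) for p in providers]
        return rng.choices(providers, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

With weights 3 and 1, the first provider is expected to receive about 75% of requests over a large sample; individual requests are still random.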

Fallback

Controls automatic retry on other providers when requests fail:

Option	Type	Default	Description
`enabled`	bool	`false`	Master switch for fallback
`on_status`	int[]	–	Status codes that trigger fallback (supports wildcards)
`on_rate_limit`	bool	`false`	Fall back to another provider when a local rate limit is hit

Status code wildcards (a shorter number matches every status code that starts with those digits):

5 matches all 5xx (500-599)
50 matches 500-509
502 matches exactly 502
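
The wildcard rules above behave like a prefix match on the decimal digits of the status code. The sketch below assumes exactly that reading. `status_matches` and `should_fallback` are hypothetical helper names, not part of Onwards.

```python
def status_matches(status: int, pattern: int) -> bool:
    """True if an HTTP status code matches a prefix-style pattern.

    Assumption: a pattern matches any status whose digits start with it,
    so 5 covers 500-599, 50 covers 500-509, and 502 covers only 502.
    """
    return str(status).startswith(str(pattern))

def should_fallback(status: int, on_status: list[int]) -> bool:
    """True if any configured pattern matches the response status."""
    return any(status_matches(status, pattern) for pattern in on_status)
```

With `"on_status": [429, 5]` from the configuration example, a 429 or any 5xx response would trigger fallback, while a 404 would not.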

When fallback triggers, the next provider is selected according to the configured strategy: weighted_random resamples from the remaining providers, and priority follows definition order.
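
One way to picture the resulting attempt order is the generator below, which removes each provider from the pool once it has been tried. It is a sketch under stated assumptions rather than the real code path: `attempt_order` and `send_with_fallback` are hypothetical names, and `send` and `should_retry` are caller-supplied callables standing in for the upstream request and the fallback check.

```python
import random

def attempt_order(providers, strategy="weighted_random", rng=random):
    """Yield providers in the order they would be tried (illustrative only)."""
    remaining = list(providers)
    while remaining:
        if strategy == "priority":
            # priority: walk the list in definition order.
            index = 0
        else:
            # weighted_random: resample by weight from the providers not yet tried.
            weights = [p.get("weight", 1) for p in remaining]
            index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        yield remaining.pop(index)

def send_with_fallback(providers, send, should_retry, strategy="weighted_random"):
    """Try providers until one returns a result that does not trigger fallback."""
    result = None
    for provider in attempt_order(providers, strategy):
        result = send(provider)
        if not should_retry(result):
            break
    # If every provider triggered fallback, the last result is returned.
    return result
```

Each provider is tried at most once per request in this sketch, so the number of attempts is bounded by the size of the pool.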

Pool-level options

Settings that apply to the entire alias:

Option	Description
`keys`	Access control keys for this alias
`rate_limit`	Rate limit for all requests to this alias
`concurrency_limit`	Max concurrent requests to this alias
`response_headers`	Headers added to all responses
`strategy`	`weighted_random` or `priority`
`fallback`	Retry configuration (see above)
`providers`	Array of provider configurations

Provider-level options

Settings specific to each provider:

Option	Description
`url`	Provider endpoint URL
`onwards_key`	API key for this provider
`onwards_model`	Model name override
`weight`	Traffic weight (default: 1)
`rate_limit`	Provider-specific rate limit
`concurrency_limit`	Provider-specific concurrency limit
`response_headers`	Provider-specific headers
`trusted`	Override pool-level trust for strict mode error sanitization (`true`/`false`; omit to inherit from pool)

Examples

Primary/backup failover

{
  "targets": {
    "gpt-4": {
      "strategy": "priority",
      "fallback": { "enabled": true, "on_status": [5], "on_rate_limit": true },
      "providers": [
        { "url": "https://primary.example.com", "onwards_key": "sk-primary" },
        { "url": "https://backup.example.com", "onwards_key": "sk-backup" }
      ]
    }
  }
}

Multiple API keys with pool-level rate limit

{
  "targets": {
    "gpt-4": {
      "rate_limit": { "requests_per_second": 100, "burst_size": 200 },
      "providers": [
        { "url": "https://api.openai.com", "onwards_key": "sk-key-1" },
        { "url": "https://api.openai.com", "onwards_key": "sk-key-2" }
      ]
    }
  }
}

Backwards compatibility

Single-provider configs still work unchanged:

{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-key"
    }
  }
}
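
Conceptually, a single-provider target can be read as a pool with one provider. The sketch below shows that mapping as an illustration only. `normalize_target` is a hypothetical helper, and the choice of which keys move into the provider entry (`url`, `onwards_key`, `onwards_model`) is an assumption, not documented behavior.

```python
# Assumption: these keys describe the provider; everything else stays on the pool.
PROVIDER_KEYS = ("url", "onwards_key", "onwards_model")

def normalize_target(target: dict) -> dict:
    """Rewrite a single-provider target into the pool form (illustrative only)."""
    if "providers" in target:
        # Already in pool form; nothing to do.
        return target
    provider = {k: target[k] for k in PROVIDER_KEYS if k in target}
    pool = {k: v for k, v in target.items() if k not in PROVIDER_KEYS}
    pool["providers"] = [provider]
    return pool
```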
