Onwards

A Rust-based AI Gateway that provides a unified interface for routing requests to OpenAI-compatible targets. The goal is to be as “transparent” as possible.

Features

  • Unified routing to any OpenAI-compatible provider
  • Hot-reloading configuration with automatic file watching
  • Authentication with global and per-target API keys
  • Rate limiting per-target and per-API-key with token bucket algorithm
  • Concurrency limiting per-target and per-API-key
  • Load balancing with weighted random and priority strategies
  • Automatic failover across multiple providers
  • Strict mode for request validation, response sanitization, and error standardization
  • Response sanitization for OpenAI schema compliance
  • Prometheus metrics for monitoring
  • Custom response headers for pricing and metadata
  • Upstream auth customization for non-standard providers

Quickstart

Create a configuration file

Create a config.json file with your target configurations:

{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "onwards_model": "gpt-4"
    },
    "claude-3": {
      "url": "https://api.anthropic.com",
      "onwards_key": "sk-ant-your-anthropic-key"
    },
    "local-model": {
      "url": "http://localhost:8080"
    }
  }
}

Start the gateway

cargo run -- -f config.json

Modifying the file will automatically and atomically reload the configuration. To disable this, set the --watch flag to false.

Send a request

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Configuration

Onwards is configured through a JSON file. Each key in the targets object defines a model alias that clients can request.

Global options

Option | Type | Required | Description
--- | --- | --- | ---
strict_mode | bool | No | Enable strict mode globally for all targets (see Strict Mode). Default: false
auth | object | No | Global authentication configuration (see Authentication)
targets | object | Yes | Map of model aliases to target configurations

Target options

Option | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | Base URL of the AI provider
onwards_key | string | No | API key to include in requests to the target
onwards_model | string | No | Model name to use when forwarding requests
keys | string[] | No | API keys required for authentication to this target
rate_limit | object | No | Per-target rate limiting (see Rate Limiting)
concurrency_limit | object | No | Per-target concurrency limiting (see Concurrency Limiting)
upstream_auth_header_name | string | No | Custom header name for upstream auth (default: Authorization)
upstream_auth_header_prefix | string | No | Custom prefix for upstream auth header value (default: "Bearer ", with a trailing space)
response_headers | object | No | Key-value pairs to add or override in the response headers
sanitize_response | bool | No | Enforce strict OpenAI schema compliance for responses only (see Sanitization)
strategy | string | No | Load balancing strategy: weighted_random or priority
fallback | object | No | Retry configuration (see Load Balancing)
providers | array | No | Array of provider configurations for load balancing

Rate limit object

Field | Type | Description
--- | --- | ---
requests_per_second | float | Number of requests allowed per second
burst_size | integer | Maximum burst size of requests

Concurrency limit object

Field | Type | Description
--- | --- | ---
max_concurrent_requests | integer | Maximum number of concurrent requests

Auth configuration

The top-level auth object configures global authentication:

Field | Type | Description
--- | --- | ---
global_keys | string[] | Keys that grant access to all authenticated targets
key_definitions | object | Named key definitions with per-key rate/concurrency limits

See Authentication for details.

Minimal example

{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key"
    }
  }
}

Full example

{
  "strict_mode": true,
  "auth": {
    "global_keys": ["global-api-key-1"],
    "key_definitions": {
      "premium_user": {
        "key": "sk-premium-67890",
        "rate_limit": {
          "requests_per_second": 100,
          "burst_size": 200
        },
        "concurrency_limit": {
          "max_concurrent_requests": 10
        }
      }
    }
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "onwards_model": "gpt-4-turbo",
      "keys": ["premium_user"],
      "rate_limit": {
        "requests_per_second": 50,
        "burst_size": 100
      },
      "concurrency_limit": {
        "max_concurrent_requests": 20
      },
      "response_headers": {
        "Input-Price-Per-Token": "0.0001",
        "Output-Price-Per-Token": "0.0002"
      },
      "sanitize_response": true
    }
  }
}

Command Line Options

Flag | Description | Default
--- | --- | ---
--targets <file> / -f <file> | Path to configuration file | Required
--port <port> | Port to listen on | 3000
--watch | Enable configuration file watching for hot-reloading | true
--metrics | Enable Prometheus metrics endpoint | true
--metrics-port <port> | Port for Prometheus metrics | 9090
--metrics-prefix <prefix> | Prefix for metric names | onwards

Examples

Start with defaults:

cargo run -- -f config.json

Custom port, metrics disabled:

cargo run -- -f config.json --port 8080 --metrics false

Custom metrics configuration:

cargo run -- -f config.json --metrics-port 9100 --metrics-prefix gateway

API Usage

List available models

Get a list of all configured targets in the OpenAI models format:

curl http://localhost:3000/v1/models

Sending requests

Send requests to the gateway using the standard OpenAI API format:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

The model field determines which target receives the request.

Model override header

Override the target using the model-override header. This routes the request to a different target regardless of the model field in the body:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "model-override: claude-3" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

This is also used for routing requests without bodies – for example, to get the embeddings usage for your organization:

curl -X GET http://localhost:3000/v1/organization/usage/embeddings \
  -H "model-override: claude-3"
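
The routing rule above can be sketched in a few lines. This is an illustrative Python sketch, not the gateway's actual (Rust) code; the function name resolve_target is hypothetical:

```python
# Hypothetical sketch: the model-override header, when present,
# takes precedence over the model field in the request body.

def resolve_target(headers, body):
    override = headers.get("model-override")
    if override is not None:
        return override
    return body.get("model")  # may be None for body-less requests

print(resolve_target({"model-override": "claude-3"}, {"model": "gpt-4"}))  # claude-3
print(resolve_target({}, {"model": "gpt-4"}))                              # gpt-4
print(resolve_target({"model-override": "claude-3"}, {}))                  # claude-3
```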

Metrics

When the --metrics flag is enabled (the default), Prometheus metrics are exposed on a separate port:

curl http://localhost:9090/metrics

See Command Line Options for metrics configuration flags.

Authentication

Onwards supports bearer token authentication to control access to your AI targets. You can configure authentication keys both globally and per-target.

Global authentication keys

Global keys apply to all targets that have authentication enabled:

{
  "auth": {
    "global_keys": ["global-api-key-1", "global-api-key-2"]
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["target-specific-key"]
    }
  }
}

Per-target authentication

You can specify authentication keys for individual targets:

{
  "targets": {
    "secure-gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["secure-key-1", "secure-key-2"]
    },
    "open-local": {
      "url": "http://localhost:8080"
    }
  }
}

In this example:

  • secure-gpt-4 requires a valid bearer token from the keys array
  • open-local has no authentication requirements

If both global and per-target keys are configured, either kind is accepted: a request may authenticate to a target with any of that target's own keys or with any global key.

How authentication works

When a target has keys configured, requests must include a valid Authorization: Bearer <token> header where <token> matches one of the configured keys. If global keys are configured, they are automatically added to each target’s key set.
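
The check described above can be sketched as follows. This is a Python illustration, not the actual Rust implementation; in particular, the assumption that a target with no keys of its own stays open (per the open-local example above) is mine:

```python
# Hypothetical sketch of bearer-token checking. Assumption: a target
# with no keys configured is open; otherwise global keys are merged
# into the target's key set and either kind is accepted.

def is_authorized(auth_header, target_keys, global_keys):
    if not target_keys:
        return True  # open target: no authentication required
    keys = set(target_keys) | set(global_keys)
    if not auth_header or not auth_header.startswith("Bearer "):
        return False  # would yield 401 Unauthorized
    token = auth_header[len("Bearer "):]
    return token in keys

print(is_authorized("Bearer secure-key-1", ["secure-key-1"], ["global-api-key-1"]))  # True
print(is_authorized("Bearer wrong-key", ["secure-key-1"], []))                        # False
print(is_authorized(None, [], []))                                                    # True (open target)
```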

Successful authenticated request:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer secure-key-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Failed authentication (invalid key):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Returns: 401 Unauthorized

Failed authentication (missing header):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "secure-gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Returns: 401 Unauthorized

No authentication required:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "open-local",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Success - no authentication required for this target

Upstream Authentication

By default, Onwards sends upstream API keys using the standard Authorization: Bearer <key> header format. Some AI providers use different authentication header formats. You can customize both the header name and prefix per target.

Custom header name

Some providers use custom header names for authentication:

{
  "targets": {
    "custom-api": {
      "url": "https://api.custom-provider.com",
      "onwards_key": "your-api-key-123",
      "upstream_auth_header_name": "X-API-Key"
    }
  }
}

This sends: X-API-Key: Bearer your-api-key-123

Custom header prefix

Some providers use different prefixes or no prefix at all:

{
  "targets": {
    "api-with-prefix": {
      "url": "https://api.provider1.com",
      "onwards_key": "token-xyz",
      "upstream_auth_header_prefix": "ApiKey "
    },
    "api-without-prefix": {
      "url": "https://api.provider2.com",
      "onwards_key": "plain-key-456",
      "upstream_auth_header_prefix": ""
    }
  }
}

This sends:

  • To provider1: Authorization: ApiKey token-xyz
  • To provider2: Authorization: plain-key-456

Combining custom name and prefix

You can customize both the header name and prefix:

{
  "targets": {
    "fully-custom": {
      "url": "https://api.custom.com",
      "onwards_key": "secret-key",
      "upstream_auth_header_name": "X-Custom-Auth",
      "upstream_auth_header_prefix": "Token "
    }
  }
}

This sends: X-Custom-Auth: Token secret-key

Default behavior

If these options are not specified, Onwards uses the standard OpenAI-compatible format:

{
  "targets": {
    "standard-api": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-openai-key"
    }
  }
}

This sends: Authorization: Bearer sk-openai-key
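How the upstream header is assembled from these two options can be sketched as below. This is a Python illustration of the documented behaviour, not the gateway's Rust code; the function name is hypothetical. Note the default prefix "Bearer " includes a trailing space:

```python
# Hypothetical sketch: build the upstream auth header from the
# per-target options, falling back to the documented defaults.

def upstream_auth_header(onwards_key, header_name=None, header_prefix=None):
    name = header_name if header_name is not None else "Authorization"
    prefix = header_prefix if header_prefix is not None else "Bearer "
    return name, f"{prefix}{onwards_key}"

print(upstream_auth_header("sk-openai-key"))                              # ('Authorization', 'Bearer sk-openai-key')
print(upstream_auth_header("your-api-key-123", header_name="X-API-Key"))  # ('X-API-Key', 'Bearer your-api-key-123')
print(upstream_auth_header("plain-key-456", header_prefix=""))            # ('Authorization', 'plain-key-456')
```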

Rate Limiting

Onwards supports rate limiting using a token bucket algorithm. You can configure limits per-target and per-API-key.

Per-target rate limiting

Add rate limiting to any target in your config.json:

{
  "targets": {
    "rate-limited-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "rate_limit": {
        "requests_per_second": 5.0,
        "burst_size": 10
      }
    }
  }
}

How it works

Each target gets its own token bucket. Tokens are refilled at a rate determined by requests_per_second. The maximum number of tokens in the bucket is determined by burst_size. When the bucket is empty, requests to that target are rejected with a 429 Too Many Requests response.

Examples

// Allow 1 request per second with burst of 5
"rate_limit": {
  "requests_per_second": 1.0,
  "burst_size": 5
}

// Allow 100 requests per second with burst of 200
"rate_limit": {
  "requests_per_second": 100.0,
  "burst_size": 200
}

Rate limiting is optional – targets without rate_limit configuration have no rate limiting applied.
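
The token bucket described above can be sketched in a few lines. This is a minimal Python illustration of the algorithm (Onwards itself is written in Rust), not the actual implementation:

```python
import time

# Minimal token-bucket sketch: tokens refill continuously at
# requests_per_second, capped at burst_size; an empty bucket
# means the request is rejected (the gateway would return 429).

class TokenBucket:
    def __init__(self, requests_per_second, burst_size, now=time.monotonic):
        self.rate = requests_per_second
        self.capacity = burst_size
        self.tokens = float(burst_size)  # bucket starts full
        self.now = now
        self.last = now()

    def try_acquire(self):
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds 429 Too Many Requests

bucket = TokenBucket(requests_per_second=1.0, burst_size=5)
print([bucket.try_acquire() for _ in range(6)])  # burst of 5 allowed, 6th rejected
```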

Per-API-key rate limiting

In addition to per-target rate limiting, Onwards supports individual rate limits for different API keys. This allows you to provide different service tiers – for example, basic users might have lower limits while premium users get higher limits.

Configuration

Per-key rate limiting uses a key_definitions section in the auth configuration:

{
  "auth": {
    "global_keys": ["fallback-key"],
    "key_definitions": {
      "basic_user": {
        "key": "sk-user-12345",
        "rate_limit": {
          "requests_per_second": 10,
          "burst_size": 20
        }
      },
      "premium_user": {
        "key": "sk-premium-67890",
        "rate_limit": {
          "requests_per_second": 100,
          "burst_size": 200
        }
      },
      "enterprise_user": {
        "key": "sk-enterprise-abcdef",
        "rate_limit": {
          "requests_per_second": 500,
          "burst_size": 1000
        }
      }
    }
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key",
      "keys": ["basic_user", "premium_user", "enterprise_user", "fallback-key"]
    }
  }
}

Priority order

Rate limits are checked in this order:

  1. Per-key rate limits (if the API key has limits configured)
  2. Per-target rate limits (if the target has limits configured)

If either limit is exceeded, the request returns 429 Too Many Requests.
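
The check order can be sketched as below. This is an illustrative Python sketch, not the Rust implementation; whether a rejected per-key check consumes a token from the target's bucket is an implementation detail not specified here:

```python
# Hypothetical sketch: consult the per-key limiter first, then the
# per-target limiter; a rejection from either yields 429. `allow()`
# stands in for any limiter's accept/reject decision.

def check_limits(per_key_limiter, per_target_limiter):
    for limiter in (per_key_limiter, per_target_limiter):
        if limiter is not None and not limiter.allow():
            return 429  # Too Many Requests
    return 200

class Always:  # trivial stand-in limiter for demonstration
    def __init__(self, ok): self.ok = ok
    def allow(self): return self.ok

print(check_limits(Always(True), Always(True)))   # 200
print(check_limits(Always(False), Always(True)))  # 429 (per-key limit hit)
print(check_limits(None, Always(False)))          # 429 (target limit hit)
```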

Usage examples

Basic user request (10/sec limit):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-user-12345" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

Premium user request (100/sec limit):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer sk-premium-67890" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

Legacy key (no per-key limits, only target limits apply):

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer fallback-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

Concurrency Limiting

In addition to rate limiting (which controls how fast requests are made), concurrency limiting controls how many requests are processed simultaneously. This is useful for managing resource usage and preventing overload.

Per-target concurrency limiting

Limit the number of concurrent requests to a specific target:

{
  "targets": {
    "resource-limited-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "concurrency_limit": {
        "max_concurrent_requests": 5
      }
    }
  }
}

With this configuration, at most 5 requests are processed concurrently for this target. Additional requests receive a 429 Too Many Requests response until an in-flight request completes.

Per-API-key concurrency limiting

You can set different concurrency limits for different API keys:

{
  "auth": {
    "key_definitions": {
      "basic_user": {
        "key": "sk-user-12345",
        "concurrency_limit": {
          "max_concurrent_requests": 2
        }
      },
      "premium_user": {
        "key": "sk-premium-67890",
        "concurrency_limit": {
          "max_concurrent_requests": 10
        },
        "rate_limit": {
          "requests_per_second": 100,
          "burst_size": 200
        }
      }
    }
  },
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-openai-key"
    }
  }
}

Combining rate limiting and concurrency limiting

You can use both rate limiting and concurrency limiting together:

  • Rate limiting controls how fast requests are made over time
  • Concurrency limiting controls how many requests are active at once

{
  "targets": {
    "balanced-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "rate_limit": {
        "requests_per_second": 10,
        "burst_size": 20
      },
      "concurrency_limit": {
        "max_concurrent_requests": 5
      }
    }
  }
}

How it works

Concurrency limits use a semaphore-based approach:

  1. When a request arrives, it tries to acquire a permit
  2. If a permit is available, the request proceeds (holding the permit)
  3. If no permits are available, the request is rejected with 429 Too Many Requests
  4. When the request completes, the permit is automatically released

The error response distinguishes between rate limiting and concurrency limiting:

  • Rate limit: "code": "rate_limit"
  • Concurrency limit: "code": "concurrency_limit_exceeded"

Both use HTTP 429 status code for consistency.
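
The permit flow above can be sketched with a stdlib semaphore. This is a Python illustration (the gateway is Rust); the non-blocking acquire models the documented immediate rejection rather than queueing:

```python
import threading

# Sketch of the semaphore-based permit flow: try to take a permit on
# arrival, reject immediately when none is free, release on completion.

class ConcurrencyLimiter:
    def __init__(self, max_concurrent_requests):
        self.sem = threading.BoundedSemaphore(max_concurrent_requests)

    def try_acquire(self):
        # Non-blocking: a full semaphore means an immediate 429 with
        # "code": "concurrency_limit_exceeded" rather than queueing.
        return self.sem.acquire(blocking=False)

    def release(self):
        self.sem.release()

limiter = ConcurrencyLimiter(max_concurrent_requests=2)
print(limiter.try_acquire())  # True  (permit 1)
print(limiter.try_acquire())  # True  (permit 2)
print(limiter.try_acquire())  # False (would be rejected with 429)
limiter.release()             # an in-flight request completes
print(limiter.try_acquire())  # True  (permit available again)
```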

Response Headers

Onwards can include custom headers in the response for each target. These can override existing headers or add new ones.

Configuration

{
  "targets": {
    "model-with-headers": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "response_headers": {
        "X-Custom-Header": "custom-value",
        "X-Provider": "my-gateway"
      }
    }
  }
}

Pricing headers

One use of this feature is to set pricing information. If you have a dynamic token price, when a user’s request is accepted the price is agreed and can be recorded in the HTTP headers:

{
  "targets": {
    "priced-model": {
      "url": "https://api.provider.com",
      "onwards_key": "your-api-key",
      "response_headers": {
        "Input-Price-Per-Token": "0.0001",
        "Output-Price-Per-Token": "0.0002"
      }
    }
  }
}

When using load balancing, response headers can be configured at both the pool level and provider level. Provider-level headers take precedence.

Strict Mode

Strict mode provides enhanced security and API compliance by using typed request/response handlers instead of the default wildcard passthrough router. This feature:

  • Validates all requests against OpenAI API schemas before forwarding
  • Sanitizes all responses by removing third-party provider metadata
  • Standardizes error messages to prevent information leakage
  • Ensures model field consistency between requests and responses
  • Supports streaming and non-streaming for all endpoints

This is useful when you need guaranteed API compatibility, security hardening, or protection against third-party response variations.

Enabling strict mode

Strict mode is a global configuration that applies to all targets in your gateway. Add strict_mode: true at the top level of your configuration (not inside individual targets).

{
  "strict_mode": true,
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-openai-key"
    },
    "claude": {
      "url": "https://api.anthropic.com",
      "onwards_key": "sk-ant-key"
    }
  }
}

When enabled, all requests to all targets will use strict mode validation and sanitization.

How it works

When strict_mode: true is enabled:

  1. Request validation: Incoming requests are deserialized through OpenAI schemas. Invalid requests receive immediate 400 Bad Request errors with clear messages.
  2. Response sanitization: Third-party responses are deserialized (automatically dropping unknown fields). If deserialization fails, a standard error is returned - malformed responses are never passed through. The model field is rewritten to match the original request, then re-serialized as clean OpenAI responses with correct Content-Length headers.
  3. Error standardization: Third-party errors are logged internally but never forwarded to clients. Clients receive standardized OpenAI-format errors based only on HTTP status codes.
  4. Streaming support: SSE streams are parsed line-by-line to handle multi-line data events and strip comment lines; each chunk is sanitized and re-emitted as a clean event.
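
The SSE handling in step 4 can be sketched as follows. This is a simplified Python illustration, not the Rust implementation: the only sanitization shown is the model-field rewrite, and the function name is hypothetical:

```python
import json

# Sketch of strict-mode SSE handling: comment lines (starting with
# ":") are dropped, multi-line `data:` fields are joined into one
# event, each JSON chunk is sanitized (here: model rewritten to the
# requested alias) and re-emitted as a clean event.

def sanitize_sse(raw_stream, requested_model):
    out, data_lines = [], []
    for line in raw_stream.splitlines():
        if line.startswith(":"):        # SSE comment -> stripped
            continue
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
            continue
        if line == "" and data_lines:   # blank line terminates the event
            payload = "\n".join(data_lines)
            data_lines = []
            if payload == "[DONE]":
                out.append("data: [DONE]\n\n")
                continue
            chunk = json.loads(payload)
            chunk["model"] = requested_model
            out.append(f"data: {json.dumps(chunk)}\n\n")
    return "".join(out)

raw = ': keep-alive\ndata: {"model": "gpt-4-turbo", "choices": []}\n\ndata: [DONE]\n\n'
print(sanitize_sse(raw, "gpt-4"))
```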

Security benefits

Prevents information leakage:

  • Third-party stack traces, database errors, and debug information are never exposed
  • Error responses contain only standard HTTP status codes and generic messages
  • No provider-specific metadata (trace IDs, internal IDs, costs) reaches clients
  • Malformed provider responses fail closed with standard errors (never leaked)
  • SSE comment lines stripped to prevent metadata leakage in streaming responses

Ensures consistency:

  • Responses always match OpenAI’s API format exactly
  • The model field always reflects what the client requested, not what the provider returned
  • Extra fields like provider, cost, trace_id are automatically dropped

Fast failure:

  • Invalid requests fail immediately with clear, actionable error messages
  • No wasted upstream requests for malformed input
  • Reduces debugging time for integration issues

Error standardization

When strict mode is enabled, all error responses follow OpenAI’s error format exactly:

{
  "error": {
    "message": "Invalid request",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

Status code mapping:

HTTP Status | Error Type | Message
--- | --- | ---
400 | invalid_request_error | Invalid request
401 | authentication_error | Authentication failed
403 | permission_error | Permission denied
404 | not_found_error | Not found
429 | rate_limit_error | Rate limit exceeded
500 | api_error | Internal server error
502 | api_error | Bad gateway
503 | api_error | Service unavailable

Third-party error details are always logged server-side but never sent to clients.

Supported endpoints

Strict mode currently supports:

  • /v1/chat/completions (streaming and non-streaming) - Full sanitization
  • /v1/embeddings - Full sanitization
  • /v1/responses (Open Responses API, non-streaming) - Full sanitization
  • /v1/models - Model listing (no sanitization needed)

All supported endpoints include:

  • Request validation - Invalid requests fail immediately with clear error messages
  • Response sanitization - Third-party metadata automatically removed
  • Model field rewriting - Ensures consistency with client request
  • Error standardization - Third-party error details never exposed

Requests to unsupported endpoints will return 404 Not Found when strict mode is enabled.

Comparison with response sanitization

Feature | Response Sanitization | Strict Mode
--- | --- | ---
Request validation | ✗ No | ✓ Yes
Response sanitization | ✓ Yes | ✓ Yes
Error standardization | ✗ No | ✓ Yes
Endpoint coverage | /v1/chat/completions only | Chat, Embeddings, Responses, Models
Router type | Wildcard passthrough | Typed handlers
Use case | Simple response cleaning | Production security & compliance

Important: When strict mode is enabled globally, the per-target sanitize_response flag is automatically ignored. Strict mode handlers perform complete sanitization themselves, so enabling sanitize_response: true on individual targets has no effect and won’t cause double sanitization.

When to use strict mode:

  • Production deployments requiring security hardening
  • Compliance requirements around error message content
  • Multi-provider setups needing guaranteed response consistency
  • Applications that need request validation before forwarding

When to use response sanitization:

  • Simple use cases where you only need response cleaning
  • Non-security-critical deployments
  • Maximum flexibility with endpoint coverage

Trusted Providers

In strict mode, you can mark providers as trusted to bypass error sanitization while keeping success response sanitization. This is useful when you have providers you fully control (e.g., your own OpenAI account) and want their detailed error messages to help with debugging, while still ensuring response consistency.

Trust can be set at two levels:

  • Pool level (trusted on the target) — default for all providers in the pool
  • Provider level (trusted inside a provider entry) — overrides the pool default for that specific provider

This is the only exception to strict mode’s error standardization guarantees: when a provider is effectively trusted, its errors may be forwarded with full third-party details instead of being standardized.

Configuration

Single-provider (pool-level trusted):

{
  "strict_mode": true,
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-...",
      "trusted": true
    },
    "third-party": {
      "url": "https://some-provider.com",
      "onwards_key": "sk-..."
    }
  }
}

Uniform pool-level trusted:

{
  "strict_mode": true,
  "targets": {
    "gpt-4-pool": {
      "trusted": true,
      "providers": [
        { "url": "https://api.openai.com", "onwards_key": "sk-primary-..." },
        { "url": "https://api.openai.com", "onwards_key": "sk-backup-..." }
      ]
    }
  }
}

Mixed trust within a pool (per-provider override):

{
  "strict_mode": true,
  "targets": {
    "gpt-4": {
      "trusted": false,
      "providers": [
        { "url": "https://internal.example.com", "trusted": true },
        { "url": "https://external.example.com" }
      ]
    }
  }
}

Here, the internal provider’s error responses pass through unchanged. The external provider omits trusted, so it inherits the pool default (false) and has its error responses sanitized. This lets you mix trusted internal infrastructure with untrusted external providers inside a single pool.
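
The resolution rule, described later as provider.trusted.unwrap_or(pool.trusted) in the Rust code, amounts to the following. Python sketch for illustration; None models an omitted provider-level trusted:

```python
# Effective trust: a provider-level `trusted`, when present,
# overrides the pool default (Rust: provider.trusted.unwrap_or(pool.trusted)).

def effective_trust(provider_trusted, pool_trusted):
    return provider_trusted if provider_trusted is not None else pool_trusted

print(effective_trust(True, False))   # True  - per-provider override wins
print(effective_trust(None, False))   # False - inherits the pool default
print(effective_trust(False, True))   # False - override works in both directions
```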

Behavior

When a pool is marked as trusted: true:

Success responses (200 OK) are STILL sanitized:

  • Model field IS rewritten to match the client’s request
  • Provider-specific metadata IS removed (costs, trace IDs, custom fields)
  • Response IS validated against OpenAI schemas
  • Content-Length headers ARE updated correctly
  • Streaming responses ARE parsed and sanitized line-by-line

Error responses (4xx, 5xx) bypass sanitization:

  • Original error messages and metadata forwarded to clients
  • Provider-specific error details preserved (stack traces, debug info)
  • Custom error fields passed through unchanged

This allows you to get detailed debugging information from errors while maintaining response consistency for successful requests.

Security Warning

⚠️ Use trusted providers carefully. Marking a provider as trusted bypasses error sanitization for that provider:

What is exposed for trusted providers:

  • Error details and stack traces from the provider
  • Provider-specific error metadata (trace IDs, internal error codes)
  • Debug information in error responses

What is NOT exposed (still sanitized):

  • Success responses are fully sanitized (model rewritten, metadata removed)
  • Provider metadata in successful requests (costs, trace IDs) is still stripped
  • Responses still match OpenAI schema exactly for successful requests

Only mark providers as trusted when you fully control or trust them. This typically means:

  • Your own OpenAI/Anthropic accounts (providers using your API keys)
  • Self-hosted models you operate
  • Internal services you maintain

Do not mark third-party providers as trusted unless you want their detailed error messages exposed to your clients. Trusted providers are designed for debugging your own infrastructure, not for production use with external providers.

Interaction with Model Override Header

Onwards supports the model-override header to route requests to different pools than specified in the request body. Trust is resolved from the provider that actually handles the request (after routing and provider selection), so it correctly reflects the resolved model rather than what the client specified in the body.

This means if a client sends:

  • Request body with "model": "trusted-pool"
  • Header with model-override: untrusted-pool

The request will route to untrusted-pool and sanitization will be applied based on that pool’s provider trust settings, preventing metadata leakage. Clients cannot bypass sanitization by exploiting mismatches between body and header model resolution.

Implementation details

For developers working on the Onwards codebase:

Router architecture:

  • Strict mode uses typed Axum handlers defined in src/strict/handlers.rs
  • Each endpoint has dedicated request/response schema types in src/strict/schemas/
  • Requests are deserialized using serde, which automatically validates structure
  • response_transform_fn is skipped when strict mode is enabled to prevent double sanitization

Response sanitization:

  • Responses are deserialized through strict schemas (extra fields automatically dropped by serde)
  • Malformed responses fail closed with standard errors - never passed through
  • Model field is rewritten to match the original request model
  • Re-serialized to ensure only defined fields are present
  • Content-Length headers updated to match sanitized response size
  • Applies to both non-streaming responses and SSE chunks
  • SSE streams processed line-by-line to handle multi-line events and strip comments

Error handling:

  • Third-party errors are intercepted in sanitize_error_response()
  • Original error logged with error!() macro for server-side debugging
  • Standard error generated based only on HTTP status code
  • OpenAI-compatible format guaranteed via error_response() helper
  • Deserialization failures return standard errors, never leak malformed responses

Trust resolution:

  • target_message_handler resolves effective trust as provider.trusted.unwrap_or(pool.trusted) after provider selection
  • The resolved trust is attached to the response via a ResolvedTrust extension
  • Strict mode handlers read it via ForwardResult.trusted — no separate pool lookup needed
  • Ensures trust reflects the actual provider that handled the request, including after fallback retries

Testing:

  • Request/response schema tests in each schema module
  • Integration tests in src/strict/handlers.rs verify sanitization behavior
  • Tests verify fail-closed behavior on malformed responses (no passthrough)
  • Tests verify SSE multi-line events and comment stripping
  • Tests verify Content-Length header correctness after sanitization
  • Tests verify per-provider trusted overrides pool-level setting in both directions

Response Sanitization

Onwards can enforce strict OpenAI API schema compliance for /v1/chat/completions responses. This feature:

  • Removes provider-specific fields from responses
  • Rewrites the model field to match what the client originally requested
  • Supports both streaming and non-streaming responses
  • Validates responses against OpenAI’s official API schema
  • Sanitizes error responses to prevent upstream provider details from leaking to clients

This is useful when proxying to non-OpenAI providers that add custom fields, or when using onwards_model to rewrite model names upstream.

Note: For production deployments requiring additional security (request validation, error standardization), consider using Strict Mode instead, which includes response sanitization plus comprehensive security features.

Enabling response sanitization

Add sanitize_response: true to any target or provider in your configuration.

Single provider:

{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-your-key",
      "onwards_model": "gpt-4-turbo-2024-04-09",
      "sanitize_response": true
    }
  }
}

Pool with multiple providers:

{
  "targets": {
    "gpt-4": {
      "sanitize_response": true,
      "providers": [
        {
          "url": "https://api1.example.com",
          "onwards_key": "sk-key-1"
        },
        {
          "url": "https://api2.example.com",
          "onwards_key": "sk-key-2"
        }
      ]
    }
  }
}

How it works

When sanitize_response: true and a client requests model: gpt-4 (using the single-provider example above, where onwards_model is set):

  1. The request is sent upstream; because onwards_model is set, the model field is rewritten to gpt-4-turbo-2024-04-09
  2. The upstream responds with custom fields and model: gpt-4-turbo-2024-04-09
  3. Onwards sanitizes the response:
    • Parses it against the OpenAI schema (removing unknown fields)
    • Rewrites the model field back to gpt-4 (matching the original request)
    • Reserializes the clean response
  4. The client receives a standard OpenAI response with model: gpt-4
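The steps above can be sketched in a few lines. This is an illustrative model of the behavior, not Onwards' actual code, and the OPENAI_FIELDS set is an assumed subset of the real schema:

```python
# Hypothetical sketch of response sanitization (not Onwards' implementation).
# OPENAI_FIELDS is an assumed subset of top-level chat-completion fields.
OPENAI_FIELDS = {
    "id", "object", "created", "model", "choices",
    "usage", "system_fingerprint", "service_tier",
}

def sanitize(upstream_body: dict, requested_model: str) -> dict:
    """Drop unknown top-level fields and restore the client's model name."""
    clean = {k: v for k, v in upstream_body.items() if k in OPENAI_FIELDS}
    clean["model"] = requested_model  # rewrite to what the client asked for
    return clean

upstream = {
    "id": "chatcmpl-1",
    "object": "chat.completion",
    "model": "gpt-4-turbo-2024-04-09",
    "choices": [],
    "provider": "some-upstream",  # provider-specific field: removed
}
print(sanitize(upstream, "gpt-4"))
```

A real implementation validates nested structures (choices, usage, streaming chunks) against the full schema, not just top-level keys.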

Common use cases

Third-party providers (e.g., OpenRouter, Together AI) often add extra fields like provider, native_finish_reason, cost, etc. Sanitization strips these.

Provider comparison – normalize responses from different providers for consistent handling.

Debugging – reduce noise by filtering to only standard OpenAI fields.

Error sanitization

When sanitize_response: true, error responses from upstream providers are also sanitized. This prevents information leakage – upstream error bodies can contain provider names, internal URLs, and model identifiers that you may not want exposed to clients.

How it works

Onwards replaces the upstream error body with a generic OpenAI-compatible error, while preserving the original HTTP status code:

  • 4xx errors are replaced with:
{
  "error": {
    "message": "The upstream provider rejected the request.",
    "type": "invalid_request_error",
    "param": null,
    "code": "upstream_error"
  }
}
  • 5xx errors (and any other non-2xx status) are replaced with:
{
  "error": {
    "message": "An internal error occurred. Please try again later.",
    "type": "internal_error",
    "param": null,
    "code": "internal_error"
  }
}

The original error body is logged at ERROR level (up to 64 KB) for debugging, so operators can still investigate upstream failures without exposing details to clients.

Error format

All Onwards error responses (both sanitized upstream errors and errors generated by Onwards itself) use the OpenAI-compatible {"error": {...}} envelope:

{
  "error": {
    "message": "...",
    "type": "...",
    "param": null,
    "code": "..."
  }
}
| Field   | Description |
|---------|-------------|
| message | Human-readable error description |
| type    | Error category (invalid_request_error, rate_limit_error, internal_error) |
| param   | The request parameter that caused the error, if applicable |
| code    | Machine-readable error code |

Supported endpoints

Currently supports:

  • /v1/chat/completions (streaming and non-streaming)

Load Balancing

Onwards supports load balancing across multiple providers for a single alias, with automatic failover, weighted distribution, and configurable retry behavior.

Configuration

{
  "targets": {
    "gpt-4": {
      "strategy": "weighted_random",
      "fallback": {
        "enabled": true,
        "on_status": [429, 5],
        "on_rate_limit": true
      },
      "providers": [
        { "url": "https://api.openai.com", "onwards_key": "sk-key-1", "weight": 3 },
        { "url": "https://api.openai.com", "onwards_key": "sk-key-2", "weight": 1 }
      ]
    }
  }
}

Strategy

  • weighted_random (default): Distributes traffic randomly based on weights. A provider with weight: 3 receives ~3x the traffic of weight: 1.
  • priority: Always routes to the first provider. Falls through to subsequent providers only when fallback is triggered.
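Weighted random selection can be modeled simply. This is a simplified sketch of the strategy's semantics, not the gateway's actual selection code:

```python
import random

# Sketch of weighted_random provider selection. Each provider's
# probability is proportional to its weight (default 1).
def pick_provider(providers: list[dict]) -> dict:
    weights = [p.get("weight", 1) for p in providers]
    return random.choices(providers, weights=weights, k=1)[0]

providers = [
    {"url": "https://api.openai.com", "weight": 3},
    {"url": "https://backup.example.com", "weight": 1},
]
# With weights 3 and 1, the first provider is chosen ~75% of the time.
```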

Fallback

Controls automatic retry on other providers when requests fail:

| Option        | Type  | Default | Description |
|---------------|-------|---------|-------------|
| enabled       | bool  | false   | Master switch for fallback |
| on_status     | int[] | –       | Status codes that trigger fallback (supports wildcards) |
| on_rate_limit | bool  | false   | Fallback when hitting local rate limits |

Status code wildcards:

  • 5 matches all 5xx (500-599)
  • 50 matches 500-509
  • 502 matches exact 502

When fallback triggers, the next provider is selected based on strategy (weighted random resamples from remaining pool; priority uses definition order).
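The wildcard rules read as prefix matching on the status code's digits. A sketch of that interpretation (an assumption based on the examples above, not the actual implementation):

```python
# Sketch of wildcard status matching: a pattern matches any status
# whose decimal digits start with the pattern's digits.
def status_matches(pattern: int, status: int) -> bool:
    return str(status).startswith(str(pattern))

assert status_matches(5, 503)        # 5 matches any 5xx
assert status_matches(50, 504)       # 50 matches 500-509
assert not status_matches(50, 510)   # 510 is outside 500-509
assert status_matches(502, 502)      # full codes match exactly
assert not status_matches(502, 503)
```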

Pool-level options

Settings that apply to the entire alias:

| Option            | Description |
|-------------------|-------------|
| keys              | Access control keys for this alias |
| rate_limit        | Rate limit for all requests to this alias |
| concurrency_limit | Max concurrent requests to this alias |
| response_headers  | Headers added to all responses |
| strategy          | weighted_random or priority |
| fallback          | Retry configuration (see above) |
| providers         | Array of provider configurations |

Provider-level options

Settings specific to each provider:

| Option            | Description |
|-------------------|-------------|
| url               | Provider endpoint URL |
| onwards_key       | API key for this provider |
| onwards_model     | Model name override |
| weight            | Traffic weight (default: 1) |
| rate_limit        | Provider-specific rate limit |
| concurrency_limit | Provider-specific concurrency limit |
| response_headers  | Provider-specific headers |
| trusted           | Overrides the pool-level trust setting for strict-mode error sanitization (true/false; omit to inherit from the pool) |
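For example, a pool might pass through error bodies from a primary provider it operates itself while sanitizing those from a third-party backup. This is an illustrative fragment with hypothetical URLs; see the Strict Mode documentation for how trust affects error handling:

```json
{
  "targets": {
    "gpt-4": {
      "strategy": "priority",
      "providers": [
        { "url": "https://primary.example.com", "onwards_key": "sk-primary", "trusted": true },
        { "url": "https://backup.example.com", "onwards_key": "sk-backup", "trusted": false }
      ]
    }
  }
}
```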

Examples

Primary/backup failover

{
  "targets": {
    "gpt-4": {
      "strategy": "priority",
      "fallback": { "enabled": true, "on_status": [5], "on_rate_limit": true },
      "providers": [
        { "url": "https://primary.example.com", "onwards_key": "sk-primary" },
        { "url": "https://backup.example.com", "onwards_key": "sk-backup" }
      ]
    }
  }
}

Multiple API keys with pool-level rate limit

{
  "targets": {
    "gpt-4": {
      "rate_limit": { "requests_per_second": 100, "burst_size": 200 },
      "providers": [
        { "url": "https://api.openai.com", "onwards_key": "sk-key-1" },
        { "url": "https://api.openai.com", "onwards_key": "sk-key-2" }
      ]
    }
  }
}
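The requests_per_second and burst_size parameters follow token-bucket semantics (mentioned in the feature list). A toy model of that behavior, as an illustration of the parameters rather than Onwards' limiter:

```python
import time

# Toy token bucket: the bucket starts full at burst_size tokens and
# refills continuously at requests_per_second tokens per second.
class TokenBucket:
    def __init__(self, requests_per_second: float, burst_size: int):
        self.rate = requests_per_second
        self.capacity = burst_size
        self.tokens = float(burst_size)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With requests_per_second: 100 and burst_size: 200, up to 200 requests can pass in an instant burst; sustained throughput then settles at 100 requests per second.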

Backwards compatibility

Single-provider configs still work unchanged:

{
  "targets": {
    "gpt-4": {
      "url": "https://api.openai.com",
      "onwards_key": "sk-key"
    }
  }
}

Contributing

Testing

Run the test suite:

cargo test

Release process

This project uses automated releases through release-plz.

How releases work

  1. Make changes using conventional commits:

    • feat: for new features (minor version bump)
    • fix: for bug fixes (patch version bump)
    • feat!: or fix!: for breaking changes (major version bump)
  2. Create a pull request with your changes

  3. Merge the PR – this triggers the release-plz workflow

  4. Release PR appears – release-plz automatically creates a PR with:

    • Updated version in Cargo.toml
    • Generated changelog
    • All changes since last release
  5. Review and merge the release PR

  6. Automated publishing – when the release PR is merged:

    • release-plz publishes the crate to crates.io
    • Creates a GitHub release with changelog

Conventional commit examples

feat: add new proxy authentication method
fix: resolve connection timeout issues
docs: update API documentation
chore: update dependencies
feat!: change configuration file format (BREAKING CHANGE)

The release workflow automatically handles version bumping and publishing based on your commit messages.