May 27, 2026

OpenRouter API chat completions: setup guide

Reading time: 7 min
Isha Maggu

OpenRouter API chat completions: one endpoint for every LLM

The OpenRouter API at https://openrouter.ai/api/v1/chat/completions is a single endpoint that routes requests to dozens of LLM providers — OpenAI, Anthropic, Google, Meta, Mistral, and more. Instead of managing separate API keys, SDKs, and billing accounts for each provider, you send one request and OpenRouter handles the rest.

As of early 2025, OpenRouter serves over 2 million API requests daily across 200+ models, according to its public stats page. The LLM API market grew 340% year-over-year in 2024, according to a16z's AI infrastructure report. On CodeWords, you already get native access to OpenAI, Anthropic, and Google Gemini without API keys, but OpenRouter adds access to every other model through a single integration.

This guide walks through real CodeWords workflows rather than theory alone. You will connect to OpenRouter, route requests to the right model, and build multi-model workflows.

TL;DR

  • https://openrouter.ai/api/v1/chat/completions is OpenAI-compatible — swap the base URL and API key, keep your existing code.
  • OpenRouter routes to 200+ models with unified billing, automatic fallbacks, and per-model pricing.
  • CodeWords workflows can use OpenRouter alongside native LLM access for model comparison, fallback chains, and cost optimization.

What is the OpenRouter API chat completions endpoint?

OpenRouter is an LLM API router. The https://openrouter.ai/api/v1/chat/completions endpoint accepts the same request format as OpenAI's Chat Completions API. This means any code, library, or tool built for OpenAI works with OpenRouter by changing two things: the base URL and the API key.

The endpoint supports:

  • All OpenAI-format parameters: messages, temperature, max_tokens, top_p, stream, tools/function calling
  • Model selection via the model field: specify "openai/gpt-4o", "anthropic/claude-sonnet-4", "google/gemini-2.5-pro", "meta-llama/llama-3.1-405b-instruct", or any other supported model
  • Streaming responses: Server-sent events for real-time token streaming
  • Tool/function calling: Supported on models that offer it natively

The key advantage is access breadth. A single API key and billing account covers models from every major provider, plus open-source models running on third-party infrastructure. The OpenRouter models page lists every available model with current pricing.
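To make the streaming bullet above more concrete, here is a minimal sketch of parsing server-sent event lines in the OpenAI chunk format (`data: {...}` payloads terminated by `data: [DONE]`). The helper name `iter_stream_text` and the sample lines are invented for illustration. The sketch assumes each chunk carries its text under `choices[0].delta.content` and that lines starting with a colon are SSE comments to skip.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and SSE comments (lines starting with ":").
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel instead of JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Made-up sample lines standing in for a real response stream.
sample = [
    ": keep-alive comment",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In practice the OpenAI SDK does this parsing for you when you pass stream=True. The sketch is mainly useful if you call the endpoint with a plain HTTP client.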

How do you connect to https://openrouter.ai/api/v1/chat/completions?

Setup takes under five minutes:

  1. Create an account at openrouter.ai.
  2. Generate an API key from the dashboard.
  3. Make a request:
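import os

import openai

# Point the OpenAI SDK at OpenRouter instead of OpenAI's own API.
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    # Read the key from the environment rather than hard-coding it.
    # OPENROUTER_API_KEY is an example variable name.
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# The model ID uses OpenRouter's provider/model-name format.
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain vector databases in one paragraph."}],
)
print(response.choices[0].message.content)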

That is the entire integration. The OpenAI Python SDK works unchanged because OpenRouter mirrors the API schema. The same applies to JavaScript (openai npm package), curl, or any HTTP client.

Authentication: Pass your API key in the Authorization header as Bearer your-key. Optionally include HTTP-Referer and X-Title headers for better analytics in your OpenRouter dashboard.
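For readers not using an SDK, the headers described above can be sketched with the standard library alone. This builds the request without sending it. The `build_request` helper, the `OPENROUTER_API_KEY` variable name, and the example referer and title values are placeholders, not part of OpenRouter's API.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(api_key, model, prompt, referer=None, title=None):
    """Build (but do not send) a chat completions request (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional attribution headers used for dashboard analytics.
    if referer:
        headers["HTTP-Referer"] = referer
    if title:
        headers["X-Title"] = title
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL, data=body, headers=headers, method="POST"
    )


req = build_request(
    os.environ.get("OPENROUTER_API_KEY", "your-openrouter-key"),
    "anthropic/claude-sonnet-4",
    "Explain vector databases in one paragraph.",
    referer="https://example.com",  # placeholder site URL
    title="Example App",            # placeholder app name
)
# To send it: urllib.request.urlopen(req), then read() the JSON body.
```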

On CodeWords, you can call OpenRouter from any workflow microservice. The FastAPI Python environment includes the OpenAI SDK pre-installed. Store your OpenRouter key as a workflow secret, and every workflow can access it securely.

How does model routing work?

The model parameter in your request determines where OpenRouter sends it. The format is provider/model-name:

  • openai/gpt-4o → routes to OpenAI
  • anthropic/claude-sonnet-4 → routes to Anthropic
  • google/gemini-2.5-pro → routes to Google
  • meta-llama/llama-3.1-405b-instruct → routes to available inference providers
  • mistralai/mistral-large → routes to Mistral

For open-source models, OpenRouter selects the best available inference provider based on availability, latency, and cost. You can also specify provider preferences using the provider parameter in the request body, documented in the OpenRouter API reference.
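As a rough sketch of the two ideas above, the helpers below split a model ID into its provider and model name, and attach provider preferences to a request body. The helper names are hypothetical. The `order` and `allow_fallbacks` field names follow OpenRouter's provider routing documentation as of this writing and should be checked against the current API reference. The provider names are placeholders.

```python
def split_model_id(model_id):
    """Split 'provider/model-name' into (provider, model_name)."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected provider/model-name, got {model_id!r}")
    return provider, name


def with_provider_preferences(payload, order, allow_fallbacks=True):
    """Return a copy of the request body with a provider preference block.

    Field names are assumptions to verify against the API reference.
    """
    updated = dict(payload)
    updated["provider"] = {
        "order": list(order),
        "allow_fallbacks": allow_fallbacks,
    }
    return updated


payload = {
    "model": "meta-llama/llama-3.1-405b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}
# "provider-a" and "provider-b" are placeholder provider names.
routed = with_provider_preferences(payload, ["provider-a", "provider-b"])
print(split_model_id(payload["model"]))
print(routed["provider"])
```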

Automatic fallbacks: If your primary model is down, OpenRouter can fall back to an alternative. Configure this with the route parameter set to "fallback" and provide an ordered list of models.
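A fallback request body might look like the sketch below. It assumes the ordered list of alternatives is sent in a `models` field alongside the primary `model`. That field name is an assumption based on OpenRouter's model routing documentation, and `build_fallback_payload` is a hypothetical helper.

```python
def build_fallback_payload(models, messages):
    """Build a request body that tries models in order (hypothetical helper)."""
    if not models:
        raise ValueError("at least one model is required")
    primary, *fallbacks = models
    payload = {"model": primary, "messages": messages}
    if fallbacks:
        # Assumed field names: "route" as described above, "models" for
        # the ordered fallback list.
        payload["route"] = "fallback"
        payload["models"] = fallbacks
    return payload


body = build_fallback_payload(
    ["openai/gpt-4o", "anthropic/claude-sonnet-4"],
    [{"role": "user", "content": "Hello"}],
)
print(body)
```

With the OpenAI Python SDK, fields that are not part of OpenAI's schema can be passed through the extra_body argument of client.chat.completions.create.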

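Cost-optimized routing: To favor price, sort providers by cost with the provider preference sort: "price" (the ":floor" model suffix is the documented shorthand). This selects the cheapest available provider for the model you name; choosing a cheaper model is still up to your own routing logic. Useful for high-volume, cost-sensitive workloads like batch classification or data extraction.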

How does pricing work compared to direct API access?

OpenRouter adds a small margin on top of provider pricing. For most models, the markup is 0-20%. Some community and open-source models are free to use.

Pricing comparison for common models (as of 2025):

  • GPT-4o: OpenRouter pricing closely tracks OpenAI's published rates — check the OpenRouter models page for current per-token costs
  • Claude Sonnet: Similar markup structure to GPT-4o, priced per input/output token
  • Llama 3.1 405B: Available from multiple providers at varying rates; OpenRouter shows all options
  • Free models: Several models (smaller Llama variants, some Mistral models) are available at zero cost for experimentation

The OpenRouter pricing page shows live per-token costs for every model. You pay per token used — no subscriptions or minimum commitments.
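Per-token billing reduces to simple arithmetic once you have token counts and per-million-token prices. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper and the prices are made-up numbers, not OpenRouter's rates.

```python
from decimal import Decimal


def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate request cost in USD from token counts and per-million prices."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) * Decimal(str(input_price_per_million))
    output_cost = Decimal(completion_tokens) * Decimal(str(output_price_per_million))
    return (input_cost + output_cost) / million


# Made-up prices: $3.00 per million input tokens, $15.00 per million output.
cost = estimate_cost_usd(1200, 300, "3.00", "15.00")
print(f"Estimated cost: ${cost}")
```

In an OpenAI-format response, the token counts are available as response.usage.prompt_tokens and response.usage.completion_tokens.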

On CodeWords, built-in LLM access (OpenAI, Anthropic, Gemini) is included in platform pricing without separate API costs. OpenRouter is useful when you need models outside that set — Mistral, Llama, Cohere, or experimental providers.

How do you build multi-model workflows with OpenRouter?

The real power is not calling one model — it is orchestrating multiple models in a single workflow. Patterns that work well:

Model comparison pipeline: Send the same prompt to three models, compare outputs, pick the best. Useful for evaluating model quality for a specific task.

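# Reuses the `client` configured in the setup section.
prompt = "Summarize the trade-offs between SQL and NoSQL databases."  # example prompt
models = ["openai/gpt-4o", "anthropic/claude-sonnet-4", "google/gemini-2.5-pro"]
results = []
for model in models:
    # Send the identical prompt to each model and record its answer.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    results.append({"model": model, "output": response.choices[0].message.content})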

Cost-tiered routing: Use a fast, cheap model for classification or triage. Only send complex items to expensive models. A CodeWords workflow can classify incoming support tickets with Llama 3.1 8B (near-free) and route only escalated tickets to Claude or GPT-4o.
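One way the tiered pattern could look, as a sketch rather than a CodeWords implementation. The `complete(model, prompt)` callable is a hypothetical stand-in for whatever function sends a chat completion and returns the text. The cheap model ID is an assumption, and unrecognized labels are escalated by design so ambiguous tickets get the stronger model.

```python
from typing import Callable

CHEAP_MODEL = "meta-llama/llama-3.1-8b-instruct"  # assumed model ID
STRONG_MODEL = "anthropic/claude-sonnet-4"

Complete = Callable[[str, str], str]  # (model, prompt) -> response text


def handle_ticket(ticket: str, complete: Complete) -> dict:
    """Triage with the cheap model, then answer with the appropriate tier."""
    label = complete(
        CHEAP_MODEL,
        "Classify this support ticket. Reply with exactly one word, "
        "ROUTINE or ESCALATE.\n\n" + ticket,
    ).strip().upper()
    # Anything that is not clearly ROUTINE goes to the stronger model.
    model = CHEAP_MODEL if label == "ROUTINE" else STRONG_MODEL
    reply = complete(model, "Draft a reply to this support ticket:\n\n" + ticket)
    return {"label": label, "model": model, "reply": reply}
```

With the client from the setup section, complete could wrap client.chat.completions.create and return response.choices[0].message.content.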

Fallback chains: Primary model → secondary model → tertiary model. If OpenAI is down, fall back to Anthropic. If both fail, use a self-hosted Llama instance. OpenRouter handles provider-level fallbacks, and CodeWords handles workflow-level retry logic.
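A workflow-level fallback chain can be sketched in a few lines. Here `complete(model, prompt)` is again a hypothetical callable that raises on failure, and the broad `except Exception` is a simplification; real code would likely catch only the SDK's connection and server errors.

```python
def complete_with_fallback(models, prompt, complete):
    """Try each model in order; return (model, text) from the first success."""
    errors = []
    for model in models:
        try:
            return model, complete(model, prompt)
        except Exception as exc:  # simplification: narrow this in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

This complements provider-level fallbacks: OpenRouter retries across providers for one request, while this loop covers the case where the whole request fails.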

Consensus workflows: Send a critical decision to multiple models, accept only if 2 out of 3 agree. Useful for content moderation, data validation, or any high-stakes classification where hallucination risk matters.
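The 2-out-of-3 rule can be expressed as a small majority vote. This sketch assumes each model has already been asked for a single label; `consensus` is a hypothetical helper, and labels are compared after trimming whitespace and lowercasing.

```python
from collections import Counter


def consensus(labels, required=2):
    """Return the label at least `required` models agree on, else None."""
    normalized = [label.strip().lower() for label in labels]
    if not normalized:
        return None
    label, count = Counter(normalized).most_common(1)[0]
    return label if count >= required else None


print(consensus(["spam", "Spam ", "ok"]))
print(consensus(["spam", "ok", "unsure"]))
```

A None result means no agreement, which a workflow might route to human review instead of acting automatically.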

These patterns run natively on CodeWords. The platform's serverless microservices can call OpenRouter in parallel, aggregate results, and route to downstream integrations like Slack, Airtable, or Google Sheets.
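Calling models in parallel and aggregating the results could look like the following sketch, which uses a thread pool from the standard library. It is not CodeWords-specific code; `complete(model, prompt)` is the same hypothetical callable as above, and a failure for one model is recorded instead of aborting the comparison.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_models(models, prompt, complete):
    """Query all models concurrently and collect outputs in input order."""
    if not models:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Submit every request first so they run concurrently.
        futures = [(model, pool.submit(complete, model, prompt)) for model in models]
        for model, future in futures:
            try:
                results.append({"model": model, "output": future.result(), "error": None})
            except Exception as exc:
                results.append({"model": model, "output": None, "error": str(exc)})
    return results
```

Threads suit this case because each call spends most of its time waiting on the network.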

What are common errors and how do you handle them?

401 Unauthorized: Invalid or expired API key. Verify the key in your OpenRouter dashboard. On CodeWords, check the workflow secret configuration.

402 Payment Required: Insufficient credits. Top up your OpenRouter balance. Free-tier models still work; paid models require credits.

429 Rate Limited: Too many requests. OpenRouter rate limits vary by model and account tier. Implement exponential backoff or use CodeWords' built-in retry logic.
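Exponential backoff with jitter can be sketched as below. `RateLimited` is a hypothetical stand-in for whatever your client raises on a 429 (with the OpenAI SDK that would likely be openai.RateLimitError), and the delay values are illustrative defaults rather than OpenRouter recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response from the API."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the delay cap each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time up to the cap so that
            # concurrent clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

The sleep parameter is injectable only so the function can be exercised in tests without real waiting.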

503 Service Unavailable: The underlying provider is down. Use OpenRouter's fallback routing (route: "fallback") to automatically switch to an alternative model.

Streaming errors: If the connection drops mid-stream, retry the full request. Partial responses cannot be resumed. On CodeWords, streaming is handled at the workflow level — the platform buffers responses and retries on connection failures.
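Because a dropped stream cannot be resumed, a retry has to discard the partial text and start over. The sketch below shows that shape. `open_stream` is a hypothetical zero-argument callable that starts a fresh request and yields text fragments, and `ConnectionError` stands in for whichever exception your HTTP client raises on a dropped connection.

```python
def collect_stream(open_stream, max_attempts=3):
    """Collect a full streamed response, restarting from scratch on a drop."""
    last_error = None
    for _ in range(max_attempts):
        parts = []  # partial output from a failed attempt is thrown away
        try:
            for fragment in open_stream():
                parts.append(fragment)
            return "".join(parts)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"stream failed after {max_attempts} attempts") from last_error
```

Note that a restarted request may produce different text than the first attempt, so anything already shown to a user from the partial stream should be replaced rather than appended to.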

FAQs

Is OpenRouter OpenAI-compatible? Yes. The https://openrouter.ai/api/v1/chat/completions endpoint accepts the same request format as OpenAI's API. Any OpenAI SDK or compatible library works by changing the base URL and API key.

Can I use OpenRouter for production workloads? Yes. OpenRouter serves millions of requests daily and provides uptime SLAs for paid accounts. For mission-critical workloads, combine OpenRouter's provider fallbacks with your own retry logic. On CodeWords, production workflows include built-in error handling.

Does OpenRouter support function calling and tool use? Yes, for models that natively support it (GPT-4o, Claude 3.5+, Gemini). The tool/function calling parameters pass through to the underlying provider. Check the model capabilities page to verify support for specific models.

How does OpenRouter compare to LiteLLM? OpenRouter is a hosted service — no infrastructure to manage. LiteLLM is an open-source proxy you self-host. Choose OpenRouter for simplicity; choose LiteLLM for full control. Both offer OpenAI-compatible interfaces.

One endpoint, every model

The https://openrouter.ai/api/v1/chat/completions endpoint removes the operational overhead of multi-provider LLM access. The teams building the most capable AI workflows are not locked into one provider — they route dynamically based on cost, quality, and availability.

Combine OpenRouter with CodeWords to build workflows that use the right model for each task without managing a fleet of API keys. The templates library includes multi-model patterns ready to customize.
