May 27, 2026

OpenRouter embedding models: a complete guide

Reading time: 7 min
Osman Ramadan

OpenRouter embedding models: choosing the right one for your project

OpenRouter embedding models give you access to multiple embedding providers through a single API endpoint — the same way OpenRouter unifies chat model access across OpenAI, Anthropic, Google, and dozens of others. Instead of managing separate API keys and SDKs for each embedding provider, you route all embedding requests through OpenRouter's unified interface.

The practical value: you can switch between OpenAI's text-embedding-3-small, Cohere's embed-v3, and open source models without changing your integration code. Pricing, rate limits, and availability vary by model — and those differences matter when you are embedding millions of documents for a production RAG pipeline.

According to OpenRouter's 2025 documentation, the platform routes requests to over 200 models across 30+ providers. A 2025 MTEB (Massive Text Embedding Benchmark) study found that embedding model choice affects retrieval accuracy by up to 15% on standard benchmarks — making model selection a meaningful engineering decision, not a cosmetic one.

Unlike generic AI automation posts, this guide shows real CodeWords workflows, not just theory. By the end, you will know which OpenRouter embedding models fit your use case and how to integrate them.

Related reading: Supabase vector store, Qdrant API key, document loaders, AI automation examples, locally hosted LLM, CodeWords integrations, CodeWords templates.

TL;DR

  • OpenRouter provides a unified API for embedding models from OpenAI, Cohere, Google, and open source providers — one integration, multiple model options.
  • Model selection depends on your use case: text-embedding-3-small for cost-efficient general search, Cohere embed-v3 for multilingual content, and open source models for cost-sensitive workloads and prototyping.
  • CodeWords can call OpenRouter embedding models directly through its built-in LLM access, then store vectors in Supabase, Qdrant, or any vector database — no API key setup required.

How does the OpenRouter embeddings API work?

OpenRouter's embedding endpoint follows OpenAI's API format. You send a POST request to https://openrouter.ai/api/v1/embeddings with a model identifier and input text. The response contains the embedding vector — an array of floats representing the semantic meaning of the input.

Request format:

{
  "model": "openai/text-embedding-3-small",
  "input": "Your text to embed"
}

The response includes the embedding vector, token usage, and model metadata. Because OpenRouter mirrors OpenAI's API schema, any SDK or library that supports OpenAI embeddings works with OpenRouter by changing the base URL and API key.

Authentication uses a bearer token (your OpenRouter API key). Pricing is pass-through plus a small OpenRouter margin — typically 0–10% above the provider's direct price. The tradeoff: slightly higher per-request cost in exchange for unified billing, simplified key management, and the ability to switch models without code changes.
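A minimal sketch of the request described above, using only Python's standard library. The endpoint URL, model identifier, and bearer-token header come from this section. The helper names (build_embedding_request, extract_vectors, embed), the OPENROUTER_API_KEY environment variable, and the assumption that the response follows OpenAI's data[i].embedding layout are illustrative and should be checked against OpenRouter's current documentation.

```python
import json
import os
import urllib.request

OPENROUTER_EMBEDDINGS_URL = "https://openrouter.ai/api/v1/embeddings"


def build_embedding_request(texts, model, api_key):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_vectors(response_json):
    """Return embeddings in input order, assuming OpenAI's response schema."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model="openai/text-embedding-3-small"):
    """Send the request. Needs network access and a real API key."""
    api_key = os.environ["OPENROUTER_API_KEY"]  # assumed variable name
    request = build_embedding_request(texts, model, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_vectors(json.load(response))
```

Calling embed(["Your text to embed"]) should return one vector per input string. Keeping request building separate from sending makes the payload easy to inspect without spending tokens.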

Which OpenRouter embedding models should you use?

Model choice depends on four factors: accuracy, cost, dimensions, and language support.

OpenAI text-embedding-3-small (1536 dimensions)

The default choice for most English-language applications. At $0.02 per million tokens (OpenAI's 2025 pricing), it is cheap enough for large-scale indexing. Performance on the MTEB benchmark is strong — top 10 among commercial models for English retrieval tasks.

Best for: General-purpose RAG, semantic search, recommendation engines. The 1536 dimensions balance accuracy and storage cost.

OpenAI text-embedding-3-large (3072 dimensions)

Higher accuracy than the small model, at $0.13 per million tokens. The 3072 dimensions improve retrieval precision, especially on nuanced queries where small differences in meaning matter.

Best for: Applications where retrieval quality directly affects user experience — customer-facing search, legal document retrieval, medical knowledge bases.

Cohere embed-v3

Cohere's multilingual embedding model supports 100+ languages with strong cross-lingual retrieval. If your documents are in one language and queries arrive in another, embed-v3 maps both into a shared vector space, so no separate translation step is needed.

Best for: Multilingual RAG, international support knowledge bases, cross-language document search.

Open source models via OpenRouter

Models like BAAI/bge-large-en-v1.5 (1024 dimensions) and sentence-transformers/all-MiniLM-L6-v2 (384 dimensions) are available through OpenRouter's integration with hosted inference providers. Smaller dimensions mean faster queries and less storage, at the cost of some accuracy.

Best for: Cost-sensitive applications, prototyping, and workloads where you need embedding model portability.

How do you integrate OpenRouter embedding models into a RAG pipeline?

The integration pattern follows three stages: embed, store, retrieve.

Stage 1: Generate embeddings.

For each document chunk, call OpenRouter's embedding endpoint. Batch your requests to reduce HTTP overhead: the endpoint accepts an array of inputs, and the per-request cap depends on the model (Cohere's embed API, for example, accepts up to 96 texts per call).
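Batching can be sketched as a small helper that slices the chunk list and embeds one slice per request. The default batch size of 96 reuses the figure quoted above; embed_batch is a hypothetical callable (a list of strings in, a list of vectors out) standing in for whatever client you use.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(chunks, embed_batch, batch_size=96):
    """Embed `chunks` one batch at a time and return vectors in input order.

    `embed_batch` is a hypothetical callable: list[str] -> list[list[float]].
    """
    vectors = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With 200 chunks and the default size, this makes three requests (96, 96, and 8 inputs) instead of 200.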

In CodeWords, the built-in LLM access handles embedding calls directly. Cody writes the chunking logic (splitting documents by paragraph, sentence, or token count), calls the embedding model, and handles rate limit retries automatically.
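For readers writing the chunking step by hand, here is one possible paragraph-based splitter. It is a generic sketch, not how CodeWords or Cody implements chunking. The function name and the max_words budget are hypothetical, and word count is only a rough stand-in for token count.

```python
def chunk_by_paragraph(text, max_words=200):
    """Group blank-line-separated paragraphs into chunks of at most `max_words` words.

    A single paragraph longer than `max_words` becomes its own oversized chunk.
    """
    chunks, current, current_words = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = len(paragraph.split())
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and current_words + words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps related sentences together, which tends to matter more for retrieval quality than hitting an exact chunk size.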

Stage 2: Store vectors.

Insert the embedding vectors into your vector database. Common choices:

  • Supabase pgvector — vectors in your existing Postgres database
  • Qdrant — dedicated vector database with advanced filtering
  • Pinecone — fully managed, optimized for high-throughput retrieval
  • Chroma — lightweight, open source, good for prototyping

Each database has its own insertion API. CodeWords connects to all of them through its 500+ integrations.

Stage 3: Retrieve similar documents.

When a user query arrives, embed it using the same model and dimensions you used for indexing (this is critical — mismatched models produce meaningless similarity scores). Query your vector database for the top-k most similar documents. Pass those documents as context to your chat model for response generation.
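To make the retrieval step concrete, here is a brute-force top-k search using cosine similarity in plain Python. A real vector database does this with an index rather than a full scan; the function names and the (doc_id, vector) layout are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=3):
    """`indexed` is a list of (doc_id, vector) pairs; returns [(doc_id, score)]."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The dimension check fails loudly when the query was embedded with a model of a different vector size than the index. It cannot catch two different models that happen to share a dimension count.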

What are the performance and cost tradeoffs?

Cost at scale. Embedding 1 million documents of ~500 tokens each:

  • text-embedding-3-small: ~$10
  • text-embedding-3-large: ~$65
  • Cohere embed-v3: ~$50 (at Cohere's listed $0.10 per million tokens)
  • Open source via hosted inference: varies, often $5–15

For initial indexing, cost is a one-time expense. The ongoing cost comes from embedding queries (each user search generates one embedding call) and re-indexing when documents change.
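The indexing estimates above are straightforward arithmetic: total tokens divided by one million, times the per-million-token price. A small helper (the name is hypothetical) makes it easy to rerun with your own corpus size and current prices:

```python
def embedding_cost_usd(num_documents, tokens_per_document, price_per_million_tokens):
    """Estimated cost of embedding a corpus once, in US dollars."""
    total_tokens = num_documents * tokens_per_document
    return total_tokens / 1_000_000 * price_per_million_tokens


# Prices quoted in this guide, in USD per million tokens.
small = embedding_cost_usd(1_000_000, 500, 0.02)  # text-embedding-3-small
large = embedding_cost_usd(1_000_000, 500, 0.13)  # text-embedding-3-large
print(f"small: ${small:.2f}, large: ${large:.2f}")
```

The same function covers ongoing query cost: treat each search as one short document and multiply by expected query volume.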

Latency. Embedding calls through OpenRouter add ~50–100ms of routing overhead compared to direct API calls. For batch indexing, this is negligible. For real-time search where every millisecond matters, consider whether the unified API convenience is worth the latency.

Dimension tradeoffs. Higher dimensions improve accuracy but increase storage and query time. Assuming 4-byte floats and counting only the raw vectors (no index or metadata overhead), a table with 1 million rows of 3072-dimensional vectors uses approximately 12 GB of storage. The same table with 1536 dimensions uses ~6 GB. With 384 dimensions (MiniLM), it drops to ~1.5 GB. Choose based on your accuracy requirements and infrastructure budget.
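These storage figures can be reproduced with a short calculation. The sketch assumes each value is stored as a 4-byte float and ignores index structures, metadata, and database overhead, so real tables will be larger.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_storage_gb(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of the vectors alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dimensions * bytes_per_value / 1e9


for dims in (3072, 1536, 384):
    print(dims, round(raw_vector_storage_gb(1_000_000, dims), 2), "GB")
```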

How do you handle model switching and versioning?

One advantage of routing through OpenRouter: switching embedding models is a configuration change, not a code rewrite. Change the model identifier in your API call and the rest of the pipeline works the same.

The catch: you cannot mix embeddings from different models in the same vector space. If you switch from text-embedding-3-small (1536d) to Cohere embed-v3 (1024d), you need to re-embed your entire corpus. The vectors from different models are not comparable — even if the dimensions matched, the semantic spaces are different.
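One way to guard against accidental mixing is to keep the model identifier with every stored vector and check it before comparing. The record type and function below are hypothetical names for illustration, not part of any particular vector database's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    model: str      # e.g. "openai/text-embedding-3-small"
    values: tuple   # the embedding itself


def assert_same_space(query_model, query_vector, stored):
    """Raise if the query and a stored vector come from different models or sizes."""
    if stored.model != query_model:
        raise ValueError(
            f"model mismatch: index uses {stored.model!r}, query uses {query_model!r}"
        )
    if len(stored.values) != len(query_vector):
        raise ValueError(
            f"dimension mismatch: index has {len(stored.values)}, "
            f"query has {len(query_vector)}"
        )
```

Checking the model name first matters because two models can share a dimension count while still producing incomparable vectors.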

Best practices for model versioning:

  • Store the model identifier alongside each vector in your database
  • When switching models, re-index in a parallel table or collection, verify quality, then swap
  • Use A/B testing to compare retrieval quality between models before fully migrating
  • Pin to specific model versions (e.g., openai/text-embedding-3-small rather than a generic alias) to prevent unexpected behavior from model updates

CodeWords handles re-indexing as a batch processing workflow. Cody builds the pipeline: read from the old collection, generate new embeddings with the new model, write to the new collection, and run a quality comparison.
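The read, re-embed, and write steps of such a pipeline can be sketched generically with in-memory dictionaries. This is an illustration of the pattern, not CodeWords' implementation; the collection layout and the embed_batch callable (texts and a model identifier in, vectors out) are assumptions.

```python
def reindex(old_collection, embed_batch, new_model, batch_size=96):
    """Build a new collection by re-embedding the texts in `old_collection`.

    `old_collection` maps doc_id -> {"text": str, "model": str, "vector": list}.
    `embed_batch` is a hypothetical callable: (list[str], model) -> list of vectors.
    The old collection is left untouched so it can keep serving queries
    until the new one has been verified.
    """
    doc_ids = list(old_collection)
    new_collection = {}
    for start in range(0, len(doc_ids), batch_size):
        batch_ids = doc_ids[start:start + batch_size]
        texts = [old_collection[doc_id]["text"] for doc_id in batch_ids]
        vectors = embed_batch(texts, new_model)
        for doc_id, text, vector in zip(batch_ids, texts, vectors):
            new_collection[doc_id] = {
                "text": text,
                "model": new_model,  # stored alongside the vector
                "vector": vector,
            }
    return new_collection
```

Because the function returns a separate collection, the swap becomes a single pointer change after the quality comparison, and rolling back means pointing at the old collection again.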

FAQs

Does OpenRouter charge extra for embeddings? OpenRouter applies a small margin (typically 0–10%) on top of the provider's direct pricing. Check OpenRouter's pricing page for current rates per model.

Can I use OpenRouter embeddings with LangChain? Yes. LangChain's OpenAIEmbeddings class works with OpenRouter by setting the openai_api_base to https://openrouter.ai/api/v1 and using your OpenRouter API key. No custom integration needed.

What is the maximum input length for OpenRouter embedding models? It depends on the underlying model. text-embedding-3-small supports up to 8,191 tokens. Cohere embed-v3 supports up to 512 tokens per input (longer texts are truncated). Always check the specific model's documentation on OpenRouter.

Should I use OpenRouter or call embedding providers directly? Use OpenRouter if you want unified billing, easy model switching, and simplified key management. Call providers directly if you need the lowest possible latency, have negotiated enterprise pricing, or need features specific to one provider's SDK.

Conclusion

OpenRouter embedding models turn model selection from an infrastructure commitment into a configuration decision. The unified API means you can prototype with a cheap model, benchmark against alternatives, and upgrade to a higher-accuracy model — all without rewriting your embedding pipeline.

The implication for teams building AI applications: the embedding model is not a set-and-forget choice. As new models emerge and benchmarks shift, the ability to swap models quickly becomes a competitive advantage. OpenRouter makes that swap cheap.

Build your embedding and RAG pipeline on CodeWords — Cody handles the OpenRouter calls, vector storage, and retrieval logic as a single deployable workflow.
