May 27, 2026

What is retrieval augmented generation (RAG)?

Reading time: 4 min
Codewords


Retrieval augmented generation (RAG) is the pattern of feeding an LLM relevant data at query time so it answers from supplied facts instead of from memory. The model doesn't know your company's pricing page, your internal docs, or yesterday's sales numbers. RAG retrieves that information from a data source and injects it into the prompt before the model generates a response.

This is how you get AI that answers questions about your data without fine-tuning a model. Unlike generic AI automation posts, this guide shows real CodeWords workflows — not just theory.

Related reading: what are AI function calls, AI workflow automation, document loaders, Pinecone vector store, Supabase vector store, CodeWords integrations, CodeWords templates.

How RAG works

RAG has two phases that run sequentially for every query:

Retrieval phase. Your system searches a knowledge base — vector database, search index, or even a live web scrape — for documents relevant to the user's question. This typically involves:

  1. Converting the query into a vector embedding
  2. Performing similarity search against pre-embedded documents
  3. Returning the top-k most relevant chunks
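The three retrieval steps can be sketched in plain Python. This is a minimal, self-contained illustration, not how CodeWords or any particular vector database implements search: the `embed` function here is a hypothetical stand-in that uses bag-of-words term counts instead of a real embedding model, and the "index" is an in-memory list rather than a vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Pre-embed every chunk once, ahead of query time."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    # 1. Convert the query into a vector
    query_vec = embed(query)
    # 2. Similarity search against the pre-embedded chunks
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    # 3. Return the top-k most relevant chunks, best first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


index = build_index([
    "Refund policy: refunds are available within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
])
print(retrieve("What is the refund policy?", index, k=1))
# ['Refund policy: refunds are available within 30 days of purchase.']
```

With a real embedding model, semantically related wording (say, "refund" and "money back") would also score as similar; the toy version only rewards shared words.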

Generation phase. The retrieved chunks are inserted into the LLM's prompt as context. The model generates a response grounded in this specific information rather than relying solely on its training data.
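Here is a sketch of the prompt-assembly part of the generation phase, assuming the retrieved chunks arrive as plain strings. The name `build_prompt` and the instruction wording are illustrative choices, not a CodeWords API; the resulting string would be passed to whichever LLM the workflow calls.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Insert retrieved chunks into the prompt as numbered context."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the refund policy?",
    ["Refunds are available within 30 days of purchase.",
     "Gift cards are non-refundable."],
)
print(prompt)
```

Numbering the chunks lets the model point to the passage that supports each claim, and the "say you don't know" instruction gives it a fallback when retrieval misses.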

The 2020 paper from Facebook AI Research (now Meta AI) that coined the term showed that RAG models outperform purely parametric models on knowledge-intensive tasks such as open-domain question answering, and that their knowledge can be updated by swapping the retrieval index instead of retraining the model.

The key insight: instead of teaching the model new facts (expensive, slow, stale), you give it the facts it needs at the moment it needs them (cheap, fast, current).

Why RAG matters for automation

LLMs hallucinate. They state incorrect facts with confidence. For casual chat, that's annoying. For automated workflows that make business decisions, it's dangerous.

RAG reduces hallucination by anchoring generation in retrieved evidence. A 2025 Stanford study on LLM reliability found that RAG systems reduced factual errors by 40–60% compared to standalone LLM calls, depending on the domain and retrieval quality.

In CodeWords automation contexts, RAG powers:

Customer support bots that answer questions from your actual knowledge base, not the model's general training data. When a customer asks "What's your refund policy?", RAG retrieves your specific policy document and generates an accurate answer.

Internal research workflows that search across company documents — Google Drive files, Notion pages, Confluence wikis — and synthesize answers grounded in organizational knowledge.

Competitive analysis that retrieves current competitor information via web scraping (Firecrawl) and search APIs, then generates analysis based on today's data, not the model's training cutoff.

Building RAG with CodeWords

CodeWords supports RAG workflows through its native integrations and serverless Python execution:

Vector storage. Connect to Pinecone, Supabase vector, or Qdrant via the platform's 500+ integrations. Store and query embeddings without managing database infrastructure.

Document ingestion. Use Firecrawl for web content, Google Drive for internal docs, or SearchAPI.io for live web results. CodeWords' E2B sandboxes can run any Python library, such as LangChain or LlamaIndex, or make raw API calls directly.

Multi-model generation. Choose the right model for the generation step: OpenAI for speed, Anthropic Claude for nuance, Google Gemini for multimodal inputs. CodeWords provides access to all three without API key setup.

State persistence. RAG workflows that run on schedules — daily knowledge base updates, recurring research reports — use Redis to track what's been indexed and what's changed.
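One way to track what has changed between scheduled runs is to store a content hash per document and re-embed only the documents whose hash differs. In this sketch a plain Python dict stands in for Redis; in a real workflow the same mapping could live in a Redis hash (for example via redis-py's `hset` and `hget`). The function names and document IDs are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of a document's content; changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(documents: dict[str, str], seen: dict[str, str]) -> tuple[list[str], list[str]]:
    """Decide what to update in the vector store.

    documents: doc_id -> current text from the source (e.g. Google Drive or Notion)
    seen: doc_id -> hash stored at the last indexing run (dict stands in for Redis)
    Returns (ids to embed or re-embed, ids to delete from the vector store).
    """
    to_embed = [doc_id for doc_id, text in documents.items()
                if seen.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in seen if doc_id not in documents]
    return to_embed, to_delete


def record_run(documents: dict[str, str], seen: dict[str, str]) -> None:
    """After a successful run, make the stored hashes match the source."""
    for doc_id in [d for d in seen if d not in documents]:
        del seen[doc_id]
    for doc_id, text in documents.items():
        seen[doc_id] = content_hash(text)


seen: dict[str, str] = {}
docs = {"pricing": "Pro plan: $20/month", "faq": "Refunds within 30 days"}
print(plan_reindex(docs, seen))  # (['pricing', 'faq'], [])
record_run(docs, seen)

docs["pricing"] = "Pro plan: $25/month"  # edited at the source
del docs["faq"]                          # removed at the source
print(plan_reindex(docs, seen))  # (['pricing'], ['faq'])
```

Hashing content rather than relying on modification timestamps means a document that was re-saved without changes does not trigger a paid re-embedding call.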

RAG vs. fine-tuning

Both customize model behavior, but they solve different problems:

| | RAG | Fine-tuning |
|---|---|---|
| Best for | Factual recall from specific docs | Style, tone, domain patterns |
| Data freshness | Current as of the last index update | Frozen at training time |
| Cost | Per-query retrieval cost | Upfront training cost |
| Maintenance | Update documents anytime | Retrain to incorporate new info |
| Hallucination risk | Lower (grounded in evidence) | Moderate (still parametric) |

For most automation use cases — support bots, research workflows, data analysis — RAG is the right choice. Fine-tuning makes sense when you need the model to adopt a specific writing style or follow domain conventions that are hard to express in prompts.

Platforms like Zapier and Make offer basic AI nodes but don't natively support vector search or custom retrieval pipelines. n8n has vector store nodes for common providers. CodeWords, running full Python, supports any retrieval strategy — from simple keyword search to hybrid vector + BM25 — with no platform restrictions.
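A common way to build hybrid retrieval is to rank documents twice, once with the BM25 keyword formula and once by vector similarity, then merge the two rankings with reciprocal rank fusion (RRF). The sketch below implements BM25 and RRF from scratch in plain Python; the vector ranking in the usage lines is a hypothetical hard-coded list standing in for real vector search output, and the parameter values (k1 = 1.5, b = 0.75, RRF constant 60) are conventional defaults rather than requirements.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Document indices ordered best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse best-first rankings: each doc scores the sum of 1 / (k + position)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "Error code E42 means the API key has expired.",
    "Rotate credentials regularly to keep accounts secure.",
    "Our API supports JSON and CSV exports.",
]
keyword_ranking = rank(bm25_scores("what does E42 mean", docs))  # [0, 1, 2]
vector_ranking = [2, 0, 1]  # hypothetical vector-search result for the same query
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))  # [0, 2, 1]
```

RRF uses only rank positions, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales. Document 0 wins here because both rankings place it near the top, and the exact-match keyword "E42" is what BM25 contributes that embeddings often miss.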

Getting RAG right

Chunk wisely. How you split documents affects retrieval quality. Too large and you dilute relevance. Too small and you lose context. 200–500 token chunks with overlap is a common starting point.
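A minimal overlapping chunker, assuming whitespace-separated words as a rough proxy for tokens (a production pipeline would count tokens with the embedding model's own tokenizer). The function name and the defaults of 300 words with 50 words of overlap are illustrative assumptions, not CodeWords settings.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words; each chunk repeats the
    last `overlap` words of the previous one so context isn't cut mid-thought."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing a little duplicate text.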

Evaluate retrieval. The generation is only as good as the retrieval. If the wrong chunks come back, the model produces confident wrong answers. Log and review what's being retrieved.
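A small hand-labeled evaluation set makes this concrete. The sketch below assumes you have pairs of (query, ID of the chunk that should come back) and a retriever that returns chunk IDs best first; it logs every retrieval for review and reports the hit rate at k. `hit_rate_at_k` and the `fake_retrieve` stub are hypothetical names for illustration.

```python
import logging

logger = logging.getLogger("retrieval_eval")


def hit_rate_at_k(eval_set, retrieve_fn, k=3):
    """Fraction of queries whose expected chunk appears in the top-k results.

    eval_set: list of (query, expected_chunk_id) pairs, labeled by hand
    retrieve_fn: callable(query, k) -> list of chunk ids, best first
    """
    hits = 0
    for query, expected_id in eval_set:
        results = retrieve_fn(query, k)
        # Log what came back so wrong retrievals can be reviewed later
        logger.info("query=%r retrieved=%r expected=%r", query, results, expected_id)
        if expected_id in results:
            hits += 1
    return hits / len(eval_set) if eval_set else 0.0


# Stubbed retriever standing in for real vector search
fake_results = {
    "refund policy": ["faq-3", "faq-1"],
    "shipping time": ["faq-7", "faq-2"],
}


def fake_retrieve(query, k):
    return fake_results[query][:k]


logging.basicConfig(level=logging.INFO)
eval_set = [("refund policy", "faq-1"), ("shipping time", "faq-5")]
print(hit_rate_at_k(eval_set, fake_retrieve, k=2))  # 0.5
```

Re-running the same evaluation set after changing chunk size, embedding model, or top-k shows whether retrieval actually improved before the generation step ever sees the results.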

Keep data current. Stale embeddings produce stale answers. Schedule re-indexing in your CodeWords workflow to keep the knowledge base fresh.

RAG is the most practical pattern for getting production-quality AI answers about your specific data — start building at CodeWords.
