
When RAG Gets Smart

Standard RAG has a dumb retrieval step. You embed a query, find similar chunks, stuff them into context, and hope the LLM can work with whatever came back. It works surprisingly well for simple questions, but falls apart when the answer requires multiple steps, cross-referencing, or knowing when the first retrieval attempt missed the mark.

Agentic RAG puts an AI agent in charge of retrieval. Instead of a fixed pipeline (embed -> search -> generate), an agent decides what to search for, evaluates what it found, searches again if needed, and pulls from multiple sources. The retrieval becomes a reasoning process, not a lookup.

Why standard RAG breaks down

If you've run a RAG system in production, you've hit these:

  • Wrong chunks retrieved: The query "What's our refund policy for enterprise customers?" pulls chunks about refunds AND chunks about enterprise pricing, but not the chunk that actually covers enterprise refunds.
  • No follow-up capability: If the first retrieval misses, there's no second chance. The model hallucinates or says "I don't know."
  • Can't cross-reference: "Compare Q3 and Q4 revenue" requires pulling from two different sections and combining them. Standard RAG retrieves one set of chunks and hopes both are in there.
  • No source validation: The system can't tell whether retrieved chunks actually answer the question or just look similar.

How agentic RAG works

An agentic RAG system has a few moving parts:

The agent loop

  1. Plan: Given a user question, the agent decides what information it needs and formulates search queries.
  2. Retrieve: The agent runs searches against one or more knowledge bases.
  3. Evaluate: The agent checks if the retrieved information actually answers the question.
  4. Iterate: If the answer is incomplete, the agent reformulates queries and searches again.
  5. Generate: Once the agent has sufficient context, it produces the final answer.

This loop is the key difference. Standard RAG is a single pass. Agentic RAG can take multiple passes, refining its approach each time.
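The five steps above can be sketched as a small loop. This is a minimal illustration, not any framework's API: `plan`, `search`, `is_sufficient`, and `generate` are hypothetical stand-ins for real LLM and vector-store calls, and the pass limit is an arbitrary safety cap.

```python
MAX_PASSES = 3  # cap retries so a confused agent can't loop forever

def answer(question, plan, search, is_sufficient, generate):
    """Run the plan -> retrieve -> evaluate -> iterate -> generate loop."""
    context = []
    query = plan(question, context)           # 1. Plan: formulate a search query
    for _ in range(MAX_PASSES):
        context.extend(search(query))         # 2. Retrieve
        if is_sufficient(question, context):  # 3. Evaluate: does this answer it?
            break
        query = plan(question, context)       # 4. Iterate: reformulate and retry
    return generate(question, context)        # 5. Generate the final answer
```

Standard RAG is this same function with `MAX_PASSES = 1` and an `is_sufficient` that always returns True; everything agentic lives in the evaluate-and-retry branch.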

Multi-source retrieval

An agent can search across different backends in a single query:

  • Internal docs in a vector store
  • Structured data via SQL
  • External APIs for real-time data
  • Web search for public information

The agent decides which sources are relevant for each question, rather than searching everything every time.
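A minimal version of that routing decision might look like this. The source names and keyword rules are purely illustrative; in a real system the routing choice would itself be an LLM call rather than string matching.

```python
# Hypothetical backends: each maps a query to a list of result strings.
SOURCES = {
    "vector_store": lambda q: [f"doc chunk for: {q}"],
    "sql":          lambda q: [f"sql rows for: {q}"],
    "web":          lambda q: [f"web results for: {q}"],
}

def choose_sources(question):
    """Stand-in for an LLM routing decision: pick only the relevant backends."""
    picks = ["vector_store"]                  # internal docs by default
    lowered = question.lower()
    if any(w in lowered for w in ("revenue", "count", "total")):
        picks.append("sql")                   # structured metrics live in SQL
    if "latest" in lowered:
        picks.append("web")                   # real-time info needs the web
    return picks

def retrieve(question):
    return {name: SOURCES[name](question) for name in choose_sources(question)}
```

The payoff is that a question about internal policy never touches the web, and a metrics question goes straight to SQL instead of fuzzy-matching against document chunks.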

Memory across queries

Agentic RAG systems can remember context from previous interactions. If you asked about Q3 revenue two minutes ago, the system doesn't need to re-retrieve that data when you ask about Q3 vs Q4. It already has Q3 in memory and only retrieves Q4.
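One simple way to get that behavior is a cache keyed by retrieval query, so a follow-up question only fetches what's missing. A sketch, where `fetch` is a stand-in for a real retrieval backend:

```python
class RetrievalMemory:
    """Cache retrieved chunks across queries within a session."""

    def __init__(self, fetch):
        self.fetch = fetch   # backend call (vector search, SQL, ...)
        self.cache = {}      # query -> retrieved chunks

    def retrieve(self, queries):
        """Return chunks for all queries, hitting the backend only for new ones."""
        for q in queries:
            if q not in self.cache:
                self.cache[q] = self.fetch(q)
        return [chunk for q in queries for chunk in self.cache[q]]
```

In the Q3/Q4 example, "Compare Q3 and Q4" calls `retrieve(["Q3 revenue", "Q4 revenue"])`, and only the Q4 query reaches the backend because Q3 is already cached from the earlier question.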

Building it: framework options

A few frameworks make this easier to implement:

  • LangGraph gives you the most control. You define the agent's state machine explicitly -- when to retrieve, when to evaluate, when to retry. Good for production systems where you need predictable behavior.
  • LlamaIndex has built-in agentic retrieval patterns. Their SubQuestionQueryEngine breaks complex queries into sub-questions automatically.
  • CrewAI works well when you want specialized agents (one for retrieval, one for evaluation, one for synthesis) working together.
  • Direct function calling with Claude or GPT-4 is sometimes enough. Give the model retrieval tools and let it decide when and how to use them.
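The function-calling approach boils down to two pieces: a tool description in the JSON-schema shape that tool-use APIs expect, and a dispatcher that routes the model's tool calls to your own search code. The tool name and schema below are illustrative, not copied from any provider's docs:

```python
# Hypothetical retrieval tool description, in the common JSON-schema style.
search_tool = {
    "name": "search_docs",
    "description": "Search the internal knowledge base and return matching chunks.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def dispatch(tool_call, backends):
    """Route a model-issued tool call to the matching backend function."""
    return backends[tool_call["name"]](**tool_call["input"])
```

The model sees `search_tool` in its tool list, decides when to call it and with what query, and your code executes the call and feeds the results back. The agent loop is implicit in the model's own tool-use reasoning.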

When you actually need this

Agentic RAG adds complexity. Here's when it's worth it:

Use agentic RAG when:

  • Questions regularly require information from multiple documents or sections
  • Users ask multi-step questions ("Compare X to Y", "What changed between reports?")
  • Retrieval accuracy is business-critical (legal, financial, medical)
  • You're dealing with diverse data sources (docs + databases + APIs)

Standard RAG is fine when:

  • Most questions map cleanly to a single chunk
  • You're building a FAQ bot or simple knowledge base search
  • Latency matters more than accuracy (agentic RAG is slower)
  • Your document set is small and well-structured

The cost tradeoff

Agentic RAG uses more LLM calls per query. A standard RAG query might use one LLM call (for generation). An agentic query might use 3-5 calls (planning, evaluation, reformulation, generation). At current API pricing, that's 3-5x the cost per query.
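The arithmetic is worth making explicit. Assuming a flat per-call price (illustrative only; real costs vary by model and token count):

```python
COST_PER_CALL = 0.01   # assumed average $/LLM call, purely illustrative

def query_cost(llm_calls, cost_per_call=COST_PER_CALL):
    return llm_calls * cost_per_call

standard = query_cost(1)  # one generation call; embed/search cost is negligible
agentic = query_cost(4)   # plan + evaluate + reformulate + generate
```

At 10,000 queries a month under these assumed numbers, that's $100 vs. $400: a real difference, but small next to the cost of a wrong answer in a high-stakes domain.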

Whether that's worth it depends on what wrong answers cost you. For an internal knowledge base, maybe standard RAG is fine. For a financial compliance tool where a missed document could mean regulatory trouble, the extra cost per query is nothing.

What's next

The line between "RAG system" and "AI agent with access to documents" is blurring. As models get better at tool use and reasoning, the retrieval step becomes just another tool the agent wields. The interesting question isn't really "agentic RAG vs standard RAG" -- it's "how much autonomy should the retrieval process have?"

For most production systems today, a middle ground works well: use an agent loop for evaluation and retry, but keep the retrieval strategy relatively constrained. You get the accuracy benefits of agentic retrieval without the unpredictability of a fully autonomous agent deciding where and how to search.