RAG vs GraphRAG: Complete Guide to Retrieval Augmented Generation

The future of RAG isn't just about retrieving text—it's about reasoning over structured knowledge.

If you're reading this article, there's a good chance you already know what RAG is. Feel free to skip ahead to the GraphRAG section. But if you want a solid reintroduction to the fundamentals, let's start from the beginning.

What is RAG?

The Context Engineering Challenge

Let's begin with a simple experiment. Send a prompt to an LLM: "Who is Barack Obama?"

[Image: LLM answering 'Who is Barack Obama?']
The model successfully answers questions about well-known public figures from its training data

The model responds confidently with detailed information about the former U.S. president. This works because Barack Obama is a well-known public figure, and countless articles and web pages about him were ingested into the model's training data. The information lives in the model's internal "memory," allowing it to provide answers that are nearly 100% accurate—or at least, less prone to hallucination.

Now ask the same model: "Who is Pierre Ange Leundeu?" (that's me, by the way).

[Image: LLM unable to answer 'Who is Pierre Ange Leundeu?']
Without RAG, the model cannot answer questions about information not in its training data

The model doesn't know. It had no data about me during training, which explains why it can't answer the question.

This is where RAG comes in. The idea is to augment a query with new, relevant context about a concept the model doesn't know (or knows poorly). This process unfolds in four steps.

The RAG Pipeline

Step 0: Knowledge Base Construction

The knowledge base can be either dynamic or persistent. In dynamic RAG (also called "on-the-fly" RAG), we don't use a pre-existing database. Instead, we build context on demand—think of web search tools that fetch information in real time. In persistent RAG, we create and populate a static knowledge database beforehand, which we then use to augment our model.

In both cases, we work with a vector store—temporary in the dynamic case, persistent in the static one. To construct the vector store, we start with raw documents (PDFs, web page extracts, etc.), split them into chunks, transform each chunk into a vector representation (embeddings), and store these vectors. The vector store becomes our repository of knowledge—a collection of vectors representing the context available to answer user questions.
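
To make Step 0 concrete, here's a minimal sketch of the indexing pipeline. Everything here is illustrative: embed() is a placeholder standing in for a real embedding model (an API call or a local model), and the 384-dimension vectors and naive character chunking are assumptions for the example, not recommendations.

```python
# Step 0 sketch: build a tiny in-memory vector store.
import numpy as np

rng = np.random.default_rng(0)

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: replace with a real embedding model."""
    return rng.standard_normal(384)

def chunk(document: str, size: int = 500) -> list[str]:
    """Naive fixed-size character chunking; real systems typically
    split on sentences or paragraphs, often with overlap."""
    return [document[i:i + size] for i in range(0, len(document), size)]

documents = ["...raw text of a PDF...", "...a web page extract..."]
chunks = [c for doc in documents for c in chunk(doc)]
vectors = np.stack([embed(c) for c in chunks])  # the "vector store"
```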

The fundamental question becomes: among all these vectors and chunks, which ones are most relevant for answering a specific question Q? This is where retrieval comes in.

Step 1: Retrieve

Retrieval involves searching the knowledge base to fetch the k (say, 16) chunks that provide the most relevant context for answering question Q. How does this query actually work?

The intuition behind classic (or vanilla) RAG is that a question Q is "similar" to its answer A, which is located in specific portions (chunks) of the contextual documents. "Similar" here means that the distance between the vector representations (embeddings) of the question and the chunk containing the answer is small. Retrieval therefore breaks down into two steps: transform the user's question into a vector, then retrieve the k nearest vectors from the knowledge base based on distance metrics like cosine similarity or dot product.
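
Here's what that retrieval step might look like, continuing the sketch above (it reuses the hypothetical embed(), chunks, and vectors from Step 0):

```python
# Step 1 sketch: embed the question and retrieve the k nearest chunks
# by cosine similarity.
import numpy as np

def retrieve(question: str, k: int = 16) -> list[str]:
    q = embed(question)
    # Cosine similarity = dot product of L2-normalized vectors.
    q = q / np.linalg.norm(q)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    top_k = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top_k]
```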

Step 2: Augment

We take the k textual chunks corresponding to the k nearest vectors and augment the original question with them. This provides context that should be relevant to the question.

Step 3: Generate

We pass the context-augmented prompt to the model, which performs generation with relevant context.
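
Steps 2 and 3 together might look like this, with call_llm() as a stand-in for whatever LLM client you actually use:

```python
# Steps 2 and 3 sketch: stuff the retrieved chunks into the prompt,
# then generate.
def call_llm(prompt: str) -> str:
    """Stand-in: replace with your LLM client of choice."""
    raise NotImplementedError("replace with a real LLM call")

def answer(question: str, k: int = 16) -> str:
    context = "\n\n".join(retrieve(question, k))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```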

That's RAG in a nutshell. I've obviously glossed over some details of the vanilla process to keep things simple, but this captures the essence.

A Practical Example

To illustrate, let's revisit the question about Pierre Ange Leundeu, but this time activate the "Web Search" tool available in one click on modern chat interfaces.

[Image: Web search tool in a chat interface]
Modern chat interfaces provide web search tools that enable dynamic RAG

Without RAG, the model can't answer. But when we enable on-the-fly RAG using a tool like web search (an approach also called agentic RAG), the application can now provide an answer thanks to the retrieved context.

[Image: LLM successfully answering 'Who is Pierre Ange Leundeu?' with RAG]
With RAG enabled via web search, the model can now answer questions using retrieved context

Current Limitations of Vanilla RAG

Vanilla RAG is used today in countless enterprises, turning scattered PDFs of varying complexity into information users can extract and transform into value. However, vanilla RAG has significant room for improvement, and several optimization methods have emerged to enhance its quality, particularly in the retrieval stage.

One fundamental limitation of RAG is the difficulty of extracting relevant information to answer what we call global questions about an entire document or corpus. As explained above, the retrieval step (the R in RAG) recovers the most relevant chunks, which works well for specific factual queries.

But consider this question: "What are the main themes in the file?" Which chunks are relevant? All of them? This type of question falls into the category of Query-Focused Summarization (QFS). The name is self-explanatory: these are questions with summarization intent that concern the entire document. QFS is a variant of automatic summarization that uses a given query to generate summaries.

RAG is, by definition, designed for explicit retrieval tasks. As such, it's fundamentally ill-suited, at least in its vanilla form, for this type of question.

Methods for answering QFS existed before RAG, but they were complex, costly, and didn't scale as effectively. They couldn't handle the volume of documents that modern RAG systems can process today.

Another fundamental limitation is that chunks alone provide excellent context for retrieval, but they fall short for reasoning. Agents need to understand how these chunks, and the entities they contain, relate to each other, and how to reason about those connections. Classic RAG struggles with this because it treats chunks as isolated pieces of information. For multi-hop questions that require gathering information from multiple sources and conducting multi-step reasoning to arrive at a comprehensive answer, the ability to reason over relationships becomes crucial.

To address these limitations and the other use cases I'll discuss later, the community introduced GraphRAG.

What is GraphRAG?

The Intuition

[Image: Similarity vs. relationships in RAG, by AWS]
Classic RAG finds similar chunks, but GraphRAG understands relationships between entities

GraphRAG represents a fundamental shift from retrieving isolated chunks to reasoning over structured knowledge. While vanilla RAG treats documents as independent text segments, GraphRAG builds a knowledge graph that captures entities, relationships, and the semantic structure of your corpus.

The process begins similarly to RAG: you start with raw documents. But instead of just chunking and vectorizing, GraphRAG extracts entities (people, places, concepts, events) and their relationships from the text. These become nodes and edges in a knowledge graph. The graph is then analyzed to identify communities—clusters of closely related entities. Each community receives a summary that encapsulates the key information within that cluster.
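
As a rough sketch of this indexing phase, assuming a hypothetical extract_triples() in place of the LLM-based extraction step, the graph construction and community detection could look like this (using networkx, and reusing the chunks list from the RAG sketch earlier):

```python
# GraphRAG indexing sketch: entities become nodes, relationships become
# edges, and communities are detected over the resulting graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def extract_triples(chunk: str) -> list[tuple[str, str, str]]:
    """Placeholder: a real system prompts an LLM to return
    (subject, relation, object) triples for each chunk."""
    return [("Barack Obama", "served_as", "U.S. President")]

G = nx.Graph()
for c in chunks:  # `chunks` from the vanilla RAG sketch above
    for subj, rel, obj in extract_triples(c):
        G.add_edge(subj, obj, relation=rel, source_chunk=c)

# Cluster the graph into communities; a real pipeline then has an LLM
# write a summary for each one (summarization step omitted here).
communities = list(greedy_modularity_communities(G))
```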

When a query arrives, the system doesn't just retrieve similar chunks. Instead, it identifies relevant entities in the graph, traverses relationships to find connected concepts, retrieves community summaries, and uses this structured context to generate answers. This enables the model to reason about relationships, infer connections that weren't explicitly stated, and answer questions that require understanding the corpus as a whole.
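
A toy version of that query flow, building on the graph from the previous sketch (community_summary() is a placeholder for the summaries a real pipeline would generate and cache at indexing time):

```python
# GraphRAG query sketch: find seed entities mentioned in the question,
# expand to their graph neighborhoods, and pull in the matching
# community summaries as structured context.
def community_summary(community) -> str:
    """Placeholder: in practice an LLM writes this during indexing."""
    return "Community of: " + ", ".join(sorted(community))

def graph_rag_context(question: str, hops: int = 1) -> str:
    seeds = [n for n in G.nodes if n.lower() in question.lower()]
    neighborhood = set(seeds)
    for _ in range(hops):
        neighborhood |= {m for n in list(neighborhood) for m in G.neighbors(n)}
    facts = [
        f"{u} -[{d['relation']}]-> {v}"
        for u, v, d in G.edges(neighborhood, data=True)
    ]
    summaries = [
        community_summary(c) for c in communities if neighborhood & set(c)
    ]
    return "\n".join(summaries + facts)
```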

The key insight is that knowledge graphs organize information at different scales: local details (specific facts in chunks) and global patterns (community summaries, entity relationships). This hierarchical structure, reminiscent of the Hierarchical Navigable Small World (HNSW) graphs used in advanced vector search, allows GraphRAG to efficiently navigate from broad themes to specific details and back.

Benchmarks and Performance

The evidence for GraphRAG's superiority isn't just theoretical. Real-world benchmarks demonstrate significant improvements over vanilla RAG, especially for complex queries.

According to a benchmark study on knowledge graphs and LLM accuracy, systems leveraging knowledge graphs show substantially higher accuracy rates. LinkedIn's implementation of GraphRAG for customer service question answering demonstrates concrete improvements: a 77.6% increase in MRR (Mean Reciprocal Rank) compared to baseline RAG methods. In production, their system reduced median per-issue resolution time by 28.6% over six months of deployment. The research shows that GraphRAG excels particularly in handling multi-hop questions and complex data formats, producing fewer hallucinations compared to traditional RAG methods.

The performance gap widens for global queries and genuinely complex ones. Questions like "What are the main themes?" or "How do these concepts relate?" require understanding the entire corpus structure, not just retrieving similar chunks. GraphRAG's community detection and graph traversal capabilities make it uniquely suited for these scenarios.

Do I Really Need GraphRAG?

This is the million-XAF question. The answer depends on your use case, your data, and your requirements.

When Vanilla RAG Suffices

Vanilla RAG works exceptionally well for explicit retrieval tasks where you need to find specific facts or answer questions that can be answered from isolated chunks. If your questions are like "What is the refund policy?" or "What are the specifications of product X?", vanilla RAG will likely serve you perfectly. The setup is simpler, the infrastructure is more mature, the latency is often lower, and it's considerably cheaper.

For many enterprises starting their RAG journey, vanilla RAG is the right choice. It's easier to implement, requires less upfront investment in knowledge graph construction, and handles the majority of common queries effectively.

When GraphRAG Becomes Essential

GraphRAG becomes essential when you need to answer questions that require understanding relationships, reasoning across multiple concepts, or gaining insights about your entire corpus. As I discussed in my previous article on Knowledge, Ontologies & Knowledge Graphs, structured knowledge becomes critical when agents need to reason, not just retrieve.

You should consider GraphRAG if:

Your queries are global or require corpus-level understanding. Questions like "What are the main themes in the dataset?" or "How do these concepts relate to each other?" are classic GraphRAG use cases. Vanilla RAG struggles because it can't determine which chunks are relevant when the answer requires understanding the entire document.

You need multi-hop reasoning. If your questions require connecting information across multiple entities or concepts, GraphRAG's graph traversal capabilities enable this reasoning in ways classic RAG cannot. For example:

  • "What are the dependencies between microservices that have experienced failures in the past month?" — This needs to traverse relationships between services, failure events, and dependency graphs.

  • "Which research papers cite both Transformer architectures and attention mechanisms, and what are their common limitations?" — This requires finding papers connected to multiple concepts, then reasoning about their intersections.

  • "How do regulatory changes in one region impact products sold in another region through our supply chain?" — This demands following relationships across regulations, products, regions, and supply chain entities.

In each case, vanilla RAG might retrieve relevant chunks, but it can't systematically traverse the graph of relationships to connect the dots. GraphRAG's structured knowledge graph enables this multi-step reasoning by following edges between entities.
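
As a toy illustration of the second example above, here's what "connected to both concepts" looks like as a graph operation, something no amount of chunk similarity performs systematically (the node names are hypothetical):

```python
# Multi-hop sketch: find nodes connected to ALL of the given concept
# nodes, i.e., the intersection of their graph neighborhoods.
def connected_to_all(G, concepts: list[str]) -> set[str]:
    neighbor_sets = [set(G.neighbors(c)) for c in concepts if c in G]
    return set.intersection(*neighbor_sets) if neighbor_sets else set()

papers = connected_to_all(
    G, ["Transformer architectures", "attention mechanisms"]
)
```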

Accuracy and explainability are critical. The structured nature of knowledge graphs allows you to trace answers back to specific entities and relationships, providing transparency that's harder to achieve with pure vector similarity.

Your domain has rich relational structure. If your knowledge base contains many entities with complex relationships (legal documents, medical records, technical documentation with dependencies), GraphRAG can capture and leverage this structure more effectively.

You're building agents that need to reason, not just retrieve. As I explored in the knowledge graph article, the evolution from "Retrieval Augmented Generation" to "Reasoning Augmented Generation" requires structured knowledge. GraphRAG is a step in that direction.

The Hybrid Approach

The choice isn't always binary. Many production systems use a hybrid approach: classic RAG for simple factual queries (fast, efficient) and GraphRAG for complex reasoning tasks (accurate, comprehensive). The system routes queries to the appropriate method based on complexity and requirements.
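
A deliberately naive sketch of such a router, stitching together the earlier sketches (the keyword heuristic is a placeholder; production systems often use an LLM classifier or deeper query analysis instead):

```python
# Hybrid routing sketch: send global/relational questions to the
# GraphRAG path and everything else to the vanilla RAG path.
GLOBAL_CUES = ("main themes", "overall", "relate", "summarize", "across")

def route(question: str) -> str:
    if any(cue in question.lower() for cue in GLOBAL_CUES):
        return call_llm(
            f"Context:\n{graph_rag_context(question)}\n\nQuestion: {question}"
        )
    return answer(question)  # vanilla RAG path from earlier
```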

Making the Decision

Start with classic RAG if you're new to RAG, have straightforward retrieval needs, or want to validate the approach quickly. You can always evolve to GraphRAG later as your requirements become more sophisticated.

Invest in GraphRAG if you're already hitting the limits of vanilla RAG, need to answer global queries, require high accuracy for complex questions, or are building systems where reasoning over relationships is central to the value proposition.

The key insight from both this article and the knowledge graph piece is that the future of RAG isn't just about retrieving text—it's about reasoning over structured knowledge. GraphRAG represents that evolution, but it's not always necessary to start there. Understand your use case, validate with vanilla RAG first, and evolve to GraphRAG when the limitations become apparent.


The journey from RAG to GraphRAG mirrors a broader shift in how we think about AI systems: from simple retrieval to sophisticated reasoning. Classic RAG has democratized access to external knowledge for LLMs, but GraphRAG points toward a future where agents can navigate, infer, and reason over structured knowledge, not just retrieve isolated chunks of text.

As enterprises build more sophisticated AI applications, the question isn't whether to use RAG, but which flavor of RAG serves their needs. For many, classic RAG is the perfect starting point. For others, GraphRAG is already essential. The good news is that both approaches can coexist, and the infrastructure for both continues to mature rapidly.

The real challenge, as always, isn't the technology; it's understanding your use case, your data, and your requirements well enough to make the right choice. Start simple, measure results, and evolve as needed.

PA,