The Limits of RAG for Knowledge Work

Retrieval-augmented generation has become the default approach for knowledge-intensive tasks. But its assumptions about how humans organize and use knowledge may be fundamentally flawed.

Shep Bryan
Founder

Retrieval-augmented generation has become the standard approach for building knowledge-intensive AI applications. The formula is straightforward: embed documents, retrieve relevant chunks when queried, inject them into a language model's context. It works well enough that it's become the default.

But 'well enough' isn't good enough for serious knowledge work. The assumptions underlying RAG may be fundamentally misaligned with how humans actually use knowledge.

Assumption 1: Relevance Is About Similarity

RAG retrieves based on semantic similarity. But when you're working on a strategic decision, the most valuable information often isn't similar to your query—it's complementary, contradictory, or contextual in ways that similarity search can't capture.

The insight that changes your thinking is rarely the one that matches your search terms. It's the unexpected connection, the adjacent idea, the historical parallel you didn't know to look for.
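To make the failure mode concrete, here is a minimal sketch of similarity-only retrieval using toy 3-dimensional "embeddings" (real systems use learned vectors of hundreds of dimensions, but the geometry is the same). The vectors and the `retrieve` helper are illustrative, not any particular library's API:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, chunk_vecs, k=2):
    # Rank chunks purely by similarity to the query vector.
    scores = [cosine_sim(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy embeddings: chunks 0 and 1 nearly paraphrase the query;
# chunk 2 is orthogonal -- the complementary or contrarian source.
query = np.array([1.0, 0.0, 0.0])
chunks = [np.array([0.9, 0.1, 0.0]),   # paraphrase of the query
          np.array([0.7, 0.3, 0.0]),   # close variant
          np.array([0.0, 1.0, 0.0])]   # the adjacent idea

print(retrieve(query, chunks))  # → [0, 1]
```

The orthogonal chunk never surfaces, no matter how large `k` grows relative to the paraphrases: by construction, similarity ranking rewards restatements of the question over the unexpected connection.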

Assumption 2: Chunks Are Independent

Chunking documents for embedding treats knowledge as modular. But meaning is often distributed—an insight in paragraph 3 only makes sense in light of context from paragraph 1. When we retrieve chunks in isolation, we lose the connective tissue that gives them meaning.
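A naive fixed-size chunker shows how that connective tissue gets severed. This is a deliberately simplified sketch (production pipelines split on tokens and add overlap, which mitigates but does not eliminate the problem):

```python
def chunk(text, size=40):
    # Naive fixed-size chunking: split every `size` characters,
    # with no regard for sentence or paragraph boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = ("The pilot program failed in Q3. "
       "Leadership cut the budget. "
       "This decision is why the team now avoids similar bets.")

for piece in chunk(doc):
    print(repr(piece))
```

Retrieved in isolation, the final chunk's "This decision" has lost its antecedent: the embedding can match a query about risk appetite, but the chunk itself can no longer say which decision it refers to.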

Assumption 3: More Context Is Better

As context windows grow, the temptation is to retrieve more. But cognitive science tells us that human attention is limited. Flooding a prompt with marginally relevant information doesn't improve decision quality—it degrades it by obscuring what matters.
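One pragmatic response is to budget context rather than fill it. The sketch below, with a hypothetical relevance floor and chunk budget, drops marginally relevant material instead of injecting it; the specific threshold values are assumptions for illustration:

```python
def select_context(scored_chunks, budget=3, floor=0.6):
    # Keep only chunks above a relevance floor, capped by a small budget,
    # rather than packing the window with everything retrieved.
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    keep = [text for text, score in ranked if score >= floor]
    return keep[:budget]

scored = [("core finding", 0.91), ("tangent", 0.55),
          ("key counterpoint", 0.78), ("background", 0.45)]
print(select_context(scored))  # → ['core finding', 'key counterpoint']
```

The point is not the particular cutoff but the inversion of the default: relevance must earn its way into the prompt, because every marginal chunk competes for the model's (and the reader's) attention.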

Toward Better Architectures

We're exploring alternatives that address these limitations:

  • Graph-structured retrieval that preserves relationships between concepts
  • Hierarchical representations that maintain document coherence
  • Attention-aware retrieval that surfaces what will actually be processed
  • Active retrieval that iteratively refines based on reasoning needs
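As a sketch of the first idea, graph-structured retrieval can start from a similarity-based seed set and then expand along explicit links between chunks, pulling in connected material that similarity alone would miss. The link structure here is a hypothetical strategy-memo graph, invented for illustration:

```python
from collections import deque

def graph_retrieve(seed_ids, edges, hops=1):
    # Breadth-first expansion from similarity-retrieved seeds:
    # follow explicit concept links up to `hops` steps out.
    found = set(seed_ids)
    frontier = deque((s, 0) for s in seed_ids)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in edges.get(node, []):
            if nbr not in found:
                found.add(nbr)
                frontier.append((nbr, depth + 1))
    return found

# Hypothetical links between chunks of a strategy memo.
edges = {"q3-results": ["budget-cut"], "budget-cut": ["risk-posture"]}
print(graph_retrieve({"q3-results"}, edges, hops=2))
```

A query about Q3 results would surface only the seed chunk under plain similarity search; the graph walk also recovers the downstream budget and risk-posture chunks that give the result its strategic meaning.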

The goal isn't to replace RAG, but to understand when it works and when we need something more sophisticated. For casual question-answering, RAG is often sufficient. For strategic thinking that requires synthesis across disparate sources, we need architectures that mirror how human cognition actually handles knowledge.

Research by

Shep Bryan
Founder

Shep is the founder of Penumbra, building knowledge systems that transform how teams capture, connect, and leverage institutional intelligence for strategic decisions.
