We Built Oracles. We Need Librarians — Why the “R” in RAG Is Harder Than Google Search

The Illusion of a “Solved Problem”

Search has been with us since the Yahoo! days, and for most people it feels “solved.” You type a few words, hit enter, and out comes a list that’s “good enough.” Over time, we’ve learned a habit: search, skim, click, retry. So I get why search seems like a solved problem.

But that’s an illusion. Two quick stories.

I’ve got a friend — let’s call him Frank. He’s been CTO at many multinational tech companies. When we talked about RAG, his take was basically, “Search is solved; just plug in BM25 or embeddings.” I told him it wouldn’t work, citing issues like query-document mismatch and cross-session synthesis. He wouldn’t listen, calling my concerns too theoretical, too academic. It turned out I was right.

Another friend, let’s call him Tom, is a former P8 engineer from Alibaba and an expert in Elasticsearch. When Tom found out I was working on RAG, his reaction was, “What is there to even work on? It’s so low-tech.” But this time, I didn’t have to argue. I just suggested he go down the rabbit hole himself. Because of all people, an expert in search should have known better.

The illusion is real.

In this article, I’ll show why retrieval in RAG is not web search — and why, in many respects, it’s harder. We’ll look at what made web search work, what RAG actually faces, and why the “librarian” matters more than the “oracle.”

The Golden Age of Web Search

Most data is garbage.

According to a 2023 Ahrefs study, only 3.45% of web pages get any traffic from Google. The number is recent, but the pattern is not; it was already true in the web’s early days. For this very reason, Larry Page and Sergey Brin invented the PageRank algorithm to find the gold in all that garbage. That algorithm marked the beginning of the Golden Age of Web Search.

But PageRank was just the first step. It tells you which websites are high-quality, but not necessarily whether they’re what you’re looking for. So it was paired with keyword matching (TF-IDF), and later with semantic understanding (neural embeddings) and entity recognition (Knowledge Graph) to go beyond simple string matching. Google and others use these techniques to better understand user intent and bring us the most relevant data possible. We all know how effective they are.
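To make the keyword-matching step concrete, here is a minimal sketch of TF-IDF scoring in plain Python. The documents, the two queries, and the function names are made up for illustration, and the weighting is one common textbook variant, not what any real search engine ships.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def tfidf_scores(query, docs):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # The +1.0 offset keeps terms found in every document from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores

# Illustrative corpus and queries (assumptions, not from any real system).
docs = [
    "reset your password from the account settings page",
    "the billing page lists every invoice",
    "contact support if you cannot log in",
]
keyword_query = "reset password"
chatty_query = "i am locked out and cannot remember my credentials"

keyword_scores = tfidf_scores(keyword_query, docs)
chatty_scores = tfidf_scores(chatty_query, docs)
```

The keyword query scores only the password document. The chatty query is asking about the same thing, yet it shares no words with that document, so its score there is zero and the only match is an incidental “cannot” in the support document. That is query-document mismatch in miniature.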

Today’s RAG systems borrow a trick or two from their search engine senpais. Sparse and dense text encodings are used everywhere. Some agent memory libraries like Zep and Mem0 have their own knowledge graph implementations. Solutions like HippoRAG even borrow the idea of PageRank to rank documents in a knowledge base.

This raises the question: if RAG is learning from the master, why does it often feel so brittle? Why is Google Search so robust, while RAG systems can be so frustratingly naive?

The Great Divergence: When the Map Disappears

What makes web search an intrinsically simpler problem is that the web is, well, a web. It possesses an inherent structure, a map built and maintained by millions of people who want to be found.

This map has explicit directions. Websites have a purpose and a topic. Pages are designed with HTML structure, sitemaps, and robots.txt files to guide crawlers. Most importantly, backlinks form a global citation network, telling you what other pages and sites are relevant.
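To see how much work the backlinks do, here is a small sketch of PageRank by power iteration. It is a simplified textbook version, not Google’s production algorithm, and the page names, link graph, and parameter defaults are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# A tiny hypothetical web: two pages point at "docs", one points back at "blog".
web = {
    "blog": ["docs"],
    "forum": ["docs"],
    "docs": ["blog"],
}
ranks = pagerank(web)

# A hypothetical folder of files with no links between them.
unlinked = pagerank({"a.txt": [], "b.txt": [], "c.txt": []})
```

In the linked graph, “docs” ends up on top because other pages vouch for it. In the unlinked folder every file gets the same score: with no links, the algorithm has nothing to say.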

But the map also has implicit signals, learned from decades of human interaction. Google has access to a torrent of user behavior data — billions of clicks, query refinements, and dwell times — that constantly reinforce which paths on the map lead to treasure. It understands domain authority, temporal signals like freshness, and countless other cues that give the map its richness and reliability.

The web is a signal-rich environment.

When it comes to RAG, we’re in hell mode. We don’t have a map; we have a junkyard.

The data isn’t connected. It can be anything from dense legal contracts to chaotic Slack transcripts. There’s no context when you’re lucky and the wrong context when you’re not. And what makes the junkyard even junkier? We often smash our junk into smaller “chunks,” further stripping it of what little context it had.
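Here is a rough sketch of what chunking does to context. The contract text, the title, and both function names are hypothetical; the second function shows one simple way to keep a breadcrumb of where each chunk came from, not a recommended production chunker.

```python
def naive_chunks(text, size):
    """Split text into fixed-size word windows, ignoring document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def contextual_chunks(title, sections, size):
    """Split each section separately and prefix every chunk with its origin."""
    chunks = []
    for heading, body in sections:
        for piece in naive_chunks(body, size):
            chunks.append(f"[{title} > {heading}] {piece}")
    return chunks

# Illustrative two-section document (made up for this example).
sections = [
    ("Termination", "Either party may end this agreement with thirty days of written notice."),
    ("Renewal", "It renews automatically every year unless cancelled in writing."),
]
flat_text = " ".join(body for _, body in sections)

naive = naive_chunks(flat_text, 8)
contextual = contextual_chunks("Vendor Contract", sections, 8)
```

The middle naive chunk reads “days of written notice. It renews automatically every”, which straddles two clauses and leaves “It” pointing at nothing. The contextual version costs a few extra tokens per chunk but at least tells the retriever which contract and which section it is looking at.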

This brings us back to the Oracle and the Librarian. The LLM is the Oracle: like a sweet and mysterious cookie-baking grandma, it appears to know everything, even when it doesn’t. We build RAG to give this brilliant but unreliable Oracle a meticulous Librarian to ground it in fact.

But how can we expect the Librarian to do their job with honor when we’ve placed them in a junkyard?

Welcome to the Junkyard

The problems in the RAG junkyard are unique. On the query side, user behavior is completely different:

Users talk, they don’t search. They chat with the LLM as if it’s a person, meaning queries are often long, conversational, and lack clear keywords.
Context is assumed, not given. Users expect the LLM to remember past conversations or possess universal knowledge. They’ll say things like, “What was that important thing I told you last week?” Good luck retrieving for that.
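To show why that last query is so awkward, here is a small sketch that handles only the easy half of it: turning “last week” into a date range and filtering stored memories by timestamp. The memory entries, the dates, and the function names are all hypothetical, and “last week” is assumed to mean the previous Monday-to-Sunday calendar week.

```python
from datetime import date, timedelta

def last_week_range(today):
    """Return (monday, sunday) of the calendar week before `today`."""
    this_monday = today - timedelta(days=today.weekday())
    start = this_monday - timedelta(days=7)
    return start, start + timedelta(days=6)

def filter_by_date(memories, start, end):
    """Keep only the memory texts whose date falls inside [start, end]."""
    return [text for day, text in memories if start <= day <= end]

# Illustrative conversation memory as (date, text) pairs.
memories = [
    (date(2024, 5, 20), "Asked for a pasta recipe."),
    (date(2024, 6, 5), "Said the product launch moved to July."),
    (date(2024, 6, 11), "Asked about the weather."),
]
today = date(2024, 6, 12)  # a Wednesday
start, end = last_week_range(today)
candidates = filter_by_date(memories, start, end)
```

This only works because each memory carries a timestamp, which is exactly the kind of metadata raw text lacks. And it still says nothing about which of last week’s messages was the “important” one.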

On the system side, the challenges are just as daunting:

Context Rot. Even if you can afford to retrieve dozens of documents to ensure you find the right fact (a costly process in itself), you run into a new problem: flooding the LLM with too much context can drown out the correct answer and degrade its reasoning.
Contradictions are everywhere. When you pull disconnected chunks from different documents, they often appear to contradict each other. Without the original structure, the LLM is left to guess which fact is correct.

These are the battles RAG builders fight every day. The “R” is hard not because we lack algorithms, but because algorithms are useless without their favorite partner: data structures. The files and texts we pour into RAG systems lack the metadata, hierarchy, and connectivity of the web.
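As a sketch of what “data structures” could mean here, the record below attaches source, section, revision date, and links to each chunk. The field names, file names, and policy texts are assumptions for illustration; real systems will choose their own schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Chunk:
    text: str
    source: str    # which file the chunk came from
    section: str   # where it sits in the document hierarchy
    updated: date  # when the source was last revised
    links: list = field(default_factory=list)  # related sources

def newest_first(chunks):
    """Order retrieved chunks so the most recently revised source comes first."""
    return sorted(chunks, key=lambda c: c.updated, reverse=True)

# Two hypothetical chunks that contradict each other on the surface.
retrieved = [
    Chunk("Refunds are accepted within 30 days.", "policy_v1.md", "Refunds", date(2022, 3, 1)),
    Chunk("Refunds are accepted within 14 days.", "policy_v2.md", "Refunds", date(2024, 6, 1),
          links=["policy_v1.md"]),
]
ordered = newest_first(retrieved)

# Build the context with provenance visible to the LLM.
context = "\n".join(
    f"[{c.source} | {c.section} | {c.updated.isoformat()}] {c.text}" for c in ordered
)
```

Without the metadata, the two refund sentences are a coin flip for the LLM. With it, the newer policy comes first and the model can see which statement supersedes which.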

The Future is Curation, Not Just Search

Faced with this junkyard of problems, the industry is realizing we can’t just invent better retrieval algorithms. We have to become better data curators.

Like any junkyard, the yard itself is innocent; it’s the lack of organization that creates the mess. We’re now seeing the rise of “Context Engineering” as a dedicated discipline. We see companies like Databricks building entire businesses around data processing, governance, and lineage. It took the near-arrival of AGI for us to finally understand that “Big Data” was a myth.

Rich, high-quality, and well-structured data is king.

The algorithmic side of search may be a mature field. The data structure side has just begun.

Originally published on Medium.