Beyond traditional vector search: How page-level retrieval-augmented generation can help

Rob Donnelly

Global Analyst & Advisor Relations Leader, PwC United States

Brian Levy

Global Deals Industries Leader, PwC United States

Introduction

Retrieval-augmented generation (RAG) is a method which improves language model responses by retrieving relevant external information before generating an answer.

It has become the go-to method for grounding language model outputs in real-world information because it directly addresses hallucination, a problem that often affects standalone large language models (LLMs). RAG is widely applied in enterprise search, document question answering, customer support, legal research, and healthcare systems, where accuracy and up-to-date information are required. However, the standard implementation of RAG has started to show cracks. Current pipelines split documents into chunks, convert them into numerical embeddings, and store them in vector databases. In practice, this breaks tables apart from headings, separates paragraphs from their parent sections, and relies on embedding models that were never trained for specialised fields like law or medicine. Even when the system retrieves the closest chunks, closeness in the vector space does not always mean closeness in meaning. Consider a medical research paper in which a dosage recommendation buried in the results section only makes sense alongside the patient criteria defined earlier in the methodology. A standard RAG pipeline, blind to these dependencies, may retrieve one fragment and discard the other, leaving the model to reason over half the picture. A page-level approach preserves that connection, retrieving both sections together as intended.

Because page-level RAG operates on whole pages and uses the language model’s own reasoning to drive retrieval, it is gaining popularity relative to other retrieval approaches. This article examines where page-level RAG outperforms traditional vector-based RAG, where it still falls short, and the way forward.

How vector-based RAG works

Traditional RAG follows four steps: process the query, retrieve relevant data, augment the prompt, and generate text. Documents are split into smaller chunks, each converted into an embedding and stored in a vector database. At query time, the system embeds the question, finds the most numerically similar chunks, and feeds them as context to the LLM.
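
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the word-count `embed` function is a toy stand-in for a learned embedding model, the in-memory list stands in for a vector database, and the sample document, function names, and chunk size are all assumptions made for the example. The final generation step is left as a prompt string that would be sent to an LLM.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into fixed-size chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (a real system uses a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the prompt with the retrieved chunks."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical sample document, invented for illustration.
document = (
    "Refunds are accepted within thirty days of purchase. "
    "Shipping takes five business days. "
    "Support is available by email on weekdays."
)

# Indexing: chunk, embed, store.
index = [(c, embed(c)) for c in chunk_text(document, size=8)]

# Query time: embed the question, retrieve, augment.
query = "How many days do refunds take?"
top_chunks = retrieve(query, index, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Note that the second chunk in this example begins with the shipping sentence and ends partway through the support sentence, which is the kind of boundary damage discussed in the sections that follow.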

Though this pipeline, popularised through frameworks such as LlamaIndex and AutoGen, is now widely used, it carries structural problems that surface-level fixes cannot resolve.

Challenges of vector-based RAG

Despite its popularity, vector-based RAG is limited where retrieval has to preserve structure, interpret specialised language, and justify its choices. Some of its challenges are:

Structural breakdown

Chunking severs connections between related elements: tables are split from captions, and numbered clauses are separated from their definitions. Even advanced splitting strategies cannot reliably preserve these links.
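
A small sketch makes the problem concrete. The page content below is hypothetical (the table values are invented for illustration), and the two-lines-per-chunk splitter is a deliberately naive assumption; real splitters are more careful, but the same failure can occur whenever a boundary falls inside a structure.

```python
def fixed_size_chunks(lines: list[str], lines_per_chunk: int) -> list[list[str]]:
    """Naive splitter: group consecutive lines without looking at structure."""
    return [lines[i:i + lines_per_chunk] for i in range(0, len(lines), lines_per_chunk)]


# Hypothetical page: a heading, a table caption, the table, and a dependent note.
page = [
    "Section 4.2 Dosage",
    "Table 3: Recommended dose by weight band",
    "Weight (kg) | Dose (mg)",
    "40-60 | 250",
    "60-80 | 500",
    "Doses apply only to patients meeting the criteria in Section 2.1.",
]

chunks = fixed_size_chunks(page, lines_per_chunk=2)

# Locate the chunk holding the caption and the chunks holding the data rows.
caption_chunk = next(
    i for i, chunk in enumerate(chunks)
    if any(line.startswith("Table 3") for line in chunk)
)
row_chunks = sorted(
    i for i, chunk in enumerate(chunks)
    if any("|" in line and line[0].isdigit() for line in chunk)
)

print("caption in chunk", caption_chunk, "| data rows in chunks", row_chunks)
```

Here the caption, the two data rows, and the note that qualifies them all land in different chunks, so a retriever that returns only one of them hands the model an incomplete table.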

Mismatch in domain

Embedding models trained on broad internet data falter with specialised terminology in healthcare, legal, or engineering contexts. Swapping models means re-embedding the entire index, which is costly and time-consuming.

Semantic misalignment

Cosine similarity measures how closely two vectors align, not what the text means. Searching for the latest model’s performance could surface results about a financial forecasting model or a 3D product design model. The system sees the word ‘model’, not the user’s intent.
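
For reference, the standard definition of cosine similarity between a query vector and a document vector is shown below. The symbols q, d, and n (the embedding dimension) are notation introduced here for illustration.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \cos\theta
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}} \, \sqrt{\sum_{i=1}^{n} d_i^{2}}}
\]
```

The score depends only on the angle between the two vectors. Nothing in the formula refers to the user's intent, so two texts whose embeddings point in a similar direction score highly whether or not one actually answers the other.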

Flattened hierarchy

Vectorisation treats every text fragment identically. Sections, chapters, footnotes, and tables lose the structural relationships that gave them meaning originally.

Hidden costs

Embedding generation incurs API charges, and vector databases can be expensive to run. Migrations, backups, and scaling also require ongoing engineering effort.

Opaque retrieval

A high similarity score tells a compliance officer nothing about why a result was chosen. Regulated industries need human-readable reasoning, which is something vector retrieval rarely provides.

Page-level retrieval method

This method considers each page as a separate unit while keeping its original formatting. It then creates a hierarchical index that the LLM can use directly for reasoning.

When documents are added, they are processed one page at a time, keeping all the original elements such as tables, headings, images, and cross-references. For each page, the system creates a structured metadata record that includes summaries, main points, and identifiable entities. Together, these records form a complete catalogue that is created once and reused for all later questions. When a question is asked, the model looks at both the question and the entire page-level index. It then identifies the relevant pages through language understanding rather than vector proximity, retrieves their full content, and generates a response with page-level citations that a person can check.
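
The flow described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the `PageRecord` fields, the function names, and the sample pages are hypothetical, and `stub_selector` is a simple keyword matcher standing in for the LLM call that would read the index and reason about which pages to open.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PageRecord:
    """Structured metadata for one page (field names are illustrative)."""
    page: int
    summary: str
    entities: list[str] = field(default_factory=list)


def render_index(records: list[PageRecord]) -> str:
    """Turn the catalogue into plain text that a model can read in its prompt."""
    return "\n".join(
        f"[page {r.page}] {r.summary} (entities: {', '.join(r.entities)})"
        for r in records
    )


def retrieve_pages(
    question: str,
    pages: dict[int, str],
    records: list[PageRecord],
    select: Callable[[str, str], list[int]],
) -> tuple[str, list[int]]:
    """Ask the selector for page numbers, then fetch the full page content."""
    chosen = [p for p in select(question, render_index(records)) if p in pages]
    context = "\n\n".join(f"(page {p})\n{pages[p]}" for p in chosen)
    return context, chosen


def stub_selector(question: str, index_text: str) -> list[int]:
    """Stand-in for an LLM: picks pages whose index line shares a longer word
    with the question. A real system would prompt a model here instead."""
    terms = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    chosen = []
    for line in index_text.splitlines():
        page = int(line[len("[page "):line.index("]")])
        if any(term in line.lower() for term in terms):
            chosen.append(page)
    return chosen


# Hypothetical three-page document and its catalogue.
pages = {
    1: "Methodology: adults aged 18-65 with normal renal function were enrolled.",
    2: "Results: the recommended dosage was 250 mg twice daily.",
    3: "Acknowledgements and funding statement.",
}
records = [
    PageRecord(1, "Patient eligibility criteria for the dosage study", ["eligibility"]),
    PageRecord(2, "Recommended dosage found in results", ["dosage"]),
    PageRecord(3, "Funding and acknowledgements", ["funding"]),
]

question = "Which patients does the dosage recommendation apply to?"
context, cited = retrieve_pages(question, pages, records, stub_selector)
print("cited pages:", cited)
```

Because the selector returns page numbers rather than similarity scores, the citations in the final answer can be checked directly against the source document.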

Several tools work with page-level retrieval methods. For instance, PageIndex creates a document map, readable by large language models, allowing the model to reason about which pages to select. LlamaIndex Routers use language model reasoning to figure out which sub-indices are most relevant to a query and to navigate document summaries. GraphRAG maps entities and their relationships into a graph structure, allowing retrieval to follow connections across concepts rather than relying only on similarity between isolated chunks. The unifying principle across these approaches is that retrieval should consider the text’s structure and linguistic comprehension, rather than just numerical proximity.
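
To illustrate what following connections across concepts can mean in practice, here is a generic breadth-first walk over a small entity graph. This is not the API of GraphRAG or of any other tool named above; the graph, entity names, and `related_entities` function are hypothetical and exist only to show the idea.

```python
from collections import deque


def related_entities(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first walk: return every entity within max_hops of start,
    mapped to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


# Hypothetical entity graph extracted from a research paper.
graph = {
    "dosage recommendation": ["patient criteria", "results section"],
    "patient criteria": ["methodology section"],
    "results section": [],
    "methodology section": [],
    "funding statement": [],
}

hops = related_entities(graph, "dosage recommendation", max_hops=2)
print(hops)
```

Starting from the dosage recommendation, the walk reaches the patient criteria and the methodology section through explicit links, while the unrelated funding statement is never visited.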

Common operating principles for these tools:

Pages or logical sections remain intact with structured metadata.
Language understanding, not approximate similarity, drives retrieval.
Original formatting (tables, headings, cross-references) is preserved.
Retrieval decisions are transparent and traceable.

Challenges and benefits

Though page-level retrieval systems can prove more beneficial than vector-based RAG, they still have limitations. For instance, the model’s context window limits how large an index can get, so thousand-page documents may still need hierarchical summarisation. Metadata quality is another factor that affects results. Latency is a further challenge: vector-based retrieval can return results in milliseconds, whereas LLM-driven reasoning across huge indexes could add seconds of delay. Generating summaries for every page could incur additional cost for large or frequently updated collections, and the ecosystem is still younger than the tooling that has grown up around vector search.
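
The context-window constraint can be sketched as a simple budget check. The example below assumes a rough heuristic of about four characters per token and uses an invented budget and placeholder summaries; real tokenisers and window sizes vary by model. When the index does not fit, page summaries are packed into groups, each of which would then be condensed into one higher-level entry by a summarisation step that is not shown.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate. Assumption: about four characters per token."""
    return max(1, len(text) // 4)


def pack_into_groups(summaries: list[str], budget_tokens: int) -> list[list[str]]:
    """Greedily pack consecutive page summaries into groups that fit the budget.
    Each group is a candidate for one higher-level section summary."""
    groups: list[list[str]] = []
    current: list[str] = []
    used = 0
    for summary in summaries:
        cost = estimate_tokens(summary)
        if current and used + cost > budget_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(summary)
        used += cost
    if current:
        groups.append(current)
    return groups


# Hypothetical index: ten page summaries of 40 characters (about 10 tokens) each.
summaries = ["x" * 40 for _ in range(10)]
budget = 30  # illustrative budget, far smaller than any real context window

total = sum(estimate_tokens(s) for s in summaries)
needs_hierarchy = total > budget
groups = pack_into_groups(summaries, budget)
print(total, needs_hierarchy, [len(g) for g in groups])
```

Keeping consecutive pages together in each group means the higher-level entries still follow the document's own order.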

A hybrid architecture comprising vector-based RAG and page-level retrieval could fill these gaps by utilising vector search or BM25 for quick initial filtering and then passing the job to a page-aware layer for accurate, context-rich selection.
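
A two-stage design of this kind could be sketched as follows. The first stage scores pages with a plain-Python Okapi BM25 (using the common smoothed IDF variant and typical default values for `k1` and `b`); the second stage is passed in as a callable, since in a real system it would be an LLM reasoning over the shortlisted pages. The sample pages, function names, and the trivial `stub_select` stand-in are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def hybrid_retrieve(
    query: str,
    pages: dict[int, str],
    shortlist_size: int,
    select: Callable[[str, dict[int, str]], list[int]],
) -> list[int]:
    """Stage 1: fast keyword filter. Stage 2: page-aware selection on the shortlist."""
    scores = bm25_scores(query, list(pages.values()))
    ranked = sorted(zip(pages, scores), key=lambda pair: pair[1], reverse=True)
    shortlist = {p: pages[p] for p, s in ranked[:shortlist_size] if s > 0}
    return select(query, shortlist)


def stub_select(query: str, candidates: dict[int, str]) -> list[int]:
    """Stand-in for the LLM stage: keeps every shortlisted page, in page order."""
    return sorted(candidates)


# Hypothetical pages from a regulation.
pages = {
    1: "Penalty schedule for late filing of annual returns",
    2: "Exceptions to the penalty schedule for small entities",
    3: "Office locations and contact details",
    4: "Definitions of terms used in this regulation",
}

query = "penalty exceptions for small entities"
scores = bm25_scores(query, list(pages.values()))
selected = hybrid_retrieve(query, pages, shortlist_size=2, select=stub_select)
print(selected)
```

The keyword stage keeps the expensive reasoning step away from pages that share no terms with the query, while the final choice among the shortlisted pages is still made by the page-aware layer.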

Key application areas: Where page-level retrieval can help

Some of the areas where page-level retrieval might help are:

Legal and regulatory content

Numbered sections, cross-references, footnotes, and defined terms stay connected, producing complete answers that include penalty schedules and exception clauses.

Technical documentation

Hierarchical structure, parameter tables, and troubleshooting steps maintain their relationships, so answers reflect the document’s actual logic.

Financial reports

Balance sheets and income statements remain intact alongside their notes, enabling detailed responses to queries about year-over-year performance changes.

Healthcare records

Lab values appear alongside reference ranges, clinical notes, and patient history, meeting the safety and accuracy standards the domain demands.

Way forward

In future, retrieval will probably combine methods, using vector and keyword searches for broad filtering and structured page-level reasoning for exact selection. Building on this foundation, the next wave of development is already taking shape. Metadata for visual elements will make page records richer, real-time updates will keep page-level indexes current, and tighter integration with agentic workflows will enable multi-step analysis across complex documents. As open-source LLMs improve, structured retrieval running on-premises will become increasingly valuable for industries where privacy is not a preference but a requirement.

Conclusion

Page-level retrieval marks a meaningful shift in RAG design. By letting language models interpret structured, intact document content rather than abstract vector representations, it preserves critical relationships, improves transparency, reduces infrastructure complexity, and places intelligent comprehension at the heart of retrieval. In domains such as law, finance, technology, and healthcare, where structure matters most, this approach is already proving transformative. As context windows expand and models sharpen, its significance will only grow, though whether this ultimately leads to systems that genuinely understand documents or merely become more sophisticated at pattern-matching remains an open question. What is clear, however, is that page-level retrieval brings us meaningfully closer to that goal.

Contributors

Ayush Ranjan

PwC India

Authors

Rob Donnelly

Global Analyst & Advisor Relations Leader, PwC United States

Brian Levy

Global Deals Industries Leader, PwC United States

Lucy Stapleton

Global Deals Leader, PwC United Kingdom

David Brown

Asia Pacific Deals Leader, Partner, PwC China

Contributors

Francesca Ambrosini, Family Business Client Programs, PwC United Kingdom

Federico Mussi, Partner, Private Leader, PwC Italy
