
Query augmentation techniques


RAG has evolved with the introduction of several techniques to improve information retrieval. Each of these techniques addresses specific limitations in basic RAG implementations, particularly when dealing with complex queries or challenges in matching query language with document content.

In this topic, we will look at three techniques to improve traditional RAG architectures: query decomposition, Hypothetical Document Embeddings (HyDE), and router query engines.

Query decomposition

Query decomposition breaks down complex queries into more focused sub-queries that are easier to process. There are two main types: single-step and multi-step query decompositions.

In single-step decomposition, the LLM analyzes the incoming query to identify distinct components within it. The query is broken down into multiple simpler sub-queries, each targeting a specific aspect of the original question. Each sub-query is processed independently through the retrieval system to find relevant documents. The retrieved documents from all sub-queries are combined, potentially with deduplication. The LLM uses the aggregated documents to generate the answer to the original complex query.

Here is what the single-step query decomposition prompt might look like:

decomposition_prompt = f"""
    I need to break down the following complex query into 2-4 simpler sub-queries 
    that together cover all aspects of the original question:
    
    "{complex_query}"
    
    Return only the numbered sub-queries, one per line.
    """

Multi-step query decomposition takes the concept of breaking down complex queries further by introducing sequential processing, dependencies between sub-queries, and iterative refinement. Unlike single-step decomposition, which processes sub-queries in parallel, multi-step decomposition creates a structured workflow where information from earlier steps informs later steps.

The LLM first analyzes the complex query to identify not just component parts, but the logical dependencies between them. The LLM then constructs a query plan or tree that represents the dependency structure. Here is a prompt example that can be used in this step:

planning_prompt = f"""
    For the complex query: "{complex_query}"
    
    Create a multi-step query plan with 2-5 sequential steps. For each step:
    1. Write the specific sub-query to execute
    2. Explain what information this step seeks
    3. Describe how this information depends on or builds upon previous steps
    
    Format as:
    Step 1: [sub-query]
    Purpose: [information sought]
    Dependencies: [none for step 1]
    
    Step 2: [sub-query]
    Purpose: [information sought]
    Dependencies: [how this relies on step 1]
    """

Sub-queries are executed in a specific order, where the results of earlier queries inform the formulation or execution of later queries:

  • Initial queries retrieve foundational information

  • Intermediate queries use earlier results to retrieve more specific information

  • Final queries may synthesize or build upon all previous information

As each step is processed, the system accumulates context that influences subsequent retrievals. Information from earlier retrievals is incorporated into later sub-query formulations. The LLM may rewrite later queries based on what was learned earlier. Context from previous steps helps disambiguate terms or concepts in later steps. Here is what the refinement prompt might look like:

refinement_prompt = f"""
  Original sub-query: "{step['sub_query']}"
            
  Previous findings:
  {context_summary}
            
  Refine the sub-query to incorporate the context from previous findings
  while maintaining its original intent. Return just the refined query.
  """

The LLM may revise its approach based on intermediate results. If a retrieval step returns insufficient information, the LLM can reformulate that sub-query. If contradictory information is found, additional verification queries may be inserted. The decomposition plan itself may be dynamically adjusted.
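
The whole multi-step loop can be sketched as follows. It reuses `parse_query_plan` from the earlier sketch, the hypothetical `llm` and `retriever` objects, and a hypothetical `build_planning_prompt` helper that wraps the planning prompt shown above; treat it as an illustration of the control flow rather than a complete implementation.

def run_multi_step(complex_query: str, llm, retriever, top_k: int = 3) -> str:
    # Build the planning prompt shown earlier and parse the returned plan.
    plan_text = llm.complete(build_planning_prompt(complex_query))
    steps = parse_query_plan(plan_text)

    context_summary = ""
    all_documents = []
    for step in steps:
        sub_query = step["sub_query"]
        # Rewrite later sub-queries in light of what earlier steps have found.
        if context_summary:
            refinement_prompt = f"""
            Original sub-query: "{sub_query}"

            Previous findings:
            {context_summary}

            Refine the sub-query to incorporate the context from previous findings
            while maintaining its original intent. Return just the refined query.
            """
            sub_query = llm.complete(refinement_prompt).strip()

        documents = retriever.search(sub_query, top_k=top_k)
        all_documents.extend(documents)

        # Accumulate a running summary of findings to steer the next step.
        findings = "\n".join(doc.text for doc in documents)
        context_summary = llm.complete(
            f"Summarize the key facts in the following passages in a few sentences:\n{findings}"
        )

    context = "\n\n".join(doc.text for doc in all_documents)
    final_prompt = (
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {complex_query}"
    )
    return llm.complete(final_prompt)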

HyDE

In traditional RAG, the retrieval step often suffers from the "lexical gap" problem: the user's query language differs significantly from the language in the relevant documents. This mismatch can lead to poor retrieval results. HyDE (Hypothetical Document Embeddings) enhances RAG by generating a hypothetical document that serves as an improved query representation. The hypothetical document is a richer stand-in for the user's query: it contains more relevant terms and context and can therefore match the actual documents in the corpus more closely.

First, the original query is passed to an LLM with a specific prompt to generate a hypothetical answer or document. The prompt typically follows this pattern: "Write a passage that would be the perfect answer to the query: [original query]". The LLM returns what it believes would be the ideal document containing the information needed to answer the query. This hypothetical document, rather than the original user query, is then sent to the embedding model. The embedding of the hypothetical document is compared against the existing embeddings in the database, and the top-k most similar documents are retrieved and passed to the LLM to compose the final answer.
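
A minimal sketch of this flow is shown below. The `llm`, `embedder`, and `vector_store` objects and their `complete`, `embed`, and `search_by_vector` methods are hypothetical placeholders for your own LLM client, embedding model, and vector database.

def hyde_retrieve(query: str, llm, embedder, vector_store, top_k: int = 5):
    # 1. Generate a hypothetical document that would perfectly answer the query.
    hypothetical_doc = llm.complete(
        f"Write a passage that would be the perfect answer to the query: {query}"
    )
    # 2. Embed the hypothetical document instead of the raw query.
    doc_embedding = embedder.embed(hypothetical_doc)
    # 3. Retrieve the top-k real documents closest to that embedding.
    return vector_store.search_by_vector(doc_embedding, top_k=top_k)

The retrieved documents are then passed to the LLM together with the original query to compose the final answer.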

Router query engine

A router query engine directs incoming queries to the most appropriate retrieval mechanism based on query characteristics. It serves as an orchestration layer that can significantly improve retrieval performance by selecting specialized processing paths for different query types.

It analyzes an incoming query and makes routing decisions to direct it to one or more specialized retrievers or processing pipelines. First, the query is analyzed to determine its type, domain, complexity, and other relevant characteristics. Then, the most appropriate retrieval mechanism(s) for that specific query type are chosen. Finally, the retrieved results are combined.

An example router prompt can be defined as follows:

router_template = """
        Given a user query, determine the most appropriate retrieval system to handle it.
        
        User Query: {query}
        
        Pick exactly one of the following options:
        1. semantic_search - For general queries requiring conceptual understanding
        2. keyword_search - For queries with specific terms, names, or exact phrases
        3. structured_data - For queries about specific attributes, comparisons, or metrics
        4. qa_retrieval - For direct questions seeking factual answers
        5. code_search - For queries about programming or code examples
        
        Output the name of the chosen option ONLY.
        """

Conclusion

To sum up, you are now familiar with three query augmentation techniques:

  • Query decomposition breaks complex queries into manageable sub-queries while preserving their logical relationships.

  • HyDE generates hypothetical "perfect" documents that serve as richer query representations for retrieval.

  • Router query engines provide an orchestration layer that directs queries to specialized retrieval mechanisms based on query characteristics.
