
Query Reformulation

Overview

In a multi-turn conversation, users often ask follow-up questions that reference previous context:
“How does the Builder pattern work?” “And for Bedrock?”
The second question makes no sense to the vector store in isolation. Query reformulation solves this by rewriting the question into a self-contained query before retrieval:
“How does the Builder pattern work with AWS Bedrock in RAGLight?”
This dramatically improves retrieval accuracy in conversational RAG.
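To make the idea concrete, here is a minimal, framework-agnostic sketch of what a reformulation prompt could look like. The prompt wording and the `build_reformulation_prompt` helper are illustrative assumptions, not RAGLight's actual internals:

```python
def build_reformulation_prompt(history, question):
    """Assemble an LLM prompt that rewrites a follow-up question into a
    standalone, retrieval-ready query. Illustrative sketch only."""
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the conversation below, rewrite the final question so it is "
        "fully self-contained and suitable for similarity search.\n\n"
        f"Conversation:\n{turns}\n\n"
        f"Follow-up question: {question}\n"
        "Standalone question:"
    )

history = [
    ("user", "How does the Builder pattern work?"),
    ("assistant", "The Builder pattern configures a RAG pipeline step by step."),
]
prompt = build_reformulation_prompt(history, "And for Bedrock?")
```

The generation LLM receives this prompt and returns the standalone question, which is then sent to the vector store instead of the raw follow-up.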

How it works

When reformulation=True, the pipeline adds a reformulate step before retrieval:
User Question
  ↓
Reformulate (LLM call using conversation history)
  ↓
Standalone Question
  ↓
Vector Store (similarity search)
  ↓
Retrieved Documents
  ↓
LLM Generation
  ↓
Final Answer
The same LLM configured for generation is used for reformulation. If there is no conversation history yet (first turn), the question is passed through unchanged — no extra LLM call is made. The reformulated question is logged at INFO level so you can inspect what the model produced.
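The first-turn pass-through and INFO-level logging described above can be sketched as follows. This is a simplified illustration, not RAGLight's actual code; `reformulate_with_llm` stands in for whatever callable wraps the generation LLM:

```python
import logging

logger = logging.getLogger("raglight")

def maybe_reformulate(question, history, reformulate_with_llm):
    """Skip the extra LLM call on the first turn; otherwise reformulate
    and log the result at INFO level. Illustrative sketch only."""
    if not history:  # first turn: no context to resolve, no LLM call
        return question
    standalone = reformulate_with_llm(history, question)
    logger.info("Reformulated question: %s", standalone)
    return standalone

# First turn: the question is passed through unchanged.
assert maybe_reformulate("What is RAG?", [], None) == "What is RAG?"
```

Because the no-history branch returns early, a single-turn session never pays the cost of the extra round trip.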

Configuration

Reformulation is enabled by default. You can disable it explicitly if needed.

Via RAGConfig (simple API)

from raglight.config.rag_config import RAGConfig
from raglight.config.settings import Settings

# Enabled by default — no change needed
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
)

# Disable explicitly
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    reformulation=False,
)

Via the Builder API

from raglight.rag.builder import Builder
from raglight.config.settings import Settings

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name="all-MiniLM-L6-v2")
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./myDb",
        collection_name="my_collection",
    )
    .with_llm(Settings.OLLAMA, model_name="llama3.1:8b")
    .build_rag(k=5, reformulation=True)  # True by default
)

When reformulation helps

Scenario                  | Benefit
Multi-turn conversations  | Resolves pronoun/reference ambiguity
Follow-up questions       | Makes implicit context explicit for retrieval
Short or vague queries    | Expands the query for better recall

When to disable it

Scenario                      | Reason
Single-turn Q&A               | No history to leverage; avoids an unnecessary LLM call
Latency-sensitive applications | Reformulation adds one LLM round trip per query
Strict cost control           | Each reformulation consumes tokens

Summary

  • Reformulation rewrites follow-up questions into standalone queries
  • Enabled by default in RAGConfig and Builder.build_rag()
  • Uses the same LLM as generation — no extra model needed
  • No-op on the first turn (no history)
  • Disable via reformulation=False