Hybrid Search
Overview
By default, RAGLight retrieves documents using semantic search: it embeds the query and finds the closest vectors in the store. Hybrid search extends this by combining two complementary retrieval strategies:
- Semantic search: captures meaning and context (via embeddings + ChromaDB)
- BM25: captures exact keyword matches (via the BM25Okapi algorithm)
When to use hybrid search
Semantic search alone can struggle when:
- queries contain rare technical terms or acronyms
- documents are sparse or domain-specific
- the user expects an exact term to appear in the answer
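To illustrate why keyword scoring helps with rare terms, here is a minimal BM25 scorer in plain Python. It is a simplified sketch, not the BM25Okapi implementation RAGLight uses: the whitespace tokenization, the `k1` and `b` values, the `+ 1` smoothing inside the IDF, and the function name `bm25_scores` are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF (the "+ 1" keeps it non-negative; an assumption here).
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            tf = term_freq[term]
            # Length normalization: long documents are penalized slightly.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "the grpc gateway translates http requests".split(),
    "a proxy forwards incoming web traffic".split(),
    "remote procedure calls between services".split(),
]
scores = bm25_scores("grpc gateway".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Only the first document contains the literal tokens `grpc` and `gateway`, so it is the only one with a non-zero score. The other two may be semantically related, but BM25 ignores them. This is the exact-match signal that hybrid search adds on top of embeddings.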
How it works
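At retrieval time, hybrid search runs the query through both the semantic retriever and BM25, which produces one ranked list per retriever, and then merges the two lists with Reciprocal Rank Fusion (RRF). Each document receives the score `score(d) = Σ 1 / (k_rrf + rank(d))`, where `rank(d)` is the document's position in a ranked list, `k_rrf = 60` (a standard constant that smooths rank decay), and the sum runs over each ranked list the document appears in.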
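The fusion step can be sketched in a few lines of Python. This is an illustration of Reciprocal Rank Fusion under stated assumptions, not RAGLight's internal code: the function name `reciprocal_rank_fusion`, the 1-based ranks, and the sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k_rrf=60):
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are assumed to start at 1 for the top result.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k_rrf + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector search output
bm25_ranking = ["doc_c", "doc_a", "doc_d"]      # hypothetical keyword search output

fused = reciprocal_rank_fusion([semantic_ranking, bm25_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
# fused_ids == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Documents that appear in both lists (`doc_a`, `doc_c`) collect two terms in the sum and rise above documents found by only one retriever. Because RRF uses ranks rather than raw scores, the BM25 and similarity scores never need to be normalized onto a common scale.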
Configuration
Via VectorStoreConfig (simple API)
Set `search_type` in `VectorStoreConfig` and pass the config to `RAGPipeline` or `AgenticRAGPipeline`; the rest of the pipeline is unchanged.
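A minimal sketch of this configuration might look like the following. Only `search_type=Settings.SEARCH_HYBRID` on `VectorStoreConfig` is taken from this page. The import paths, the other constructor arguments, the `./defaultDb` and `./data` paths, and the sample question are assumptions and should be checked against your installed RAGLight version.

```python
# Sketch only: import paths and all arguments other than search_type are
# assumptions; verify them against your installed RAGLight version.
from raglight.config.rag_config import RAGConfig
from raglight.config.settings import Settings
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.models.data_source_model import FolderSource
from raglight.rag.simple_rag_api import RAGPipeline

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",       # hypothetical location
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    search_type=Settings.SEARCH_HYBRID,    # from this page: BM25 + semantic
)

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    knowledge_base=[FolderSource(path="./data")],  # hypothetical folder
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()  # ingests documents; per this page, also populates the BM25 index
answer = pipeline.generate("What does the acronym RRF stand for?")
```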
Via the Builder API (advanced)
Search modes
RAGLight supports three search modes, all configured via `search_type`:
| Value | Constant | Behavior |
|---|---|---|
| `"semantic"` | `Settings.SEARCH_SEMANTIC` | Vector similarity search only (default) |
| `"bm25"` | `Settings.SEARCH_BM25` | Keyword search only |
| `"hybrid"` | `Settings.SEARCH_HYBRID` | BM25 + semantic, fused with RRF |

The default is `"semantic"`, so existing code requires no changes to keep its current behavior.
Full example
BM25 index persistence
The BM25 index is built from the same documents stored in ChromaDB. It is automatically:
- populated when documents are ingested via `pipeline.build()` or `vector_store.ingest()`
- saved to `{persist_directory}/bm25_{collection_name}.json`
- reloaded on next startup from that file
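The save-and-reload cycle can be pictured with the sketch below. Only the file naming pattern `{persist_directory}/bm25_{collection_name}.json` comes from this page. The helper names, the JSON layout (a `"documents"` key holding tokenized texts), and the collection names are hypothetical and are not RAGLight's actual file format.

```python
import json
import tempfile
from pathlib import Path


def bm25_index_path(persist_directory, collection_name):
    """Build the index file path using the naming pattern described above."""
    return Path(persist_directory) / f"bm25_{collection_name}.json"


def save_index(tokenized_docs, persist_directory, collection_name):
    """Write the tokenized documents to disk (hypothetical JSON layout)."""
    path = bm25_index_path(persist_directory, collection_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"documents": tokenized_docs}), encoding="utf-8")
    return path


def load_index(persist_directory, collection_name):
    """Reload the tokenized documents, or return None if no index exists yet."""
    path = bm25_index_path(persist_directory, collection_name)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))["documents"]


with tempfile.TemporaryDirectory() as persist_directory:
    docs = [["hybrid", "search"], ["bm25", "index"]]
    saved_path = save_index(docs, persist_directory, "default")
    reloaded = load_index(persist_directory, "default")  # same data as docs
    missing = load_index(persist_directory, "other")     # no file for this collection
```

Keeping the file inside `persist_directory` means the keyword index travels with the ChromaDB data: copying or deleting that directory moves or resets both stores together.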
Summary
- Hybrid search combines semantic and BM25 retrieval with Reciprocal Rank Fusion
- Enable it by setting `search_type=Settings.SEARCH_HYBRID` in `VectorStoreConfig`
- The default remains `"semantic"`: no breaking change for existing code
- The BM25 index is persisted automatically alongside ChromaDB data
- Use hybrid search when your knowledge base contains technical terms, acronyms, or sparse text