Hybrid Search

Overview

By default, RAGLight retrieves documents using semantic search — it embeds the query and finds the closest vectors in the store. Hybrid search extends this by combining two complementary retrieval strategies:

Semantic search — captures meaning and context (via embeddings + vector store)
BM25 — captures exact keyword matches (via the BM25Okapi algorithm)

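The two result lists are merged using Reciprocal Rank Fusion (RRF), a rank aggregation algorithm that uses only each document's position in the lists. Because it ignores raw scores, BM25 scores and vector similarities never need to be normalized against each other.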

When to use hybrid search

Semantic search alone can struggle when:

queries contain rare technical terms or acronyms
documents are sparse or domain-specific
the user expects an exact term to appear in the answer

BM25 alone misses synonyms and paraphrasing. Hybrid search combines both to improve retrieval quality across a wider range of query types.
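
To make the contrast concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not RAGLight's implementation: the whitespace tokenizer, the `k1` and `b` defaults, the smoothed IDF variant, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap, so no contribution
            # Assumption: smoothed IDF with "+ 1" inside the log, which never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "configure the HNSW index for faster lookups",
    "tune the graph-based nearest neighbour structure",
    "install the package with pip",
]
scores = bm25_scores("HNSW index", docs)
print(scores)
```

Only the first document shares tokens with the query, so it is the only one with a non-zero score. The second document describes the same idea in different words and scores 0, which is the gap semantic search fills.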

How it works

At retrieval time, hybrid search follows these steps:

Query
  ↓
Semantic search  →  top 2k documents
BM25 search      →  top 2k documents
  ↓
Reciprocal Rank Fusion (RRF)
  ↓
Top k documents (deduplicated, reranked)

Why 2k? Here 2k means twice the requested k (2 × k). Fetching more candidates from each method before fusion ensures that good results that rank lower in one list still have a chance to surface in the final top k.

RRF score formula:

score(doc) = Σ  1 / (k_rrf + rank)

where k_rrf = 60 (a standard constant that smooths rank decay), and the sum runs over each ranked list the document appears in.
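
The formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RAGLight's code: the function name `rrf_fuse`, the 1-based ranks, the tie-breaking rule, and the document IDs are all hypothetical.

```python
from collections import defaultdict

K_RRF = 60  # smoothing constant from the formula above


def rrf_fuse(ranked_lists: list[list[str]], k: int, k_rrf: int = K_RRF) -> list[str]:
    """Fuse ranked lists of document IDs and return the top k fused IDs."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        # Assumption: ranks are 1-based, so the best document adds 1 / (k_rrf + 1).
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k_rrf + rank)
    # Summing per ID also deduplicates documents that appear in both lists.
    # Ties are broken by document ID so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:k]


k = 2
semantic_top_2k = ["a", "b", "c", "d"]  # hypothetical IDs, best first
bm25_top_2k = ["e", "f", "c", "a"]

# Fusing 2 * k candidates per method: "c" is third in both lists yet reaches the top k.
fused = rrf_fuse([semantic_top_2k, bm25_top_2k], k=k)

# Fusing only k candidates per method: "c" is never seen.
narrow = rrf_fuse([semantic_top_2k[:k], bm25_top_2k[:k]], k=k)

print(fused, narrow)
```

In this toy input, document `c` sits outside the top k of both lists, but its two moderate ranks add up to a higher fused score than any document found by only one method. That is the case the 2k candidate window is meant to catch.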

Configuration

Via `VectorStoreConfig` (simple API)

from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    search_type=Settings.SEARCH_HYBRID,  # "semantic" | "bm25" | "hybrid"
)

Pass this config to `RAGPipeline` or `AgenticRAGPipeline`; the rest of the pipeline is unchanged.

Via the Builder API (advanced)

Works with both Chroma and Qdrant:

from raglight.rag.builder import Builder
from raglight.config.settings import Settings

# With Chroma
rag = (
    Builder()
    .with_embeddings(
        Settings.HUGGINGFACE,
        model_name=Settings.DEFAULT_EMBEDDINGS_MODEL,
    )
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        search_type=Settings.SEARCH_HYBRID,
    )
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

# With Qdrant (drop-in replacement)
rag = (
    Builder()
    .with_embeddings(
        Settings.HUGGINGFACE,
        model_name=Settings.DEFAULT_EMBEDDINGS_MODEL,
    )
    .with_vector_store(
        Settings.QDRANT,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        search_type=Settings.SEARCH_HYBRID,
    )
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

Search modes

RAGLight supports three search modes, all configured via `search_type`:

Value	Constant	Behavior
`"semantic"`	`Settings.SEARCH_SEMANTIC`	Vector similarity search only (default)
`"bm25"`	`Settings.SEARCH_BM25`	Keyword search only
`"hybrid"`	`Settings.SEARCH_HYBRID`	BM25 + semantic, fused with RRF

The default is `"semantic"`, so existing code requires no changes to keep its current behavior.

Full example

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings
from raglight.models.data_source_model import FolderSource

Settings.setup_logging()

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    search_type=Settings.SEARCH_HYBRID,
)

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    knowledge_base=[FolderSource(path="./data")],
    k=5,
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate("What are the key concepts in these documents?")
print(response)

BM25 index persistence

The BM25 index is built from the same documents stored in your vector store. It is automatically:

populated when documents are ingested via `pipeline.build()` or `vector_store.ingest()`
saved to `{persist_directory}/bm25_{collection_name}.json`
reloaded on next startup from that file

No additional setup is required. The BM25 index stays in sync with the vector store automatically.

Remote Qdrant: when using Qdrant in remote mode (host + port), there is no local `persist_directory`, so the BM25 index is kept in memory and rebuilt from the remote collection at startup. The rebuild adds a startup cost proportional to collection size but requires no additional storage.
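
The local persistence behavior can be pictured with a small sketch. Only the file name pattern comes from the description above; the JSON schema (document ID mapped to tokens) and the helper names `bm25_index_path`, `save_index`, and `load_index` are hypothetical and may not match what RAGLight actually writes.

```python
import json
import tempfile
from pathlib import Path


def bm25_index_path(persist_directory: str, collection_name: str) -> Path:
    """Build the index file path following the naming pattern described above."""
    return Path(persist_directory) / f"bm25_{collection_name}.json"


def save_index(path: Path, tokenized_docs: dict[str, list[str]]) -> None:
    """Write the tokenized corpus to disk (hypothetical schema: doc ID -> tokens)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tokenized_docs), encoding="utf-8")


def load_index(path: Path) -> dict[str, list[str]]:
    """Reload the corpus, or start empty when no file exists yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as persist_directory:
    path = bm25_index_path(persist_directory, "default")
    first_start = load_index(path)  # nothing persisted yet
    save_index(path, {"doc-1": ["hybrid", "search"]})
    reloaded = load_index(path)  # simulates the next startup
    file_name = path.name

print(file_name, first_start, reloaded)
```

Storing tokens rather than scores keeps the file independent of BM25 parameters, since the statistics can be recomputed on load. In the remote Qdrant case there would be no such file, and the same structure would be rebuilt from the collection instead.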

Summary

Hybrid search combines semantic and BM25 retrieval with Reciprocal Rank Fusion
Enable it by setting `search_type=Settings.SEARCH_HYBRID` in `VectorStoreConfig`
The default remains `"semantic"`, so existing code is unaffected
Supported on both Chroma and Qdrant backends
The BM25 index is persisted automatically alongside your vector store data (kept in memory instead for remote Qdrant)
Use hybrid search when your knowledge base contains technical terms, acronyms, or sparse text
