# RAG Pipelines
## Overview
A RAG (Retrieval-Augmented Generation) pipeline combines two steps:
- Retrieve relevant documents from a vector store
- Generate an answer using a Large Language Model (LLM) conditioned on those documents
RAGLight exposes two APIs:
- a high-level API with `RAGPipeline` (simple and explicit)
- a low-level Builder API (fully composable and customizable)

Both APIs are assembled from the same core components:
- loaders (knowledge sources)
- readers (document processors)
- embeddings
- vector stores
- LLMs
## How RAG works in RAGLight
At runtime, a RAG pipeline first ingests and indexes your knowledge sources, then answers queries by retrieving relevant chunks and passing them to the LLM. You configure this flow through RAGConfig or the Builder API.
## Option 1: RAGPipeline (simple API)
RAGPipeline is the recommended entry point if you want:
- a clear, batteries-included RAG setup
- minimal boilerplate
- fast prototyping
### Basic example
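A minimal setup might look like the following pseudocode sketch. The import paths, `FolderSource`, and the constructor arguments are assumptions, not the verified API — check the raglight documentation for the exact names:

```python
# Pseudocode sketch — import paths and argument names are assumptions,
# not verified against the current raglight release.
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import FolderSource

pipeline = RAGPipeline(
    knowledge_base=[FolderSource(path="./docs")],  # documents to index
    k=5,                                           # chunks retrieved per query
)
pipeline.build()                                   # ingest, chunk, embed, store
answer = pipeline.generate("How do I configure the vector store?")
print(answer)
```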
### What happens during build()
Calling pipeline.build() triggers:
- resolution of knowledge sources
- document ingestion
- chunking and embedding
- storage in the vector store
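These four steps can be sketched in plain Python. This is not raglight code: a toy hash-based embedding and an in-memory list stand in for the real embedding model and vector store, purely to show the ingestion flow:

```python
import hashlib

def embed(text, dim=8):
    """Toy deterministic embedding: hash words into a fixed-size vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def chunk(document, size=40):
    """Naive fixed-width character chunking."""
    return [document[i:i + size] for i in range(0, len(document), size)]

# build(): ingest documents -> chunk -> embed -> store
vector_store = []  # list of (embedding, chunk_text) pairs
documents = ["RAGLight retrieves documents before generating answers."]
for doc in documents:
    for piece in chunk(doc):
        vector_store.append((embed(piece), piece))

print(len(vector_store), "chunks indexed")
```

A real pipeline swaps in a proper reader, a sentence-aware splitter, a learned embedding model, and a persistent vector store, but the control flow is the same.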
### Querying the pipeline

When you query the pipeline, four steps run in order:
- the query is embedded
- the vector store retrieves top-k chunks
- chunks are injected into a prompt
- the LLM generates an answer
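A toy end-to-end version of these four steps (plain Python, not raglight; the final LLM call is stubbed out by returning the assembled prompt):

```python
import math

def embed(text):
    # Toy bag-of-letters embedding over a-z; stands in for a real model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

store = [(embed(c), c) for c in [
    "Vector stores index document chunks by embedding.",
    "Ollama is the default LLM provider.",
    "The default k is 2.",
]]

def answer(query, k=2):
    q = embed(query)                                   # 1. embed the query
    top = sorted(store, key=lambda e: cosine(q, e[0]), reverse=True)[:k]
    context = "\n".join(c for _, c in top)             # 2. retrieve top-k chunks
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. chunks are injected into the prompt; 4. a real LLM would complete it.
    return prompt

print(answer("Which provider is the default?"))
```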
## Option 2: Builder API (advanced)
The Builder API exposes all RAG components explicitly. Use it when you want:
- fine-grained control over each component
- custom ingestion workflows
- advanced experimentation
### Building a RAG pipeline step by step
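As a pseudocode sketch, a step-by-step build might look like this. The `with_*` method names and `Settings` constants are assumptions modeled on a fluent-builder pattern, not the verified raglight API:

```python
# Pseudocode sketch — method and constant names are assumptions.
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name="all-MiniLM-L6-v2")
    .with_vector_store(Settings.CHROMA, persist_directory="./db",
                       collection_name="docs")
    .with_llm(Settings.OLLAMA, model_name="llama3")
    .build_rag(k=5)
)
```

Each `with_*` call configures one of the core components (embeddings, vector store, LLM) before the final build step assembles them.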
### Ingesting documents manually
With the Builder API, ingestion is explicit, which lets you:
- control when ingestion happens
- reuse the same vector store across pipelines
- debug indexing issues
### Querying the RAG pipeline

The RAG object returned by the Builder exposes the same `generate()` and `generate_streaming()` methods as RAGPipeline.
## Choosing between RAGPipeline and Builder
| Use case | Recommended approach |
|---|---|
| Quick prototype | RAGPipeline |
| Minimal code | RAGPipeline |
| Fine-grained control | Builder API |
| Custom ingestion | Builder API |
| Advanced experimentation | Builder API |
## Common parameters

Regardless of the API, the following parameters matter:

| Parameter | Default | Description |
|---|---|---|
| `k` | 2 | Number of retrieved chunks per query |
| `provider` | Ollama | LLM provider |
| `llm` | (see Settings) | LLM model name |
| `api_base` | `http://localhost:11434` | LLM API base URL |
| `system_prompt` | (default prompt) | Prompt injected before context |
| `cross_encoder_model` | None | Optional cross-encoder for reranking retrieved chunks |
| `reformulation` | True | Rewrite follow-up questions as standalone queries before retrieval |
| `max_history` | 20 | Maximum number of messages kept in conversation history (None = unlimited) |
The default `k=2` in RAGConfig is intentionally conservative. Set `k=5` or higher for broader retrieval coverage.

## Streaming
All LLM providers support token-by-token streaming via `generate_streaming()`, available on both RAGPipeline and the Builder’s RAG object.
The streaming path runs the full pipeline (reformulation → retrieval → reranking) and then yields answer chunks as they are produced by the LLM, instead of waiting for the complete response.
### With RAGPipeline
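A pseudocode sketch, assuming a `pipeline` object built as in the basic example (only `generate_streaming()` itself is named by this guide):

```python
# Pseudocode sketch — `pipeline` is assumed to be a built RAGPipeline.
for token in pipeline.generate_streaming("Summarize the architecture"):
    print(token, end="", flush=True)
```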
### With the Builder API
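The same pattern applies to the Builder’s RAG object, here assumed to be bound to a variable `rag`:

```python
# Pseudocode sketch — `rag` is assumed to come from the Builder's build step.
for token in rag.generate_streaming("Summarize the architecture"):
    print(token, end="", flush=True)
```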
Streaming is supported by all providers: Ollama, OpenAI, vLLM, LMStudio, Mistral, Google Gemini, and AWS Bedrock. Conversation history is updated automatically at the end of the stream, just like with `generate()`.

## Conversation history
RAGLight automatically maintains conversation history across `generate()` calls. Each turn appends a user and an assistant message that are passed to the LLM on the next call, enabling genuine multi-turn conversations.
History is supported by all providers: Ollama, OpenAI, Mistral, LMStudio, Google Gemini, and AWS Bedrock.
### Limit history size with max_history
By default, history is capped at 20 messages (~10 turns) to avoid hitting the model’s context window. Set max_history to adjust this limit, or pass None for unlimited history:
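As an illustration of the mechanism only (plain Python, not raglight code), a hypothetical `trim_history` helper shows how a 20-message cap behaves across turns:

```python
def trim_history(history, max_history=20):
    """Keep only the most recent max_history messages (None = unlimited)."""
    if max_history is None:
        return history
    return history[-max_history:]

history = []
for turn in range(15):                        # 15 turns = 30 messages total
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})
    history = trim_history(history, max_history=20)

print(len(history))           # capped at 20 messages (~10 turns)
print(history[0]["content"])  # oldest surviving message
```

With the cap in place, the oldest turns fall away first, so recent context always fits within the model’s window.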
## Summary
- RAG pipelines retrieve documents before generating answers
- RAGLight offers a simple (`RAGPipeline`) and an advanced (Builder) API
- Both approaches share the same core logic
- Use `generate()` for a complete string answer, `generate_streaming()` to yield tokens progressively
- Streaming is supported by all providers (Ollama, OpenAI, vLLM, LMStudio, Mistral, Gemini, Bedrock)
- Conversation history is maintained automatically and works across all providers and both generate methods
- Use `max_history` to cap history size and avoid context overflow
- Query reformulation is enabled by default and improves retrieval in multi-turn conversations
- Choose simplicity or control depending on your use case