RAG Pipelines

Overview

A RAG (Retrieval-Augmented Generation) pipeline combines two steps:
  1. Retrieve relevant documents from a vector store
  2. Generate an answer using a Large Language Model (LLM) conditioned on those documents
RAGLight provides two ways to build a standard RAG pipeline:
  • a high-level API with RAGPipeline (simple and explicit)
  • a low-level Builder API (fully composable and customizable)
Both approaches rely on the same core components:
  • loaders (knowledge sources)
  • readers (document processors)
  • embeddings
  • vector stores
  • LLMs

How RAG works in RAGLight

At runtime, a RAG pipeline follows this flow:
User Question
  ↓
Vector Store (similarity search)
  ↓
Retrieved Documents
  ↓
Prompt Construction
  ↓
LLM Generation
  ↓
Final Answer
Optionally, a cross-encoder can be used to rerank retrieved documents before generation.
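RAGLight's reranking configuration is not covered on this page, but the idea is simple: a cross-encoder scores each (query, document) pair jointly and keeps the best documents. As a generic illustration only (using the sentence-transformers CrossEncoder, a stand-in rather than RAGLight's API):

from sentence_transformers import CrossEncoder

# Pretrained cross-encoder that scores query/document pairs jointly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_n: int = 3) -> list[str]:
    # Higher score means more relevant to the query.
    scores = reranker.predict([(query, doc) for doc in docs])
    # Keep only the top_n highest-scoring documents for generation.
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]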

Option 1: RAGPipeline (simple API)

RAGPipeline is the recommended entry point if you want:
  • a clear, batteries-included RAG setup
  • minimal boilerplate
  • fast prototyping

Basic example

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import FolderSource, GitHubSource
from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig

Settings.setup_logging()

# Knowledge sources: a local folder and a GitHub repository.
knowledge_base = [
    FolderSource(path="./data/knowledge_base"),
    GitHubSource(url="https://github.com/Bessouat40/RAGLight"),
]

# How documents are embedded and where they are stored.
vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)

# Which LLM answers, and how many chunks (k) are retrieved per query.
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    knowledge_base=knowledge_base,
    k=5,
)

pipeline = RAGPipeline(config, vector_store_config)

# Ingest, chunk, embed, and index the knowledge base.
pipeline.build()

response = pipeline.generate(
    "How can I create an easy RAGPipeline using RAGLight?"
)
print(response)

What happens during build()

Calling pipeline.build() triggers:
  1. resolution of knowledge sources
  2. document ingestion
  3. chunking and embedding
  4. storage in the vector store
Once built, the pipeline is ready for querying.
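These steps are not specific to RAGLight. As a rough illustration of steps 2–4 only (not RAGLight's actual internals), here is the same idea against Chroma directly, letting Chroma's default embedding function embed each chunk; the sample text and collection name are placeholders:

import chromadb

# Persistent Chroma store, mirroring persist_directory above.
client = chromadb.PersistentClient(path="./defaultDb")
collection = client.get_or_create_collection("demo_collection")

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real document readers split more carefully.
    return [text[i : i + size] for i in range(0, len(text), size)]

text = "RAGLight builds RAG pipelines from loaders, embeddings, and stores. " * 50
chunks = chunk(text)

# Chroma embeds each chunk (default embedding function) and indexes it.
collection.add(documents=chunks, ids=[f"doc-{i}" for i in range(len(chunks))])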

Querying the pipeline

response = pipeline.generate("Explain how RAG works")
Behind the scenes:
  • the query is embedded
  • the vector store retrieves top-k chunks
  • chunks are injected into a prompt
  • the LLM generates an answer
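To make this loop concrete, here is a deliberately tiny, self-contained sketch: toy bag-of-words embeddings, cosine similarity for retrieval, and the constructed prompt returned in place of a real model call. None of this is RAGLight code; it only mirrors the four steps above.

from collections import Counter
import math

docs = [
    "RAG retrieves documents before generation.",
    "Chroma is a vector store.",
    "Ollama serves local LLMs.",
]

def embed(text: str) -> Counter:
    # Toy embedding: word counts (real pipelines use dense vectors).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate(question: str, k: int = 2) -> str:
    q = embed(question)
    # Retrieve the top-k most similar chunks.
    top = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
    # Inject them into the prompt; a real pipeline would send this to the LLM.
    prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}"
    return prompt  # stand-in for llm(prompt)

print(generate("What does RAG do before generation?"))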

Option 2: Builder API (advanced)

The Builder API exposes all RAG components explicitly. Use it when you want:
  • fine-grained control over each component
  • custom ingestion workflows
  • advanced experimentation

Building a RAG pipeline step by step

from raglight.rag.builder import Builder
from raglight.config.settings import Settings

builder = Builder()

rag = (
    builder
    .with_embeddings(
        Settings.HUGGINGFACE,
        model_name=Settings.DEFAULT_EMBEDDINGS_MODEL,
    )
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
    )
    .with_llm(
        Settings.OLLAMA,
        model_name=Settings.DEFAULT_LLM,
        system_prompt=Settings.DEFAULT_SYSTEM_PROMPT,
    )
    .build_rag(k=5)
)

Ingesting documents manually

With the Builder API, ingestion is explicit:
rag.vector_store.ingest(data_path="./data")
This makes it easy to:
  • control when ingestion happens
  • reuse the same vector store across pipelines (see the sketch below)
  • debug indexing issues
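For example, the following sketch builds a standalone vector store, ingests once, and lets any later pipeline reuse the indexed data. The build_vector_store() terminal is an assumption about the Builder API (check the API reference); ingest(data_path=...) matches the call shown above.

from raglight.rag.builder import Builder
from raglight.config.settings import Settings

# Build and populate the vector store once.
store = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
    )
    .build_vector_store()  # assumed terminal that returns the store itself
)
store.ingest(data_path="./data")

# Any pipeline built against the same persist_directory and collection_name
# now retrieves from the already-indexed data without re-ingesting.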

Querying the RAG pipeline

response = rag.generate("How does RAGLight structure a RAG pipeline?")
print(response)
Retrieval and generation behave exactly as they do with RAGPipeline.

Choosing between RAGPipeline and Builder

Use case                    Recommended approach
------------------------    --------------------
Quick prototype             RAGPipeline
Minimal code                RAGPipeline
Fine-grained control        Builder API
Custom ingestion            Builder API
Advanced experimentation    Builder API
Both APIs produce the same internal RAG graph.

Common parameters

Regardless of the API, the following parameters matter:
  • k: number of retrieved chunks
  • embedding model and provider
  • vector store backend
  • LLM provider and model
  • system prompt
These parameters directly affect answer quality and latency.
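As a quick reference, here are the same configuration objects from the first example with each lever annotated (the quality/latency notes are general guidance, not benchmarks; imports and knowledge_base as defined above):

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,  # drives retrieval quality
    provider=Settings.HUGGINGFACE,                      # where embeddings run
    database=Settings.CHROMA,                           # vector store backend
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,       # drives answer quality
    provider=Settings.OLLAMA,       # local vs. hosted serving latency
    knowledge_base=knowledge_base,
    k=5,                            # more chunks = more context, more latency
)

The system prompt is set through with_llm(..., system_prompt=...) in the Builder API, as shown earlier.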

Summary

  • RAG pipelines retrieve documents before generating answers
  • RAGLight offers a simple (RAGPipeline) and an advanced (Builder) API
  • Both approaches share the same core logic
  • Choose simplicity or control depending on your use case