
Observability with Langfuse

Overview

RAGLight integrates with Langfuse to give you full visibility into your RAG pipeline. Every call to generate() or generate_streaming() produces a structured trace showing exactly what happened at each step:

  • Retrieve: see which documents were retrieved, from which collection, with which query.
  • Rerank: inspect the reranking step when a CrossEncoder is active.
  • Generate: trace the LLM call, including prompt, model, latency, and token counts.
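
Conceptually, a single generate() call produces a trace shaped like this (an illustrative layout; the exact span names shown in the Langfuse UI may differ):

generate("What is RAGLight?")
├── retrieve   (query, collection, retrieved documents)
├── rerank     (CrossEncoder scores; present only when reranking is active)
└── generate   (prompt, model, latency, token counts)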

Installation

pip install "raglight[langfuse]"
This installs langfuse==4.0.0 alongside RAGLight.

Configuration

Tracing is configured via LangfuseConfig, a dataclass that holds your Langfuse credentials.
from raglight.config.langfuse_config import LangfuseConfig

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",  # or your Langfuse Cloud URL
)
Pass this config to your pipeline — the rest is automatic.

Usage with RAGPipeline

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.langfuse_config import LangfuseConfig
from raglight.config.settings import Settings

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
)

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    langfuse_config=langfuse_config,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate("What is RAGLight?")
print(response)

Usage with the Builder API

from raglight.rag.builder import Builder
from raglight.config.langfuse_config import LangfuseConfig
from raglight.config.settings import Settings

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
)

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5, langfuse_config=langfuse_config)
)

rag.vector_store.ingest(data_path="./docs")
response = rag.generate("Explain the retrieval pipeline")
print(response)

Streaming support

Langfuse tracing works identically for streaming. The trace is emitted when the stream completes.
for chunk in pipeline.generate_streaming("What is RAGLight?"):
    print(chunk, end="", flush=True)
# → full trace appears in Langfuse once the stream ends
All LLM providers support streaming traces: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.

Session ID

By default, a UUID is generated once per RAG instance and reused for every generate() call. This groups all turns of the same conversation under a single Langfuse session. You can pin a custom session ID:
LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
    session_id="my-session-42",
)
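For example, reusing a single pipeline instance keeps every turn under one session (illustrative, assuming the pipeline built in the RAGPipeline example above):

# Both calls reuse the instance's session ID, so Langfuse groups
# them as two turns of the same conversation.
pipeline.generate("What is RAGLight?")
pipeline.generate("How does the rerank step work?")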

Use with raglight serve

When using the REST API, pass Langfuse credentials as environment variables:
.env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000
Then start the server:
raglight serve
When LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST (or LANGFUSE_BASE_URL) are all set, tracing is enabled automatically. If any of these are missing, RAGLight disables Langfuse entirely — no connection attempt is made to localhost:3000.
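The gating rule can be expressed as a short check (a sketch of the documented behavior, not RAGLight's actual code):

import os

# Tracing requires both keys plus a host (LANGFUSE_HOST or LANGFUSE_BASE_URL).
keys_set = all(os.environ.get(k) for k in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"))
host_set = bool(os.environ.get("LANGFUSE_HOST") or os.environ.get("LANGFUSE_BASE_URL"))
tracing_enabled = keys_set and host_set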

Run Langfuse locally

The fastest way to get Langfuse running locally is Docker Compose:
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker-compose up
Langfuse will be available at http://localhost:3000.

Summary

  • Install with pip install "raglight[langfuse]"
  • Pass LangfuseConfig to RAGConfig or build_rag()
  • Both generate() and generate_streaming() are traced automatically
  • All LLM providers are supported
  • Sessions group all turns of a conversation together
  • For raglight serve, set LANGFUSE_* env vars — no code changes needed