Observability with Langfuse

Overview

RAGLight integrates with Langfuse to give you full visibility into your RAG pipeline. Every call to generate() or generate_streaming() produces a structured trace showing exactly what happened at each step.

Retrieve

See which documents were retrieved, from which collection, with which query.

Rerank

Inspect the reranking step when a CrossEncoder is active.

Generate

Trace the LLM call — prompt, model, latency, and token counts.
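Conceptually, each generate() call produces one trace with a nested span per step. The sketch below is illustrative only — it is not Langfuse's actual payload schema, and the metadata keys are made up for the example:

```python
# Illustrative shape of a trace — NOT Langfuse's real schema.
trace = {
    "name": "generate",
    "spans": [
        {"name": "retrieve", "metadata": {"collection": "my_collection", "k": 5}},
        {"name": "rerank", "metadata": {"cross_encoder": "active"}},  # present only when a CrossEncoder is configured
        {"name": "generate", "metadata": {"model": "llama3", "latency_ms": 120}},
    ],
}

print([span["name"] for span in trace["spans"]])
```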

Installation

pip install "raglight[langfuse]"
This installs langfuse==4.0.0 alongside RAGLight.

Configuration

Tracing is configured via LangfuseConfig, a dataclass that holds your Langfuse credentials.
from raglight.config.langfuse_config import LangfuseConfig

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",  # or your Langfuse Cloud URL
)
Pass this config to your pipeline — the rest is automatic.
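Hard-coding keys is fine for a quick local test, but for anything shared you will likely want to read them from the environment. A minimal sketch (the helper name is ours, not part of RAGLight):

```python
import os

def langfuse_credentials() -> dict:
    """Read Langfuse credentials from the environment; fail fast if any is missing."""
    creds = {
        "public_key": os.environ.get("LANGFUSE_PUBLIC_KEY"),
        "secret_key": os.environ.get("LANGFUSE_SECRET_KEY"),
        "host": os.environ.get("LANGFUSE_HOST", "http://localhost:3000"),
    }
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing Langfuse settings: {missing}")
    return creds

# Then: langfuse_config = LangfuseConfig(**langfuse_credentials())
```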

Usage with RAGPipeline

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.langfuse_config import LangfuseConfig
from raglight.config.settings import Settings

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
)

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    langfuse_config=langfuse_config,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate("What is RAGLight?")
print(response)

Usage with the Builder API

from raglight.rag.builder import Builder
from raglight.config.langfuse_config import LangfuseConfig
from raglight.config.settings import Settings

langfuse_config = LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
)

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5, langfuse_config=langfuse_config)
)

rag.vector_store.ingest(data_path="./docs")
response = rag.generate("Explain the retrieval pipeline")
print(response)

Streaming support

Langfuse tracing works identically for streaming. The trace is emitted when the stream completes.
for chunk in pipeline.generate_streaming("What is RAGLight?"):
    print(chunk, end="", flush=True)
# → full trace appears in Langfuse once the stream ends
All LLM providers support streaming traces: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.
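If you want the full answer as a string in addition to printing chunks as they arrive, accumulate while streaming. The sketch below stubs out the stream with a plain generator so it runs standalone — fake_stream() stands in for pipeline.generate_streaming(...):

```python
def fake_stream():
    # Stand-in for pipeline.generate_streaming(query): yields text chunks.
    yield from ["RAGLight ", "is ", "a RAG toolkit."]

chunks = []
for chunk in fake_stream():
    print(chunk, end="", flush=True)
    chunks.append(chunk)

full_response = "".join(chunks)
```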

Session ID

By default, a UUID is generated once per RAG instance and reused for every generate() call. This groups all turns of the same conversation under a single Langfuse session. You can pin a custom session ID:
LangfuseConfig(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="http://localhost:3000",
    session_id="my-session-42",
)
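If you need the same conversation to map to the same Langfuse session across process restarts, one option (application-side, not a RAGLight feature) is to derive a deterministic ID from a stable identifier such as a conversation key, using the standard library's uuid5:

```python
import uuid

# Hypothetical helper: a deterministic session ID per conversation,
# stable across restarts. The namespace string is an arbitrary choice.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "my-app.example.com")

def session_id_for(conversation_id: str) -> str:
    return str(uuid.uuid5(NAMESPACE, conversation_id))
```

Pass the result as session_id to LangfuseConfig so every turn of that conversation lands in the same session.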

Use with raglight serve

When using the REST API, pass Langfuse credentials as environment variables:
.env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000
Then start the server:
raglight serve
When LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST (or LANGFUSE_BASE_URL) are all set, tracing is enabled automatically. If any of these are missing, RAGLight disables Langfuse entirely — no connection attempt is made to localhost:3000.
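The enablement rule boils down to "all three settings are present". A sketch of equivalent logic (ours, not RAGLight's actual code), useful for a startup sanity check in your own deployment scripts:

```python
import os

def langfuse_enabled(env=None) -> bool:
    """True only when a public key, secret key, and host are all set."""
    env = os.environ if env is None else env
    host = env.get("LANGFUSE_HOST") or env.get("LANGFUSE_BASE_URL")
    return bool(env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY") and host)
```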

Run Langfuse locally

The fastest way to get Langfuse running locally is Docker Compose:
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker-compose up
Langfuse will be available at http://localhost:3000.

Summary

  • Install with pip install "raglight[langfuse]"
  • Pass LangfuseConfig to RAGConfig or build_rag()
  • Both generate() and generate_streaming() are traced automatically
  • All LLM providers are supported
  • Sessions group all turns of a conversation together
  • For raglight serve, set LANGFUSE_* env vars — no code changes needed