Conversation History

Overview

RAGLight supports full multi-turn conversations across all LLM providers. User and assistant messages from previous turns are automatically injected into each new request — the model sees the full context of the conversation. This works identically for generate() and generate_streaming(), and is compatible with all providers: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.

How it works

Each call to generate() or generate_streaming() accepts an optional history field — a list of {"role": ..., "content": ...} messages. RAGLight injects them into the LLM prompt before the current question. RAGPipeline manages this history automatically. You just call generate() repeatedly and history is accumulated for you.
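Conceptually, the injection works like this (an illustrative sketch, not RAGLight's exact internals; build_messages and system_prompt are hypothetical names):

```python
def build_messages(system_prompt, history, question):
    # Illustrative sketch: prepend the system prompt, replay the prior
    # turns, then append the new user question as the final message.
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": question})
    return messages
```

The model receives the whole list on every turn, which is why it can resolve follow-up questions like "Can you give me a code example?" without the topic being restated.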

Usage

Automatic history with RAGPipeline

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

# Each call automatically includes previous turns
response1 = pipeline.generate("What is RAGLight?")
print(response1)

response2 = pipeline.generate("Can you give me a code example?")
print(response2)  # The model knows the previous question was about RAGLight

Capping history length

To bound the prompt size and memory footprint of long conversations, set max_history in RAGConfig:
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    max_history=10,  # Keep the last 10 messages
)
When the cap is reached, the oldest messages are dropped automatically.
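The trimming behavior can be sketched as follows (illustrative only; trim_history is a hypothetical name, and RAGLight applies this internally):

```python
def trim_history(history, max_history):
    # Keep only the most recent `max_history` messages; anything older
    # is dropped, mirroring the cap behavior described above.
    return history[-max_history:]

# A 12-message conversation trimmed to the last 10 messages:
history = [{"role": "user", "content": f"turn {i}"} for i in range(12)]
history = trim_history(history, 10)
```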

Resetting history
To discard the accumulated conversation and start fresh, clear the pipeline's history:

pipeline.reset_history()

Manual history with the Builder API

When using the RAG object directly, pass history explicitly:
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

history = []

question1 = "What is RAGLight?"
response1 = rag.generate({"question": question1, "history": history})
history.append({"role": "user", "content": question1})
history.append({"role": "assistant", "content": response1})

question2 = "Can you give me a code example?"
response2 = rag.generate({"question": question2, "history": history})

# Append this turn as well before asking the next question
history.append({"role": "user", "content": question2})
history.append({"role": "assistant", "content": response2})
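To avoid repeating the append boilerplate on every turn, the pattern can be wrapped in a small helper (hypothetical, not part of RAGLight):

```python
def chat(generate_fn, history, question):
    # Hypothetical convenience helper: ask one question, then record
    # both the user turn and the assistant reply in the shared history.
    answer = generate_fn({"question": question, "history": list(history)})
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    return answer
```

With the Builder API you would pass rag.generate as generate_fn, e.g. chat(rag.generate, history, "What is RAGLight?").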

History in the REST API

The /generate and /generate/stream endpoints accept a history field:
{
  "question": "Can you give me a code example?",
  "history": [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."}
  ]
}
The Streamlit chat UI (raglight serve --ui) manages this automatically.
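A minimal client could build the request body like this (a sketch; the actual HTTP call is shown as a comment because the host and port, assumed here to be localhost:8000, depend on your deployment):

```python
import json

def make_request_body(question, history):
    # Shape matches the JSON accepted by /generate and /generate/stream.
    return {"question": question, "history": history}

history = [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."},
]
body = json.dumps(make_request_body("Can you give me a code example?", history))

# POST `body` to the server, e.g.:
# requests.post("http://localhost:8000/generate", data=body,
#               headers={"Content-Type": "application/json"})
```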

Summary

  • RAGPipeline accumulates history automatically across generate() calls
  • Use max_history in RAGConfig to cap the number of messages kept
  • Use pipeline.reset_history() to start a fresh conversation
  • All providers support history — same behavior regardless of backend
  • Works identically with generate() and generate_streaming()