Conversation History

Overview

RAGLight supports full multi-turn conversations across all LLM providers. User and assistant messages from previous turns are automatically injected into each new request — the model sees the full context of the conversation. This works identically for generate() and generate_streaming(), and is compatible with all providers: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.

How it works

Each call to generate() or generate_streaming() accepts an optional history field — a list of {"role": ..., "content": ...} messages. RAGLight injects them into the LLM prompt before the current question. RAGPipeline manages this history automatically. You just call generate() repeatedly and history is accumulated for you.
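Conceptually, the injection works like this (an illustrative sketch, not RAGLight's exact internals; build_messages and system_prompt are hypothetical names):

```python
def build_messages(system_prompt, history, question):
    # Illustrative sketch: prepend the system prompt, replay the prior
    # turns, then append the new user question as the final message.
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": question})
    return messages
```

The model receives the whole list on every turn, which is why it can resolve follow-up questions like "Can you give me a code example?" without the topic being restated.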

Usage

Automatic history with RAGPipeline

from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

# Each call automatically includes previous turns
response1 = pipeline.generate("What is RAGLight?")
print(response1)

response2 = pipeline.generate("Can you give me a code example?")
print(response2)  # The model knows the previous question was about RAGLight

Capping history length

To bound the prompt size and memory footprint of long conversations, set max_history in RAGConfig:
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    max_history=10,  # Keep the last 10 messages
)
When the cap is reached, the oldest messages are dropped automatically.
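The trimming behavior can be sketched as follows (illustrative only; trim_history is a hypothetical name, and RAGLight applies this internally):

```python
def trim_history(history, max_history):
    # Keep only the most recent `max_history` messages; anything older
    # is dropped, mirroring the cap behavior described above.
    return history[-max_history:]

# A 12-message conversation trimmed to the last 10 messages:
history = [{"role": "user", "content": f"turn {i}"} for i in range(12)]
history = trim_history(history, 10)
```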

Resetting history
To discard the accumulated conversation and start fresh, clear the pipeline's history:

pipeline.reset_history()

Manual history with the Builder API

When using the RAG object directly, pass history explicitly:
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

history = []

question1 = "What is RAGLight?"
response1 = rag.generate({"question": question1, "history": history})
history.append({"role": "user", "content": question1})
history.append({"role": "assistant", "content": response1})

question2 = "Can you give me a code example?"
response2 = rag.generate({"question": question2, "history": history})

# Append this turn as well before asking the next question
history.append({"role": "user", "content": question2})
history.append({"role": "assistant", "content": response2})
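To avoid repeating the append boilerplate on every turn, the pattern can be wrapped in a small helper (hypothetical, not part of RAGLight):

```python
def chat(generate_fn, history, question):
    # Hypothetical convenience helper: ask one question, then record
    # both the user turn and the assistant reply in the shared history.
    answer = generate_fn({"question": question, "history": list(history)})
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    return answer
```

With the Builder API you would pass rag.generate as generate_fn, e.g. chat(rag.generate, history, "What is RAGLight?").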

History in the REST API

The /generate and /generate/stream endpoints accept a history field:
{
  "question": "Can you give me a code example?",
  "history": [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."}
  ]
}
The Streamlit chat UI (raglight serve --ui) manages this automatically.
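A minimal client could build the request body like this (a sketch; the actual HTTP call is shown as a comment because the host and port, assumed here to be localhost:8000, depend on your deployment):

```python
import json

def make_request_body(question, history):
    # Shape matches the JSON accepted by /generate and /generate/stream.
    return {"question": question, "history": history}

history = [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."},
]
body = json.dumps(make_request_body("Can you give me a code example?", history))

# POST `body` to the server, e.g.:
# requests.post("http://localhost:8000/generate", data=body,
#               headers={"Content-Type": "application/json"})
```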

Summary

  • RAGPipeline accumulates history automatically across generate() calls
  • Use max_history in RAGConfig to cap the number of messages kept
  • Use pipeline.reset_history() to start a fresh conversation
  • All providers support history — same behavior regardless of backend
  • Works identically with generate() and generate_streaming()