Conversation History
Overview
RAGLight supports full multi-turn conversations across all LLM providers. User and assistant messages from previous turns are automatically injected into each new request — the model sees the full context of the conversation.
This works identically for generate() and generate_streaming(), and is compatible with all providers: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.
How it works
Each call to generate() or generate_streaming() accepts an optional history field — a list of {"role": ..., "content": ...} messages. RAGLight injects them into the LLM prompt before the current question.
RAGPipeline manages this history automatically. You just call generate() repeatedly and history is accumulated for you.
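Conceptually, the injection step flattens prior turns into the prompt ahead of the new question. Here is a minimal sketch of that idea — the message dicts match the `{"role": ..., "content": ...}` format the `history` field uses, but the exact prompt layout RAGLight builds internally may differ:

```python
# Sketch: how prior turns could be flattened into one prompt string.
# This is an illustration of the injection idea, not RAGLight's code.

def build_prompt(history: list[dict], question: str) -> str:
    lines = [f"{msg['role']}: {msg['content']}" for msg in history]
    lines.append(f"user: {question}")  # current question goes last
    return "\n".join(lines)

history = [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "A lightweight Python RAG framework."},
]
prompt = build_prompt(history, "Can you give me a code example?")
```

The model therefore sees every previous turn as part of its input, which is why a follow-up like "Can you give me a code example?" is understood in context.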
Usage
Automatic history with RAGPipeline
```python
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

# Each call automatically includes previous turns
response1 = pipeline.generate("What is RAGLight?")
print(response1)

response2 = pipeline.generate("Can you give me a code example?")
print(response2)  # The model knows the previous question was about RAGLight
```
Capping history length
To limit memory usage in long conversations, set max_history in RAGConfig:
```python
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    max_history=10,  # Keep the last 10 messages
)
```
When the cap is reached, the oldest messages are dropped automatically.
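The trimming can be pictured as a sliding window over the message list. The sketch below assumes `max_history` counts individual messages (each user and assistant message counts as one, per the "Keep the last 10 messages" comment above) — it illustrates the semantics, not RAGLight's internals:

```python
# Sketch: drop the oldest messages once the cap is reached.
# Assumption: max_history counts individual messages, not turn pairs.

def trim_history(history: list[dict], max_history: int) -> list[dict]:
    return history[-max_history:]  # keep only the newest messages

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
trimmed = trim_history(history, 4)
# The two oldest messages ("turn 0", "turn 1") are dropped
```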
Resetting history
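Call `pipeline.reset_history()` to discard the accumulated turns and start a fresh conversation. Conceptually it clears the stored message list — the sketch below assumes a plain internal list purely for illustration; only the `reset_history()` method name comes from RAGLight's API:

```python
# Sketch: the assumed effect of reset_history(). The internal storage
# shown here (a plain list of message dicts) is an assumption.

class HistoryStore:
    def __init__(self):
        self.messages: list[dict] = []

    def record(self, question: str, answer: str) -> None:
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})

    def reset_history(self) -> None:
        self.messages.clear()  # next question starts a fresh conversation

store = HistoryStore()
store.record("What is RAGLight?", "A lightweight RAG framework.")
store.reset_history()
```

With the pipeline example above, this is simply `pipeline.reset_history()` between conversations.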
Manual history with the Builder API
When using the RAG object directly, pass history explicitly:
```python
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

history = []

question1 = "What is RAGLight?"
response1 = rag.generate({"question": question1, "history": history})

# Record the turn so the next call sees it
history.append({"role": "user", "content": question1})
history.append({"role": "assistant", "content": response1})

question2 = "Can you give me a code example?"
response2 = rag.generate({"question": question2, "history": history})
```
History in the REST API
The /generate and /generate/stream endpoints accept a history field:
```json
{
  "question": "Can you give me a code example?",
  "history": [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."}
  ]
}
```
The Streamlit chat UI (raglight serve --ui) manages this automatically.
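A payload like the one above can be assembled and posted with the standard library alone. This sketch assumes the server is listening at `http://localhost:8000` — a hypothetical address; use whatever `raglight serve` actually reports:

```python
import json
from urllib import request

# Hypothetical base URL — adjust to where `raglight serve` is listening.
BASE_URL = "http://localhost:8000"

payload = {
    "question": "Can you give me a code example?",
    "history": [
        {"role": "user", "content": "What is RAGLight?"},
        {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."},
    ],
}

req = request.Request(
    f"{BASE_URL}/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)  # uncomment against a running server
```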
Summary
- RAGPipeline accumulates history automatically across generate() calls
- Use max_history in RAGConfig to cap the number of messages kept
- Use pipeline.reset_history() to start a fresh conversation
- All providers support history — same behavior regardless of backend
- Works identically with generate() and generate_streaming()