# Conversation History

> Multi-turn conversations with automatic history injection across all providers.

## Overview

RAGLight supports **full multi-turn conversations** across all LLM providers. User and assistant messages from previous turns are automatically injected into each new request, so the model sees the earlier conversation as context.

This works identically for `generate()` and `generate_streaming()`, and is compatible with all providers: Ollama, OpenAI, Mistral, Gemini, LMStudio, and AWS Bedrock.

***

## How it works

Each call to `generate()` or `generate_streaming()` accepts an optional `history` field: a list of `{"role": ..., "content": ...}` messages. RAGLight injects them into the LLM prompt before the current question.

`RAGPipeline` manages this history automatically: call `generate()` repeatedly and the history accumulates for you.
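Conceptually, the injection amounts to placing the prior turns ahead of the new question. The following is a minimal sketch of that idea, not RAGLight's internal code; the `build_messages` helper and the chat-style message layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(question: str, history: Optional[List[Message]] = None) -> List[Message]:
    """Hypothetical helper: return prior turns followed by the current question."""
    messages = list(history or [])  # copy so the caller's list is not mutated
    messages.append({"role": "user", "content": question})
    return messages


history = [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."},
]

# The new question is appended after the two earlier messages.
messages = build_messages("Can you give me a code example?", history)
```

How each provider actually formats these messages may differ; the sketch only shows the ordering described above.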

***

## Usage

### Automatic history with RAGPipeline

```python theme={null}
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
)

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./myDb",
    collection_name="my_collection",
)

pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

# Each call automatically includes previous turns
response1 = pipeline.generate("What is RAGLight?")
print(response1)

response2 = pipeline.generate("Can you give me a code example?")
print(response2)  # The model knows the previous question was about RAGLight
```

### Capping history length

To limit memory usage in long conversations, set `max_history` in `RAGConfig`:

```python theme={null}
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    max_history=10,  # Keep the last 10 messages
)
```

When the cap is reached, the oldest messages are dropped automatically.
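The dropping behavior can be pictured as keeping only the tail of the message list. This is a sketch under that assumption, not the library's implementation; `trim_history` is a hypothetical name. Note that the cap counts individual messages, so `max_history=10` corresponds to five user/assistant exchanges.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def trim_history(history: List[Message], max_history: Optional[int]) -> List[Message]:
    """Hypothetical helper: keep only the most recent `max_history` messages."""
    if max_history is None:
        return list(history)  # no cap configured
    if max_history <= 0:
        return []  # guard: history[-0:] would return the whole list
    return list(history[-max_history:])


# Six exchanges produce twelve messages, alternating user and assistant.
history = []
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

# With a cap of 10, the oldest exchange (question 0 / answer 0) is dropped.
trimmed = trim_history(history, max_history=10)
```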

### Resetting history

```python theme={null}
pipeline.reset_history()
```

### Manual history with the Builder API

When using the `RAG` object directly, pass history explicitly:

```python theme={null}
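from raglight.rag.builder import Builder
from raglight.config.settings import Settings

# Build a RAG object directly; history is not tracked for you here.
rag = (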
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(Settings.CHROMA, persist_directory="./myDb", collection_name="my_collection")
    .with_llm(Settings.OLLAMA, model_name=Settings.DEFAULT_LLM)
    .build_rag(k=5)
)

history = []

question1 = "What is RAGLight?"
response1 = rag.generate({"question": question1, "history": history})
history.append({"role": "user", "content": question1})
history.append({"role": "assistant", "content": response1})

question2 = "Can you give me a code example?"
response2 = rag.generate({"question": question2, "history": history})
```
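If the append-after-every-call bookkeeping becomes repetitive, it can be wrapped in a small helper. The `ManualConversation` class below is a hypothetical convenience, not part of RAGLight; it only assumes a callable that takes the `{"question": ..., "history": ...}` dictionary shown above and returns the answer as a string. A stand-in function replaces `rag.generate` so the sketch runs without a model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ManualConversation:
    """Hypothetical wrapper that records each turn after calling `generate`."""

    def __init__(self, generate: Callable[[dict], str]) -> None:
        self._generate = generate
        self.history: List[Message] = []

    def ask(self, question: str) -> str:
        # Pass a copy so the callee cannot alter the stored history.
        response = self._generate({"question": question, "history": list(self.history)})
        self.history.append({"role": "user", "content": question})
        self.history.append({"role": "assistant", "content": response})
        return response

    def reset(self) -> None:
        self.history.clear()


def fake_generate(payload: dict) -> str:
    """Stand-in for rag.generate: reports how many prior messages it received."""
    return f"answer after {len(payload['history'])} prior messages"


chat = ManualConversation(fake_generate)  # in practice: ManualConversation(rag.generate)
first = chat.ask("What is RAGLight?")
second = chat.ask("Can you give me a code example?")
```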

***

## History in the REST API

The `/generate` and `/generate/stream` endpoints accept a `history` field:

```json theme={null}
{
  "question": "Can you give me a code example?",
  "history": [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."}
  ]
}
```

The Streamlit chat UI (`raglight serve --ui`) manages this automatically.
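For clients other than the Streamlit UI, the request body shown above can be assembled and posted from Python using only the standard library. This is a sketch: the base URL `http://localhost:8000` and the `make_generate_request` helper are assumptions, and the response schema is not covered here, so the example stops at building the request.

```python
import json
import urllib.request
from typing import Dict, List

Message = Dict[str, str]

# Assumption: the server is reachable at this address; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def make_generate_request(
    question: str, history: List[Message], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for /generate carrying the question and prior turns."""
    payload = {"question": question, "history": history}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


history = [
    {"role": "user", "content": "What is RAGLight?"},
    {"role": "assistant", "content": "RAGLight is a lightweight Python RAG framework..."},
]
request = make_generate_request("Can you give me a code example?", history)

# Sending the request needs a running server, so it is left commented out.
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```

The client remains responsible for appending each question and answer to `history` before the next call, as in the manual Builder example.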

***

## Summary

* `RAGPipeline` accumulates history automatically across `generate()` calls
* Use `max_history` in `RAGConfig` to cap the number of messages kept
* Use `pipeline.reset_history()` to start a fresh conversation
* All providers support history, with the same behavior regardless of backend
* Works identically with `generate()` and `generate_streaming()`
