# CLI
RAGLight ships with a full command-line interface. Three commands cover the main use cases:
| Command | Description |
|---|---|
| `raglight chat` | Interactive RAG chat session in your terminal |
| `raglight agentic-chat` | Same as `chat`, but with the Agentic RAG pipeline |
| `raglight serve` | Deploy as a REST API (optionally with a Streamlit UI) |
## `raglight chat`
Starts an interactive terminal chat session backed by a RAG pipeline. On first launch, a setup wizard guides you through choosing your vector store, embeddings, and LLM. Subsequent runs can skip the wizard entirely via environment variables.
The wizard walks you through:
- Vector database — Chroma or Qdrant, local path or remote host
- Embeddings — provider and model
- LLM — provider, model, and API base URL
- Knowledge source — local folder or GitHub repository
- Indexing — option to reuse an existing index
- Chat loop — type your questions, get streamed responses
Responses are rendered as markdown in the terminal with streaming output.
### Skip the wizard with env vars
Create a `.env` file (or export the variables) and the wizard is bypassed entirely:

```
RAGLIGHT_LLM_PROVIDER=Ollama
RAGLIGHT_LLM_MODEL=llama3.1:8b
RAGLIGHT_EMBEDDINGS_PROVIDER=HuggingFace
RAGLIGHT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
RAGLIGHT_DB=Chroma
RAGLIGHT_PERSIST_DIR=./myDb
RAGLIGHT_COLLECTION=default
RAGLIGHT_DATA_PATH=./docs
```
Then just run:
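```shell
raglight chat
```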
RAGLight prints the active configuration and goes straight to the chat loop.
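Because configuration comes from environment variables, you can also set one inline for a single run without touching `.env` (the model name below is just an illustration):

```shell
# One-off override: use a different model for this session only,
# leaving the values in .env untouched
RAGLIGHT_LLM_MODEL=mistral raglight chat
```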
### Commands in the chat loop
| Input | Action |
|---|---|
| Any text | Send a question to the RAG pipeline |
| `bye` / `exit` / `quit` | End the session |
## `raglight agentic-chat`
Same as raglight chat, but uses the Agentic RAG pipeline — the LLM can call tools, reason in multiple steps, and go beyond simple retrieval.
The setup wizard is identical. The difference is in the pipeline: the agent decides when to retrieve, can combine multiple retrievals, and produces richer answers for complex questions.
Agentic mode requires an LLM that supports tool calling (e.g. `llama3.1`, `gpt-4o`, `mistral-large`).
See the Agentic RAG page for a full explanation of the pipeline.
## `raglight serve`
Starts a FastAPI REST API exposing your RAG pipeline over HTTP. Entirely configured by environment variables — no Python code required.
Add `--ui` to also launch the Streamlit chat interface:
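```shell
raglight serve --ui
```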
See the REST API page for the full reference — endpoints, configuration variables, Docker Compose setup, and more.
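As a rough sketch, a client request might look like the following; note that the endpoint path and JSON shape here are assumptions, not the documented API, so check the REST API page for the actual contract:

```shell
# Hypothetical request: the /chat endpoint and payload shape are
# assumptions for illustration; see the REST API reference for the
# real endpoints exposed by raglight serve.
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAGLight?"}'
```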
### CLI options
| Option | Default | Description |
|---|---|---|
| `--host` | `0.0.0.0` | Host to bind |
| `--port` | `8000` | Port to listen on |
| `--reload` | `false` | Enable auto-reload (development) |
| `--workers` | `1` | Number of Uvicorn worker processes |
| `--ui` | `false` | Launch the Streamlit chat UI alongside the API |
| `--ui-port` | `8501` | Port for the Streamlit UI |
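Putting the options together, a typical development invocation (the values below are just examples) might be:

```shell
# Bind locally with auto-reload, and start the Streamlit UI on port 8502
raglight serve --host 127.0.0.1 --port 8000 --reload --ui --ui-port 8502
```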
## Common environment variables

All three commands read the same `RAGLIGHT_*` environment variables:
| Variable | Default | Description |
|---|---|---|
| `RAGLIGHT_LLM_PROVIDER` | `Ollama` | LLM provider |
| `RAGLIGHT_LLM_MODEL` | `llama3` | LLM model name |
| `RAGLIGHT_LLM_API_BASE` | `http://localhost:11434` | LLM API base URL |
| `RAGLIGHT_EMBEDDINGS_PROVIDER` | `HuggingFace` | Embeddings provider |
| `RAGLIGHT_EMBEDDINGS_MODEL` | `all-MiniLM-L6-v2` | Embeddings model |
| `RAGLIGHT_DB` | `Chroma` | Vector store backend (`Chroma` or `Qdrant`) |
| `RAGLIGHT_PERSIST_DIR` | `./raglight_db` | Local persistence directory |
| `RAGLIGHT_COLLECTION` | `default` | Collection name |
| `RAGLIGHT_K` | `5` | Number of documents retrieved per query |
| `RAGLIGHT_DATA_PATH` | — | Path to documents (skips the wizard prompt) |
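Because all three commands read the same variables, one exported setup applies everywhere (the values below are illustrative):

```shell
# Shared configuration: chat, agentic-chat, and serve all pick these up
export RAGLIGHT_DB=Qdrant          # switch backend from the Chroma default
export RAGLIGHT_COLLECTION=docs    # name of the collection to use
export RAGLIGHT_K=8                # retrieve 8 documents per query
```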
## Summary

- `raglight chat` — terminal RAG chat with streaming markdown output
- `raglight agentic-chat` — same, but with a tool-calling agent mode
- `raglight serve` — REST API; add `--ui` for the web chat interface
- All three share the same `RAGLIGHT_*` env vars — one `.env` file for everything