# REST API — raglight serve
`raglight serve` starts a FastAPI server that exposes your RAG pipeline as a REST API. The entire configuration is driven by environment variables — no Python code required.
The interactive API documentation (Swagger UI) is automatically available at `http://localhost:8000/docs` once the server is running.

## Quick start

The server listens on `http://0.0.0.0:8000` by default and prints the active configuration on startup.
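A typical invocation, using only the flags documented in the options table below:

```shell
# start the API on all interfaces, port 8000 (the defaults)
raglight serve --host 0.0.0.0 --port 8000

# or, during development, with auto-reload
raglight serve --reload
```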
## CLI options
| Option | Default | Description |
|---|---|---|
| `--host` | `0.0.0.0` | Host to bind |
| `--port` | `8000` | Port to listen on |
| `--reload` | `false` | Enable auto-reload (development) |
| `--workers` | `1` | Number of Uvicorn worker processes |
| `--ui` | `false` | Launch the Streamlit chat UI alongside the API |
| `--ui-port` | `8501` | Port for the Streamlit UI |
## Chat UI — `--ui`
Add --ui to start a Streamlit chat interface alongside the API. Both processes share the same configuration and communicate over HTTP.
| Address | Service |
|---|---|
| `http://localhost:8000` | REST API + Swagger (`/docs`) |
| `http://localhost:8501` | Streamlit chat UI |
- Chat with your documents — full conversation history, markdown rendering
- Upload files directly from the browser (PDF, TXT, code…)
- Ingest a directory by providing a path on the server machine
- Switch LLM on the fly — the ⚙️ Model settings panel in the sidebar lets you change provider, model, and API base URL without restarting the server
Use `--ui-port` to change the Streamlit port:
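For example, serving the API on its default port and the chat UI on 8502:

```shell
raglight serve --ui --ui-port 8502
```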
The API and UI run as two independent processes. Stopping `raglight serve` terminates both cleanly.

## Configuration
All settings are read from `RAGLIGHT_*` environment variables. Copy the example file and adjust the values:
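Assuming the repository ships the example file as `.env.example` (the filename is an assumption), that step looks like:

```shell
cp .env.example .env   # filename ".env.example" is an assumption
$EDITOR .env           # adjust the RAGLIGHT_* values
```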
| Variable | Default | Description |
|---|---|---|
| `RAGLIGHT_LLM_MODEL` | `llama3` | LLM model name |
| `RAGLIGHT_LLM_PROVIDER` | `Ollama` | LLM provider |
| `RAGLIGHT_LLM_API_BASE` | `http://localhost:11434` | LLM API base URL |
| `RAGLIGHT_EMBEDDINGS_MODEL` | `all-MiniLM-L6-v2` | Embeddings model name |
| `RAGLIGHT_EMBEDDINGS_PROVIDER` | `HuggingFace` | Embeddings provider |
| `RAGLIGHT_EMBEDDINGS_API_BASE` | `http://localhost:11434` | Embeddings API base URL |
| `RAGLIGHT_PERSIST_DIR` | `./raglight_db` | Local ChromaDB persistence directory |
| `RAGLIGHT_COLLECTION` | `default` | ChromaDB collection name |
| `RAGLIGHT_K` | `5` | Number of documents retrieved per query |
| `RAGLIGHT_SYSTEM_PROMPT` | (default prompt) | Custom system prompt for the LLM |
| `RAGLIGHT_CHROMA_HOST` | — | Remote Chroma host (leave unset for local) |
| `RAGLIGHT_CHROMA_PORT` | — | Remote Chroma port |
| `RAGLIGHT_API_TIMEOUT` | `300` | Request timeout in seconds for the Streamlit UI (increase for slow models) |
Valid provider values match the `Settings` constants:

- LLM: `Ollama`, `Mistral`, `OpenAI`, `LmStudio`, `GoogleGemini`, `AWSBedrock`
- Embeddings: `HuggingFace`, `Ollama`, `OpenAI`, `GoogleGemini`, `AWSBedrock`
### Example: Mistral + remote Chroma

`.env`:
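A sketch of such a file; every variable name comes from the table above, while the model name and host values are illustrative placeholders:

```shell
# LLM served by Mistral (model name is illustrative)
RAGLIGHT_LLM_PROVIDER=Mistral
RAGLIGHT_LLM_MODEL=mistral-small-latest

# Remote ChromaDB instead of local persistence
RAGLIGHT_CHROMA_HOST=chroma.internal
RAGLIGHT_CHROMA_PORT=8000
RAGLIGHT_COLLECTION=default
```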
## Endpoints
### GET /health
Returns the server status.
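A quick liveness check (the exact response schema is an assumption; see `/docs` for the real one):

```shell
curl http://localhost:8000/health
# returns a small JSON status object, e.g. {"status": "ok"}  <- shape is an assumption
```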
### POST /generate
Ask a question to the RAG pipeline.
**Request body**
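The exact field names are defined by the server's schema (visible in Swagger at `/docs`); a sketch, assuming a `question` field:

```shell
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"question": "What is in my documents?"}'   # field name "question" is an assumption
```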
### POST /ingest
Index documents into the vector store. Supports three sources — combinable in a single call.
**Request body**
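A sketch combining the three sources in one call; the field names (`folder`, `file_paths`, `repo_url`) are assumptions, so check `/docs` for the actual schema:

```shell
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
        "folder": "/data/docs",
        "file_paths": ["/data/report.pdf"],
        "repo_url": "https://github.com/owner/repo"
      }'   # all field names are assumptions; values are placeholders
```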
### POST /ingest/upload
Upload files directly from the client via `multipart/form-data`. Use this when the server and the files are on different machines.
The field name must be `files`. Multiple files can be sent in a single request.
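For example, uploading two files in one request (the `files` field name is documented above; the filenames are placeholders):

```shell
curl -X POST http://localhost:8000/ingest/upload \
  -F "files=@report.pdf" \
  -F "files=@notes.txt"
```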
### GET /collections
List the available ChromaDB collections.
### GET /config
Returns the currently active LLM configuration.
### POST /config
Switches the LLM at runtime — no server restart required. The new model is loaded immediately and used for all subsequent /generate calls.
**Request body**
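A sketch matching the provider, model, and API base settings the UI panel exposes; the JSON field names are assumptions:

```shell
curl -X POST http://localhost:8000/config \
  -H "Content-Type: application/json" \
  -d '{"provider": "Ollama",
       "model": "llama3",
       "api_base": "http://localhost:11434"}'   # field names are assumptions
```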
This is the endpoint used by the Streamlit UI’s ⚙️ Model settings panel.
## Deploy with Docker Compose
The fastest path to a production deployment:
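Assuming the repository root contains the `docker-compose.yml` described below, this is typically:

```shell
docker compose up -d     # build and start the API container in the background
docker compose logs -f   # follow startup logs
```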
The `docker-compose.yml` includes `extra_hosts: host.docker.internal:host-gateway` so the container can reach an Ollama instance running on the host machine.

## Summary
| Endpoint | Description |
|---|---|
| `GET /health` | Server liveness check |
| `POST /generate` | Ask a question to the RAG pipeline |
| `POST /ingest` | Index a folder, file paths, or GitHub repo |
| `POST /ingest/upload` | Upload files directly (`multipart/form-data`) |
| `GET /collections` | List available ChromaDB collections |
| `GET /config` | Get the current LLM configuration |
| `POST /config` | Switch the LLM at runtime (no restart needed) |