# REST API — raglight serve
`raglight serve` starts a FastAPI server that exposes your RAG pipeline as a REST API. The entire configuration is driven by environment variables — no Python code required.
The interactive API documentation (Swagger UI) is automatically available at http://localhost:8000/docs once the server is running.

## Quick start

The server binds to http://0.0.0.0:8000 by default and prints the active configuration on startup.
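A minimal first run, assuming raglight is installed and its default Ollama backend is reachable, could look like:

```shell
# Start the server with the default configuration (binds to 0.0.0.0:8000)
raglight serve

# From another terminal, confirm the server is up
curl http://localhost:8000/health
```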
## CLI options
| Option | Default | Description |
|---|---|---|
| `--host` | `0.0.0.0` | Host to bind |
| `--port` | `8000` | Port to listen on |
| `--reload` | `false` | Enable auto-reload (development) |
| `--workers` | `1` | Number of Uvicorn worker processes |
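Combining the options above, a development setup with auto-reload on a custom host and port might be started as:

```shell
raglight serve --host 127.0.0.1 --port 9000 --reload
```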
## Configuration
All settings are read from `RAGLIGHT_*` environment variables. Copy the example file and adjust the values:
| Variable | Default | Description |
|---|---|---|
| `RAGLIGHT_LLM_MODEL` | `llama3` | LLM model name |
| `RAGLIGHT_LLM_PROVIDER` | `Ollama` | LLM provider |
| `RAGLIGHT_LLM_API_BASE` | `http://localhost:11434` | LLM API base URL |
| `RAGLIGHT_EMBEDDINGS_MODEL` | `all-MiniLM-L6-v2` | Embeddings model name |
| `RAGLIGHT_EMBEDDINGS_PROVIDER` | `HuggingFace` | Embeddings provider |
| `RAGLIGHT_EMBEDDINGS_API_BASE` | `http://localhost:11434` | Embeddings API base URL |
| `RAGLIGHT_PERSIST_DIR` | `./raglight_db` | Local ChromaDB persistence directory |
| `RAGLIGHT_COLLECTION` | `default` | ChromaDB collection name |
| `RAGLIGHT_K` | `5` | Number of documents retrieved per query |
| `RAGLIGHT_SYSTEM_PROMPT` | (default prompt) | Custom system prompt for the LLM |
| `RAGLIGHT_CHROMA_HOST` | — | Remote Chroma host (leave unset for local) |
| `RAGLIGHT_CHROMA_PORT` | — | Remote Chroma port |
Valid provider values match the `Settings` constants: `Ollama`, `Mistral`, `OpenAI`, `LmStudio`, `GoogleGemini` for LLMs — and `HuggingFace`, `Ollama`, `OpenAI`, `GoogleGemini` for embeddings.

## Example: Mistral + remote Chroma

`.env`
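The project's shipped example file is not reproduced here; a plausible `.env` for this setup, built only from the variables in the table above, might look like the following. The Mistral model name and Chroma hostname are placeholders you must replace:

```env
RAGLIGHT_LLM_PROVIDER=Mistral
RAGLIGHT_LLM_MODEL=mistral-small-latest
RAGLIGHT_EMBEDDINGS_PROVIDER=HuggingFace
RAGLIGHT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
RAGLIGHT_CHROMA_HOST=chroma.example.internal
RAGLIGHT_CHROMA_PORT=8001
RAGLIGHT_COLLECTION=default
```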
## Endpoints
### GET /health
Returns the server status.
### POST /generate
Ask a question to the RAG pipeline.
Request body
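The request schema is not reproduced in this copy of the document. As an illustration only, a payload might look like the following — the field name `question` is an assumption, so verify the actual schema in the Swagger UI at /docs:

```json
{
  "question": "What is retrieval-augmented generation?"
}
```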
### POST /ingest
Index documents into the vector store. Supports three sources — combinable in a single call.
Request body
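The three sources mentioned above could be combined roughly as shown below; these field names are assumptions rather than the confirmed API, so check them against the Swagger UI at /docs:

```json
{
  "folder": "./documents",
  "file_paths": ["./notes/overview.md"],
  "repo_url": "https://github.com/owner/repo"
}
```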
### POST /ingest/upload
Upload files directly from the client via `multipart/form-data`. Use this when the server and the files are on different machines. The form field name must be `files`; multiple files can be sent in a single request.
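For example, with curl (the file names here are illustrative):

```shell
curl -X POST http://localhost:8000/ingest/upload \
  -F "files=@report.pdf" \
  -F "files=@notes.md"
```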
### GET /collections
List the available ChromaDB collections.
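For example:

```shell
curl http://localhost:8000/collections
```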
## Deploy with Docker Compose
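The shipped compose file is not reproduced in this copy; a sketch consistent with this section — where the image name, port mapping, and volume path are assumptions — might be:

```yaml
services:
  raglight:
    image: raglight:latest               # assumption: actual image/build config may differ
    ports:
      - "8000:8000"
    env_file: .env
    volumes:
      - ./raglight_db:/app/raglight_db   # assumption: container-side path
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

Start it with `docker compose up -d`.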
Docker Compose is the fastest path to a production deployment. The provided `docker-compose.yml` includes `extra_hosts: host.docker.internal:host-gateway` so the container can reach an Ollama instance running on the host machine.

## Summary
| Endpoint | Description |
|---|---|
| `GET /health` | Server liveness check |
| `POST /generate` | Ask a question to the RAG pipeline |
| `POST /ingest` | Index a folder, file paths, or GitHub repo |
| `POST /ingest/upload` | Upload files directly (`multipart/form-data`) |
| `GET /collections` | List available ChromaDB collections |