# REST API — raglight serve

`raglight serve` starts a FastAPI server that exposes your RAG pipeline as a REST API. The entire configuration is driven by environment variables — no Python code required.

The interactive API documentation (Swagger UI) is automatically available at `http://localhost:8000/docs` once the server is running.

## Quick start

```bash
pip install raglight
raglight serve
```

The server starts on `http://0.0.0.0:8000` by default and prints the active configuration on startup.

## CLI options

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `0.0.0.0` | Host to bind |
| `--port` | `8000` | Port to listen on |
| `--reload` | `false` | Enable auto-reload (development) |
| `--workers` | `1` | Number of Uvicorn worker processes |

```bash
raglight serve --host 127.0.0.1 --port 8080 --workers 4
```

## Configuration

All settings are read from `RAGLIGHT_*` environment variables. Copy the example file and adjust the values:

```bash
cp examples/serve_example/.env.example .env
raglight serve
```
| Variable | Default | Description |
| --- | --- | --- |
| `RAGLIGHT_LLM_MODEL` | `llama3` | LLM model name |
| `RAGLIGHT_LLM_PROVIDER` | `Ollama` | LLM provider |
| `RAGLIGHT_LLM_API_BASE` | `http://localhost:11434` | LLM API base URL |
| `RAGLIGHT_EMBEDDINGS_MODEL` | `all-MiniLM-L6-v2` | Embeddings model name |
| `RAGLIGHT_EMBEDDINGS_PROVIDER` | `HuggingFace` | Embeddings provider |
| `RAGLIGHT_EMBEDDINGS_API_BASE` | `http://localhost:11434` | Embeddings API base URL |
| `RAGLIGHT_PERSIST_DIR` | `./raglight_db` | Local ChromaDB persistence directory |
| `RAGLIGHT_COLLECTION` | `default` | ChromaDB collection name |
| `RAGLIGHT_K` | `5` | Number of documents retrieved per query |
| `RAGLIGHT_SYSTEM_PROMPT` | (default prompt) | Custom system prompt for the LLM |
| `RAGLIGHT_CHROMA_HOST` | (unset) | Remote Chroma host (leave unset for local) |
| `RAGLIGHT_CHROMA_PORT` | (unset) | Remote Chroma port |
Valid provider values match the `Settings` constants: `Ollama`, `Mistral`, `OpenAI`, `LmStudio`, `GoogleGemini` for LLMs — and `HuggingFace`, `Ollama`, `OpenAI`, `GoogleGemini` for embeddings.

## Example: Mistral + remote Chroma

`.env`:

```bash
RAGLIGHT_LLM_PROVIDER=Mistral
RAGLIGHT_LLM_MODEL=mistral-small-latest
RAGLIGHT_EMBEDDINGS_PROVIDER=HuggingFace
RAGLIGHT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
RAGLIGHT_CHROMA_HOST=chromadb
RAGLIGHT_CHROMA_PORT=8000
RAGLIGHT_COLLECTION=production
```

## Endpoints

### GET /health

Returns the server status.
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
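When scripting against the API, it can help to wait for the server to come up before sending requests. A minimal stdlib-only sketch that polls `/health` — the `wait_until_ready` helper is a name chosen here, not part of raglight:

```python
import json
import time
import urllib.error
import urllib.request

def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout: float = 30.0) -> bool:
    """Poll GET /health until the server answers {"status": "ok"}.

    Returns True once healthy, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if json.loads(resp.read()).get("status") == "ok":
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.5)
    return False
```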

### POST /generate

Ask a question to the RAG pipeline.

Request body:

```json
{ "question": "What is RAGLight?" }
```

Response:

```json
{ "answer": "RAGLight is a lightweight Python framework for RAG..." }
```

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAGLight?"}'
```
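The same call can be made from Python with only the standard library. This is a sketch against the request/response shapes shown above; the `ask` and `generate_payload` names are illustrative, not part of raglight:

```python
import json
import urllib.request

def generate_payload(question: str) -> bytes:
    """Encode the /generate request body as UTF-8 JSON bytes."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """POST the question to /generate and return the "answer" field."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=generate_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```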

### POST /ingest

Index documents into the vector store. Supports three sources — combinable in a single call.

Request body:

```json
{
  "data_path": "./my_documents",
  "file_paths": ["/absolute/path/to/file.pdf"],
  "github_url": "https://github.com/Bessouat40/RAGLight",
  "github_branch": "main"
}
```

All fields are optional, but at least one must be provided.

```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"data_path": "./docs"}'
```
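A small helper can assemble the request body from optional arguments and enforce the at-least-one-source rule on the client side. Sketch only: `ingest_body` is not a raglight API, and it assumes `github_branch` on its own does not count as a source:

```python
def ingest_body(data_path=None, file_paths=None,
                github_url=None, github_branch=None) -> dict:
    """Build a /ingest request body, keeping only the fields provided."""
    body = {
        "data_path": data_path,
        "file_paths": file_paths,
        "github_url": github_url,
        "github_branch": github_branch,
    }
    body = {k: v for k, v in body.items() if v is not None}
    # At least one actual source must be present.
    if not {"data_path", "file_paths", "github_url"} & body.keys():
        raise ValueError(
            "provide at least one of data_path, file_paths, or github_url"
        )
    return body
```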

### POST /ingest/upload

Upload files directly from the client via `multipart/form-data`. Use this when the server and the files are on different machines. The field name must be `files`; multiple files can be sent in a single request.

```bash
curl -X POST http://localhost:8000/ingest/upload \
  -F "files=@./report.pdf" \
  -F "files=@./notes.txt"
```

Uploaded files are written to a temporary directory, processed, indexed into ChromaDB, then deleted. Nothing is stored permanently on disk besides the embeddings.

Response:

```json
{ "message": "Ingested 2 file(s): report.pdf, notes.txt" }
```
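To send the same request from Python without third-party libraries, the multipart body can be assembled by hand. A sketch of the wire format curl produces for `-F "files=@..."` — the `encode_files` helper is illustrative, not part of raglight:

```python
import io
import uuid

def encode_files(files: list) -> tuple:
    """Encode (filename, content-bytes) pairs as a multipart/form-data body.

    Every part uses the field name "files", which is what /ingest/upload
    expects. Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for filename, content in files:
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="files"; '
            f'filename="{filename}"\r\n'.encode()
        )
        buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
        buf.write(content)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

POST the returned body to `/ingest/upload` with a `Content-Type` header set to the second return value (for example via `urllib.request.Request`).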

### GET /collections

List the available ChromaDB collections.
```bash
curl http://localhost:8000/collections
# {"collections": ["default", "project_x"]}
```

## Deploy with Docker Compose

The fastest path to a production deployment:

1. **Copy the example env file**

   ```bash
   cd examples/serve_example
   cp .env.example .env
   ```

   Edit `.env` and set your LLM provider, model, and any API keys.

2. **Start the stack**

   ```bash
   docker-compose up
   ```

   The API is available at `http://localhost:8000`.

3. **Ingest your documents**

   ```bash
   curl -X POST http://localhost:8000/ingest/upload \
     -F "files=@./my_document.pdf"
   ```

4. **Query the API**

   ```bash
   curl -X POST http://localhost:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"question": "Summarize the document"}'
   ```
The `docker-compose.yml` includes `extra_hosts: host.docker.internal:host-gateway` so the container can reach an Ollama instance running on the host machine.

## Summary

| Endpoint | Description |
| --- | --- |
| `GET /health` | Server liveness check |
| `POST /generate` | Ask a question to the RAG pipeline |
| `POST /ingest` | Index a folder, file paths, or GitHub repo |
| `POST /ingest/upload` | Upload files directly (`multipart/form-data`) |
| `GET /collections` | List available ChromaDB collections |