REST API — raglight serve

raglight serve starts a FastAPI server that exposes your RAG pipeline as a REST API. The entire configuration is driven by environment variables — no Python code required.
The interactive API documentation (Swagger UI) is automatically available at http://localhost:8000/docs once the server is running.

Quick start

pip install raglight
raglight serve
The server starts on http://0.0.0.0:8000 by default and prints the active configuration on startup.

CLI options

Option       Default    Description
--host       0.0.0.0    Host to bind
--port       8000       Port to listen on
--reload     false      Enable auto-reload (development)
--workers    1          Number of Uvicorn worker processes
--ui         false      Launch the Streamlit chat UI alongside the API
--ui-port    8501       Port for the Streamlit UI
raglight serve --host 127.0.0.1 --port 8080 --workers 4

Chat UI — --ui

Add --ui to start a Streamlit chat interface alongside the API. Both processes share the same configuration and communicate over HTTP.
raglight serve --ui
Address                  Service
http://localhost:8000    REST API + Swagger (/docs)
http://localhost:8501    Streamlit chat UI
The UI lets you:
  • Chat with your documents — full conversation history, markdown rendering
  • Upload files directly from the browser (PDF, TXT, code…)
  • Ingest a directory by providing a path on the server machine
  • Switch LLM on the fly — the ⚙️ Model settings panel in the sidebar lets you change provider, model, and API base URL without restarting the server
Use --ui-port to change the Streamlit port:
raglight serve --ui --port 8000 --ui-port 3000
The API and UI run as two independent processes. Stopping raglight serve terminates both cleanly.

Configuration

All settings are read from RAGLIGHT_* environment variables. Copy the example file and adjust the values:
cp examples/serve_example/.env.example .env
raglight serve
Variable                        Default                   Description
RAGLIGHT_LLM_MODEL              llama3                    LLM model name
RAGLIGHT_LLM_PROVIDER           Ollama                    LLM provider
RAGLIGHT_LLM_API_BASE           http://localhost:11434    LLM API base URL
RAGLIGHT_EMBEDDINGS_MODEL       all-MiniLM-L6-v2          Embeddings model name
RAGLIGHT_EMBEDDINGS_PROVIDER    HuggingFace               Embeddings provider
RAGLIGHT_EMBEDDINGS_API_BASE    http://localhost:11434    Embeddings API base URL
RAGLIGHT_PERSIST_DIR            ./raglight_db             Local ChromaDB persistence directory
RAGLIGHT_COLLECTION             default                   ChromaDB collection name
RAGLIGHT_K                      5                         Number of documents retrieved per query
RAGLIGHT_SYSTEM_PROMPT          (default prompt)          Custom system prompt for the LLM
RAGLIGHT_CHROMA_HOST            —                         Remote Chroma host (leave unset for local)
RAGLIGHT_CHROMA_PORT            —                         Remote Chroma port
RAGLIGHT_API_TIMEOUT            300                       Request timeout in seconds for the Streamlit UI (increase for slow models)
Valid provider values match the Settings constants:
  • LLM: Ollama, Mistral, OpenAI, LmStudio, GoogleGemini, AWSBedrock
  • Embeddings: HuggingFace, Ollama, OpenAI, GoogleGemini, AWSBedrock
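To illustrate how the environment-driven configuration behaves, here is a minimal sketch of resolving RAGLIGHT_* variables against their defaults. This is an illustration of the pattern only, not RAGLight's actual internals; the helper name and the defaults dictionary are written out here from the table above.

```python
import os

# Defaults copied from the configuration table above (illustrative subset).
DEFAULTS = {
    "RAGLIGHT_LLM_MODEL": "llama3",
    "RAGLIGHT_LLM_PROVIDER": "Ollama",
    "RAGLIGHT_LLM_API_BASE": "http://localhost:11434",
    "RAGLIGHT_EMBEDDINGS_MODEL": "all-MiniLM-L6-v2",
    "RAGLIGHT_EMBEDDINGS_PROVIDER": "HuggingFace",
    "RAGLIGHT_PERSIST_DIR": "./raglight_db",
    "RAGLIGHT_COLLECTION": "default",
    "RAGLIGHT_K": "5",
}

def resolve_config(env=os.environ):
    """Return each setting from the environment, falling back to its default."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

Any variable you do not set keeps its default, so a `.env` file only needs to list the values you change.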

Example: Mistral + remote Chroma

.env
RAGLIGHT_LLM_PROVIDER=Mistral
RAGLIGHT_LLM_MODEL=mistral-small-latest
RAGLIGHT_EMBEDDINGS_PROVIDER=HuggingFace
RAGLIGHT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
RAGLIGHT_CHROMA_HOST=chromadb
RAGLIGHT_CHROMA_PORT=8000
RAGLIGHT_COLLECTION=production

Endpoints

GET /health

Returns the server status.
curl http://localhost:8000/health
# {"status": "ok"}

POST /generate

Ask the RAG pipeline a question.

Request body:

{ "question": "What is RAGLight?" }

Response:

{ "answer": "RAGLight is a lightweight Python framework for RAG..." }
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAGLight?"}'
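The same call can be made from Python with only the standard library. A minimal client sketch, assuming the server is reachable at the default base URL:

```python
import json
import urllib.request

def generate(question: str, base_url: str = "http://localhost:8000") -> str:
    """POST a question to /generate and return the 'answer' field."""
    payload = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```

Pass a different `base_url` if you changed `--host` or `--port`.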

POST /ingest

Index documents into the vector store. Three sources are supported, and they can be combined in a single call.

Request body:
{
  "data_path": "./my_documents",
  "file_paths": ["/absolute/path/to/file.pdf"],
  "github_url": "https://github.com/Bessouat40/RAGLight",
  "github_branch": "main"
}
All fields are optional, but at least one must be provided.
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"data_path": "./docs"}'

POST /ingest/upload

Upload files directly from the client via multipart/form-data. Use this when the server and the files are on different machines. The field name must be files. Multiple files can be sent in a single request.
curl -X POST http://localhost:8000/ingest/upload \
  -F "files=@./report.pdf" \
  -F "files=@./notes.txt"
Uploaded files are written to a temporary directory, processed, indexed into ChromaDB, then deleted; nothing is stored permanently on disk besides the embeddings.

Response:
{ "message": "Ingested 2 file(s): report.pdf, notes.txt" }
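If you build the multipart request by hand instead of using curl, every file part must carry the field name files. A stdlib-only sketch of the body format (the helper name is ours, and any real HTTP client library would do this for you):

```python
import uuid

def build_multipart(files: dict) -> tuple:
    """Build a multipart/form-data body where each part uses the field name 'files'.

    `files` maps filename -> raw bytes. Returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, data in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{name}"\r\n'
            f"Content-Type: application/octet-stream\r\n"
            f"\r\n"
        ).encode()
        parts.append(header + data + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

POST the returned body to /ingest/upload with the returned Content-Type header; multiple files simply become multiple parts, all named files.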

GET /collections

List the available ChromaDB collections.
curl http://localhost:8000/collections
# {"collections": ["default", "project_x"]}

GET /config

Returns the currently active LLM configuration.
curl http://localhost:8000/config
# {"llm_provider": "Ollama", "llm_model": "llama3", "llm_api_base": "http://localhost:11434"}

POST /config

Switches the LLM at runtime, with no server restart required. The new model is loaded immediately and used for all subsequent /generate calls.

Request body:
{
  "llm_provider": "Mistral",
  "llm_model": "mistral-small-latest",
  "llm_api_base": null
}
curl -X POST http://localhost:8000/config \
  -H "Content-Type: application/json" \
  -d '{"llm_provider": "Mistral", "llm_model": "mistral-small-latest"}'
This is the endpoint used by the Streamlit UI’s ⚙️ Model settings panel.

Deploy with Docker Compose

The fastest path to a production deployment:
1. Copy the example env file

   cd examples/serve_example
   cp .env.example .env

   Edit .env and set your LLM provider, model, and any API keys.

2. Start the stack

   docker-compose up

   The API is available at http://localhost:8000.

3. Ingest your documents

   curl -X POST http://localhost:8000/ingest/upload \
     -F "files=@./my_document.pdf"

4. Query the API

   curl -X POST http://localhost:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"question": "Summarize the document"}'
The docker-compose.yml includes extra_hosts: host.docker.internal:host-gateway so the container can reach an Ollama instance running on the host machine.
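For orientation, a minimal compose file in that spirit might look like the sketch below. This is an illustration only, not the shipped docker-compose.yml; the service name, image, and volume mapping are assumptions, and only the extra_hosts entry is taken from the text above.

```yaml
services:
  raglight:
    image: python:3.11-slim
    command: sh -c "pip install raglight && raglight serve"
    env_file: .env
    ports:
      - "8000:8000"
    volumes:
      # Persist the local ChromaDB directory across container restarts.
      - ./raglight_db:/raglight_db
    extra_hosts:
      # Lets the container reach an Ollama instance on the host machine.
      - "host.docker.internal:host-gateway"
```

With this mapping, RAGLIGHT_LLM_API_BASE would point at http://host.docker.internal:11434 to use a host-side Ollama.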

Summary

Endpoint               Description
GET /health            Server liveness check
POST /generate         Ask a question to the RAG pipeline
POST /ingest           Index a folder, file paths, or GitHub repo
POST /ingest/upload    Upload files directly (multipart/form-data)
GET /collections       List available ChromaDB collections
GET /config            Get the current LLM configuration
POST /config           Switch the LLM at runtime (no restart needed)