# REST API — raglight serve

`raglight serve` starts a FastAPI server that exposes your RAG pipeline as a REST API. The entire configuration is driven by environment variables — no Python code required.

The interactive API documentation (Swagger UI) is automatically available at `http://localhost:8000/docs` once the server is running.

## Quick start

```bash
pip install raglight
raglight serve
```

The server starts on `http://0.0.0.0:8000` by default and prints the active configuration on startup.

## CLI options

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `0.0.0.0` | Host to bind |
| `--port` | `8000` | Port to listen on |
| `--reload` | `false` | Enable auto-reload (development) |
| `--workers` | `1` | Number of Uvicorn worker processes |

```bash
raglight serve --host 127.0.0.1 --port 8080 --workers 4
```

## Configuration

All settings are read from `RAGLIGHT_*` environment variables. Copy the example file and adjust the values:

```bash
cp examples/serve_example/.env.example .env
raglight serve
```
| Variable | Default | Description |
| --- | --- | --- |
| `RAGLIGHT_LLM_MODEL` | `llama3` | LLM model name |
| `RAGLIGHT_LLM_PROVIDER` | `Ollama` | LLM provider |
| `RAGLIGHT_LLM_API_BASE` | `http://localhost:11434` | LLM API base URL |
| `RAGLIGHT_EMBEDDINGS_MODEL` | `all-MiniLM-L6-v2` | Embeddings model name |
| `RAGLIGHT_EMBEDDINGS_PROVIDER` | `HuggingFace` | Embeddings provider |
| `RAGLIGHT_EMBEDDINGS_API_BASE` | `http://localhost:11434` | Embeddings API base URL |
| `RAGLIGHT_PERSIST_DIR` | `./raglight_db` | Local ChromaDB persistence directory |
| `RAGLIGHT_COLLECTION` | `default` | ChromaDB collection name |
| `RAGLIGHT_K` | `5` | Number of documents retrieved per query |
| `RAGLIGHT_SYSTEM_PROMPT` | (default prompt) | Custom system prompt for the LLM |
| `RAGLIGHT_CHROMA_HOST` | (unset) | Remote Chroma host (leave unset for local) |
| `RAGLIGHT_CHROMA_PORT` | (unset) | Remote Chroma port |
Valid provider values match the `Settings` constants: `Ollama`, `Mistral`, `OpenAI`, `LmStudio`, `GoogleGemini` for LLMs — and `HuggingFace`, `Ollama`, `OpenAI`, `GoogleGemini` for embeddings.

## Example: Mistral + remote Chroma

`.env`:

```bash
RAGLIGHT_LLM_PROVIDER=Mistral
RAGLIGHT_LLM_MODEL=mistral-small-latest
RAGLIGHT_EMBEDDINGS_PROVIDER=HuggingFace
RAGLIGHT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
RAGLIGHT_CHROMA_HOST=chromadb
RAGLIGHT_CHROMA_PORT=8000
RAGLIGHT_COLLECTION=production
```

## Endpoints

### GET /health

Returns the server status.
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
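When scripting against the API, it can help to wait for the server to come up before sending requests. A minimal stdlib-only sketch that polls `/health` — the `wait_until_ready` helper is a name chosen here, not part of raglight:

```python
import json
import time
import urllib.error
import urllib.request

def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout: float = 30.0) -> bool:
    """Poll GET /health until the server answers {"status": "ok"}.

    Returns True once healthy, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if json.loads(resp.read()).get("status") == "ok":
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.5)
    return False
```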

### POST /generate

Ask a question to the RAG pipeline.

Request body:

```json
{ "question": "What is RAGLight?" }
```

Response:

```json
{ "answer": "RAGLight is a lightweight Python framework for RAG..." }
```

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAGLight?"}'
```
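The same call can be made from Python with only the standard library. This is a sketch against the request/response shapes shown above; the `ask` and `generate_payload` names are illustrative, not part of raglight:

```python
import json
import urllib.request

def generate_payload(question: str) -> bytes:
    """Encode the /generate request body as UTF-8 JSON bytes."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """POST the question to /generate and return the "answer" field."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=generate_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["answer"]
```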

### POST /ingest

Index documents into the vector store. Supports three sources — combinable in a single call.

Request body:

```json
{
  "data_path": "./my_documents",
  "file_paths": ["/absolute/path/to/file.pdf"],
  "github_url": "https://github.com/Bessouat40/RAGLight",
  "github_branch": "main"
}
```

All fields are optional, but at least one must be provided.

```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"data_path": "./docs"}'
```
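A small helper can assemble the request body from optional arguments and enforce the at-least-one-source rule on the client side. Sketch only: `ingest_body` is not a raglight API, and it assumes `github_branch` on its own does not count as a source:

```python
def ingest_body(data_path=None, file_paths=None,
                github_url=None, github_branch=None) -> dict:
    """Build a /ingest request body, keeping only the fields provided."""
    body = {
        "data_path": data_path,
        "file_paths": file_paths,
        "github_url": github_url,
        "github_branch": github_branch,
    }
    body = {k: v for k, v in body.items() if v is not None}
    # At least one actual source must be present.
    if not {"data_path", "file_paths", "github_url"} & body.keys():
        raise ValueError(
            "provide at least one of data_path, file_paths, or github_url"
        )
    return body
```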

### POST /ingest/upload

Upload files directly from the client via `multipart/form-data`. Use this when the server and the files are on different machines. The field name must be `files`; multiple files can be sent in a single request.

```bash
curl -X POST http://localhost:8000/ingest/upload \
  -F "files=@./report.pdf" \
  -F "files=@./notes.txt"
```

Uploaded files are written to a temporary directory, processed, indexed into ChromaDB, then deleted. Nothing is stored permanently on disk besides the embeddings.

Response:

```json
{ "message": "Ingested 2 file(s): report.pdf, notes.txt" }
```
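To send the same request from Python without third-party libraries, the multipart body can be assembled by hand. A sketch of the wire format curl produces for `-F "files=@..."` — the `encode_files` helper is illustrative, not part of raglight:

```python
import io
import uuid

def encode_files(files: list) -> tuple:
    """Encode (filename, content-bytes) pairs as a multipart/form-data body.

    Every part uses the field name "files", which is what /ingest/upload
    expects. Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for filename, content in files:
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="files"; '
            f'filename="{filename}"\r\n'.encode()
        )
        buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
        buf.write(content)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

POST the returned body to `/ingest/upload` with a `Content-Type` header set to the second return value (for example via `urllib.request.Request`).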

### GET /collections

List the available ChromaDB collections.
```bash
curl http://localhost:8000/collections
# {"collections": ["default", "project_x"]}
```

## Deploy with Docker Compose

The fastest path to a production deployment:

1. **Copy the example env file**

   ```bash
   cd examples/serve_example
   cp .env.example .env
   ```

   Edit `.env` and set your LLM provider, model, and any API keys.

2. **Start the stack**

   ```bash
   docker-compose up
   ```

   The API is available at `http://localhost:8000`.

3. **Ingest your documents**

   ```bash
   curl -X POST http://localhost:8000/ingest/upload \
     -F "files=@./my_document.pdf"
   ```

4. **Query the API**

   ```bash
   curl -X POST http://localhost:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"question": "Summarize the document"}'
   ```
The `docker-compose.yml` includes `extra_hosts: host.docker.internal:host-gateway` so the container can reach an Ollama instance running on the host machine.

## Summary

| Endpoint | Description |
| --- | --- |
| `GET /health` | Server liveness check |
| `POST /generate` | Ask a question to the RAG pipeline |
| `POST /ingest` | Index a folder, file paths, or GitHub repo |
| `POST /ingest/upload` | Upload files directly (`multipart/form-data`) |
| `GET /collections` | List available ChromaDB collections |