Configure and switch between different LLM backends in RAGLight.
An LLM provider defines how text generation is performed in RAGLight. RAGLight is LLM-agnostic: the generation model is decoupled from retrieval, embeddings, and vector stores. This makes it easy to experiment with different providers without rewriting your pipeline.
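For instance, swapping Gemini for Mistral only means building a different RAGConfig; the knowledge base, embeddings, and vector store are untouched. A minimal sketch of the swap, assuming a Settings.MISTRAL provider constant and that llm accepts the provider's model name as a plain string (Settings.GOOGLE_GEMINI and Settings.GEMINI_LLM_MODEL are the constants used in the full example further down this page):

from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.models.data_source_model import GitHubSource

knowledge_base = [GitHubSource(url="https://github.com/Bessouat40/RAGLight")]

# Two generation backends, one retrieval setup: only provider and llm change.
gemini_config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
)

mistral_config = RAGConfig(
    provider=Settings.MISTRAL,      # assumed constant; check the Settings class in your version
    llm="mistral-small-latest",     # assumed model identifier
    knowledge_base=knowledge_base,
)

Either config can then be passed to the same RAGPipeline, as the full example below shows.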
Local Providers
Recommended for prototyping. Run entirely offline with Ollama, LMStudio,
or vLLM. Zero cost, total privacy.
Remote APIs
Recommended for production. Connect to OpenAI, Mistral, or Gemini for
higher reasoning capabilities.
In a RAG pipeline, the LLM (Generation) and Embeddings (Retrieval) are configured separately. This allows you to mix and match (e.g., Local Embeddings + Remote LLM); a sketch of that combination follows the example below. Here is a full example using Google Gemini for both.
main.py
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import GitHubSource
from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig

Settings.setup_logging()

# 1. Define Source (e.g., GitHub Repo)
knowledge_base = [
    GitHubSource(url="https://github.com/Bessouat40/RAGLight")
]

# 2. Configure Vector Store (Embeddings)
# We use Gemini for embeddings here, but could use Settings.HUGGINGFACE
vector_store_config = VectorStoreConfig(
    provider=Settings.GOOGLE_GEMINI,
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="gemini_collection",
)

# 3. Configure RAG (Generation)
config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
    api_base=Settings.DEFAULT_GOOGLE_CLIENT  # Optional if env var is set
)

# 4. Build & Run
pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate(
    "How can I create a RAGPipeline using python? Show me code."
)
print(response)
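To mix and match as described above, e.g. local embeddings with the remote Gemini LLM, only the VectorStoreConfig changes. A minimal sketch, assuming Settings.HUGGINGFACE (mentioned in the comment above) accepts a Sentence-Transformers model name; the "all-MiniLM-L6-v2" model and the collection name are placeholders:

from raglight.config.settings import Settings
from raglight.config.vector_store_config import VectorStoreConfig

# Local embeddings for retrieval, combined with the remote Gemini RAGConfig above.
vector_store_config = VectorStoreConfig(
    provider=Settings.HUGGINGFACE,
    embedding_model="all-MiniLM-L6-v2",   # placeholder local embedding model
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="local_embeddings_collection",
)

# Pass it together with the Gemini RAGConfig from main.py:
# pipeline = RAGPipeline(config, vector_store_config)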
Ollama
1. Install Ollama from ollama.com.
2. Run ollama serve.
3. Pull a model: ollama pull llama3.
4. Default URL: http://localhost:11434 (handled by RAGLight).
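With the server running, the generation side is a RAGConfig pointing at the pulled model; no api_base is needed since RAGLight targets the default URL. A minimal sketch, assuming a Settings.OLLAMA provider constant and that llm takes the Ollama model name:

from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.models.data_source_model import GitHubSource

knowledge_base = [GitHubSource(url="https://github.com/Bessouat40/RAGLight")]

# Settings.OLLAMA is an assumption; api_base is omitted because RAGLight
# handles the default http://localhost:11434 endpoint (step 4 above).
config = RAGConfig(
    provider=Settings.OLLAMA,
    llm="llama3",                   # the model pulled in step 3
    knowledge_base=knowledge_base,
)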
LMStudio
1. Open LMStudio.
2. Go to the Local Server tab.
3. Load a model.
4. Click Start Server.
5. Ensure Settings.DEFAULT_LMSTUDIO_CLIENT matches the URL (usually http://localhost:1234/v1).
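Once the server is started, point generation at it via api_base. A minimal sketch, assuming a Settings.LMSTUDIO provider constant; the model identifier is a placeholder for whatever you loaded in LMStudio:

from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.models.data_source_model import GitHubSource

knowledge_base = [GitHubSource(url="https://github.com/Bessouat40/RAGLight")]

# Settings.LMSTUDIO and the model identifier are assumptions;
# Settings.DEFAULT_LMSTUDIO_CLIENT is the URL from step 5 above.
config = RAGConfig(
    provider=Settings.LMSTUDIO,
    llm="loaded-model-identifier",          # placeholder for the model loaded in LMStudio
    knowledge_base=knowledge_base,
    api_base=Settings.DEFAULT_LMSTUDIO_CLIENT,
)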
vLLM
1. Start vLLM with an OpenAI-compatible server.
2. Set api_base in your config to your vLLM endpoint.
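A minimal sketch of the resulting config, assuming a Settings.VLLM provider constant; the endpoint URL and model name are placeholders for whatever your OpenAI-compatible vLLM server exposes:

from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.models.data_source_model import GitHubSource

knowledge_base = [GitHubSource(url="https://github.com/Bessouat40/RAGLight")]

# Settings.VLLM is an assumption; endpoint and model name are placeholders.
config = RAGConfig(
    provider=Settings.VLLM,
    llm="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model served by vLLM
    knowledge_base=knowledge_base,
    api_base="http://localhost:8000/v1",      # placeholder vLLM endpoint
)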