An LLM provider defines how text generation is performed in RAGLight. RAGLight is LLM-agnostic: the generation model is decoupled from retrieval, embeddings, and vector stores. This makes it easy to experiment with different providers without rewriting your pipeline.

Local Providers

Recommended for prototyping. Run entirely offline with Ollama, LMStudio, or vLLM. Zero cost, total privacy.

Remote APIs

Recommended for production. Connect to OpenAI, Mistral, or Gemini for higher reasoning capabilities.

Configuration

Providers are configured using constants from the Settings class. This ensures type safety and prevents typo-related errors.
Case Sensitivity: Use the exact constants defined in Settings. For example, use Settings.LMSTUDIO, not Settings.LMStudio.
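
For example (a minimal sketch, assuming Settings exposes these names as plain class attributes, so a misspelled constant fails immediately with an AttributeError):

from raglight.config.settings import Settings

provider = Settings.LMSTUDIO    # correct: exact constant name
# provider = Settings.LMStudio  # wrong casing -> raises AttributeError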

Two ways to use LLMs

You can use a provider in two modes:
  1. Directly (via Builder) for testing prompts or models without retrieval.
  2. In a Pipeline (via RAGPipeline) for the full RAG experience.

1. Direct Usage (The Builder)

Use the Builder pattern when you want a simple chat loop to validate a model or a system prompt.
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

# 1. Setup
Settings.setup_logging()
model = "llama3"

# 2. Build LLM (No retrieval)

llm = (
    Builder()
    .with_llm(
        provider=Settings.OLLAMA,
        model_name=model,
        system_prompt=Settings.DEFAULT_SYSTEM_PROMPT
    )
    .build_llm()
)

# 3. Generate a response

print(llm.generate({"question": "Explain quantum computing in one sentence."}))
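
To turn this into an interactive chat loop, wrap the same generate() call; a minimal sketch, assuming the same {"question": ...} input format:

# Optional: interactive loop around the same generate() call
while True:
    question = input("You: ")
    if question.strip().lower() in {"exit", "quit"}:
        break
    print("LLM:", llm.generate({"question": question}))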


2. RAG Pipeline Usage

In a RAG pipeline, the LLM (Generation) and Embeddings (Retrieval) are configured separately. This allows you to mix and match (e.g., Local Embeddings + Remote LLM). Here is a full example using Google Gemini for both.
main.py
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import GitHubSource
from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig

Settings.setup_logging()

# 1. Define Source (e.g., GitHub Repo)
knowledge_base = [
    GitHubSource(url="https://github.com/Bessouat40/RAGLight")
]

# 2. Configure Vector Store (Embeddings)
# We use Gemini for embeddings here, but could use Settings.HUGGINGFACE
vector_store_config = VectorStoreConfig(
    provider=Settings.GOOGLE_GEMINI,
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="gemini_collection",
)

# 3. Configure RAG (Generation)
config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
    api_base=Settings.DEFAULT_GOOGLE_CLIENT # Optional if env var is set
)

# 4. Build & Run
pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate(
    "How can I create a RAGPipeline using python? Show me code."
)
print(response)
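
Because retrieval and generation are decoupled, you can also mix and match, for example local embeddings with a remote LLM. A minimal sketch reusing the same classes; the Hugging Face embedding model name is illustrative only, check which embedding models your RAGLight version supports:

# Local Hugging Face embeddings + remote Gemini LLM (sketch)
local_embeddings_config = VectorStoreConfig(
    provider=Settings.HUGGINGFACE,        # local embeddings
    embedding_model="all-MiniLM-L6-v2",   # assumption: illustrative model name
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="mixed_collection",
)

remote_llm_config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,      # remote generation (needs GEMINI_API_KEY)
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
)

pipeline = RAGPipeline(remote_llm_config, local_embeddings_config)
pipeline.build()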

Provider Setup Checklist

Before running the code, ensure your environment is ready.

Local Providers

Ollama
  1. Install Ollama from ollama.com.
  2. Run ollama serve.
  3. Pull a model: ollama pull llama3.
  4. Default URL: http://localhost:11434 (handled by RAGLight).

LMStudio
  1. Open LMStudio.
  2. Go to the Local Server tab.
  3. Load a model.
  4. Click Start Server.
  5. Ensure Settings.DEFAULT_LMSTUDIO_CLIENT matches the URL (usually http://localhost:1234/v1).

vLLM
  1. Start vLLM with an OpenAI-compatible server.
  2. Set api_base in your config to your vLLM endpoint.
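
For vLLM, the key setting is api_base (the same parameter shown in RAGConfig above). A sketch, assuming your RAGLight version exposes a vLLM provider constant; the constant name and model name below are assumptions:

# Point RAGConfig at a vLLM OpenAI-compatible endpoint (sketch)
vllm_config = RAGConfig(
    provider=Settings.VLLM,                      # assumption: check the exact constant in Settings
    llm="meta-llama/Meta-Llama-3-8B-Instruct",   # assumption: whichever model vLLM is serving
    knowledge_base=knowledge_base,
    api_base="http://localhost:8000/v1",         # default vLLM OpenAI-compatible server URL
)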

Remote Providers

Ensure these environment variables are set in your .env file:
  * OpenAI: OPENAI_API_KEY
  * Mistral: MISTRAL_API_KEY
  * Google: GEMINI_API_KEY
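
For example, a minimal .env file (key names as listed above; values are placeholders):

OPENAI_API_KEY=sk-...
MISTRAL_API_KEY=...
GEMINI_API_KEY=...

Only the key for the provider you actually use is required.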