An LLM provider defines how text generation is performed in RAGLight. RAGLight is LLM-agnostic: the generation model is decoupled from retrieval, embeddings, and vector stores, so you can experiment with different providers without rewriting your pipeline.

All providers support conversation history: user and assistant messages from previous turns are automatically injected into each request, enabling genuine multi-turn conversations regardless of the backend.

All providers also support Langfuse observability: both generate() and generate_streaming() propagate tracing callbacks automatically. No extra configuration is needed beyond setting LangfuseConfig on your pipeline.
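Conceptually, history injection amounts to replaying prior turns into every request. The sketch below is illustrative only — `HistoryChat` and `echo_backend` are hypothetical stand-ins, not RAGLight internals — but it shows the pattern all providers follow:

```python
# Illustrative sketch of multi-turn history injection.
# HistoryChat is a hypothetical stand-in, NOT a RAGLight class.

class HistoryChat:
    def __init__(self, system_prompt: str):
        # Every request starts from the same system prompt.
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, question: str, backend) -> str:
        # Prior user/assistant turns are already in self.messages,
        # so the backend sees the full conversation on each call.
        self.messages.append({"role": "user", "content": question})
        answer = backend(self.messages)
        self.messages.append({"role": "assistant", "content": answer})
        return answer

# A fake backend that reports how many messages it received.
def echo_backend(messages):
    return f"seen {len(messages)} messages"

chat = HistoryChat("You are helpful.")
print(chat.ask("Hi", echo_backend))        # system + first user turn
print(chat.ask("And now?", echo_backend))  # now includes turn one as well
```

Each call sees a longer message list than the last, which is what makes multi-turn conversation work regardless of backend.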

Local Providers

Recommended for prototyping. Run entirely offline with Ollama, LMStudio, or vLLM. Zero cost, total privacy.

Remote APIs

Recommended for production. Connect to OpenAI, Mistral, or Gemini for stronger reasoning capabilities.

AWS Bedrock

Managed cloud inference. Use Claude, Titan, Llama and other Bedrock models with your existing AWS credentials.

Configuration

Providers are configured using constants from the Settings class. This ensures type safety and prevents typo-related errors.
Case Sensitivity: Use the exact constants defined in Settings. For example, use Settings.LMSTUDIO, not Settings.LMStudio.
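The practical benefit is that a mistyped constant fails loudly, whereas a mistyped string would only fail later (if at all). An illustrative comparison, using a stand-in class rather than RAGLight's real Settings:

```python
# Stand-in for a constants class like Settings (illustrative only;
# the constant values shown here are assumptions, not RAGLight's).
class FakeSettings:
    OLLAMA = "ollama"
    LMSTUDIO = "lmstudio"

# Correct: the constant resolves to the canonical provider string.
provider = FakeSettings.LMSTUDIO
print(provider)

# A typo in the attribute name raises AttributeError immediately,
# instead of silently passing a bad string into the pipeline.
try:
    provider = FakeSettings.LMStudio
except AttributeError as err:
    print("caught:", err)
```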

Two ways to use LLMs

You can use a provider in two modes:
  1. Directly (via Builder) for testing prompts or models without retrieval.
  2. In a Pipeline (via RAGPipeline) for the full RAG experience.

1. Direct Usage (The Builder)

Use the Builder pattern when you want a simple chat loop to validate a model or a system prompt.
from raglight.rag.builder import Builder
from raglight.config.settings import Settings

# 1. Setup
Settings.setup_logging()
model = "llama3"

# 2. Build LLM (No retrieval)

llm = (
    Builder()
    .with_llm(
        provider=Settings.OLLAMA,
        model_name=model,
        system_prompt=Settings.DEFAULT_SYSTEM_PROMPT,
    )
    .build_llm()
)

# 3. Chat Loop
while True:
    question = input("You: ")
    if question.strip().lower() in {"exit", "quit"}:
        break
    print(llm.generate({"question": question}))


2. RAG Pipeline Usage

In a RAG pipeline, the LLM (Generation) and Embeddings (Retrieval) are configured separately. This allows you to mix and match (e.g., Local Embeddings + Remote LLM). Here is a full example using Google Gemini for both.
main.py
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import GitHubSource
from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig

Settings.setup_logging()

# 1. Define Source (e.g., GitHub Repo)
knowledge_base = [
    GitHubSource(url="https://github.com/Bessouat40/RAGLight")
]

# 2. Configure Vector Store (Embeddings)
# We use Gemini for embeddings here, but could use Settings.HUGGINGFACE
vector_store_config = VectorStoreConfig(
    provider=Settings.GOOGLE_GEMINI,
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="gemini_collection",
)

# 3. Configure RAG (Generation)
config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
    api_base=Settings.DEFAULT_GOOGLE_CLIENT # Optional if env var is set
)

# 4. Build & Run
pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate(
    "How can I create a RAGPipeline using python? Show me code."
)
print(response)
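The pipeline also exposes generate_streaming() (mentioned above for Langfuse tracing). Streaming APIs are typically consumed as an iterator of chunks; the pattern below uses a stand-in generator rather than the real pipeline, and the chunk format is an assumption, not RAGLight's actual one:

```python
# Illustrative streaming-consumption pattern. fake_stream stands in for
# what a call like pipeline.generate_streaming(query) might yield.

def fake_stream():
    for chunk in ["RAG", "Light ", "streams ", "tokens."]:
        yield chunk

pieces = []
for chunk in fake_stream():
    pieces.append(chunk)
    print(chunk, end="", flush=True)  # render tokens as they arrive
print()

# Reassemble the full answer once the stream is exhausted.
full_response = "".join(pieces)
```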

Provider Setup Checklist

Before running the code, ensure your environment is ready.

Local Providers

Ollama
  1. Install Ollama from ollama.com.
  2. Run ollama serve.
  3. Pull a model: ollama pull llama3.
  4. Default URL: http://localhost:11434 (handled by RAGLight).

LMStudio
  1. Open LMStudio.
  2. Go to the Local Server tab.
  3. Load a model.
  4. Click Start Server.
  5. Ensure Settings.DEFAULT_LMSTUDIO_CLIENT matches the URL (usually http://localhost:1234/v1).

vLLM
  1. Start vLLM with an OpenAI-compatible server.
  2. Set api_base in your config to your vLLM endpoint.

Remote Providers

Ensure these environment variables are set in your .env file:
  • OpenAI: OPENAI_API_KEY
  • Mistral: MISTRAL_API_KEY
  • Google: GEMINI_API_KEY
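A matching .env file would look like this (the values are placeholders — substitute your own keys):

```
OPENAI_API_KEY=your-openai-key
MISTRAL_API_KEY=your-mistral-key
GEMINI_API_KEY=your-gemini-key
```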
AWS Bedrock

Authentication uses the standard boto3 credential chain; no extra install is needed:
  1. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
  2. Credentials file: ~/.aws/credentials
  3. IAM role: automatic when running on EC2, ECS, or Lambda
See the AWS Bedrock page for a full example.