> ## Documentation Index
> Fetch the complete documentation index at: https://docs.raglight.com/llms.txt
> Use this file to discover all available pages before exploring further.

# LLM Providers

> Configure and switch between different LLM backends in RAGLight.

An **LLM provider** defines how text generation is performed in RAGLight.

RAGLight is **LLM-agnostic**: the generation model is decoupled from retrieval, embeddings, and vector stores. This makes it easy to experiment with different providers without rewriting your pipeline.

All providers support **conversation history**: user and assistant messages from previous turns are automatically injected into each request, enabling genuine multi-turn conversations regardless of the backend.

All providers also support **Langfuse observability** — both `generate()` and `generate_streaming()` propagate tracing callbacks automatically. No extra configuration is needed beyond setting `LangfuseConfig` on your pipeline.

<CardGroup cols={3}>
  <Card title="Local Providers" icon="laptop" color="#16a34a">
    **Recommended for prototyping.** Run entirely offline with Ollama, LMStudio,
    or vLLM. Zero cost, total privacy.
  </Card>

  <Card title="Remote APIs" icon="cloud" color="#2563eb">
    **Recommended for production.** Connect to OpenAI, Mistral, or Gemini for
    higher reasoning capabilities.
  </Card>

  <Card title="AWS Bedrock" icon="aws" color="#FF9900">
    **Managed cloud inference.** Use Claude, Titan, Llama and other Bedrock
    models with your existing AWS credentials.
  </Card>
</CardGroup>

***

## Configuration

Providers are configured using constants from the `Settings` class. This ensures type safety and prevents typo-related errors.

<Warning>
  **Case Sensitivity**: Use the exact constants defined in `Settings`. For
  example, use `Settings.LMSTUDIO`, not `Settings.LMStudio`.
</Warning>

### Two ways to use LLMs

You can use a provider in two modes:

1. **Directly** (via `Builder`) for testing prompts or models without retrieval.
2. **In a Pipeline** (via `RAGPipeline`) for the full RAG experience.

***

## 1. Direct Usage (The Builder)

Use the `Builder` pattern when you want a simple chat loop to validate a model or a system prompt.

<CodeGroup>
  ```python Ollama (Local) theme={null}
  from raglight.rag.builder import Builder
  from raglight.config.settings import Settings

  # 1. Setup
  Settings.setup_logging()
  model = "llama3"

  # 2. Build LLM (No retrieval)

  llm = (
  Builder()
  .with_llm(
  provider=Settings.OLLAMA,
  model_name=model,
  system_prompt=Settings.DEFAULT_SYSTEM_PROMPT
  )
  .build_llm()
  )

  # 3. Chat Loop

  print(llm.generate({"question": "Explain quantum computing in one sentence."}))

  ```

  ```python LMStudio (Local) theme={null}
  from raglight.rag.builder import Builder
  from raglight.config.settings import Settings

  Settings.setup_logging()

  llm = (
      Builder()
      .with_llm(
          provider=Settings.LMSTUDIO,
          model_name="hermes-2-pro", # Match your loaded model
          system_prompt=Settings.DEFAULT_SYSTEM_PROMPT,
          # Ensure this matches your LMStudio Local Server config
          api_base=Settings.DEFAULT_LMSTUDIO_CLIENT,
      )
      .build_llm()
  )

  print(llm.generate({"question": "Hello from RAGLight via LMStudio!"}))
  ```

  ```python Mistral (API) theme={null}
  from raglight.rag.builder import Builder
  from raglight.config.settings import Settings
  from dotenv import load_dotenv

  load_dotenv() # Load MISTRAL_API_KEY
  Settings.setup_logging()

  llm = (
      Builder()
      .with_llm(
          provider=Settings.MISTRAL,
          model_name="mistral-large-latest",
          api_key=Settings.MISTRAL_API_KEY, # Or set via env var
          system_prompt=Settings.DEFAULT_SYSTEM_PROMPT
      )
      .build_llm()
  )

  print(llm.generate({"question": "Summarize RAG architecture."}))
  ```
</CodeGroup>

***

## 2. RAG Pipeline Usage

In a RAG pipeline, the **LLM** (Generation) and **Embeddings** (Retrieval) are configured separately. This allows you to mix and match (e.g., Local Embeddings + Remote LLM).

Here is a full example using **Google Gemini** for both.

```python main.py theme={null}
from raglight.rag.simple_rag_api import RAGPipeline
from raglight.models.data_source_model import GitHubSource
from raglight.config.settings import Settings
from raglight.config.rag_config import RAGConfig
from raglight.config.vector_store_config import VectorStoreConfig

Settings.setup_logging()

# 1. Define Source (e.g., GitHub Repo)
knowledge_base = [
    GitHubSource(url="https://github.com/Bessouat40/RAGLight")
]

# 2. Configure Vector Store (Embeddings)
# We use Gemini for embeddings here, but could use Settings.HUGGINGFACE
vector_store_config = VectorStoreConfig(
    provider=Settings.GOOGLE_GEMINI,
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="gemini_collection",
)

# 3. Configure RAG (Generation)
config = RAGConfig(
    provider=Settings.GOOGLE_GEMINI,
    llm=Settings.GEMINI_LLM_MODEL,
    knowledge_base=knowledge_base,
    api_base=Settings.DEFAULT_GOOGLE_CLIENT # Optional if env var is set
)

# 4. Build & Run
pipeline = RAGPipeline(config, vector_store_config)
pipeline.build()

response = pipeline.generate(
    "How can I create a RAGPipeline using python? Show me code."
)
print(response)
```

***

## Provider Setup Checklist

Before running the code, ensure your environment is ready.

### Local Providers

<AccordionGroup>
  <Accordion title="Ollama" icon="terminal">
    1. Install Ollama from [ollama.com](https://ollama.com). 2. Run `ollama
           serve`. 3. Pull a model: `ollama pull llama3`. 4. Default URL:
       `http://localhost:11434` (handled by RAGLight).
  </Accordion>

  <Accordion title="LMStudio" icon="desktop">
    1. Open LMStudio. 2. Go to the **Local Server** tab. 3. Load a model. 4.
       Click **Start Server**. 5. Ensure `Settings.DEFAULT_LMSTUDIO_CLIENT` matches
       the URL (usually `http://localhost:1234/v1`).
  </Accordion>

  <Accordion title="vLLM" icon="bolt">
    1. Start vLLM with an OpenAI-compatible server. 2. Set `api_base` in your
       config to your vLLM endpoint.
  </Accordion>
</AccordionGroup>

### Remote Providers

<AccordionGroup>
  <Accordion title="API Keys" icon="key">
    Ensure these environment variables are set in your `.env` file:

    * **OpenAI**: `OPENAI_API_KEY`
    * **Mistral**: `MISTRAL_API_KEY`
    * **Google**: `GEMINI_API_KEY`
  </Accordion>

  <Accordion title="AWS Bedrock" icon="aws">
    Authentication uses the standard boto3 credential chain — no extra install needed:

    1. **Environment variables**: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`
    2. **Credentials file**: `~/.aws/credentials`
    3. **IAM role**: automatic when running on EC2, ECS, or Lambda

    See the [AWS Bedrock page](/documentation/bedrock) for a full example.
  </Accordion>
</AccordionGroup>
