Embeddings

Overview

Embeddings convert text (and sometimes other modalities) into vectors so your vector store can perform similarity search. In RAGLight, embeddings are a first-class configuration choice:
  • They are independent from the LLM used for generation
  • They can be local or hosted
  • They can be swapped without changing your pipeline logic
This separation is intentional: you may want fast local embeddings for indexing while using a strong hosted model for generation (or the opposite).
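To make this concrete, here is a small standalone sketch of what "text in, vectors out, similarity search" looks like. It uses the sentence-transformers library directly (outside RAGLight), and the query and chunks are made-up examples:
from sentence_transformers import SentenceTransformer, util

# Encode a query and two candidate chunks with a small local embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode("How do I reset my password?")
chunk_vecs = model.encode([
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our offices are closed on public holidays.",
])

# Cosine similarity is what the vector store computes to decide which chunks to retrieve.
print(util.cos_sim(query_vec, chunk_vecs))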

Why it matters

Your embedding model strongly impacts:
  • Retrieval quality (what gets retrieved)
  • Latency and indexing speed
  • Memory footprint and storage size
  • Multilingual performance
A great LLM cannot compensate for weak retrieval. If the wrong chunks are retrieved, the answer will be wrong — even with a strong generation model.

How embeddings are configured in RAGLight

Embeddings are configured in the vector store configuration, not in the RAG config. In other words:
  • RAGConfig controls generation (LLM provider, model, prompts)
  • VectorStoreConfig controls indexing + retrieval (embeddings provider, model, storage)
Minimal example:
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
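For orientation, generation is configured in a separate RAGConfig, and the two configs are handed to the pipeline together. The import paths, field names, and constructor below are assumptions for illustration; check the pipeline documentation for the exact API:
# Hedged sketch: the RAGConfig fields and RAGPipeline constructor shown here
# are assumptions; the point is only the split in responsibilities.
from raglight.config.rag_config import RAGConfig
from raglight.rag.simple_rag_api import RAGPipeline

rag_config = RAGConfig(
    llm="llama3",              # generation model (assumed field name), independent of the embeddings above
    provider=Settings.OLLAMA,  # generation provider
)

pipeline = RAGPipeline(rag_config, vector_store_config)
pipeline.build()  # indexes documents using the embedding model from VectorStoreConfig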

Available embedding providers

RAGLight supports multiple embedding providers:
  • HuggingFace
  • Ollama
  • vLLM
  • OpenAI
  • Google Gemini
Below are concrete examples of how to configure each one.

HuggingFace (local, default)

HuggingFace embeddings are a great default for local-first RAG.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="all-MiniLM-L6-v2",
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
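The model is typically downloaded from the Hugging Face Hub on first use. If you want to sanity-check it outside RAGLight (for example to see the embedding dimension, which drives storage size), a quick standalone check looks like this:
from sentence_transformers import SentenceTransformer

# Standalone check, not required by RAGLight: all-MiniLM-L6-v2 produces
# 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.get_sentence_embedding_dimension())  # 384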

Google Gemini (hosted)

Gemini can be used for embeddings via Settings.GOOGLE_GEMINI.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    provider=Settings.GOOGLE_GEMINI,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
Make sure the API key is available:
export GEMINI_API_KEY=your_key

OpenAI (hosted)

Use OpenAI embeddings by selecting the OpenAI provider and a compatible embedding model.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="text-embedding-3-small",
    provider=Settings.OPENAI,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base=Settings.DEFAULT_OPENAI_CLIENT,
)
Make sure the API key is available:
export OPENAI_API_KEY=your_key

Ollama (local)

If your local Ollama instance has an embedding model available, you can use Ollama as your embedding provider.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="nomic-embed-text",
    provider=Settings.OLLAMA,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
)
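Make sure the embedding model has been pulled into your local Ollama instance first:
ollama pull nomic-embed-text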

vLLM (server)

If you expose an embeddings endpoint through vLLM (typically OpenAI-compatible), you can point RAGLight to your server.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="your-embedding-model",
    provider=Settings.VLLM,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base="http://localhost:8000",
)
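Before wiring this in, you can sanity-check that the server exposes an OpenAI-compatible embeddings route. The snippet below is a standalone check, not part of RAGLight, and the model name is a placeholder:
import requests

# Ask the server for a single embedding via the OpenAI-compatible
# /v1/embeddings route and print the vector dimension.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"model": "your-embedding-model", "input": "hello world"},
)
resp.raise_for_status()
print(len(resp.json()["data"][0]["embedding"]))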

Recommended defaults

If you are starting out, these defaults usually work well:
  • Embeddings: all-MiniLM-L6-v2 (fast and compact)
  • Vector store: Chroma (local persistence)
  • LLM: Ollama (local)
You can then switch providers as your prototype evolves.

Common pitfalls

Embeddings and LLM are different models

It’s common to assume that the generation LLM also embeds documents. In RAGLight, embeddings are configured separately, in VectorStoreConfig.

Changing embeddings requires re-indexing

If you change embedding_model or provider, you must rebuild your vector store:
pipeline.build()
Otherwise, the stored vectors won’t match the new embedding space.
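One simple way to avoid mixing embedding spaces is to give each embedding model its own collection (or persist directory). This is an illustrative convention, not something RAGLight enforces, and the collection name below is hypothetical:
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="nomic-embed-text",
    provider=Settings.OLLAMA,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="docs_nomic_embed_text",  # hypothetical: encodes the embedding model
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
)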

Summary

  • Embeddings drive retrieval quality.
  • They are configured in VectorStoreConfig.
  • Providers are swappable and independent from the LLM.
  • Changing embeddings requires rebuilding the index.