Embeddings

Overview

Embeddings convert text (and sometimes other modalities) into vectors so your vector store can perform similarity search. In RAGLight, embeddings are a first-class configuration choice:
  • They are independent from the LLM used for generation
  • They can be local or hosted
  • They can be swapped without changing your pipeline logic
This separation is intentional: you may want fast local embeddings for indexing while using a strong hosted model for generation (or the opposite).
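To make this concrete, here is a small standalone sketch of what "text in, vectors out, similarity search" looks like. It uses the sentence-transformers library directly (outside RAGLight), and the query and chunks are made-up examples:
from sentence_transformers import SentenceTransformer, util

# Encode a query and two candidate chunks with a small local embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode("How do I reset my password?")
chunk_vecs = model.encode([
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our offices are closed on public holidays.",
])

# Cosine similarity is what the vector store computes to decide which chunks to retrieve.
print(util.cos_sim(query_vec, chunk_vecs))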

Why it matters

Your embedding model strongly impacts:
  • Retrieval quality (what gets retrieved)
  • Latency and indexing speed
  • Memory footprint and storage size
  • Multilingual performance
A great LLM cannot compensate for weak retrieval. If the wrong chunks are retrieved, the answer will be wrong — even with a strong generation model.

How embeddings are configured in RAGLight

Embeddings are configured in the vector store configuration, not in the RAG config. In other words:
  • RAGConfig controls generation (LLM provider, model, prompts)
  • VectorStoreConfig controls indexing + retrieval (embeddings provider, model, storage)
Minimal example:
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.DEFAULT_EMBEDDINGS_MODEL,
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
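For orientation, generation is configured in a separate RAGConfig, and the two configs are handed to the pipeline together. The import paths, field names, and constructor below are assumptions for illustration; check the pipeline documentation for the exact API:
# Hedged sketch: the RAGConfig fields and RAGPipeline constructor shown here
# are assumptions; the point is only the split in responsibilities.
from raglight.config.rag_config import RAGConfig
from raglight.rag.simple_rag_api import RAGPipeline

rag_config = RAGConfig(
    llm="llama3",              # generation model (assumed field name), independent of the embeddings above
    provider=Settings.OLLAMA,  # generation provider
)

pipeline = RAGPipeline(rag_config, vector_store_config)
pipeline.build()  # indexes documents using the embedding model from VectorStoreConfig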

Available embedding providers

RAGLight supports multiple embedding providers:
  • HuggingFace
  • Ollama
  • vLLM
  • OpenAI
  • Google Gemini
Below are concrete examples of how to configure each one.

HuggingFace (local, default)

HuggingFace embeddings are a great default for local-first RAG.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="all-MiniLM-L6-v2",
    provider=Settings.HUGGINGFACE,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
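The model is typically downloaded from the Hugging Face Hub on first use. If you want to sanity-check it outside RAGLight (for example to see the embedding dimension, which drives storage size), a quick standalone check looks like this:
from sentence_transformers import SentenceTransformer

# Standalone check, not required by RAGLight: all-MiniLM-L6-v2 produces
# 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.get_sentence_embedding_dimension())  # 384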

Google Gemini (hosted)

Gemini can be used for embeddings via Settings.GOOGLE_GEMINI.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model=Settings.GEMINI_EMBEDDING_MODEL,
    provider=Settings.GOOGLE_GEMINI,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
)
Make sure the API key is available:
export GEMINI_API_KEY=your_key

OpenAI (hosted)

Use OpenAI embeddings by selecting the OpenAI provider and a compatible embedding model.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="text-embedding-3-small",
    provider=Settings.OPENAI,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base=Settings.DEFAULT_OPENAI_CLIENT,
)
Make sure the API key is available:
export OPENAI_API_KEY=your_key

Ollama (local)

If your local Ollama instance has an embedding model available, you can use Ollama as your embedding provider.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="nomic-embed-text",
    provider=Settings.OLLAMA,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
)
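Make sure the embedding model has been pulled into your local Ollama instance first:
ollama pull nomic-embed-text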

vLLM (server)

If you expose an embeddings endpoint through vLLM (typically OpenAI-compatible), you can point RAGLight to your server.
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="your-embedding-model",
    provider=Settings.VLLM,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name=Settings.DEFAULT_COLLECTION_NAME,
    api_base="http://localhost:8000",
)
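Before wiring this in, you can sanity-check that the server exposes an OpenAI-compatible embeddings route. The snippet below is a standalone check, not part of RAGLight, and the model name is a placeholder:
import requests

# Ask the server for a single embedding via the OpenAI-compatible
# /v1/embeddings route and print the vector dimension.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"model": "your-embedding-model", "input": "hello world"},
)
resp.raise_for_status()
print(len(resp.json()["data"][0]["embedding"]))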

Recommended defaults

If you are starting out, these defaults usually work well:
  • Embeddings: all-MiniLM-L6-v2 (fast and compact)
  • Vector store: Chroma (local persistence)
  • LLM: Ollama (local)
You can then switch providers as your prototype evolves.

Common pitfalls

Embeddings and LLM are different models

It’s common to assume that the generation LLM also embeds documents. In RAGLight, embeddings are configured separately, in VectorStoreConfig.

Changing embeddings requires re-indexing

If you change embedding_model or provider, you must rebuild your vector store:
pipeline.build()
Otherwise, the stored vectors won’t match the new embedding space.
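One simple way to avoid mixing embedding spaces is to give each embedding model its own collection (or persist directory). This is an illustrative convention, not something RAGLight enforces, and the collection name below is hypothetical:
from raglight.config.vector_store_config import VectorStoreConfig
from raglight.config.settings import Settings

vector_store_config = VectorStoreConfig(
    embedding_model="nomic-embed-text",
    provider=Settings.OLLAMA,
    database=Settings.CHROMA,
    persist_directory="./defaultDb",
    collection_name="docs_nomic_embed_text",  # hypothetical: encodes the embedding model
    api_base=Settings.DEFAULT_OLLAMA_CLIENT,
)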

Summary

  • Embeddings drive retrieval quality.
  • They are configured in VectorStoreConfig.
  • Providers are swappable and independent from the LLM.
  • Changing embeddings requires rebuilding the index.