
# Multimodal RAG

> Ingest PDFs with images using Vision-Language Models.

Standard RAG pipelines often index only the text of a PDF and ignore the images inside it. RAGLight's **Multimodal Pipeline** uses a Vision-Language Model (VLM), such as `llava` via Ollama or `gpt-4o` via OpenAI, to describe the diagrams, charts, and photos in your documents and index those descriptions.

<Note>
  This pipeline needs a vision-capable model (e.g., `llava` via Ollama or
  `gpt-4o` via OpenAI). A text-only model cannot caption images.
</Note>
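
To make the idea of indexing image descriptions concrete, here is a minimal, library-independent sketch. It is not RAGLight's implementation: `Page`, `caption_image`, and `page_to_chunks` are hypothetical names, and the captioner is a stub standing in for a real VLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical container for one PDF page: its text plus raw image bytes."""
    number: int
    text: str
    images: List[bytes] = field(default_factory=list)


def caption_image(image: bytes) -> str:
    """Stub captioner (assumption). A real pipeline would call a VLM here."""
    return f"[image of {len(image)} bytes]"


def page_to_chunks(page: Page, captioner: Callable[[bytes], str]) -> List[str]:
    """Return the page text (if any) followed by one text chunk per image caption."""
    chunks = [page.text] if page.text.strip() else []
    for index, image in enumerate(page.images, start=1):
        caption = captioner(image)
        # Captions are plain text, so they can be embedded like any other chunk.
        chunks.append(f"Page {page.number}, image {index}: {caption}")
    return chunks


page = Page(number=3, text="System overview.", images=[b"\x89PNG", b"\xff\xd8\xff"])
chunks = page_to_chunks(page, caption_image)
for chunk in chunks:
    print(chunk)
```

Because each caption becomes an ordinary text chunk, the retrieval step needs no changes: a question about a chart can match the caption in the same way it would match a paragraph.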

## Implementation

<Steps>
  <Step title="Import the VLM processor">
    `VlmPDFProcessor` replaces the default PDF processor with one that uses a
    VLM to caption images.
  </Step>

  <Step title="Configure the pipeline">
    Pass `custom_processors` to the Builder's `with_vector_store` call to
    override the default PDF handling.
  </Step>

  <Step title="Ingest and query">
    Documents are indexed with visual context. You can now ask questions about
    charts or diagrams.
  </Step>
</Steps>

```python multimodal_rag.py
from raglight.rag.builder import Builder
from raglight.config.settings import Settings
from raglight.document_processing.vlm_pdf_processor import VlmPDFProcessor
from raglight.llm.ollama_model import OllamaModel

Settings.setup_logging()

# 1. Instantiate a VLM (here: llava via Ollama)
vlm = OllamaModel(
    model_name="llava",
    system_prompt="You are a technical documentation visual assistant.",
)

# 2. Override the default PDF processor with the VLM-based one
custom_processors = {
    "pdf": VlmPDFProcessor(vlm),
}

# 3. Build the vector store with the custom processor
vector_store = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .build_vector_store()
)

# 4. Ingest your PDFs
vector_store.ingest(data_path="./technical_manuals")

# 5. Build the full RAG pipeline and query.
#    The same persist_directory and collection_name point it at the store ingested above.
rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .with_llm(Settings.OLLAMA, model_name="llava")
    .build_rag(k=5)
)

response = rag.generate("Describe the architecture diagram on page 3.")
print(response)
```
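
Once the pipeline is built, asking several visual questions in one pass is a short loop. The helper below is a sketch, not part of RAGLight: `ask_all` and `EchoPipeline` are hypothetical names, and the only assumption about the pipeline object is that it exposes `generate(question)` as in the example above. A stub pipeline is used so the snippet runs without a model.

```python
from typing import Dict, Iterable, Protocol


class SupportsGenerate(Protocol):
    """Anything with a generate(question) -> str method, e.g. the `rag` object above."""

    def generate(self, question: str) -> str: ...


def ask_all(pipeline: SupportsGenerate, questions: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: run each question and collect the answers by question."""
    return {question: pipeline.generate(question) for question in questions}


class EchoPipeline:
    """Stub pipeline (assumption) so the example runs without Ollama or a vector store."""

    def generate(self, question: str) -> str:
        return f"stub answer to: {question}"


questions = [
    "Describe the architecture diagram on page 3.",
    "What does the chart in the performance section show?",
]
answers = ask_all(EchoPipeline(), questions)
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

To use it with the real pipeline, pass `rag` in place of `EchoPipeline()`.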
