Standard RAG pipelines often ignore images inside PDFs. RAGLight's Multimodal Pipeline uses a Vision-Language Model (VLM) to "see" diagrams, charts, and photos inside your documents and index their descriptions alongside the text.
You need a VLM-capable model (e.g., llava via Ollama or gpt-4o via OpenAI) for image captioning to work effectively.
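Conceptually, the pipeline captions each embedded image with the VLM and indexes the caption text together with the surrounding prose, so a plain-text embedding model can still retrieve visual content. A minimal sketch of that flow, with a stubbed caption function standing in for the real VLM call (these helpers are illustrative, not RAGLight's internals):

```python
def caption_image(image_bytes: bytes) -> str:
    """Stand-in for a VLM call (e.g. llava via Ollama)."""
    return "Bar chart comparing Q1 and Q2 revenue."  # stubbed caption

def extract_page(page_text: str, images: list[bytes]) -> str:
    """Merge page text with VLM captions so both are embedded together."""
    captions = [f"[Image: {caption_image(img)}]" for img in images]
    return "\n".join([page_text, *captions])

chunk = extract_page("Revenue grew 12% quarter over quarter.", [b"\x89PNG..."])
print(chunk)
```

Because the caption is ordinary text, a question like "which quarter had higher revenue?" can match the indexed chunk even though the answer only appeared in a chart.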

Implementation

1. Import the VLM processor. VlmPDFProcessor replaces the default PDF processor with one that uses a VLM to caption images.
2. Configure the pipeline. Pass custom_processors to the Builder's with_vector_store call to override the default PDF handling.
3. Ingest and query. Documents are indexed with visual context, so you can now ask questions about charts or diagrams.
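The custom_processors mapping keys on file extension: a processor registered under "pdf" handles every PDF encountered during ingestion, while other file types fall back to the defaults. A minimal sketch of that dispatch idea (a hypothetical helper for illustration, not RAGLight's actual internals):

```python
from pathlib import Path

def pick_processor(path, processors, default):
    """Route a file to its registered processor by extension (hypothetical)."""
    ext = Path(path).suffix.lstrip(".").lower()
    return processors.get(ext, default)

custom_processors = {"pdf": "VlmPDFProcessor"}  # stand-in value for illustration
print(pick_processor("manual.PDF", custom_processors, "TextProcessor"))
```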
multimodal_rag.py
from raglight.rag.builder import Builder
from raglight.config.settings import Settings
from raglight.document_processing.vlm_pdf_processor import VlmPDFProcessor
from raglight.llm.ollama_model import OllamaModel
from raglight.models.data_source_model import FolderSource

Settings.setup_logging()

# 1. Instantiate a VLM (here: llava via Ollama)
vlm = OllamaModel(
    model_name="llava",
    system_prompt="You are a technical documentation visual assistant.",
)

# 2. Override the default PDF processor with the VLM-based one
custom_processors = {
    "pdf": VlmPDFProcessor(vlm),
}

# 3. Build the vector store with the custom processor
vector_store = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .build_vector_store()
)

# 4. Ingest your PDFs
vector_store.ingest(data_path="./technical_manuals")

# 5. Build the full RAG pipeline and query
rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .with_llm(Settings.OLLAMA, model_name="llava")
    .build_rag(k=5)
)

response = rag.generate("Describe the architecture diagram on page 3.")
print(response)