Standard RAG pipelines often ignore images inside PDFs. RAGLight’s Multimodal Pipeline uses Vision-Language Models (like llava via Ollama or gpt-4o via OpenAI) to “see” diagrams, charts, and photos inside your documents and index their descriptions.
You need a VLM-capable model (e.g., llava via Ollama or gpt-4o via OpenAI) for this to work effectively.
Implementation
Import the VLM processor
VlmPDFProcessor replaces the default PDF processor with one that uses a VLM to caption images.
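The idea behind such a processor can be sketched in plain Python. The names below (Page, process_pages, the caption callable) are illustrative stand-ins, not RAGLight's actual internals: extract each page's text as usual, run every embedded image through the VLM, and append the caption to the page's indexable chunk.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Page:
    """Minimal stand-in for one extracted PDF page."""
    text: str
    images: List[bytes]  # raw image bytes pulled from the PDF

def process_pages(pages: List[Page], caption: Callable[[bytes], str]) -> List[str]:
    """Return one indexable chunk per page: page text plus VLM captions."""
    chunks = []
    for page in pages:
        parts = [page.text]
        for img in page.images:
            # The VLM "sees" the image and returns a textual description,
            # which is indexed alongside the surrounding page text.
            parts.append(f"[Image description: {caption(img)}]")
        chunks.append("\n".join(parts))
    return chunks
```

With a stub captioner in place of a real VLM, a page containing one chart becomes a single chunk whose text includes the chart's description, so a later semantic search over chunks can match questions about the chart.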
Configure the pipeline
Pass custom_processors to the Builder’s with_vector_store call to override the default PDF handling.
Ingest and query
Documents are indexed with visual context. You can now ask questions about charts or diagrams.
from raglight.rag.builder import Builder
from raglight.config.settings import Settings
from raglight.document_processing.vlm_pdf_processor import VlmPDFProcessor
from raglight.llm.ollama_model import OllamaModel

Settings.setup_logging()

# 1. Instantiate a VLM (here: llava via Ollama)
vlm = OllamaModel(
    model_name="llava",
    system_prompt="You are a technical documentation visual assistant.",
)

# 2. Override the default PDF processor with the VLM-based one
custom_processors = {
    "pdf": VlmPDFProcessor(vlm),
}

# 3. Build the vector store with the custom processor
vector_store = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .build_vector_store()
)

# 4. Ingest your PDFs
vector_store.ingest(data_path="./technical_manuals")

# 5. Build the full RAG pipeline and query
rag = (
    Builder()
    .with_embeddings(Settings.HUGGINGFACE, model_name=Settings.DEFAULT_EMBEDDINGS_MODEL)
    .with_vector_store(
        Settings.CHROMA,
        persist_directory="./defaultDb",
        collection_name=Settings.DEFAULT_COLLECTION_NAME,
        custom_processors=custom_processors,
    )
    .with_llm(Settings.OLLAMA, model_name="llava")
    .build_rag(k=5)
)

response = rag.generate("Describe the architecture diagram on page 3.")
print(response)
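Why this pays off at query time: once image descriptions are in the index, a question about a diagram matches the caption text like any other chunk. A toy retrieval over captioned chunks illustrates this, using plain word-overlap scoring as a stand-in for real embedding similarity (the function names here are illustrative, not part of RAGLight):

```python
def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def top_chunk(query: str, chunks: list[str]) -> str:
    """Return the chunk with the highest overlap with the query."""
    return max(chunks, key=lambda ch: score(query, ch))

chunks = [
    "Installation steps for the CLI.",
    # This chunk only exists because the VLM captioned a diagram:
    "[Image description: architecture diagram showing the ingestion "
    "pipeline feeding a Chroma vector store]",
]

print(top_chunk("describe the architecture diagram", chunks))
```

Without the caption, the diagram would contribute nothing searchable and the query would fall back to unrelated text chunks; with it, the visual content competes in retrieval on equal terms.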