- Multimodal document ingestion (PDFs, images, scanned documents)
- OCR integration (e.g., Tesseract, PaddleOCR)
- Table detection and extraction
- Vision-language question answering
- Document summarization
- Customizable pipeline components
- Model orchestration and caching
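
To illustrate what swappable pipeline components and model caching could look like, here is a minimal sketch in plain Python. Every name in it (`Pipeline`, `ModelCache`, `fake_ocr`, `fake_summarize`) is hypothetical and chosen for illustration only; it is not this project's actual API, and the "models" are trivial stand-ins rather than real OCR or summarization engines.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types and names; not this project's actual API.
Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]


@dataclass
class Pipeline:
    """Runs named stages in order, passing a document dict from one to the next."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def replace(self, name: str, stage: Stage) -> "Pipeline":
        # Swap a single component (for example, another OCR backend)
        # without touching the other stages.
        for i, (existing, _) in enumerate(self.stages):
            if existing == name:
                self.stages[i] = (name, stage)
                return self
        raise KeyError(name)

    def run(self, doc: Doc) -> Doc:
        for _, stage in self.stages:
            doc = stage(doc)
        return doc


class ModelCache:
    """Calls each loader at most once per key and reuses the result."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}
        self.loads = 0  # number of times a loader was actually invoked

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        if key not in self._models:
            self._models[key] = loader()
            self.loads += 1
        return self._models[key]


cache = ModelCache()


def fake_ocr(doc: Doc) -> Doc:
    # Stand-in for a real OCR engine: the "model" just upper-cases the input.
    model = cache.get("ocr", lambda: str.upper)
    return {**doc, "text": model(doc["raw"])}


def fake_summarize(doc: Doc) -> Doc:
    # Stand-in for a summarizer: keeps the first 10 characters.
    return {**doc, "summary": doc["text"][:10]}


pipeline = Pipeline().add("ocr", fake_ocr).add("summarize", fake_summarize)
result = pipeline.run({"raw": "invoice total: 42 usd"})
print(result["summary"])  # prints "INVOICE TO"
```

Keeping stages as plain callables keyed by name means a component can be replaced with `pipeline.replace("ocr", other_stage)`, and routing model access through a shared cache means repeated runs do not reload the same model.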