Dual Coding Agents provides a modular architecture for building AI agents that combine visual understanding with language generation. The framework has built-in support for image encoders such as OpenAI CLIP and transformer-based language models such as GPT, and it orchestrates them in a chain-of-thought pipeline. Users supply images and prompt templates; the agent extracts visual features, reasons about the context, and produces a textual output. Researchers and developers can swap models, configure prompts, and extend agents with plugins. The toolkit is intended to simplify multimodal AI experiments and speed up prototyping of applications such as visual question answering, document analysis, accessibility tools, and educational platforms.
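
The pipeline described above (encode the image, fill a prompt template, generate text, run plugins) could be sketched roughly as follows. This is an illustration only, not the framework's actual API: the names `DualCodingAgent`, `ImageEncoder`, `LanguageModel`, `ByteStatsEncoder`, `EchoModel`, and the template fields are all hypothetical, and the encoder and language model are trivial stand-ins so that the example runs without CLIP or GPT installed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class ImageEncoder(Protocol):
    """Anything that turns raw image bytes into a feature vector (assumed interface)."""

    def encode(self, image: bytes) -> List[float]: ...


class LanguageModel(Protocol):
    """Anything that turns a prompt into text (assumed interface)."""

    def generate(self, prompt: str) -> str: ...


class ByteStatsEncoder:
    """Stand-in for a real encoder such as CLIP: returns byte count and mean byte value."""

    def encode(self, image: bytes) -> List[float]:
        if not image:
            return [0.0, 0.0]
        return [float(len(image)), sum(image) / len(image)]


class EchoModel:
    """Stand-in for a real language model: echoes the prompt it received."""

    def generate(self, prompt: str) -> str:
        return f"[model output for] {prompt}"


# A plugin is assumed here to be a simple text-to-text post-processing step.
Plugin = Callable[[str], str]


@dataclass
class DualCodingAgent:
    encoder: ImageEncoder
    model: LanguageModel
    template: str = (
        "Image features: {features}\n"
        "Question: {question}\n"
        "Let's reason step by step."
    )
    plugins: List[Plugin] = field(default_factory=list)

    def run(self, image: bytes, question: str) -> str:
        # 1. Visual step: encode the image into features.
        features = self.encoder.encode(image)
        summary = ", ".join(f"{x:.2f}" for x in features)
        # 2. Prompt step: fill the chain-of-thought style template.
        prompt = self.template.format(features=summary, question=question)
        # 3. Language step: generate the textual output.
        answer = self.model.generate(prompt)
        # 4. Extension step: apply each plugin in order.
        for plugin in self.plugins:
            answer = plugin(answer)
        return answer


agent = DualCodingAgent(
    encoder=ByteStatsEncoder(),
    model=EchoModel(),
    plugins=[str.strip],
)
result = agent.run(b"\x00\x10\x20", "What is shown?")
print(result)
```

Because the agent depends only on the two small interfaces, swapping in a different encoder or language model means passing a different object to the constructor, which mirrors the model-swapping and plugin extension points mentioned in the description.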