Granite Retrieval Agent gives developers a flexible platform for building retrieval-augmented generation (RAG) agents that combine semantic search with large language models. Users can ingest documents from diverse sources, create vector embeddings, and index them in Azure Cognitive Search or an alternative vector store. When a query arrives, the agent retrieves the most relevant passages, assembles them into a context window, and calls an LLM API to produce an answer or summary grounded in that context. It supports memory management, chain-of-thought orchestration, and custom plugins for pre- and post-processing. Deployable with Docker or directly via Python, Granite Retrieval Agent speeds up the creation of knowledge-driven chatbots, enterprise assistants, and Q&A systems; grounding responses in retrieved passages is intended to reduce hallucinations and improve factual accuracy.
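
The retrieve, assemble, and generate flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Granite Retrieval Agent's actual API: the function names (`embed`, `retrieve`, `build_prompt`, `answer`), the bag-of-words stand-in for an embedding model, the in-memory list standing in for a vector index, and the `llm` callable are all hypothetical and chosen only for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical sample corpus; a real deployment would read from a vector store.
SAMPLE_PASSAGES = [
    "Docker images package an application with its dependencies.",
    "Vector embeddings map text to points in a high-dimensional space.",
    "Cosine similarity compares the angle between two vectors.",
]


def embed(text: str) -> Counter:
    """Toy embedding: term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, hits: List[Tuple[float, str]], max_chars: int = 500) -> str:
    """Pack retrieved passages into a context window capped at max_chars."""
    context: List[str] = []
    used = 0
    for _, passage in hits:
        if used + len(passage) > max_chars:
            break  # stop once the context budget is exhausted
        context.append(passage)
        used += len(passage)
    joined = "\n".join(f"- {p}" for p in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, passages: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve, assemble the prompt, then delegate to the supplied LLM callable."""
    return llm(build_prompt(query, retrieve(query, passages, k)))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; for a dry run, `answer("What do vector embeddings map text to?", SAMPLE_PASSAGES, lambda prompt: prompt, k=1)` simply returns the assembled prompt so the retrieved context can be inspected.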