rag-services provides a collection of containerized RESTful microservices designed to streamline retrieval-augmented generation (RAG) applications. It includes modular components for document storage, vector indexing, embedding generation, LLM inference, and orchestration. Developers can plug in popular vector databases and language model providers, creating highly customizable and scalable RAG pipelines. Fully open-source, rag-services simplifies deployment and management of AI assistants in cloud-native, production environments.
rag-services is an extensible platform that breaks down RAG pipelines into discrete microservices. It offers a document store service, a vector index service, an embedder service, multiple LLM inference services, and an orchestrator service to coordinate workflows. Each component exposes REST APIs, allowing you to mix and match databases and model providers. With Docker and Docker Compose support, you can deploy locally or in Kubernetes clusters. The framework enables scalable, fault-tolerant RAG solutions for chatbots, knowledge bases, and automated document Q&A.
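Because each component exposes its own REST API, a client can address the services independently. A minimal Python sketch of how the pieces map onto endpoints in a local Docker Compose deployment — the port assignments and the `/health` route are assumptions for illustration, not documented defaults:

```python
# Hypothetical service ports for a local Docker Compose stack.
# rag-services does not fix these here; adjust to your own configuration.
SERVICES = {
    "document-store": "http://localhost:8001",
    "vector-index":   "http://localhost:8002",
    "embedder":       "http://localhost:8003",
    "llm-inference":  "http://localhost:8004",
    "orchestrator":   "http://localhost:8005",
}

def health_url(service: str) -> str:
    """Return the (assumed) health-check URL for a named service."""
    return f"{SERVICES[service]}/health"

# Checking liveness requires a running deployment, e.g.:
#   import urllib.request
#   status = urllib.request.urlopen(health_url("orchestrator")).status

print(health_url("orchestrator"))
```

Keeping each service behind its own base URL is what lets you swap one vector database or model provider for another without touching the rest of the pipeline.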
Who will use rag-services?
AI/ML Engineers
Backend Developers
Data Scientists
Enterprises building RAG applications
How to use rag-services?
Step 1: Clone the repository from GitHub.
Step 2: Copy and customize the .env configuration for vector DB and LLM endpoints.
Step 3: Build and start all services via Docker Compose.
Step 4: Ingest documents through the document store API and generate embeddings.
Step 5: Send user queries to the orchestrator endpoint for RAG-enabled responses.
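The ingest and query steps above can be sketched as a small Python client. The endpoint paths (`/documents`, `/query`), field names, and ports are assumptions for illustration; the actual routes depend on your .env configuration and the service documentation:

```python
import json

# Assumed base URLs for a local Docker Compose deployment (step 3).
DOC_STORE = "http://localhost:8001"
ORCHESTRATOR = "http://localhost:8005"

def ingest_request(doc_id: str, text: str) -> tuple[str, str]:
    """Build the (assumed) document-store ingest call for step 4.
    Returns (url, json_body); embedding generation happens server-side."""
    return f"{DOC_STORE}/documents", json.dumps({"id": doc_id, "text": text})

def query_request(question: str, top_k: int = 4) -> tuple[str, str]:
    """Build the (assumed) orchestrator query call for step 5."""
    return f"{ORCHESTRATOR}/query", json.dumps({"query": question, "top_k": top_k})

# With a running stack, POST these bodies via urllib or curl, e.g.:
#   curl -X POST "$URL" -H 'Content-Type: application/json' -d "$BODY"

url, body = query_request("What does the warranty cover?")
print(url)
```

Routing queries through the orchestrator rather than calling the vector index and LLM services directly keeps retrieval, prompt assembly, and generation coordinated in one place.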