- Unified HTTP API for chat, completion, and embeddings
- Multi-model backend support (OpenAI, Azure, Vertex AI, local models)
- Vector database integration for retrieval-augmented generation
- Request batching and caching
- Streaming token-by-token responses
- Role-based access control
- Prometheus-compatible metrics export
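
As a rough illustration of the caching feature, here is a minimal sketch of an in-memory response cache keyed by a hash of the request payload. It is not this project's implementation: `cache_key`, `ResponseCache`, and `fake_backend` are hypothetical names, and the sketch assumes that caching means reusing the response for an identical request.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(payload: dict) -> str:
    """Hash the request payload so that key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical in-memory cache sitting in front of a model backend."""

    def __init__(self, backend: Callable[[dict], str]) -> None:
        self._backend = backend
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, payload: dict) -> str:
        key = cache_key(payload)
        if key in self._store:
            # Identical request seen before: skip the backend call.
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._backend(payload)
        self._store[key] = result
        return result


def fake_backend(payload: dict) -> str:
    # Stand-in for a real model call (assumption): upper-cases the prompt.
    return payload["prompt"].upper()


if __name__ == "__main__":
    cache = ResponseCache(fake_backend)
    cache.complete({"model": "m", "prompt": "hi"})
    cache.complete({"prompt": "hi", "model": "m"})  # same request, different key order
    print(cache.hits, cache.misses)
```

A real deployment would likely also need eviction, expiry, and a rule for skipping the cache on non-deterministic requests. The sketch leaves these out.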