- Local LLM inference with a WebGPU backend
- WebAssembly support for broad device compatibility (see the backend-detection sketch after this list)
- Real-time streaming of AI responses (see the streaming sketch below)
- Model switching (LLaMA, Vicuna, Alpaca, etc.; see the model-swap sketch below)
- Customizable React-based user interface
- Conversation history and system prompt management
- Extensible plugin architecture for custom behaviors (see the plugin sketch below)
- Offline operation without server dependencies
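
Backend selection typically probes for WebGPU first and falls back to WebAssembly when no adapter is available. Below is a minimal detection sketch; `detectBackend` and the `Backend` type are illustrative names rather than this project's API, and the WebGPU declarations assume the `@webgpu/types` package:

```ts
// Picks the best available inference backend at startup.
type Backend = "webgpu" | "wasm";

async function detectBackend(): Promise<Backend> {
  // WebGPU is only exposed in supporting browsers, and requestAdapter()
  // can still resolve to null (e.g. blocklisted GPUs), so check both.
  if ("gpu" in navigator) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter !== null) return "webgpu";
  }
  // WebAssembly is near-universal and serves as the compatibility fallback.
  if (typeof WebAssembly === "object") return "wasm";
  throw new Error("No supported inference backend available");
}
```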
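
Streaming works by consuming the response as an async stream of tokens instead of waiting for the full completion. The sketch below assumes a hypothetical `Engine.chatStream` method that yields text chunks; the actual runtime API may differ:

```ts
// Hypothetical engine surface: yields one token/chunk at a time.
interface Engine {
  chatStream(prompt: string): AsyncIterable<string>;
}

async function streamReply(
  engine: Engine,
  prompt: string,
  onToken: (partial: string) => void,
): Promise<string> {
  let reply = "";
  for await (const token of engine.chatStream(prompt)) {
    reply += token;
    onToken(reply); // re-render with the partial response as it arrives
  }
  return reply;
}
```

In the React UI, `onToken` would typically wrap a state setter so the message bubble fills in as each chunk arrives.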
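
Switching models amounts to unloading the current weights and initializing the new ones. A minimal sketch, assuming a hypothetical engine with `load`/`unload` methods and illustrative model identifiers:

```ts
// Hypothetical hot-swap surface; names are illustrative, not the project's API.
interface SwappableEngine {
  load(modelId: string): Promise<void>;
  unload(): Promise<void>;
}

const MODELS = ["llama-7b", "vicuna-7b", "alpaca-7b"] as const;

async function switchModel(
  engine: SwappableEngine,
  id: (typeof MODELS)[number],
): Promise<void> {
  await engine.unload(); // free GPU/WASM memory held by the current model
  await engine.load(id); // fetch (or read from cache) and initialize new weights
}
```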
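
One common shape for a plugin system is a pair of before/after transforms around each exchange. The `ChatPlugin` interface and `PluginHost` below are assumptions for illustration, not the project's actual plugin API:

```ts
// Illustrative plugin hooks: transform the prompt in, the response out.
interface ChatPlugin {
  name: string;
  beforePrompt?(prompt: string): string;
  afterResponse?(response: string): string;
}

class PluginHost {
  private plugins: ChatPlugin[] = [];

  register(plugin: ChatPlugin): void {
    this.plugins.push(plugin);
  }

  applyBefore(prompt: string): string {
    return this.plugins.reduce((p, plugin) => plugin.beforePrompt?.(p) ?? p, prompt);
  }

  applyAfter(response: string): string {
    return this.plugins.reduce((r, plugin) => plugin.afterResponse?.(r) ?? r, response);
  }
}

// Example: a plugin that prefixes every prompt with a marker.
const host = new PluginHost();
host.register({ name: "marker", beforePrompt: (p) => `[user] ${p}` });
```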