- Automated benchmarking harness
- Diverse task suite (reasoning, planning, Q&A, tool use)
- Interactive web-based leaderboard
- Custom agent integration templates
- Docker support for reproducibility
- Metric tracking and visualization
- Community submission workflow
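
To give a rough idea of how the harness, task suite, custom agent templates, and metric tracking might fit together, here is a minimal sketch. Every name in it (`Task`, `Agent`, `run_benchmark`, `echo_agent`, the category labels) is hypothetical and chosen for illustration; it is not this project's actual API, and exact-match scoring is only an assumed metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """One benchmark item (hypothetical schema, not the project's own)."""
    category: str   # e.g. "reasoning", "planning", "qa", "tool_use"
    prompt: str
    expected: str


# Assumed agent interface: any callable that maps a prompt to an answer string.
Agent = Callable[[str], str]


def run_benchmark(agent: Agent, tasks: List[Task]) -> Dict[str, float]:
    """Run every task through the agent and return exact-match accuracy per category."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for task in tasks:
        total[task.category] = total.get(task.category, 0) + 1
        answer = agent(task.prompt).strip().lower()
        if answer == task.expected.strip().lower():
            correct[task.category] = correct.get(task.category, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in total.items()}


def echo_agent(prompt: str) -> str:
    """Toy agent that knows a single answer; stands in for a real integration."""
    return "4" if prompt == "What is 2 + 2?" else "unknown"


TASKS = [
    Task("reasoning", "What is 2 + 2?", "4"),
    Task("reasoning", "What is 3 + 3?", "6"),
    Task("qa", "Capital of France?", "Paris"),
]

scores = run_benchmark(echo_agent, TASKS)
print(scores)
```

Under these assumptions, integrating a custom agent amounts to supplying a function with the same signature as `echo_agent`, and the per-category scores are what a leaderboard or visualization layer would consume.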