- PPO-based policy training in the StarCraft II (SC2) environment
- Integration with DeepMind's PySC2 for state and action handling
- Configurable neural network architectures and rewards
- Multiprocessing support for parallel sample collection
- Logging and TensorBoard integration
- Evaluation scripts for benchmarking agents
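
For readers new to PPO, the clipped surrogate loss behind the first feature can be sketched as below. This is a minimal, dependency-free illustration and not this project's training code: the function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negated PPO clipped surrogate objective, averaged over samples.

    Hypothetical helper for illustration; all names here are assumptions.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)

    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

When the new and old log-probabilities are equal, the ratio is 1 and the loss reduces to the negated mean advantage. A real trainer would typically compute this on batched tensors and add value-function and entropy terms.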