- Dual-Branch Diffusion Transformer architecture for joint audio-video generation
- Accurate lip-sync with multi-language support
- Cinematic camera controls (pan, tilt, zoom, orbit)
- Text-to-video and image-to-video generation
- Real-time video creation with 10x faster inference
- Native Chinese-language optimization