- Multimodal input system (upload images, videos, audio; up to 12 files)
- Text-to-video, image-to-video, and motion transfer from a reference video
- Native audio synthesis with precise lip and rhythm synchronization
- Automatic storyboarding, with an AI agent that plans camera movements
- Physics-aware motion and real-world dynamics
- High success rate (>90%) and rapid 1080p generation
- Supporting image tools: background remover, upscaler, face-swap, batch processing