LTX-2 is a 19-billion-parameter open-source video foundation model (14B video + 5B audio) that generates synchronized high-resolution video and audio in a single pass. It supports native 4K (3840×2160) at up to 50 FPS and clips up to 20 seconds, with multimodal inputs including text prompts, images, depth maps, keyframes, and short reference videos. The model provides native audio synthesis (dialogue, ambient sounds, music, and Foley) aligned to visual events. LTX-2 is optimized for efficient inference (NVFP4/NVFP8) and ships under Apache 2.0, so teams can download the weights, fine-tune, deploy locally, or use the hosted web generator (credits required).
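As a rough illustration of local use, here is a minimal, hypothetical sketch of a text/image-to-video call with synchronized audio. The `ltx2` package, the `LTX2Pipeline` class, and every parameter shown are assumptions made for this example, not the shipped API; consult the released repository and weights for the real interface.

```python
# Hypothetical sketch: local text/image-to-video generation with LTX-2.
# The `ltx2` package, LTX2Pipeline class, and all arguments are assumed
# for illustration; check the released repo/weights for the real interface.
from ltx2 import LTX2Pipeline

# Load the open weights; the quantization flag is an assumption reflecting
# the NVFP4/NVFP8 inference optimizations mentioned above.
pipe = LTX2Pipeline.from_pretrained("ltx-2", quantization="nvfp8")

result = pipe(
    prompt="Drone shot over a foggy coastline at sunrise, waves crashing",
    image="reference_frame.png",   # optional image conditioning
    width=3840,
    height=2160,                   # native 4K
    fps=50,
    duration_s=10,                 # clips of up to 20 s are supported
    generate_audio=True,           # synchronized dialogue, ambience, Foley
)
result.save("coastline.mp4")       # video muxed with the generated audio track
```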
LTX-2 is an open-source AI video generation model hosted on Vidthis that produces native 4K audiovisual content at up to 50 fps with synchronized sound. It supports both text-to-video and image-to-video modes, offers clip durations from 5 to 20 seconds, and exposes granular camera, motion, and style controls via LoRA adapters. LTX-2 targets production workflows with Fast/Pro/Ultra quality modes, reproducible seeds, and the option to customize the weights or deploy them locally for advanced fine-tuning.
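Continuing the same hypothetical API from the sketch above, the following shows what a production-style call might look like: a fixed seed for reproducibility, a quality mode, and a LoRA adapter for camera/style control. The `load_lora` method, the mode names, and the adapter identifier are all illustrative assumptions.

```python
# Hypothetical sketch: reproducible, production-style generation with LTX-2.
# load_lora(), mode=..., and the adapter name are illustrative assumptions.
from ltx2 import LTX2Pipeline

pipe = LTX2Pipeline.from_pretrained("ltx-2")
pipe.load_lora("slow-dolly-camera-lora")   # assumed hook for camera/style LoRAs

clip = pipe(
    prompt="Product close-up rotating on a marble table, soft studio light",
    mode="pro",        # Fast / Pro / Ultra trade speed against fidelity
    duration_s=8,      # within the supported 5-20 s range
    seed=1234,         # fixed seed so the exact shot can be regenerated
)
clip.save("product_shot.mp4")
```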