LTX-2 is a 19-billion-parameter open-source video foundation model (14B video + 5B audio) that generates synchronized high-resolution video and audio in a single pass. It supports native 4K (3840×2160) output at up to 50 FPS and clips up to 20 seconds long, and it accepts multimodal inputs: text prompts, images, depth maps, keyframes, and short reference videos. Audio is synthesized natively (dialogue, ambient sound, music, and Foley) and aligned to visual events. LTX-2 is optimized for efficient inference (NVFP4/NVFP8) and is released under Apache 2.0, so teams can download the weights, fine-tune them, and deploy locally, or use the hosted web generator (credits required).
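To put the headline numbers in perspective, here is a rough back-of-envelope sketch in Python. It is not part of any LTX-2 tooling, and the function names are made up for illustration. It assumes that roughly 4 bits per weight for NVFP4 and 8 bits for the 8-bit format are reasonable approximations (ignoring scale factors, activations, and other runtime overhead), and it assumes purely for illustration that the maximum resolution, frame rate, and duration can be combined in one clip, which the description above does not state.

```python
# Back-of-envelope figures derived from the numbers quoted in the description.
# All helper names are hypothetical; nothing here calls LTX-2 itself.

VIDEO_PARAMS = 14e9          # video parameters
AUDIO_PARAMS = 5e9           # audio parameters
WIDTH, HEIGHT = 3840, 2160   # native 4K frame size
MAX_FPS = 50                 # maximum frame rate
MAX_SECONDS = 20             # maximum clip length


def total_params() -> float:
    """Total parameter count across the video and audio parts."""
    return VIDEO_PARAMS + AUDIO_PARAMS


def max_frames() -> int:
    """Frames in a clip, ASSUMING max FPS and max duration can be combined."""
    return MAX_FPS * MAX_SECONDS


def pixels_per_clip() -> int:
    """Raw output pixels in such a clip at native 4K."""
    return WIDTH * HEIGHT * max_frames()


def weight_gigabytes(bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    ASSUMPTION: ignores quantization scale factors and any layers kept
    at higher precision, so real checkpoints will differ.
    """
    return total_params() * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"total parameters: {total_params():.0f}")
    print(f"frames per clip:  {max_frames()}")
    print(f"pixels per clip:  {pixels_per_clip()}")
    for label, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label} weights: ~{weight_gigabytes(bits):.1f} GB")
```

Under these assumptions, 4-bit weights come to roughly a quarter of the 16-bit footprint, which is the main reason low-precision formats matter for local deployment. Treat the figures as order-of-magnitude estimates only.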