HunyuanVideo 1.5 supports both Text to Video and Image to Video generation, and accepts prompts in Chinese and English. In Image to Video mode, the generated video stays highly consistent with the input image.
The model also understands and follows instructions closely, rendering requested camera movements, smooth motion, realistic characters, and their expressions. It supports multiple visual styles, such as realistic, animated, and block-style, and can render Chinese and English text inside the video. For image quality, the model natively generates 5–10 second videos at 480p or 720p, and a separate super-resolution model can upscale them to 1080p.
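The limits stated above (two tasks, native 480p or 720p output, 5–10 second clips, optional 1080p super-resolution) can be captured as a small request check. The following is only a sketch: `GenerationRequest`, its field names, and the task labels `"t2v"` and `"i2v"` are hypothetical and are not part of any HunyuanVideo API.

```python
from dataclasses import dataclass
from typing import Optional

# Limits taken from the description above. Every name here is hypothetical.
NATIVE_HEIGHTS = {"480p": 480, "720p": 720}  # resolutions generated natively
SR_HEIGHT = 1080                             # height after super-resolution
MIN_SECONDS, MAX_SECONDS = 5, 10             # supported clip duration
TASKS = {"t2v", "i2v"}                       # Text to Video, Image to Video


@dataclass(frozen=True)
class GenerationRequest:
    task: str
    prompt: str                        # Chinese or English text
    resolution: str = "480p"
    seconds: float = 5
    super_resolution: bool = False
    image_path: Optional[str] = None   # conditioning image, needed for i2v

    def __post_init__(self) -> None:
        if self.task not in TASKS:
            raise ValueError(f"task must be one of {sorted(TASKS)}")
        if self.resolution not in NATIVE_HEIGHTS:
            raise ValueError("native resolution must be 480p or 720p")
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            raise ValueError("duration must be between 5 and 10 seconds")
        if self.task == "i2v" and self.image_path is None:
            raise ValueError("i2v requires an input image")

    @property
    def output_height(self) -> int:
        """Final video height in pixels, after optional super-resolution."""
        if self.super_resolution:
            return SR_HEIGHT
        return NATIVE_HEIGHTS[self.resolution]
```

For example, `GenerationRequest(task="t2v", prompt="a cat on a beach", resolution="720p", seconds=8, super_resolution=True)` passes validation and reports an `output_height` of 1080, while a 12-second request raises `ValueError`.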