// summary
OmniVoice is a zero-shot multilingual text-to-speech (TTS) model built on a diffusion language model architecture, with support for more than 600 languages. It produces high-quality speech with very fast inference and supports both voice cloning and voice design. Speech generation, non-linguistic symbol control, and pronunciation adjustment are available through a Python API and command-line tools.
// use cases
01
Zero-shot voice cloning: Clone a speaker's timbre with high fidelity from reference audio.
02
Voice design: Customize voice characteristics by specifying attributes such as gender, age, pitch, and accent.
03
Large-scale batch inference: Run efficient batch speech generation across multiple GPUs.
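Batch inference across several GPUs generally comes down to partitioning the workload before any model is called. The sketch below shows one common scheme, round-robin sharding followed by fixed-size batching per GPU. It is not OmniVoice's API: `synthesize_batch` and `fake_synthesize_batch` are hypothetical stand-ins for a real model call, and device names such as `cuda:0` are only labels here.

```python
from typing import Callable, List, Sequence


def shard_round_robin(texts: Sequence[str], num_gpus: int) -> List[List[str]]:
    # Assign text i to GPU (i % num_gpus) so shard sizes differ by at most one.
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_gpus)]
    for i, text in enumerate(texts):
        shards[i % num_gpus].append(text)
    return shards


def chunk(texts: List[str], batch_size: int) -> List[List[str]]:
    # Split one GPU's shard into fixed-size batches (the last one may be shorter).
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def run_on_device(
    device: str,
    texts: List[str],
    batch_size: int,
    synthesize_batch: Callable[[str, List[str]], List[bytes]],
) -> List[bytes]:
    # synthesize_batch is a HYPOTHETICAL model call, not part of OmniVoice.
    outputs: List[bytes] = []
    for batch in chunk(texts, batch_size):
        outputs.extend(synthesize_batch(device, batch))
    return outputs


def fake_synthesize_batch(device: str, batch: List[str]) -> List[bytes]:
    # Placeholder that returns tagged bytes instead of real audio.
    return [f"{device}:{t}".encode() for t in batch]


if __name__ == "__main__":
    texts = [f"sentence {i}" for i in range(7)]
    shards = shard_round_robin(texts, num_gpus=2)
    for gpu_id, shard in enumerate(shards):
        audio = run_on_device(f"cuda:{gpu_id}", shard, batch_size=2,
                              synthesize_batch=fake_synthesize_batch)
        print(gpu_id, len(audio))
```

The loop above visits devices one after another for clarity. Real parallel execution would typically hand each shard to its own worker process pinned to a single GPU.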