fsa-OmniVoice

AI, TTS, Speech Synthesis, Voice Cloning, Diffusion Models, Deep Learning

// summary

OmniVoice is a zero-shot multilingual text-to-speech model built on a diffusion language model architecture, supporting more than 600 languages. It combines high-quality speech output with fast inference, and it supports both voice cloning and voice design. Speech generation, control through non-linguistic symbols, and pronunciation adjustment are available via the Python API or command-line tools.

// use cases

Zero-shot voice cloning: Clone a speaker's timbre with high quality from reference audio.

Voice design: Customize voice characteristics by specifying attributes such as gender, age, pitch, and accent.

Large-scale batch inference: Run efficient batch speech generation in multi-GPU environments.
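
As a rough illustration of the batch use case, the sketch below splits a list of input texts into per-GPU shards in round-robin order. It is plain Python and does not call OmniVoice: `shard_round_robin` is a hypothetical helper, and the step where each worker loads the model and synthesizes its shard is omitted because the summary does not describe that API.

```python
from typing import List, Sequence


def shard_round_robin(items: Sequence[str], num_workers: int) -> List[List[str]]:
    """Split items into num_workers shards; item i goes to shard i % num_workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for index, item in enumerate(items):
        shards[index % num_workers].append(item)
    return shards


if __name__ == "__main__":
    # Hypothetical inputs; a real job would read texts from a file or dataset.
    texts = [f"sentence {i}" for i in range(5)]
    for worker_id, shard in enumerate(shard_round_robin(texts, 2)):
        # Each worker would process its own shard on its own GPU (not shown).
        print(worker_id, shard)
```

Round-robin assignment keeps shard sizes within one item of each other, which is a reasonable default when inputs are of similar length. If text lengths vary widely, sorting by length before sharding may balance the load better.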