k2-fsa/OmniVoice

AI · TTS · Speech Synthesis · Voice Cloning · Diffusion Models · Deep Learning

// summary

OmniVoice is a zero-shot multilingual text-to-speech model built on a diffusion language model architecture, with support for over 600 languages. It combines high-quality speech output with fast inference and offers both voice cloning and voice design. Speech generation, control through non-linguistic symbols, and pronunciation adjustment are available through a Python API or command-line tools.

// use cases

01 Zero-shot voice cloning: clone a speaker's timbre in high quality from reference audio.
02 Voice design: customize voice characteristics by specifying attributes such as gender, age, pitch, and accent.
03 Large-scale batch inference: run batch speech generation efficiently across multiple GPUs.
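
For the batch inference use case, one common way to spread a large job over several GPUs is to shard the input texts round-robin, one shard per device, and then feed each shard to its worker in fixed-size batches. The sketch below shows only that bookkeeping. It is not OmniVoice's API: `shard_round_robin`, `batches`, and the worker loop are hypothetical names introduced for illustration, and the actual synthesis call is left as a placeholder comment.

```python
from typing import Iterator, Sequence

# Hypothetical helpers for illustration only; none of these names come from OmniVoice.
Item = tuple[int, str]  # (original position, text to synthesize)


def shard_round_robin(items: Sequence[str], num_workers: int) -> list[list[Item]]:
    """Split items into num_workers shards, keeping each item's original index."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: list[list[Item]] = [[] for _ in range(num_workers)]
    for index, text in enumerate(items):
        shards[index % num_workers].append((index, text))
    return shards


def batches(shard: Sequence[Item], batch_size: int) -> Iterator[list[Item]]:
    """Yield consecutive batches of at most batch_size items from one shard."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(shard), batch_size):
        yield list(shard[start:start + batch_size])


if __name__ == "__main__":
    texts = [f"Sentence {i}" for i in range(10)]
    for gpu_id, shard in enumerate(shard_round_robin(texts, num_workers=4)):
        for batch in batches(shard, batch_size=2):
            # Placeholder: a real worker pinned to gpu_id would run the TTS model here.
            print(gpu_id, [index for index, _ in batch])
```

Carrying the original index with each text lets the generated audio be written back in input order once all workers finish, regardless of which device handled which item.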