English Female TTS Model Capacitron T2 C150_v2 Encoding Trained on Blizzard2013 Dataset at 24000Hz

An English female text-to-speech model, trained on the Blizzard2013 dataset at 24000 Hz, for synthesizing English speech.

Model Description

This English female text-to-speech model is trained on the Blizzard2013 dataset at 24000 Hz and synthesizes English speech. The model is based on the capacitron-t2-c150_v2 encoder. It can be installed and run from the command line:

pip install tts
tts --text "Hello, world!" --model_name tts_models/en/blizzard2013/capacitron-t2-c150_v2
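
The same command can also be driven from a script. The following is a minimal Python sketch, not an official API: the helper names (build_tts_command, wav_info, synthesize) are hypothetical, and it assumes the tts command accepts --out_path and writes a standard PCM WAV file. It runs the command above and then reads the output file's header to confirm the 24000 Hz sample rate stated on this page.

```python
import shutil
import subprocess
import wave

# Model name taken from the command shown above.
MODEL_NAME = "tts_models/en/blizzard2013/capacitron-t2-c150_v2"
EXPECTED_RATE_HZ = 24000  # sample rate stated on this page


def build_tts_command(text, out_path):
    """Return the argument list for the `tts` command-line call (hypothetical helper)."""
    return ["tts", "--text", text, "--model_name", MODEL_NAME, "--out_path", out_path]


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def synthesize(text, out_path="output.wav"):
    """Run the CLI if it is installed, then report the output's rate and duration."""
    if shutil.which("tts") is None:
        raise RuntimeError("the `tts` command was not found; run `pip install tts` first")
    subprocess.run(build_tts_command(text, out_path), check=True)
    return wav_info(out_path)
```

Calling synthesize("Hello, world!") downloads the model on first use and returns the sample rate and duration of the generated file; comparing the returned rate against EXPECTED_RATE_HZ is a quick sanity check before feeding the audio into other tools.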

Voice Samples

default (F)

English

English is a West Germanic language that originated in England and is now one of the most widely spoken languages in the world. It belongs to the Indo-European language family and is closely related to German and Dutch. English has a diverse vocabulary and serves globally as a lingua franca. It is written in the Latin alphabet; Old English additionally used letters such as ð and þ, which have since fallen out of use. English has a large inventory of vowel and consonant sounds, and its spelling often does not map directly to pronunciation.

Blizzard2013 Dataset

The Blizzard2013 dataset was released for the Blizzard Challenge 2013 and consists of English audiobook recordings read by a single female speaker. It is often used for developing text-to-speech (TTS) systems.

Capacitron-T2-C150_v2

Capacitron-T2-C150_v2 is a text-to-speech (TTS) model based on the Tacotron 2 architecture. It extends Tacotron 2 with Capacitron, a variational prosody embedding intended to make synthesized speech more expressive and natural-sounding, with appropriate intonation and clear pronunciation. Typical applications include voice assistants, narration systems, and audio content generation.