English TTS Model: 108 Voices, fast_pitch Encoder, Trained on the VCTK Dataset at 22050 Hz

Model Description

This English text-to-speech model provides 108 voices trained on the VCTK dataset at a sampling rate of 22050 Hz and synthesizes speech in English (language code en). The model is based on the fast_pitch encoder.

pip install tts
tts --text "Hello, world!" --model_name tts_models/en/vctk/fast_pitch
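
Because the model contains many voices, a synthesis call normally has to name one of them. The sketch below only builds and prints such a command; it does not run it. It assumes the `tts` command is the Coqui TTS CLI and that its `--speaker_idx` and `--out_path` flags are available in the installed version. It also assumes that the voice labels listed on this page (for example VCTK_p225) are valid speaker IDs. `build_tts_command` is a hypothetical helper name, not part of any library.

```python
import shlex

# Model name taken from the command shown above.
MODEL_NAME = "tts_models/en/vctk/fast_pitch"


def build_tts_command(text, speaker, out_path="output.wav"):
    """Return the argument list for synthesizing `text` with one voice.

    The --speaker_idx and --out_path flags are assumptions about the
    installed CLI; check `tts --help` before relying on them.
    """
    return [
        "tts",
        "--text", text,
        "--model_name", MODEL_NAME,
        "--speaker_idx", speaker,  # assumed flag: selects the voice
        "--out_path", out_path,    # assumed flag: output WAV file
    ]


if __name__ == "__main__":
    cmd = build_tts_command("Hello, world!", "VCTK_p225", "p225.wav")
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

Building the command as a list keeps quoting correct when the text contains spaces or punctuation, and the same list can be passed to `subprocess.run` once the flags have been confirmed.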

Voice Samples

VCTK_p225 (M)

VCTK_p226 (M)

VCTK_p227 (M)

VCTK_p228 (F)

VCTK_p229 (F)

VCTK_p230 (F)

VCTK_p231 (F)

VCTK_p232 (M)

VCTK_p233 (F)

VCTK_p234 (F)

VCTK_p236 (F)

VCTK_p237 (M)

VCTK_p238 (F)

VCTK_p239 (F)

VCTK_p240 (F)

VCTK_p241 (M)

VCTK_p243 (M)

VCTK_p244 (F)

VCTK_p245 (M)

VCTK_p246 (M)

VCTK_p247 (M)

VCTK_p248 (F)

VCTK_p249 (F)

VCTK_p250 (F)

VCTK_p251 (M)

VCTK_p252 (M)

VCTK_p253 (F)

VCTK_p254 (M)

VCTK_p255 (M)

VCTK_p256 (M)

VCTK_p257 (F)

VCTK_p258 (M)

VCTK_p259 (M)

VCTK_p260 (M)

VCTK_p261 (F)

VCTK_p262 (F)

VCTK_p263 (M)

VCTK_p264 (F)

VCTK_p265 (F)

VCTK_p266 (F)

VCTK_p267 (F)

VCTK_p268 (F)

VCTK_p269 (F)

VCTK_p270 (M)

VCTK_p271 (M)

VCTK_p272 (M)

VCTK_p273 (M)

VCTK_p274 (M)

VCTK_p275 (M)

VCTK_p276 (F)

VCTK_p277 (F)

VCTK_p278 (M)

VCTK_p279 (M)

VCTK_p280 (F)

VCTK_p281 (M)

VCTK_p282 (M)

VCTK_p283 (F)

VCTK_p284 (M)

VCTK_p285 (M)

VCTK_p286 (M)

VCTK_p287 (M)

VCTK_p288 (F)

VCTK_p292 (M)

VCTK_p293 (F)

VCTK_p294 (M)

VCTK_p295 (F)

VCTK_p297 (F)

VCTK_p298 (M)

VCTK_p299 (F)

VCTK_p300 (F)

VCTK_p301 (F)

VCTK_p302 (M)

VCTK_p303 (F)

VCTK_p304 (M)

VCTK_p305 (F)

VCTK_p306 (F)

VCTK_p307 (F)

VCTK_p308 (F)

VCTK_p310 (F)

VCTK_p311 (M)

VCTK_p312 (F)

VCTK_p313 (F)

VCTK_p314 (F)

VCTK_p316 (M)

VCTK_p317 (F)

VCTK_p318 (F)

VCTK_p323 (F)

VCTK_p326 (M)

VCTK_p329 (F)

VCTK_p330 (F)

VCTK_p333 (F)

VCTK_p334 (M)

VCTK_p335 (F)

VCTK_p336 (F)

VCTK_p339 (F)

VCTK_p340 (F)

VCTK_p341 (F)

VCTK_p343 (M)

VCTK_p345 (M)

VCTK_p347 (M)

VCTK_p351 (F)

VCTK_p360 (M)

VCTK_p361 (F)

VCTK_p362 (F)

VCTK_p363 (M)

VCTK_p364 (M)

VCTK_p374 (M)

VCTK_p376 (M)


English Language

English is a West Germanic language of the Indo-European family that originated in England and is now one of the most widely spoken languages in the world, serving as a global lingua franca. It is closely related to German and Dutch. Modern English is written with the 26-letter Latin alphabet; Old English additionally used letters such as ð and þ. English has a large and diverse vocabulary, a wide range of vowel and consonant sounds, and spelling that often does not map directly to pronunciation.

VCTK Dataset

The VCTK dataset (the CSTR VCTK Corpus) is a multi-speaker English speech dataset containing recordings from roughly 110 speakers with a range of accents. It is commonly used for training and evaluating multi-speaker speech synthesis and voice conversion models.


FastPitch

FastPitch is a fully parallel (non-autoregressive) text-to-speech (TTS) model based on FastSpeech. It predicts a duration and a pitch value for each input symbol and conditions mel-spectrogram generation on them, so all output frames are produced in parallel rather than one at a time. This significantly reduces inference time compared with autoregressive models while maintaining high quality, which makes it suitable for real-time applications, interactive systems, and other scenarios where low-latency speech synthesis is desired. The generated mel-spectrogram is converted to a waveform by a separate vocoder.
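
The duration modeling mentioned above can be illustrated with a toy length regulator: each input symbol is repeated for as many output frames as its predicted duration, which lets all frames be generated in parallel. This is a simplified sketch, not the model's actual code. The function names are hypothetical, and the hop length of 256 samples per frame is an assumption that this page does not state; only the 22050 Hz sampling rate comes from the model description.

```python
SAMPLE_RATE = 22050  # Hz, from the model description
HOP_LENGTH = 256     # assumed samples per spectrogram frame (not stated here)


def length_regulate(symbol_vectors, durations):
    """Repeat each symbol's vector durations[i] times to reach frame rate."""
    if len(symbol_vectors) != len(durations):
        raise ValueError("need exactly one duration per input symbol")
    frames = []
    for vector, n_frames in zip(symbol_vectors, durations):
        frames.extend([vector] * n_frames)
    return frames


def frames_to_seconds(n_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Convert a frame count to audio length in seconds."""
    return n_frames * hop_length / sample_rate


if __name__ == "__main__":
    # Toy stand-ins for per-symbol encoder outputs and predicted durations.
    symbols = ["h", "e", "l", "o"]
    predicted_durations = [3, 5, 4, 6]
    frames = length_regulate(symbols, predicted_durations)
    print(len(frames), "frames,", round(frames_to_seconds(len(frames)), 3), "seconds")
```

In the real model the repeated items are learned vectors combined with pitch information, and a parallel decoder turns the expanded sequence into a mel-spectrogram.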

