English text-to-speech model with 108 voices, trained on the VCTK dataset at 22050 Hz, for synthesizing English (en).
Model Description
This English text-to-speech model contains 108 voices trained on the VCTK dataset at 22050 Hz and synthesizes English (language code en). The model is based on the FastPitch (fast_pitch) architecture.
pip install tts
tts --text "Hello, world!" --model_name tts_models/en/vctk/fast_pitch
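Because the model contains many voices, a call will usually need to select one of them. The following is a minimal Python sketch that builds such a command, not a definitive recipe. It assumes the installed `tts` command accepts the `--speaker_idx` and `--out_path` options, and that speaker IDs match the names in the voice list (for example `VCTK_p225`); the helper name `build_tts_command` is hypothetical. Running `tts --model_name tts_models/en/vctk/fast_pitch --list_speaker_idxs` should show the IDs the model actually accepts.

```python
import shlex

MODEL_NAME = "tts_models/en/vctk/fast_pitch"  # model name from the command above


def build_tts_command(text, speaker, out_path="output.wav"):
    """Return the argument list for one `tts` call with an explicit speaker.

    `speaker` is an assumed speaker ID format; verify it against the
    output of `--list_speaker_idxs` before relying on it.
    """
    return [
        "tts",
        "--text", text,
        "--model_name", MODEL_NAME,
        "--speaker_idx", speaker,
        "--out_path", out_path,
    ]


if __name__ == "__main__":
    cmd = build_tts_command("Hello, world!", "VCTK_p225", "p225.wav")
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the arguments as a list keeps the text safe from shell quoting problems when it is later passed to `subprocess.run`.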
Voice Samples
VCTK_p225 (M)
VCTK_p226 (M)
VCTK_p227 (M)
VCTK_p228 (F)
VCTK_p229 (F)
VCTK_p230 (F)
VCTK_p231 (F)
VCTK_p232 (M)
VCTK_p233 (F)
VCTK_p234 (F)
VCTK_p236 (F)
VCTK_p237 (M)
VCTK_p238 (F)
VCTK_p239 (F)
VCTK_p240 (F)
VCTK_p241 (M)
VCTK_p243 (M)
VCTK_p244 (F)
VCTK_p245 (M)
VCTK_p246 (M)
VCTK_p247 (M)
VCTK_p248 (F)
VCTK_p249 (F)
VCTK_p250 (F)
VCTK_p251 (M)
VCTK_p252 (M)
VCTK_p253 (F)
VCTK_p254 (M)
VCTK_p255 (M)
VCTK_p256 (M)
VCTK_p257 (F)
VCTK_p258 (M)
VCTK_p259 (M)
VCTK_p260 (M)
VCTK_p261 (F)
VCTK_p262 (F)
VCTK_p263 (M)
VCTK_p264 (F)
VCTK_p265 (F)
VCTK_p266 (F)
VCTK_p267 (F)
VCTK_p268 (F)
VCTK_p269 (F)
VCTK_p270 (M)
VCTK_p271 (M)
VCTK_p272 (M)
VCTK_p273 (M)
VCTK_p274 (M)
VCTK_p275 (M)
VCTK_p276 (F)
VCTK_p277 (F)
VCTK_p278 (M)
VCTK_p279 (M)
VCTK_p280 (F)
VCTK_p281 (M)
VCTK_p282 (M)
VCTK_p283 (F)
VCTK_p284 (M)
VCTK_p285 (M)
VCTK_p286 (M)
VCTK_p287 (M)
VCTK_p288 (F)
VCTK_p292 (M)
VCTK_p293 (F)
VCTK_p294 (M)
VCTK_p295 (F)
VCTK_p297 (F)
VCTK_p298 (M)
VCTK_p299 (F)
VCTK_p300 (F)
VCTK_p301 (F)
VCTK_p302 (M)
VCTK_p303 (F)
VCTK_p304 (M)
VCTK_p305 (F)
VCTK_p306 (F)
VCTK_p307 (F)
VCTK_p308 (F)
VCTK_p310 (F)
VCTK_p311 (M)
VCTK_p312 (F)
VCTK_p313 (F)
VCTK_p314 (F)
VCTK_p316 (M)
VCTK_p317 (F)
VCTK_p318 (F)
VCTK_p323 (F)
VCTK_p326 (M)
VCTK_p329 (F)
VCTK_p330 (F)
VCTK_p333 (F)
VCTK_p334 (M)
VCTK_p335 (F)
VCTK_p336 (F)
VCTK_p339 (F)
VCTK_p340 (F)
VCTK_p341 (F)
VCTK_p343 (M)
VCTK_p345 (M)
VCTK_p347 (M)
VCTK_p351 (F)
VCTK_p360 (M)
VCTK_p361 (F)
VCTK_p362 (F)
VCTK_p363 (M)
VCTK_p364 (M)
VCTK_p374 (M)
VCTK_p376 (M)
English
English is a West Germanic language that originated in England and is now one of the most widely spoken languages in the world. It belongs to the Indo-European language family and is closely related to German and Dutch. English has a diverse vocabulary and is known for its global influence as a lingua franca. Modern English is written in the Latin alphabet; Old English additionally used letters such as ð and þ. English has a complex sound system with a wide range of vowel and consonant sounds.
VCTK Dataset
The VCTK dataset (the CSTR VCTK Corpus) is a multi-speaker English speech dataset with recordings from roughly 110 speakers with a range of accents. It is commonly used for training and evaluating speech synthesis and voice conversion models.
FastPitch
FastPitch is a non-autoregressive text-to-speech (TTS) model. Instead of generating output frames one at a time, it predicts a duration and a pitch value for each input symbol and then produces the whole mel-spectrogram in parallel. This significantly reduces inference time compared to traditional autoregressive models, which makes it suitable for real-time applications, interactive systems, and other scenarios where low-latency speech synthesis is desired.
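To illustrate why duration modeling allows fast synthesis, here is a toy sketch of the length-regulation idea used by duration-based TTS models: each input symbol's representation is repeated for its predicted number of output frames. This is a simplified illustration, not this model's implementation; the function name `length_regulate` and the sample values are hypothetical.

```python
def length_regulate(symbol_states, durations):
    """Repeat each symbol's state by its predicted duration (in frames)."""
    if len(symbol_states) != len(durations):
        raise ValueError("need exactly one duration per input symbol")
    frames = []
    for state, n_frames in zip(symbol_states, durations):
        frames.extend([state] * n_frames)
    return frames


if __name__ == "__main__":
    symbols = ["h", "i"]   # toy stand-ins for per-symbol encoder outputs
    durations = [2, 3]     # hypothetical predicted frame counts
    print(length_regulate(symbols, durations))  # ['h', 'h', 'i', 'i', 'i']
```

Because the output length is known up front as the sum of the durations, all frames can then be computed in parallel rather than one after another.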