Persian Female TTS Model: Glow-TTS Encoding, Trained on a Custom Dataset at 24000 Hz

A Persian (فارسی) female text-to-speech model, trained at 24000 Hz, for synthesizing Persian speech.

Model Description

This Persian (فارسی) female text-to-speech model was trained on a custom dataset at 24000 Hz and synthesizes Persian speech. The model is based on the Glow-TTS encoder.

pip install tts
tts --text "سلام دنیا!" --model_name tts_models/fa/custom/glow-tts
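
To synthesize several sentences from a script, the command above can be wrapped in a small helper. The following is a minimal sketch, not an official API: it assumes the `tts` command-line tool is installed and on the PATH and that it accepts the `--out_path` option for choosing the output file. The function names and the example file names are hypothetical.

```python
import subprocess

# Model name taken from the command shown above.
MODEL_NAME = "tts_models/fa/custom/glow-tts"


def build_tts_command(text, out_path, model_name=MODEL_NAME):
    """Return the argument list for a single `tts` CLI call."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,  # assumed CLI option for the output WAV file
    ]


def synthesize(text, out_path, model_name=MODEL_NAME):
    """Run the `tts` CLI; raises CalledProcessError if the command fails."""
    subprocess.run(build_tts_command(text, out_path, model_name), check=True)
    return out_path


# Example usage (hypothetical sentences and file names):
# for i, sentence in enumerate(["سلام دنیا!", "صبح بخیر."]):
#     synthesize(sentence, f"sample_{i}.wav")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with Persian text and punctuation.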

Voice Samples

default (F)

Persian (فارسی)

Persian, also known as Farsi, is an Indo-Iranian language spoken primarily in Iran, Afghanistan (where it is known as Dari), and Tajikistan (where it is known as Tajik). It has a history dating back over 2,500 years and has influenced many other languages in the region. In Iran and Afghanistan, Persian is written in a modified Arabic script in which short vowels are usually left unmarked, so the pronunciation of a word often has to be inferred from context. It is known for its poetic tradition and has been a language of art, literature, and science throughout history.

Custom Dataset

A custom dataset is one assembled for a particular task or project rather than taken from a public corpus. It typically includes speech recordings and associated metadata, such as transcripts, tailored to the project's requirements.
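
Because the model is trained at 24000 Hz, it can be useful to confirm that every recording in such a dataset uses that sample rate. The sketch below is one way to do this with Python's standard `wave` module. It assumes the recordings are uncompressed PCM WAV files in a single directory; the directory layout and function name are hypothetical and not taken from the actual dataset.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 24000  # Hz, the training rate stated for this model


def find_rate_mismatches(wav_dir, expected_rate=EXPECTED_RATE):
    """Return {file name: sample rate} for WAV files not at expected_rate."""
    mismatches = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches[path.name] = rate
    return mismatches


# Example usage (hypothetical directory name):
# print(find_rate_mismatches("my_dataset/wavs"))
```

An empty result means every file found already matches the expected rate; any file listed would need resampling before training.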

Glow-TTS

Glow-TTS is a flow-based generative model for text-to-speech synthesis. The name comes from the paper "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search". A text encoder processes the input text, and a flow-based decoder, built from a stack of invertible transformations, produces a mel-spectrogram from it. During training, the model learns the alignment between text and speech on its own using monotonic alignment search, so it does not need an external aligner. At inference time, the spectrogram frames are generated in parallel rather than one at a time, which makes synthesis fast, and a separate vocoder converts the spectrogram into an audio waveform. With suitable training data, the result can be natural-sounding speech with clear enunciation and natural prosody. Applications include voice assistants, automated voiceovers, and interactive systems that need realistic speech synthesis.
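
The invertible transformations that a flow-based model such as Glow-TTS is built from can be illustrated with an affine coupling layer, one of the building blocks used in Glow-style flows. The following is a toy, pure-Python sketch, not the model's actual code: `_scale_and_shift` is a hypothetical stand-in for the small neural network that a real layer would learn, and the inputs are plain lists of numbers rather than spectrogram frames.

```python
import math


def _scale_and_shift(x_a):
    """Toy stand-in for the learned network: returns log-scales and shifts."""
    log_scales = [math.tanh(0.5 * v) for v in x_a]  # bounded log-scale
    shifts = [0.1 * v + 0.2 for v in x_a]
    return log_scales, shifts


def _split(values):
    """Split an even-length list into two equal halves."""
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(values) // 2
    return values[:half], values[half:]


def coupling_forward(x):
    """Map x to y; the first half passes through unchanged."""
    x_a, x_b = _split(x)
    log_scales, shifts = _scale_and_shift(x_a)
    y_b = [xb * math.exp(s) + t for xb, s, t in zip(x_b, log_scales, shifts)]
    log_det = sum(log_scales)  # log |det Jacobian| of this transformation
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Recover x from y exactly, using the unchanged first half."""
    y_a, y_b = _split(y)
    log_scales, shifts = _scale_and_shift(y_a)
    x_b = [(yb - t) * math.exp(-s) for yb, s, t in zip(y_b, log_scales, shifts)]
    return y_a + x_b


# Example usage:
# y, log_det = coupling_forward([0.5, -1.0, 2.0, 0.3])
# x_again = coupling_inverse(y)  # matches the input up to rounding error
```

Because the first half is left unchanged, the scale and shift can be recomputed from the output alone, which is what makes the layer exactly invertible and its log-determinant cheap to compute.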