Catalan TTS Model: 257 Voices, VITS Encoding, Trained on a Custom Dataset at 22050 Hz

Catalan (català) text-to-speech model with 257 voices, trained on a custom dataset at 22050 Hz, for synthesizing Catalan speech.

Model Description

This Catalan (català) text-to-speech model contains 257 voices and was trained on a custom dataset at a 22050 Hz sample rate. It synthesizes speech in the Catalan language and is based on the VITS architecture.

pip install tts
tts --text "Hello, world!" --model_name tts_models/ca/custom/vits
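
Because the model bundles 257 voices, a synthesis call usually has to name one of them. The sketch below builds such a command in Python without running it. It assumes the `tts` command is the Coqui TTS command-line tool, whose `--speaker_idx` and `--out_path` options select a voice and an output file, and that the IDs listed under Voice Samples (for example `pau`) are accepted as speaker names by this model. `build_tts_command` is a hypothetical helper, not part of the package.

```python
import shlex

MODEL_NAME = "tts_models/ca/custom/vits"  # model name from the command above


def build_tts_command(text, speaker, out_path="output.wav"):
    """Return the argument list for one synthesis call with a chosen voice."""
    return [
        "tts",
        "--text", text,
        "--model_name", MODEL_NAME,
        "--speaker_idx", speaker,  # assumption: voice IDs double as speaker names
        "--out_path", out_path,
    ]


if __name__ == "__main__":
    cmd = build_tts_command("Bon dia a tothom!", "pau", "pau.wav")
    # Print a copy-pasteable shell line; subprocess.run(cmd) would execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the text argument intact when it contains spaces or quotes, and `shlex.join` handles the quoting when the command is printed for a shell.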

Voice Samples

00236e350cc8 (M)

00459 (M)

00762 (M)

Remaining 254 voices:

00983a845f95 (M)

01591 (F)

02452 (F)

02689 (M)

02992 (M)

02f7d61edf50 (M)

03115 (M)

03386 (F)

03655 (F)

03944 (F)

04247 (M)

04484 (M)

04787 (M)

04910 (M)

05147 (M)

056d7638d714 (F)

05739 (M)

06008 (F)

06042 (F)

06279 (M)

06311 (F)

06582 (M)

06705 (M)

06942 (F)

06c6d2e09362 (F)

07140 (M)

07245 (F)

07803 (F)

08001 (M)

08106 (F)

085503e68b07 (F)

08664 (M)

08935 (M)

08967 (M)

09204 (F)

09598 (M)

09901 (F)

0befb1084ad0 (M)

0c6bf6782176 (M)

0d0a943d348b (M)

0da83aed1427 (M)

0ff19536d614 (F)

125d9d1721de (M)

1378866a4d2b (M)

14bc32c10eb2 (M)

151fcb1168f4 (M)

1610e2960395 (M)

1887c37f4187 (M)

1add23d44d2d (F)

1b7fc0c4e437 (M)

1b8354b1fe92 (F)

1be6c773da63 (M)

1c7af1cc1357 (M)

1c7f19a7fa0b (M)

1c80e9d982aa (F)

2256cc5ee6c6 (M)

238532dddf77 (M)

241ca4fdf212 (M)

2421aa51a089 (M)

24d967d0e8b8 (M)

25911630ab15 (M)

26099adbc4db (F)

28e2fe1944a5 (F)

2b59e9f830e5 (M)

2bc2a177bf56 (F)

2ce84c6ea6aa (M)

2d84f39c2cca (F)

2e6ccdf9f0a7 (M)

2f92b4704080 (M)

2fb95c3b786f (M)

30b1f81c5797 (M)

31535cb2ece4 (M)

31e6f3a01166 (F)

32550810ba55 (M)

336f82b4645b (M)

35b962b08846 (M)

3637902e0d19 (M)

3723bd65a05a (M)

373d86f9fa3a (M)

379d321bff71 (M)

37c12c700c95 (M)

3a4a32c7cff1 (M)

404ecea5ae8e (M)

41e5e21b3a3b (F)

464d9ac63f79 (M)

4869d94d4936 (M)

496b66c9cb70 (M)

49a765407153 (M)

4b6c7e4e9bde (F)

4bce212aca40 (F)

4cedaa8d9643 (M)

4d7e2548403c (M)

4de9f262eee7 (M)

4e5e58a6ec7d (M)

4ec8f1e81d7a (M)

4f57d1abde33 (M)

503dbbe83f01 (M)

51795e8ea8fa (M)

52cfac480c0c (M)

537e815df933 (M)

547dd49c2cbe (M)

54f344faa37d (M)

56071bfe30e9 (F)

57e5f7cc5fac (M)

5a9a6481f136 (F)

5ba168675a3f (M)

5da56ed89657 (M)

5ebf04dfec6c (M)

620b0d4c3be9 (F)

6323ec0401b2 (F)

633e7303eae4 (F)

6688b60c24d0 (F)

6745c47d0bd5 (F)

6892c6ba9f66 (F)

689a213fd2d6 (M)

696e88087171 (M)

6bdec6b6f7e6 (M)

6e5948f904b3 (F)

7115c00371f8 (M)

71b67ba5ec75 (M)

72a3d5bde83f (M)

73d3685f3e78 (M)

74a679bf6c4a (M)

7638395f7d47 (M)

76383f56d997 (M)

77cd12af0a3d (M)

7834da277192 (M)

79a830901c1b (F)

7b7593f44cc6 (M)

7c7d917d9741 (F)

7d19dccf4811 (M)

7d8d6fa22ff7 (M)

7e36be2204fe (M)

7ff908cc2a18 (M)

8154716e77ac (M)

8162d651b621 (F)

8348c81a2530 (F)

84b101db8d07 (M)

853fb95e0f01 (M)

85c9e13ccfc0 (M)

85ea0b349a8d (F)

88673d4f24d0 (M)

88ec4ff5a1b0 (M)

892bf89bd3a0 (M)

894bd433b4b0 (F)

896256329fbe (M)

897c3401b4a3 (M)

89e6f6a865ab (F)

8b707d4f8f32 (F)

8e98d00c5d11 (M)

90bb7c91281b (M)

911c26cf8283 (F)

92862e616dce (F)

92a15e2cbd0c (M)

97679def7032 (F)

97e29f9edfe7 (F)

9b5f9ebc9614 (M)

9b847b5006ea (F)

9cdf4ab91c8e (M)

9fb127fbe465 (F)

9fe6ba948da2 (M)

a1afb2eae495 (M)

a2b06b546791 (M)

a2b503bc78bd (M)

a359c15185b6 (F)

a35dea43a67c (M)

a4b1eb406ff2 (F)

a4b8fa949865 (M)

a6bc3c6beffd (M)

aabfdbdc2115 (F)

af506d21ee14 (M)

b04a1d5062f2 (M)

b0a3c5148905 (M)

b1a0cbb91459 (M)

b47a96b489f4 (F)

b52e493e5049 (F)

b5419f6ea89d (M)

b570d19edbda (M)

baff09432cff (M)

bc0b544f1c13 (M)

bc3886ba087d (M)

bd609b6955a6 (M)

bet (F)

bf64f21ff129 (M)

bfe8d96ce71f (M)

c088e98f02d3 (M)

c1bafe50eb70 (M)

c1e166044d77 (M)

c21ee3641607 (M)

c3f1018eb1f7 (M)

c4d740361d5f (M)

c5d4c712e060 (F)

c777d3358a0a (M)

c96c4e97012d (F)

c9774fae6c0a (M)

cb557116fa7b (M)

cc3b30ba0f73 (M)

ccd85fb40538 (M)

cd1226e73c82 (M)

cdc5df38351e (M)

ce31dc5dfa61 (M)

cefa12e7ac99 (M)

cf5b890eb74b (F)

cf8c583b1282 (F)

d0cd44fcdae6 (M)

d15bfc3278de (F)

d3d64ab67746 (M)

d647b73602a3 (M)

d98d182c89b4 (M)

dafd89491990 (M)

db6932752693 (M)

db8eecd1ac9b (M)

dbe9efadf636 (M)

dca1aa77f919 (M)

dee065b956b9 (M)

df52eb2c24a6 (M)

dfc8721858bd (M)

e249989b0c39 (M)

e364856fe22a (M)

e37d85b60af5 (M)

e41b679ec144 (M)

e61565e75d63 (M)

e6a64aa839b9 (F)

e751d2f83310 (M)

e7847a5814b8 (M)

e82ba384934a (M)

e9da05b6d590 (F)

ea8456e0667e (F)

eb415e110eaf (M)

eb5078bcb64f (M)

ed5c9e654bfb (M)

edba91511ccf (M)

ee216d2d13cb (F)

eli (F)

eva (F)

f1812dbb566e (M)

f26a63e5171e (M)

f2f359ea473c (F)

f35ce011f75f (F)

f4df4a067fec (M)

f56a47b89ebd (M)

f61bdd3abb2d (M)

f62196a11f50 (M)

f8e4bf2dd4f9 (M)

f980d152d5c1 (F)

fa8641fb64db (M)

fdde8cdd2fa5 (F)

jan (M)

mar (F)

ona (F)

pau (M)

pep (M)

pol (M)

teo (M)
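
Each entry in the list above pairs a voice ID with an (M) or (F) label, presumably a male/female marker. As a minimal sketch of how the list could be turned into a lookup table, for example to pick only voices with one label, the following parses a few entries copied from it. The function names are hypothetical and chosen for illustration.

```python
import re

# A few entries copied from the list above, in the same "<id> (<M|F>)" format.
VOICE_LINES = """
00236e350cc8 (M)
01591 (F)
bet (F)
pau (M)
"""

ENTRY = re.compile(r"^(\S+) \((M|F)\)$")


def parse_voices(text):
    """Map each voice ID to its label, skipping blank or malformed lines."""
    voices = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            voices[match.group(1)] = match.group(2)
    return voices


def voices_by_label(voices, label):
    """Return the sorted IDs carrying the given label ("M" or "F")."""
    return sorted(vid for vid, lab in voices.items() if lab == label)


if __name__ == "__main__":
    voices = parse_voices(VOICE_LINES)
    print(voices_by_label(voices, "F"))
```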

Catalan (català)

Catalan is a Romance language primarily spoken in Catalonia, a region in northeastern Spain, as well as in the Balearic Islands, Valencia, and Andorra. It is also spoken by communities in the Roussillon region of France and the city of Alghero in Sardinia, Italy. Catalan has its roots in the Vulgar Latin spoken in the early Middle Ages. It is known for its distinct phonetic features, including the presence of voiced and voiceless alveolar sibilants and a contrast between dental and alveolar consonants.

Custom Dataset

A custom dataset is a user-defined dataset created for a particular task or project. It can include speech recordings and associated metadata tailored to that project's requirements.
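
The files and layout of this dataset are not described here, so the following is only a sketch of one check that is often useful when assembling speech recordings for a 22050 Hz model: confirming a WAV file's sample rate and duration. It uses only Python's standard `wave` module. `wav_info` and `make_silent_wav` are hypothetical helper names, and the silent clip stands in for a real recording.

```python
import io
import wave

TARGET_RATE = 22050  # sample rate stated for this model


def wav_info(source):
    """Return (sample_rate_hz, duration_seconds) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def make_silent_wav(seconds, rate=TARGET_RATE):
    """Build an in-memory 16-bit mono WAV of silence as a stand-in recording."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rate, duration = wav_info(make_silent_wav(2))
    # A recording at another rate would need resampling before use.
    print(rate == TARGET_RATE, duration)
```

Duration is simply the frame count divided by the sample rate, so a 2-second clip at 22050 Hz holds 44100 frames.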

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)

VITS is an end-to-end neural architecture for speech synthesis. It combines a conditional variational autoencoder, normalizing flows, and adversarial training so that a single model maps text directly to an audio waveform, without a separate vocoder stage. A stochastic duration predictor lets the same text be spoken with different rhythms, which helps the output sound natural rather than mechanical. In multi-speaker models such as this one, generation is also conditioned on a speaker identity, so one model can produce many distinct voices. This technology has a wide range of applications, from creating realistic voice assistants to helping people with speech impairments communicate more effectively.
