Replies: 2 comments
-
I'm no expert, but I tried this too. After about 4000 epochs of training on a hundred 20-second clips, it still sounds really buzzy and is unintelligible. Weirdly, it does sound like the voiced person (pitch, accent, 'modulation'), just not speaking intelligible words. Fine-tuning the lessac model with the same data: perfect. Repeating that with my own data recorded via piper-recording, 100 samples: also fine. My guess is that to train a model to a good working state from scratch, you need a ____load of data.
-
I went to 10,000 epochs and it still wasn't quite right, so I think your theory is spot on: the base models were probably trained on thousands of clips.
-
I'm experimenting with training a brand-new model, not starting from an existing checkpoint.
I have about 130 WAV files.
Are there any recommendations for --validation-split and --num-test-examples?
The docs show both as zero, but that may assume you're fine-tuning from an existing checkpoint.
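For what it's worth, here is how I understand those two options would carve up a small dataset, assuming --validation-split is a fraction of the total and --num-test-examples is an absolute count (that's my reading, not something confirmed in the docs):

```python
# Sketch of how a small dataset might be partitioned, assuming
# --validation-split is a fraction (0.0-1.0) and
# --num-test-examples is an absolute number of clips.
def partition(total, validation_split, num_test_examples):
    n_val = int(total * validation_split)          # clips held out for validation
    n_test = num_test_examples                     # clips held out for testing
    n_train = total - n_val - n_test               # everything else trains
    return n_train, n_val, n_test

# With 130 files, a 5% validation split and 5 test clips
# still leaves 119 clips for training.
print(partition(130, 0.05, 5))  # → (119, 6, 5)
```

With only ~130 clips, even a small split eats a noticeable share of the training data, which may be why the docs default both to zero.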
Thanks,
Rob