Inconsistent speaker voice across generations

by Jernik - opened May 21

May 21

I tried generating outputs one at a time with speaker ID 0, but I'm not getting a consistent voice; it seems to change across generations. Did you experience the same issue? Were you able to get the same voice every time, or did it vary for you too?

senstella

Owner May 21

Hello! The reason you are getting random voices is that this is the base model: speaker IDs exist only to keep each speaker consistent within a conversation. To get a consistent voice from the base model, you need to provide context. Try giving one or two samples of the voice you want to generate beforehand! That way, you can get a consistent voice.
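
For anyone reading along, here is a rough sketch of what "providing context" could look like. The `Segment` class and `build_context` helper are hypothetical names made up for illustration, not this model's actual API; the idea is only to select one or two reference clips from the target speaker and pass them along with the new text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """One turn of conversational context (hypothetical structure)."""
    speaker: int            # speaker ID used in the prompt
    text: str               # transcript of the reference clip
    audio: Sequence[float]  # reference waveform samples (placeholder type)


def build_context(references: List[Segment], speaker: int, max_refs: int = 2) -> List[Segment]:
    """Keep up to `max_refs` reference clips that belong to the target speaker."""
    same_speaker = [seg for seg in references if seg.speaker == speaker]
    return same_speaker[:max_refs]


# Dummy reference clips; real ones would hold actual audio of the target voice.
refs = [
    Segment(speaker=0, text="Hello there.", audio=[0.0, 0.1, -0.1]),
    Segment(speaker=1, text="A different voice.", audio=[0.2, 0.0]),
    Segment(speaker=0, text="Nice to meet you.", audio=[0.05, -0.02]),
]
context = build_context(refs, speaker=0)
```

The resulting `context` would then go into the generation call together with the new text and the same speaker ID; how exactly that call looks depends on the inference code you are using.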

senstella changed discussion status to closed May 21

Jernik

May 26

We have fine-tuned the model with 1,000 samples from a single speaker using a single speaker ID, yet we still can't get a consistent speaker voice from the model.

senstella

Owner Jun 2

Try adjusting the LR and batch size, and make sure the model overfits the training samples a little! If that doesn't work, some rejection training on wrong samples may be required, preferably with offline RL methods such as DPO or KTO. We have DPO and KTO implementations in the GitHub repository if you want to try them out, but please note that they're very experimental!
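
To make the DPO idea concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. This is the textbook formulation, not the implementation in the repository; the function name, argument names, and the default `beta` are assumptions. "Chosen" would be a generation in the correct voice and "rejected" one in the wrong voice, each scored by its summed log-probability under the fine-tuned (policy) model and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy prefers the chosen sample than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: zero margin.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy has shifted probability toward the chosen sample: lower loss.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

The loss drops as the policy assigns relatively more probability to the right-voice sample than the reference model does, which is the "rejection" pressure described above.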

