
This is a fine-tune of the 7B VibeVoice model. It requires 21.9 GB of VRAM for inference (through OpenVoiceLab).


Fine-tuning was done using the code available here:
https://github.com/voicepowered-ai/VibeVoice-finetuning



The dataset used for fine-tuning consisted of 764 audio files from Halo 1, Halo 2, and Halo 3, available via the links below:
Halo 1: https://sounds.spriters-resource.com/xbox/halocombatevolved/asset/413569/
Halo 2: https://sounds.spriters-resource.com/xbox/halo2/asset/436393/?source=genre
Halo 3: https://sounds.spriters-resource.com/xbox_360/halo3/asset/405404/



Fine-tuning parameters used were:

batch_size = 1
drop_rate = 0.2
grad_accum = 1
lr = 2.5e-5
lora_r = 128
lora_alpha = 512
epochs = 20
train_diff = True
bf16 = True
grad_clip = True
max_grad = 0.8
grad_checkpoint = False
diff_weight = 1.4
ce_weight = 0.04
warmup = 0.03
scheduler = "cosine"
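For context, some of these values imply others: the LoRA adapter outputs are typically scaled by lora_alpha / lora_r (here 512 / 128 = 4), and warmup = 0.03 means roughly 3% of optimizer steps are spent warming up. A minimal sketch of those derived quantities (plain Python, no training framework assumed; steps_per_epoch = 764 is an assumption based on batch_size = 1, grad_accum = 1, and the 764-file dataset):

```python
# Values copied from the fine-tuning parameters above.
lora_r = 128
lora_alpha = 512
warmup = 0.03
epochs = 20

# Assumption: one optimizer step per audio file,
# since batch_size = 1 and grad_accum = 1.
steps_per_epoch = 764

# LoRA adapter outputs are conventionally scaled by alpha / r
# before being added to the frozen base weights.
lora_scaling = lora_alpha / lora_r  # 4.0

total_steps = epochs * steps_per_epoch    # 15280
warmup_steps = int(warmup * total_steps)  # 458

print(lora_scaling, total_steps, warmup_steps)
```

A larger alpha-to-r ratio like this makes the adapter's contribution relatively strong, which pairs with the fairly low learning rate of 2.5e-5.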


Special thanks to mrfakename for creating OpenVoiceLab, a fantastic resource for both inference and fine-tuning, with quite a nice GUI.