voidful/mhubert-unit-tts
This repository provides a text-to-unit model for mHuBERT units, trained with a BART model.
The model was trained on the LibriSpeech ASR dataset for the English language.
Results at training epoch 13: WER 30.41, CER 20.22.
Hubert Code TTS Example
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the unit-based HiFi-GAN vocoder checkpoint into the current directory.
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# Load the text-to-unit model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()

# Wrap the vocoder checkpoint so unit codes can be turned into a waveform.
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# Generate the unit sequence for the input text.
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
# Strip the special tokens and convert each "v_tok_<id>" token to an integer unit ID.
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)

# Synthesize audio from the unit codes and play it (in a notebook).
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
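The example above turns the decoded string into integer unit IDs by stripping `<s>` and `</s>` and splitting on `v_tok_`. The sketch below isolates that step so it can be tried without downloading the model. The helper name `parse_units` and the sample string are hypothetical, and the regex approach is a swapped-in alternative to the split-based one-liner: it simply ignores anything that is not a `v_tok_<id>` token, on the assumption that unit tokens always follow that pattern.

```python
import re
from typing import List


def parse_units(decoded: str, prefix: str = "v_tok_") -> List[int]:
    """Extract integer unit IDs from a decoded string.

    Hypothetical helper: assumes every unit token looks like
    '<prefix><digits>', as in the example above.
    """
    pattern = re.escape(prefix) + r"(\d+)"
    return [int(match) for match in re.findall(pattern, decoded)]


# Made-up decoded output, only to illustrate the format.
example = "</s><s>v_tok_71v_tok_12v_tok_4v_tok_4</s>"
units = parse_units(example)
print(units)
```

The resulting list can be passed to `cs(...)` in the same way as `code` in the example above.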
Datasets: The model was trained on the LibriSpeech ASR dataset for the English language.
Language: The model is trained for the English language.
Metrics: The model's performance is evaluated using Word Error Rate (WER).
Tags: The model can be tagged with "hubert" and "tts".
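The card does not say how the WER and CER figures were computed. As a rough illustration of the usual definitions only, the sketch below computes both as a Levenshtein edit distance divided by the reference length, expressed as a percentage. The function names and the sample sentences are hypothetical, and no text normalization is applied, which a real evaluation would likely need.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up sentences: one substituted word, which is one substituted character.
reference = "the quick brown fox"
hypothesis = "the quick brown box"
print(f"WER: {wer(reference, hypothesis):.2f}  CER: {cer(reference, hypothesis):.2f}")
```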