Spanish TTS Model with Emotions and Multiple Voices

This repository contains a fine-tuned Spanish Text-to-Speech (TTS) model based on canopylabs/3b-es_it-pretrain-research_release. The model supports multiple voices and nuanced emotions. It was fine-tuned with Unsloth, using SNAC for audio tokenization.

➑️ Try it online: https://huggingface.co/spaces/sirekist98/orpheustts_spanish_tuned


👨‍💻 Model Summary

  • Base model: canopylabs/3b-es_it-pretrain-research_release
  • Fine-tuned with: LoRA adapters (rank 64, alpha 64)
  • Audio tokenization: SNAC (24kHz)
  • Input format: source (emotion): text
  • Dataset: ~109k samples, 11 emotions × 11 speakers
  • Training framework: Unsloth + Hugging Face Transformers

🚀 Training Overview

The model was trained on a curated subset of the dataset sirekist98/spanish_tts_noauddataset_24khz. We selected combinations of speaker (source) and emotion with at least 1000 samples, resulting in a balanced dataset of over 109,000 examples.
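The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for training: the row layout (dictionaries with `source` and `emotion` keys) and the helper name `select_balanced` are assumptions made for the example.

```python
from collections import Counter

MIN_SAMPLES = 1000  # threshold stated in the training overview


def select_balanced(rows, min_samples=MIN_SAMPLES):
    """Keep rows whose (source, emotion) pair occurs at least min_samples times.

    `rows` is assumed to be a list of dicts with "source" and "emotion" keys;
    the real dataset columns may be named differently.
    """
    counts = Counter((r["source"], r["emotion"]) for r in rows)
    kept = {pair for pair, n in counts.items() if n >= min_samples}
    return [r for r in rows if (r["source"], r["emotion"]) in kept]
```

With a toy threshold, `select_balanced(rows, min_samples=2)` drops every speaker and emotion pair that appears only once and keeps the rest in their original order.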

Each sample was tokenized using SNAC and embedded in a prompt structured as:

source (emotion): text

This prompt was then used to generate audio tokens, enabling the model to learn nuanced emotional prosody and voice control.
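As a small illustration of the prompt structure, the helper below assembles the `source (emotion): text` string. The function name `build_prompt` is hypothetical and not part of the repository.

```python
def build_prompt(source: str, emotion: str, text: str) -> str:
    """Format a prompt as `source (emotion): text`.

    `source` is the voice name and `emotion` is one of the emotion labels
    listed in this model card; surrounding whitespace in `text` is dropped.
    """
    return f"{source} ({emotion}): {text.strip()}"


prompt = build_prompt(
    "alloy",
    "intense_fear_dread_apprehension_and_horror",
    "Estoy atrapado, por favor ayúdame.",
)
```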

We trained the model for 1 epoch using gradient accumulation (batch size 8 × 4 accumulation steps) with 4-bit quantization on an NVIDIA L4 GPU.
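The arithmetic behind these settings is worked through below. The sample count is rounded to 109,000 (the card only says "over 109,000"), so the step count is an approximation, and a single GPU is assumed.

```python
import math

per_device_batch = 8     # batch size stated above
grad_accum_steps = 4     # accumulation steps stated above
num_samples = 109_000    # approximate; the exact dataset size is not given

# Samples consumed per optimizer update on a single GPU
effective_batch = per_device_batch * grad_accum_steps

# Optimizer updates needed to see every sample once (1 epoch)
steps_per_epoch = math.ceil(num_samples / effective_batch)

print(effective_batch, steps_per_epoch)  # 32 3407
```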


🔊 Inference

You can run inference using the demo space, Orpheus TTS Spanish Fine-Tuned: https://huggingface.co/spaces/sirekist98/orpheustts_spanish_tuned

To run inference locally with full control:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from snac import SNAC

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loads the base checkpoint; substitute the fine-tuned weights from this
# repository to get the voices and emotions listed below.
model = AutoModelForCausalLM.from_pretrained(
    "canopylabs/3b-es_it-pretrain-research_release",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
tokenizer = AutoTokenizer.from_pretrained("canopylabs/3b-es_it-pretrain-research_release")
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").to(device)

prompt = "alloy (intense_fear_dread_apprehension_and_horror): Estoy atrapado, por favor ayúdame."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
output = model.generate(input_ids)

# Postprocess generated tokens (simplified)
audio_tokens = output[0].tolist()
# Keep only audio tokens and subtract the audio-token offset
trimmed = [t - 128266 for t in audio_tokens if t >= 128266]
# Trim to a multiple of 7: each SNAC frame is 7 consecutive tokens
trimmed = trimmed[: len(trimmed) // 7 * 7]

# De-interleave each 7-token frame into the three SNAC codebook layers.
# Note: upstream Orpheus decoding also subtracts k * 4096 from the token at
# position k of each frame; this simplified version omits that step.
layer_1, layer_2, layer_3 = [], [], []
for i in range(len(trimmed) // 7):
    layer_1.append(trimmed[7*i])
    layer_2.append(trimmed[7*i+1])
    layer_3.extend(trimmed[7*i+2:7*i+4])
    layer_2.append(trimmed[7*i+4])
    layer_3.extend(trimmed[7*i+5:7*i+7])

layers = [
    torch.tensor(layer_1).unsqueeze(0).to(device),
    torch.tensor(layer_2).unsqueeze(0).to(device),
    torch.tensor(layer_3).unsqueeze(0).to(device),
]
# Decode the three code layers into a 24 kHz waveform
with torch.inference_mode():
    audio = snac_model.decode(layers).squeeze().cpu().numpy()
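To listen to the result, the decoded waveform has to be written to a file. The sketch below uses only the Python standard library and assumes the samples are mono floats in the range [-1.0, 1.0] at 24 kHz, matching the SNAC 24 kHz codec; the helper name `write_wav` is hypothetical.

```python
import struct
import wave


def write_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))          # clip out-of-range values
        frames += struct.pack("<h", int(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # mono
        wav.setsampwidth(2)             # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

For example, `write_wav(audio, "output.wav")` would save a one-dimensional array of samples to `output.wav`. Libraries such as soundfile or scipy can do the same in one call if they are already installed.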

🗣️ Available Voices

You can generate speech using the following voices (source):

alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse

🌧️ Available Emotions for each voice


alloy

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity
  • intense_sympathy_compassion_warmth_trust_and_tenderness

ash

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_astonishment_surprise_amazement_and_shock
  • intense_sympathy_compassion_warmth_trust_and_tenderness

ballad

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity

coral

  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity
  • intense_sympathy_compassion_warmth_trust_and_tenderness

echo

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sympathy_compassion_warmth_trust_and_tenderness

fable

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_sourness_tartness_and_acidity

nova

  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity
  • intense_sympathy_compassion_warmth_trust_and_tenderness

onyx

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sympathy_compassion_warmth_trust_and_tenderness

sage

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_astonishment_surprise_amazement_and_shock
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity
  • intense_sympathy_compassion_warmth_trust_and_tenderness

shimmer

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_confusion_bewilderment_disorientation_and_perplexity
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_pride_dignity_self_confidence_and_honor
  • intense_sourness_tartness_and_acidity

verse

  • intense_interest_fascination_curiosity_and_intrigue
  • intense_fear_dread_apprehension_and_horror
  • intense_ecstasy_pleasure_bliss_rapture_and_beatitude
  • intense_numbness_detachment_insensitivity_and_apathy
  • intense_contempt_disdain_loathing_and_detestation
  • intense_astonishment_surprise_amazement_and_shock
  • intense_helplessness_powerlessness_desperation_and_submission
  • intense_sourness_tartness_and_acidity
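Because not every voice was trained with every emotion, it can help to check a combination before building a prompt. The sketch below is a hypothetical helper, not part of the repository; for brevity the table holds only two voices (ash and onyx) copied from the lists above, and would need to be extended with the remaining voices.

```python
# Excerpt of the voice/emotion lists above; extend with the other voices.
EMOTIONS_BY_VOICE = {
    "ash": {
        "intense_interest_fascination_curiosity_and_intrigue",
        "intense_fear_dread_apprehension_and_horror",
        "intense_ecstasy_pleasure_bliss_rapture_and_beatitude",
        "intense_numbness_detachment_insensitivity_and_apathy",
        "intense_astonishment_surprise_amazement_and_shock",
        "intense_sympathy_compassion_warmth_trust_and_tenderness",
    },
    "onyx": {
        "intense_interest_fascination_curiosity_and_intrigue",
        "intense_fear_dread_apprehension_and_horror",
        "intense_numbness_detachment_insensitivity_and_apathy",
        "intense_confusion_bewilderment_disorientation_and_perplexity",
        "intense_helplessness_powerlessness_desperation_and_submission",
        "intense_pride_dignity_self_confidence_and_honor",
        "intense_sympathy_compassion_warmth_trust_and_tenderness",
    },
}


def is_supported(voice, emotion, table=EMOTIONS_BY_VOICE):
    """Return True if the (voice, emotion) pair appears in the table."""
    return emotion in table.get(voice, set())
```

A caller could then fall back to another emotion, or raise an error, when `is_supported(voice, emotion)` is false.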

📖 Citation

@misc{sirekist2025spanishTTS,
  author = {sirekist98},
  title = {Spanish TTS Model with Emotions and Multiple Voices},
  year = {2025},
  howpublished = {\url{https://huggingface.co/sirekist98/spanish_model}}
}

✨ Acknowledgements


❓ Questions or Contributions?

Open an issue or contact @sirekist98 on Hugging Face.

Thanks for checking out this model! 🚀
