🗣️ Fine-Tuned SpeechT5 Model

This repository (bakhil-aissa/speecht5_stoic_voice) contains a fine-tuned version of SpeechT5 for text-to-speech (TTS) generation, trained on approximately 60 minutes of audio of a voice found on YouTube (the voice may itself be AI-generated).


🧠 Model Overview

The goal of this model is to replicate the tone, rhythm, and delivery style of Andrew Tate’s speeches using the SpeechT5 architecture.
It performs well for short speech synthesis tasks but still exhibits a slightly metallic sound due to limited training data.


⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Batch Size | 8 |
| Learning Rate | 8e-5 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Training Steps | 7000 |
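
To make the scheduler setting concrete, here is a minimal sketch of a linear learning-rate schedule using the learning rate and step count listed above. The card does not say whether warmup was used, so `warmup_steps` is a hypothetical parameter that defaults to 0, and the function name `linear_lr` is illustrative. It shows the usual linear-decay-to-zero behaviour, not necessarily the exact training script.

```python
BASE_LR = 8e-5       # learning rate from the training configuration
TOTAL_STEPS = 7000   # training steps from the training configuration


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    warmup_steps is an assumption: the card does not state a warmup length.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr (end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 3500, 7000):
        print(step, linear_lr(step))
```

With no warmup, the rate starts at the full 8e-5, reaches about half of that at step 3500, and hits zero at step 7000.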

🗂️ Dataset

  • Duration: ~1 h 18 min of clean audio
  • Sampling Rate: 16 kHz
  • Format: WAV
  • Text Source: Manual transcriptions
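
A dataset with these properties can be sanity-checked before training. The sketch below uses only Python's standard `wave` module to confirm that each clip is 16 kHz and to total the duration. The helper names `wav_info` and `total_duration` are illustrative and not part of this repository, and the code assumes uncompressed PCM WAV files.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate stated in the dataset description


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return rate, duration


def total_duration(paths, expected_rate=EXPECTED_RATE):
    """Sum clip durations in seconds, rejecting clips at the wrong sample rate."""
    total = 0.0
    for path in paths:
        rate, duration = wav_info(path)
        if rate != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {rate} Hz")
        total += duration
    return total
```

Dividing the result of `total_duration(...)` by 60 gives the total in minutes, which can be compared against the duration listed above.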

🎧 Results

  • The model produces clear, expressive speech that matches Andrew Tate’s vocal tone.
  • Some metallic artifacts are still audible, likely due to the dataset size and limited training steps.
  • Further training and data augmentation could improve naturalness.

🚀 Recommendations for Improvement

  • Increase total training audio to 2–3 hours for better voice consistency.

🧩 Model Architecture

  • Base Model: microsoft/speecht5_tts
  • Fine-Tuning Framework: Hugging Face Transformers
  • Optimizer: AdamW

Example
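
The snippet below is a minimal inference sketch using the SpeechT5 classes from Hugging Face Transformers. It assumes the checkpoint id `bakhil-aissa/speecht5_stoic_voice`, the processor from the base model `microsoft/speecht5_tts`, and the `microsoft/speecht5_hifigan` vocoder, which this card does not mention. The card also does not say how speaker embeddings were produced during fine-tuning, so `speaker_embeddings` (a 512-dimensional x-vector of shape `(1, 512)`) must be supplied by the caller. The helper name `synthesize` and the output path are illustrative.

```python
MODEL_ID = "bakhil-aissa/speecht5_stoic_voice"  # this repository's checkpoint
BASE_ID = "microsoft/speecht5_tts"              # base model; used here for the processor
VOCODER_ID = "microsoft/speecht5_hifigan"       # assumption: standard SpeechT5 vocoder
SAMPLE_RATE = 16_000                            # matches the training data


def synthesize(text, speaker_embeddings, out_path="output.wav"):
    """Generate speech for `text` and write it to `out_path` as a 16 kHz WAV.

    speaker_embeddings: torch.Tensor of shape (1, 512). How to obtain it is an
    assumption left to the caller; the card does not document it.
    """
    # Heavy dependencies are imported lazily so this module loads without them.
    import soundfile as sf
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(BASE_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    # Tokenize the text, then generate a waveform through the vocoder.
    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embeddings, vocoder=vocoder
        )
    sf.write(out_path, speech.numpy(), samplerate=SAMPLE_RATE)
    return out_path
```

Since the model is reported to work best on short utterances, a call such as `synthesize("Discipline is a daily choice.", embeddings)` with a single sentence is a reasonable starting point.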

