Ijazah_Palsu_V1

Ijazah_Palsu_V1 is a fine-tuned version of the F5-TTS text-to-speech model, trained specifically on Indonesian voice data. The goal of this project is to explore and improve the expressiveness and pronunciation accuracy of Indonesian TTS models, particularly in real-world, varied-speaker conditions.

⚠️ Status: This model is in a Beta/Experimental phase and should be used for research or evaluation purposes only.

🔍 Model Details

Base Model: SWivid/F5-TTS - F5TTS_v1_Base
Fine-tuned By: PapaRazi
Language: 🇮🇩 Indonesian
Vocabulary Size: 2,564 tokens
Model Size: 5.02 GB
Training Hardware: Single GPU — NVIDIA RTX 3060 (12GB VRAM)

Training Details

Training Audio: ~50 hours (50:27:14)
Total Samples: 66,233 (≈ 8.15 GB)
Vocabulary Size: 2,564
Hardware: NVIDIA RTX 3060 12GB
Precision: FP16 Mixed Precision
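As a quick sanity check on the dataset figures above, the total audio duration and sample count imply a fairly short average clip. The sketch below only does that arithmetic; the helper name `hms_to_seconds` is illustrative and not part of F5-TTS.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" duration string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


TOTAL_AUDIO = "50:27:14"   # total audio duration listed above
TOTAL_SAMPLES = 66_233     # total samples listed above

total_seconds = hms_to_seconds(TOTAL_AUDIO)
average_clip_seconds = total_seconds / TOTAL_SAMPLES

print(f"Total audio: {total_seconds} s")
print(f"Average clip length: {average_clip_seconds:.2f} s")
```

The average works out to a little under 3 seconds per sample, so most clips are likely short utterances rather than long-form passages.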

Training Config (excerpt from `config.json`):

{
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 1600,
  "batch_size_type": "frame",
  "epochs": 28,
  "save_per_updates": 20000,
  "keep_last_n_checkpoints": 6,
  "last_per_updates": 10000,
  "tokenizer_type": "pinyin",
  "mixed_precision": "fp16"
}
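Because `batch_size_type` is `"frame"`, the value `1600` counts mel-spectrogram frames per batch rather than clips. The sketch below converts that budget to seconds of audio and estimates the number of updates over 28 epochs. It assumes the usual F5-TTS audio settings of a 24 kHz sample rate and a hop length of 256 (neither is stated in this card), perfectly packed batches, and no gradient accumulation, so treat the result as a rough estimate only.

```python
SAMPLE_RATE = 24_000   # assumption: F5-TTS default target sample rate
HOP_LENGTH = 256       # assumption: F5-TTS default mel hop length
FRAMES_PER_SECOND = SAMPLE_RATE / HOP_LENGTH


def frames_to_seconds(frames: int) -> float:
    """Seconds of audio covered by a given number of mel frames."""
    return frames / FRAMES_PER_SECOND


BATCH_FRAMES = 1600                              # "batch_size_per_gpu" from the config
EPOCHS = 28                                      # "epochs" from the config
TOTAL_AUDIO_SECONDS = 50 * 3600 + 27 * 60 + 14   # 50:27:14 of training audio

seconds_per_batch = frames_to_seconds(BATCH_FRAMES)
updates_per_epoch = TOTAL_AUDIO_SECONDS / seconds_per_batch
estimated_updates = updates_per_epoch * EPOCHS

print(f"Audio per batch: {seconds_per_batch:.1f} s")
print(f"Estimated updates per epoch: {updates_per_epoch:,.0f}")
print(f"Estimated updates over {EPOCHS} epochs: {estimated_updates:,.0f}")
```

Under these assumptions each batch holds roughly 17 seconds of audio, and 28 epochs come to just under 300,000 updates, which is consistent with the step count at which training was stopped. Real batches are rarely packed perfectly, so the true count per epoch would be somewhat higher.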

The training dataset contains varied voices from different speakers (~10+ unique voices), covering both formal and conversational speech, including manually added synthetic samples (e.g., number reading via gTTS).

📉 Training Curves

🔺 Loss Over Time

🔻 Learning Rate Schedule

Training was stopped manually after approximately 300,000 steps, by which point the learning rate had decayed to near zero. At this stage, the loss curve was unstable, fluctuating without consistent downward progress. Based on qualitative evaluation of generated samples, the model was deemed sufficiently trained for a Beta release.

While performance is already usable for Indonesian TTS tasks, further fine-tuning is planned for:

Improving number and currency pronunciation
Enhancing long-form sentence fluency
Reducing jitter in expressive speech samples

⚠️ Known Limitations

This model currently struggles to pronounce numbers and numerical formats accurately (e.g., years, large numbers, currency values).

This is a common challenge in early-stage fine-tuning and can be attributed to:

Limited exposure to numerical utterances in the training dataset.

Variability in how numbers can be pronounced in Indonesian.

Planned Improvement: A dedicated sub-dataset of numerals and structured numeric expressions is being prepared. Future versions will be fine-tuned on these curated audio-text pairs, focused on number reading, dates, and currency values, to improve numerical pronunciation accuracy.
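Until the model handles digits reliably, one possible workaround is to spell numbers out in Indonesian words before passing text to the model. The following is a minimal sketch of that idea, not part of this model or of F5-TTS. The function names `terbilang` and `normalize_numbers` are hypothetical, and the sketch only covers plain runs of digits below one billion (no decimals, thousand separators, or currency symbols).

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if there is one."""
    return f"{head} {terbilang(rest)}" if rest else head


def terbilang(n: int) -> str:
    """Spell out an integer in the range 0..999,999,999 in Indonesian."""
    if n < 0 or n >= 1_000_000_000:
        raise ValueError("only 0..999,999,999 is supported in this sketch")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{UNITS[n - 10]} belas"
    if n < 100:
        return _join(f"{UNITS[n // 10]} puluh", n % 10)
    if n < 200:
        return _join("seratus", n - 100)       # "se-" prefix instead of "satu ratus"
    if n < 1000:
        return _join(f"{UNITS[n // 100]} ratus", n % 100)
    if n < 2000:
        return _join("seribu", n - 1000)       # "se-" prefix instead of "satu ribu"
    if n < 1_000_000:
        return _join(f"{terbilang(n // 1000)} ribu", n % 1000)
    return _join(f"{terbilang(n // 1_000_000)} juta", n % 1_000_000)


def normalize_numbers(text: str) -> str:
    """Replace each run of digits with its Indonesian spelling."""
    def replace(match: re.Match) -> str:
        try:
            return terbilang(int(match.group()))
        except ValueError:
            return match.group()  # leave out-of-range numbers untouched
    return re.sub(r"\d+", replace, text)


print(normalize_numbers("Indonesia merdeka pada tahun 1945."))
```

Reading a year digit by digit, or grouping it differently, is also common in speech, so a real normalizer would need context-dependent rules; this sketch always uses the cardinal reading.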

🔊 Sample Audio

Example inference output using the Ijazah_Palsu_V1 model:

Text Input:
Caranya adalah cari kata kunci yang paling populer di situ, yaitu...

Text Input:
Kan Judi juga namanya Dewa, kemudian yang di Kamboja itu juga namanya Dewa.

Text Input:
kamarnya, lemarinya, rumahnya, dan lain sebagainya, dan mereka membuang barang-barang yang sudah saatnya dibuang, disitulah pentingnya di-clutter.

Text Input:
Ini adalah model TTS pertama saya. Kalau ada kekurangan, mohon dimaafkan.

📦 Usage

Manual Download & Usage

You can also download the model manually and place the .pt checkpoint and the corresponding vocab.txt file inside your F5-TTS checkpoint folder.

Inference via F5-TTS CLI:

f5-tts_infer-cli \
  --model "PapaRazi/Ijazah_Palsu_V1" \
  --ref_audio "ref.wav" \
  --ref_text "reference text" \
  --gen_text "generated Indonesian text"
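If the checkpoint was downloaded manually, the CLI may need to be pointed at the local files instead of a repository name. The sketch below assembles such a command in Python. It assumes that `f5-tts_infer-cli` accepts `--ckpt_file` and `--vocab_file` and that `--model` takes the base architecture name `F5TTS_v1_Base`; confirm these against `f5-tts_infer-cli --help` for your installed version. The file paths and the helper name `build_infer_command` are hypothetical placeholders.

```python
import shlex


def build_infer_command(ckpt_file: str, vocab_file: str, ref_audio: str,
                        ref_text: str, gen_text: str,
                        model: str = "F5TTS_v1_Base") -> list[str]:
    """Build an argument list for f5-tts_infer-cli using local checkpoint files."""
    return [
        "f5-tts_infer-cli",
        "--model", model,            # assumption: base architecture name
        "--ckpt_file", ckpt_file,    # assumption: flag for a local .pt checkpoint
        "--vocab_file", vocab_file,  # assumption: flag for the matching vocab.txt
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


command = build_infer_command(
    ckpt_file="ckpts/Ijazah_Palsu_V1/model.pt",   # hypothetical path
    vocab_file="ckpts/Ijazah_Palsu_V1/vocab.txt", # hypothetical path
    ref_audio="ref.wav",
    ref_text="reference text",
    gen_text="generated Indonesian text",
)

# Print a shell-safe version of the command; pass `command` to
# subprocess.run(command, check=True) to actually execute it.
print(shlex.join(command))
```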
