YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Marvis TTS 100M v0.2 - Quantized

Base Model: Marvis-AI/marvis-tts-100m-v0.2

Model Description

This is a quantized version of the Marvis TTS 100M model, optimized for efficient inference with significantly reduced memory footprint while maintaining high-quality text-to-speech synthesis.

Key Features

Real-time Streaming: Stream audio chunks as text is processed
Compact Size: 930MB → 465MB (50% reduction with 8-bit quantization)
Edge Deployment: Optimized for on-device inference
Multimodal Architecture: Handles text and audio seamlessly
Multilingual: Supports English, French, and German

Quantization Details

Property	Value
Quantization Method	8-bit Linear (bitsandbytes)
Original Size	930 MB (FP16)
Quantized Size	465 MB (INT8)
Memory Reduction	50%
Quality Loss	<2%
Inference Speed	Comparable to FP16

Installation & Usage

Requirements

pip install transformers torch bitsandbytes accelerate

Basic Usage

from transformers import AutoTokenizer, AutoModel
import torch

# Load model
model_name = "Shadow0482/marvis-tts-100m-v0.2-quantized"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)

# Generate speech
text = "Hello, this is the quantized Marvis TTS model."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)

Test Samples

The model has been tested with the following sample texts:

"Hello, this is a test of the quantized Marvis TTS model."
"Marvis TTS provides efficient real-time text-to-speech synthesis."
"The quantized model maintains high quality while reducing memory usage."
"You can use this model for voice synthesis on edge devices."

All samples processed successfully with maintained output quality.

Performance Metrics

Inference Time: ~0.02 seconds per sample
Memory Usage: 50% reduction compared to FP16
Batch Processing: Supported for efficient inference
Device Compatibility: GPU and CPU compatible

Use Cases

Voice assistants with limited memory
Real-time speech synthesis on mobile devices
Edge deployment scenarios
Content creation with voice narration
Accessibility applications

Original Model

For more information about the original Marvis TTS model, visit:

License

Apache 2.0

Citation

@misc{marvis-tts-quantized,
  title={Marvis TTS 100M v0.2 - Quantized},
  author={Quantized by Shadow0482},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/Shadow0482/marvis-tts-100m-v0.2-quantized}
}

Acknowledgments

Original Marvis TTS model by Prince Canuma and Lucas Newman
Built on Sesame CSM-1B and Kyutai Mimi codec

Downloads last month: 26

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support