---
title: Transformer Edge Optimization
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
tags:
- quantization
- optimization
- edge-ai
- mobile
- transformers
- onnx
- sentiment-analysis
duplicated_from: null
---

# 🚀 Transformer Edge Optimization Demo

<div align="center">

[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/mtkaya/transformer-edge-optimization)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/mtkaya/transformer-edge-optimization/blob/main/LICENSE)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/01_quantization_basics.ipynb)

**Interactive demo comparing Original vs Quantized transformer models**

[Try Demo](#) • [GitHub Repo](https://github.com/mtkaya/transformer-edge-optimization) • [Notebooks](https://github.com/mtkaya/transformer-edge-optimization/tree/main/notebooks)

</div>

---

## 🎯 What Does This Demo Do?

This interactive demo showcases **model quantization** - a technique to make AI models smaller and faster for mobile/edge devices.

### Try It:
1. **Quick Prediction** - Test sentiment analysis with quantized model
2. **Model Comparison** - Compare Original (FP32) vs Quantized (INT8) side by side
3. **Documentation** - Learn about the techniques

---

## ✨ Key Results

| Metric | Original | Quantized | Improvement |
|--------|----------|-----------|-------------|
| **Size** | 255 MB | 68 MB | **3.75x smaller** ⬇️ |
| **Speed** | 12.3 ms | 5.8 ms | **2.1x faster** ⚡ |
| **Accuracy** | 91.8% | 90.2% | **-1.6%** 📊 |

**Conclusion:** Nearly **4x smaller** model with **2x faster** inference and only **1.6% accuracy loss**!

---

## 🧪 What is Quantization?

**Quantization** reduces model size by converting weights from 32-bit floating point (FP32) to 8-bit integers (INT8).

### How It Works:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Quantize: FP32 → INT8
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Now 4x smaller! 🎉
```

### Why Quantization?

- ✅ **Smaller models** - Fit on mobile devices
- ✅ **Faster inference** - Better user experience
- ✅ **Lower power** - Longer battery life
- ✅ **Easy to implement** - Post-training, no retraining

---

## 📊 Optimization Techniques

This project demonstrates **3 major techniques**:

### 1. **Quantization** (This Demo)
- **Compression:** 4x
- **Speed:** 2-3x faster
- **Difficulty:** ⭐ Easy

### 2. **ONNX Runtime**
- **Compression:** 3.8x
- **Speed:** 2.2x faster
- **Difficulty:** ⭐⭐ Medium
- **Benefit:** Cross-platform deployment

### 3. **Knowledge Distillation**
- **Compression:** 6-10x
- **Speed:** 3x faster
- **Difficulty:** ⭐⭐⭐ Advanced
- **Benefit:** Student model learns from teacher

---

## 🚀 Try The Full Toolkit

### Interactive Notebooks (Google Colab):

#### 1. Quantization Basics (15 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/01_quantization_basics.ipynb)

**Learn:**
- Dynamic quantization
- Static quantization
- Model size comparison
- Performance benchmarking

---

#### 2. ONNX Runtime Optimization (20 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/02_huggingface_optimum.ipynb)

**Learn:**
- PyTorch → ONNX conversion
- Hugging Face Optimum
- Cross-platform deployment
- Hardware acceleration

---

#### 3. Knowledge Distillation (30 minutes)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mtkaya/transformer-edge-optimization/blob/main/notebooks/05_distilbert_training.ipynb)

**Learn:**
- Teacher-student training
- Distillation loss
- Creating tiny models
- BERT → TinyBERT

---

## 💻 Use Cases

### 📱 Mobile Apps
```kotlin
// Android with TFLite
val analyzer = SentimentAnalyzer(context)
val result = analyzer.predict("Great app!")
```

### 🌐 Web Apps
```javascript
// Browser with Transformers.js
import { pipeline } from '@xenova/transformers';
const classifier = await pipeline('sentiment-analysis');
```

### 🤖 Edge Devices
```python
# Raspberry Pi with ONNX Runtime
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
```

---

## 📚 Full Documentation

### GitHub Repository
**[mtkaya/transformer-edge-optimization](https://github.com/mtkaya/transformer-edge-optimization)**

Contains:
- ✅ 3 Jupyter notebooks
- ✅ Example code (Python, Kotlin, JavaScript)
- ✅ Comprehensive documentation
- ✅ CI/CD pipeline
- ✅ Docker support

### Quick Links:
- [Installation Guide](https://github.com/mtkaya/transformer-edge-optimization#-installation)
- [Usage Examples](https://github.com/mtkaya/transformer-edge-optimization#-examples)
- [API Reference](https://github.com/mtkaya/transformer-edge-optimization#-api-reference)
- [Contributing](https://github.com/mtkaya/transformer-edge-optimization/blob/main/CONTRIBUTING.md)

---

## 🎓 Technical Details

### Model Used:
**DistilBERT** fine-tuned on SST-2 (Stanford Sentiment Treebank)

- Base Model: `distilbert-base-uncased-finetuned-sst-2-english`
- Parameters: 67M
- Task: Binary sentiment classification (Positive/Negative)

### Quantization Approach:
**Dynamic Quantization** with PyTorch

- Weights: INT8 (8-bit integers)
- Activations: FP32 (computed at runtime)
- Method: `torch.quantization.quantize_dynamic()`

### Benchmark Hardware:
- **CPU:** Intel Xeon (Colab)
- **Input:** 128 tokens average
- **Iterations:** 100 runs per test

---

## 📊 Detailed Benchmark

### Model Size:
```
Original (FP32):     255.43 MB
Quantized (INT8):     68.12 MB
Compression Ratio:    3.75x
Space Saved:         187.31 MB (73.3%)
```

### Inference Speed (CPU):
```
Original:   12.34 ± 0.45 ms
Quantized:   5.78 ± 0.23 ms
Speedup:     2.13x
Time Saved:  6.56 ms per inference (53.2%)
```

### Accuracy (SST-2 Test Set):
```
Original:   91.8% accuracy
Quantized:  90.2% accuracy
Difference: -1.6%
```

### Memory Usage:
```
Original:   ~280 MB
Quantized:  ~95 MB
Reduction:  2.95x
```

---

## 🌟 Features of This Demo

### 🎯 Quick Prediction
- Enter any text
- Toggle between Original/Quantized
- See prediction + confidence + model info

### ⚖️ Model Comparison
- Side-by-side comparison
- Same input, both models
- Performance metrics

### 📚 Documentation
- Learn about quantization
- See benchmark results
- Access notebooks
- Quick start code

---

## 🤝 Contributing

We welcome contributions! Check out:

- **GitHub Issues:** [Report bugs](https://github.com/mtkaya/transformer-edge-optimization/issues)
- **Discussions:** [Ask questions](https://github.com/mtkaya/transformer-edge-optimization/discussions)
- **Pull Requests:** [Contribute code](https://github.com/mtkaya/transformer-edge-optimization/pulls)

---

## 📄 License

This project is licensed under the **MIT License**.

See [LICENSE](https://github.com/mtkaya/transformer-edge-optimization/blob/main/LICENSE) for details.

---

## 🙏 Acknowledgments

Built with:
- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [Gradio](https://gradio.app/)

Inspired by:
- [DistilBERT paper](https://arxiv.org/abs/1910.01108) (Sanh et al., 2019)
- [Q8BERT](https://arxiv.org/abs/1910.06188) (Zafrir et al., 2021)

---

## 📧 Contact

- **GitHub:** [@mtkaya](https://github.com/mtkaya)
- **Issues:** [Report here](https://github.com/mtkaya/transformer-edge-optimization/issues)

---

<div align="center">

**⭐ Star the repo if you find this useful! ⭐**

[GitHub Repository](https://github.com/mtkaya/transformer-edge-optimization) • 
[Documentation](https://github.com/mtkaya/transformer-edge-optimization#readme) • 
[Notebooks](https://github.com/mtkaya/transformer-edge-optimization/tree/main/notebooks)

**Made with ❤️ for the AI community**

</div>