Spaces: Build error
Upload 3 files
Browse files
- README.md +250 -7
- app.py +260 -0
- requirements.txt +64 -0
README.md
CHANGED
@@ -1,12 +1,255 @@
  ---
- title: Language Detection
- emoji:
- colorFrom:
- colorTo:
  sdk: gradio
  app_file: app.py
  ---
  ---
+ title: Language Detection App
+ emoji:
+ colorFrom: indigo
+ colorTo: blue
  sdk: gradio
+ python_version: 3.9
  app_file: app.py
+ license: mit
  ---

# Language Detection App

A powerful and elegant language detection application built with a Gradio frontend and a modular backend featuring multiple state-of-the-art ML models, organized by architecture and training dataset.

## Features

- **Clean Gradio Interface**: Simple, intuitive web interface for language detection
- **Multiple Model Architectures**: Choose between XLM-RoBERTa (Model A) and BERT (Model B) architectures
- **Multiple Training Datasets**: Models trained on standard (Dataset A) and enhanced (Dataset B) datasets
- **Centralized Configuration**: All model configurations and settings in one place
- **Modular Backend**: Easy-to-extend architecture for plugging in your own ML models
- **Real-time Detection**: Instant language detection with confidence scores
- **Multiple Predictions**: Shows the top 5 language predictions with confidence levels
- **100+ Languages**: Support for major world languages (varies by model)
- **Example Texts**: Pre-loaded examples in various languages for testing
- **Model Switching**: Seamlessly switch between different models
- **Extensible**: Abstract base class for implementing custom models

## Quick Start

### 1. Setup Environment

```bash
# Create virtual environment
python -m venv venv

# Activate environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Test the Backend

```bash
# Run tests to verify everything works
python test_app.py

# Test specific model combinations
python test_model_a_dataset_a.py
python test_model_b_dataset_b.py
```

### 3. Launch the App

```bash
# Start the Gradio app
python app.py
```

The app will be available at `http://localhost:7860`.

## Model Architecture

The system is organized along two dimensions:

### Model Architectures

- **Model A**: XLM-RoBERTa-based architectures with excellent cross-lingual transfer capabilities
- **Model B**: BERT-based architectures with efficient, fast processing

### Training Datasets

- **Dataset A**: Standard multilingual language detection dataset with broad language coverage
- **Dataset B**: Enhanced/specialized language detection dataset focused on ultra-high accuracy

### Available Model Combinations

1. **Model A Dataset A** - XLM-RoBERTa + Standard Dataset
   - **Architecture**: XLM-RoBERTa (Model A)
   - **Training**: Dataset A (standard multilingual)
   - **Accuracy**: 97.9%
   - **Size**: 278M parameters
   - **Languages**: 100+ languages
   - **Strengths**: Balanced performance, robust cross-lingual capabilities, comprehensive language coverage
   - **Use Cases**: General-purpose language detection, multilingual content processing

2. **Model B Dataset A** - BERT + Standard Dataset
   - **Architecture**: BERT (Model B)
   - **Training**: Dataset A (standard multilingual)
   - **Accuracy**: 96.17%
   - **Size**: 178M parameters
   - **Languages**: 100+ languages
   - **Strengths**: Fast inference, broad language support, efficient processing
   - **Use Cases**: High-throughput detection, real-time applications, resource-constrained environments

3. **Model A Dataset B** - XLM-RoBERTa + Enhanced Dataset
   - **Architecture**: XLM-RoBERTa (Model A)
   - **Training**: Dataset B (enhanced/specialized)
   - **Accuracy**: 99.72%
   - **Size**: 278M parameters
   - **Training Loss**: 0.0176
   - **Languages**: 20 carefully selected languages
   - **Strengths**: Exceptional accuracy, focused language support, state-of-the-art results
   - **Use Cases**: Research applications, high-precision detection, critical accuracy requirements

4. **Model B Dataset B** - BERT + Enhanced Dataset
   - **Architecture**: BERT (Model B)
   - **Training**: Dataset B (enhanced/specialized)
   - **Accuracy**: 99.85%
   - **Size**: 178M parameters
   - **Training Loss**: 0.0125
   - **Languages**: 20 carefully selected languages
   - **Strengths**: Highest accuracy, ultra-low training loss, precision-optimized
   - **Use Cases**: Maximum-precision applications, research requiring the highest accuracy

### Core Components

- **`BaseLanguageModel`**: Abstract interface that all models must implement
- **`ModelRegistry`**: Manages model registration and creation with centralized configuration
- **`LanguageDetector`**: Main orchestrator for language detection
- **`model_config.py`**: Centralized configuration for all models and language mappings

### Adding New Models

To add a new model combination:

1. Create a new file in `backend/models/` (e.g., `model_c_dataset_a.py`)
2. Inherit from `BaseLanguageModel`
3. Implement the required methods
4. Add its configuration to `model_config.py`
5. Register it in `ModelRegistry`

Example:

```python
# backend/models/model_c_dataset_a.py
from typing import Any, Dict, List

from .base_model import BaseLanguageModel
from .model_config import get_model_config

class ModelCDatasetA(BaseLanguageModel):
    def __init__(self):
        self.model_key = "model-c-dataset-a"
        self.config = get_model_config(self.model_key)
        # Initialize your model here

    def predict(self, text: str) -> Dict[str, Any]:
        # Implement the prediction logic
        pass

    def get_supported_languages(self) -> List[str]:
        # Return the supported language codes
        pass

    def get_model_info(self) -> Dict[str, Any]:
        # Return model metadata from the config
        pass
```

Then add its configuration in `model_config.py` and register it in `language_detector.py`.

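The base class and registry themselves are not shown in this diff; the following is a minimal, self-contained sketch of what a `BaseLanguageModel` interface plus `ModelRegistry` could look like. The stub model and the `"english-only"` key are hypothetical, included only to show registration end to end.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type

class BaseLanguageModel(ABC):
    """Sketch of the abstract interface every model combination implements."""

    @abstractmethod
    def predict(self, text: str) -> Dict[str, Any]: ...

    @abstractmethod
    def get_supported_languages(self) -> List[str]: ...

    @abstractmethod
    def get_model_info(self) -> Dict[str, Any]: ...

class ModelRegistry:
    """Maps model keys to implementation classes."""
    _models: Dict[str, Type[BaseLanguageModel]] = {}

    @classmethod
    def register(cls, key: str, model_cls: Type[BaseLanguageModel]) -> None:
        cls._models[key] = model_cls

    @classmethod
    def create(cls, key: str) -> BaseLanguageModel:
        if key not in cls._models:
            raise KeyError(f"Unknown model key: {key}")
        return cls._models[key]()

# Trivial stand-in implementation (hypothetical), just to exercise the registry:
class EnglishOnly(BaseLanguageModel):
    def predict(self, text: str) -> Dict[str, Any]:
        return {"language": "English", "language_code": "en", "confidence": 1.0}

    def get_supported_languages(self) -> List[str]:
        return ["en"]

    def get_model_info(self) -> Dict[str, Any]:
        return {"name": "english-only-stub"}

ModelRegistry.register("english-only", EnglishOnly)
detector = ModelRegistry.create("english-only")
```

Because abstract methods are enforced at instantiation time, forgetting to implement one of the three methods fails immediately rather than at first prediction.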
## Testing

The project includes comprehensive test suites:

- **`test_app.py`**: General app functionality tests
- **`test_model_a_dataset_a.py`**: Tests for XLM-RoBERTa + the standard dataset
- **`test_model_b_dataset_b.py`**: Tests for BERT + the enhanced dataset (highest accuracy)
- **Model comparison tests**: Automated testing across all model combinations
- **Model switching tests**: Verify seamless model switching

## Supported Languages

The models support different language sets based on their training:

- **Model A/B + Dataset A**: 100+ languages, including major European, Asian, African, and other world languages, based on the CC-100 dataset
- **Model A/B + Dataset B**: 20 carefully selected high-performance languages (Arabic, Bulgarian, German, Greek, English, Spanish, French, Hindi, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, Chinese)

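Since the app reports ISO 639-1 codes, the Dataset B set can be written down as a name-to-code mapping. The pairs below are standard ISO 639-1; whether the models emit exactly these codes should be confirmed against `model_config.py`, which is not included in this diff.

```python
# ISO 639-1 codes for the 20 Dataset B languages listed above (assumed mapping)
DATASET_B_LANGUAGES = {
    "Arabic": "ar", "Bulgarian": "bg", "German": "de", "Greek": "el",
    "English": "en", "Spanish": "es", "French": "fr", "Hindi": "hi",
    "Italian": "it", "Japanese": "ja", "Dutch": "nl", "Polish": "pl",
    "Portuguese": "pt", "Russian": "ru", "Swahili": "sw", "Thai": "th",
    "Turkish": "tr", "Urdu": "ur", "Vietnamese": "vi", "Chinese": "zh",
}

def is_supported(language_code: str) -> bool:
    """Check whether a detected ISO 639-1 code is in the Dataset B set."""
    return language_code in DATASET_B_LANGUAGES.values()
```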
## Model Comparison

| Feature | Model A Dataset A | Model B Dataset A | Model A Dataset B | Model B Dataset B |
|---------|-------------------|-------------------|-------------------|-------------------|
| **Architecture** | XLM-RoBERTa | BERT | XLM-RoBERTa | BERT |
| **Dataset** | Standard | Standard | Enhanced | Enhanced |
| **Accuracy** | 97.9% | 96.17% | 99.72% | **99.85%** |
| **Model Size** | 278M | 178M | 278M | 178M |
| **Languages** | 100+ | 100+ | 20 (curated) | 20 (curated) |
| **Training Loss** | N/A | N/A | 0.0176 | **0.0125** |
| **Speed** | Moderate | **Fast** | Moderate | **Fast** |
| **Memory Usage** | Higher | **Lower** | Higher | **Lower** |
| **Best For** | Balanced performance | Speed & broad coverage | Ultra-high accuracy | **Maximum precision** |

### Model Selection Guide

- **Model B Dataset B**: Choose for maximum accuracy on 20 core languages (99.85%)
- **Model A Dataset B**: Choose for ultra-high accuracy on 20 core languages (99.72%)
- **Model A Dataset A**: Choose for balanced performance and comprehensive language coverage (97.9%)
- **Model B Dataset A**: Choose for fast inference and broad language coverage (96.17%)

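The selection guide boils down to two questions: do you need broad (100+) language coverage, and do you prioritize speed? An illustrative helper (not part of the repository) that maps those answers to the model keys used by `LanguageDetector`:

```python
def pick_model(need_broad_coverage: bool, prioritize_speed: bool) -> str:
    """Map the selection guide to a model key (illustrative helper only)."""
    if need_broad_coverage:
        # Dataset A models cover 100+ languages; BERT (Model B) is the faster one
        return "model-b-dataset-a" if prioritize_speed else "model-a-dataset-a"
    # Dataset B models trade coverage (20 languages) for accuracy
    return "model-b-dataset-b" if prioritize_speed else "model-a-dataset-b"
```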
## Configuration

You can configure models using the centralized configuration system:

```python
# Default model selection
detector = LanguageDetector(model_key="model-a-dataset-a")  # Balanced XLM-RoBERTa
detector = LanguageDetector(model_key="model-b-dataset-a")  # Fast BERT
detector = LanguageDetector(model_key="model-a-dataset-b")  # Ultra-high-accuracy XLM-RoBERTa
detector = LanguageDetector(model_key="model-b-dataset-b")  # Maximum-precision BERT

# All configurations are centralized in backend/models/model_config.py
```

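`model_config.py` itself is not part of this diff; a plausible shape for the centralized configuration and the `get_model_config` lookup used by the models above might be the following (field names and values here are assumptions based on the README's model descriptions):

```python
from typing import Any, Dict

# Hypothetical sketch of backend/models/model_config.py
MODEL_CONFIGS: Dict[str, Dict[str, Any]] = {
    "model-a-dataset-a": {
        "name": "Model A Dataset A",
        "architecture": "XLM-RoBERTa",
        "dataset": "Dataset A (standard multilingual)",
        "accuracy": "97.9%",
        "model_size": "278M parameters",
    },
    "model-b-dataset-b": {
        "name": "Model B Dataset B",
        "architecture": "BERT",
        "dataset": "Dataset B (enhanced/specialized)",
        "accuracy": "99.85%",
        "model_size": "178M parameters",
    },
    # ...the remaining combinations follow the same shape
}

def get_model_config(model_key: str) -> Dict[str, Any]:
    """Look up one model's configuration by key, failing loudly on typos."""
    try:
        return MODEL_CONFIGS[model_key]
    except KeyError:
        raise KeyError(f"No configuration for model key {model_key!r}") from None
```

Keeping the metadata in one dictionary means new model combinations only touch this file plus their own module, which is what makes step 4 of "Adding New Models" a one-line change.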
## Project Structure

```
language-detection/
├── backend/
│   ├── models/
│   │   ├── model_config.py        # Centralized configuration
│   │   ├── base_model.py          # Abstract base class
│   │   ├── model_a_dataset_a.py   # XLM-RoBERTa + Standard
│   │   ├── model_b_dataset_a.py   # BERT + Standard
│   │   ├── model_a_dataset_b.py   # XLM-RoBERTa + Enhanced
│   │   ├── model_b_dataset_b.py   # BERT + Enhanced
│   │   └── __init__.py
│   └── language_detector.py       # Main orchestrator
├── tests/
├── app.py                         # Gradio interface
└── README.md
```

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/new-model-combination`)
3. Implement your model following the `BaseLanguageModel` interface
4. Add its configuration to `model_config.py`
5. Add tests for your implementation
6. Commit your changes (`git commit -m 'Add new model combination'`)
7. Push to the branch (`git push origin feature/new-model-combination`)
8. Open a Pull Request

## License

This project is open source and available under the MIT License.

## Acknowledgments

- **Hugging Face** for the transformers library and model hosting platform
- **Model providers** for the fine-tuned language detection models used in this project
- **Gradio** for the excellent web interface framework
- **Open source community** for the foundational technologies that make this project possible
app.py
ADDED
@@ -0,0 +1,260 @@
```python
import gradio as gr
from backend.language_detector import LanguageDetector

def main():
    # Initialize the language detector with the default model (Model A Dataset A)
    detector = LanguageDetector()

    # Create the Gradio interface
    with gr.Blocks(title="Language Detection App", theme=gr.themes.Soft()) as app:
        gr.Markdown("# Language Detection App")
        gr.Markdown("Select a model and enter text below to detect its language with confidence scores.")

        # Model selection section with visual styling
        with gr.Group():
            gr.Markdown(
                "<div style='text-align: center; padding: 16px 0 8px 0; margin-bottom: 16px; font-size: 18px; font-weight: 600; border-bottom: 2px solid; background: linear-gradient(90deg, transparent, rgba(99, 102, 241, 0.1), transparent); border-radius: 8px 8px 0 0;'>Model Selection</div>"
            )

            # Get the available models
            available_models = detector.get_available_models()
            model_choices = []
            model_info_map = {}

            for key, info in available_models.items():
                if info["status"] == "available":
                    model_choices.append((info["display_name"], key))
                else:
                    model_choices.append((f"{info['display_name']} (Coming Soon)", key))
                model_info_map[key] = info

            model_selector = gr.Dropdown(
                choices=model_choices,
                value="model-a-dataset-a",  # Default to Model A Dataset A
                label="Choose Language Detection Model",
                interactive=True
            )

            # Model information display
            model_info_display = gr.Markdown(
                value=_format_model_info(detector.get_current_model_info()),
                label="Model Information"
            )

        # Visual separator
        gr.Markdown(
            "<div style='margin: 24px 0; border-top: 3px solid rgba(99, 102, 241, 0.2); background: linear-gradient(90deg, transparent, rgba(99, 102, 241, 0.05), transparent); height: 2px;'></div>"
        )

        # Analysis section
        with gr.Group():
            gr.Markdown(
                "<div style='text-align: center; padding: 16px 0 8px 0; margin-bottom: 16px; font-size: 18px; font-weight: 600; border-bottom: 2px solid; background: linear-gradient(90deg, transparent, rgba(34, 197, 94, 0.1), transparent); border-radius: 8px 8px 0 0;'>Language Analysis</div>"
            )

            with gr.Row():
                with gr.Column(scale=2):
                    # Input section
                    text_input = gr.Textbox(
                        label="Text to Analyze",
                        placeholder="Enter text here to detect its language...",
                        lines=5,
                        max_lines=10
                    )

                    detect_btn = gr.Button("Detect Language", variant="primary", size="lg")

                    # Example texts
                    gr.Examples(
                        examples=[
                            ["Hello, how are you today?"],
                            ["Bonjour, comment allez-vous?"],
                            ["Hola, ¿cómo estás?"],
                            ["Guten Tag, wie geht es Ihnen?"],
                            ["こんにちは、元気ですか？"],
                            ["Привет, как дела?"],
                            ["Ciao, come stai?"],
                            ["Olá, como você está?"],
                            ["你好！你好吗？"],
                            ["안녕하세요, 어떻게 지내세요?"]
                        ],
                        inputs=text_input,
                        label="Try these examples:"
                    )

                with gr.Column(scale=2):
                    # Output section
                    with gr.Group():
                        gr.Markdown(
                            "<div style='text-align: center; padding: 16px 0 8px 0; margin-bottom: 12px; font-size: 18px; font-weight: 600; border-bottom: 2px solid; background: linear-gradient(90deg, transparent, rgba(168, 85, 247, 0.1), transparent); border-radius: 8px 8px 0 0;'>Detection Results</div>"
                        )

                        detected_language = gr.Textbox(
                            label="Detected Language",
                            interactive=False
                        )

                        confidence_score = gr.Number(
                            label="Confidence Score",
                            interactive=False,
                            precision=4
                        )

                        language_code = gr.Textbox(
                            label="Language Code (ISO 639-1)",
                            interactive=False
                        )

                        # Top predictions table
                        top_predictions = gr.Dataframe(
                            headers=["Language", "Code", "Confidence"],
                            label="Top 5 Predictions",
                            interactive=False,
                            wrap=True
                        )

        # Status/info section
        with gr.Row():
            status_text = gr.Textbox(
                label="Status",
                interactive=False,
                visible=False
            )

        # Event handlers
        def detect_language_wrapper(text, selected_model):
            if not text.strip():
                return (
                    "No text provided",
                    0.0,
                    "",
                    [],
                    gr.update(value="Please enter some text to analyze.", visible=True)
                )

            try:
                # Switch models if needed
                if detector.current_model_key != selected_model:
                    try:
                        detector.switch_model(selected_model)
                    except NotImplementedError:
                        return (
                            "Model unavailable",
                            0.0,
                            "",
                            [],
                            gr.update(value="This model is not yet implemented. Please select an available model.", visible=True)
                        )
                    except Exception as e:
                        return (
                            "Model error",
                            0.0,
                            "",
                            [],
                            gr.update(value=f"Error loading model: {str(e)}", visible=True)
                        )

                result = detector.detect_language(text)

                # Extract the main prediction
                main_lang = result['language']
                main_confidence = result['confidence']
                main_code = result['language_code']

                # Format the top predictions for the table
                predictions_table = [
                    [pred['language'], pred['language_code'], f"{pred['confidence']:.4f}"]
                    for pred in result['top_predictions']
                ]

                model_info = result.get('metadata', {}).get('model_info', {})
                model_name = model_info.get('name', 'Unknown Model')

                return (
                    main_lang,
                    main_confidence,
                    main_code,
                    predictions_table,
                    gr.update(value=f"Analysis Complete\n\nInput Text: {text[:100]}{'...' if len(text) > 100 else ''}\n\nDetected Language: {main_lang} ({main_code})\nConfidence: {main_confidence:.2%}\n\nModel: {model_name}", visible=True)
                )

            except Exception as e:
                return (
                    "Error occurred",
                    0.0,
                    "",
                    [],
                    gr.update(value=f"Error: {str(e)}", visible=True)
                )

        def update_model_info(selected_model):
            """Update the model information display when the model selection changes."""
            try:
                if detector.current_model_key != selected_model:
                    detector.switch_model(selected_model)
                model_info = detector.get_current_model_info()
                return _format_model_info(model_info)
            except NotImplementedError:
                return "**This model is not yet implemented.** Please select an available model."
            except Exception as e:
                return f"**Error loading model information:** {str(e)}"

        # Connect the button to the detection function
        detect_btn.click(
            fn=detect_language_wrapper,
            inputs=[text_input, model_selector],
            outputs=[detected_language, confidence_score, language_code, top_predictions, status_text]
        )

        # Also trigger detection on Enter in the text input
        text_input.submit(
            fn=detect_language_wrapper,
            inputs=[text_input, model_selector],
            outputs=[detected_language, confidence_score, language_code, top_predictions, status_text]
        )

        # Update the model info when the selection changes
        model_selector.change(
            fn=update_model_info,
            inputs=[model_selector],
            outputs=[model_info_display]
        )

    return app


def _format_model_info(model_info):
    """Format model information for display."""
    if not model_info:
        return "No model information available."

    formatted_info = f"""
**{model_info.get('name', 'Unknown Model')}**

{model_info.get('description', 'No description available.')}

**Performance:**
- Accuracy: {model_info.get('accuracy', 'N/A')}
- Model Size: {model_info.get('model_size', 'N/A')}

**Architecture:**
- Model Architecture: {model_info.get('architecture', 'N/A')}
- Base Model: {model_info.get('base_model', 'N/A')}
- Training Dataset: {model_info.get('dataset', 'N/A')}

**Languages:** {model_info.get('languages_supported', 'N/A')}

**Training Details:** {model_info.get('training_details', 'N/A')}

**Use Cases:** {model_info.get('use_cases', 'N/A')}

**Strengths:** {model_info.get('strengths', 'N/A')}

**Limitations:** {model_info.get('limitations', 'N/A')}
"""
    return formatted_info


if __name__ == "__main__":
    app = main()
    app.launch()
```
requirements.txt
ADDED
@@ -0,0 +1,64 @@
```
aiofiles==24.1.0
annotated-types==0.7.0
anyio==4.9.0
audioop-lts==0.2.1
certifi==2025.4.26
charset-normalizer==3.4.2
click==8.1.8
fastapi==0.115.12
ffmpy==0.5.0
filelock==3.18.0
fsspec==2025.5.1
gradio==5.31.0
gradio_client==1.10.1
groovy==0.1.2
h11==0.16.0
hf-xet==1.1.2
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.32.0
idna==3.10
Jinja2==3.1.6
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.6
orjson==3.10.18
packaging==25.0
pandas==2.2.3
pillow==11.2.1
pydantic==2.11.5
pydantic_core==2.33.2
pydub==0.25.1
Pygments==2.19.1
python-dateutil==2.9.0.post0
python-multipart==0.0.20
pytz==2025.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==14.0.0
ruff==0.11.11
safehttpx==0.1.6
safetensors==0.5.3
semantic-version==2.10.0
setuptools==80.8.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.46.2
sympy==1.14.0
tokenizers==0.21.1
tomlkit==0.13.2
torch==2.7.0
tqdm==4.67.1
transformers==4.52.3
typer==0.15.4
typing-inspection==0.4.1
typing_extensions==4.13.2
tzdata==2025.2
urllib3==2.4.0
uvicorn==0.34.2
websockets==15.0.1
```