Upload 4 files
- README app.md +84 -0
- app.py +231 -0
- lang_model_map_json.json +88 -0
- requirements.txt +8 -0
README app.md
ADDED
@@ -0,0 +1,84 @@
# Smart Multilingual Translator

A translation application built on Hugging Face models that automatically detects the input language and translates the text into multiple target languages.

## Key Features
- **Automatic Language Detection**: Detects the language of the input text
- **Multilingual Translation**: Translates into up to 3 target languages in one request
- **User-Friendly Interface**: Simple, easy-to-use Gradio interface
- **Multiple Language Support**: 10 detected input languages and 4 target languages

## Project Structure
```
smart-translator/
├── app.py                 # Main application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── lang_model_map.json    # Language to model mapping
└── assets/                # Assets folder
    └── screenshot.png     # Screenshot
```

## Supported Languages
### Language Detection (Input)
- Arabic (ar) - العربية
- Dutch (nl) - Nederlands
- English (en) - English
- French (fr) - Français
- German (de) - Deutsch
- Hebrew (he) - עברית
- Italian (it) - Italiano
- Portuguese (pt) - Português
- Russian (ru) - Русский
- Spanish (es) - Español

### Target Languages (Output)
- Hebrew (he) - עברית
- Arabic (ar) - العربية
- Spanish (es) - Español
- French (fr) - Français

## Models Used in the Application
| Model | Advantages | Disadvantages |
|-------|------------|---------------|
| **langdetect** (library based on Google's language-detection library) | Fast, lightweight, supports many languages | Less accurate with short texts |
| **Helsinki-NLP MarianMT** | Fast, lightweight, good quality for European languages | Less effective for Asian languages |
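
Because the table notes that detection is less accurate on short texts, one option is to guard the detector with a minimum length. The sketch below is illustrative only: `detect_with_guard`, `MIN_CHARS`, and the stub detector are hypothetical names that do not appear in `app.py`, and falling back to English is an assumption rather than the app's behavior.

```python
from typing import Callable, Tuple

# Hypothetical threshold: inputs shorter than this are treated as unreliable.
MIN_CHARS = 20


def detect_with_guard(
    text: str,
    detector: Callable[[str], str],
    default: str = "en",
    min_chars: int = MIN_CHARS,
) -> Tuple[str, bool]:
    """Return (language_code, is_reliable) for the given text."""
    cleaned = text.strip()
    if len(cleaned) < min_chars:
        # Too short to trust the detector: report the default and flag it.
        return default, False
    try:
        return detector(cleaned), True
    except Exception:
        # Any detector failure also falls back to the default.
        return default, False


def stub_detector(text: str) -> str:
    """Stand-in for a real detector so the sketch runs without dependencies."""
    return "fr"


short_result = detect_with_guard("Oui", stub_detector)
long_result = detect_with_guard(
    "Bonjour tout le monde, comment allez-vous ?", stub_detector
)
print(short_result, long_result)
```

In the application itself, `langdetect.detect` could be passed as `detector`. The langdetect documentation also notes that results can vary between runs unless `DetectorFactory.seed` is set.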


## Local Installation

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Run the application:
```bash
python app.py
```

## Technical Implementation

### Translation Pipeline
1. **Language Detection**: Uses the langdetect library
2. **English Intermediary**: Non-English input is first translated to English
3. **Target Translation**: The English text is translated into each selected target language
4. **Error Handling**: Unsupported languages produce an error message instead of a crash
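
The four steps above can be sketched as a single function that pivots through English. This is a minimal illustration, not the code in `app.py`: `pivot_translate` and the stub detector and translators are hypothetical, and the real application would pass langdetect and MarianMT pipelines in their place.

```python
from typing import Callable, Dict, Iterable

Translator = Callable[[str], str]


def pivot_translate(
    text: str,
    detect: Callable[[str], str],
    to_english: Dict[str, Translator],
    from_english: Dict[str, Translator],
    targets: Iterable[str],
) -> Dict[str, str]:
    """Detect the source language, pivot through English, translate to targets."""
    targets = list(targets)
    source = detect(text)                      # step 1: language detection
    if source == "en":
        english = text                         # already English, no pivot needed
    elif source in to_english:
        english = to_english[source](text)     # step 2: source -> English
    else:
        # step 4: unsupported source language, report instead of raising
        return {t: f"Unsupported source language: {source}" for t in targets}

    results = {}
    for target in targets:                     # step 3: English -> each target
        translator = from_english.get(target)
        results[target] = translator(english) if translator else "Model not available"
    return results


# Stubs standing in for langdetect and the translation models (assumptions).
def stub_detect(text: str) -> str:
    return "fr"


to_en = {"fr": lambda s: "hello"}
from_en = {"Spanish": lambda s: "hola" if s == "hello" else s}

out = pivot_translate("bonjour", stub_detect, to_en, from_en, ["Spanish", "Hebrew"])
print(out)
```

Routing everything through English keeps the number of models linear in the number of languages, at the cost of a second translation step that can compound errors.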

## Requirements

- gradio==4.29.0
- transformers==4.41.1
- torch==2.2.2
- langdetect==1.0.9
- sentencepiece==0.1.97
- protobuf==3.20.0
- accelerate==0.21.0
- datasets==2.0.0

## License

This project is open source under the MIT License.

---
app.py
ADDED
@@ -0,0 +1,231 @@
import gradio as gr
import json
from langdetect import detect
from transformers import pipeline
import warnings
warnings.filterwarnings("ignore")

# Load language to model mapping
def load_language_model_map():
    """Load mapping between languages and translation models"""
    return {
        'ar': 'Helsinki-NLP/opus-mt-ar-en',  # Arabic to English
        'fr': 'Helsinki-NLP/opus-mt-fr-en',  # French to English
        'de': 'Helsinki-NLP/opus-mt-de-en',  # German to English
        'es': 'Helsinki-NLP/opus-mt-es-en',  # Spanish to English
        'he': 'Helsinki-NLP/opus-mt-he-en',  # Hebrew to English
        'ru': 'Helsinki-NLP/opus-mt-ru-en',  # Russian to English
        'it': 'Helsinki-NLP/opus-mt-it-en',  # Italian to English
    }

# Language code to full name mapping
LANGUAGE_NAMES = {
    'en': 'English',
    'ar': 'Arabic',
    'fr': 'French',
    'de': 'German',
    'es': 'Spanish',
    'he': 'Hebrew',
    'ru': 'Russian',
    'it': 'Italian',
    'pt': 'Portuguese',
    'nl': 'Dutch',
    'sv': 'Swedish',
    'da': 'Danish',
    'no': 'Norwegian',
    'fi': 'Finnish'
}

# Initialize translation pipelines
def get_translation_pipelines():
    """Initialize translation pipelines for different target languages from JSON"""
    try:
        # NOTE: this commit uploads the mapping as lang_model_map_json.json, so this
        # path only resolves if the file is renamed; otherwise the fallback below runs.
        with open('lang_model_map.json', 'r', encoding='utf-8') as f:
            data = json.load(f)
        # Extract output language mappings
        output_langs = data['language_to_model_mapping']['output_languages']
        pipelines = {}
        for lang_name, lang_info in output_langs.items():
            # Only load the main target languages to avoid memory issues
            if lang_name in ['Hebrew', 'Arabic', 'Spanish', 'French']:
                pipelines[lang_name] = pipeline("translation", model=lang_info['model'])
        return pipelines
    except FileNotFoundError:
        # Fallback to hardcoded pipelines if JSON file not found
        print("Warning: lang_model_map.json not found. Using fallback pipelines.")
        return {
            'Hebrew': pipeline("translation", model="Helsinki-NLP/opus-mt-en-he"),
            'Arabic': pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar"),
            'Spanish': pipeline("translation", model="Helsinki-NLP/opus-mt-en-es"),
            'French': pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
        }

# Global variables for caching pipelines
language_model_map = load_language_model_map()
target_pipelines = get_translation_pipelines()

def detect_language(text):
    """Detect the language of input text"""
    try:
        detected_lang = detect(text)
        return detected_lang, LANGUAGE_NAMES.get(detected_lang, detected_lang)
    except Exception:
        # langdetect raises when the text has no detectable features
        return 'unknown', 'Unknown'

def translate_to_english(text, source_lang):
    """Translate text from source language to English"""
    if source_lang == 'en':
        return text

    if source_lang in language_model_map:
        try:
            model_name = language_model_map[source_lang]
            translator = pipeline("translation", model=model_name)
            result = translator(text, max_length=512)
            return result[0]['translation_text']
        except Exception as e:
            return f"Translation error: {str(e)}"
    else:
        return "Translation model not available for this language"

def translate_from_english(text, target_languages):
    """Translate English text to target languages"""
    translations = {}

    for lang_name in target_languages:
        if lang_name in target_pipelines:
            try:
                result = target_pipelines[lang_name](text, max_length=512)
                translations[lang_name] = result[0]['translation_text']
            except Exception as e:
                translations[lang_name] = f"Error: {str(e)}"
        else:
            translations[lang_name] = "Model not available"

    return translations

def smart_translate(input_text, target_lang1, target_lang2, target_lang3):
    """Main translation function"""
    if not input_text.strip():
        return "Please enter text to translate", "", "", "", "", ""

    # Detect source language
    source_lang_code, source_lang_name = detect_language(input_text)

    # Translate to English first if not already English
    english_text = translate_to_english(input_text, source_lang_code)

    # Get target languages list, skipping dropdowns left empty
    target_languages = [
        lang for lang in (target_lang1, target_lang2, target_lang3) if lang
    ]

    # Translate to target languages
    translations = translate_from_english(english_text, target_languages)

    # Format results
    result_text = f"**Original Text:** {input_text}\n\n"
    result_text += f"**Detected Language:** {source_lang_name} ({source_lang_code})\n\n"

    if source_lang_code != 'en':
        result_text += f"**English Translation:** {english_text}\n\n"

    result_text += "**Translations:**\n"
    for lang, translation in translations.items():
        result_text += f"• **{lang}:** {translation}\n"

    # Return individual translations for display
    trans1 = translations.get(target_lang1, "") if target_lang1 else ""
    trans2 = translations.get(target_lang2, "") if target_lang2 else ""
    trans3 = translations.get(target_lang3, "") if target_lang3 else ""

    return result_text, source_lang_name, english_text, trans1, trans2, trans3

# Create and launch the Gradio interface
target_options = list(target_pipelines.keys())

with gr.Blocks(title="Smart Multilingual Translator", theme=gr.themes.Soft()) as interface:
    gr.Markdown("""
    # Smart Multilingual Translator
    ### Powered by Hugging Face Transformers

    This application automatically detects the language of your input text and translates it to your selected target languages.
    """)

    with gr.Row():
        with gr.Column(scale=2):
            input_text = gr.Textbox(
                label="Input Text",
                placeholder="Enter text in any language...",
                lines=5
            )

            with gr.Row():
                target_lang1 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 1",
                    value="Hebrew"
                )
                target_lang2 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 2",
                    value="Arabic"
                )
                target_lang3 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 3",
                    value="Spanish"
                )

            translate_btn = gr.Button("🔄 Translate", variant="primary", size="lg")

        with gr.Column(scale=3):
            result_display = gr.Markdown(label="Translation Results")

    with gr.Row():
        with gr.Column():
            detected_lang = gr.Textbox(label="Detected Language", interactive=False)
        with gr.Column():
            english_trans = gr.Textbox(label="English Translation", interactive=False)

    with gr.Row():
        trans1_output = gr.Textbox(label="Translation 1", interactive=False)
        trans2_output = gr.Textbox(label="Translation 2", interactive=False)
        trans3_output = gr.Textbox(label="Translation 3", interactive=False)

    # Event handlers
    translate_btn.click(
        fn=smart_translate,
        inputs=[input_text, target_lang1, target_lang2, target_lang3],
        outputs=[result_display, detected_lang, english_trans, trans1_output, trans2_output, trans3_output]
    )

    gr.Markdown("""
    ---
    ## Supported Languages
    ### Language Detection (Input)
    Arabic (ar) - العربية
    Dutch (nl) - Nederlands
    English (en) - English
    French (fr) - Français
    German (de) - Deutsch
    Hebrew (he) - עברית
    Italian (it) - Italiano
    Portuguese (pt) - Português
    Russian (ru) - Русский
    Spanish (es) - Español

    ### Target Languages (Output)
    Hebrew (he) - עברית
    Arabic (ar) - العربية
    Spanish (es) - Español
    French (fr) - Français

    ### Models Used:
    - **Language Detection:** langdetect
    - **Translation Models:** Helsinki-NLP MarianMT models from Hugging Face
    - **Configuration:** Models loaded from lang_model_map.json
    """)

interface.launch(share=True)
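
In `app.py`, `translate_to_english` builds a new `pipeline(...)` on every call, which reloads the model each time. One way to avoid that is to memoize the loader, sketched below with `functools.lru_cache`. `make_cached_loader` and `fake_load` are hypothetical names used for illustration; in the application the loader would wrap `pipeline("translation", model=name)`.

```python
from functools import lru_cache
from typing import Callable, List


def make_cached_loader(load: Callable[[str], object], maxsize: int = 8):
    """Wrap a model loader so each model name is loaded at most once."""

    @lru_cache(maxsize=maxsize)
    def cached(model_name: str):
        return load(model_name)

    return cached


# Stand-in loader that records how often it is invoked (an assumption; the
# real loader would construct a transformers translation pipeline).
calls: List[str] = []


def fake_load(model_name: str):
    calls.append(model_name)
    return lambda text: f"[{model_name}] {text}"


get_translator = make_cached_loader(fake_load)

first = get_translator("Helsinki-NLP/opus-mt-fr-en")
second = get_translator("Helsinki-NLP/opus-mt-fr-en")
print(first is second, calls)
```

The `maxsize` bound matters on memory-constrained hosts, since every cached entry keeps a full model in memory.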
lang_model_map_json.json
ADDED
@@ -0,0 +1,88 @@
{
  "language_to_model_mapping": {
    "input_languages": {
      "ar": {
        "name": "Arabic",
        "model": "Helsinki-NLP/opus-mt-ar-en",
        "direction": "to_english"
      },
      "fr": {
        "name": "French",
        "model": "Helsinki-NLP/opus-mt-fr-en",
        "direction": "to_english"
      },
      "de": {
        "name": "German",
        "model": "Helsinki-NLP/opus-mt-de-en",
        "direction": "to_english"
      },
      "es": {
        "name": "Spanish",
        "model": "Helsinki-NLP/opus-mt-es-en",
        "direction": "to_english"
      },
      "he": {
        "name": "Hebrew",
        "model": "Helsinki-NLP/opus-mt-he-en",
        "direction": "to_english"
      },
      "ru": {
        "name": "Russian",
        "model": "Helsinki-NLP/opus-mt-ru-en",
        "direction": "to_english"
      },
      "it": {
        "name": "Italian",
        "model": "Helsinki-NLP/opus-mt-it-en",
        "direction": "to_english"
      },
      "pt": {
        "name": "Portuguese",
        "model": "Helsinki-NLP/opus-mt-pt-en",
        "direction": "to_english"
      },
      "nl": {
        "name": "Dutch",
        "model": "Helsinki-NLP/opus-mt-nl-en",
        "direction": "to_english"
      }
    },
    "output_languages": {
      "Hebrew": {
        "code": "he",
        "model": "Helsinki-NLP/opus-mt-en-he",
        "direction": "from_english"
      },
      "Arabic": {
        "code": "ar",
        "model": "Helsinki-NLP/opus-mt-en-ar",
        "direction": "from_english"
      },
      "Spanish": {
        "code": "es",
        "model": "Helsinki-NLP/opus-mt-en-es",
        "direction": "from_english"
      },
      "French": {
        "code": "fr",
        "model": "Helsinki-NLP/opus-mt-en-fr",
        "direction": "from_english"
      },
      "German": {
        "code": "de",
        "model": "Helsinki-NLP/opus-mt-en-de",
        "direction": "from_english"
      },
      "Russian": {
        "code": "ru",
        "model": "Helsinki-NLP/opus-mt-en-ru",
        "direction": "from_english"
      },
      "Italian": {
        "code": "it",
        "model": "Helsinki-NLP/opus-mt-en-it",
        "direction": "from_english"
      }
    }
  }
}
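
The mapping above is read by `get_translation_pipelines`, which keeps only a subset of `output_languages`. A minimal sketch of that selection step, using only the standard library, might look like the following. `select_output_models` and the trimmed `SAMPLE_CONFIG` are hypothetical and exist only so the example runs without the real file.

```python
import json

# Trimmed copy of the mapping structure, kept inline so the sketch is runnable.
SAMPLE_CONFIG = """
{
  "language_to_model_mapping": {
    "output_languages": {
      "Hebrew": {"code": "he", "model": "Helsinki-NLP/opus-mt-en-he", "direction": "from_english"},
      "German": {"code": "de", "model": "Helsinki-NLP/opus-mt-en-de", "direction": "from_english"}
    }
  }
}
"""


def select_output_models(config_text, wanted):
    """Return {language name: model id} for the wanted names present in the config."""
    data = json.loads(config_text)
    outputs = data["language_to_model_mapping"]["output_languages"]
    # Names missing from the config are skipped rather than raising.
    return {name: outputs[name]["model"] for name in wanted if name in outputs}


selected = select_output_models(SAMPLE_CONFIG, ["Hebrew", "Arabic"])
print(selected)
```

Separating the parsing from the model loading makes it possible to check the configuration without downloading any models.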
requirements.txt
ADDED
@@ -0,0 +1,8 @@
gradio==4.29.0
transformers==4.41.1
torch==2.2.2
langdetect==1.0.9
sentencepiece==0.1.97
protobuf==3.20.0
accelerate==0.21.0
datasets==2.0.0