NovaBenya committed
Commit 0de54e1 · verified · 1 Parent(s): 86b4601

Upload 4 files

Files changed (4)
  1. README app.md +84 -0
  2. app.py +231 -0
  3. lang_model_map_json.json +88 -0
  4. requirements.txt +8 -0
README app.md ADDED
@@ -0,0 +1,84 @@
# Smart Multilingual Translator

A translation application built on Hugging Face models that automatically detects the language of the input text and translates it into multiple target languages.

## Key Features
- **Automatic Language Detection**: The language of the input text is detected automatically
- **Multilingual Translation**: Translate into up to 3 target languages simultaneously
- **User-Friendly Interface**: Simple, easy-to-use Gradio interface
- **Multiple Language Support**: Dozens of languages supported

## Project Structure
```
smart-translator/
├── app.py               # Main application
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
├── lang_model_map.json  # Language-to-model mapping
└── assets/              # Assets folder
    └── screenshot.png   # Screenshot
```

## Supported Languages

### Language Detection (Input)
- Arabic (ar) - العربية
- Dutch (nl) - Nederlands
- English (en) - English
- French (fr) - Français
- German (de) - Deutsch
- Hebrew (he) - עברית
- Italian (it) - Italiano
- Portuguese (pt) - Português
- Russian (ru) - Русский
- Spanish (es) - Español

### Target Languages (Output)
- Hebrew (he) - עברית
- Arabic (ar) - العربية
- Spanish (es) - Español
- French (fr) - Français

## Models Used in the Application
| Component | Advantages | Disadvantages |
|-----------|------------|---------------|
| **langdetect**, a port of Google's language-detection library | Fast, lightweight, supports many languages | Less accurate on short texts |
| **Helsinki-NLP MarianMT** translation models | Fast, lightweight, good quality for European languages | Less effective for Asian languages |
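
Because langdetect is probabilistic, results on very short inputs can vary between runs, which is the weakness noted above. A minimal sketch (using only the langdetect package from requirements.txt) that pins the seed for reproducible detection:

```python
from langdetect import DetectorFactory, detect

# langdetect is non-deterministic by default; fixing the seed makes runs reproducible.
DetectorFactory.seed = 0

print(detect("Ceci est une phrase d'exemple."))  # expected: 'fr'
print(detect("ok"))  # very short inputs like this are frequently misdetected
```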

## Local Installation

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Run the application:
```bash
python app.py
```

## Technical Implementation

### Translation Pipeline
1. **Language Detection**: The langdetect library identifies the source language
2. **English Intermediary**: All translations are routed through English as a pivot language
3. **Target Translation**: The English text is translated into the selected target languages
4. **Error Handling**: Graceful fallbacks for unsupported languages (see the sketch below)
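
A condensed sketch of this pivot flow, using the same Helsinki-NLP checkpoints that app.py loads. The helper name `pivot_translate` is illustrative, and the sketch assumes an `opus-mt-{src}-en` checkpoint exists for the detected code (true for the input languages listed above); caching and error handling are omitted for brevity:

```python
from langdetect import detect
from transformers import pipeline

def pivot_translate(text: str, target_model: str) -> str:
    """Detect the source language, pivot through English, then translate."""
    src = detect(text)
    if src != "en":
        to_en = pipeline("translation", model=f"Helsinki-NLP/opus-mt-{src}-en")
        text = to_en(text, max_length=512)[0]["translation_text"]
    from_en = pipeline("translation", model=target_model)
    return from_en(text, max_length=512)[0]["translation_text"]

# French -> English -> Hebrew
print(pivot_translate("Bonjour le monde", "Helsinki-NLP/opus-mt-en-he"))
```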

## Requirements

- gradio==4.29.0
- transformers==4.41.1
- torch==2.2.2
- langdetect==1.0.9
- sentencepiece==0.1.97
- protobuf==3.20.0
- accelerate==0.21.0
- datasets==2.0.0

## License

This project is open source under the MIT License.

---
app.py ADDED
@@ -0,0 +1,231 @@
import gradio as gr
import json
from langdetect import detect
from transformers import pipeline
import warnings

warnings.filterwarnings("ignore")

# Load language to model mapping
def load_language_model_map():
    """Load mapping between languages and translation models."""
    return {
        'ar': 'Helsinki-NLP/opus-mt-ar-en',  # Arabic to English
        'fr': 'Helsinki-NLP/opus-mt-fr-en',  # French to English
        'de': 'Helsinki-NLP/opus-mt-de-en',  # German to English
        'es': 'Helsinki-NLP/opus-mt-es-en',  # Spanish to English
        'he': 'Helsinki-NLP/opus-mt-he-en',  # Hebrew to English
        'ru': 'Helsinki-NLP/opus-mt-ru-en',  # Russian to English
        'it': 'Helsinki-NLP/opus-mt-it-en',  # Italian to English
    }

# Language code to full name mapping
LANGUAGE_NAMES = {
    'en': 'English',
    'ar': 'Arabic',
    'fr': 'French',
    'de': 'German',
    'es': 'Spanish',
    'he': 'Hebrew',
    'ru': 'Russian',
    'it': 'Italian',
    'pt': 'Portuguese',
    'nl': 'Dutch',
    'sv': 'Swedish',
    'da': 'Danish',
    'no': 'Norwegian',
    'fi': 'Finnish'
}

# Initialize translation pipelines
def get_translation_pipelines():
    """Initialize translation pipelines for different target languages from JSON."""
    try:
        # NOTE: this commit uploads the mapping as lang_model_map_json.json, so this
        # open() raises FileNotFoundError and the hardcoded fallback below is used
        # unless the file is renamed to lang_model_map.json.
        with open('lang_model_map.json', 'r', encoding='utf-8') as f:
            data = json.load(f)
        # Extract output language mappings
        output_langs = data['language_to_model_mapping']['output_languages']
        pipelines = {}
        for lang_name, lang_info in output_langs.items():
            # Only load the main target languages to avoid memory issues
            if lang_name in ['Hebrew', 'Arabic', 'Spanish', 'French']:
                pipelines[lang_name] = pipeline("translation", model=lang_info['model'])
        return pipelines
    except FileNotFoundError:
        # Fallback to hardcoded pipelines if JSON file not found
        print("Warning: lang_model_map.json not found. Using fallback pipelines.")
        return {
            'Hebrew': pipeline("translation", model="Helsinki-NLP/opus-mt-en-he"),
            'Arabic': pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar"),
            'Spanish': pipeline("translation", model="Helsinki-NLP/opus-mt-en-es"),
            'French': pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
        }

# Global variables for caching pipelines
language_model_map = load_language_model_map()
target_pipelines = get_translation_pipelines()
source_pipelines = {}  # cache of source-to-English pipelines, built lazily

def detect_language(text):
    """Detect the language of input text."""
    try:
        detected_lang = detect(text)
        return detected_lang, LANGUAGE_NAMES.get(detected_lang, detected_lang)
    except Exception:
        return 'unknown', 'Unknown'

def translate_to_english(text, source_lang):
    """Translate text from source language to English."""
    if source_lang == 'en':
        return text

    if source_lang in language_model_map:
        try:
            # Build the pipeline once per source language and reuse it
            if source_lang not in source_pipelines:
                model_name = language_model_map[source_lang]
                source_pipelines[source_lang] = pipeline("translation", model=model_name)
            result = source_pipelines[source_lang](text, max_length=512)
            return result[0]['translation_text']
        except Exception as e:
            return f"Translation error: {str(e)}"
    else:
        return "Translation model not available for this language"

def translate_from_english(text, target_languages):
    """Translate English text to target languages."""
    translations = {}

    for lang_name in target_languages:
        if lang_name in target_pipelines:
            try:
                result = target_pipelines[lang_name](text, max_length=512)
                translations[lang_name] = result[0]['translation_text']
            except Exception as e:
                translations[lang_name] = f"Error: {str(e)}"
        else:
            translations[lang_name] = "Model not available"

    return translations

def smart_translate(input_text, target_lang1, target_lang2, target_lang3):
    """Main translation function."""
    if not input_text.strip():
        return "Please enter text to translate", "", "", "", "", ""

    # Detect source language
    source_lang_code, source_lang_name = detect_language(input_text)

    # Translate to English first if not already English
    english_text = translate_to_english(input_text, source_lang_code)

    # Get target languages list
    target_languages = []
    if target_lang1: target_languages.append(target_lang1)
    if target_lang2: target_languages.append(target_lang2)
    if target_lang3: target_languages.append(target_lang3)

    # Translate to target languages
    translations = translate_from_english(english_text, target_languages)

    # Format results
    result_text = f"**Original Text:** {input_text}\n\n"
    result_text += f"**Detected Language:** {source_lang_name} ({source_lang_code})\n\n"

    if source_lang_code != 'en':
        result_text += f"**English Translation:** {english_text}\n\n"

    result_text += "**Translations:**\n"
    for lang, translation in translations.items():
        result_text += f"• **{lang}:** {translation}\n"

    # Return individual translations for display
    trans1 = translations.get(target_lang1, "") if target_lang1 else ""
    trans2 = translations.get(target_lang2, "") if target_lang2 else ""
    trans3 = translations.get(target_lang3, "") if target_lang3 else ""

    return result_text, source_lang_name, english_text, trans1, trans2, trans3

# Create and launch the Gradio interface
target_options = list(target_pipelines.keys())

with gr.Blocks(title="Smart Multilingual Translator", theme=gr.themes.Soft()) as interface:
    gr.Markdown("""
    # Smart Multilingual Translator
    ### Powered by Hugging Face Transformers

    This application automatically detects the language of your input text and translates it to your selected target languages.
    """)

    with gr.Row():
        with gr.Column(scale=2):
            input_text = gr.Textbox(
                label="Input Text",
                placeholder="Enter text in any language...",
                lines=5
            )

            with gr.Row():
                target_lang1 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 1",
                    value="Hebrew"
                )
                target_lang2 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 2",
                    value="Arabic"
                )
                target_lang3 = gr.Dropdown(
                    choices=target_options,
                    label="Target Language 3",
                    value="Spanish"
                )

            translate_btn = gr.Button("🔄 Translate", variant="primary", size="lg")

        with gr.Column(scale=3):
            result_display = gr.Markdown(label="Translation Results")

            with gr.Row():
                with gr.Column():
                    detected_lang = gr.Textbox(label="Detected Language", interactive=False)
                with gr.Column():
                    english_trans = gr.Textbox(label="English Translation", interactive=False)

            with gr.Row():
                trans1_output = gr.Textbox(label="Translation 1", interactive=False)
                trans2_output = gr.Textbox(label="Translation 2", interactive=False)
                trans3_output = gr.Textbox(label="Translation 3", interactive=False)

    # Event handlers
    translate_btn.click(
        fn=smart_translate,
        inputs=[input_text, target_lang1, target_lang2, target_lang3],
        outputs=[result_display, detected_lang, english_trans, trans1_output, trans2_output, trans3_output]
    )

    gr.Markdown("""
    ---
    ## Supported Languages
    ### Language Detection (Input)
    - Arabic (ar) - العربية
    - Dutch (nl) - Nederlands
    - English (en) - English
    - French (fr) - Français
    - German (de) - Deutsch
    - Hebrew (he) - עברית
    - Italian (it) - Italiano
    - Portuguese (pt) - Português
    - Russian (ru) - Русский
    - Spanish (es) - Español

    ### Target Languages (Output)
    - Hebrew (he) - עברית
    - Arabic (ar) - العربية
    - Spanish (es) - Español
    - French (fr) - Français

    ### Models Used:
    - **Language Detection:** langdetect
    - **Translation Models:** Helsinki-NLP MarianMT models from Hugging Face
    - **Configuration:** Models loaded from lang_model_map.json
    """)

# share=True creates a public link when run locally; Hugging Face Spaces ignores it.
interface.launch(share=True)
lang_model_map_json.json ADDED
@@ -0,0 +1,88 @@
{
  "language_to_model_mapping": {
    "input_languages": {
      "ar": {
        "name": "Arabic",
        "model": "Helsinki-NLP/opus-mt-ar-en",
        "direction": "to_english"
      },
      "fr": {
        "name": "French",
        "model": "Helsinki-NLP/opus-mt-fr-en",
        "direction": "to_english"
      },
      "de": {
        "name": "German",
        "model": "Helsinki-NLP/opus-mt-de-en",
        "direction": "to_english"
      },
      "es": {
        "name": "Spanish",
        "model": "Helsinki-NLP/opus-mt-es-en",
        "direction": "to_english"
      },
      "he": {
        "name": "Hebrew",
        "model": "Helsinki-NLP/opus-mt-he-en",
        "direction": "to_english"
      },
      "ru": {
        "name": "Russian",
        "model": "Helsinki-NLP/opus-mt-ru-en",
        "direction": "to_english"
      },
      "it": {
        "name": "Italian",
        "model": "Helsinki-NLP/opus-mt-it-en",
        "direction": "to_english"
      },
      "pt": {
        "name": "Portuguese",
        "model": "Helsinki-NLP/opus-mt-pt-en",
        "direction": "to_english"
      },
      "nl": {
        "name": "Dutch",
        "model": "Helsinki-NLP/opus-mt-nl-en",
        "direction": "to_english"
      }
    },
    "output_languages": {
      "Hebrew": {
        "code": "he",
        "model": "Helsinki-NLP/opus-mt-en-he",
        "direction": "from_english"
      },
      "Arabic": {
        "code": "ar",
        "model": "Helsinki-NLP/opus-mt-en-ar",
        "direction": "from_english"
      },
      "Spanish": {
        "code": "es",
        "model": "Helsinki-NLP/opus-mt-en-es",
        "direction": "from_english"
      },
      "French": {
        "code": "fr",
        "model": "Helsinki-NLP/opus-mt-en-fr",
        "direction": "from_english"
      },
      "German": {
        "code": "de",
        "model": "Helsinki-NLP/opus-mt-en-de",
        "direction": "from_english"
      },
      "Russian": {
        "code": "ru",
        "model": "Helsinki-NLP/opus-mt-en-ru",
        "direction": "from_english"
      },
      "Italian": {
        "code": "it",
        "model": "Helsinki-NLP/opus-mt-en-it",
        "direction": "from_english"
      }
    }
  }
}
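
Note the filename mismatch: app.py opens lang_model_map.json, while this file is uploaded as lang_model_map_json.json, so as committed the hardcoded fallback pipelines are what actually load. A short sketch of how the mapping above is consumed once the file carries the expected name (rename assumed):

```python
import json

# Rename assumed: app.py expects the mapping under lang_model_map.json.
with open("lang_model_map.json", encoding="utf-8") as f:
    outputs = json.load(f)["language_to_model_mapping"]["output_languages"]

# Every entry here is loadable; only the whitelist inside
# get_translation_pipelines() limits which models are instantiated.
for name, info in outputs.items():
    print(f"{name} ({info['code']}): {info['model']}")
```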
requirements.txt ADDED
@@ -0,0 +1,8 @@
gradio==4.29.0
transformers==4.41.1
torch==2.2.2
langdetect==1.0.9
sentencepiece==0.1.97
protobuf==3.20.0
accelerate==0.21.0
datasets==2.0.0