---
license: apache-2.0
base_model: openai/gpt-oss-20b
tags:
- multilingual
- reasoning
- thinking
- fine-tuned
- lora
- conversational
language:
- multilingual
- en
- es
- ar
- fr
- de
- zh
- ja
- ko
- hi
- ru
datasets:
- HuggingFaceH4/Multilingual-Thinking
library_name: transformers
pipeline_tag: text-generation
---

# GPT-OSS-NEMO-20B: Multilingual Thinking Model

## Model Description

**GPT-OSS-NEMO-20B** is a fine-tuned version of OpenAI's GPT-OSS-20B, enhanced for multilingual reasoning and thinking. It was trained with supervised fine-tuning (SFT) on the HuggingFaceH4/Multilingual-Thinking dataset to improve its ability to reason in multiple languages while maintaining strong performance across diverse linguistic contexts.

## Key Features

- 🌍 **Multilingual Reasoning**: Enhanced ability to think and reason in multiple languages
- 🧠 **Chain-of-Thought**: Improved reasoning capabilities with explicit thinking processes
- 💬 **Conversational**: Optimized for interactive dialogue and question answering
- 🎯 **Cross-lingual**: Can reason in one language and respond in another
- ⚡ **High Performance**: Built on the 20B-parameter GPT-OSS foundation

## Training Details

### Base Model

- **Model**: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
- **Parameters**: ~20 billion
- **Architecture**: GPT-OSS (Mixture of Experts)

### Fine-tuning Configuration

- **Method**: LoRA (Low-Rank Adaptation)
- **Rank (r)**: 8
- **Alpha**: 16
- **Target Modules**: All linear layers, with additional adapters on selected MoE expert layers
- **Target Parameters**: MLP experts in layers 7, 15, and 23 (`gate_up_proj`, `down_proj`)

### Training Infrastructure

- **Hardware**: 4x NVIDIA H100 GPUs
- **Cloud Platform**: Microsoft Azure NC-series instances
- **Training Framework**: TRL (Transformer Reinforcement Learning)
- **Optimization**: AdamW with cosine learning rate scheduling

### Training Hyperparameters

- **Learning Rate**: 2e-4
- **Batch Size**: 4 per device (16 total across 4 GPUs)
- **Gradient Accumulation**: 4 steps
- **Epochs**: 4
- **Max Sequence Length**: 2048 tokens
- **Warmup Ratio**: 3%
- **LR Scheduler**: Cosine with minimum LR (10% of peak)
- **Gradient Checkpointing**: Enabled

### Dataset

- **Name**: [HuggingFaceH4/Multilingual-Thinking](https://huggingface.co/datasets/HuggingFaceH4/Multilingual-Thinking)
- **Purpose**: Multilingual reasoning and thinking enhancement
- **Languages**: English, Spanish, Arabic, French, German, Chinese, Japanese, Korean, Hindi, Russian, among others
- **Training Split**: Full training set

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("justinj92/gpt-oss-nemo-20b")

# Example: cross-lingual reasoning (Spanish question, Arabic reasoning)
messages = [
    {"role": "system", "content": "reasoning language: Arabic"},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Advanced Usage with Custom Reasoning Language

```python
# Specify the reasoning language in the system prompt
reasoning_language = "French"  # Can be any supported language
system_prompt = f"reasoning language: {reasoning_language}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]
```
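The messages built above run through the same generation call as in the Quick Start. The sketch below assumes the `model` and `tokenizer` objects from the Quick Start are still in scope and reuses the same sampling settings; the only addition is slicing off the prompt tokens so that only the model's reply is printed.

```python
# Reuse the `model` and `tokenizer` loaded in the Quick Start above.
inputs = tokenizer.apply_chat_template(
    messages,                      # messages with the French reasoning-language system prompt
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```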
system_prompt = f"reasoning language: {reasoning_language}" messages = [ {"role": "system", "content": system_prompt}, {"role": "user", "content": "Explain quantum computing in simple terms."} ] ``` ## Model Capabilities ### Multilingual Reasoning The model can: - Think and reason in a specified language (via system prompt) - Process questions in one language and reason in another - Maintain coherent logic across language boundaries - Provide explanations with explicit reasoning steps ### Language Support Primary languages include: - **English** (en) - **Spanish** (es) - **Arabic** (ar) - **French** (fr) - **German** (de) - **Chinese** (zh) - **Japanese** (ja) - **Korean** (ko) - **Hindi** (hi) - **Russian** (ru) ## Performance The model demonstrates improved performance in: - Cross-lingual reasoning tasks - Multi-step problem solving - Contextual understanding across languages - Maintaining coherence in multilingual conversations ## Limitations - Performance may vary across different languages - Complex reasoning in low-resource languages may be limited - Generated content should be verified for factual accuracy - May exhibit biases present in the training data ## Technical Specifications - **Model Size**: ~20B parameters - **Precision**: BF16 (Brain Floating Point 16-bit) - **Memory Requirements**: ~40GB VRAM for inference - **Recommended Hardware**: NVIDIA A100/H100 or similar high-memory GPUs - **Framework Compatibility**: transformers, torch, accelerate ## Citation If you use this model in your research, please cite: ```bibtex @misc{gpt-oss-nemo-20b, title={GPT-OSS-NEMO-20B: A Multilingual Thinking Model}, author={justinj92}, year={2025}, howpublished={\url{https://huggingface.co/justinj92/gpt-oss-nemo-20b}}, note={Fine-tuned from openai/gpt-oss-20b using HuggingFaceH4/Multilingual-Thinking} } ``` ## Acknowledgments - **Base Model**: OpenAI GPT-OSS-20B team - **Dataset**: HuggingFace H4 team for the Multilingual-Thinking dataset - **Infrastructure**: Microsoft Azure for cloud computing resources - **Framework**: Hugging Face transformers and TRL libraries ## License This model is released under the Apache 2.0 license, following the base model's licensing terms. --- *Model trained on August 2025 using state-of-the-art multilingual reasoning techniques.*