Model Card for EpistemeAI/gpt-oss-20b-mmlustem

An early experiment in self-generated synthetic fine-tuning techniques, specialized in STEM and science for science-focused AI.

Model Details

Model Description

A model specialized in STEM and science, built for science-focused AI. This idea captures the need to design artificial intelligence systems that aren't just generalists but are deeply tuned for scientific exploration and problem-solving. By focusing on science, technology, engineering, and mathematics, such AI can move beyond surface-level pattern recognition and instead tackle real challenges in physics, biology, chemistry, and mathematics with rigor. Imagine AI models that assist in discovering new materials, predicting protein folding with precision, optimizing renewable energy systems, or solving abstract mathematical conjectures. These are not applications where shallow training suffices; they require an AI mindset that mirrors the scientific method: hypothesize, test, refine, and explain. A purpose-built science AI would act less like a chatbot and more like a laboratory collaborator, accelerating the pace of discovery while remaining grounded in evidence and reproducibility.

  • Developed by: Thomas Yiu
  • Model type: GPT (gpt-oss-20b)
  • Language(s) (NLP): English and others
  • License: apache-2.0
  • Finetuned from model [optional]: unsloth/gpt-oss-20b-unsloth-bnb-4bit

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

GPT-OSS-20B STEM Fine-Tuned

Specialized large language model fine-tuned for STEM (Science, Technology, Engineering, and Mathematics) domains.
MMLU-STEM performance improved by roughly 30% relative (+16 percentage points on average) through fine-tuning GPT-OSS-20B on a self-generated dataset containing reasoning traces and domain-specific multiple-choice questions.


Uses

Direct Use

  • Answering science and engineering multiple-choice questions with higher accuracy.
  • Providing reasoning traces in mathematics and STEM domains.
  • Assisting as a study aid for researchers, engineers, and students in technical fields.

Downstream Use (optional)

  • Reasoning engine for tutoring systems in physics, math, chemistry, or engineering.
  • Core component in scientific research assistants (hypothesis testing, summarizing papers).
  • Backend for exam preparation platforms and evaluation pipelines.

Out-of-Scope Use

  • High-stakes decision-making without human verification (e.g., medical diagnoses, autonomous lab control).
  • Non-STEM general knowledge or commonsense tasks outside the model’s training domain.
  • Applications requiring ethical or social judgment.

Bias, Risks, and Limitations

  • The model is biased toward STEM reasoning tasks and may underperform on humanities or everyday reasoning.
  • Risk of hallucinated precision: outputs may appear mathematically rigorous but contain subtle errors.
  • Users should treat results as hypotheses, not ground truth.

Recommendations

  • Always apply human oversight in professional or research-grade applications.
  • For safe deployment, pair the model with verification tools (e.g., symbolic solvers, fact-checkers).
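
As an illustration of the second point, a lightweight check can re-derive a result with a symbolic solver and compare it against the model's claim. The sketch below uses sympy for a derivative check; the helper name, the equality test, and the example expression are assumptions for illustration, not part of this model card.

import sympy as sp

def verify_derivative(expr_str: str, claimed_str: str, var: str = "x") -> bool:
    """Check a model-claimed derivative against sympy's own result."""
    x = sp.Symbol(var)
    expr = sp.sympify(expr_str)
    claimed = sp.sympify(claimed_str)
    # simplify(a - b) == 0 is a robust equality test for symbolic expressions.
    return sp.simplify(sp.diff(expr, x) - claimed) == 0

# Example: the model claims d/dx sin(x) = cos(x).
print(verify_derivative("sin(x)", "cos(x)"))  # True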

Getting Started

Installation

pip install -q --upgrade torch
pip install -q transformers peft triton==3.4 kernels
pip uninstall -q -y torchvision torchaudio
pip uninstall -q -y bitsandbytes
pip install -q -U bitsandbytes

Loading the model

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the 4-bit quantized base model, then attach the STEM adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "EpistemeAI/gpt-oss-20b-mmlustem")
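
Once the adapter is attached, generation works like any other causal LM. The snippet below builds on the loading code above; the prompt wording and generation settings are illustrative assumptions, not a prescribed format.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b-unsloth-bnb-4bit")

# An MMLU-style multiple-choice question; the prompt format is an assumption.
prompt = (
    "Answer the following multiple-choice question and explain your reasoning.\n"
    "Question: What is the derivative of sin(x)?\n"
    "A) cos(x)  B) -cos(x)  C) -sin(x)  D) sin(x)\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))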

Training Details

Training Data

  • Self-generated STEM dataset (MMLU-style Q&A + reasoning traces).
  • Balanced coverage of physics, chemistry, biology, computer science, and mathematics.
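
A single record in this dataset might look like the following. This is a hypothetical illustration of the Alpaca-style formatting with a reasoning trace; the exact field names and schema are assumptions.

# Hypothetical example of one Alpaca-style record with a reasoning trace.
example_record = {
    "instruction": "Answer the multiple-choice question and show your reasoning.",
    "input": (
        "A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?\n"
        "A) 1.5 N  B) 5 N  C) 6 N  D) 9 N"
    ),
    "output": (
        "Reasoning: Newton's second law gives F = m * a = 2 kg * 3 m/s^2 = 6 N.\n"
        "Answer: C"
    ),
}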

Training Procedure

  • Preprocessing: Tokenization, reasoning trace generation, Alpaca-style formatting.
  • Training regime: bf16 mixed precision
  • Batch size: 2 per device (gradient accumulation = 4)
  • Learning rate: 2e-4 with cosine scheduler
  • Epochs: 4
  • Optimizer: AdamW 8-bit
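
The hyperparameters above correspond roughly to the following TRL SFTTrainer setup. This is a minimal sketch, assuming TRL's SFTConfig with a pre-formatted text dataset; the output directory, logging cadence, and variable names are assumptions, not taken verbatim from the training run.

from trl import SFTConfig, SFTTrainer

# Hyperparameters from the list above; output_dir and logging_steps are assumptions.
config = SFTConfig(
    output_dir="gpt-oss-20b-mmlustem",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch of 8 per device step
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
    bf16=True,
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,                     # PEFT-wrapped base model from the loading snippet
    train_dataset=train_dataset,     # Alpaca-formatted records, assumed prepared elsewhere
    args=config,
)
trainer.train()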

Compute

  • Model size: 20B parameters
  • Fine-tuning time: ~24 GPU-hours on 8×A100-40GB
  • Checkpoint size: ~40GB (smaller if LoRA adapters used)

Evaluation

Testing Data

  • MMLU-STEM subset (10k+ science and engineering multiple-choice questions).

Metrics

  • Accuracy (primary).
  • Reasoning consistency (qualitative).
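
A simple way to compute the accuracy metric on MMLU-style items is to extract the answer letter from each model response and compare it to the gold label, as sketched below. The answer-extraction heuristic and the toy outputs are assumptions, not the actual evaluation harness.

import re

def extract_choice(text: str) -> str | None:
    """Pull the first standalone answer letter (A-D) out of a model response."""
    match = re.search(r"\b([ABCD])\b", text)
    return match.group(1) if match else None

def mcq_accuracy(predictions: list[str], gold_labels: list[str]) -> float:
    """Fraction of items where the extracted letter matches the gold label."""
    correct = sum(
        extract_choice(pred) == gold for pred, gold in zip(predictions, gold_labels)
    )
    return correct / len(gold_labels)

# Example with toy model outputs and labels.
preds = ["Reasoning: ... Answer: C", "The answer is B", "I think D is correct"]
gold = ["C", "B", "A"]
print(f"accuracy = {mcq_accuracy(preds, gold):.2f}")  # 0.67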

Results

Domain         Baseline GPT-OSS-20B   Fine-Tuned GPT-OSS-20B   Δ Improvement
Mathematics    52%                    69%                      +17%
Physics        48%                    64%                      +16%
Chemistry      50%                    66%                      +16%
Biology        55%                    70%                      +15%
Comp. Science  58%                    72%                      +14%
Average        53%                    69%                      +16%

Summary: Fine-tuning with STEM-specialized data produced substantial gains in domain-specific reasoning, particularly in mathematics and physics.


Environmental Impact

  • Hardware Type: 8× NVIDIA A100-40GB
  • Hours used: ~24
  • Cloud Provider: [specify, e.g., AWS/GCP/Azure]
  • Region: [specify, e.g., us-west-2]
  • Carbon Emitted: Estimate ≈ XX kg CO2eq (calculated with ML Impact Calculator)

Technical Specifications

Model Architecture

  • Decoder-only Transformer (GPT-OSS-20B).
  • Fine-tuned for causal LM objective with instruction-response data.

Compute Infrastructure

  • Hardware: 8× A100-40GB GPUs (NVLink).
  • Software: PyTorch, Hugging Face Transformers, TRL, Unsloth.
  • Precision: bf16 mixed precision.
  • Optimizer: AdamW 8-bit.

License

Apache-2.0


Citation

If you use this model in your research, please cite:

@misc{gptoss20b_stem,
  author       = {Thomas Yiu},
  title        = {GPT-OSS-20B STEM Fine-Tuned},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EpistemeAI/gpt-oss-20b-mmlustem}}
}

Framework versions

  • PEFT 0.17.1