Model Card for GPT-OSS-120B

Model Details

Model Description

GPT-OSS-120B is a 120-billion-parameter generative language model built on the transformer architecture. It is one of the largest openly available language models and is designed for a wide range of natural language processing tasks, including text generation, summarization, question answering, and creative writing. This repository hosts a 4-bit (MXFP4-Q4) conversion of the model for use with MLX on Apple Silicon.

  • Developed by: OpenAI (base model); MLX conversion by the MLX Community
  • Model type: Transformer-based language model (mixture-of-experts decoder)
  • Language(s): English
  • License: Apache 2.0
  • Converted from: openai/gpt-oss-120b

Uses

Direct Use

The model can be used for:

  • Text generation and completion
  • Content summarization
  • Question answering
  • Creative writing and storytelling
  • Code generation and explanation
  • Educational content creation

Downstream Use

The model can be fine-tuned for the following (a data-preparation sketch follows this list):

  • Specialized domain applications
  • Chatbots and conversational AI
  • Content moderation
  • Sentiment analysis
  • Language translation
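
As a concrete starting point for the fine-tuning uses above, the sketch below writes a toy dataset in a JSONL layout that mlx_lm's LoRA tooling can consume. The file names, example records, and the CLI invocation in the trailing comment are illustrative assumptions, not part of this card; consult the mlx-lm documentation for the formats your installed version accepts.

import json
from pathlib import Path

# Hypothetical toy dataset; replace with your own domain examples.
examples = [
    {"prompt": "Classify the sentiment: 'Great battery life.'",
     "completion": "positive"},
    {"prompt": "Classify the sentiment: 'The screen cracked in a week.'",
     "completion": "negative"},
]

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# Write {"text": ...} records to train.jsonl (assumed layout).
with open(data_dir / "train.jsonl", "w") as f:
    for ex in examples:
        record = {"text": f"{ex['prompt']}\n{ex['completion']}"}
        f.write(json.dumps(record) + "\n")

# Training is then typically launched from the command line, e.g.:
#   mlx_lm.lora --model mlx-community/gpt-oss-120b-MXFP4-Q4 --train --data data
# (flags are assumptions; check `mlx_lm.lora --help`).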

Out-of-Scope Use

The model should not be used for:

  • Generating harmful, abusive, or unethical content
  • Medical or legal advice without human supervision
  • Critical decision-making systems without human oversight
  • Generating misinformation or fake content
  • Impersonation of individuals without consent

Bias, Risks, and Limitations

GPT-OSS-120B may exhibit biases present in its training data. Users should be aware of potential issues including:

  • Social, racial, and gender biases
  • Political and cultural biases
  • Factual inaccuracies in generated content
  • Potential for generating plausible but incorrect information
  • Sensitivity to prompt phrasing

Recommendations

Users should:

  • Verify important facts generated by the model
  • Use human oversight for critical applications
  • Consider potential biases when deploying the model
  • Implement content filtering where appropriate (a minimal filtering sketch follows this list)
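
One lightweight way to implement such filtering is a post-hoc check wrapped around generation. The sketch below is a minimal illustration only; BLOCKED_TERMS and safe_generate are hypothetical names, and a production deployment should use a dedicated moderation model or service rather than substring matching.

from mlx_lm import load, generate

# Hypothetical blocklist; substring matching is far too crude for real use.
BLOCKED_TERMS = ["example-banned-phrase"]

def safe_generate(model, tokenizer, prompt, max_tokens=500):
    """Generate text, withholding the response if it trips the blocklist."""
    text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return "[response withheld by content filter]"
    return text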

How to Get Started with the Model

Use the code below to get started with the model:

from mlx_lm import load, generate

# Load the model and tokenizer
model, tokenizer = load("mlx-community/gpt-oss-120b-MXFP4-Q4")

# Generate text
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
formatted_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

response = generate(
    model,
    tokenizer,
    prompt=formatted_prompt,
    max_tokens=500,
    verbose=False
)
print(response)
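
For interactive applications, mlx_lm also exposes a streaming API. The sketch below assumes a recent mlx-lm release in which stream_generate yields response chunks carrying a .text field; older versions yielded plain strings, so adjust accordingly.

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/gpt-oss-120b-MXFP4-Q4")

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens as they are produced instead of waiting for the full response.
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()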

Training Details

Training Data

The base model was trained on a diverse dataset of text from publicly available sources, including:

  • Web pages (Common Crawl)
  • Books
  • Academic papers
  • Code repositories
  • News articles

Training Procedure

  • Architecture: Transformer decoder (mixture-of-experts)
  • Parameters: 120 billion total (roughly 5.1 billion active per token)
  • Precision: 4-bit quantized (MXFP4-Q4; see the quantization sketch after this list)
  • Context length: 131,072 tokens
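
To make the 4-bit precision entry concrete, the sketch below illustrates the general idea behind group-wise 4-bit quantization: each group of weights is mapped to 16 integer levels plus a per-group scale and offset. This is a simplified affine scheme for intuition only, not the actual MXFP4 format.

import numpy as np

def quantize_4bit(weights, group_size=32):
    """Toy group-wise 4-bit affine quantization (illustrative, not MXFP4)."""
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # 16 levels map to integers 0..15; the epsilon guards constant groups.
    scale = np.maximum((w_max - w_min) / 15.0, 1e-12)
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min):
    return q * scale + w_min

w = np.random.randn(4, 64).astype(np.float32)
q, scale, w_min = quantize_4bit(w.reshape(-1))
w_hat = dequantize_4bit(q, scale, w_min).reshape(w.shape)
print("max reconstruction error:", np.abs(w - w_hat).max())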

Evaluation

Results

The model demonstrates strong performance on:

  • Language understanding tasks
  • Creative writing
  • Technical explanation
  • Code generation
  • Multi-step reasoning

Evaluation Factors

  • Perplexity on held-out test sets (a computation sketch follows this list)
  • Human evaluation of generated content
  • Task-specific benchmarks
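
For reference, the snippet below shows the standard perplexity computation: the exponentiated mean negative log-likelihood over a held-out token sequence. The per-token log-probabilities are placeholder values; in practice they come from scoring real text with the model.

import math

# Placeholder per-token log-probabilities (natural log) for a held-out sequence.
token_log_probs = [-2.1, -0.4, -1.3, -0.9, -3.0]

nll = -sum(token_log_probs) / len(token_log_probs)  # mean negative log-likelihood
perplexity = math.exp(nll)
print(f"perplexity = {perplexity:.2f}")             # lower is better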

Environmental Impact

  • Hardware Type: Apple Silicon (M-series) for inference; original training hardware not specified
  • Hours used: Training details not specified
  • Cloud Provider: Not applicable
  • Compute Region: Not specified
  • Carbon Emitted: Information not available

Technical Specifications

Model Architecture and Objective

GPT-OSS-120B uses a transformer decoder architecture with:

  • 120 billion parameters (mixture-of-experts)
  • 4-bit quantization (MXFP4-Q4)
  • Rotary positional embeddings (see the sketch after this list)
  • o200k_harmony tokenizer with a vocabulary of roughly 201,000 tokens
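
For readers unfamiliar with rotary positional embeddings, the sketch below shows the core operation: pairs of feature dimensions are rotated by a position-dependent angle before attention. It is a minimal reference implementation using one common pairing convention, independent of this model's exact configuration.

import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per feature pair, as in the RoPE paper.
    freqs = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), freqs)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)  # 8 positions, one 64-dimensional attention head
q_rot = rope(q)
print(q_rot.shape)          # (8, 64)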

Compute Infrastructure

  • Hardware: Optimized for Apple Silicon with MLX
  • Training Infrastructure: Not specified

Citation

BibTeX:

@misc{gpt-oss-120b,
  title = {GPT-OSS-120B: A 120B Parameter Open Language Model},
  author = {MLX Community},
  year = {2025},
  howpublished = {\url{https://huggingface.co/mlx-community/gpt-oss-120b-MXFP4-Q4}},
}

Glossary

  • Transformer: Neural network architecture using self-attention mechanisms
  • Quantization: Technique to reduce model size by using lower precision numbers
  • MLX: Machine learning framework for Apple Silicon

More Information

For more information about the model, training process, or usage guidelines, please refer to the documentation on the Hugging Face model page.

Model Card Authors: MLX Community

Model Card Contact: For questions about this model card, please use the discussion forum on the Hugging Face model page.
