DeepSeek-R1T-Chimera


Model merge of DeepSeek-R1 and DeepSeek-V3 (0324)

An open weights model combining the intelligence of R1 with the token efficiency of V3.

For details on the construction process and analyses of Chimera model variants, please read our paper.

Paper on arXiv | Announcement on X | LinkedIn post | Try it on OpenRouter

Model Details

  • Architecture: DeepSeek-MoE Transformer-based language model
  • Combination Method: Merged model weights from DeepSeek-R1 and DeepSeek-V3 (0324)
  • Model Size: 685B parameters
  • Tensor Types: F32, BF16, F8_E4M3
  • Release Date: 2025-04-27
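As a rough illustration of what "merging model weights" means, the sketch below shows linear interpolation of two matching weight dictionaries. This is a common merging baseline, not the actual Chimera construction: the paper describes assembling the model from components of both parents, and the helper name and toy dictionaries here are hypothetical.

```python
def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    # Hypothetical helper: element-wise linear interpolation of two
    # weight dictionaries with identical keys and shapes:
    # merged = (1 - alpha) * A + alpha * B.
    assert sd_a.keys() == sd_b.keys(), "parent models must share an architecture"
    return {
        name: [(1 - alpha) * a + alpha * b for a, b in zip(sd_a[name], sd_b[name])]
        for name in sd_a
    }

# Toy example with two tiny "parent models"
r1 = {"layer.weight": [1.0, 2.0]}
v3 = {"layer.weight": [3.0, 4.0]}
chimera = merge_state_dicts(r1, v3, alpha=0.5)
# → {"layer.weight": [2.0, 3.0]}
```

Merging in this sense requires the parents to share the same architecture, which holds here because DeepSeek-R1 and DeepSeek-V3 (0324) are both DeepSeek-MoE models.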

Use, Out-of-Scope Use, Limitations, Risks, and Recommendations

For R1T-Chimera, we ask users to follow the careful guidelines that Microsoft created for its DeepSeek-based model "MAI-DS-R1".

These guidelines are available on Hugging Face.

Contact

Citation

@misc{tng_technology_consulting_gmbh_2025,
    author       = { TNG Technology Consulting GmbH },
    title        = { DeepSeek-R1T-Chimera },
    year         = 2025,
    month        = {April},
    url          = { https://huggingface.co/tngtech/DeepSeek-R1T-Chimera },
    doi          = { 10.57967/hf/5330 },
    publisher    = { Hugging Face }
}