dots1
🤗 Hugging Face | 📑 Paper
🖥️ Demo | 💬 WeChat (微信) | 📕 rednote
Visit our Hugging Face page (links above) and search for checkpoints whose names start with dots.llm1, or visit the dots1 collection, where you will find everything you need. Enjoy!
News
- 2025.06.06: We released the dots.llm1 series. Check our report for more details!
1. Introduction
The dots.llm1 model is a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models.
Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on a high-quality corpus without synthetic data. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
2. Model Summary
This repo contains the base and instruction-tuned dots.llm1 models, which have the following features:
- Type: A MoE model with 14B activated and 142B total parameters, trained on a high-quality corpus.
- Training Stages: Pretraining and SFT.
- Architecture: Multi-head attention with QK-Norm in the attention layer; fine-grained MoE using the top 6 of 128 routed experts, plus 2 shared experts.
- Number of Layers: 62
- Number of Attention Heads: 32
- Supported Languages: English, Chinese
- Context Length: 32,768 tokens
- License: MIT
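The routing pattern in the Architecture entry (top 6 of 128 routed experts, plus 2 shared experts) can be illustrated with a small, framework-free sketch. The gating function below (softmax over all router logits, then renormalisation over the selected experts) and the name `route_token` are assumptions made for illustration; the actual dots.llm1 router may differ.

```python
import math
import random

NUM_ROUTED_EXPERTS = 128  # from the model summary
TOP_K = 6                 # routed experts selected per token
NUM_SHARED_EXPERTS = 2    # shared experts, applied to every token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Assumption: softmax over all logits, then renormalise over the
    selected experts. This is a common choice, not a confirmed detail.
    """
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Random logits stand in for the output of a learned router.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selected = route_token(logits)

# Each token runs 6 routed experts plus the 2 shared ones.
experts_run_per_token = len(selected) + NUM_SHARED_EXPERTS
print(experts_run_per_token)  # 8
print(round(14 / 142, 3))     # 0.099, the share of parameters active per token
```

Only about a tenth of the parameters take part in each forward pass, which is where the inference cost savings of the 14B-activated design come from.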
The highlights of dots.llm1 include:
- Enhanced Data Processing: We propose a scalable and fine-grained three-stage data processing framework designed to generate large-scale, high-quality and diverse data for pretraining.
- No Synthetic Data during Pretraining: High-quality non-synthetic tokens were used in base model pretraining.
- Performance and Cost Efficiency: dots.llm1 is an open-source model that activates only 14B parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
- Infrastructure: We introduce an innovative MoE all-to-all communication and computation overlapping recipe based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation to boost computational efficiency.
- Open Accessibility to Model Dynamics: Intermediate model checkpoints are released spanning the entire training process, facilitating future research into the learning dynamics of large language models.
3. dots.llm1.inst.FP8-dynamic
We release the quantized dots.llm1.inst.FP8-dynamic model, which retains approximately 98% of the original performance after quantization.
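As a rough guide to what FP8 buys in memory, the sketch below estimates weight storage from the 142B total parameter count. It assumes every parameter is stored at 1 byte (FP8) or 2 bytes (BF16), which overstates the saving if some layers stay in higher precision, and it ignores activations and the KV cache. The tensor-parallel degree of 4 matches the vLLM command in this section.

```python
TOTAL_PARAMS = 142e9  # total parameters, from the model summary
TENSOR_PARALLEL = 4   # matches --tensor-parallel-size in the vLLM command


def weight_gib(num_params, bytes_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bytes_per_param / 2**30


bf16_gib = weight_gib(TOTAL_PARAMS, 2)  # 2 bytes per parameter
fp8_gib = weight_gib(TOTAL_PARAMS, 1)   # 1 byte per parameter (assumed for all weights)

print(f"BF16 weights: {bf16_gib:.0f} GiB")
print(f"FP8 weights:  {fp8_gib:.0f} GiB")
print(f"FP8 weights per GPU at TP={TENSOR_PARALLEL}: {fp8_gib / TENSOR_PARALLEL:.0f} GiB")
```

These figures are a lower bound on GPU memory; real deployments need extra headroom for the KV cache and runtime buffers.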
Docker (vllm)
For convenience, we recommend running vLLM inference using our Docker image rednotehilab/dots1:vllm-openai-v0.9.1, which is available on Docker Hub.
python3 -m vllm.entrypoints.openai.api_server \
    --model rednote-hilab/dots.llm1.inst-FP8-dynamic \
    --tensor-parallel-size 4 \
    --pipeline-parallel-size 1 \
    --trust-remote-code \
    --served-model-name dots1
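Once the server is up, it exposes an OpenAI-compatible HTTP API. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port when `--port` is not set); the helper names `build_chat_request` and `chat` are hypothetical and introduced here for illustration.

```python
import json
import urllib.request

# Assumption: the server runs locally on vLLM's default port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "dots1"  # must match --served-model-name


def build_chat_request(prompt, max_tokens=200):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a piece of quicksort code in C++"))` should print the model's reply. Adjust `BASE_URL` if the container maps a different host or port.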
Inference with Hugging Face
We are working to merge support for dots.llm1 into Transformers (PR #38143).
Chat Completion
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "rednote-hilab/dots.llm1.inst-FP8-dynamic"

# Load the tokenizer and model; device_map="auto" places the weights on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]

# Apply the chat template and append the generation prompt for the assistant turn.
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
Citation
If you find dots.llm1 useful or want to use it in your projects, please cite our paper:
@misc{huo2025dotsllm1technicalreport,
      title={dots.llm1 Technical Report},
      author={Bi Huo and Bin Tu and Cheng Qin and Da Zheng and Debing Zhang and Dongjie Zhang and En Li and Fu Guo and Jian Yao and Jie Lou and Junfeng Tian and Li Hu and Ran Zhu and Shengdong Chen and Shuo Liu and Su Guang and Te Wo and Weijun Zhang and Xiaoming Shi and Xinxin Peng and Xing Wu and Yawen Liu and Yuqiu Ji and Ze Wen and Zhenhai Liu and Zichao Li and Zilong Liao},
      year={2025},
      eprint={2506.05767},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.05767},
}
Model tree for rednote-hilab/dots.llm1.inst-FP8-dynamic
Base model
rednote-hilab/dots.llm1.base