Was the 7.5T Token Continual Pre-Training Performed on the Instruction-Tuned Model or the Base PLM?
Hello, and thank you for your impressive work on the MiniMax-M1 project!
The paper mentions that a 7.5T-token continual pre-training (CPT) stage was performed starting from the MiniMax-Text-01 model.
To clarify, was this CPT applied to the instruction-tuned model (MiniMax-Text-01 Instruction), or was it conducted on the base pretrained language model (MiniMax-Text-01 PLM) before any instruction tuning?
(Here, “MiniMax-Text-01 Instruction” and “MiniMax-Text-01 PLM” are just placeholder terms for clarity.)
Understanding this detail would help clarify how CPT fits into the overall training pipeline and what its intended role is.
Thank you in advance!
Thank you for the clarification!
@sriting
I have a follow-up question regarding the implementation details.
Is it correct that there are no special tokens such as `<think>` and `</think>` to explicitly mark reasoning paths (i.e., Chain-of-Thought segments) in the model output?
I ask because some recent models (e.g., Qwen3, DeepSeek-R1) adopt such tags to separate reasoning from the final answer. I'm curious whether MiniMax-M1 internally uses similar markers during SFT or RL training, or whether reasoning is handled purely implicitly through instruction and response formatting.
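To make the distinction concrete, here is a minimal sketch of what I mean by explicit markers, assuming the common `<think>...</think>` convention used by models like Qwen3. The function name and the tag format are my own illustrative assumptions, not anything from the MiniMax-M1 paper; whether the model emits anything like this is exactly what I'm asking.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate a Chain-of-Thought segment from the final answer.

    Assumes the <think>...</think> tag convention (hypothetical here;
    MiniMax-M1 may not use it). Returns (reasoning, answer). If no tags
    are present, the whole output is treated as the answer, i.e. the
    'implicit formatting' case.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = output[match.end():].strip()
        return reasoning, answer
    return "", output.strip()

# Explicit markers: reasoning and answer can be split mechanically.
tagged = "<think>2 + 2 equals 4.</think>The answer is 4."
print(split_reasoning(tagged))  # ('2 + 2 equals 4.', 'The answer is 4.')

# No markers: reasoning, if any, is interleaved with the answer text.
plain = "Since 2 + 2 equals 4, the answer is 4."
print(split_reasoning(plain))   # ('', 'Since 2 + 2 equals 4, ...')
```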
Thanks again for your time and support!