Chancy committed on
Commit 339f7ec · verified · 1 Parent(s): 5555079

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -39,8 +39,8 @@ Polaris is an open-source post-training method that uses reinforcement learning
   - **Inference-Time Length:** Polaris incorporates length extrapolation techniques for generating longer CoT at inference stage. This enables a *"train-short, generate-long"* paradigm for CoT reasoning, mitigating the computational burden of training with excessively long rollouts .
   - **Exploration Efficiency:** Exploration efficiency in Polaris is enhanced through multi-stage training. However, reducing the model's response length in the first stage poses potential risks. A more conservative approach would be to directly allow the model to "think longer" from the beginning.
 
- The details of our training recipe and analysis can be found in our [blog post]().
- The code and data for reproducing our results can be found in our [github repo]().
+ The details of our training recipe and analysis can be found in our [blog post](https://hkunlp.github.io/blog/2025/Polaris).
+ The code and data for reproducing our results can be found in our [github repo](https://github.com/ChenxinAn-fdu/POLARIS).
 
   ### Evaluation Results
 