Could you please share the specific tools and process used to evaluate DeepSeek-R1-0528-FP4?

#3
by Jaxe2a - opened

Hello,
I was reviewing the evaluation results for the DeepSeek-R1-0528-FP4 model and noticed that the official page presents detailed benchmark scores and metrics. However, I couldn’t find information on the exact tools or scripts used to generate these results.
Could you kindly share:
• What specific evaluation tools or frameworks were used?
• Are there any public scripts or documentation available for reproducing the benchmarks?
• What hardware and software environment was used during evaluation?
Having this information would greatly help me understand and reproduce the evaluation process.
Thank you very much for your time and support!
