Curious - Yarn setting different from Qwen3 repo for 128k?

#1
by DavidAU - opened

Note that the YaRN setting for Qwen3 4B, as per Qwen's repo, is:

"rope_scaling": {
"rope_type": "yarn",
"factor": 4.0,
"original_max_position_embeddings": 32768
}

I noticed yours is different:

"rope_scaling": {
"factor": 3.2,
"original_max_position_embeddings": 40960,
"rope_type": "yarn"
},
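For reference, a quick arithmetic check of the two blocks above. This is only a sketch, and it assumes the usual reading of YaRN's `factor` as the ratio of target context length to `original_max_position_embeddings`. The dict names and the `extended_context` helper are hypothetical, made up for illustration.

```python
# The two rope_scaling blocks quoted above, as plain dicts.
qwen_repo = {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}
this_repo = {"rope_type": "yarn", "factor": 3.2, "original_max_position_embeddings": 40960}


def extended_context(rope_scaling):
    """Nominal target context: factor * original_max_position_embeddings.

    Assumption: factor is the target/original length ratio.
    round() guards against float error from 3.2 not being exact in binary.
    """
    return round(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"])


for name, cfg in [("Qwen repo", qwen_repo), ("this repo", this_repo)]:
    print(name, extended_context(cfg))
```

Both come out to 131072 tokens (128k), so the two configs aim at the same nominal length. That alone does not show they behave identically, since the scaling factor and the assumed original length both differ.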

Does this impact performance?

Menlo Research org

Hi, our test results come from this config:

"rope_scaling": {
"factor": 3.2,
"original_max_position_embeddings": 40960,
"rope_type": "yarn"
},

There should be no issue with the current config, don't worry.

We benchmarked everything using this config, so you should get the same results with it.

We're re-benchmarking with the config from the Qwen team. We're a bit confused at the moment, but it should not affect performance.

We will update with the results soon! If they are better, we will switch to the new config; otherwise the current one should be fine.

Thank you for the quick update.

I have used YaRN to extend the Qwen3 models to 320k ... but if your method works better, all the better!
