Curious: YaRN setting differs from the Qwen3 repo for 128k?
The YaRN setting for Qwen3 4B, per Qwen's repo, is:
"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}
I noticed yours is different:
"rope_scaling": {
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
    "rope_type": "yarn"
},
Does this impact performance?
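For what it's worth, both settings seem to target the same 128k window, if the extended length is read as factor times original_max_position_embeddings. That reading is my assumption, and `extended_context` below is a hypothetical helper, not a library function:

```python
# Compare the two rope_scaling settings quoted above.
# Assumption: the YaRN target context is
# factor * original_max_position_embeddings.

def extended_context(rope_scaling: dict) -> int:
    """Return the context length (in tokens) implied by a rope_scaling block."""
    return round(
        rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
    )

qwen_repo = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
this_repo = {
    "rope_type": "yarn",
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
}

print(extended_context(qwen_repo))  # 131072
print(extended_context(this_repo))  # 131072
```

So the two differ in the assumed original length and the scaling factor, not in the target length.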
Hi, our test results come from this config:
"rope_scaling": {
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
    "rope_type": "yarn"
},
There should be no issue with the current config, so don't worry.
We benchmarked everything using this config, so you should get the same results with it.
We're re-benchmarking with the Qwen team's config. We're a bit confused at the moment, but it should not affect performance.
We'll update the results soon! If they are better, we'll switch to the new config; otherwise the current one should be fine.
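A minimal sketch of how the two settings could be swapped for such a comparison, assuming the model's config.json is handled as a plain dict. `with_rope_scaling` and the `base_config` contents are hypothetical illustrations, not our actual benchmarking code:

```python
import copy
import json

def with_rope_scaling(config: dict, rope_scaling: dict) -> dict:
    """Return a copy of config with its rope_scaling block replaced."""
    patched = copy.deepcopy(config)  # leave the original config untouched
    patched["rope_scaling"] = dict(rope_scaling)
    return patched

# Hypothetical stand-in for the relevant part of config.json.
base_config = {
    "rope_scaling": {
        "factor": 3.2,
        "original_max_position_embeddings": 40960,
        "rope_type": "yarn",
    }
}

# Variant using the values from Qwen's repo, as quoted earlier in the thread.
qwen_variant = with_rope_scaling(
    base_config,
    {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768},
)

print(json.dumps(qwen_variant["rope_scaling"], indent=2))
```

Each variant can then be written out as its own config.json and benchmarked separately.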
Thank you for the quick update.
I have used YaRN to extend the Qwen3 models to 320k ... but if your method works better, all the better!