Curious - Yarn setting different from Qwen3 repo for 128k?

by DavidAU - opened 1 day ago

1 day ago

Note that the YaRN setting for the 4B Qwen 3, as per Qwen's repo, is:

"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}

I noticed yours is different:

"rope_scaling": {
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
    "rope_type": "yarn"
},

Does this impact performance?
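
For reference, a quick arithmetic check on the two settings quoted above. This is only a sketch: it multiplies each `factor` by its `original_max_position_embeddings` to get the nominal extended window. The dictionary keys `qwen_repo` and `this_repo` and the helper `extended_context` are names made up for illustration, not part of any library.

```python
# Each entry mirrors a "rope_scaling" block quoted in this thread.
# The labels "qwen_repo" and "this_repo" are illustrative only.
configs = {
    "qwen_repo": {"factor": 4.0, "original_max_position_embeddings": 32768},
    "this_repo": {"factor": 3.2, "original_max_position_embeddings": 40960},
}


def extended_context(rope_scaling):
    """Nominal extended window: scaling factor times the original window."""
    product = rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
    # round() guards against floating-point noise in the multiplication.
    return round(product)


for name, cfg in configs.items():
    print(name, extended_context(cfg))
```

Both settings come out to 131072 tokens (128k), so they target the same window. They are still not numerically identical, because YaRN uses both the factor and the original length when it rescales the RoPE frequencies, so this check alone does not answer the performance question.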

Menlo Research org about 23 hours ago

Hi, our test results come from this config:

"rope_scaling": {
    "factor": 3.2,
    "original_max_position_embeddings": 40960,
    "rope_type": "yarn"
},

Menlo Research org about 23 hours ago

There should be no issue with the current config, so don't worry.

We benchmarked everything using this config, so you should get the same results with it.

We're re-benchmarking with the config from the Qwen team. We're a bit confused at the moment, but it should not affect performance.

We will update the results soon! If the Qwen config scores better, we will switch to it; otherwise the current one should be fine.

about 22 hours ago

Thank you for the quick update.

I have used Yarn to extend the Qwen3s to 320k ... but if your method works better - all the better!
