What's the correct reasoning format?

#2 opened by BigBeavis

[THINK] [/THINK], or something else?

Ah, I probably should've mentioned that in the model card. There's no thinking on this model. If it still retains any of that capability from the original, it'll be [THINK] [/THINK], but you'll need to prefill it (and even then I'm not sure it'll work).
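For anyone unsure what "prefill" means in practice: you force the assistant turn to start with [THINK] before the model generates anything. A minimal sketch against a local OpenAI-compatible completion endpoint; the chat template, server URL and parameters below are assumptions for illustration, so check the model's actual tokenizer config before relying on them:

```python
# Sketch of a [THINK] prefill via a raw text-completion request
# (llama.cpp server, TabbyAPI, vLLM and similar expose this endpoint).
# The Mistral-style template below is an assumption; verify it against
# the model's tokenizer config.
import requests

def build_prompt(system: str, user: str) -> str:
    # Append [THINK] so the assistant turn is forced to start "in thinking mode".
    return f"<s>[INST] {system}\n\n{user} [/INST][THINK]"

resp = requests.post(
    "http://localhost:8080/v1/completions",  # hypothetical local server
    json={
        "prompt": build_prompt(
            "You are a narrator for a dark fantasy roleplay.",
            "Describe the abandoned lighthouse.",
        ),
        "max_tokens": 1024,
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["text"])
```

Whether the model then closes the block with [/THINK] is another matter, as discussed below.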

Yeah, looks like it can think, but it can't end the thinking segment properly: it transitions into the reply that follows without ever placing the end-of-thinking tag.

What he means is that it's not a chain-of-thought model; it hasn't been trained to be. It's just pretending when you add the tag (it'll even close it on occasion). Any model can pretend to "think" if you add a tag like [think], but that "thinking" is just play-pretend.

(great update, btw)

@SerialKicked This finetune is based on Magistral, which was built with reasoning capabilities and was trained to recognize the [THINK] and [/THINK] tags.

That's why I made the original post: I assumed this release would maintain that trait, but I quickly discovered that the [/THINK] tag isn't being returned. So I asked what tags were used in the finetuning data, guessing they might've used data with the <think> tag instead, which seems to be the most popular thanks to Qwen. But the authors have said they didn't use any data with think tags at all, which explains the current behavior.

The model has partially kept the reasoning capabilities of its base model, and it's not "play pretend" as you say (whatever that means; all models play pretend if you want to go that way). If you put the thinking tag in the prefill, it actually produces very solid, on-point, high-quality reasoning, and then continues with a post-reasoning reply that adheres to it. The only problem is that, due to dilution through fine-tuning, the model's emphasis on emitting the end-of-thinking tag got suppressed, so it forgets to use it about 75% of the time.

To reiterate: the model has kept the strong reasoning capabilities of its base model, but it can't reliably place the corresponding end-of-reasoning tag where it should be, even though its response does go on to the "actual" reply part after the thinking.
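Until that's addressed in training, the only option is a hacky client-side split. A rough sketch; the paragraph-break fallback for when the tag never appears is purely a guess, not something the model guarantees:

```python
# Crude post-processing for completions that started from a [THINK] prefill.
# If the model emitted [/THINK], split on it; otherwise fall back to an
# assumed heuristic (first blank line ends the reasoning), which will be
# wrong whenever the reasoning itself contains paragraph breaks.

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, reply) from a raw completion."""
    if "[/THINK]" in output:
        reasoning, _, reply = output.partition("[/THINK]")
    else:
        reasoning, _, reply = output.partition("\n\n")
    return reasoning.strip(), reply.strip()

raw = "The scene should feel damp and cold...\n\nThe lighthouse door hung open."
thinking, answer = split_reasoning(raw)
print(answer)  # -> "The lighthouse door hung open."
```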

@zerofata I kind of understand why you didn't include reasoning examples in your training data; you probably just used the same dataset as the previous versions of PaintedFantasy? But imo the quality of reasoning this release retains from its predecessor, with a corresponding improvement in reply quality and system-prompt adherence compared to not using the tag in the prefill, at least shows there's merit in giving it another go as a separate release trained on some reasoning data that uses the original [THINK] and [/THINK] tags.

I've been keeping an eye out for ideas on creating some good reasoning datasets recently, but I need to think about a better way to create them. I'm picky about how I want the reasoning data to work (concise, uncensored, in character and accurate), and how it gets handled in training frameworks can get pretty funky (do you train the model to operate on /nothink tags like GLM, require a [THINK] prefill, or just yolo it, etc.).
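To make those options concrete, here is roughly what single training samples could look like under two of those schemes; the field names, tag placement and the character are made up for illustration and don't correspond to any particular framework's schema:

```python
# Scheme A: reasoning always present, wrapped in [THINK]...[/THINK]
# inside the assistant turn (matching the base model's tags).
sample_think = {
    "conversations": [
        {"role": "system", "content": "You are Mira, a sarcastic ship AI."},
        {"role": "user", "content": "Can we outrun the patrol?"},
        {"role": "assistant", "content": (
            "[THINK]Concise, in-character check: engines at 80%, the patrol "
            "corvettes are faster in a straight line, but the asteroid field "
            "gives cover.[/THINK]"
            "Outrun them? No. Out-stupid them in the asteroid field? Gladly."
        )},
    ]
}

# Scheme B: GLM-style /nothink toggle, where reasoning is simply omitted
# when the user turn carries the tag.
sample_nothink = {
    "conversations": [
        {"role": "system", "content": "You are Mira, a sarcastic ship AI."},
        {"role": "user", "content": "Can we outrun the patrol? /nothink"},
        {"role": "assistant", "content":
            "Outrun them? No. Out-stupid them in the asteroid field? Gladly."},
    ]
}
```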

I've trained Magistral before on a thinking dataset where it reasoned correctly, but I wasn't a big fan of the output: it only followed ~80% of the reasoning it generated and did big yaps, so I never released it. It does make me wonder whether this model is as verbose as it is because it expects some of the token budget to be used by thinking, though.

Once I get a reasoning dataset I like sorted out, this'll definitely be tried again though.
