So this is just an SFT "distill" of Magistral-Medium?

#6
by gghfez - opened

Hi, I'm just making sure I understand.

You basically did this: cognitivecomputations/Dolphin3.0-R1-Mistral-24B

But using Magistral-Medium to generate the traces, as opposed to DeepSeek-R1 like cognitivecomputations did?

https://mistral.ai/static/research/magistral.pdf

Read the paper. RL and then SFT on top.

Mistral AI org

Hi there, as mentioned in the paper, it was:

  • Mistral Medium + RL = Magistral Medium
  • Mistral Small + SFT (from Magistral Medium traces) + RL = Magistral Small

Both had RL.
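
For anyone curious what that SFT-distillation stage looks like in practice, here is a minimal, hypothetical sketch: stage 1 samples reasoning traces from a teacher, stage 2 fine-tunes a smaller student on them. The model names, prompt, and dataset fields are illustrative assumptions (Magistral Medium itself is API-only), and the RL stage that both models received is omitted:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Stage 1: sample reasoning traces from the teacher.
# Stand-in teacher: the open Magistral Small checkpoint, used here purely
# for illustration since Magistral Medium is not openly available.
teacher_name = "mistralai/Magistral-Small-2506"
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name, device_map="auto")

prompts = ["Prove that the sum of two even integers is even."]  # toy prompt set

records = []
for prompt in prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(teacher.device)
    output = teacher.generate(input_ids, max_new_tokens=1024)
    trace = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    records.append({"prompt": prompt, "completion": trace})

# Stage 2: supervised fine-tuning of the student on the teacher's traces
# (the real pipeline then runs RL on top of this SFT checkpoint).
trainer = SFTTrainer(
    model="mistralai/Mistral-Small-24B-Base-2501",  # illustrative student checkpoint
    train_dataset=Dataset.from_list(records),
    args=SFTConfig(output_dir="magistral-small-sft"),
)
trainer.train()
```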

Thanks, I was feeling groggy and missed it the first time I read the paper.

I don't see any difference with Gemini Flash 2.5.

I mean, it's a model that answers in my native language (French), but in the end, what is the difference between this model and Gemini?

I liked the Magistral AI answer to "Wow me", but I didn't see anything pointing to a significant change.

Perhaps my question wasn't accurate enough?

Are you aware of the difference in model size between them?

  • Quantized ("good enough") Magistral can run on commodity hardware with 24 Go of RAM, mac M1, GPU with 24 Go of VRAM etc.)
  • Gemini Flash 2.5 runs on Google cloud infrastructure, probably on ~1000 Go of TPU/GPU RAM or something equivalent.
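
As a rough illustration of the first point, a quantized GGUF build can be run locally with llama-cpp-python. This is only a sketch: the model file name is an assumption, so substitute whatever quantization actually fits your hardware:

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Magistral-Small-2506-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,       # modest context window to keep memory use low
    n_gpu_layers=-1,  # offload all layers to a GPU if present; set 0 for pure CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Wow me."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

A Q4_K_M quantization of a 24B model is roughly 14 GB on disk, which is why it fits on the 24 GB machines listed above.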

If you don't see much difference between them, that is a magistral (pun intended) win for Mistral AI.

Thanks for your answer.

Are you aware of the difference in model size between them?

No, I didn't know; that's impressive. Indeed, great wordplay. I think you're right. 👍🏻

But I'd rather let people more experienced than me judge whether that's the case, so I can't give more feedback on the "win" topic! 😊

I will then try more conversations to explore the differences between Mistral's Magistral and Google Gemini Flash 2.5. 😉

By the way, the online AI model I currently find really pleasant to read is the one used by Perplexity AI, through their website of the same name: Perplexity.ai.
Is Perplexity AI a proprietary model, or can something on Hugging Face approach its conversational quality? 🤔

Which Mistral LiteRT model on Hugging Face works well in the Google AI Edge Gallery Android application?

Because I'm pretty sure an open-source Mistral model can work on CPU, much like Hammer 2.1-1.5b (Hammer2.1-1.5b_multi-prefill-seq_q8_ekv1280.task) did for me.

Thanks a lot for all future answers.
