So this is just an SFT "distill" of Magistral-Medium?

#6
by gghfez - opened

Hi, I'm just making sure I understand.

You basically did this: cognitivecomputations/Dolphin3.0-R1-Mistral-24B

But using Magistral-Medium to generate the traces, as opposed to DeepSeek-R1 like cognitivecomputations did?

https://mistral.ai/static/research/magistral.pdf

Read the paper. RL and then SFT on top.

Mistral AI org

Hi there, as mentioned in the paper, it was:

  • Mistral Medium + RL = Magistral Medium
  • Mistral Small + SFT (from Magistral Medium traces) + RL = Magistral Small

Both had RL.
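
For anyone curious what that SFT-distillation stage looks like in practice, here is a minimal, hypothetical sketch: stage 1 samples reasoning traces from a teacher, stage 2 fine-tunes a smaller student on them. The model names, prompt, and dataset fields are illustrative assumptions (Magistral Medium itself is API-only), and the RL stage that both models received is omitted:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Stage 1: sample reasoning traces from the teacher.
# Stand-in teacher: the open Magistral Small checkpoint, used here purely
# for illustration since Magistral Medium is not openly available.
teacher_name = "mistralai/Magistral-Small-2506"
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name, device_map="auto")

prompts = ["Prove that the sum of two even integers is even."]  # toy prompt set

records = []
for prompt in prompts:
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(teacher.device)
    output = teacher.generate(input_ids, max_new_tokens=1024)
    trace = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    records.append({"prompt": prompt, "completion": trace})

# Stage 2: supervised fine-tuning of the student on the teacher's traces
# (the real pipeline then runs RL on top of this SFT checkpoint).
trainer = SFTTrainer(
    model="mistralai/Mistral-Small-24B-Base-2501",  # illustrative student checkpoint
    train_dataset=Dataset.from_list(records),
    args=SFTConfig(output_dir="magistral-small-sft"),
)
trainer.train()
```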

Thanks, I was feeling groggy and missed it the first time I read the paper.

I don't see any difference with Gemini Flash 2.5.

I mean, it's a model that answers in my native language (French), but in the end, what is the difference between this model and Gemini?

I liked the Magistral AI answer to "Wow me", but I didn't see anything pointing to a significant change.

Perhaps my question wasn't accurate enough?

Are you aware of the difference in model size between them?

  • Quantized ("good enough") Magistral can run on commodity hardware with 24 Go of RAM, mac M1, GPU with 24 Go of VRAM etc.)
  • Gemini Flash 2.5 runs on Google cloud infrastructure, probably on ~1000 Go of TPU/GPU RAM or something equivalent.
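
As a rough illustration of the first point, a quantized GGUF build can be run locally with llama-cpp-python. This is only a sketch: the model file name is an assumption, so substitute whatever quantization actually fits your hardware:

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Magistral-Small-2506-Q4_K_M.gguf",  # hypothetical local file
    n_ctx=8192,       # modest context window to keep memory use low
    n_gpu_layers=-1,  # offload all layers to a GPU if present; set 0 for pure CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Wow me."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

A Q4_K_M quantization of a 24B model is roughly 14 GB on disk, which is why it fits on the 24 GB machines listed above.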

If you don't see much difference between them, that is a magistral (pun intended) win for Mistral AI.

Thanks for your answer.

Are you aware of the difference in model size between them?

No, I didn't know; that's impressive. Indeed, great wordplay. I think you're right. 👍🏻

But I'd rather let people more experienced than me judge whether that's the case, so I can't give more feedback on the "win" topic! 😊

I will then try more conversations to explore the differences between Mistral's Magistral and Google Gemini Flash 2.5. 😉

By the way, the online AI model I currently find really pleasant to read is the one used by Perplexity AI, through their website of the same name: Perplexity.ai.
Is Perplexity AI a proprietary model, or can something on Hugging Face approach its conversational quality? 🤔

Which Mistral LiteRT model on Hugging Face works well in the Google AI Edge Gallery Android application?

Because I'm pretty sure an open-source Mistral model can work on CPU, much like Hammer 2.1-1.5b (Hammer2.1-1.5b_multi-prefill-seq_q8_ekv1280.task) did for me.

Thanks a lot for all future answers.
