šŸ—ļø Building on HF

PhysiQuanty PRO

PhysiQuanty

AI & ML interests

Theoretical Physics, Invariant Tokenization, Standard Model of Particle Physics Applied ML 🇫🇷

Recent Activity

reacted to RDTvlokip's post with 🚀 40 minutes ago
I finally changed the architecture of my 15M French LLM. It worked. Then I almost fooled myself about how much, and catching that was the real win. After proving last time that architecture is a threshold, not a lever, I got stubborn: could I change how the model learns? Four honest attempts: Lion, a sharper AdamW β2, multi-token prediction, LayerScale. Four failures. The bottleneck wasn't the learning rule either. So I changed the shape of the computation instead: loop the same transformer blocks 4×, deeper reasoning, zero added parameters. It beat the baseline on perplexity, the first thing in the whole project to move that number. Then I added my own twist: let each token decide how deep to think, halting on its own entropy. My first evaluation was spectacular. Coherence up 65%. Hallucinated names down 62%. It was noise. Eight prompts, one seed. I re-ran on 50 prompts × 200 tokens and watched the gains shrink to "modest", and on out-of-domain prompts, recurrence actually made things worse. No universal winner. And none of it is new: it's Adaptive Computation Time (2016), the Universal Transformer (2018), and LoopViT (2026), recombined and measured honestly. The real lesson: A number from 8 prompts is a rumor. The eval harness that kills your own best result is worth more than the result it kills. Cite your lineage. Stay preliminary until multiple seeds say otherwise. The three models are live. The write-up is honest about every caveat 👇 🔗 https://huggingface.co/blog/RDTvlokip/teaching-a-15m-french-llm-to-think-deeper
reacted to RDTvlokip's post with 🔥 40 minutes ago

Organizations

Sorbonne Université, scikit-learn, Tamazight NLP, Hugging Face Discord Community, Meta Research, Reasoning datasets competition, Hugging Science, MCP-1st-Birthday, TinyModels, PhysiAI, INPI-France, Patenty-1-DataSet, Patenty1-DataSet-US, Physi-Wiki, SKT AI LABS, Humanity's Last Hackathon, Build Small Hackathon, SpiceeChat, HF-Collab-Center, Data-Gouv-Community, France-Travail-Community, Meteo-France-API, WhirlwindAI