All HF Hub posts

bartowski 
posted an update 2 days ago
Was going to post this on /r/LocalLLaMa, but apparently it's without moderation at this time :')

bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF

I was able to use previous Mistral chat templates, some hints from the Qwen templates, and Claude to piece together a seemingly working chat template. I tested it with the llama.cpp server and got perfect results, though LM Studio still seems to be struggling for some reason (I don't know how to specify a jinja file there).

Outlined the details of the script and results in my llama.cpp PR to add the jinja template:

https://github.com/ggml-org/llama.cpp/pull/14349

Start server with a command like this:

./llama-server -m /models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf --jinja --chat-template-file /models/Mistral-Small-3.2-24B-Instruct-2506.jinja


and it should be perfect! Hoping it'll work for ALL tools, not just llama.cpp, once LM Studio gets an update or something, but very happy to see it working flawlessly in llama.cpp.
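Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library, assuming llama-server's default port 8080 (the temperature value is illustrative, not a recommendation from the post):

```python
import json
import urllib.request

# llama-server exposes an OpenAI-compatible HTTP API; 8080 is its default port.
URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt: str, temperature: float = 0.15) -> dict:
    """Build a single-turn chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (with the server running):
#   print(chat("Write a haiku about GGUF quantization."))
```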

In the meantime, I'll try to open a PR to minja to make strftime work, but no promises :)
Abhaykoul 
posted an update about 12 hours ago
Introducing Dhanishtha 2.0: World's first Intermediate Thinking Model

Dhanishtha 2.0 is the world's first LLM designed to think between responses, unlike other reasoning LLMs, which think only once.

Dhanishtha can think, rethink, self-evaluate, and refine between responses using multiple <think> blocks.
This technique makes it highly token-efficient: it uses up to 79% fewer tokens than DeepSeek R1.
---

You can try our model from: https://helpingai.co/chat
Also, we're gonna open-source Dhanishtha on July 1st.

---
For Devs:
🔑 Get your API key at https://helpingai.co/dashboard
from HelpingAI import HAI  # pip install HelpingAI==1.1.1
from rich import print

hai = HAI(api_key="hl-***********************")

response = hai.chat.completions.create(
    model="Dhanishtha-2.0-preview",
    messages=[{"role": "user", "content": "What is the value of ∫_0^∞ x^3/(x-1) dx ?"}],
    stream=True,
    hide_think=False  # Set to True to hide the model's intermediate <think> blocks
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="", flush=True)
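Since the stream interleaves multiple <think> blocks with answer text, it can help to separate them after the fact. A minimal sketch (not part of the HelpingAI SDK; the exact tag format is an assumption based on the post's description):

```python
import re

# Hypothetical post-processing: split a raw response containing interleaved
# <think>...</think> blocks into ordered, labeled segments.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_segments(text: str):
    """Return a list of ("think" | "answer", content) tuples in order."""
    segments = []
    pos = 0
    for m in THINK_RE.finditer(text):
        answer = text[pos:m.start()].strip()
        if answer:
            segments.append(("answer", answer))
        segments.append(("think", m.group(1).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("answer", tail))
    return segments

raw = "<think>Check the pole at x=1.</think>The integral diverges."
print(split_segments(raw))
```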
yeonseok-zeticai 
posted an update 1 day ago
Hi everyone,

I’ve been running small language models (sLLMs) directly on smartphones, completely offline, with no cloud backend or server API calls.

I wanted to share:
1. ⚡ Tokens/sec performance across several SLLMs
2. 🤖 Observations on hardware utilization (where the workload actually runs)
3. 📏 Trade-offs between model size, latency, and feasibility for mobile apps
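For reference, decode throughput can be measured with a small harness like the sketch below (hypothetical code, not the tooling used for these reports; the token-stream interface is an assumption, not tied to any specific on-device runtime):

```python
import time

def tokens_per_sec(token_stream):
    """Consume a token iterator and return (n_tokens, tokens per second)."""
    start = time.perf_counter()
    n = 0
    for _ in token_stream:
        n += 1
    elapsed = time.perf_counter() - start
    return n, n / elapsed if elapsed > 0 else float("inf")

# Stand-in generator for demonstration; swap in a real model's token stream.
def fake_stream(n=100, delay=0.001):
    for i in range(n):
        time.sleep(delay)  # simulate per-token decode latency
        yield f"tok{i}"

n, tps = tokens_per_sec(fake_stream())
print(f"{n} tokens at {tps:.1f} tok/s")
```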

There are reports for the models below:
- QWEN3 0.6B
- NVIDIA/Nemotron QWEN 1.5B
- SimpleScaling S1
- TinyLlama
- Unsloth tuned Llama 3.2 1B
- Naver HyperClova 0.5B

📜 Comparable benchmark reports (no cloud, all on-device)

I’d really value your thoughts on:
- Creative ideas to further optimize inference under these hardware constraints
- Other compact LLMs worth testing on-device
- Experiences you’ve had trying to deploy LLMs at the edge

If there’s interest, I’m happy to share more details on the test setup, hardware specs, or the tooling we used for these comparisons.

Thanks for taking a look, and you can build your own at https://mlange.zetic.ai!
merve 
posted an update 2 days ago
view post
Post
4036
Release picks of the past week are here! Find more models, datasets, and Spaces here: merve/june-20-releases-68594824d1f4dfa61aee3433

🖼️ VLMs/OCR
> moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM with 3B active params, smarter with fewer tokens, and supports long documents and videos 👏 (OS)
> nanonets/Nanonets-OCR-s is a 3.75B-param OCR model based on Qwen2.5VL-3B-Instruct (OS)

💬 LLMs
> moonshotai/Kimi-Dev-72B is a strong coding model based on Qwen2.5-72B (OS)
> Mistral released mistralai/Mistral-Small-3.2-24B-Instruct-2506, an update to their former model with better function calling & instruction following (OS)

🗣️ Audio
> Google released google/magenta-realtime, real time music generation & audio synthesis (cc-by-4)
> kyutai released new speech-to-text models that come in 1B & 2B ( kyutai/stt-1b-en_fr, stt-2b-en_fr) with 0.5s and 2.5s delay

3D
> Tencent released tencent/Hunyuan3D-2.1, an image-to-3D model
pagezyhf 
posted an update 1 day ago
Hackathons in Paris on July 5th and 6th!

Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.

Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.

Prizes, amazing speakers, and more. If you want the details, head to https://lu.ma/fmvdjmur!
BFFree 
posted an update about 22 hours ago
Working on some chess set concepts. I went toward minimal sculpted shapes, then returned to some traditionalism.
yeonseok-zeticai 
posted an update 2 days ago
💫 Next-Level On-Device AI Showdown

🪽 Seeing is believing: how does Qwen 4B run in an on-device environment without an expensive GPU cloud server?
We’ve crafted a side-by-side demo video showcasing both Jan-Nano and Qwen 4B in action, so no more wondering which model reigns supreme. Click play, compare their speed, accuracy, and memory footprints, and decide which one fits your needs best!

👋 Why You Can’t Miss This
We are actively creating runnable sLLM environments for on-device AI, so you can build on-device AI apps within a few hours.
Several sLLM models, including Jan-Nano and Qwen 4B, are ready to be used in your AI application!

🤑 And feel free to use them, because they're free!

Ready to Compare?

Watch now, draw your own conclusions, and let us know which model you’d deploy in your next edge-AI project! 🌍💡

#OnDeviceAI #EdgeAI #AIShowdown #MLOptimization #DemoVideo #AIComparison