All HF Hub posts

bartowski 
posted an update 2 days ago
Was going to post this on /r/LocalLLaMa, but apparently it's without moderation at this time :')

bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF

I was able to use previous Mistral chat templates, some hints from the Qwen templates, and Claude to piece together a seemingly working chat template. I tested it with the llama.cpp server and got perfect results, though LM Studio still seems to be struggling for some reason (I don't know how to specify a jinja file there).

Outlined the details of the script and results in my llama.cpp PR to add the jinja template:

https://github.com/ggml-org/llama.cpp/pull/14349

Start server with a command like this:

./llama-server -m /models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf --jinja --chat-template-file /models/Mistral-Small-3.2-24B-Instruct-2506.jinja


and it should be perfect! Hoping it'll work for ALL tools, not just llama.cpp, once LM Studio gets an update or something, but very happy to see it working flawlessly in llama.cpp.
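Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library, assuming llama-server's default port 8080 (the temperature value is illustrative, not a recommendation from the post):

```python
import json
import urllib.request

# llama-server exposes an OpenAI-compatible HTTP API; 8080 is its default port.
URL = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt: str, temperature: float = 0.15) -> dict:
    """Build a single-turn chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (with the server running):
#   print(chat("Write a haiku about GGUF quantization."))
```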

In the meantime, I'll try to open a PR to minja to make strftime work, but no promises :)
Abhaykoul 
posted an update about 12 hours ago
Introducing Dhanishtha 2.0: World's first Intermediate Thinking Model

Dhanishtha 2.0 is the world's first LLM designed to think between responses, unlike other reasoning LLMs, which think only once.

Dhanishtha can think, rethink, self-evaluate, and refine between responses using multiple <think> blocks.
This technique makes it highly token-efficient: it uses up to 79% fewer tokens than DeepSeek R1.
---

You can try our model from: https://helpingai.co/chat
Also, we're gonna open-source Dhanishtha on July 1st.

---
For Devs:
🔑 Get your API key at https://helpingai.co/dashboard
from HelpingAI import HAI  # pip install HelpingAI==1.1.1
from rich import print

hai = HAI(api_key="hl-***********************")

response = hai.chat.completions.create(
    model="Dhanishtha-2.0-preview",
    messages=[{"role": "user", "content": "What is the value of ∫_0^∞ x^3/(x-1) dx ?"}],
    stream=True,
    hide_think=False  # Set to True to hide the model's intermediate <think> blocks
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="", flush=True)
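Since the stream interleaves multiple <think> blocks with answer text, it can help to separate them after the fact. A minimal sketch (not part of the HelpingAI SDK; the exact tag format is an assumption based on the post's description):

```python
import re

# Hypothetical post-processing: split a raw response containing interleaved
# <think>...</think> blocks into ordered, labeled segments.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_segments(text: str):
    """Return a list of ("think" | "answer", content) tuples in order."""
    segments = []
    pos = 0
    for m in THINK_RE.finditer(text):
        answer = text[pos:m.start()].strip()
        if answer:
            segments.append(("answer", answer))
        segments.append(("think", m.group(1).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("answer", tail))
    return segments

raw = "<think>Check the pole at x=1.</think>The integral diverges."
print(split_segments(raw))
```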
yeonseok-zeticai 
posted an update 1 day ago
Hi everyone,

I’ve been running small language models (sLLMs) directly on smartphones, completely offline, with no cloud backend or server API calls.

I wanted to share:
1. ⚡ Tokens/sec performance across several SLLMs
2. 🤖 Observations on hardware utilization (where the workload actually runs)
3. 📏 Trade-offs between model size, latency, and feasibility for mobile apps
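For reference, decode throughput can be measured with a small harness like the sketch below (hypothetical code, not the tooling used for these reports; the token-stream interface is an assumption, not tied to any specific on-device runtime):

```python
import time

def tokens_per_sec(token_stream):
    """Consume a token iterator and return (n_tokens, tokens per second)."""
    start = time.perf_counter()
    n = 0
    for _ in token_stream:
        n += 1
    elapsed = time.perf_counter() - start
    return n, n / elapsed if elapsed > 0 else float("inf")

# Stand-in generator for demonstration; swap in a real model's token stream.
def fake_stream(n=100, delay=0.001):
    for i in range(n):
        time.sleep(delay)  # simulate per-token decode latency
        yield f"tok{i}"

n, tps = tokens_per_sec(fake_stream())
print(f"{n} tokens at {tps:.1f} tok/s")
```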

There are reports for the models below:
- QWEN3 0.6B
- NVIDIA/Nemotron QWEN 1.5B
- SimpleScaling S1
- TinyLlama
- Unsloth tuned Llama 3.2 1B
- Naver HyperClova 0.5B

📜 Comparable benchmark reports (no cloud, all on-device)

I’d really value your thoughts on:
- Creative ideas to further optimize inference under these hardware constraints
- Other compact LLMs worth testing on-device
- Experiences you’ve had trying to deploy LLMs at the edge

If there’s interest, I’m happy to share more details on the test setup, hardware specs, or the tooling we used for these comparisons.

Thanks for taking a look, and you can build your own at https://mlange.zetic.ai!
merve 
posted an update 2 days ago
view post
Post
4036
Release picks of the past week are here! Find more models, datasets, and Spaces here: merve/june-20-releases-68594824d1f4dfa61aee3433

🖼️ VLMs/OCR
> moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM with 3B active params, smarter with fewer tokens, and supports long documents and videos 👏 (OS)
> nanonets/Nanonets-OCR-s is a 3.75B-param OCR model based on Qwen2.5VL-3B-Instruct (OS)

💬 LLMs
> moonshotai/Kimi-Dev-72B is a strong coding model based on Qwen2.5-72B (OS)
> Mistral released mistralai/Mistral-Small-3.2-24B-Instruct-2506, an update to their former model with better function calling & instruction following (OS)

🗣️ Audio
> Google released google/magenta-realtime, real time music generation & audio synthesis (cc-by-4)
> kyutai released new speech-to-text models that come in 1B & 2B ( kyutai/stt-1b-en_fr, stt-2b-en_fr) with 0.5s and 2.5s delay

3D
> Tencent released tencent/Hunyuan3D-2.1, an image-to-3D model
pagezyhf 
posted an update 1 day ago
Hackathons in Paris on July 5th and 6th!

Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.

Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.

Prizes, amazing speakers, and more. If you want the details, head to https://lu.ma/fmvdjmur!
BFFree 
posted an update about 22 hours ago
Working on some chess set concepts. I went toward minimal sculpted shapes, then returned to some traditionalism.
yeonseok-zeticai 
posted an update 2 days ago
💫 Next-Level On-Device AI Showdown

🪽 Seeing is believing: how does Qwen 4B run in an on-device environment without an expensive GPU cloud server?
We’ve crafted a side-by-side demo video showcasing both Jan-Nano and Qwen 4B in action, so no more wondering which model reigns supreme. Click play, compare their speed, accuracy, and memory footprints, and decide which one fits your needs best!

👋 Why You Can’t Miss This
We are actively creating runnable sLLM environments for on-device AI, so you can build on-device AI apps within a few hours.
Several sLLM models, including Jan-Nano and Qwen 4B, are ready to be used in your AI application!

🤑 And feel free to use them, because they're free!

Ready to Compare?

Watch now, draw your own conclusions, and let us know which model you’d deploy in your next edge-AI project! 🌍💡

#OnDeviceAI #EdgeAI #AIShowdown #MLOptimization #DemoVideo #AIComparison