LM Studio Community

community

AI & ML interests

Models quantized and uploaded by the LM Studio community, for the LM Studio community. Discord: https://discord.gg/aPQfnNkxGC

Recent Activity

mattjcly updated a model 1 day ago

lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF

mattjcly published a model 3 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-MLX-4bit

mattjcly updated a model 3 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-MLX-4bit

View all activity

mattjcly

updated a model 1 day ago

lmstudio-community/Llama-4-Scout-17B-16E-Instruct-GGUF

Text Generation • Updated 1 day ago • 2.13k • 29

bartowski

posted an update 2 days ago

Post

4260

Was going to post this on /r/LocalLLaMa, but apparently it's without moderation at this time :')

bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF

Was able to use previous mistral chat templates, some hints from Qwen templates, and Claude to piece together a seemingly working chat template, tested it with llama.cpp server and got perfect results, though lmstudio still seems to be struggling for some reason (don't know how to specify a jinja file there)

Outlined the details of the script and results in my llama.cpp PR to add the jinja template:

https://github.com/ggml-org/llama.cpp/pull/14349

Start server with a command like this:

./llama-server -m /models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf --jinja --chat-template-file /models/Mistral-Small-3.2-24B-Instruct-2506.jinja

and it should be perfect! Hoping it'll work for ALL tools if lmstudio gets an update or something, not just llama.cpp, but very happy to see it works flawlessly in llama.cpp

In the meantime, will try to open a PR to minja to make the strftime work, but no promises :)

mattjcly

published a model 3 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-MLX-4bit

Image-Text-to-Text • Updated 3 days ago • 100 • 1

mattjcly

updated a model 3 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-MLX-4bit

Image-Text-to-Text • Updated 3 days ago • 100 • 1

lmmy

updated a model 3 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-MLX-4bit

Image-Text-to-Text • Updated 3 days ago • 100 • 1

will-lms

published a model 5 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-GGUF

Image-Text-to-Text • Updated 5 days ago • 10.2k • 3

will-lms

updated a model 5 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-GGUF

Image-Text-to-Text • Updated 5 days ago • 10.2k • 3

lmmy

updated a model 5 days ago

lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-GGUF

Image-Text-to-Text • Updated 5 days ago • 10.2k • 3

bartowski

published a model 6 days ago

lmstudio-community/Skywork-SWE-32B-GGUF

Text Generation • Updated 6 days ago • 325 • 1

bartowski

updated a model 6 days ago

lmstudio-community/Skywork-SWE-32B-GGUF

Text Generation • Updated 6 days ago • 325 • 1

bartowski

in lmstudio-community/AceReason-Nemotron-1.1-7B-GGUF 8 days ago

Add Transformers library and link to code

#1 opened 8 days ago by

nielsr

reach-vb

posted an update 13 days ago

Post

2118

Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand on the Hugging Face Hub 🤯

Starting today, you'd be able to access all those LLMs (OpenAI compatible) on HF model pages and via OpenAI client libraries too! 💥

Go, play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!

reach-vb

posted an update about 1 month ago

Post

3968

hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥

as you know we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/ download speeds too): https://huggingface.co/blog/xet-on-the-hub and now that we are certain that the backend can scale with even big models like Llama 4/ Qwen 3 - we;re moving to the next phase of inviting impactful orgs and users on the hub over as you are a big part of the open source ML community - we would love to onboard you next and create some excitement about it in the community too!

in terms of actual steps - it should be as simple as one of the org admins to join hf.co/join/xet - we'll take care of the rest.

p.s. you'd need to have a the latest hf_xet version of huggingface_hub lib but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage

p.p.s. this is fully backwards compatible so everything will work as it should! 🤗

16 replies

bartowski

posted an update 2 months ago

Post

38455

Access requests enabled for latest GLM models

While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957) I want to leave the models up for visibility and continued discussion, but want to prevent accidental downloads of known broken models (even though there are settings that could fix it at runtime for now)

With this goal, I've enabled access requests. I don't really want your data, so I'm sorry that I don't think there's a way around that? But that's what I'm gonna do for now, and I'll remove the gate when a fix is up and verified and I have a chance to re-convert and quantize!

Hope you don't mind in the mean time :D

1 reply

reach-vb

authored a paper 3 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 191

reach-vb

authored a paper 5 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 235

bartowski

posted an update 6 months ago

Post

73224

Switching to author_model-name

I posted a poll on twitter, and others have mentioned the interest in me using the convention of including the author name in the model path when I upload.

It has a couple advantages, first and foremost of course is ensuring clarity of who uploaded the original model (did Qwen upload Qwen2.6? Or did someone fine tune Qwen2.5 and named it 2.6 for fun?)

The second thing is that it avoids collisions, so if multiple people upload the same model and I try to quant them both, I would normally end up colliding and being unable to upload both

I'll be implementing the change next week, there are just two final details I'm unsure about:

First, should the files also inherit the author's name?

Second, what to do in the case that the author name + model name pushes us past the character limit?

Haven't yet decided how to handle either case, so feedback is welcome, but also just providing this as a "heads up"

5 replies

bartowski

posted an update 7 months ago

Post

80411

Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method which is online (I prefer the term on-the-fly) repacking, so if you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses I think due to using intrinsics instead of assembly, but intrinsics are more maintainable)

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should be the same speeds (though it may currently be bugged on some platforms)

As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download and Q4_0 quants and use those !

Also IQ4_NL supports repacking though not in as many shapes yet, but should get a respectable speed up on ARM chips, PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights

17 replies

reach-vb

posted an update 7 months ago

Post

7109

VLMs are going through quite an open revolution AND on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: @jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥

bartowski

posted an update 7 months ago

Post

16516

Old mixtral model quants may be broken!

Recently Slaren over on llama.cpp refactored the model loader - in a way that's super awesome and very powerful - but with it came breaking of support for "split tensor MoE models", which applies to older mixtral models

You may have seen my upload of one such older mixtral model, ondurbin/bagel-dpo-8x7b-v0.2, and with the newest changes it seems to be able to run without issue

If you happen to run into issues with any other old mixtral models, drop a link here and I'll try to remake them with the new changes so that we can continue enjoying them :)

2 replies

AI & ML interests

Recent Activity

Team members 9

lmstudio-community's activity

Add Transformers library and link to code