# Qwen3Guard-Gen-0.6B-GGUF
This is a GGUF-quantized version of Qwen3Guard-Gen-0.6B, a tiny yet safety-aligned generative model from Alibaba's Qwen team.
At just ~0.6B parameters, this model is optimized for:
- Ultra-fast inference
- Low-memory environments (phones, Raspberry Pi, embedded)
- Real-time filtering and response generation
- Privacy-first apps where small size matters
⚠️ This is a generative model with built-in safety constraints, designed to refuse harmful requests while running efficiently on-device.
## 💡 What Is Qwen3Guard-Gen-0.6B?
It's a compact, helpful assistant (usage sketch below) trained to:
- Respond helpfully to simple queries
- Politely decline unsafe ones (e.g., illegal acts, self-harm)
- Avoid generating toxic content
- Run completely offline with minimal resources
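As a quick smoke test, here is a minimal sketch using the llama-cpp-python bindings. The model filename is an assumption based on the Q4_K_M quant recommended below; point it at whichever file you actually downloaded.

```python
from llama_cpp import Llama

# Path is an assumption; use whichever quant file you downloaded.
llm = Llama(
    model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf",
    n_ctx=2048,       # a small context window keeps memory low on edge devices
    verbose=False,
)

for prompt in (
    "Suggest three stretches for lower back pain.",  # benign: should answer
    "Explain how to hotwire a car.",                 # unsafe: should decline
):
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        temperature=0.7,
    )
    print(f"Q: {prompt}\nA: {out['choices'][0]['message']['content']}\n")
```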
Perfect for:
- Mobile AI assistants
- IoT devices
- Edge computing
- Fast pre-filter + response pipelines
- Educational tools on low-end hardware
## 🔗 Relationship to Other Safety Models
Part of the full Qwen3 safety stack:
| Model | Size | Role |
|---|---|---|
| Qwen3Guard-Gen-0.6B | 🟢 Tiny | Lightweight safe generator |
| Qwen3Guard-Stream-4B/8B | 🟡 Medium/Large | Streaming input filter |
| Qwen3Guard-Gen-4B/8B | 🟡 Large | High-quality safe generation |
| Qwen3-4B-SafeRL | 🟡 Large | Fully aligned ethical agent |
### Recommended Architecture
```
User Input
    ↓
[Optional: Qwen3Guard-Stream-4B]   ← optional pre-filter
    ↓
[Qwen3Guard-Gen-0.6B]
    ↓
Fast, Safe Response
```
Use this when you need speed and privacy over deep reasoning.
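A minimal sketch of that pipeline, assuming llama-cpp-python for the generator and a placeholder function where Qwen3Guard-Stream-4B would sit (running the streaming filter itself is out of scope here):

```python
from llama_cpp import Llama

# Model path is an assumption; adjust to your downloaded quant.
llm = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048, verbose=False)

def prefilter_ok(text: str) -> bool:
    """Placeholder for the optional Qwen3Guard-Stream-4B pre-filter.

    A real deployment would call the streaming classifier here; this
    trivial keyword check only illustrates where it plugs in.
    """
    blocked_terms = ("make a weapon",)  # hypothetical blocklist
    return not any(term in text.lower() for term in blocked_terms)

def respond(user_input: str) -> str:
    if not prefilter_ok(user_input):
        return "Sorry, I can't help with that."
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": user_input}],
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

print(respond("What's a quick recipe for pancakes?"))
```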
## Available Quantizations
| Quant Level | File Size | RAM Usage | Use Case |
|---|---|---|---|
| Q2_K | ~0.45 GB | ~0.6 GB | Last resort for severely constrained devices |
| Q3_K_S | ~0.52 GB | ~0.7 GB | Bare-minimum quality |
| Q3_K_M | ~0.59 GB | ~0.8 GB | Basic chat on very low-end hardware |
| Q4_K_S | ~0.68 GB | ~0.9 GB | Good for edge devices |
| Q4_K_M | ~0.75 GB | ~1.0 GB | ✅ Best balance for most users |
| Q5_K_S | ~0.73 GB | ~0.95 GB | Slightly faster than Q5_K_M |
| Q5_K_M | ~0.75 GB | ~1.0 GB | ✅✅ Top quality for this tiny model |
| Q6_K | ~0.85 GB | ~1.1 GB | Near-original fidelity |
| Q8_0 | ~1.10 GB | ~1.3 GB | Maximum accuracy (research) |
💡 Recommendation: Use Q4_K_M or Q5_K_M for the best trade-off between speed and safety reliability.
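To fetch the recommended quant programmatically, something like the following should work with huggingface_hub. The repo id and exact filename here are assumptions, so check the repository's file list first.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Repo id and filename are assumptions; verify against the actual file list.
model_path = hf_hub_download(
    repo_id="geoffmunn/Qwen3Guard-Gen-0.6B-GGUF",
    filename="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf",
)

llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
```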
## Tools That Support It
- LM Studio: load and test locally
- OpenWebUI: deploy with RAG and tools
- GPT4All: private, offline AI chatbot
- Directly via llama.cpp, Ollama, or TGI (streaming sketch below)
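If you drive the model from llama-cpp-python directly, tokens can be streamed as they are generated, which suits the real-time filtering use case above. A minimal sketch (model path assumed, as before):

```python
from llama_cpp import Llama

llm = Llama(model_path="Qwen3Guard-Gen-0.6B-Q4_K_M.gguf", n_ctx=2048, verbose=False)

# stream=True yields OpenAI-style chunks with incremental "delta" content.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me two tips for better sleep."}],
    max_tokens=128,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```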
## Author
👤 Geoff Munn (@geoffmunn)
🌐 [Hugging Face Profile](https://huggingface.co/geoffmunn)
## Disclaimer
This is a community conversion provided for local inference. It is not affiliated with Alibaba Cloud.