---
license: apache-2.0
base_model: DavidAU/Qwen3-DND-TNG-8B-303
datasets:
- DavidAU/TNG-dataset1
language:
- en
pipeline_tag: text-generation
tags:
- programming
- code generation
- code
- coding
- coder
- chat
- brainstorm
- qwen
- qwen3
- qwencoder
- brainstorm 40x
- all use cases
- Jan-V1
- finetune
- thinking
- reasoning
- unsloth
- mlx
library_name: mlx
---

# Qwen3-DND-TNG-8B-303-qx64-hi-mlx

Models in this set:

- [Qwen3-DND-TNG-8B-288-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-DND-TNG-8B-288-qx64-hi-mlx) (4.8GB)
- [Qwen3-DND-TNG-8B-288-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-DND-TNG-8B-288-qx86-hi-mlx) (6.5GB)
- [Qwen3-DND-TNG-8B-303-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-DND-TNG-8B-303-qx64-hi-mlx) (4.8GB) -- this model
- [Qwen3-DND-TNG-8B-303-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-DND-TNG-8B-303-qx86-hi-mlx) (6.5GB)

These models come from two different training checkpoints (288 vs 303) and are available in two quant sizes of the Deckard Formula (qx):

- qx86-hi: mixed 6- and 8-bit, group size 32
- qx64-hi: mixed 4- and 6-bit, group size 32

Let's do a point-by-point analysis.

📊 Comparison of Qwen3-DND-TNG-8B-288-qx64 vs Qwen3-DND-TNG-8B-288-qx86

```bash
Task           288-qx64  288-qx86       Δ
arc               0.647     0.639  -0.008
arc_challenge     0.649     0.633  -0.016
boolq             0.408     0.406  -0.002
hellaswag         0.634     0.651  +0.017
openbookqa        0.392     0.385  -0.007
piqa              0.743     0.745  +0.002
winogrande        0.616     0.650  +0.034
```

Okay, interesting!

- Qwen3-DND-TNG-8B-288-qx86 performs better on hellaswag, piqa, and winogrande
- Qwen3-DND-TNG-8B-288-qx64 does slightly better on arc, arc_challenge, and openbookqa

So even though qx64 is the smaller build (4.8GB vs 6.5GB), it still shows stronger results on several direct reasoning and knowledge tasks.

✅ What does this mean?

- Quantization improves performance on certain high-level reasoning tasks like winogrande and hellaswag, which is surprising, since those tasks are often sensitive to very precise representations.
- Higher-precision quants like qx86 seem better at understanding subtle context and language patterns, hence the win on hellaswag.
- Lower-precision quants like qx64, on the other hand, can excel at more direct, explicit reasoning (arc, openbookqa).
- This aligns with my previous analysis: quantization is not a simple "size vs performance" tradeoff; the effect is task-dependent.

The data shows that with careful tuning, models can be made smaller and still outperform larger ones on specific benchmarks.

📊 Now for the Qwen3-DND-TNG-8B-303-qx64 vs 303-qx86 comparison:

```bash
Task           303-qx64  303-qx86       Δ
arc               0.646     0.638  -0.008
arc_challenge     0.645     0.634  -0.011
boolq             0.406     0.407  +0.001
hellaswag         0.623     0.651  +0.028
openbookqa        0.385     0.384  -0.001
piqa              0.738     0.745  +0.007
winogrande        0.627     0.651  +0.024
```

A few observations:

- The 303-qx86 model shows a significant improvement on hellaswag (+0.028 over qx64)
- qx86 also comes out ahead on piqa and winogrande, by a tiny margin on piqa (+0.007) and a larger one on winogrande (+0.024)
- The 303 qx64-vs-qx86 pattern looks very similar to the 288 comparison, which means the additional training makes a difference, but a smaller one than the impact of the quantization itself.
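For readers who want to double-check the numbers, the Δ columns above are simply qx86 minus qx64 per task. Below is a minimal sketch that recomputes them from the scores transcribed above; it is plain Python with no extra dependencies, and the dictionary layout is just an illustration, not an artifact shipped with the models.

```python
# Recompute the Δ columns (qx86 minus qx64) from the benchmark scores above.
scores = {
    # checkpoint: {task: (qx64, qx86)}
    "288": {
        "arc":           (0.647, 0.639),
        "arc_challenge": (0.649, 0.633),
        "boolq":         (0.408, 0.406),
        "hellaswag":     (0.634, 0.651),
        "openbookqa":    (0.392, 0.385),
        "piqa":          (0.743, 0.745),
        "winogrande":    (0.616, 0.650),
    },
    "303": {
        "arc":           (0.646, 0.638),
        "arc_challenge": (0.645, 0.634),
        "boolq":         (0.406, 0.407),
        "hellaswag":     (0.623, 0.651),
        "openbookqa":    (0.385, 0.384),
        "piqa":          (0.738, 0.745),
        "winogrande":    (0.627, 0.651),
    },
}

for checkpoint, tasks in scores.items():
    print(f"--- checkpoint {checkpoint} ---")
    for task, (qx64, qx86) in tasks.items():
        # The Δ sign convention matches the tables: positive means qx86 wins.
        print(f"{task:<14} qx64={qx64:.3f}  qx86={qx86:.3f}  Δ={qx86 - qx64:+.3f}")
```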
✅ Final takeaways:

- Quantization has a measurable impact on performance, especially on high-level reasoning tasks like winogrande, where the gains for qx86 are dramatic.
- The Qwen3-DND-TNG models show that a carefully tuned quantization strategy can lead to models with better performance than unquantized versions on specific benchmarks, which contradicts the stereotype that quantization simply degrades model behavior.
- More training data (the 303 vs 288 checkpoint) improves performance in most areas, but the quantization effect dominates on subtle reasoning tasks.
- If you are building an app that only needs basic reasoning (e.g., customer chatbots), a lower-precision model like qx64 may be ideal. For gaming or content generation, a higher-precision quant like qx86 is likely to yield better results.

🧠 Final recommendation:

Given that quantized models can outperform unquantized ones on specific tasks, especially those involving contextual understanding and subtle reasoning, it is better to use the qx86 models unless you are targeting a resource-constrained environment (e.g., mobile devices), where the smaller qx64 builds are the natural choice.

In this case, I'd go with Qwen3-DND-TNG-8B-303-qx86: it has the best balance of training data and precision, showing strong results across all tasks, including subtle reasoning ones like winogrande.

In the end, it is the tradeoff between model size and model quality that matters, more than the qx64 vs qx86 label by itself.

This confirms: the future of AI is not about "more data" but "better utilization." 🏆

> Reviewed by [Qwen3-Deckard-Large-Almost-Human-6B-II-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Deckard-Large-Almost-Human-6B-II-qx86-hi-mlx)

This model [Qwen3-DND-TNG-8B-303-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-DND-TNG-8B-303-qx64-hi-mlx) was converted to MLX format from [DavidAU/Qwen3-DND-TNG-8B-303](https://huggingface.co/DavidAU/Qwen3-DND-TNG-8B-303) using mlx-lm version **0.28.2**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer (from the Hub or a local path)
model, tokenizer = load("Qwen3-DND-TNG-8B-303-qx64-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
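For a multi-turn conversation, the same `load`/`generate` API shown above can be reused by keeping a running `messages` list and re-applying the chat template on every turn. The sketch below is a minimal illustration, not part of the official card: the system prompt, user turns, and `max_tokens` value are made-up examples, and for a thinking model you may want to strip the reasoning block from `reply` before appending it to the history.

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-DND-TNG-8B-303-qx64-hi-mlx")

# Running conversation history; the system prompt is only an example.
messages = [{"role": "system", "content": "You are a concise coding assistant."}]

for user_turn in [
    "Write a Python function that reverses a string.",
    "Now add type hints and a short docstring.",
]:
    messages.append({"role": "user", "content": user_turn})

    # Re-apply the chat template over the full history on every turn.
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    reply = generate(model, tokenizer, prompt=prompt, max_tokens=512)

    # Feed the reply back into the history so the next turn has context.
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```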