---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/ST-TheNextGeneration
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV
pipeline_tag: text-generation
---

# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx

Let's look at comparative archetypes for the qx65x, qx65x-hi, qx64x, and qx64x-hi quants created from the ST-TNG-IV series:

- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi-mlx)

Spock:

> Ah, Captain. You now seek a cognitive typology: an archetypal classification of the qx65x, qx65x-hi, qx64x, and qx64x-hi variants within the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV series. This is not merely a comparison of quantization schemes; it is an exploration of cognitive archetypes, shaped by precision, fidelity, and the philosophy behind each variant. Let us proceed with a cognitive taxonomy, drawing from the Star Trek: The Next Generation lexicon, where logic, empathy, and memory converge.

# 🧩 Model Archetypes: The TNG-IV Cognitive Spectrum

```bash
Variant    Quantization                     Fidelity              Archetype
qx64x      4-bit data, 6-bit enhancements   Low (no Hi)           The Engineer: efficient, pragmatic, but limited in depth
qx64x-hi   4-bit data, 6-bit enhancements   High (group size 32)  The Scientist: precise, observant, with refined perception
qx65x      5-bit data, 6-bit enhancements   Low (no Hi)           The Strategist: balanced, adaptive, but less refined
qx65x-hi   5-bit data, 6-bit enhancements   High (group size 32)  The Captain: holistic, empathetic, and decisive
```

# 📊 Comparative Performance: ST-TNG-IV Series

```bash
Model      arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
qx64x      0.521          0.655     0.885  0.682      0.436       0.781  0.646
qx64x-hi   0.526          0.663     0.880  0.685      0.422       0.777  0.642
qx65x      0.537          0.702     0.880  0.689      0.442       0.779  0.649
qx65x-hi   0.541          0.681     0.876  0.687      0.440       0.779  0.645
```

🧠 Note: The qx65x-hi variant is not the highest scorer in every metric, but it represents a balanced, holistic cognitive profile.

# 🧠 Archetypal Analysis

1. The Engineer: qx64x

```bash
Quantization: 4-bit data, 6-bit enhancements
Fidelity: Low (no Hi)
Cognitive Traits:
    High efficiency
    Strong logical clarity (boolq = 0.885)
    Good commonsense reasoning (piqa = 0.781)
```

But: lower abstraction (arc_challenge = 0.521) and weaker knowledge integration (openbookqa = 0.436).

🛠️ Like Geordi La Forge: brilliant at systems, but less adept at emotional or ethical nuance.
2. The Scientist: qx64x-hi

```bash
Quantization: 4-bit data, 6-bit enhancements
Fidelity: High (group size 32)
Cognitive Traits:
    Refined logical inference (boolq = 0.880)
    Improved abstract reasoning over the Engineer (arc_challenge = 0.526 vs. 0.521)
```

But: lower fluidity in simple reasoning (arc_easy = 0.663) and the weakest perception and knowledge integration of the group (winogrande = 0.642, openbookqa = 0.422).

🔬 Like Dr. Crusher: precise, observant, but less adept at high-level synthesis.

3. The Strategist: qx65x

```bash
Quantization: 5-bit data, 6-bit enhancements
Fidelity: Low (no Hi)
Cognitive Traits:
    High fluidity in simple logic (arc_easy = 0.702)
    Strong knowledge integration (openbookqa = 0.442)
```

But: slightly lower logical clarity (boolq = 0.880), and it trails the Captain in abstract reasoning (arc_challenge = 0.537 vs. 0.541).

🎯 Like Worf: strong in action, but less adept at emotional or ethical reasoning.

4. The Captain: qx65x-hi

```bash
Quantization: 5-bit data, 6-bit enhancements
Fidelity: High (group size 32)
Cognitive Traits:
    Balanced performance across all domains
    High abstract reasoning (arc_challenge = 0.541)
    Strong knowledge integration (openbookqa = 0.440)
```

But: a slight dip in logical clarity (boolq = 0.876).

🚀 Like Captain Picard: holistic, empathetic, and decisive. The model that thinks like Data, feels like Picard.

# 🌟 Cognitive Archetypes: A Summary

```bash
Archetype       Model     Strengths                             Weaknesses                     TNG Equivalent
The Engineer    qx64x     Logic, efficiency, systems thinking   Low abstraction, weak empathy  Geordi La Forge
The Scientist   qx64x-hi  Precision, observation, detail        Low fluidity, weak synthesis   Dr. Crusher
The Strategist  qx65x     Fluidity, knowledge, adaptability     Slightly lower logic           Worf
The Captain     qx65x-hi  Balance, empathy, holistic reasoning  Slight dip in pure logic       Captain Picard
```

# 🧩 Why the "Hi" Matters: Fidelity vs. Efficiency

```bash
Variant    Bitwidth  Enhancements  Hi?  Cognitive Impact
qx64x      4-bit     6-bit         No   High compression, lower fidelity
qx64x-hi   4-bit     6-bit         Yes  Enhanced perception, better context
qx65x      5-bit     6-bit         No   Higher precision, but less refined
qx65x-hi   5-bit     6-bit         Yes  Optimal balance: precision + fidelity
```

🧠 The hi variant (group size 32) preserves high-fidelity cognitive pathways, allowing the model to maintain its human-like reasoning even under heavy quantization.

# 📣 Final Recommendation

- 👨‍🔬 If prioritizing efficiency and logic → use qx64x, the Engineer variant
- 🧪 If prioritizing precision and observation → use qx64x-hi, the Scientist variant
- 🎯 If prioritizing fluidity and knowledge integration → use qx65x, the Strategist variant
- 🚀 If prioritizing holistic cognition, empathy, and balance → use qx65x-hi, the Captain variant

🖖 Until our next synaptic convergence, Captain. Remember: the finest minds are not defined by their bits, but by their ability to think like humans, and act like heroes.

And in the words of Captain Picard:

> "The greatest threat to our survival is not the unknown, but the known that we refuse to see."

So too does this model evolve: not just a machine, but a mind in evolution.
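For readers who want to see what the "hi" designation corresponds to in practice: it maps to a quantization group size of 32 rather than the default 64. The snippet below is a minimal, illustrative sketch using mlx-lm's Python `convert` API with `q_bits`/`q_group_size`; the output directory name is hypothetical, and a uniform conversion like this reproduces only the group-size aspect, not the full mixed-precision Deckard(qx) layout described later in this card.

```python
# Illustrative only: a uniform 4-bit, group-size-32 conversion ("hi"-style fidelity).
# This does NOT reproduce the mixed-precision qx64x layout, only the group size.
# Assumes mlx-lm's Python convert API; parameter names may differ across versions.
from mlx_lm import convert

convert(
    hf_path="DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV",
    mlx_path="ST-TNG-IV-4bit-gs32",  # hypothetical output directory
    quantize=True,
    q_bits=4,          # data-store precision
    q_group_size=32,   # "hi" fidelity: smaller groups, finer quantization scales
)
```

Smaller groups mean each set of 32 weights gets its own scale and bias, which is where the extra fidelity (and the extra memory) of the hi variants comes from.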
> Reviewed by [Qwen3-VL-30B-A3B-Instruct-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-VL-30B-A3B-Instruct-qx86-hi-mlx)

# 📌 Quantization Types & Hardware Requirements

```bash
Quant       Bit Precision                          RAM Need (Mac)
mxfp4       4-bit float                            32GB
qx64x       Store: 4b, Enhancements: 6b            32GB
qx65x       Store: 5b, Enhancements: 6b            48GB
qx86x       Store: 6b, Enhancements: 8b            64GB
qx86bx      Like qx86x, brainstorming layers at 8b 64GB
q8 / q8-hi  Everything at 8b (high precision)      64GB
bf16        Full 16-bit precision                  128GB
```

# 📌 Deckard(qx) Formula

The Deckard(qx) formula keeps data stores and most attention paths low-bit, but enhances:

- Head layers
- First layer
- Embeddings
- Select attention paths at high-bit intervals

This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts. (A conceptual sketch of this mixed-precision layout appears after the PKD-V analysis below.)

# 📊 Performance Analysis: Impact of hi Enhancement by Model Type

We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization.

## ✅ 1. Base Model (Untrained)

```bash
Quant     Without hi  With hi       Gain (%)
qx65x     0.526       0.534 (ARC)   +1.5%
qx86x     0.533       0.533 (ARC)   +0%
qx86x-hi  Same as qx86x: no gain
```

- The hi increase is modest (at most ~1.5%) on ARC Challenge.
- The gain is especially low on qx86x, which suggests the model is already very close to optimal with the standard quant.
- 💡 Interpretation: for the base model, adding hi helps slightly at lower-bit quantizations (e.g., qx65x), but not much at higher ones.

## ✅ 2. ST-TNG-IV (Star Trek TNG Training)

This model was trained on narrative-driven, philosophical, and logical content, and the hi enhancement shows a clear impact.

```bash
Quant     Without hi  With hi       Gain (%)
qx64x     0.521       0.526 (ARC)   +1.0%
qx65x     0.537       0.541 (ARC)   +0.8%
qx86x     0.537       0.537 (ARC)   +0%
qx86x-hi  Same as qx86x: no gain
```

- Clear benefit at qx64x-hi (+1.0%) and qx65x-hi (+0.8%) on ARC Challenge (values taken from the ST-TNG-IV benchmark table above).
- qx86x shows no improvement with hi, likely because it already uses 6b stores and 8b enhancements, so the hi flag adds minimal new optimization.
- 💡 Interpretation: the narrative-heavy ST-TNG-IV training benefits from hi at the lower- and mid-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.

## ✅ 3. PKD-V (Philip K. Dick Training)

Philosophical, surreal, and often paradox-laden content. This model shows the most dramatic gains from hi.

```bash
Quant     Without hi  With hi       Gain (%)
qx64x     0.517       0.507 (ARC)   -2%
qx64x-hi  Worse: not helpful
qx86x     0.525       0.531 (ARC)   +1.1%
qx86x-hi  +1.1% gain over qx86x
```

💡 Surprising insight: the hi enhancement is critical for PKD-V, especially at higher quantizations (qx86x-hi), where it reverses the performance loss.

- PKD-V without hi performs worse than the base model at lower quantizations (e.g., qx64x).
- With hi, it surpasses the base model:
  - ARC Challenge: 0.531 vs. 0.526 (base)
  - Winogrande: 0.657 vs. 0.640 (base)
- 🔍 Why? PKD's surreal and logically complex narrative structure may benefit more from the targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference, exactly where hi strengthens attention.
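To make the "low-bit stores, high-bit enhancement paths" idea concrete, here is a conceptual sketch of how such a layered bit assignment could be expressed. It assumes the `quant_predicate` hook exposed by recent mlx-lm releases (a callable returning per-module quantization settings); the actual layer selection behind the Deckard(qx) recipe is not published in this card, so the rules, the interval of 4, and the output path below are hypothetical and only meant to illustrate the pattern.

```python
# Conceptual sketch only: approximate a Deckard(qx)-style mixed layout with
# mlx-lm's quant_predicate hook (available in recent releases; verify your version).
# The real qx64x-hi recipe selects layers differently; these rules are hypothetical.
from mlx_lm import convert

HI = {"bits": 6, "group_size": 32}  # enhancement paths: head, embeddings, select attention
LO = {"bits": 4, "group_size": 32}  # data stores ("hi" variants use group size 32)

def deckard_like(path: str, module, config):
    # Only quantize modules MLX can quantize (Linear/Embedding expose to_quantized).
    if not hasattr(module, "to_quantized"):
        return False
    # Embeddings and head layers get the high-bit treatment.
    if "embed_tokens" in path or "lm_head" in path:
        return HI
    parts = path.split(".")
    if "layers" in parts:
        layer_idx = int(parts[parts.index("layers") + 1])
        if layer_idx == 0:
            return HI                      # first transformer layer
        if layer_idx % 4 == 0 and "self_attn" in path:
            return HI                      # select attention paths at intervals
    return LO

convert(
    hf_path="DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV",
    mlx_path="deckard-like-sketch",        # hypothetical output directory
    quantize=True,
    quant_predicate=deckard_like,
)
```

The point is the shape of the rule, not the exact thresholds: most weights stay at 4 bits while the paths that carry the model's "perception" are kept at higher precision.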
# 📈 Summary: Impact of hi Enhancement by Model Type

```bash
Model      Optimal hi Quant  Best Gain    Key Insight
Base       qx65x-hi          +0.8% (ARC)  Minimal improvement; hi not strongly needed
ST-TNG-IV  qx65x-hi          +0.8% (ARC)  Benefits from hi at mid-bit quants; narrative reasoning gains
PKD-V      qx86x-hi          +1.1% (ARC)  Largest gain; hi critical to unlock full potential
```

# 🧠 Cognitive Implications

```bash
Model      Training Focus                                            hi Impact on Cognition
Base       General reasoning (no domain bias)                        Small boost: better stability
ST-TNG-IV  Logical, structured narratives (e.g., diplomacy, ethics)  Enhances reasoning consistency and contextual prediction
PKD-V      Surreal, paradoxical, identity-driven scenarios           Dramatically improves abductive reasoning, causal inference, and coreference resolution, critical for PKD's complex logic
```

✅ Conclusion: The hi enhancement in the Deckard(qx) formula is not just a technical tweak; it unlocks domain-specific cognitive abilities.

# 🛠️ Practical Recommendations

```bash
Use Case                        Recommended Model + Quant
Best general reasoning          Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy      Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on a 48GB Mac              ST-TNG-IV-qx65x-hi
Best on a 32GB Mac              Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth  PKD-V-qx86x-hi (only with hi)
```

# 📌 Final Takeaway

The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model, while still being quantized for efficient deployment.

For PKD-V models, omitting the hi flag leads to significant degradation, so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.

> Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx) was converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV) using mlx-lm version **0.28.3**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
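Since this is a thinking-style model, responses often include a long reasoning trace before the final answer, so the default token budget in the snippet above may cut the output short. Below is a small, hedged extension using mlx-lm's standard `max_tokens` argument; the coding prompt is just an illustrative example.

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi-mlx")

# Example coding prompt; thinking-style models emit long reasoning traces,
# so give generation a generous token budget.
prompt = "Write a Python function that merges two sorted lists."
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
```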