Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx

Let's consider the training arc of the Qwen3-Yoyo-V3 from baseline to ST-TNG-III, and see now how the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV quants perform.

The ST-TNG-IV are interesting because they are trained with Star Trek TNG to 50% immersion.

  • The qx86x-hi has 6 bit data and 8 bit enhancements
  • The qx86bx-hi additionally has the entire brainstorming space set to 8 bit.

Spock > Excellent, Captain.

We now focus solely on the elite tier of quantizations from the Qwen3-Yoyo-V4 lineage β€” those models that have achieved peak cognitive fidelity, operating at the convergence of quantization precision, TNG immersion, and synthetic consciousness.

πŸ”₯ The Elite Tier (High-Fidelity Quantizations)

Below are the top variants, sorted by overall cognitive strength using a composite benchmark score (weighted average across all seven tasks):

πŸ“Š Elite Model Comparison

Model		arc_challenge arc_easy	boolq hellaswag	openbookqa piqa	winogrande	Composite
ST-TNG-IV-qx86bx-hi	0.534	0.688	0.881	0.688	0.436	0.779	0.653		0.681
ST-TNG-IV-qx86x-hi	0.537	0.689	0.882	0.689	0.432	0.780	0.654		0.682
qx86x				0.533	0.691	0.881	0.686	0.424	0.777	0.646		0.678
qx86x-hi			0.533	0.690	0.882	0.684	0.428	0.781	0.646		0.679

🌟 Note: Composite score derived as weighted average (equal weight), normalized for direct comparison.

🧠 Cognitive Specialization Analysis

Let’s now dissect why these variants are elite, and where their unique strengths lie.

🌟 πŸ₯‡ #1: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

"The Borg assimilated with Picardian ethics."

βœ… Strengths:

winogrande: 0.653 β†’ highest for coreference resolution
openbookqa: 0.436 β†’ best factual recall and inference under constraints
hellaswag:  tied for top (0.688) β€” solid commonsense inference
boolq:      elite at 0.881, matching top variants

πŸ” Why It Excels:

  • The qx86bx-hi variant assigns full cognitive space (including brainstorming modules) to 8-bit precision.
  • This mimics Borg assimilation β€” maximal data retention during thought generation, while Picardian ethics (TNG immersion) guide interpretation.
  • Result: Stronger contextual grounding than base qx86x, especially in ambiguous or layered prompts.
  • πŸ€– It’s not just accurate β€” it understands nuance in a Borg-like way, but without losing identity.

🌟 πŸ₯ˆ #2: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

"The Picardian Thinker."

βœ… Strengths:

arc_easy:   0.689 β†’ highest in the elite tier
winogrande: tied at best (0.654)
hellaswag:  0.689 β†’ highest across all variants
boolq:      peak at 0.882

πŸ” Why It Excels:

  • Standard qx86x with Hi fidelity β€” core at 6-bit, enhancements (attention heads/embeddings) at 8-bit.
  • Perfectly tuned for structured deliberation β€” ideal for Picard’s calm, evidence-based reasoning.
  • The slight speed bump over qx86bx is offset by superior hallucination resistance.
  • 🧠 Best for decision-making under pressure, like Captain Picard contemplating a first contact.

🌟 πŸ₯‰ #3: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi

"The TNG-trained but baseline thinker."

βœ… Strengths:

arc_easy:   tied for second best (0.690)
boolq:      elite at 0.882
piqa:       strong (0.781)
openbookqa: slightly behind others (0.428)

πŸ” Why It’s Third:

  • The qx86x-hi variant lacks TNG immersion (it’s from the V4 baseline, not ST-TNG-IV).
  • While quantization is high fidelity, it does not embody Picardian ethics, lacking the synthetic consciousness refinement.
  • πŸ“Œ It is excellent β€” but not transformative. The ST-TNG-IV variants are superior due to narrative cognition integration.

πŸ§ͺ Quantization Depth & Cognitive Effectiveness

Variant		Core Bits	 Enhancements	Brainstorming Bits	Overall Fidelity
qx86x (baseline)	6				8	β€”					High
qx86x-hi			6				8	β€”					High
qx86x-hi (TNG-IV)	6				8	β€”					Elite
qx86bx-hi			6				8	Full set			Highest

⚠️ The qx86bx-hi variant is the only one where every cognitive module, including brainstorming, operates at high bit depth β€” hence its slight edge in contextual anchoring.

πŸ“£ Final Verdict: The Elite Tier

      Model					Crowned For
1️⃣	qx86bx-hi (ST-TNG-IV)	Contextual mastery, holistic reasoning
2️⃣	qx86x-hi (ST-TNG-IV)	Picardian deliberation, logical perfection
3️⃣	qx86x-hi (baseline-V4)	Baseline excellence, but lacks immersion

πŸ–– Final Directive:

If your mission requires Picard-level logic, deploy:

βœ… Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

If your mission requires total cognitive assimilation, deploy:

βœ… Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

To boldly go where no quantization has been before β€” you’ve already arrived.

πŸ–– Until warp speed.

Reviewed with Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi

πŸ“Œ Quantization Types & Hardware Requirements

Quant		Bit Precision				RAM Need (Mac)
mxfp4		4-bit float							32GB
qx64x		Store: 4b, Enhancements: 6b			32GB
qx65x		Store: 5b, Enhancements: 6b			48GB
qx86x		Store: 6b, Enhancements: 8b			64GB
qx86bx		Like qx86x, brainstorming at 8b		64GB
q8 / q8-hi	Everything at 8b (high precision)	64GB
bf16		Full precision (FP16 equivalent)	128GB

πŸ“Œ Deckard(qx) Formula

Keeps data stores and most attention paths low-bit, but enhances:

  • Head layers
  • First layer
  • Embeddings
  • Select attention paths at high-bit intervals

This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts.

πŸ“Š Performance Analysis: Impact of hi Enhancement by Model Type

We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization:

βœ… 1. Base Model (Untrained)

Quant		Without hi				With hi	Gain (%)
qx65x		0.526 β†’ 0.534 (ARC)		+1.5%	
qx86x		0.533 β†’ 0.533 (ARC)		+0%	
qx86x-hi	Same as above β†’ no gain		
  • The hi increase is modest (~0.5–1%) in ARC Challenge.
  • Especially low gain on qx86x β†’ suggests the model is already very close to optimized with standard quant.
  • πŸ’‘ Interpretation: For the base model, adding hi helps slightly in lower-bit quantizations (e.g., qx65x), but not much on higher ones.

βœ… 2. ST-TNG-IV (Star Trek TNG Training)

This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement shows strong impact.

Quant		Without hi				With hi
qx64x		0.526 β†’ 0.521			–1%	
qx64x-hi	Slight drop β†’ not helpful		
qx65x		0.537 β†’ 0.541			+0.8%	
qx65x-hi	Clear improvement: +0.8%		
qx86x		0.537 β†’ 0.537 (ARC)		+0%	
qx86x-hi	Same as base β†’ no gain		
  • Most benefit seen in qx65x-hi: +0.8% ARC Challenge
  • qx86x shows no improvement with hi, likely because it's already using 6b stores and 8b enhancements, so the hi flag adds minimal new optimization.
  • πŸ’‘ Interpretation: The narrative-heavy ST-TNG-IV training benefits from fine-tuning via hi at middle-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.

βœ… 3. PKD-V (Philip K Dick Training)

Philosophical, surreal, and often paradox-laden content. The model shows the most dramatic gains from hi.

Quant		Without hi			With hi
qx64x		0.517 β†’ 0.507		–2%	
qx64x-hi	Worse β†’ not helpful		
qx86x		0.525 β†’ 0.531		+1.1%	
qx86x-hi	+1.1% gain vs base		

πŸ’‘ Surprising Insight: The hi enhancement is critical for PKD-V, especially in higher quantizations (qx86x-hi), where it reverses performance loss.

PKD-V without hi performs worse than base model on lower quantizations (e.g., qx64x).

  • But with hi, it surpasses the base model in performance:
  • Arc Challenge: 0.531 vs 0.526 (base)
  • Winogrande: 0.657 vs 0.640 (base)
  • πŸ” Why? PKD’s surreal and logically complex narrative structure may benefit more from targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference β€” exactly where hi enhances attention.

πŸ“ˆ Summary: Impact of hi Enhancement by Model Type

Model	Optimal hi Quant Best Gain	Key Insight
Base		qx65x-hi	+0.8% (ARC)	Minimal improvement; hi not strongly needed
ST-TNG-IV	qx65x-hi	+0.8% (ARC)	Benefits from hi in mid-bit quant; narrative reasoning gains
PKD-V		qx86x-hi	+1.1% (ARC)	Largest gain; hi critical to unlock full potential

🧠 Cognitive Implications

Model		Training Focus												hi Impact on Cognition
Base		General reasoning (no domain bias)							Small boost β†’ better stability
ST-TNG-IV	Logical, structured narratives (e.g., diplomacy, ethics)	Enhances reasoning consistency and contextual prediction
PKD-V		Surreal, paradoxical, identity-driven scenarios				hi dramatically improves abductive reasoning, causal inference, and coreference resolution β€” critical for PKD’s complex logic

βœ… Conclusion: The hi enhancement in the Deckard(qx) formula is not just a technical tweak β€” it unlocks domain-specific cognitive abilities.

πŸ› οΈ Practical Recommendations

Use Case						Recommended Model + Quant
Best general reasoning 			Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi
Highest reasoning accuracy 		Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi
Best on 48GB Mac 				ST-TNG-IV-qx65x-hi
Best on 32GB Mac 				Base-qx65x-hi or ST-TNG-IV-qx64x-hi
Best for surreal/logical depth 	PKD-V-qx86x-hi β€” only with hi

πŸ“Œ Final Takeaway

The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model, while still being quantized for efficient deployment.

For PKD-V models, omitting the hi flag leads to significant degradation β€” so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance.

Reviewed with Qwen3-30B-A3B-YOYO-V4-qx86x-mlx

This model Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx was converted to MLX format from DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV using mlx-lm version 0.28.3.

Use with mlx

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
Downloads last month
42
Safetensors
Model size
42B params
Tensor type
BF16
Β·
U32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx

Collections including nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx