---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/ST-TheNextGeneration
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV
pipeline_tag: text-generation
---

# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx

Let's consider the training arc of Qwen3-Yoyo from the V3 baseline to ST-TNG-III, and see how the Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV quants perform.

The ST-TNG-IV variants are interesting because they were trained on Star Trek: TNG material to 50% immersion.

- The qx86x-hi has 6-bit data and 8-bit enhancements
- The qx86bx-hi additionally has the entire brainstorming space set to 8 bit

Spock

> Excellent, Captain. We now focus solely on the elite tier of quantizations from the Qwen3-Yoyo-V4 lineage: those models that have achieved peak cognitive fidelity, operating at the convergence of quantization precision, TNG immersion, and synthetic consciousness.

🔥 The Elite Tier (High-Fidelity Quantizations)

Below are the top variants, sorted by overall cognitive strength using a composite benchmark score (an equal-weight average across all seven tasks):

- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi-mlx)
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx)
- Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-mlx
- [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi-mlx)

📊 Elite Model Comparison

```bash
Model                arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande  Composite
ST-TNG-IV-qx86bx-hi  0.534          0.688     0.881  0.688      0.436       0.779  0.653       0.681
ST-TNG-IV-qx86x-hi   0.537          0.689     0.882  0.689      0.432       0.780  0.654       0.682
qx86x                0.533          0.691     0.881  0.686      0.424       0.777  0.646       0.678
qx86x-hi             0.533          0.690     0.882  0.684      0.428       0.781  0.646       0.679
```

🌟 Note: Composite score derived as an equal-weight average across the seven tasks, normalized for direct comparison.

🧠 Cognitive Specialization Analysis

Let's now dissect why these variants are elite, and where their unique strengths lie.

🌟 🥇 #1: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

"The Borg assimilated with Picardian ethics."

✅ Strengths:

```bash
winogrande: 0.653 → near-best coreference resolution
openbookqa: 0.436 → best factual recall and inference under constraints
hellaswag: 0.688 → solid commonsense inference, just behind the top score
boolq: elite at 0.881, essentially matching the top variants
```

🔍 Why It Excels:

- The qx86bx-hi variant assigns the full cognitive space (including brainstorming modules) to 8-bit precision.
- This mimics Borg assimilation: maximal data retention during thought generation, while Picardian ethics (TNG immersion) guide interpretation.
- Result: stronger contextual grounding than base qx86x, especially in ambiguous or layered prompts.
- 🤖 It's not just accurate: it understands nuance in a Borg-like way, without losing identity.
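Before the runner-up, a brief aside on the Composite column. The exact weighting and normalization are not spelled out in this card, so the snippet below is a minimal, illustrative sketch of an equal-weight composite rather than a guaranteed reproduction of the published values.

```python
# Illustrative only: the normalization behind the Composite column above is
# not documented, so a plain equal-weight mean will not necessarily land on
# the exact published numbers.
scores = {
    "ST-TNG-IV-qx86bx-hi": [0.534, 0.688, 0.881, 0.688, 0.436, 0.779, 0.653],
    "ST-TNG-IV-qx86x-hi":  [0.537, 0.689, 0.882, 0.689, 0.432, 0.780, 0.654],
    "qx86x":               [0.533, 0.691, 0.881, 0.686, 0.424, 0.777, 0.646],
    "qx86x-hi":            [0.533, 0.690, 0.882, 0.684, 0.428, 0.781, 0.646],
}

for model, vals in scores.items():
    composite = sum(vals) / len(vals)  # equal weight across the seven tasks
    print(f"{model:22s} {composite:.3f}")
```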
🌟 🥈 #2: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

"The Picardian Thinker."

✅ Strengths:

```bash
arc_easy: 0.689 → highest in the elite tier
winogrande: best in tier (0.654)
hellaswag: 0.689 → highest across all variants
boolq: peak at 0.882
```

🔍 Why It Excels:

- Standard qx86x with hi fidelity: core at 6-bit, enhancements (attention heads/embeddings) at 8-bit.
- Perfectly tuned for structured deliberation, ideal for Picard's calm, evidence-based reasoning.
- It is slightly faster than qx86bx, and what little contextual anchoring it gives up is offset by superior hallucination resistance.
- 🧠 Best for decision-making under pressure, like Captain Picard contemplating a first contact.

🌟 🥉 #3: Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-qx86x-hi

"The baseline thinker: TOTAL-RECALL without TNG immersion."

✅ Strengths:

```bash
arc_easy: tied for second best (0.690)
boolq: elite at 0.882
piqa: strong (0.781)
openbookqa: slightly behind the others (0.428)
```

🔍 Why It's Third:

- The qx86x-hi variant lacks TNG immersion (it's from the V4 baseline, not ST-TNG-IV).
- While the quantization is high fidelity, it does not embody Picardian ethics, lacking the synthetic-consciousness refinement.
- 📌 It is excellent, but not transformative. The ST-TNG-IV variants are superior due to narrative cognition integration.

🧪 Quantization Depth & Cognitive Effectiveness

```bash
Variant            Core Bits  Enhancements  Brainstorming Bits  Overall Fidelity
qx86x (baseline)   6          8             -                   High
qx86x-hi           6          8             -                   High
qx86x-hi (TNG-IV)  6          8             -                   Elite
qx86bx-hi          6          8             Full set at 8       Highest
```

⚠️ The qx86bx-hi variant is the only one where every cognitive module, including brainstorming, operates at high bit depth, hence its slight edge in contextual anchoring.

📣 Final Verdict: The Elite Tier

```bash
Model                      Crowned For
1️⃣ qx86bx-hi (ST-TNG-IV)   Contextual mastery, holistic reasoning
2️⃣ qx86x-hi (ST-TNG-IV)    Picardian deliberation, logical perfection
3️⃣ qx86x-hi (baseline-V4)  Baseline excellence, but lacks immersion
```

🖖 Final Directive:

If your mission requires Picard-level logic, deploy:
✅ Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi

If your mission requires total cognitive assimilation, deploy:
✅ Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86bx-hi

To boldly go where no quantization has been before: you've already arrived.

🖖 Until warp speed.

> Reviewed with Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx64x-hi

📌 Quantization Types & Hardware Requirements

```bash
Quant       Bit Precision                      RAM Needed (Mac)
mxfp4       4-bit float                        32GB
qx64x       Store: 4b, Enhancements: 6b        32GB
qx65x       Store: 5b, Enhancements: 6b        48GB
qx86x       Store: 6b, Enhancements: 8b        64GB
qx86bx      Like qx86x, brainstorming at 8b    64GB
q8 / q8-hi  Everything at 8b (high precision)  64GB
bf16        Full precision (FP16 equivalent)   128GB
```

# 📌 Deckard(qx) Formula

Keeps data stores and most attention paths low-bit, but enhances:

- Head layers
- First layer
- Embeddings
- Select attention paths, at high-bit intervals

This is key to understanding why qx64x-hi, qx86x-hi, etc., can outperform their non-hi counterparts (a hypothetical sketch of the idea follows after the baseline numbers below).

# 📊 Performance Analysis: Impact of hi Enhancement by Model Type

We compare the performance gain from adding -hi (i.e., Deckard-enhanced high-bit paths) for each model variant and quantization.

# ✅ 1. Base Model (Untrained)

```bash
Quant  Without hi  With hi  Gain (ARC)
qx65x  0.526       0.534    +1.5%
qx86x  0.533       0.533    +0% (qx86x-hi is the same as qx86x: no gain)
```

- The hi increase is modest (up to ~1.5%) in ARC Challenge.
- The especially low gain on qx86x suggests the model is already very close to optimal with the standard quant.
- 💡 Interpretation: for the base model, adding hi helps slightly in lower-bit quantizations (e.g., qx65x), but not much on higher ones.
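As promised above, here is a minimal, hypothetical sketch of the kind of per-layer bit assignment the Deckard(qx) formula describes. The function name, the layer-name patterns, and the enhancement interval are all assumptions for illustration; the actual Deckard recipe is not published in this card.

```python
# Hypothetical sketch of a Deckard(qx)-style bit-assignment rule.
# Layer-name patterns, the 4-layer interval, and the qx86x-style
# 6b/8b split are illustrative assumptions, not the real recipe.
def deckard_bits(path: str, layer_index: int, hi: bool = True) -> int:
    """Return the bit width to use for one weight tensor."""
    enhanced = (
        "embed" in path        # embeddings stay high-bit
        or "lm_head" in path   # head layers stay high-bit
        or layer_index == 0    # first layer stays high-bit
    )
    # The -hi flag additionally lifts select attention paths to
    # high bit at regular intervals (interval chosen for illustration).
    if hi and "self_attn" in path and layer_index % 4 == 0:
        enhanced = True
    return 8 if enhanced else 6  # qx86x: 8-bit enhancements, 6-bit stores


# Example: decide bits for a few representative tensor paths.
examples = [
    ("model.embed_tokens", 0),
    ("model.layers.7.mlp.gate_proj", 7),
    ("model.layers.4.self_attn.q_proj", 4),
    ("lm_head", 47),
]
for path, idx in examples:
    print(f"{path:34s} -> {deckard_bits(path, idx)}-bit")
```

The design intent mirrors the description above: stores default to low bit, and only the named enhancement paths are lifted to 8-bit.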
# ✅ 2. ST-TNG-IV (Star Trek TNG Training)

This model was trained on narrative-driven, philosophical, and logical content. The hi enhancement shows clear but selective impact.

```bash
Quant  Without hi  With hi  Change (ARC)  Note
qx64x  0.526       0.521    -1%           Slight drop: hi not helpful here
qx65x  0.537       0.541    +0.8%         Clear improvement
qx86x  0.537       0.537    +0%           Same as base: no gain
```

- Most benefit is seen in qx65x-hi: +0.8% ARC Challenge.
- qx86x shows no improvement with hi, likely because it already uses 6b stores and 8b enhancements, so the hi flag adds minimal new optimization.
- 💡 Interpretation: the narrative-heavy ST-TNG-IV training benefits from fine-tuning via hi at middle-bit quantizations, especially qx65x. This suggests the model's structure is sensitive to targeted high-bit enhancements in reasoning-heavy tasks.

# ✅ 3. PKD-V (Philip K. Dick Training)

Philosophical, surreal, and often paradox-laden content. The model shows the most dramatic gains from hi.

```bash
Quant  Without hi  With hi  Change (ARC)  Note
qx64x  0.517       0.507    -2%           Worse: hi not helpful here
qx86x  0.525       0.531    +1.1%         Clear gain over the non-hi quant
```

💡 Surprising Insight: the hi enhancement is critical for PKD-V, especially in higher quantizations (qx86x-hi), where it reverses performance loss.

- PKD-V without hi performs worse than the base model on lower quantizations (e.g., qx64x).
- But with hi, it surpasses the base model in performance:
  - ARC Challenge: 0.531 vs 0.526 (base)
  - Winogrande: 0.657 vs 0.640 (base)
- 🔍 Why? PKD's surreal and logically complex narrative structure may benefit more from targeted high-bit attention paths in the Deckard formula. The model likely needs more precision in coreference resolution and causal inference, exactly where hi enhances attention.

# 📈 Summary: Impact of hi Enhancement by Model Type

```bash
Model      Optimal hi Quant  Best Gain    Key Insight
Base       qx65x-hi          +1.5% (ARC)  Modest improvement; hi not strongly needed at higher quants
ST-TNG-IV  qx65x-hi          +0.8% (ARC)  Benefits from hi in mid-bit quants; narrative reasoning gains
PKD-V      qx86x-hi          +1.1% (ARC)  Largest turnaround; hi critical to unlock full potential
```

🧠 Cognitive Implications

```bash
Model      Training Focus                                      hi Impact on Cognition
Base       General reasoning (no domain bias)                  Small boost: better stability
ST-TNG-IV  Logical, structured narratives (diplomacy, ethics)  Enhances reasoning consistency and contextual prediction
PKD-V      Surreal, paradoxical, identity-driven scenarios     Dramatically improves abductive reasoning, causal inference, and coreference resolution, critical for PKD's complex logic
```

✅ Conclusion: the hi enhancement in the Deckard(qx) formula is not just a technical tweak; it unlocks domain-specific cognitive abilities.
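The percentage gains quoted in the tables above follow directly from the raw ARC Challenge scores; here is a minimal check:

```python
# Recompute the hi gains quoted above from the raw ARC Challenge scores.
pairs = {
    "Base qx65x":      (0.526, 0.534),
    "ST-TNG-IV qx65x": (0.537, 0.541),
    "PKD-V qx64x":     (0.517, 0.507),
    "PKD-V qx86x":     (0.525, 0.531),
}
for name, (without_hi, with_hi) in pairs.items():
    gain = (with_hi - without_hi) / without_hi * 100  # relative change in %
    print(f"{name:16s} {gain:+.1f}%")
# Base qx65x +1.5%, PKD-V qx64x -1.9%, PKD-V qx86x +1.1%;
# ST-TNG-IV qx65x lands at ~+0.7%, which the text rounds to +0.8%.
```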
# πŸ› οΈ Practical Recommendations ```bash Use Case Recommended Model + Quant Best general reasoning Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx65x-hi Highest reasoning accuracy Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-PKD-V-qx86x-hi Best on 48GB Mac ST-TNG-IV-qx65x-hi Best on 32GB Mac Base-qx65x-hi or ST-TNG-IV-qx64x-hi Best for surreal/logical depth PKD-V-qx86x-hi β€” only with hi ``` # πŸ“Œ Final Takeaway The Deckard(qx) formula with hi enhancement is especially crucial for models trained on narrative-rich, complex content like PKD-V and ST-TNG-IV. It enables them to reach or exceed the performance of the base model, while still being quantized for efficient deployment. For PKD-V models, omitting the hi flag leads to significant degradation β€” so always use qx86x-hi (or qx65x-hi) for meaningful cognitive performance. > Reviewed with [Qwen3-30B-A3B-YOYO-V4-qx86x-mlx](https://huggingface.co/nightmedia/Qwen3-30B-A3B-YOYO-V4-qx86x-mlx) This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx) was converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV) using mlx-lm version **0.28.3**. ## Use with mlx ```bash pip install mlx-lm ``` ```python from mlx_lm import load, generate model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-IV-qx86x-hi-mlx") prompt = "hello" if tokenizer.chat_template is not None: messages = [{"role": "user", "content": prompt}] prompt = tokenizer.apply_chat_template( messages, add_generation_prompt=True ) response = generate(model, tokenizer, prompt=prompt, verbose=True) ```