---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- text-generation-inference
- reasoning
- math
- coding
- frontier
- aquif-3.5
- moe
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
---

# aquif-3.5-Plus & aquif-3.5-Max

The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models pair advanced reasoning capabilities with 1M-token context windows to achieve state-of-the-art performance in their respective categories.

**aquif-3.5-Plus** combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications. **aquif-3.5-Max** is a reasoning-only frontier model, delivering exceptional performance across all benchmark categories.

## Model Repository Links

| Model | HuggingFace Repository |
|-------|------------------------|
| aquif-3.5-Plus | [aquiffoo/aquif-3.5-Plus](https://huggingface.co/aquiffoo/aquif-3.5-Plus) |
| aquif-3.5-Max | [aquiffoo/aquif-3.5-Max](https://huggingface.co/aquiffoo/aquif-3.5-Max) |

## Model Overview

| Model | Total Params (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|-------|------------------|-------------------|-----------|----------------|----------------|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | Reasoning-Only |

## Model Details

### aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A hybrid reasoning model offering unusual flexibility: toggle between thinking and non-thinking modes to match your use case, keeping reasoning capabilities available when depth matters, or prioritizing speed for time-sensitive applications.
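The overview table implies that only a small fraction of each model's weights is active per forward pass. A minimal sketch of that arithmetic (the MoE routing details themselves are not published here):

```python
# Total vs. active parameters (in billions) from the Model Overview table.
models = {
    "aquif-3.5-Plus": (30.5, 3.3),
    "aquif-3.5-Max": (42.4, 3.3),
}

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters activated per forward pass."""
    return active_b / total_b

for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} of parameters active")
# aquif-3.5-Plus activates ~10.8% of its weights; aquif-3.5-Max ~7.8%.
```

This is why both models decode at the cost of a ~3.3B dense model despite their much larger total parameter counts.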
## Artificial Analysis Intelligence Index (AAII) Benchmarks

### Core Performance Metrics

| Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
|-----------|--------------------------------|----------------------------|---------------|
| MMLU-Pro | 80.2 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 51.3 |
| IFBench | 39.3 | 54.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 61.2 |
| SciCode | 29.5 | 35.7 | 40.9 |
| **AAII Composite Score** | **42 (41.53)** | **55 (54.79)** | **60 (60.31)** |

### Comparable Models by Configuration

**aquif-3.5-Plus (Non-Reasoning) — AAII 42**

| Model | AAII Score |
|-------|------------|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
| Qwen3 Coder 480B A35B | 42 |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |

**aquif-3.5-Plus (Reasoning) — AAII 55**

| Model | AAII Score |
|-------|------------|
| GLM-4.6 | 56 |
| Claude Haiku 4.5 | 55 |
| **aquif-3.5-Plus (Reasoning)** | **55** |
| Gemini 2.5 Flash 2509 | 54 |
| Qwen3 Next 80B A3B | 54 |

**aquif-3.5-Max — AAII 60**

| Model | AAII Score |
|-------|------------|
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| **aquif-3.5-Max** | **60** |
| Claude Opus 4.1 | 59 |
| DeepSeek-V3.1-Terminus | 58 |

## Key Features

**Massive Context Windows**: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.

**Efficient Architecture**: Despite offering frontier-level performance, both models remain exceptionally efficient through an optimized mixture-of-experts design with an active parameter count of just 3.3B.
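The composite scores in the table above are consistent with a plain unweighted mean of the ten benchmark results; treating AAII as a simple average is an assumption here (Artificial Analysis defines the official methodology), but it reproduces the reported values exactly. A quick check in Python:

```python
# Scores from the Core Performance Metrics table, in row order:
# MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench, Humanity's Last Exam,
# TAU2-Telecom, IFBench, TerminalBench-Hard, AA-LCR, SciCode
scores = {
    "aquif-3.5-Plus (Non-Reasoning)": [80.2, 72.1, 64.7, 50.5, 4.3, 34.2, 39.3, 10.1, 30.4, 29.5],
    "aquif-3.5-Plus (Reasoning)":     [82.8, 79.7, 90.3, 76.4, 12.1, 41.5, 54.3, 15.2, 59.9, 35.7],
    "aquif-3.5-Max":                  [85.4, 83.2, 94.6, 81.6, 15.6, 51.3, 65.4, 23.9, 61.2, 40.9],
}

def composite(values: list[float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(values) / len(values), 2)

for model, vals in scores.items():
    print(model, composite(vals))
# Reproduces 41.53, 54.79, and 60.31 respectively.
```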
**Flexible Reasoning (Plus Only)**: aquif-3.5-Plus provides interchangeable thinking modes: enable reasoning for complex problems, or disable it for faster inference on straightforward tasks.

**Multilingual Support**: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

## Usage Recommendations

**aquif-3.5-Plus:**
- Complex reasoning requiring flexibility between speed and depth
- Scientific analysis and mathematical problem-solving with thinking enabled
- Rapid-response applications with thinking disabled
- Code generation and review
- Multilingual applications with up to 1M-token contexts

**aquif-3.5-Max:**
- Frontier-level problem-solving without compromise
- Advanced research and scientific computing
- Competition mathematics and algorithmic challenges
- Comprehensive code analysis and generation
- Complex multilingual tasks requiring maximum reasoning capability

## Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by setting a variable in the chat template:

```
set thinking = true   # Enable thinking mode
set thinking = false  # Disable thinking mode (faster inference)
```

Simply set the variable in your chat template before inference to switch modes. No model reloading is required.

## Technical Specifications

Both models support:
- BF16 and FP16 precision
- Mixture-of-experts architecture optimizations
- Efficient attention mechanisms with optimized KV caching
- Up to 1M-token context window
- Multi-head attention with sparse routing

## Performance Highlights

**aquif-3.5-Plus** achieves 82.3% average benchmark performance in thinking mode, surpassing models with 2-4x more total parameters. Non-thinking mode maintains a competitive 66.9% average for latency-sensitive applications.

**aquif-3.5-Max** reaches 86.2% average performance, matching or exceeding frontier models with only 42.4B total parameters, an extraordinary efficiency result.
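To illustrate the thinking-mode toggle described above, here is a minimal, hypothetical sketch of how a `thinking` flag in a chat template can switch behavior. The tag names and the `<think>` marker are assumptions for illustration, not the model's actual template; in practice you would set the variable through your inference framework's chat-template options.

```python
def build_prompt(user_message: str, thinking: bool) -> str:
    """Hypothetical chat-template rendering. With `thinking` enabled, the
    assistant turn opens a <think> block so the model emits a reasoning
    trace first; with it disabled, the template pre-closes the block so
    the model answers directly (faster inference)."""
    prompt = f"<|user|>{user_message}<|assistant|>"
    if thinking:
        prompt += "<think>"          # model generates its reasoning first
    else:
        prompt += "<think></think>"  # empty block: skip straight to the answer
    return prompt

print(build_prompt("Prove there are infinitely many primes.", thinking=True))
print(build_prompt("What is the capital of France?", thinking=False))
```

Because the toggle only changes how the prompt is rendered, switching modes never requires reloading the weights, which is exactly why the card notes "No model reloading is required."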
## Acknowledgements

- **Qwen Team**: Base architecture contributions
- **Meta Llama Team**: Core model foundations
- **Hugging Face**: Model hosting and training infrastructure

## License

This project is released under the Apache 2.0 License. See the LICENSE file for details.

---

*Made in 🇧🇷*

© 2025 aquif AI. All rights reserved.