aquiffoo committed on
Commit 79b2887 · verified · 1 Parent(s): addd874

Update README.md

Files changed (1): README.md (+54, −33)

README.md CHANGED
@@ -53,39 +53,60 @@ The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models
 
  A breakthrough hybrid reasoning model offering unprecedented flexibility. Toggle between thinking and non-thinking modes to optimize for your specific use case—maintain reasoning capabilities when needed, or prioritize speed for time-sensitive applications.
 
- **Non-Thinking Mode Performance:**
-
- | Metric | aquif-3.5-Plus (30B A3B) | Ling-flash-2.0 (103B A6B) | Qwen3-Instruct-2507 (30B A3B) | Kimi-Linear (49B A3B) | gpt-oss-low (120B A5B) |
- |--------|--------------------------|---------------------------|-------------------------------|----------------------|----------------------|
- | MMLU-Pro | 80.2 | 77.1 | 78.4 | 72.7 | 74.1 |
- | GPQA Diamond | 72.1 | 68.1 | 70.4 | 71.7 | 67.1 |
- | AIME 2025 | 64.7 | 56.6 | 61.3 | 58.6 | 50.4 |
- | LiveCodeBench | 50.5 | 51.4 | 43.2 | 45.7 | 42.7 |
- | **Average** | **66.9** | **66.3** | **66.3** | **62.2** | **58.6** |
-
- **Thinking Mode Performance:**
-
- | Metric | aquif-3.5-Plus (30B A3B) | Qwen3-Next-Thinking (80B A3B) | Ling-flash-2.0 (103B A6B) | gpt-oss (120B A5B) | aquif-3.5 (12B A4B) |
- |--------|--------------------------|-------------------------------|---------------------------|-------------------|-------------------|
- | MMLU-Pro | 82.8 | 82.7 | 78.3 | 81.4 | 78.5 |
- | GPQA Diamond | 79.7 | 77.2 | 75.3 | 73.1 | 70.8 |
- | AIME 2025 | 90.3 | 87.8 | 87.0 | 80.0 | 84.4 |
- | LiveCodeBench | 76.4 | 68.7 | 70.8 | 66.5 | 66.1 |
- | **Average** | **82.3** | **79.1** | **77.9** | **75.3** | **75.0** |
-
- ### aquif-3.5-Max (Frontier Reasoning Model)
-
- A reasoning-only frontier model delivering exceptional performance across all categories. With optimized architecture specifically designed for complex problem-solving, aquif-3.5-Max achieves benchmark results competitive with or exceeding significantly larger models.
-
- **Performance Comparison:**
-
- | Metric | aquif-3.5-Max (42B A3B) | Claude Sonnet 4.5 | DeepSeek-V3.2 (685B A37B) | Gemini 2.5 Pro | GPT-5 (high) |
- |--------|-------------------------|-------------------|---------------------------|----------------|--------------|
- | MMLU-Pro | 85.4 | 82.7 | 85.0 | 83.7 | 87.4 |
- | GPQA Diamond | 83.2 | 83.4 | 79.9 | 86.4 | 85.7 |
- | AIME 2025 | 94.6 | 87.0 | 89.3 | 88.0 | 94.6 |
- | LiveCodeBench | 81.6 | 71.3 | 74.1 | 70.7 | 80.6 |
- | **Average** | **86.2** | **81.1** | **82.1** | **82.2** | **87.1** |
+ ## Artificial Analysis Intelligence Index (AAII) Benchmarks
+
+ ### Core Performance Metrics
+
+ | Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
+ |-----------|--------------------------------|----------------------------|----------------|
+ | MMLU-Pro | 80.2 | 82.8 | 85.4 |
+ | GPQA Diamond | 72.1 | 79.7 | 83.2 |
+ | AIME 2025 | 64.7 | 90.3 | 94.6 |
+ | LiveCodeBench | 50.5 | 76.4 | 81.6 |
+ | Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
+ | TAU2-Telecom | 34.2 | 41.5 | 51.3 |
+ | IFBench | 39.3 | 54.3 | 65.4 |
+ | TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
+ | AA-LCR | 30.4 | 59.9 | 61.2 |
+ | SciCode | 29.5 | 35.7 | 40.9 |
+ | **AAII Composite Score** | **42 (41.53)** | **55 (54.79)** | **60 (60.31)** |
+
+ ### Comparable Models by Configuration
+
+ **aquif-3.5-Plus (Non-Reasoning) — AAII 42**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | GPT-5 mini | 42 |
+ | Claude Haiku 4.5 | 42 |
+ | Gemini 2.5 Flash Lite 2509 | 42 |
+ | Qwen3 Coder 480B A35B | 42 |
+ | **aquif-3.5-Plus (Non-Reasoning)** | **42** |
+ | DeepSeek V3 0324 | 41 |
+ | Qwen3 VL 32B Instruct | 41 |
+
+ **aquif-3.5-Plus (Reasoning) — AAII 55**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | GLM-4.6 | 56 |
+ | Claude Haiku 4.5 | 55 |
+ | **aquif-3.5-Plus (Reasoning)** | **55** |
+ | Gemini 2.5 Flash 2509 | 54 |
+ | Qwen3 Next 80B A3B | 54 |
+
+ **aquif-3.5-Max — AAII 60**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | MiniMax-M2 | 61 |
+ | gpt-oss-120B high | 61 |
+ | GPT-5 mini | 61 |
+ | Gemini 2.5 Pro | 60 |
+ | Grok 4 Fast | 60 |
+ | **aquif-3.5-Max** | **60** |
+ | Claude Opus 4.1 | 59 |
+ | DeepSeek-V3.1-Terminus | 58 |
 
  ## Key Features
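As a quick sanity check on the tables in this diff, each **Average** row is the plain arithmetic mean of that model's four benchmark scores (MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench), rounded to one decimal place. A minimal sketch, using only the numbers quoted above:

```python
# Verify the **Average** rows for the three aquif configurations:
# mean of the four benchmark scores, rounded to one decimal place.
scores = {
    "aquif-3.5-Plus (non-thinking)": [80.2, 72.1, 64.7, 50.5],  # table says 66.9
    "aquif-3.5-Plus (thinking)":     [82.8, 79.7, 90.3, 76.4],  # table says 82.3
    "aquif-3.5-Max":                 [85.4, 83.2, 94.6, 81.6],  # table says 86.2
}

averages = {name: round(sum(vals) / len(vals), 1) for name, vals in scores.items()}
print(averages)
```

The same averaging applies to the comparison columns; the AAII composite rows instead report a rounded integer with the exact score in parentheses.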