Update README.md
README.md CHANGED
@@ -53,39 +53,60 @@ The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models
A breakthrough hybrid reasoning model offering unprecedented flexibility. Toggle between thinking and non-thinking modes to optimize for your specific use case—maintain reasoning capabilities when needed, or prioritize speed for time-sensitive applications.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+## Artificial Analysis Intelligence Index (AAII) Benchmarks
+
+### Core Performance Metrics
+
+| Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
+|-----------|--------------------------------|----------------------------|----------------|
+| MMLU-Pro | 80.2 | 82.8 | 85.4 |
+| GPQA Diamond | 72.1 | 79.7 | 83.2 |
+| AIME 2025 | 64.7 | 90.3 | 94.6 |
+| LiveCodeBench | 50.5 | 76.4 | 81.6 |
+| Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
+| TAU2-Telecom | 34.2 | 41.5 | 51.3 |
+| IFBench | 39.3 | 54.3 | 65.4 |
+| TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
+| AA-LCR | 30.4 | 59.9 | 61.2 |
+| SciCode | 29.5 | 35.7 | 40.9 |
+| **AAII Composite Score** | **42 (41.53)** | **55 (54.79)** | **60 (60.31)** |
+
+### Comparable Models by Configuration
+
+**aquif-3.5-Plus (Non-Reasoning): AAII 42**
+
+| Model | AAII Score |
+|-------|-----------|
+| GPT-5 mini | 42 |
+| Claude Haiku 4.5 | 42 |
+| Gemini 2.5 Flash Lite 2509 | 42 |
+| **aquif-3.5-Plus (Non-Reasoning)** | **42** |
+| DeepSeek V3 0324 | 41 |
+| Qwen3 VL 32B Instruct | 41 |
+| Qwen3 Coder 480B A35B | 42 |
+
+**aquif-3.5-Plus (Reasoning): AAII 55**
+
+| Model | AAII Score |
+|-------|-----------|
+| GLM-4.6 | 56 |
+| Gemini 2.5 Flash 2509 | 54 |
+| Claude Haiku 4.5 | 55 |
+| **aquif-3.5-Plus (Reasoning)** | **55** |
+| Qwen3 Next 80B A3B | 54 |
+
+**aquif-3.5-Max: AAII 60**
+
+| Model | AAII Score |
+|-------|-----------|
+| Gemini 2.5 Pro | 60 |
+| Grok 4 Fast | 60 |
+| **aquif-3.5-Max** | **60** |
+| MiniMax-M2 | 61 |
+| gpt-oss-120B high | 61 |
+| GPT-5 mini | 61 |
+| DeepSeek-V3.1-Terminus | 58 |
+| Claude Opus 4.1 | 59 |

## Key Features
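
The composite values given in parentheses in the Core Performance Metrics table match the unweighted mean of the ten benchmark rows for each configuration, and the headline AAII score is that mean rounded to the nearest integer. The sketch below checks this arithmetic in Python. The `SCORES` dictionary and the `composite` helper are illustrative names, not part of the README, and the equal weighting is inferred from the numbers rather than stated in the diff.

```python
from statistics import mean

# Scores copied from the Core Performance Metrics table, in row order:
# MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench, Humanity's Last Exam,
# TAU2-Telecom, IFBench, TerminalBench-Hard, AA-LCR, SciCode.
# SCORES and composite() are hypothetical names used only for this check.
SCORES = {
    "aquif-3.5-Plus (Non-Reasoning)": [80.2, 72.1, 64.7, 50.5, 4.3, 34.2, 39.3, 10.1, 30.4, 29.5],
    "aquif-3.5-Plus (Reasoning)": [82.8, 79.7, 90.3, 76.4, 12.1, 41.5, 54.3, 15.2, 59.9, 35.7],
    "aquif-3.5-Max": [85.4, 83.2, 94.6, 81.6, 15.6, 51.3, 65.4, 23.9, 61.2, 40.9],
}


def composite(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(mean(scores), 2)


for name, scores in SCORES.items():
    value = composite(scores)
    # Headline score is the composite rounded to the nearest integer.
    print(f"{name}: {round(value)} ({value:.2f})")
```

Run as written, this reproduces the table's composite row: 42 (41.53), 55 (54.79), and 60 (60.31).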