aquiffoo committed on
Commit 79b2887 · verified · 1 Parent(s): addd874

Update README.md

Files changed (1): README.md (+54, −33)

README.md CHANGED
@@ -53,39 +53,60 @@ The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models
 
  A breakthrough hybrid reasoning model offering unprecedented flexibility. Toggle between thinking and non-thinking modes to optimize for your specific use case—maintain reasoning capabilities when needed, or prioritize speed for time-sensitive applications.
 
- **Non-Thinking Mode Performance:**
-
- | Metric | aquif-3.5-Plus (30B A3B) | Ling-flash-2.0 (103B A6B) | Qwen3-Instruct-2507 (30B A3B) | Kimi-Linear (49B A3B) | gpt-oss-low (120B A5B) |
- |--------|--------------------------|---------------------------|-------------------------------|----------------------|----------------------|
- | MMLU-Pro | 80.2 | 77.1 | 78.4 | 72.7 | 74.1 |
- | GPQA Diamond | 72.1 | 68.1 | 70.4 | 71.7 | 67.1 |
- | AIME 2025 | 64.7 | 56.6 | 61.3 | 58.6 | 50.4 |
- | LiveCodeBench | 50.5 | 51.4 | 43.2 | 45.7 | 42.7 |
- | **Average** | **66.9** | **66.3** | **66.3** | **62.2** | **58.6** |
-
- **Thinking Mode Performance:**
-
- | Metric | aquif-3.5-Plus (30B A3B) | Qwen3-Next-Thinking (80B A3B) | Ling-flash-2.0 (103B A6B) | gpt-oss (120B A5B) | aquif-3.5 (12B A4B) |
- |--------|--------------------------|-------------------------------|---------------------------|-------------------|-------------------|
- | MMLU-Pro | 82.8 | 82.7 | 78.3 | 81.4 | 78.5 |
- | GPQA Diamond | 79.7 | 77.2 | 75.3 | 73.1 | 70.8 |
- | AIME 2025 | 90.3 | 87.8 | 87.0 | 80.0 | 84.4 |
- | LiveCodeBench | 76.4 | 68.7 | 70.8 | 66.5 | 66.1 |
- | **Average** | **82.3** | **79.1** | **77.9** | **75.3** | **75.0** |
-
- ### aquif-3.5-Max (Frontier Reasoning Model)
-
- A reasoning-only frontier model delivering exceptional performance across all categories. With optimized architecture specifically designed for complex problem-solving, aquif-3.5-Max achieves benchmark results competitive with or exceeding significantly larger models.
-
- **Performance Comparison:**
-
- | Metric | aquif-3.5-Max (42B A3B) | Claude Sonnet 4.5 | DeepSeek-V3.2 (685B A37B) | Gemini 2.5 Pro | GPT-5 (high) |
- |--------|-------------------------|-------------------|---------------------------|----------------|--------------|
- | MMLU-Pro | 85.4 | 82.7 | 85.0 | 83.7 | 87.4 |
- | GPQA Diamond | 83.2 | 83.4 | 79.9 | 86.4 | 85.7 |
- | AIME 2025 | 94.6 | 87.0 | 89.3 | 88.0 | 94.6 |
- | LiveCodeBench | 81.6 | 71.3 | 74.1 | 70.7 | 80.6 |
- | **Average** | **86.2** | **81.1** | **82.1** | **82.2** | **87.1** |
+ ## Artificial Analysis Intelligence Index (AAII) Benchmarks
+
+ ### Core Performance Metrics
+
+ | Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
+ |-----------|--------------------------------|----------------------------|----------------|
+ | MMLU-Pro | 80.2 | 82.8 | 85.4 |
+ | GPQA Diamond | 72.1 | 79.7 | 83.2 |
+ | AIME 2025 | 64.7 | 90.3 | 94.6 |
+ | LiveCodeBench | 50.5 | 76.4 | 81.6 |
+ | Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
+ | TAU2-Telecom | 34.2 | 41.5 | 51.3 |
+ | IFBench | 39.3 | 54.3 | 65.4 |
+ | TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
+ | AA-LCR | 30.4 | 59.9 | 61.2 |
+ | SciCode | 29.5 | 35.7 | 40.9 |
+ | **AAII Composite Score** | **42 (41.53)** | **55 (54.79)** | **60 (60.31)** |
+
+ ### Comparable Models by Configuration
+
+ **aquif-3.5-Plus (Non-Reasoning) — AAII 42**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | GPT-5 mini | 42 |
+ | Claude Haiku 4.5 | 42 |
+ | Gemini 2.5 Flash Lite 2509 | 42 |
+ | Qwen3 Coder 480B A35B | 42 |
+ | **aquif-3.5-Plus (Non-Reasoning)** | **42** |
+ | DeepSeek V3 0324 | 41 |
+ | Qwen3 VL 32B Instruct | 41 |
+
+ **aquif-3.5-Plus (Reasoning) — AAII 55**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | GLM-4.6 | 56 |
+ | Claude Haiku 4.5 | 55 |
+ | **aquif-3.5-Plus (Reasoning)** | **55** |
+ | Gemini 2.5 Flash 2509 | 54 |
+ | Qwen3 Next 80B A3B | 54 |
+
+ **aquif-3.5-Max — AAII 60**
+
+ | Model | AAII Score |
+ |-------|-----------|
+ | MiniMax-M2 | 61 |
+ | gpt-oss-120B high | 61 |
+ | GPT-5 mini | 61 |
+ | Gemini 2.5 Pro | 60 |
+ | Grok 4 Fast | 60 |
+ | **aquif-3.5-Max** | **60** |
+ | Claude Opus 4.1 | 59 |
+ | DeepSeek-V3.1-Terminus | 58 |
 
  ## Key Features
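As a quick sanity check on the tables in this diff, each **Average** row is the plain arithmetic mean of that model's four benchmark scores (MMLU-Pro, GPQA Diamond, AIME 2025, LiveCodeBench), rounded to one decimal place. A minimal sketch, using only the numbers quoted above:

```python
# Verify the **Average** rows for the three aquif configurations:
# mean of the four benchmark scores, rounded to one decimal place.
scores = {
    "aquif-3.5-Plus (non-thinking)": [80.2, 72.1, 64.7, 50.5],  # table says 66.9
    "aquif-3.5-Plus (thinking)":     [82.8, 79.7, 90.3, 76.4],  # table says 82.3
    "aquif-3.5-Max":                 [85.4, 83.2, 94.6, 81.6],  # table says 86.2
}

averages = {name: round(sum(vals) / len(vals), 1) for name, vals in scores.items()}
print(averages)
```

The same averaging applies to the comparison columns; the AAII composite rows instead report a rounded integer with the exact score in parentheses.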