Update README.md
README.md
CHANGED
|
@@ -5,7 +5,9 @@ library_name: transformers
|
|
| 5 |
---
|
| 6 |
|
| 7 |
### UI-Venus
|
| 8 |
-
This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833).
|
| 9 |
|
| 10 |
|
| 11 |
|
|
@@ -194,57 +196,6 @@ Scores are in percentage (%). `T` = Text, `I` = Icon.
|
|
| 194 |
> **Experimental results show that UI-Venus-Ground-72B achieves state-of-the-art performance on ScreenSpot-Pro with an average score of 61.7, while also setting new benchmarks on ScreenSpot-v2 (95.3), OSWorld_G (69.8), AgentCPM (84.7), and UI-Vision (38.0), highlighting its effectiveness in complex visual grounding and action prediction tasks.**
|
| 195 |
|
| 196 |
|
| 197 |
-
### Results on AndroidWorld
|
| 198 |
-
This is the compressed package of validation trajectories for **AndroidWorld**, including execution logs and navigation paths.
|
| 199 |
-
📥 Download: [UI-Venus-androidworld.zip](vis_androidworld/UI-Venus-androidworld.zip)
|
| 200 |
-
|
| 201 |
-
| Models | With Planner | A11y Tree | Screenshot | Success Rate (pass@1) |
|--------|--------------|-----------|------------|------------------------|
| **Closed-source Models** | | | | |
| GPT-4o | ❌ | ✅ | ❌ | 30.6 |
| ScaleTrack | ❌ | ✅ | ❌ | 44.0 |
| SeedVL-1.5 | ❌ | ✅ | ✅ | 62.1 |
| UI-TARS-1.5 | ❌ | ❌ | ✅ | 64.2 |
| **Open-source Models** | | | | |
| GUI-Critic-R1-7B | ❌ | ✅ | ✅ | 27.6 |
| Qwen2.5-VL-72B* | ❌ | ❌ | ✅ | 35.0 |
| UGround | ✅ | ❌ | ✅ | 44.0 |
| Aria-UI | ✅ | ❌ | ✅ | 44.8 |
| UI-TARS-72B | ❌ | ❌ | ✅ | 46.6 |
| GLM-4.5v | ❌ | ❌ | ✅ | 57.0 |
| **Ours** | | | | |
| UI-Venus-Navi-7B | ❌ | ❌ | ✅ | **49.1** |
| UI-Venus-Navi-72B | ❌ | ❌ | ✅ | **65.9** |
|
| 218 |
-
|
| 219 |
-
> **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
### Results on AndroidControl and GUI-Odyssey
|
| 223 |
-
|
| 224 |
-
| Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
|--------|-------------------------------|-----------------------------|-------------------------------|-----------------------------|------------------------|----------------------|
| **Closed-source Models** | | | | | | |
| GPT-4o | 74.3 | 19.4 | 66.3 | 20.8 | 34.3 | 3.3 |
| **Open-source Models** | | | | | | |
| Qwen2.5-VL-7B | 94.1 | 85.0 | 75.1 | 62.9 | 59.5 | 46.3 |
| SeeClick | 93.0 | 75.0 | 82.9 | 59.1 | 71.0 | 53.9 |
| OS-Atlas-7B | 93.6 | 85.2 | 85.2 | 71.2 | 84.5 | 62.0 |
| Aguvis-7B | - | 80.5 | - | 61.5 | - | - |
| Aguvis-72B | - | 84.4 | - | 66.4 | - | - |
| OS-Genesis-7B | 90.7 | 74.2 | 66.2 | 44.5 | - | - |
| UI-TARS-7B | 98.0 | 90.8 | 83.7 | 72.5 | 94.6 | 87.0 |
| UI-TARS-72B | **98.1** | 91.3 | 85.2 | 74.7 | **95.4** | **88.6** |
| GUI-R1-7B | 85.2 | 66.5 | 71.6 | 51.7 | 65.5 | 38.8 |
| NaviMaster-7B | 85.6 | 69.9 | 72.9 | 54.0 | - | - |
| UI-AGILE-7B | 87.7 | 77.6 | 80.1 | 60.6 | - | - |
| AgentCPM-GUI | 94.4 | 90.2 | 77.7 | 69.2 | 90.0 | 75.0 |
| **Ours** | | | | | | |
| UI-Venus-Navi-7B | 97.1 | 92.4 | **86.5** | 76.1 | 87.3 | 71.5 |
| UI-Venus-Navi-72B | 96.7 | **92.9** | 85.9 | **77.2** | 87.2 | 72.4 |
|
| 244 |
-
|
| 245 |
-
> **Table:** Performance comparison on offline UI navigation datasets including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.
|
| 246 |
-
|
| 247 |
-
|
| 248 |
# Citation
|
| 249 |
Please consider citing if you find our work useful:
|
| 250 |
```plain
|
|
|
|
| 5 |
---
|
| 6 |
|
| 7 |
### UI-Venus
|
| 8 |
+
This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833).
|
| 9 |
+
|
| 10 |
+
UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks, including ScreenSpot-v2/Pro and AndroidWorld.
|
| 11 |
|
| 12 |
|
| 13 |
|
|
|
|
| 196 |
> **Experimental results show that UI-Venus-Ground-72B achieves state-of-the-art performance on ScreenSpot-Pro with an average score of 61.7, while also setting new benchmarks on ScreenSpot-v2 (95.3), OSWorld_G (69.8), AgentCPM (84.7), and UI-Vision (38.0), highlighting its effectiveness in complex visual grounding and action prediction tasks.**
|
| 197 |
|
| 198 |
|
| 199 |
# Citation
|
| 200 |
Please consider citing if you find our work useful:
|
| 201 |
```plain
|