Commit e9d2aa9 by zengw · verified · 1 Parent(s): 4623d0b

Update README.md

Files changed (1)
  1. README.md +3 -52
README.md CHANGED
@@ -5,7 +5,9 @@ library_name: transformers
5
  ---
6
 
7
  ### UI-Venus
8
- This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833). UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks including Screenspot-v2/Pro and AndroidWorld.
 
 
9
 
10
 
11
 
@@ -194,57 +196,6 @@ Scores are in percentage (%). `T` = Text, `I` = Icon.
194
  > 🔍 **Experimental results show that UI-Venus-Ground-72B achieves state-of-the-art performance on ScreenSpot-Pro with an average score of 61.7, while also setting new benchmarks on ScreenSpot-v2(95.3), OSWorld_G(69.8), AgentCPM(84.7), and UI-Vision(38.0), highlighting its effectiveness in complex visual grounding and action prediction tasks.**
195
 
196
 
197
- ### Results on AndroidWorld
198
- This is the compressed package of validation trajectories for **AndroidWorld**, including execution logs and navigation paths.
199
- 📥 Download: [UI-Venus-androidworld.zip](vis_androidworld/UI-Venus-androidworld.zip)
200
-
201
- | Models | With Planner | A11y Tree | Screenshot | Success Rate (pass@1) |
202
- |--------|--------------|-----------|------------|------------------------|
203
- | **Closed-source Models** | | | | |
204
- | GPT-4o| ❌ | ✅ | ❌ | 30.6 |
205
- | ScaleTrack| ❌ | ✅ | ❌ | 44.0 |
206
- | SeedVL-1.5 | ❌ | ✅ | ✅ | 62.1 |
207
- | UI-TARS-1.5 | ❌ | ❌ | ✅ | 64.2 |
208
- | **Open-source Models** | | | | |
209
- | GUI-Critic-R1-7B | ❌ | ✅ | ✅ | 27.6 |
210
- | Qwen2.5-VL-72B* | ❌ | ❌ | ✅ | 35.0 |
211
- | UGround | ✅ | ❌ | ✅ | 44.0 |
212
- | Aria-UI | ✅ | ❌ | ✅ | 44.8 |
213
- | UI-TARS-72B | ❌ | ❌ | ✅ | 46.6 |
214
- | GLM-4.5v | ❌ | ❌ | ✅ | 57.0 |
215
- | **Ours** | | | | |
216
- | UI-Venus-Navi-7B | ❌ | ❌ | ✅ | **49.1** |
217
- | UI-Venus-Navi-72B | ❌ | ❌ | ✅ | **65.9** |
218
-
219
- > **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.
220
-
221
-
222
- ### Results on AndroidControl and GUI-Odyssey
223
-
224
- | Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
225
- |--------|-------------------------------|-----------------------------|-------------------------------|-----------------------------|------------------------|----------------------|
226
- | **Closed-source Models** | | | | | | |
227
- | GPT-4o | 74.3 | 19.4 | 66.3 | 20.8 | 34.3 | 3.3 |
228
- | **Open Source Models** | | | | | | |
229
- | Qwen2.5-VL-7B | 94.1 | 85.0 | 75.1 | 62.9 | 59.5 | 46.3 |
230
- | SeeClick | 93.0 | 75.0 | 82.9 | 59.1 | 71.0 | 53.9 |
231
- | OS-Atlas-7B | 93.6 | 85.2 | 85.2 | 71.2 | 84.5 | 62.0 |
232
- | Aguvis-7B| - | 80.5 | - | 61.5 | - | - |
233
- | Aguvis-72B| - | 84.4 | - | 66.4 | - | - |
234
- | OS-Genesis-7B | 90.7 | 74.2 | 66.2 | 44.5 | - | - |
235
- | UI-TARS-7B| 98.0 | 90.8 | 83.7 | 72.5 | 94.6 | 87.0 |
236
- | UI-TARS-72B| **98.1** | 91.3 | 85.2 | 74.7 | **95.4** | **88.6** |
237
- | GUI-R1-7B| 85.2 | 66.5 | 71.6 | 51.7 | 65.5 | 38.8 |
238
- | NaviMaster-7B | 85.6 | 69.9 | 72.9 | 54.0 | - | - |
239
- | UI-AGILE-7B | 87.7 | 77.6 | 80.1 | 60.6 | - | - |
240
- | AgentCPM-GUI | 94.4 | 90.2 | 77.7 | 69.2 | 90.0 | 75.0 |
241
- | **Ours** | | | | | | |
242
- | UI-Venus-Navi-7B | 97.1 | 92.4 | **86.5** | 76.1 | 87.3 | 71.5 |
243
- | UI-Venus-Navi-72B | 96.7 | **92.9** | 85.9 | **77.2** | 87.2 | 72.4 |
244
-
245
- > **Table:** Performance comparison on offline UI navigation datasets including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.
246
-
247
-
248
  # Citation
249
  Please consider citing if you find our work useful:
250
  ```plain
 
5
  ---
6
 
7
  ### UI-Venus
8
+ This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833).
9
+
10
+ UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks including Screenspot-v2/Pro and AndroidWorld.
11
 
12
 
13
 
 
196
  > 🔍 **Experimental results show that UI-Venus-Ground-72B achieves state-of-the-art performance on ScreenSpot-Pro with an average score of 61.7, while also setting new benchmarks on ScreenSpot-v2(95.3), OSWorld_G(69.8), AgentCPM(84.7), and UI-Vision(38.0), highlighting its effectiveness in complex visual grounding and action prediction tasks.**
197
 
198
 
199
  # Citation
200
  Please consider citing if you find our work useful:
201
  ```plain