Minstrel54524 committed on
Commit bbb453a · verified · 1 Parent(s): 6ad8bbe

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -22,14 +22,14 @@ base_model:
  * **Version:** 1.0
  * **Model Type:** GUI Grounding / UI Element Localization
  * **Developers:** Jikai Chen, Long Chen, Dong Wang, Zhixuan Chu, Qinglin Su, Leilei Gan, Chenyi Zhuang, Jinjie Gu
- * **Paper:** [V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task](https://arxiv.org/abs/2508.13634)
- * **Repository:** [Github](https://github.com/inclusionAI/AgenticLearning/tree/main/V2P)
+
+ [![Paper](https://img.shields.io/badge/arXiv-2508.13634-b31b1b.svg)](https://arxiv.org/abs/2508.13634) [![Code](https://img.shields.io/badge/GitHub-Repository-blue.svg?logo=github)](https://github.com/inclusionAI/AgenticLearning/tree/main/V2P)
 
  ### Model Description
 
  **V2P (Valley-to-Peak)** is an advanced model designed for robust and precise Graphical User Interface (GUI) element localization (grounding). In the field of GUI automation agents, accurately identifying interactive elements on a screen is critical. Traditional methods like bounding box regression or center-point prediction often overlook the spatial uncertainty of interaction and the hierarchical visual-semantic relationships, leading to insufficient localization accuracy.
 
- The V2P model was developed to address two major pain points in existing methods:
+ The V2P model was developed to address two major pain points in existing visual methods:
  1. **Attention Drift due to Background Interference:** The model's attention mistakenly disperses to irrelevant background areas.
  2. **Imprecise Click Locations:** The model fails to distinguish between the center and the edges of a target element, leading to interaction failures.
 
@@ -103,4 +103,4 @@ output_ids = generated_ids[0][input_token_len:]
  output_text = processor.decode(output_ids, skip_special_tokens=True)
 
  print(output_text)
- # For more visualization code, please refer to the code in the V2P GitHub repository.
+ # For more visualization code, please refer to the code in the V2P GitHub repository...