microsoft/GUI-Actor-3B-Qwen2.5-VL

Image-Text-to-Text

text-generation-inference

qianhuiwu committed 17 days ago

Commit ad7221a · verified · 1 Parent(s): d340fee

add dataset link.

Files changed (1)

README.md +1 -1

@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text
 # GUI-Actor-7B with Qwen2.5-VL-7B as backbone VLM
 This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://huggingface.co/papers/2506.03143).
-It is developed based on [Qwen2.5-VL-3B-Instruct ](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here (coming soon)]().
+It is developed based on [Qwen2.5-VL-3B-Instruct ](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here](https://huggingface.co/datasets/cckevinn/GUI-Actor-Data).
 For more details on model design and evaluation, please check: [🏠 Project Page](https://microsoft.github.io/GUI-Actor/) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper](https://www.arxiv.org/pdf/2506.03143).