zenlm/zen-vl-4b-agent

zeekay committed 15 days ago

Commit 1906c78 (verified) · 1 parent: 97ee200

Initialize zen-vl-4b-agent model card

Files changed (1)

README.md +91 -0

README.md ADDED

	@@ -0,0 +1,91 @@

+---
+license: apache-2.0
+tags:
+- vision-language
+- multimodal
+- function-calling
+- visual-agents
+- qwen3-vl
+- zen
+language:
+- en
+- multilingual
+base_model:
+- Qwen/Qwen3-VL-4B-Instruct
+library_name: transformers
+pipeline_tag: image-text-to-text
+---
+# Zen VL 4B Agent
+Zen VL 4B Agent is a vision-language model with function-calling and tool-use capabilities.
+## Model Details
+- **Architecture**: Qwen3-VL
+- **Parameters**: 4B
+- **Context Window**: 256K tokens (expandable to 1M)
+- **License**: Apache 2.0
+- **Training**: Fine-tuned from Qwen3-VL-4B-Instruct for Zen identity and function calling
+## Capabilities
+- 🎨 **Visual Understanding**: Image analysis, video comprehension, spatial reasoning
+- 📝 **OCR**: Text extraction in 32 languages
+- 🧠 **Multimodal Reasoning**: STEM, math, code generation
+- 🛠️ **Function Calling**: Tool use with visual context
+- 🤖 **Visual Agents**: GUI interaction, parameter extraction
+## Usage
+```python
+from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
+from PIL import Image
+# Load the model and its processor
+model = Qwen3VLForConditionalGeneration.from_pretrained(
+    "zenlm/zen-vl-4b-agent",
+    device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("zenlm/zen-vl-4b-agent")
+# Build a chat message with an image placeholder next to the text,
+# so the chat template emits the image tokens the processor expects
+image = Image.open("example.jpg")
+prompt = "What's in this image?"
+messages = [{
+    "role": "user",
+    "content": [{"type": "image"}, {"type": "text", "text": prompt}],
+}]
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+# Generate, then decode only the newly generated tokens (not the echoed prompt)
+outputs = model.generate(**inputs, max_new_tokens=256)
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+response = processor.decode(new_tokens, skip_special_tokens=True)
+print(response)
+```
+## Links
+- 🌐 **Website**: [zenlm.org](https://zenlm.org)
+- 📚 **GitHub**: [zenlm/zen-vl](https://github.com/zenlm/zen-vl)
+- 📄 **Paper**: Coming soon
+- 🤗 **Model Family**: [zenlm](https://huggingface.co/zenlm)
+## Citation
+```bibtex
+@misc{zenvl2025,
+  title={Zen VL: Vision-Language Models with Integrated Function Calling},
+  author={Hanzo AI Team},
+  year={2025},
+  publisher={Zen Language Models},
+  url={https://github.com/zenlm/zen-vl}
+}
+```
+## License
+Apache 2.0
+---
+Created by [Hanzo AI](https://hanzo.ai) for the Zen model family.
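
The model card lists function calling with visual context but does not show what a tool definition or a tool call looks like. The sketch below is one possible way to handle that step on the client side. It assumes the model emits Hermes-style `<tool_call>` blocks containing JSON, as Qwen chat templates commonly do; the card does not confirm the output format for this checkpoint. The tool `click_element` and the helpers `parse_tool_calls` and `validate_call` are hypothetical names introduced for illustration.

```python
import json
import re

# Hypothetical tool definition (not from the model card), written in the
# JSON-schema "function" style that many chat templates accept.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click_element",
        "description": "Click a GUI element at pixel coordinates.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}

# Assumption: tool calls are wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


def validate_call(call, tool):
    """Check the call targets this tool and supplies every required argument."""
    spec = tool["function"]
    args = call.get("arguments")
    if call.get("name") != spec["name"] or not isinstance(args, dict):
        return False
    required = spec["parameters"].get("required", [])
    return all(key in args for key in required)


# Example with a hand-written response string standing in for model output.
sample = (
    "I will click the button.\n"
    "<tool_call>\n"
    '{"name": "click_element", "arguments": {"x": 120, "y": 48}}\n'
    "</tool_call>"
)
for call in parse_tool_calls(sample):
    print(call["name"], call["arguments"], validate_call(call, CLICK_TOOL))
```

If this checkpoint's chat template supports tools, a definition like `CLICK_TOOL` could be passed through the `tools` argument of the tokenizer's `apply_chat_template`; whether it does is not stated in the card, so check the template before relying on it.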