---
license: apache-2.0
tags:
- vision-language
- multimodal
- function-calling
- visual-agents
- qwen3-vl
- zen
language:
- en
- multilingual
base_model:
- Qwen/Qwen3-VL-4B-Instruct
library_name: transformers
pipeline_tag: image-text-to-text
---

# Zen VL 4B Agent

Zen VL 4B Agent is a vision-language model with function calling and tool use capabilities, fine-tuned from Qwen/Qwen3-VL-4B-Instruct.

## Model Details

- **Architecture**: Qwen3-VL
- **Parameters**: 4B
- **Context Window**: 256K tokens (expandable to 1M)
- **License**: Apache 2.0
- **Training**: Fine-tuned for the Zen identity and function calling

## Capabilities

- 🎨 **Visual Understanding**: Image analysis, video comprehension, spatial reasoning
- 📝 **OCR**: Text extraction in 32 languages
- 🧠 **Multimodal Reasoning**: STEM, math, code generation
- 🛠️ **Function Calling**: Tool use with visual context
- 🤖 **Visual Agents**: GUI interaction, parameter extraction

## Usage

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image

# Load the model and its processor
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "zenlm/zen-vl-4b-agent",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("zenlm/zen-vl-4b-agent")

# Load the image and build the prompt
image = Image.open("example.jpg")
prompt = "What's in this image?"

# The user turn needs an image placeholder so the chat template
# emits the image tokens that the processor pairs with `image`
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (skip the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=256)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)
```

## Links

- 🌐 **Website**: [zenlm.org](https://zenlm.org)
- 📚 **GitHub**: [zenlm/zen-vl](https://github.com/zenlm/zen-vl)
- 📄 **Paper**: Coming soon
- 🤗 **Model Family**: [zenlm](https://huggingface.co/zenlm)

## Citation

```bibtex
@misc{zenvl2025,
  title={Zen VL: Vision-Language Models with Integrated Function Calling},
  author={Hanzo AI Team},
  year={2025},
  publisher={Zen Language Models},
  url={https://github.com/zenlm/zen-vl}
}
```

## License

Apache 2.0

---

Created by [Hanzo AI](https://hanzo.ai) for the Zen model family.
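
## Example: Parsing Tool Calls

The Function Calling capability listed under Capabilities implies that generated text may contain structured tool calls, but this card does not document the output format. The sketch below assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks containing JSON with `name` and `arguments` keys, a convention used by Qwen chat templates; verify this against the model's actual chat template before relying on it. The helper name `parse_tool_calls` is hypothetical and not part of any library.

```python
import json
import re

# Assumption: tool calls arrive as <tool_call>{JSON}</tool_call> blocks.
# This format is not confirmed by the model card.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(response: str) -> list[dict]:
    """Extract well-formed tool calls from generated text (hypothetical helper)."""
    calls = []
    for match in TOOL_CALL_RE.finditer(response):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of failing the whole parse
            continue
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {
                    "name": payload["name"],
                    "arguments": payload.get("arguments", {}),
                }
            )
    return calls
```

Passing the decoded `response` from the Usage section to `parse_tool_calls` would return a list of dictionaries, one per call, which an application could then dispatch to its own tool implementations. Malformed blocks are skipped silently; a production agent would likely log them instead.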