---
pipeline_tag: image-text-to-text
tags:
- MLX
- mlx
base_model:
- Qwen/Qwen3-VL-4B-Thinking
---

# Qwen3-VL-4B-Thinking
Run **Qwen3-VL-4B-Thinking** optimized for **Apple Silicon** on MLX with [NexaSDK](https://github.com/NexaAI/nexa-sdk).

## Quickstart

1. **Install [NexaSDK](https://github.com/NexaAI/nexa-sdk)**
2. Run the model locally with one line of code:

```bash
nexa infer NexaAI/qwen3vl-4B-Thinking-fp16-mlx
```

## Model Description
**Qwen3-VL-4B-Thinking** is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the **Qwen3-VL** (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.

Compared to the *Instruct* variant, the **Thinking** model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features
- **Vision-Language Understanding**: Processes images, text, and videos for joint reasoning tasks.
- **Structured Thinking Mode**: Generates intermediate reasoning traces for better transparency and interpretability.
- **High Accuracy on Visual QA**: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- **Multilingual Support**: Understands and responds in multiple languages.
- **Optimized for Efficiency**: Delivers strong performance at 4B scale for on-device or edge deployment.
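
The reasoning traces mentioned under **Structured Thinking Mode** usually need to be separated from the final answer before the output is passed to other tools. The sketch below shows one way to do that in plain Bash. It assumes the raw output wraps or terminates the trace with Qwen-style `<think>` / `</think>` tags, which this card does not confirm, so adjust the tag variables if your output looks different. The helper names `trim`, `reasoning_of`, and `answer_of` are illustrative and not part of NexaSDK.

```bash
#!/usr/bin/env bash
# Assumption: the reasoning trace ends with a literal "</think>" tag and may
# optionally start with "<think>". Change these if your output differs.
THINK_START='<think>'
THINK_END='</think>'

# Strip leading and trailing whitespace from a string.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
  printf '%s' "$s"
}

# Print the reasoning trace, or nothing if no closing tag is present.
reasoning_of() {
  local raw="$1"
  [[ "$raw" == *"$THINK_END"* ]] || return 0
  local head="${raw%%"$THINK_END"*}"   # text before the first closing tag
  head="${head#*"$THINK_START"}"       # drop the opening tag if it exists
  trim "$head"
}

# Print the final answer: the text after the first closing tag,
# or the whole input when no tag is present.
answer_of() {
  local raw="$1"
  if [[ "$raw" == *"$THINK_END"* ]]; then
    raw="${raw#*"$THINK_END"}"
  fi
  trim "$raw"
}
```

For example, if the raw model output has been saved to a hypothetical file `response.txt`, then `answer_of "$(cat response.txt)"` prints only the final answer and `reasoning_of "$(cat response.txt)"` prints only the trace.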

## Use Cases
- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning

## Inputs and Outputs

**Input:**
- Text, images, or combined multimodal prompts (e.g., image + question)

**Output:**
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences

## License
Check the [official Qwen license](https://huggingface.co/Qwen) for terms of use and redistribution.