This is the Qwen/Qwen2.5-VL-3B-Instruct model, converted to OpenVINO with INT4 weights for the language model and INT8 weights for the other models. The INT4 weights are compressed with symmetric, channel-wise quantization using AWQ and scale estimation. The model works on CPU, GPU, and NPU. See below for the model export command and properties.

**This model is subject to the Qwen Research License.**
To download the model, run `pip install huggingface-hub[cli]` and then:

```shell
huggingface-cli download llmware/Qwen2.5-VL-3B-Instruct-ov-int4-npu --local-dir Qwen2.5-VL-3B-Instruct-ov-int4-npu
```
Use OpenVINO GenAI to run inference on this model. The model works with OpenVINO GenAI 2025.3 and later. For NPU inference, make sure to use the latest NPU driver (Windows, Linux).
```shell
pip install --upgrade openvino-genai pillow
curl -O "https://storage.openvinotoolkit.org/test_data/images/dog.jpg"
```

```python
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image

# Choose "GPU" instead of "NPU" to run the model on an Intel integrated or
# discrete GPU, or "CPU" to run on CPU.
# CACHE_DIR caches the compiled model the first time, so subsequent model loading is faster.
pipeline_config = {"CACHE_DIR": "model_cache"}
pipe = openvino_genai.VLMPipeline("Qwen2.5-VL-3B-Instruct-ov-int4-npu", "NPU", **pipeline_config)

image = Image.open("dog.jpg")
# Optional: resizing to a smaller size (depending on image and prompt) often speeds up inference.
image = image.resize((128, 128))
image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
image_data = ov.Tensor(image_data)

prompt = "Can you describe the image?"
result = pipe.generate(prompt, image=image_data, max_new_tokens=100)
print(result.texts[0])
```
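The image-to-tensor conversion above can be factored into a small reusable helper. This is an illustrative sketch (the helper name `image_to_tensor_data` is ours, not part of OpenVINO GenAI): it produces the batched NHWC `uint8` array that is wrapped in an `ov.Tensor` before being passed to the pipeline.

```python
import numpy as np

def image_to_tensor_data(image):
    """Convert an RGB image (any object with PIL-style .size and .getdata())
    to a batched NHWC uint8 array: shape (1, height, width, 3)."""
    width, height = image.size  # PIL reports size as (width, height)
    return np.array(image.getdata(), dtype=np.uint8).reshape(1, height, width, 3)
```

Wrap the result in `ov.Tensor(...)` before calling `pipe.generate`, exactly as in the script above.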
Model export command:

```shell
optimum-cli export openvino -m Qwen/Qwen2.5-VL-3B-Instruct --weight-format int4 --group-size -1 --sym --awq --scale-estimation --dataset contextual Qwen2.5-VL-3B-Instruct-ov-int4-npu
```
| Property | Value |
|----------|-------|
| openvino_version | 2025.3.0-19807-44526285f24-releases/2025/3 |
| nncf_version | 2.17.0 |
| optimum_intel_version | 1.26.0.dev0+0e2ccef |
| optimum_version | 1.27.0 |
| pytorch_version | 2.7.1 |
| transformers_version | 4.51.3 |
| all_layers | False |
| awq | True |
| backup_mode | int8_asym |
| compression_format | dequantize |
| gptq | False |
| group_size | -1 |
| ignored_scope | [] |
| lora_correction | False |
| mode | int4_sym |
| ratio | 1.0 |
| scale_estimation | True |
| sensitivity_metric | max_activation_variance |
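As a rough illustration of what symmetric, channel-wise INT4 compression (`mode: int4_sym`, `group_size: -1`) means, here is a minimal NumPy sketch. This is not NNCF's actual implementation (which also applies AWQ and scale estimation); it only shows the basic idea: with `group_size = -1`, each output channel shares a single scale, and weights are rounded into the signed 4-bit range [-8, 7].

```python
import numpy as np

def quantize_int4_sym_per_channel(w):
    """Symmetric, channel-wise INT4 quantization sketch.
    w: float weight matrix of shape (out_channels, in_channels).
    Returns (q, scale): q holds INT4 values stored in int8, one scale per row."""
    max_abs = np.abs(w).max(axis=1, keepdims=True)   # one scale per output channel
    scale = np.where(max_abs == 0, 1.0, max_abs / 7.0)  # map channel max to INT4 max
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT4 values and per-channel scales."""
    return q.astype(np.float32) * scale
```

The quantization error per weight is bounded by half a quantization step, which is why one scale per channel (rather than per tensor) noticeably improves accuracy for skewed weight distributions.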
Base model: Qwen/Qwen2.5-VL-3B-Instruct