Unable to calculate "eval_loss" when passing a validation set to the Trainer class
I suspect the model doesn't return the loss in its forward pass during evaluation, which would explain why "eval_loss" cannot be calculated on a validation set during fine-tuning. I used the official fine-tuning notebook to fine-tune the model.
Here is the implementation I tried with a toy dataset:
from transformers import Trainer, TrainingArguments

# model_name, model, collate_fn, train_set and val_set are defined earlier in the notebook
training_args = TrainingArguments(
    num_train_epochs=4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    warmup_steps=50,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=2,
    save_strategy="steps",
    save_steps=2,
    save_total_limit=1,
    optim="paged_adamw_8bit",  # for 8-bit, keep this, else adamw_hf
    bf16=True,  # underlying precision for 8bit
    output_dir=f"./{model_name}-vqav2",
    report_to="tensorboard",
    remove_unused_columns=False,
    eval_strategy="steps",  # Enables evaluation
    eval_steps=2,  # Frequency of evaluation
    load_best_model_at_end=True,  # Load the best model at the end
    metric_for_best_model="eval_loss",  # Monitor this metric
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=train_set,
    eval_dataset=val_set,
)

trainer.train()
When training reaches the first evaluation step and tries to calculate the loss, I get the error below:
KeyError: "The metric_for_best_model training argument is set to 'eval_loss', which is not found in the evaluation metrics. The available evaluation metrics are: []. Please ensure that the compute_metrics function returns a dictionary that includes 'eval_loss' or consider changing the metric_for_best_model via the TrainingArguments."
Please help me fix this. Or does it have something to do with the model itself?
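One possible explanation, offered as a guess rather than a confirmed diagnosis: the empty metrics list suggests the Trainer never saw any labels during evaluation. As far as I understand, the Trainer infers label names from the parameter names of the model class's forward method, and a wrapper whose forward only accepts `*args, **kwargs` (as an adapter wrapper might, if the notebook wraps the model that way) would hide the `labels` parameter. The sketch below imitates that mechanism in plain Python. `detect_label_names`, `has_labels`, `BaseModel` and `Wrapper` are hypothetical stand-ins written for illustration, not the library's own code.

```python
import inspect


def detect_label_names(model_class):
    """Simplified imitation of signature-based label detection (hypothetical helper)."""
    params = inspect.signature(model_class.forward).parameters
    return [name for name in params if "label" in name]


def has_labels(label_names, batch):
    """A loss is only expected when every detected label key is present in the batch."""
    return len(label_names) > 0 and all(batch.get(k) is not None for k in label_names)


class BaseModel:
    # Stand-in for a model whose forward explicitly accepts `labels`.
    def forward(self, input_ids=None, pixel_values=None, labels=None):
        return {"loss": 0.0} if labels is not None else {}


class Wrapper:
    # Stand-in for a wrapper that forwards everything through *args/**kwargs.
    def __init__(self, base):
        self.base = base

    def forward(self, *args, **kwargs):
        return self.base.forward(*args, **kwargs)


batch = {"input_ids": [1, 2], "labels": [1, 2]}

base_names = detect_label_names(BaseModel)    # `labels` is visible in the signature
wrapped_names = detect_label_names(Wrapper)   # only self, args, kwargs are visible

print(base_names, has_labels(base_names, batch))
print(wrapped_names, has_labels(wrapped_names, batch))
```

If this is what is happening here, one thing to try is passing `label_names=["labels"]` to `TrainingArguments` so the label key is stated explicitly instead of inferred. This assumes `collate_fn` puts the targets under the key `labels`.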