---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.33.2
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
---
# SmolVLM2 Live Inference Demo
This Hugging Face Spaces demo runs the SmolVLM2 2.2B, 500M, or 256M Instruct GGUF models on CPU using llama-cpp-python (v0.3.9), which builds llama.cpp under the hood, with Gradio v5.33.2 for the UI. It captures frames from your webcam every N milliseconds, runs live inference on each frame, and displays the model's response in real time.
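Under the hood, llama-cpp-python loads a GGUF checkpoint through its `Llama` class. Below is a minimal text-only sketch of that flow; the model path, thread count, and prompts are illustrative, and the actual `app.py` additionally feeds webcam frames to the model:

```python
from llama_cpp import Llama

# Illustrative path -- use whichever .gguf file you placed in models/.
llm = Llama(
    model_path="models/smolvlm2-500M-instruct.gguf",
    n_ctx=4096,     # context window size
    n_threads=4,    # CPU threads; tune to your machine
    verbose=False,
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise visual assistant."},
        {"role": "user", "content": "Describe what the camera sees."},
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```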
## Setup
**Clone this repository:**

```bash
git clone <your-space-repo-url>
cd <your-space-repo-name>
```
**Install dependencies:**

```bash
pip install -r requirements.txt
```
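For reference, a minimal `requirements.txt` consistent with the versions stated above would contain the following; the Space's actual file may pin additional packages:

```text
llama-cpp-python==0.3.9
gradio==5.33.2
```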
**Add your GGUF models:** create a `models/` directory in the root of the repo and upload your `.gguf` files:

```bash
mkdir models
# then upload:
# - smolvlm2-2.2B-instruct.gguf
# - smolvlm2-500M-instruct.gguf
# - smolvlm2-256M-instruct.gguf
```
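The app can then discover whatever you uploaded at startup. A small sketch of that lookup follows; the `list_models` helper is hypothetical and not necessarily what `app.py` uses:

```python
from pathlib import Path

def list_models(models_dir: str = "models") -> list[str]:
    """Return the GGUF filenames available for the model dropdown (hypothetical helper)."""
    return sorted(p.name for p in Path(models_dir).glob("*.gguf"))

print(list_models())  # e.g. ['smolvlm2-2.2B-instruct.gguf', ...]
```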
## Usage
- **Select Model**: Choose one of the `.gguf` files you uploaded.
- **System Prompt**: Customize the system-level instructions for the model.
- **User Prompt**: Provide the user query or instruction.
- **Interval (ms)**: Set how often (in milliseconds) to capture a frame and run inference.
- **Live Camera Feed**: The demo will start your webcam and capture frames at the specified interval.
- **Model Output**: See the model's response below the camera feed.
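A minimal sketch of how such a live loop can be wired up in Gradio 5 is shown below; the component layout, handler body, and `stream_every` value are illustrative and may differ from the actual `app.py`:

```python
import gradio as gr

def infer(frame, system_prompt, user_prompt):
    # Placeholder: run llama.cpp inference on `frame` here and return the text.
    if frame is None:
        return ""
    h, w = frame.shape[:2]
    return f"Captured a {w}x{h} frame."

with gr.Blocks() as demo:
    system_box = gr.Textbox(label="System Prompt", value="You are a helpful assistant.")
    user_box = gr.Textbox(label="User Prompt", value="Describe what you see.")
    cam = gr.Image(sources=["webcam"], streaming=True, label="Live Camera Feed")
    out = gr.Textbox(label="Model Output")
    # stream_every is in seconds, so a 500 ms interval is 0.5.
    cam.stream(infer, inputs=[cam, system_box, user_box], outputs=out, stream_every=0.5)

demo.launch()
```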
## Notes
- This demo runs entirely on CPU. Inference speed depends on the model size and your machine's CPU performance.
- Make sure your browser has permission to access your webcam.