---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.33.2
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
---

# SmolVLM2 Live Inference Demo

This Hugging Face Spaces demo runs the SmolVLM2 2.2B, 500M, or 256M Instruct GGUF models on CPU using `llama-cpp-python` (v0.3.9), which builds `llama.cpp` under the hood, and Gradio v5.33.2 for the UI. It captures frames from your webcam every N milliseconds and performs live inference, displaying the model's response in real time. (Hedged sketches of the model download, the inference call, and the frame streaming appear at the end of this README.)

## Setup

1. **Clone this repository**

   ```bash
   git clone <repository-url>
   cd <repository-directory>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Add your GGUF models**

   Create a `models/` directory in the root of the repo and upload your `.gguf` files (a scripted download sketch appears at the end of this README):

   ```bash
   mkdir models
   # then upload:
   # - smolvlm2-2.2B-instruct.gguf
   # - smolvlm2-500M-instruct.gguf
   # - smolvlm2-256M-instruct.gguf
   ```

## Usage

- **Select Model**: Choose one of the `.gguf` files you uploaded.
- **System Prompt**: Customize the system-level instructions for the model.
- **User Prompt**: Provide the user query or instruction.
- **Interval (ms)**: Set how often (in milliseconds) to capture a frame and run inference.
- **Live Camera Feed**: The demo starts your webcam and captures frames at the specified interval.
- **Model Output**: The model’s response appears below the camera feed.

## Notes

- This demo runs entirely on CPU. Inference speed depends on the model size and your machine's CPU performance.
- Make sure your browser has permission to access your webcam.
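
## Appendix: implementation sketches

The snippets below are illustrative sketches, not the contents of `app.py`; names marked as placeholders are assumptions, since this README does not pin them down.

### Downloading the models

If your GGUF files live on the Hugging Face Hub rather than being uploaded by hand, Setup step 3 can be scripted with `huggingface_hub.hf_hub_download`. The repo id below is a placeholder; this README does not name the repository that hosts the quantized models.

```python
from huggingface_hub import hf_hub_download

# "<repo-id>" is a placeholder -- substitute the Hub repository
# that actually hosts your quantized SmolVLM2 GGUF files.
for filename in [
    "smolvlm2-2.2B-instruct.gguf",
    "smolvlm2-500M-instruct.gguf",
    "smolvlm2-256M-instruct.gguf",
]:
    hf_hub_download(repo_id="<repo-id>", filename=filename, local_dir="models")
```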
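
### The inference call

This is the core inference call reduced to a single frame, as a minimal sketch under two assumptions not stated in this README: that a multimodal projector file (`mmproj-*.gguf`, filename hypothetical) sits next to the language model, and that `llama-cpp-python`'s LLaVA-style chat handler accepts SmolVLM2's projector. The actual `app.py` may wire this differently.

```python
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler


def to_data_uri(path: str) -> str:
    # create_chat_completion accepts images as base64 data URIs
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


# Both model file names below are assumptions for illustration.
llm = Llama(
    model_path="models/smolvlm2-500M-instruct.gguf",
    chat_handler=Llava15ChatHandler(
        clip_model_path="models/mmproj-smolvlm2-500M.gguf"
    ),
    n_ctx=4096,   # room for the image tokens plus the response
    n_threads=4,  # CPU-only: tune to your core count
    verbose=False,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise visual assistant."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": to_data_uri("frame.png")}},
                {"type": "text", "text": "Describe what you see."},
            ],
        },
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```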
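
### Webcam streaming at an interval

A sketch of how the **Interval (ms)** setting can map onto Gradio 5's streaming API, assuming the demo uses a streaming `gr.Image` with a `.stream()` event. Note that `stream_every` is given in seconds, so an interval in milliseconds divides by 1000. The `describe` stub stands in for the inference call sketched above.

```python
import gradio as gr


def describe(frame):
    # `frame` arrives as a numpy array; a real handler would encode it
    # and pass it to the model as in the inference sketch above.
    return "model output goes here"


with gr.Blocks() as demo:
    camera = gr.Image(sources=["webcam"], streaming=True, label="Live Camera Feed")
    output = gr.Textbox(label="Model Output")
    # stream_every is in seconds: 0.5 corresponds to a 500 ms interval
    camera.stream(describe, inputs=camera, outputs=output, stream_every=0.5)

demo.launch()
```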