---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.33.2
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
---

# SmolVLM2 Live Inference Demo

This Hugging Face Spaces demo runs the SmolVLM2 2.2B, 500M, or 256M Instruct GGUF models on CPU using llama-cpp-python (v0.3.9), which builds llama.cpp under the hood, with Gradio v5.33.2 for the UI. It captures frames from your webcam every N milliseconds, runs inference on each frame, and displays the model's response in real time.
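
For context on what happens per captured frame, here is a minimal sketch of driving llama-cpp-python's multimodal chat API. The `Llava15ChatHandler` choice, the mmproj filename, and the `describe()` helper are illustrative assumptions, not necessarily what this Space's `app.py` does; note that vision GGUFs generally require a separate multimodal projector (mmproj) file alongside the model weights.

```python
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Hypothetical filenames -- match these to the files in your models/ directory.
chat_handler = Llava15ChatHandler(clip_model_path="models/mmproj-smolvlm2-500M.gguf")
llm = Llama(
    model_path="models/smolvlm2-500M-instruct.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,        # leave room for image tokens plus the prompt
    verbose=False,
)

def describe(frame_png: bytes, system_prompt: str, user_prompt: str) -> str:
    """Run one round of inference on a single PNG-encoded webcam frame."""
    data_url = "data:image/png;base64," + base64.b64encode(frame_png).decode()
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": user_prompt},
                ],
            },
        ],
    )
    return result["choices"][0]["message"]["content"]
```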

## Setup

1. Clone this repository

   ```bash
   git clone <your-space-repo-url>
   cd <your-space-repo-name>
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Add your GGUF models

   Create a `models/` directory in the root of the repo and upload your `.gguf` files (a helper for listing them is sketched after this list):

   ```bash
   mkdir models
   # then upload:
   # - smolvlm2-2.2B-instruct.gguf
   # - smolvlm2-500M-instruct.gguf
   # - smolvlm2-256M-instruct.gguf
   ```

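The model selector in the UI only needs the filenames from this directory. A small helper along these lines (hypothetical; `list_models` is not necessarily what `app.py` uses) can collect them:

```python
from pathlib import Path

MODELS_DIR = Path("models")

def list_models() -> list[str]:
    """Return the .gguf files uploaded to models/, e.g. for a model dropdown."""
    return sorted(str(p) for p in MODELS_DIR.glob("*.gguf"))
```
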
## Usage

- **Select Model**: Choose one of the `.gguf` files you uploaded.
- **System Prompt**: Customize the system-level instructions for the model.
- **User Prompt**: Provide the user query or instruction.
- **Interval (ms)**: Set how often, in milliseconds, to capture a frame and run inference.
- **Live Camera Feed**: The demo starts your webcam and captures frames at the specified interval (see the Gradio sketch after this list).
- **Model Output**: The model's response appears below the camera feed.
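
As a sketch of how these controls can fit together in Gradio 5, the snippet below streams webcam frames into an inference callback. `on_frame` is a hypothetical wrapper around the `describe()` helper sketched above; the model dropdown and interval slider are omitted for brevity, and `stream_every` is fixed at launch here (driving it from the Interval control would need extra plumbing), so the actual `app.py` likely differs.

```python
import io

import gradio as gr
from PIL import Image

def on_frame(frame, system_prompt, user_prompt):
    """Encode a streamed numpy frame as PNG and run one inference pass."""
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="PNG")
    return describe(buf.getvalue(), system_prompt, user_prompt)  # from the sketch above

with gr.Blocks() as demo:
    system_prompt = gr.Textbox(value="You are a helpful assistant.", label="System Prompt")
    user_prompt = gr.Textbox(value="Describe what you see.", label="User Prompt")
    camera = gr.Image(sources=["webcam"], streaming=True, label="Live Camera Feed")
    output = gr.Textbox(label="Model Output")

    # Re-run inference on streamed frames; stream_every is in seconds,
    # so a 1000 ms interval corresponds to stream_every=1.0.
    camera.stream(
        on_frame,
        inputs=[camera, system_prompt, user_prompt],
        outputs=output,
        stream_every=1.0,
    )

demo.launch()
```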

## Notes

- This demo runs entirely on CPU; inference speed depends on the model size and your machine's CPU performance (see the threading sketch below).
- Make sure your browser has permission to access your webcam.
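
One knob worth knowing about: llama-cpp-python exposes an `n_threads` parameter. A sketch of pinning it to the visible core count follows; the specific value is an assumption, not necessarily what this Space configures.

```python
import os

from llama_cpp import Llama

# CPU throughput scales with thread count up to the physical core count;
# os.cpu_count() is a reasonable starting point, not a tuned value.
llm = Llama(
    model_path="models/smolvlm2-256M-instruct.gguf",
    n_threads=os.cpu_count(),
)
```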