---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: streamlit
sdk_version: 1.45.1
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
---

# SmolVLM2 Real‑Time Captioning Demo

This Hugging Face Spaces app uses Streamlit + WebRTC to capture a frame from your webcam every N milliseconds, run it through the SmolVLM2 model on the CPU via llama.cpp, and display live captions below the video.
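A captured frame typically reaches llama-cpp-python as an OpenAI-style multimodal chat message with a base64 data URL. The sketch below shows that packaging step only; the function name and prompt are illustrative, not the app's actual code, and the exact message shape SmolVLM2's chat handler expects may differ:

```python
import base64

def frame_to_messages(jpeg_bytes: bytes, prompt: str) -> list:
    """Package one JPEG frame as an OpenAI-style multimodal chat message,
    the shape llama-cpp-python's create_chat_completion() accepts for
    vision models (assumption: SmolVLM2's handler follows this convention)."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```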

## Features

  - CPU‑only inference via llama-cpp-python wrapping llama.cpp.
  - WebRTC camera input for low‑latency, browser‑native video streaming.
  - Adjustable capture‑interval slider (100 ms to 10 s).
  - Automatic GGUF model download from the Hugging Face Hub when the files are missing.
  - Debug logging in the terminal for tracing inference steps.
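The adjustable interval amounts to a throttle around the frame callback: frames arrive at the camera's full rate, and only one per interval is sent to the model. A minimal sketch of such a gate (a hypothetical helper, not the app's actual code):

```python
import time

class Throttle:
    """Pass a frame through only when at least interval_ms has elapsed
    since the last accepted frame."""

    def __init__(self, interval_ms: int):
        self.interval_ms = interval_ms
        self._last = float("-inf")  # timestamp of the last accepted frame

    def ready(self, now=None) -> bool:
        # Accept the frame and reset the clock if the interval has passed.
        now = time.monotonic() if now is None else now
        if (now - self._last) * 1000.0 >= self.interval_ms:
            self._last = now
            return True
        return False
```

In the frame callback, frames for which `ready()` returns `False` are simply passed through to the display without running inference.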

## Setup

  1. Clone this repository:

     ```bash
     git clone <your-space-repo-url>
     cd <your-space-repo-name>
     ```

  2. Install dependencies:

     ```bash
     pip install -r requirements.txt
     ```
  3. (Optional) Pre‑download model files. The app automatically downloads these files if they are not present:

     - SmolVLM2-500M-Video-Instruct.Q4_K_M.gguf
     - mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf

     To skip the download, place the files manually in the repo root.
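The download-if-missing behavior can be expressed with `huggingface_hub.hf_hub_download`, which the Hub client provides for fetching single files. A hedged sketch (the helper name is hypothetical, and the source `repo_id` is an assumption since the README doesn't state it):

```python
import os

def ensure_model(filename: str, repo_id: str) -> str:
    """Return a local path to filename, downloading it from the Hub only
    when it is not already present."""
    if os.path.exists(filename):
        return filename  # pre-placed file found; nothing to download
    # Deferred import so the local-file check above works even in
    # environments without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=".")
```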

## Usage

  1. Launch the app:

     ```bash
     streamlit run app.py
     ```
  2. Open your browser at the URL shown (e.g. http://localhost:8501).

  3. Allow webcam access when prompted by the browser.

  4. Adjust the capture interval using the slider.

  5. Click Start to begin streaming and captioning.

  6. View live captions in the panel below the video.

## File Structure

  - app.py — Main Streamlit + WebRTC application.
  - requirements.txt — Python dependencies.
  - .gguf model files (auto‑downloaded or user‑provided).

## License

Licensed under the MIT License.