---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---
# SmolVLM2 Real-Time Captioning Demo
This Hugging Face Spaces app uses **Gradio v5 Blocks** to capture your webcam feed every *N* milliseconds and run it through the SmolVLM2 model on your CPU, displaying live captions below each frame.
## Features
* **CPU-only inference** via `llama-cpp-python` wrapping `llama.cpp`.
* **Gradio live streaming** for low-latency, browser-native video input.
* **Adjustable interval slider** (100 ms to 10 s) for frame capture frequency.
* **Automatic GGUF model download** from Hugging Face Hub when missing.
* **Debug logging** in the terminal for tracing each inference step.
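For reference, captioning a frame with `llama-cpp-python` follows its OpenAI-style chat API, with the frame passed as a base64 data URL. The sketch below is an assumption about how `app.py` might wire this up, not the app's actual code; in particular, `Llava15ChatHandler` is a stand-in, and the handler used for SmolVLM2 may differ:

```python
import base64


def png_bytes_to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL that llama.cpp vision handlers accept."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


if __name__ == "__main__":
    # Model loading runs only when executed directly and assumes both
    # GGUF files are already in the repo root (see Setup below).
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler  # stand-in handler

    handler = Llava15ChatHandler(
        clip_model_path="mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
    )
    llm = Llama(
        model_path="SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
        chat_handler=handler,
        n_ctx=4096,  # room for the image embedding plus the caption
    )
    with open("frame.png", "rb") as f:
        url = png_bytes_to_data_url(f.read())
    out = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": url}},
                {"type": "text", "text": "Describe this image briefly."},
            ],
        }],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])
```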
## Setup
1. **Clone this repository**

   ```bash
   git clone <your-space-repo-url>
   cd <your-space-repo-name>
   ```
2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```
3. **(Optional) Pre-download model files**

   These are downloaded automatically if absent:

   * `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip the download at startup, place both GGUF files in the repo root.
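The auto-download step can be approximated with `huggingface_hub`. The repo id below is a placeholder assumption, not taken from this Space; substitute the repository the app actually pulls from:

```python
from pathlib import Path

# Placeholder: the actual source repository used by app.py may differ.
REPO_ID = "<gguf-model-repo-id>"

MODEL_FILES = [
    "SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    "mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf",
]


def ensure_file(filename: str, repo_id: str = REPO_ID, local_dir: str = ".") -> str:
    """Return a local path to `filename`, downloading from the Hub only if absent."""
    local = Path(local_dir) / filename
    if local.exists():
        return str(local)
    # Lazy import: the existence check works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

At startup the app would then resolve both paths with `paths = [ensure_file(f) for f in MODEL_FILES]`, hitting the network only for files not already in the repo root.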
## Usage
1. **Launch the app**:

   ```bash
   python app.py
   ```
2. **Open your browser** at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).
3. **Allow webcam access** when prompted.
4. **Adjust the capture interval** using the slider in the UI.
5. **Live captions** will appear below each video frame.
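The interval slider controls how often captured frames are actually sent to the model: frames arriving sooner than the configured interval are dropped. A minimal sketch of such a throttle (a hypothetical helper, not the app's actual code):

```python
import time
from typing import Optional


class FrameThrottle:
    """Accept a frame only when `interval_ms` has elapsed since the last accepted one."""

    def __init__(self, interval_ms: float):
        self.interval_ms = interval_ms
        self._last = float("-inf")  # monotonic time of the last accepted frame

    def ready(self, now: Optional[float] = None) -> bool:
        """Return True (and record the time) if this frame should be captioned."""
        now = time.monotonic() if now is None else now
        if (now - self._last) * 1000.0 >= self.interval_ms:
            self._last = now
            return True
        return False
```

A streaming callback would then run inference only when `throttle.ready()` returns True, passing frames through unchanged otherwise, which keeps the video feed smooth while the model works at the slider's pace.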
## File Structure
* `app.py` — Main Gradio v5 Blocks application.
* `requirements.txt` — Python dependencies.
* `.gguf` model files (auto-downloaded or user-provided).
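Given the features above, `requirements.txt` presumably lists at least the following; only the Gradio version is implied by the Space metadata, and the exact pins are assumptions:

```text
gradio==5.34.1
llama-cpp-python
huggingface_hub
```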
## License
MIT, as declared in the `license` field of the Space metadata above.