File size: 1,793 Bytes
62ed9f9
5234d12
970f416
427620d
5234d12
 
 
c725424
5234d12
ca97f63
970f416
3e01b8a
970f416
4d88fce
 
 
970f416
 
 
4d88fce
970f416
ca97f63
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
970f416
 
4d88fce
970f416
4d88fce
 
970f416
4d88fce
 
ca97f63
4d88fce
ca97f63
 
970f416
ca97f63
 
970f416
4d88fce
970f416
4d88fce
970f416
4d88fce
970f416
4d88fce
 
ca97f63
970f416
4d88fce
970f416
ca97f63
5234d12
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---

# SmolVLM2 Real-Time Captioning Demo

This Hugging Face Spaces app uses **Gradio v5 Blocks** to capture your webcam feed every *N* milliseconds and run it through the SmolVLM2 model on your CPU, displaying live captions below each frame.

## Features

* **CPU-only inference** via `llama-cpp-python` wrapping `llama.cpp`.
* **Gradio live streaming** for low-latency, browser-native video input.
* **Adjustable interval slider** (100 ms to 10 s) for frame capture frequency.
* **Automatic GGUF model download** from Hugging Face Hub when missing.
* **Debug logging** in the terminal for tracing each inference step.

## Setup

1. **Clone this repository**

   ```bash
   git clone <your-space-repo-url>
   cd <your-space-repo-name>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **(Optional) Pre-download model files**
   These will be automatically downloaded if absent:

   * `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip downloads, place both GGUF files in the repo root.

## Usage

1. **Launch the app**:

   ```bash
   python app.py
   ```

2. **Open your browser** at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).

3. **Allow webcam access** when prompted.

4. **Adjust the capture interval** using the slider in the UI.

5. **Live captions** will appear below each video frame.

## File Structure

* `app.py` — Main Gradio v5 Blocks application.
* `requirements.txt` — Python dependencies.
* `.gguf` model files (auto-downloaded or user-provided).

## License