---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: streamlit
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
sdk_version: 1.45.1
---

# SmolVLM2 Real‑Time Captioning Demo

This Hugging Face Spaces app uses **Streamlit** + **WebRTC** to capture your webcam feed every *N* milliseconds, run each frame through the SmolVLM2 model on your CPU, and display live captions below the video.

## Features

* **CPU‑only inference** via `llama-cpp-python`, which wraps `llama.cpp`.
* **WebRTC camera input** for low‑latency, browser‑native video streaming.
* **Adjustable interval slider** (100 ms to 10 s) for capture frequency.
* **Automatic GGUF model download** from the Hugging Face Hub when the files are missing.
* **Debug logging** in the terminal for tracing inference steps.

## Setup

1. **Clone this repository**

   ```bash
   git clone <repo-url>
   cd <repo-folder>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **(Optional) Pre‑download model files**

   The app automatically downloads these files if they are not present:

   * `SmolVLM2-500M-Video-Instruct.Q4_K_M.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip the download, manually place them in the repo root.

## Usage

1. **Launch the app**:

   ```bash
   streamlit run app.py
   ```

2. **Open your browser** at the URL shown (e.g. `http://localhost:8501`).
3. **Allow webcam access** when prompted by the browser.
4. **Adjust the capture interval** using the slider.
5. **Click "Start"** to begin streaming and captioning.
6. **View live captions** in the panel below the video.

## File Structure

* `app.py` — Main Streamlit + WebRTC application.
* `requirements.txt` — Python dependencies.
* `.gguf` model files (auto‑downloaded or user‑provided).

## License

Licensed under the MIT License.
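The capture-interval behavior described in the Features section amounts to a simple throttle: a frame is only captioned when at least the configured number of milliseconds has elapsed since the last inference. The sketch below is an illustration of that idea, not the app's actual code; the class and method names are assumptions.

```python
import time


class CaptionThrottle:
    """Decide whether enough time has passed to caption a new frame."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self._last_run = None  # monotonic timestamp (seconds) of the last caption

    def should_run(self, now=None):
        """Return True (and record the time) if interval_ms has elapsed."""
        if now is None:
            now = time.monotonic()
        if self._last_run is None or (now - self._last_run) * 1000 >= self.interval_ms:
            self._last_run = now
            return True
        return False
```

In the app, a check like this would sit inside the WebRTC frame callback, so that most frames pass through untouched and only every *N* ms one is sent to the model.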
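The "automatic GGUF model download" feature can be sketched as a check-then-fetch helper: look for the file locally and invoke a downloader only if it is absent. This is a minimal illustration, not the app's implementation; the function name and the injected `fetch` callable (which could wrap, e.g., `huggingface_hub.hf_hub_download`) are assumptions.

```python
from pathlib import Path


def ensure_model(filename, fetch, dest_dir="."):
    """Return the local path to `filename`, calling `fetch(filename)` only if it is missing.

    `fetch` is any callable that downloads the file and returns its local path.
    """
    path = Path(dest_dir) / filename
    if path.exists():
        return path  # already present: skip the download, as the Setup section notes
    return Path(fetch(filename))
```

This matches the behavior described in Setup step 3: placing the `.gguf` files in the repo root beforehand makes the existence check succeed, so no download is attempted.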