---
license: mit
title: SmolVLM2 Real-Time Captioning Demo
sdk: gradio
colorFrom: green
colorTo: blue
short_description: Real-time webcam captioning with SmolVLM2 on llama.cpp
sdk_version: 5.34.1
---
# SmolVLM2 Real-Time Captioning Demo
This Hugging Face Spaces app uses **Gradio v5 Blocks** to capture your webcam feed every *N* milliseconds and run it through the SmolVLM2 model on your CPU, displaying live captions below each frame.
## Features
* **CPU-only inference** via `llama-cpp-python` wrapping `llama.cpp`.
* **Gradio live streaming** for low-latency, browser-native video input.
* **Adjustable interval slider** (100 ms to 10 s) for frame capture frequency.
* **Automatic GGUF model download** from Hugging Face Hub when missing.
* **Debug logging** in the terminal for tracing each inference step.
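For reference, captioning a frame with `llama-cpp-python` follows its OpenAI-style chat API, with the frame passed as a base64 data URL. The sketch below is an assumption about how `app.py` might wire this up, not the app's actual code; in particular, `Llava15ChatHandler` is a stand-in, and the handler used for SmolVLM2 may differ:

```python
import base64


def png_bytes_to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL that llama.cpp vision handlers accept."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


if __name__ == "__main__":
    # Model loading runs only when executed directly and assumes both
    # GGUF files are already in the repo root (see Setup below).
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler  # stand-in handler

    handler = Llava15ChatHandler(
        clip_model_path="mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
    )
    llm = Llama(
        model_path="SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
        chat_handler=handler,
        n_ctx=4096,  # room for the image embedding plus the caption
    )
    with open("frame.png", "rb") as f:
        url = png_bytes_to_data_url(f.read())
    out = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": url}},
                {"type": "text", "text": "Describe this image briefly."},
            ],
        }],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])
```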
## Setup
1. **Clone this repository**

   ```bash
   git clone <your-space-repo-url>
   cd <your-space-repo-name>
   ```
2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```
3. **(Optional) Pre-download model files**

   These are downloaded automatically if absent:

   * `SmolVLM2-500M-Video-Instruct.Q8_0.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip the download at startup, place both GGUF files in the repo root.
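The auto-download step can be approximated with `huggingface_hub`. The repo id below is a placeholder assumption, not taken from this Space; substitute the repository the app actually pulls from:

```python
from pathlib import Path

# Placeholder: the actual source repository used by app.py may differ.
REPO_ID = "<gguf-model-repo-id>"

MODEL_FILES = [
    "SmolVLM2-500M-Video-Instruct.Q8_0.gguf",
    "mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf",
]


def ensure_file(filename: str, repo_id: str = REPO_ID, local_dir: str = ".") -> str:
    """Return a local path to `filename`, downloading from the Hub only if absent."""
    local = Path(local_dir) / filename
    if local.exists():
        return str(local)
    # Lazy import: the existence check works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

At startup the app would then resolve both paths with `paths = [ensure_file(f) for f in MODEL_FILES]`, hitting the network only for files not already in the repo root.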
## Usage
1. **Launch the app**:

   ```bash
   python app.py
   ```
2. **Open your browser** at the URL shown in the terminal (e.g. `http://127.0.0.1:7860`).
3. **Allow webcam access** when prompted.
4. **Adjust the capture interval** using the slider in the UI.
5. **Live captions** will appear below each video frame.
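The interval slider controls how often captured frames are actually sent to the model: frames arriving sooner than the configured interval are dropped. A minimal sketch of such a throttle (a hypothetical helper, not the app's actual code):

```python
import time
from typing import Optional


class FrameThrottle:
    """Accept a frame only when `interval_ms` has elapsed since the last accepted one."""

    def __init__(self, interval_ms: float):
        self.interval_ms = interval_ms
        self._last = float("-inf")  # monotonic time of the last accepted frame

    def ready(self, now: Optional[float] = None) -> bool:
        """Return True (and record the time) if this frame should be captioned."""
        now = time.monotonic() if now is None else now
        if (now - self._last) * 1000.0 >= self.interval_ms:
            self._last = now
            return True
        return False
```

A streaming callback would then run inference only when `throttle.ready()` returns True, passing frames through unchanged otherwise, which keeps the video feed smooth while the model works at the slider's pace.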
## File Structure
* `app.py` — Main Gradio v5 Blocks application.
* `requirements.txt` — Python dependencies.
* `.gguf` model files (auto-downloaded or user-provided).
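Given the features above, `requirements.txt` presumably lists at least the following; only the Gradio version is implied by the Space metadata, and the exact pins are assumptions:

```text
gradio==5.34.1
llama-cpp-python
huggingface_hub
```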
## License
MIT, as declared in the `license` field of the Space metadata above.