---
title: SmolVLM2 On Llama.cpp
emoji: 💻
colorFrom: gray
colorTo: gray
sdk: streamlit
app_file: app.py
pinned: false
license: mit
short_description: SmolVLM2 on llama.cpp
sdk_version: 1.45.1
---

# SmolVLM2 Real‑Time Captioning Demo

This Hugging Face Spaces app uses **Streamlit** + **WebRTC** to capture your webcam feed every *N* milliseconds, run each frame through the SmolVLM2 model on your CPU, and display live captions below the video.

## Features

* **CPU‑only inference** via `llama-cpp-python`, which wraps `llama.cpp`.
* **WebRTC camera input** for low‑latency, browser‑native video streaming.
* **Adjustable interval slider** (100 ms to 10 s) for capture frequency.
* **Automatic GGUF model download** from the Hugging Face Hub when the files are missing.
* **Debug logging** in the terminal for tracing inference steps.

## Setup

1. **Clone this repository**

   ```bash
   git clone <repo-url>
   cd <repo-folder>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **(Optional) Pre‑download model files**

   The app automatically downloads these files if they are not present:

   * `SmolVLM2-500M-Video-Instruct.Q4_K_M.gguf`
   * `mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf`

   To skip the download, manually place them in the repo root.

## Usage

1. **Launch the app**:

   ```bash
   streamlit run app.py
   ```

2. **Open your browser** at the URL shown (e.g. `http://localhost:8501`).
3. **Allow webcam access** when prompted by the browser.
4. **Adjust the capture interval** using the slider.
5. **Click "Start"** to begin streaming and captioning.
6. **View live captions** in the panel below the video.

## File Structure

* `app.py` — Main Streamlit + WebRTC application.
* `requirements.txt` — Python dependencies.
* `.gguf` model files (auto‑downloaded or user‑provided).

## License

Licensed under the MIT License.
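The capture-interval behavior described in the Features section amounts to a simple throttle: a frame is only captioned when at least the configured number of milliseconds has elapsed since the last inference. The sketch below is an illustration of that idea, not the app's actual code; the class and method names are assumptions.

```python
import time


class CaptionThrottle:
    """Decide whether enough time has passed to caption a new frame."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self._last_run = None  # monotonic timestamp (seconds) of the last caption

    def should_run(self, now=None):
        """Return True (and record the time) if interval_ms has elapsed."""
        if now is None:
            now = time.monotonic()
        if self._last_run is None or (now - self._last_run) * 1000 >= self.interval_ms:
            self._last_run = now
            return True
        return False
```

In the app, a check like this would sit inside the WebRTC frame callback, so that most frames pass through untouched and only every *N* ms one is sent to the model.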
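The "automatic GGUF model download" feature can be sketched as a check-then-fetch helper: look for the file locally and invoke a downloader only if it is absent. This is a minimal illustration, not the app's implementation; the function name and the injected `fetch` callable (which could wrap, e.g., `huggingface_hub.hf_hub_download`) are assumptions.

```python
from pathlib import Path


def ensure_model(filename, fetch, dest_dir="."):
    """Return the local path to `filename`, calling `fetch(filename)` only if it is missing.

    `fetch` is any callable that downloads the file and returns its local path.
    """
    path = Path(dest_dir) / filename
    if path.exists():
        return path  # already present: skip the download, as the Setup section notes
    return Path(fetch(filename))
```

This matches the behavior described in Setup step 3: placing the `.gguf` files in the repo root beforehand makes the existence check succeed, so no download is attempted.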