MLX-Video-OCR-DeepSeek-Apple-Silicon
🎯 One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI
This is a local OCR application optimized for Apple Silicon (M1/M2/M3/M4),
built on top of deepseek-ai/DeepSeek-OCR and the MLX ecosystem. It provides:
- 📹 Video frame extraction + OCR (automatically samples frames from videos, then runs OCR)
- 📄 PDF batch OCR (supports multi-page PDFs, batch mode and single-page mode)
- 🖼 Image OCR (documents, tables, handwriting, scene text)
- 🎨 Image pre-processing (auto-rotation, enhancement, de-shadow, background removal)
- 🖥 Full Web GUI (drag-and-drop upload, progress display, result preview)
- 🍎 One-click Mac deployment (`./start.sh` automatically sets up the environment and dependencies)
Weights are not re-uploaded in this repo. Instead, they are automatically downloaded from mlx-community/DeepSeek-OCR-8bit and cached locally.
In the current ecosystem of projects based on deepseek-ai/DeepSeek-OCR, this solution focuses on
Mac Apple Silicon local deployment + unified Video/PDF/Image workflow + a complete GUI,
acting as an application-layer integration rather than “just another weights-only model repo”.
🧮 Precision & Weights (3B + 8bit)
This project does not re-upload any weights, but directly uses:
- Base model: `deepseek-ai/DeepSeek-OCR` (around 3B parameters)
- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
In practice, this means:
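- 🧠 Model capability: Uses the original DeepSeek-OCR architecture; 8-bit quantization may cost a small amount of accuracy compared with the full-precision weights
- 💾 Storage footprint: 8-bit quantization takes roughly half the space of 16-bit weights, which keeps the model practical for local Mac environments
- ⚡ Runtime efficiency: MLX runs inference on the Apple Silicon GPU through Metal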
If you need:
- Maximum precision / research use → Use `deepseek-ai/DeepSeek-OCR` directly
- Practical Mac local tooling → Use this project + `mlx-community/DeepSeek-OCR-8bit` to run the full Video/PDF/Image workflow via GUI
🔗 Project & Base Models
- Base model: `deepseek-ai/DeepSeek-OCR`
- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
- Local application source code (GUI + backend): `matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`
This Hugging Face model card is mainly intended to:
- Document this project as a local GUI / deployment example built on `deepseek-ai/DeepSeek-OCR`
- Make it easy to discover this Mac GUI solution when searching for `base_model: deepseek-ai/DeepSeek-OCR`
✨ Features
🎬 Video OCR
- Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
- Sends all extracted frames to DeepSeek-OCR in batches
- Supports:
- Frame preview
- Batch download of frames
- “Frames → OCR” one-click workflow
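The sampling and batching steps above can be pictured with a minimal sketch. It assumes uniform time-based sampling (one frame every `interval_s` seconds), which may differ from the key-frame logic the application actually uses. `sample_frame_indices` and `batched` are hypothetical helper names, not functions from this project.

```python
from typing import Iterator, List, Sequence


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return indices of frames spaced roughly `interval_s` seconds apart."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = max(1, round(fps * interval_s))  # number of frames between two samples
    return list(range(0, total_frames, step))


def batched(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# A 10-second clip at 30 fps, sampled every 2 seconds, sent to OCR 2 frames at a time.
indices = sample_frame_indices(total_frames=300, fps=30.0, interval_s=2.0)
batches = list(batched(indices, size=2))
print(indices)
print(batches)
```

Working on frame indices rather than decoded images keeps the selection logic independent of the video library that later reads the frames.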
📄 PDF OCR (Multi-page Batch)
- Supports multi-page PDF batch processing
- Two modes:
- Batch mode: process the document in batches of N pages
- Single-page mode: precisely select specific pages
- Provides:
- PDF thumbnail preview
- Page selection, progress display, pause/resume/cancel controls
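As a rough illustration of the two modes, the sketch below splits a document into batches of N pages and, separately, validates a hand-picked page selection. The function names and the 1-based page numbering are assumptions made for this example; the application's own implementation is not shown in this card.

```python
from typing import List


def page_batches(page_count: int, batch_size: int) -> List[List[int]]:
    """Batch mode: split 1-based page numbers 1..page_count into groups of `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pages = list(range(1, page_count + 1))
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


def select_pages(page_count: int, requested: List[int]) -> List[int]:
    """Single-page mode: keep only valid, de-duplicated page numbers in ascending order."""
    return sorted({p for p in requested if 1 <= p <= page_count})


print(page_batches(7, 3))             # batch mode on a 7-page PDF
print(select_pages(7, [5, 2, 9, 2]))  # single-page mode with a duplicate and an out-of-range page
```

Processing one batch at a time also gives natural checkpoints at which a pause, resume, or cancel request can take effect.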
🖼 Image OCR
- Supports common formats: PNG / JPG / JPEG
- Multiple scenarios:
- Documents, tables, academic content
- Handwriting
- Street signs / shop signs / product packaging
- Output formats:
- Markdown
- LaTeX (math formulas)
- Plain text
🎨 Image Pre-processing
Built-in presets (scan optimization, photo enhancement, background removal, etc.) combine steps such as:
- Auto-rotation (deskew)
- Contrast enhancement + sharpening
- Shadow removal
- Binarization
- Background removal (via `rembg` or an OpenCV-based fallback pipeline)
Pre-processing can:
- Batch process multiple images
- Package processed results into a ZIP file for download
- Send processed images directly into the OCR pipeline with one click
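To make the binarization step concrete, here is a small pure-Python sketch of Otsu's method, a common way to pick a global threshold automatically. This card does not say which binarization method the application uses, so Otsu is an assumption chosen for illustration, and the flat list of grayscale values stands in for a real image array.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]       # pixels at or below this level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # pixels above this level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map every pixel above `threshold` to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


# Dark ink (value 10) on a bright page (value 200).
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
print(t, binarize(sample, t)[:3], binarize(sample, t)[-3:])
```

In practice an image library such as OpenCV would perform this on whole arrays far faster; the loop form is only meant to show what the threshold search does.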
🖥 GUI Overview
- ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
- ✅ Drag-and-drop upload
- ✅ Thumbnails for images / PDFs
- ✅ Batch progress bar and status text
- ✅ Result panel supports:
- One-click copy
- Downloading result files
- ✅ Responsive design: works well on desktops, laptops, and tablets
🍎 Mac One-click Deployment
Requirements
- macOS 13.0+
- Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- Recommended RAM: ≥ 16GB
One-click Install & Run (Recommended)
# 1. Choose an install directory
cd ~/Downloads # or cd ~ / cd ~/Documents / any location you prefer
# 2. Clone the project
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
cd MLX-Video-OCR-DeepSeek-Apple-Silicon
# 3. One-click start (creates venv, installs deps, finds a free port)
./start.sh
After startup, open your browser at:
http://localhost:5000 (or another port between 5000 and 5010 if 5000 is taken)
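The port fallback can be sketched as follows. How `start.sh` actually probes ports is not documented here, so this is only one plausible approach: try to bind each port in the range and return the first one that succeeds. `find_free_port` is a hypothetical name.

```python
import socket
from typing import Optional


def find_free_port(start: int = 5000, end: int = 5010, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end] that can be bound, or None if all are busy."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    return None


port = find_free_port()
if port is None:
    print("No free port between 5000 and 5010")
else:
    print(f"http://localhost:{port}")
```

On recent macOS versions the AirPlay Receiver service commonly occupies port 5000, which is one reason a fallback range is useful.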
⚙️ Model Download & Caching
Internally, the application does:
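import os
from pathlib import Path
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")  # must be set before huggingface_hub is imported to take effect
from mlx_vlm import load  # assumption: the import is not shown in the source; mlx-vlm's load returns (model, processor)
model_path = "mlx-community/DeepSeek-OCR-8bit"
_model_instance, _processor_instance = load(model_path)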
This means:
- On first use, it downloads `mlx-community/DeepSeek-OCR-8bit` from Hugging Face
- Download location: `~/hf_cache/`
- Subsequent runs, even from different project directories, reuse the same local model cache and do not re-download
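To check whether the model is already cached, a short sketch like the one below can help. It assumes the default Hugging Face hub layout, in which repositories live under `$HF_HOME/hub/models--<org>--<name>`, and that `HF_HUB_CACHE` has not been overridden. `hf_model_cache_dir` is a hypothetical helper, not part of the application.

```python
import os
from pathlib import Path


def hf_model_cache_dir(repo_id: str, hf_home: Path) -> Path:
    """Return the directory where the default hub cache layout stores `repo_id`."""
    folder = "models--" + repo_id.replace("/", "--")
    return hf_home / "hub" / folder


# Fall back to ~/hf_cache, the location this application sets for HF_HOME.
hf_home = Path(os.environ.get("HF_HOME", Path.home() / "hf_cache"))
cache_dir = hf_model_cache_dir("mlx-community/DeepSeek-OCR-8bit", hf_home)
print(f"{cache_dir} cached: {cache_dir.is_dir()}")
```

Deleting that directory forces a fresh download on the next start, which can be useful if a download was interrupted.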
🔒 Privacy & Local Execution
- All inference (video frame extraction, PDF processing, image OCR) runs entirely on your machine
- No documents or images are uploaded to any external servers
- Weights and cache are stored under your user directory (e.g. `~/hf_cache/`)
📦 Use Cases
This project is ideal if you want to run DeepSeek-OCR on Mac + Apple Silicon and:
- Prefer a visual GUI instead of pure scripts
- Want one-click startup without manual environment setup
- Need to handle Video + PDF + Image in a single workflow
- Require all data to remain on-device, with no cloud dependency
🧩 Development & Contributions
Source code and issues:
- GitHub: `matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`
- Issues: bug reports and feature requests are welcome
📜 License
This application uses the AGPL-3.0 license.
Please also respect the licenses of:
- This repo (GUI + backend) under AGPL-3.0
- `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit`, as published on Hugging Face
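Traditional Chinese Documentation (繁體中文說明)
🎯 One-click Mac deployment · 📹 Video / 📄 PDF / 🖼 Image 3-in-1 OCR · 🖥 Full local GUI
This is a local OCR application optimized for Apple Silicon (M1/M2/M3/M4),
built on deepseek-ai/DeepSeek-OCR and the MLX ecosystem. It provides:
- 📹 Video screenshots + OCR (automatically extracts frames from a video, then runs OCR)
- 📄 PDF batch OCR (supports multi-page PDFs, batch and single-page modes)
- 🖼 Image OCR (including documents, tables, handwriting, scene text)
- 🎨 Photo pre-processing (auto-rotation, enhancement, shadow removal, background removal)
- 🖥 Full Web GUI (drag-and-drop upload, progress bar, result preview)
- 🍎 One-click Mac deployment (`./start.sh` automatically sets up the environment and dependencies)
Weights are not stored in this repo. They are downloaded automatically from `mlx-community/DeepSeek-OCR-8bit` and cached on the local machine.
Among the projects currently based on deepseek-ai/DeepSeek-OCR, this solution focuses on
Mac Apple Silicon local deployment + a 3-in-1 Video/PDF/Image workflow + a complete GUI,
making it an application-layer integration rather than a repo that only provides model weights.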
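🧮 Precision & Weights (3B + 8bit)
This project does not re-upload weights. It directly uses:
- Base model: `deepseek-ai/DeepSeek-OCR` (about 3B parameters)
- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
In other words:
- 🧠 Model capability: inherits the architecture and quality of DeepSeek-OCR
- 💾 Storage footprint: uses 8-bit quantization, suitable for local Mac environments
- ⚡ Runtime efficiency: accelerated inference with MLX + Metal GPU on Apple Silicon
If you need:
- Maximum precision / research use → use `deepseek-ai/DeepSeek-OCR` directly
- Practical applications / local Mac tooling → use this project + `mlx-community/DeepSeek-OCR-8bit` and run the Video/PDF/Image workflow in the GUI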
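🔗 Project & Base Models
- Base model: `deepseek-ai/DeepSeek-OCR`
- MLX quantized version: `mlx-community/DeepSeek-OCR-8bit`
- Local application source code (GUI + backend): `matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`
This Hugging Face model card is mainly intended to:
- Explain that this project is a local GUI / deployment example based on `deepseek-ai/DeepSeek-OCR`
- Let users find this Mac GUI solution when searching for `base_model: deepseek-ai/DeepSeek-OCR`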
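✨ Features
🎬 Video OCR
- Automatically extracts key frames from videos (MP4 / AVI / MOV / MKV / WebM)
- Sends all screenshots to DeepSeek-OCR in batches for text recognition
- Supports:
- Frame preview
- Batch download of screenshots
- Sending screenshots directly into the OCR workflow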
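📄 PDF OCR (Multi-page Batch)
- Supports multi-page PDF batch processing
- Two modes:
- Batch mode: N pages per batch, running through the whole document in one go
- Single-page mode: precisely select specific pages
- Provides:
- PDF thumbnail preview
- Page selection, progress display, pause/resume/cancel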
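🖼 Image OCR
- Supports common image formats such as PNG / JPG / JPEG
- Multiple scenarios:
- Documents, tables, academic content
- Handwritten text
- Street scenes / signboards / product packaging
- Output formats:
- Markdown
- LaTeX (math formulas)
- Plain text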
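🎨 Photo Pre-processing
Several built-in pre-processing presets (scan optimization, photo optimization, background removal, etc.), including:
- Auto-rotation (skew correction)
- Contrast enhancement + sharpening
- Shadow removal
- Binarization
- Background removal (rembg or a fallback OpenCV pipeline)
Pre-processing can:
- Batch process multiple images
- Package the processed results into a ZIP file for download
- Send images to OCR with one click, going straight into the recognition workflow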
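🖥 GUI Overview
- ✅ Single-page Web GUI (Flask + vanilla JS + Tailwind)
- ✅ Drag-and-drop upload area
- ✅ Thumbnail previews for images / PDFs
- ✅ Batch progress bar and status text
- ✅ Result panel supports:
- One-click copy
- Downloading result files
- ✅ Responsive design: comfortable to use on desktops, laptops, and tablets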
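🍎 Mac One-click Deployment
System Requirements
- macOS 13.0+
- Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- Recommended RAM: ≥ 16GB
One-click Install & Run (Recommended)
# 1. Choose an install directory
cd ~/Downloads # or cd ~ / cd ~/Documents / wherever you usually keep projects
# 2. Clone the project
git clone https://github.com/matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon.git
cd MLX-Video-OCR-DeepSeek-Apple-Silicon
# 3. One-click start (creates a venv, installs dependencies, finds a free port)
./start.sh
After a successful start, open your browser at:
http://localhost:5000 (or an automatically selected free port between 5000 and 5010)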
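⚙️ Model Download & Caching Behavior
Internally, the program uses:
os.environ["HF_HOME"] = str(Path.home() / "hf_cache")
model_path = "mlx-community/DeepSeek-OCR-8bit"
_model_instance, _processor_instance = load(model_path)
This means:
- On first use, it downloads `mlx-community/DeepSeek-OCR-8bit` from Hugging Face
- Download location: `~/hf_cache/`
- Later launches, including runs from a different project directory, share the same local model cache and do not download again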
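🔒 Privacy & Local Execution
- All inference (video screenshots, PDF processing, image OCR) is done locally
- Your documents and images are not uploaded to any server
- Model weights and cache are stored under your user directory (e.g. `~/hf_cache/`)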
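📦 Use Cases
- You want to run DeepSeek-OCR on Mac + Apple Silicon, and you:
- Want a visual GUI
- Want one-click startup without configuring the environment by hand
- Want to process video / PDF / images together
- Want all data to stay on the local machine, with nothing sent to the cloud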
🧩 Development & Contributions
For source code and issues, see the GitHub repo:
- GitHub: `matica0902/MLX-Video-OCR-DeepSeek-Apple-Silicon`
- Issues: bug reports, feature suggestions, and PRs are welcome
📜 License
This application is licensed under AGPL-3.0.
Please also comply with:
- The AGPL-3.0 license of this repo (GUI + backend)
- The license terms of `deepseek-ai/DeepSeek-OCR` and `mlx-community/DeepSeek-OCR-8bit`