Whisper-WebUI
A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!

Notebook
If you wish to try this on Colab, you can do it in here!
Feature
- Generate subtitles from various sources, including :
- Currently supported subtitle formats :
- SRT
- WebVTT
- txt ( only text file without timeline )
- Speech to Text Translation
- From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
- Text to Text Translation
- Translate subtitle files using Facebook NLLB models
- Translate subtitle files using DeepL API
Installation and Running
Prerequisite
To run Whisper, you need to have git
, python
version 3.8 ~ 3.10 and FFmpeg
.
Please follow the links below to install the necessary software:
After installing FFmpeg, make sure to add the FFmpeg/bin
folder to your system PATH!
Automatic Installation
If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.
- Run
Install.bat
from Windows Explorer as a regular, non-administrator user.
- After installation, run the
start-webui.bat
. (It will automatically download the model if it is not already installed.)
- Open your web browser and go to
http://localhost:7860
( If you're running another Web-UI, it will be hosted on a different port , such as localhost:7861
, localhost:7862
, and so on )
And you can also run the project with command line arguments if you like by running user-start-webui.bat
, see wiki for a guide to arguments.
VRAM Usages
This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.
According to faster-whisper, the efficiency of the optimized whisper model is as follows:
Implementation |
Precision |
Beam size |
Time |
Max. GPU memory |
Max. CPU memory |
openai/whisper |
fp16 |
5 |
4m30s |
11325MB |
9439MB |
faster-whisper |
fp16 |
5 |
54s |
4755MB |
3244MB |
If you want to use the original Open AI whisper implementation instead of optimized whisper, you can set the command line argument DISABLE_FASTER_WHISPER
to True
. See the wiki for more information.
Available models
This is Whisper's original VRAM usage table for models.
Size |
Parameters |
English-only model |
Multilingual model |
Required VRAM |
Relative speed |
tiny |
39 M |
tiny.en |
tiny |
~1 GB |
~32x |
base |
74 M |
base.en |
base |
~1 GB |
~16x |
small |
244 M |
small.en |
small |
~2 GB |
~6x |
medium |
769 M |
medium.en |
medium |
~5 GB |
~2x |
large |
1550 M |
N/A |
large |
~10 GB |
1x |
.en
models are for English only, and the cool thing is that you can use the Translate to English
option from the "large" models!