diff --git a/README.md b/README.md
index e74cc37..de942ec 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,8 @@ Then, download one of the Whisper models converted in [ggml format](models). For
 bash ./models/download-ggml-model.sh base.en
 ```
 
+If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).
+
 Now build the [main](examples/main) example and transcribe an audio file like this:
 
 ```bash
diff --git a/models/README.md b/models/README.md
index ab0dde7..c62f036 100644
--- a/models/README.md
+++ b/models/README.md
@@ -1,15 +1,17 @@
 ## Whisper model files in custom ggml format
 
 The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
-have been converted to custom `ggml` format in order to be able to load them in C/C++. The conversion has been performed
-using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. You can either obtain the original models and generate
-the `ggml` files yourself using the conversion script, or you can use the [download-ggml-model.sh](download-ggml-model.sh)
-script to download the already converted models. Currently, they are hosted on the following locations:
+are converted to custom `ggml` format in order to be able to load them in C/C++.
+Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+
+You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
+or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
+Currently, they are hosted at the following locations:
 
 - https://huggingface.co/ggerganov/whisper.cpp
 - https://ggml.ggerganov.com
 
-Sample usage:
+Sample download:
 
 ```java
 $ ./download-ggml-model.sh base.en
@@ -21,6 +23,16 @@ You can now use it like this:
 
 $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ```
 
+To convert the files yourself, use the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. Here is an example usage.
+The original PyTorch files are assumed to have been downloaded into `~/.cache/whisper`.
+Change `~/path/to/repo/whisper/` to the location of your copy of the Whisper source:
+```
+mkdir models/whisper-medium
+python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
+rmdir models/whisper-medium
+```
+
 A third option to obtain the model files is to download them from Hugging Face:
 
 https://huggingface.co/ggerganov/whisper.cpp/tree/main
diff --git a/models/download-ggml-model.sh b/models/download-ggml-model.sh
index 749b409..e5c59a7 100755
--- a/models/download-ggml-model.sh
+++ b/models/download-ggml-model.sh
@@ -62,7 +62,7 @@ if [ -f "ggml-$model.bin" ]; then
 fi
 
 if [ -x "$(command -v wget)" ]; then
-    wget --quiet --show-progress -O ggml-$model.bin $src/$pfx-$model.bin
+    wget --no-config --quiet --show-progress -O ggml-$model.bin $src/$pfx-$model.bin
 elif [ -x "$(command -v curl)" ]; then
     curl -L --output ggml-$model.bin $src/$pfx-$model.bin
 else