Add quantize script for batch quantization (#92)

* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Pavol Rusnak 2023-03-13 17:15:20 +01:00 committed by GitHub
parent 1808ee0500
commit d1f224712d
2 changed files with 18 additions and 31 deletions

README.md

@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
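For context, the new script batches the per-file commands that the removed README section spelled out by hand. A minimal sketch of what `./quantize.sh 13B` ends up running for a two-file model, with file names taken from the removed text:

```bash
# Illustrative expansion only: the two commands the script's loop issues
# for the 13B model's two f16 parts (paths as in the removed README text).
./quantize models/13B/ggml-model-f16.bin models/13B/ggml-model-q4_0.bin 2
./quantize models/13B/ggml-model-f16.bin.1 models/13B/ggml-model-q4_0.bin.1 2
```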

quantize.sh (new executable file, +15 lines)

@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in `ls models/$1/ggml-model-f16.bin*`; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
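The output path is derived with bash pattern substitution: `${i/f16/q4_0}` replaces the first occurrence of `f16` in `$i` with `q4_0`, so `models/7B/ggml-model-f16.bin` becomes `models/7B/ggml-model-q4_0.bin`. Example invocations, following the script's own usage line:

```bash
# Quantize every f16 part of the 7B model, keeping the f16 originals:
./quantize.sh 7B

# Quantize the 13B parts, then delete each f16 file to reclaim disk space:
./quantize.sh 13B --remove-f16
```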