update README.md

convert_api
leejet 2024-01-14 11:39:21 +08:00
parent f0b707e2c3
commit 6d0d5ec05f
1 changed file with 16 additions and 5 deletions

README.md

@@ -126,7 +126,7 @@ cmake .. -DSD_METAL=ON
 cmake --build . --config Release
 ```
-### Using Flash Attention
+##### Using Flash Attention
 Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.
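The build commands for this option are not shown in the hunk above; a minimal sketch, assuming the project's `SD_FLASH_ATTN` CMake option, would be:

```sh
# Assumed CMake option; check the project's CMakeLists.txt for the exact name.
cmake .. -DSD_FLASH_ATTN=ON
cmake --build . --config Release
```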
@@ -142,7 +142,7 @@ usage: ./bin/sd [arguments]
 arguments:
   -h, --help                         show this help message and exit
-  -M, --mode [txt2img or img2img]    generation mode (default: txt2img)
+  -M, --mode [MODEL]                 run mode (txt2img or img2img or convert, default: txt2img)
   -t, --threads N                    number of threads to use during computation (default: -1).
                                      If threads <= 0, then threads will be set to the number of CPU physical cores
   -m, --model [MODEL]                path to model
@@ -168,7 +168,8 @@ arguments:
   -s SEED, --seed SEED               RNG seed (default: 42, use random seed for < 0)
   -b, --batch-count COUNT            number of images to generate.
   --schedule {discrete, karras}      Denoiser sigma schedule (default: discrete)
-  --clip-skip N                      number of layers to skip of clip model (default: 0)
+  --clip-skip N                      ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
+                                     <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
   --vae-tiling                       process vae in tiles to reduce memory usage
   -v, --verbose                      print extra info
 ```
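To illustrate the updated options, a hypothetical txt2img invocation combining the new `--mode` value and `--clip-skip` might look like the sketch below; the model path and prompt reuse the examples found elsewhere in this README:

```sh
# Illustrative only: explicit run mode, seed, and CLIP skip for an SD1.x model.
./bin/sd -M txt2img -m ../models/v1-5-pruned-emaonly.safetensors \
  -p "a lovely cat" -s 42 --clip-skip 1
```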
@@ -183,6 +184,16 @@ You can specify the model weight type using the `--type` parameter. The weights
 - `q5_0` or `q5_1` for 5-bit integer quantization
 - `q4_0` or `q4_1` for 4-bit integer quantization
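As a sketch of how `--type` is used at load time (flags taken from the help text above, model path from the examples below; the weights are converted in memory as described):

```sh
# Illustrative: quantize the weights to q4_0 while loading the model.
./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --type q4_0
```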
+#### Convert to GGUF
+You can also convert weights in the formats `ckpt/safetensors/diffusers` to gguf and perform quantization in advance, avoiding the need for quantization every time you load them.
+For example:
+```sh
+./bin/sd -M convert -m ../models/v1-5-pruned-emaonly.safetensors -o ../models/v1-5-pruned-emaonly.q8_0.gguf -v --type q8_0
+```
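Once converted, the resulting gguf can be passed to `-m` directly; a minimal follow-up sketch using the path from the command above:

```sh
# Load the pre-quantized gguf produced by the convert step; no quantization at load time.
./bin/sd -m ../models/v1-5-pruned-emaonly.q8_0.gguf -p "a lovely cat"
```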
 #### txt2img example
 ```sh
@@ -240,7 +251,7 @@ Here's a simple example:
 | ---- |---- |
 | ![](./assets/without_lcm.png) |![](./assets/with_lcm.png) |
-## Using TAESD to faster decoding
+#### Using TAESD to faster decoding
 You can use TAESD to accelerate the decoding of latent images by following these steps:
@@ -258,7 +269,7 @@ curl -L -O https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_
 sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --taesd ../models/diffusion_pytorch_model.safetensors
 ```
-## Using ESRGAN to upscale results
+#### Using ESRGAN to upscale results
 You can use ESRGAN to upscale the generated images. At the moment, only the [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth) model is supported. Support for more models of this architecture will be added soon.
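The upscaling command itself falls outside this hunk; a minimal sketch, assuming the CLI exposes an `--upscale-model` parameter for this feature, might be:

```sh
# Assumed parameter name; verify against ./bin/sd --help.
./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" \
  --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
```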