Commit Graph

25 Commits (3f7436e8a09611931709b29f5c507245c8c1d7a4)

Author SHA1 Message Date
Georgi Gerganov e410cfc3ce
ggml : sync latest ggml repo
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov 0bcb64b184
ggml : sync ggml (clBLAST + tensor names) 2023-05-02 21:24:18 +03:00
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
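
The quantization commit above stores model weights as small integer blocks instead of 32-bit floats. Below is a minimal sketch of the idea behind the Q*_0 family: each block of 32 weights keeps one float scale plus integer quants. The block struct and function names here are illustrative only; the actual ggml formats (Q4_0, Q5_0, Q5_1, ...) pack 4- and 5-bit values more tightly and differ in detail.

```c
#include <math.h>
#include <stdint.h>

#define QK 32  // block size: weights are quantized 32 at a time

// Illustrative block: one scale ("delta") plus 32 signed 8-bit quants.
typedef struct {
    float  d;       // scale
    int8_t qs[QK];  // quantized values
} block_q8_example;

// Symmetric quantization: map [-amax, amax] onto [-127, 127].
static void quantize_block_q8_example(const float * x, block_q8_example * y) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        amax = fmaxf(amax, fabsf(x[i]));
    }

    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f/d : 0.0f;

    y->d = d;
    for (int i = 0; i < QK; ++i) {
        y->qs[i] = (int8_t) roundf(x[i] * id);
    }
}

// Dequantization is simply qs[i] * d, done on the fly inside the mat-mul kernels.
```

The `quantize` tool added by this commit takes roughly a source ggml model, an output path, and a quantization type such as `q5_0`, and writes a quantized copy of the model file.
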
Georgi Gerganov 05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts) 2023-04-29 19:33:28 +03:00
Georgi Gerganov acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization) 2023-04-29 12:32:28 +03:00
Georgi Gerganov 677ad754a0
ggml : sync latest ggml 2023-04-14 19:20:39 +03:00
Georgi Gerganov 2f889132c6
ggml : sync latest changes from ggml and llama.cpp 2023-04-13 18:53:44 +03:00
Georgi Gerganov 69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different numbers of threads (not
  tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov 4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk-llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
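
The memory-usage commit above is built around "scratch" buffers: intermediate tensors created while a scratch buffer is active are placed in a caller-provided buffer instead of the main ggml context, so the context only has to hold long-lived tensors. A minimal sketch follows, assuming the `ggml_set_scratch()` API introduced around this time; the buffer sizes are arbitrary and the struct fields reflect the ggml headers of this era, so details may differ.

```c
#include "ggml.h"

int main(void) {
    // Context memory only needs to cover the long-lived tensors and graph metadata.
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(params);

    // While this scratch is set, newly created tensors go into the external buffer.
    static char scratch_buf[32*1024*1024];
    ggml_set_scratch(ctx, (struct ggml_scratch) {
        .offs = 0,
        .size = sizeof(scratch_buf),
        .data = scratch_buf,
    });

    // ... build the temporary part of the compute graph here ...

    // Switch back to normal allocation for tensors that must outlive this step.
    ggml_set_scratch(ctx, (struct ggml_scratch) { 0, 0, NULL });

    ggml_free(ctx);
    return 0;
}
```
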
Abitofevrything a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improve WASM performance:
  On a MacBook M1 Pro, I observe 25% faster runs in Firefox and 35% faster in Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
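
The commit above drops the Imath dependency and converts f16 to f32 through a lookup table: a half-precision value has only 16 bits, so every possible result can be precomputed once into a 65536-entry (256 KiB) table, and each conversion becomes a single indexed load. A minimal sketch of that technique is below; it is not ggml's actual table code, and the public helpers the last bullet refers to are `ggml_fp16_to_fp32` / `ggml_fp32_to_fp16` in ggml.h.

```c
#include <stdint.h>
#include <string.h>

// Reference IEEE 754 binary16 -> binary32 conversion, used only to fill the table.
static float half_to_float(uint16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) & 1;
    const uint32_t exp  = (uint32_t)(h >> 10) & 0x1f;
    const uint32_t mant = (uint32_t)(h & 0x3ff);

    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign << 31;                                   // +/- zero
        } else {
            // subnormal half: renormalize into a normal float
            int e = -1;
            uint32_t m = mant;
            do { m <<= 1; e++; } while ((m & 0x400) == 0);
            bits = (sign << 31) | ((uint32_t)(127 - 15 - e) << 23) | ((m & 0x3ff) << 13);
        }
    } else if (exp == 0x1f) {
        bits = (sign << 31) | 0x7f800000u | (mant << 13);        // inf / NaN
    } else {
        bits = (sign << 31) | ((exp - 15 + 127) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static float table_f16_to_f32[1 << 16];  // 65536 entries, 256 KiB

static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); ++i) {
        table_f16_to_f32[i] = half_to_float((uint16_t) i);
    }
}

// After init_f16_table(), conversion is just an indexed load.
static inline float f16_to_f32(uint16_t h) {
    return table_f16_to_f32[h];
}
```
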
Thomas Fitzsimmons 1944e7c33e whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Georgi Gerganov ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Kevin Brothaler e1432dd91a Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON doesn't include FMA unless the build is configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
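
The point of the commit above is a compile-time guard: plain armv7a NEON (Android armeabi-v7a) defines `__ARM_NEON` but not `__ARM_FEATURE_FMA`, so code that uses fused multiply-add intrinsics has to test both macros. A minimal sketch of such a guard follows; the helper name is illustrative, not ggml's actual macro.

```c
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
    #include <arm_neon.h>

    // vfmaq_f32 performs a fused multiply-add: acc + a*b
    static inline float32x4_t madd_f32(float32x4_t acc, float32x4_t a, float32x4_t b) {
        return vfmaq_f32(acc, a, b);
    }
#elif defined(__ARM_NEON)
    #include <arm_neon.h>

    // Fall back to separate multiply and add when FMA is not available (armv7a NEON).
    static inline float32x4_t madd_f32(float32x4_t acc, float32x4_t a, float32x4_t b) {
        return vaddq_f32(acc, vmulq_f32(a, b));
    }
#endif
```
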
Georgi Gerganov 0f11759406
ggml : make more compatible with c99 (#262) 2022-12-16 18:00:12 +02:00
Georgi Gerganov f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
katsu560 83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Georgi Gerganov 3500ce8727
ref #40 : start working on the documentation 2022-11-09 21:41:40 +02:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Borislav Stanimirov 0b45d25151 Building with MSVC 2022-10-11 21:40:46 +03:00
Georgi Gerganov 167324584b wip : rpi4 support 2022-10-05 23:03:46 +03:00
Georgi Gerganov f888c2373d
Flash + language support (ref #2)
- Achieved big performance improvement + memory usage reduction
- Can now translate / transcribe different languages
2022-09-28 21:07:32 +03:00
Georgi Gerganov b0a11594ae
Initial release 2022-09-25 22:13:49 +03:00