Commit graph

142 commits

Author SHA1 Message Date
Georgi Gerganov 429b9785c0
ggml : update WASM SIMD 2023-05-20 20:00:06 +03:00
Georgi Gerganov e410cfc3ce
ggml : sync latest ggml repo
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov aaf0d41c7c
ggml : add AVX dot products 2023-05-14 18:56:46 +03:00
Georgi Gerganov e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov 5974c8facd ggml : fix 32-bit ARM build + quantization 2023-05-02 21:52:26 +03:00
Georgi Gerganov 0bcb64b184
ggml : sync ggml (clBLAST + tensor names) 2023-05-02 21:24:18 +03:00
Georgi Gerganov feac80dd3f
ggml : fix UB (int << 31) 2023-04-30 22:27:30 +03:00
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov 0ccd6746c9
ggml : fix WASM build 2023-04-29 21:37:23 +03:00
Georgi Gerganov d9b550c0a1
ggml : fix 32-bit ARM NEON (#836)
* ggml : add support for 32-bit ARM

* ggml : fix

* ggml : fix
2023-04-29 21:33:33 +03:00
Georgi Gerganov e9b091c92a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:14:09 +03:00
Georgi Gerganov 1f30b99208
ggml : fix WASM build 2023-04-29 20:21:25 +03:00
Georgi Gerganov 05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts) 2023-04-29 19:33:28 +03:00
Georgi Gerganov acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization) 2023-04-29 12:32:28 +03:00
Jhen-Jie Hong ea1f8a50d4
ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764)
* ggml : fix undefined symbol by remove inline handle

* ggml : make own ggml_aligned_malloc function

* ci: add ios/android build
2023-04-15 14:21:58 +03:00
Georgi Gerganov 677ad754a0
ggml : sync latest ggml 2023-04-14 19:20:39 +03:00
novag 463e46338c
ggml : fix q4_1 dot product types (#759)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-14 13:34:20 +03:00
Georgi Gerganov 2f889132c6
ggml : sync latest changes from ggml and llama.cpp 2023-04-13 18:53:44 +03:00
Georgi Gerganov ebef1e8620
ggml : fix WASM build 2023-04-10 23:18:29 +03:00
Georgi Gerganov 69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
  tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov 4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk.llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
fitzsim ae16c21e9c
whisper : PPC64 big-endian support (#398)
* ggml : set cache line size to 128 on POWER9

* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov 1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks 2023-01-18 20:31:46 +02:00
Georgi Gerganov 4ef3398e8f
ggml : remove obsolete zeroing + comment fixes (#390) 2023-01-08 20:21:03 +02:00
Abitofevrything 8d7b29cedd
ggml : correct behaviour of ggml_vec_sum_f32 (#390) 2023-01-08 20:06:09 +02:00
Georgi Gerganov 52a3e0c92a
ggml : improve vec_dot_f16 unrolling in flash_attn_f16 2023-01-08 11:41:18 +02:00
Georgi Gerganov f30b5d322c
ggml : fix bug in new soft max computation 2023-01-07 21:00:07 +02:00
Georgi Gerganov d347a59a5f
ggml : when using BLAS start only 1 CPU thread 2023-01-07 19:48:56 +02:00
Georgi Gerganov 6394c906af
ggml : fix running tasks with variable number of threads 2023-01-07 19:20:18 +02:00
Georgi Gerganov 74ffa14e1d
ggml : unroll ggml_vec_dot_f16 in ggml_compute_forward_flash_attn_f16 2023-01-07 19:19:40 +02:00
Georgi Gerganov 65fdcbbbbb
whisper : revert accidental MB change 2023-01-07 16:18:21 +02:00
Georgi Gerganov d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll 2023-01-07 16:16:42 +02:00
Georgi Gerganov d51fc3ee0a
ggml : use vDSP_sve and vDSP_maxv from Accelerate 2023-01-07 16:10:16 +02:00
Georgi Gerganov f82a7dd019
ggml : make gcc happy (minor) 2023-01-07 09:34:39 +02:00
Abitofevrything a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons 1944e7c33e whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons 49a8dd6732 ggml : reorganize POWER9 ppc64le SIMD code 2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons 8c7f642286 ggml : change f16 load and store macro arguments 2023-01-05 23:53:00 +02:00
Georgi Gerganov 0a0cfa7985
ggml : add void to argument-less functions 2023-01-05 21:40:38 +02:00
Georgi Gerganov d51c5eb906
ggml : define MIN / MAX only if not defined (minor) 2023-01-05 21:16:52 +02:00
Thomas Fitzsimmons 424c410c42 ggml : improve f16 acceleration for POWER9 ppc64le 2022-12-31 10:02:19 +02:00
Georgi Gerganov 4e0b2069e7
ggml : barrier refactor + static functions 2022-12-28 19:00:53 +02:00
Georgi Gerganov ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Georgi Gerganov 7282e2109e
ggml : use vaddvq_f32 for slightly more efficient reduce 2022-12-23 13:48:19 +02:00
Thomas Fitzsimmons 466ceebb78 ggml : add f16 acceleration for POWER9 ppc64le 2022-12-23 13:23:58 +02:00
Andy Maloney 493d94130d
ggml : make consts static (#317)
These shouldn't be able to be referenced outside the compilation unit.
2022-12-23 11:05:27 +02:00
Andy Maloney fa463313ad
minor : small code cleanups (#302)
* Small code cleanups

- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""

* minor : switch case always break

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-22 17:06:19 +02:00
Kevin Brothaler e1432dd91a Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
katsu560 419b8a6402 Add AVX,AVX2 support for ggml_vec_scale_f32 2022-12-17 19:40:10 +02:00
Georgi Gerganov a7047b2a28
ggml : implement ggml_compute_forward_dup_f16() special cases 2022-12-16 21:50:41 +02:00
Georgi Gerganov 0f11759406
ggml : make more compatible with c99 (#262) 2022-12-16 18:00:12 +02:00
Georgi Gerganov f66ac6dc4f
ggml : fix indentation 2022-12-13 23:09:21 +02:00
Georgi Gerganov 9955fa4ed7
ggml : make compatible with c99 (#262) 2022-12-13 23:07:49 +02:00
Roland Rabien e70d47baab
Remove C++20 requirement (#257)
* Remove C++20 requirement

* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
Georgi Gerganov 3b1aacbe6d talk : talk with AI in the terminal 2022-12-10 16:51:58 +02:00
Georgi Gerganov 50a061b313
ggml : add alternative cblas_sgemm call 2022-12-08 23:48:04 +02:00
Al Hoang 04a16bbf11 fix compilation on haiku 2022-12-08 09:20:57 +02:00
Georgi Gerganov b6597539f9
ggml : fix typo in previous commit 2022-12-06 22:12:57 +02:00
Georgi Gerganov 9a4b7a916e
ggml : use macros to inline FP16 <-> FP32 conversions 2022-12-06 22:09:26 +02:00
Georgi Gerganov f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
katsu560 35b40a93b9 add fp16/fp32 convert intrinsics 2022-12-06 21:44:24 +02:00
Georgi Gerganov 061fc81bd6
ggml : remove inline specifier from fp16 <-> fp32 converters 2022-12-01 22:15:12 +02:00
Georgi Gerganov 388e9f79ad
ggml : fix the fix 2022-11-23 22:40:06 +02:00
Georgi Gerganov 35cd29ce1f
ggml : fix cross-compile Linux -> Window with mingw (#168) 2022-11-23 22:28:41 +02:00
katsu560 804f36aa2c ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16 2022-11-23 22:16:33 +02:00
katsu560 83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Georgi Gerganov 2065572a11 ggml : fix Windows build 2022-11-20 22:47:03 +02:00
boolemancer 0bfe728b84 Fix the Windows pthread_create shim
The current implementation doesn't actually set the out parameter,
and it returns 0 on failure instead of on success.
2022-11-08 15:02:32 +02:00
Georgi Gerganov 75171c2b79
ggml : multi-thread the ggml_add operator 2022-11-03 20:53:44 +02:00
Georgi Gerganov 137321915f
ggml : fix the check for NEON support (#7)
Was using the wrong preprocessor macro
2022-11-02 17:52:24 +02:00
Syed Jafri 24cd12f647
Cross compilation (#121)
* Cross compile windows

* set env properly

* rm log

* fix review

* Add back space
2022-11-02 08:46:49 +02:00
Mikhail Grigorev 8dac3c6e10 Fixed sched_yield 2022-10-30 21:38:18 +02:00
Mikhail Grigorev 6417e59aad Implemenated sched_yield function for Windows 2022-10-30 21:38:18 +02:00
Georgi Gerganov e5044f87d9 ggml : fix barrier 2022-10-29 19:37:19 +03:00
Georgi Gerganov a272f10b2e ggml : fix thread-safety of ggml_init and ggml_free 2022-10-29 19:37:19 +03:00
Georgi Gerganov fbd513b813 Add OpenBLAS support
Supported via CMake - just add:

cmake .. -DWHISPER_SUPPORT_OPENBLAS=ON

On Ubuntu, you have to install the library like this:

apt install libopenblas-dev

Unfortunately, I don't observe any benefit compared to the
original AVX2 + FP16 implementation. Maybe I'm missing something
2022-10-27 18:31:49 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov c6710efde2 refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00
Georgi Gerganov db460b78ff wip : WASM 128-bit SIMD support 2022-10-22 18:54:01 +03:00
Georgi Gerganov e905c6f827 wip : initial WASM port
Works but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using "tiny.en" model
2022-10-22 18:54:01 +03:00
Georgi Gerganov 19817711b4
Add reference to FP16 repo 2022-10-18 19:48:34 +03:00
Georgi Gerganov e36aabe00d
Correct implementation of FP16 GELU
Can toggle it via the GGML_GELU_FP16 macro
2022-10-18 18:42:08 +03:00
Georgi Gerganov 91632eb6ea Revert GELU change
Seems it does not work on x86 for some reason
2022-10-18 00:45:08 +03:00
Georgi Gerganov 72d967bce4 Use Accelerate framework on Apple silicon
Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro)

Also various extra optimizations:

- Multi-threaded NORM operator
- Faster GELU via F16 cast
2022-10-18 00:12:51 +03:00
Georgi Gerganov 0e858f080d
close #56 : build on FreeBSD
Thanks to @abelbabel for the contribution
2022-10-17 18:10:16 +03:00
Borislav Stanimirov 0b45d25151 Building with MSVC 2022-10-11 21:40:46 +03:00
lnyan 4bbb8a587b Add MinGW support 2022-10-09 22:26:37 +08:00
Georgi Gerganov e29a5dacc6
ref #11, #18, #26 : fix CACHE_LINE_SIZE constant 2022-10-07 21:56:44 +03:00
Georgi Gerganov 167324584b wip : rpi4 support 2022-10-05 23:03:46 +03:00
Georgi Gerganov f888c2373d
Flash + language support (ref #2)
- Achieved big performance improvement + memory usage reduction
- Can now translate / transcribe different languages
2022-09-28 21:07:32 +03:00
Georgi Gerganov b0a11594ae
Initial release 2022-09-25 22:13:49 +03:00