Commit graph

97 commits

Author SHA1 Message Date
alonfaraj 3998465721
ci : more platforms coverage (#1101)
* add multi platform

* add image name

* fix

* fix /bin/sh path

* add missing \

* add all platforms for check

* remove platforms

* remove s390x

* - add arm v6
- format run cmd

* remove arm v6

* - bump checkout to v3
- use setup emsdk action
- add arch to all ubuntu jobs

* mymindstorm/setup-emsdk to v12

* add missing QEMU step

* add fail-fast: false for debug

* add freebsd

* remark all jobs except freebsd for test

* add sudo

* enable all tests again

* format

* check __AVX__ support before include immintrin.h

* try auto detect flag by cmake

* fix check for immintrin.h

* fix include check for immintrin.h

* Remove all platforms for sanitizer build except amd64

We have no clue why they failed.

---------

Co-authored-by: Alon Faraj <alon.faraj@mapcore.com>
2023-07-16 23:00:34 +03:00
Georgi Gerganov 8ba42095c5
Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)"
This reverts commit 3f7a03ebe3.
2023-07-02 21:53:52 +03:00
Georgi Gerganov d6509bf78d
ggml : sync latest repo (mostly refactoring changes) 2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk 3f7a03ebe3
ggml : do not use _GNU_SOURCE gratuitously (#1027)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

This is an attempt at fixing macOS build error coming from SDL2 relying
on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.
2023-06-25 16:34:30 +03:00
Georgi Gerganov 5feb0dffba
ggml : sync latest ggml lib 2023-06-25 14:30:44 +03:00
Georgi Gerganov 429b9785c0
ggml : update WASM SIMD 2023-05-20 20:00:06 +03:00
Georgi Gerganov e410cfc3ce
ggml : sync latest ggml repo
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov aaf0d41c7c
ggml : add AVX dot products 2023-05-14 18:56:46 +03:00
Georgi Gerganov e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov 5974c8facd ggml : fix 32-bit ARM build + quantization 2023-05-02 21:52:26 +03:00
Georgi Gerganov 0bcb64b184
ggml : sync ggml (clBLAST + tensor names) 2023-05-02 21:24:18 +03:00
Georgi Gerganov feac80dd3f
ggml : fix UB (int << 31) 2023-04-30 22:27:30 +03:00
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov 0ccd6746c9
ggml : fix WASM build 2023-04-29 21:37:23 +03:00
Georgi Gerganov d9b550c0a1
ggml : fix 32-bit ARM NEON (#836)
* ggml : add support for 32-bit ARM

* ggml : fix

* ggml : fix
2023-04-29 21:33:33 +03:00
Georgi Gerganov e9b091c92a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:14:09 +03:00
Georgi Gerganov 1f30b99208
ggml : fix WASM build 2023-04-29 20:21:25 +03:00
Georgi Gerganov 05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts) 2023-04-29 19:33:28 +03:00
Georgi Gerganov acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization) 2023-04-29 12:32:28 +03:00
Jhen-Jie Hong ea1f8a50d4
ggml, ci : fix build on whisper.android (ARM_NEON) + add CI (#764)
* ggml : fix undefined symbol by remove inline handle

* ggml : make own ggml_aligned_malloc function

* ci: add ios/android build
2023-04-15 14:21:58 +03:00
Georgi Gerganov 677ad754a0
ggml : sync latest ggml 2023-04-14 19:20:39 +03:00
novag 463e46338c
ggml : fix q4_1 dot product types (#759)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-14 13:34:20 +03:00
Georgi Gerganov 2f889132c6
ggml : sync latest changes from ggml and llama.cpp 2023-04-13 18:53:44 +03:00
Georgi Gerganov ebef1e8620
ggml : fix WASM build 2023-04-10 23:18:29 +03:00
Georgi Gerganov 69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
  tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov 4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk.llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
fitzsim ae16c21e9c
whisper : PPC64 big-endian support (#398)
* ggml : set cache line size to 128 on POWER9

* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov 1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks 2023-01-18 20:31:46 +02:00
Georgi Gerganov 4ef3398e8f
ggml : remove obsolete zeroing + comment fixes (#390) 2023-01-08 20:21:03 +02:00
Abitofevrything 8d7b29cedd
ggml : correct behaviour of ggml_vec_sum_f32 (#390) 2023-01-08 20:06:09 +02:00
Georgi Gerganov 52a3e0c92a
ggml : improve vec_dot_f16 unrolling in flash_attn_f16 2023-01-08 11:41:18 +02:00
Georgi Gerganov f30b5d322c
ggml : fix bug in new soft max computation 2023-01-07 21:00:07 +02:00
Georgi Gerganov d347a59a5f
ggml : when using BLAS start only 1 CPU thread 2023-01-07 19:48:56 +02:00
Georgi Gerganov 6394c906af
ggml : fix running tasks with variable number of threads 2023-01-07 19:20:18 +02:00
Georgi Gerganov 74ffa14e1d
ggml : unroll ggml_vec_dot_f16 in ggml_compute_forward_flash_attn_f16 2023-01-07 19:19:40 +02:00
Georgi Gerganov 65fdcbbbbb
whisper : revert accidental MB change 2023-01-07 16:18:21 +02:00
Georgi Gerganov d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll 2023-01-07 16:16:42 +02:00
Georgi Gerganov d51fc3ee0a
ggml : use vDSP_sve and vDSP_maxv from Accelerate 2023-01-07 16:10:16 +02:00
Georgi Gerganov f82a7dd019
ggml : make gcc happy (minor) 2023-01-07 09:34:39 +02:00
Abitofevrything a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons 1944e7c33e whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons 49a8dd6732 ggml : reorganize POWER9 ppc64le SIMD code 2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons 8c7f642286 ggml : change f16 load and store macro arguments 2023-01-05 23:53:00 +02:00
Georgi Gerganov 0a0cfa7985
ggml : add void to argument-less functions 2023-01-05 21:40:38 +02:00
Georgi Gerganov d51c5eb906
ggml : define MIN / MAX only if not defined (minor) 2023-01-05 21:16:52 +02:00
Thomas Fitzsimmons 424c410c42 ggml : improve f16 acceleration for POWER9 ppc64le 2022-12-31 10:02:19 +02:00
Georgi Gerganov 4e0b2069e7
ggml : barrier refactor + static functions 2022-12-28 19:00:53 +02:00
Georgi Gerganov ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Georgi Gerganov 7282e2109e
ggml : use vaddvq_f32 for slightly more efficient reduce 2022-12-23 13:48:19 +02:00