15 Commits (bb6b54a03d442833dcc34fda6c09d585a112bbcf)

Author SHA1 Message Date
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
Abitofevrything a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons 1944e7c33e whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Georgi Gerganov ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Kevin Brothaler e1432dd91a Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
Georgi Gerganov 0f11759406
ggml : make more compatible with c99 (#262) 2022-12-16 18:00:12 +02:00
Georgi Gerganov f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
katsu560 83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Georgi Gerganov 3500ce8727
ref #40 : start working on the documentation 2022-11-09 21:41:40 +02:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Borislav Stanimirov 0b45d25151 Building with MSVC 2022-10-11 21:40:46 +03:00
Georgi Gerganov 167324584b wip : rpi4 support 2022-10-05 23:03:46 +03:00
Georgi Gerganov f888c2373d
Flash + language support (ref #2)
- Achieved big performance improvement + memory usage reduction
- Can now translate / transcribe different languages
2022-09-28 21:07:32 +03:00
Georgi Gerganov b0a11594ae
Initial release 2022-09-25 22:13:49 +03:00