Commit Graph

25 Commits (3f7436e8a09611931709b29f5c507245c8c1d7a4)

Author SHA1 Message Date
Georgi Gerganov e410cfc3ce
ggml : sync latest ggml repo
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov e693074aa6
ggml : sync latest ggml
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov 0bcb64b184
ggml : sync ggml (clBLAST + tensor names) 2023-05-02 21:24:18 +03:00
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
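
The quantization commit above stores model weights as small integer blocks instead of 32-bit floats. Below is a minimal sketch of the idea behind the Q*_0 family: each block of 32 weights keeps one float scale plus integer quants. The block struct and function names here are illustrative only; the actual ggml formats (Q4_0, Q5_0, Q5_1, ...) pack 4- and 5-bit values more tightly and differ in detail.

```c
#include <math.h>
#include <stdint.h>

#define QK 32  // block size: weights are quantized 32 at a time

// Illustrative block: one scale ("delta") plus 32 signed 8-bit quants.
typedef struct {
    float  d;       // scale
    int8_t qs[QK];  // quantized values
} block_q8_example;

// Symmetric quantization: map [-amax, amax] onto [-127, 127].
static void quantize_block_q8_example(const float * x, block_q8_example * y) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        amax = fmaxf(amax, fabsf(x[i]));
    }

    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f/d : 0.0f;

    y->d = d;
    for (int i = 0; i < QK; ++i) {
        y->qs[i] = (int8_t) roundf(x[i] * id);
    }
}

// Dequantization is simply qs[i] * d, done on the fly inside the mat-mul kernels.
```

The `quantize` tool added by this commit takes roughly a source ggml model, an output path, and a quantization type such as `q5_0`, and writes a quantized copy of the model file.
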
Georgi Gerganov 05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts) 2023-04-29 19:33:28 +03:00
Georgi Gerganov acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization) 2023-04-29 12:32:28 +03:00
Georgi Gerganov 677ad754a0
ggml : sync latest ggml 2023-04-14 19:20:39 +03:00
Georgi Gerganov 2f889132c6
ggml : sync latest changes from ggml and llama.cpp 2023-04-13 18:53:44 +03:00
Georgi Gerganov 69b8503935
ggml : backport llama.cpp updates (close #709)
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different numbers of threads (not
  tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov 4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk-llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
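
The memory-usage commit above is built around "scratch" buffers: intermediate tensors created while a scratch buffer is active are placed in a caller-provided buffer instead of the main ggml context, so the context only has to hold long-lived tensors. A minimal sketch follows, assuming the `ggml_set_scratch()` API introduced around this time; the buffer sizes are arbitrary and the struct fields reflect the ggml headers of this era, so details may differ.

```c
#include "ggml.h"

int main(void) {
    // Context memory only needs to cover the long-lived tensors and graph metadata.
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(params);

    // While this scratch is set, newly created tensors go into the external buffer.
    static char scratch_buf[32*1024*1024];
    ggml_set_scratch(ctx, (struct ggml_scratch) {
        .offs = 0,
        .size = sizeof(scratch_buf),
        .data = scratch_buf,
    });

    // ... build the temporary part of the compute graph here ...

    // Switch back to normal allocation for tensors that must outlive this step.
    ggml_set_scratch(ctx, (struct ggml_scratch) { 0, 0, NULL });

    ggml_free(ctx);
    return 0;
}
```
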
Abitofevrything a62170c656
ggml : add SSE3 and fp16 conversion lookup table (#368)
* Improve WASM performance:
  On a MacBook M1 Pro, I observe 25% faster runs in Firefox and 35% faster in Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
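
The commit above drops the Imath dependency and converts f16 to f32 through a lookup table: a half-precision value has only 16 bits, so every possible result can be precomputed once into a 65536-entry (256 KiB) table, and each conversion becomes a single indexed load. A minimal sketch of that technique is below; it is not ggml's actual table code, and the public helpers the last bullet refers to are `ggml_fp16_to_fp32` / `ggml_fp32_to_fp16` in ggml.h.

```c
#include <stdint.h>
#include <string.h>

// Reference IEEE 754 binary16 -> binary32 conversion, used only to fill the table.
static float half_to_float(uint16_t h) {
    const uint32_t sign = (uint32_t)(h >> 15) & 1;
    const uint32_t exp  = (uint32_t)(h >> 10) & 0x1f;
    const uint32_t mant = (uint32_t)(h & 0x3ff);

    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign << 31;                                   // +/- zero
        } else {
            // subnormal half: renormalize into a normal float
            int e = -1;
            uint32_t m = mant;
            do { m <<= 1; e++; } while ((m & 0x400) == 0);
            bits = (sign << 31) | ((uint32_t)(127 - 15 - e) << 23) | ((m & 0x3ff) << 13);
        }
    } else if (exp == 0x1f) {
        bits = (sign << 31) | 0x7f800000u | (mant << 13);        // inf / NaN
    } else {
        bits = (sign << 31) | ((exp - 15 + 127) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static float table_f16_to_f32[1 << 16];  // 65536 entries, 256 KiB

static void init_f16_table(void) {
    for (uint32_t i = 0; i < (1u << 16); ++i) {
        table_f16_to_f32[i] = half_to_float((uint16_t) i);
    }
}

// After init_f16_table(), conversion is just an indexed load.
static inline float f16_to_f32(uint16_t h) {
    return table_f16_to_f32[h];
}
```
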
Thomas Fitzsimmons 1944e7c33e whisper : document POWER VSX support 2023-01-05 23:53:00 +02:00
Georgi Gerganov ac521a566e
ggml : simplify the SIMD code (#324)
* ggml : simplify the SIMD code

* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Kevin Brothaler e1432dd91a Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON doesn't include FMA unless the build is configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
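
The point of the commit above is a compile-time guard: plain armv7a NEON (Android armeabi-v7a) defines `__ARM_NEON` but not `__ARM_FEATURE_FMA`, so code that uses fused multiply-add intrinsics has to test both macros. A minimal sketch of such a guard follows; the helper name is illustrative, not ggml's actual macro.

```c
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
    #include <arm_neon.h>

    // vfmaq_f32 performs a fused multiply-add: acc + a*b
    static inline float32x4_t madd_f32(float32x4_t acc, float32x4_t a, float32x4_t b) {
        return vfmaq_f32(acc, a, b);
    }
#elif defined(__ARM_NEON)
    #include <arm_neon.h>

    // Fall back to separate multiply and add when FMA is not available (armv7a NEON).
    static inline float32x4_t madd_f32(float32x4_t acc, float32x4_t a, float32x4_t b) {
        return vaddq_f32(acc, vmulq_f32(a, b));
    }
#endif
```
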
Georgi Gerganov 0f11759406
ggml : make more compatible with c99 (#262) 2022-12-16 18:00:12 +02:00
Georgi Gerganov f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
katsu560 83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Georgi Gerganov 3500ce8727
ref #40 : start working on the documentation 2022-11-09 21:41:40 +02:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Borislav Stanimirov 0b45d25151 Building with MSVC 2022-10-11 21:40:46 +03:00
Georgi Gerganov 167324584b wip : rpi4 support 2022-10-05 23:03:46 +03:00
Georgi Gerganov f888c2373d
Flash + language support (ref #2)
- Achieved big performance improvement + memory usage reduction
- Can now translate / transcribe different languages
2022-09-28 21:07:32 +03:00
Georgi Gerganov b0a11594ae
Initial release 2022-09-25 22:13:49 +03:00