whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	f3ee4a9673	whisper : reduce memory usage during inference (#431) * ggml : add "scratch" buffer support * ggml : support for scratch ring-buffer * ggml : bug fix in ggml_repeat() * ggml : error on scratch buffer overflow * whisper : use scratch buffers during inference (base model only) * whisper : update memory usage for all models * whisper : fix encoder memory usage * whisper : use whisper_context functions instead of macros * whisper : fix FF + remove it from README * ggml : reuse ggml_new_i32 * ggml : refactor the scratch buffer storage * whisper : reorder scratch buffers in the decoder * main : add option to disable temp fallback * Update README.md	2023-02-04 09:45:52 +02:00
fitzsim	ae16c21e9c	whisper : PPC64 big-endian support (#398) * ggml : set cache line size to 128 on POWER9 * whisper : add PPC64 big endian support	2023-01-23 20:48:10 +02:00
Georgi Gerganov	1290fc6457	bench : add memcpy and ggml_mul_mat benchmarks	2023-01-18 20:31:46 +02:00
Georgi Gerganov	4ef3398e8f	ggml : remove obsolete zeroing + comment fixes (#390)	2023-01-08 20:21:03 +02:00
Abitofevrything	8d7b29cedd	ggml : correct behaviour of ggml_vec_sum_f32 (#390)	2023-01-08 20:06:09 +02:00
Georgi Gerganov	52a3e0c92a	ggml : improve vec_dot_f16 unrolling in flash_attn_f16	2023-01-08 11:41:18 +02:00
Georgi Gerganov	f30b5d322c	ggml : fix bug in new soft max computation	2023-01-07 21:00:07 +02:00
Georgi Gerganov	d347a59a5f	ggml : when using BLAS start only 1 CPU thread	2023-01-07 19:48:56 +02:00
Georgi Gerganov	6394c906af	ggml : fix running tasks with variable number of threads	2023-01-07 19:20:18 +02:00
Georgi Gerganov	74ffa14e1d	ggml : unroll ggml_vec_dot_f16 in ggml_compute_forward_flash_attn_f16	2023-01-07 19:19:40 +02:00
Georgi Gerganov	65fdcbbbbb	whisper : revert accidental MB change	2023-01-07 16:18:21 +02:00
Georgi Gerganov	d61d55cd4b	ggml : speed-up soft max via Accelerate + unroll	2023-01-07 16:16:42 +02:00
Georgi Gerganov	d51fc3ee0a	ggml : use vDSP_sve and vDSP_maxv from Accelerate	2023-01-07 16:10:16 +02:00
Georgi Gerganov	f82a7dd019	ggml : make gcc happy (minor)	2023-01-07 09:34:39 +02:00
Abitofevrything	a62170c656	ggml : add SSE3 and fp16 conversion lookup table (#368) * Improves WASM performance: On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome * Add support for SSE3 SIMD * Add SSE3 to system information * Add Imath support for fp16-fp32 conversions * Add Imath to system information * Wrap Imath calls to avoid static function warnings * Drop Imath; Add lookup table for f16 -> f32 conversions * Remove TODO comments * Update SSE3 to new macro arguments * Correct updated macro definitions * Prefer static inline where possible * ggml : static inlines + add public f16 <-> f32 conversions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons	1944e7c33e	whisper : document POWER VSX support	2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons	49a8dd6732	ggml : reorganize POWER9 ppc64le SIMD code	2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons	8c7f642286	ggml : change f16 load and store macro arguments	2023-01-05 23:53:00 +02:00
Georgi Gerganov	0a0cfa7985	ggml : add void to argument-less functions	2023-01-05 21:40:38 +02:00
Georgi Gerganov	d51c5eb906	ggml : define MIN / MAX only if not defined (minor)	2023-01-05 21:16:52 +02:00
Thomas Fitzsimmons	424c410c42	ggml : improve f16 acceleration for POWER9 ppc64le	2022-12-31 10:02:19 +02:00
Georgi Gerganov	4e0b2069e7	ggml : barrier refactor + static functions	2022-12-28 19:00:53 +02:00
Georgi Gerganov	ac521a566e	ggml : simplify the SIMD code (#324) * ggml : simplify the SIMD code * ggml : generic reduce for all register sizes + comments	2022-12-24 10:22:28 +02:00
Georgi Gerganov	7282e2109e	ggml : use vaddvq_f32 for slightly more efficient reduce	2022-12-23 13:48:19 +02:00
Thomas Fitzsimmons	466ceebb78	ggml : add f16 acceleration for POWER9 ppc64le	2022-12-23 13:23:58 +02:00
Andy Maloney	493d94130d	ggml : make consts static (#317) These shouldn't be able to be referenced outside the compilation unit.	2022-12-23 11:05:27 +02:00
Andy Maloney	fa463313ad	minor : small code cleanups (#302) * Small code cleanups - fix indentation - remove extra semicolons - remove extra break after returns in case statements - remove unnecessary call to .data() on string - use empty() instead of checking size() - no need to check for nullptr before free - remove unnecessary initialization of string to "" * minor : switch case always break Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-12-22 17:06:19 +02:00
Kevin Brothaler	e1432dd91a	Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a. Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks. * Also removed ABI filter from Android project.	2022-12-22 16:47:54 +02:00
katsu560	419b8a6402	Add AVX,AVX2 support for ggml_vec_scale_f32	2022-12-17 19:40:10 +02:00
Georgi Gerganov	a7047b2a28	ggml : implement ggml_compute_forward_dup_f16() special cases	2022-12-16 21:50:41 +02:00
Georgi Gerganov	0f11759406	ggml : make more compatible with c99 (#262)	2022-12-16 18:00:12 +02:00
Georgi Gerganov	f66ac6dc4f	ggml : fix indentation	2022-12-13 23:09:21 +02:00
Georgi Gerganov	9955fa4ed7	ggml : make compatible with c99 (#262)	2022-12-13 23:07:49 +02:00
Roland Rabien	e70d47baab	Remove C++20 requirement (#257) * Remove C++20 requirement * Roll back C features not supported in VS2017	2022-12-11 20:03:07 +02:00
Georgi Gerganov	3b1aacbe6d	talk : talk with AI in the terminal	2022-12-10 16:51:58 +02:00
Georgi Gerganov	50a061b313	ggml : add alternative cblas_sgemm call	2022-12-08 23:48:04 +02:00
Al Hoang	04a16bbf11	fix compilation on haiku	2022-12-08 09:20:57 +02:00
Georgi Gerganov	b6597539f9	ggml : fix typo in previous commit	2022-12-06 22:12:57 +02:00
Georgi Gerganov	9a4b7a916e	ggml : use macros to inline FP16 <-> FP32 conversions	2022-12-06 22:09:26 +02:00
Georgi Gerganov	f8ec718b76	ggml : add F16C CPU flag check	2022-12-06 21:56:56 +02:00
katsu560	35b40a93b9	add fp16/fp32 convert intrinsics	2022-12-06 21:44:24 +02:00
Georgi Gerganov	061fc81bd6	ggml : remove inline specifier from fp16 <-> fp32 converters	2022-12-01 22:15:12 +02:00
Georgi Gerganov	388e9f79ad	ggml : fix the fix	2022-11-23 22:40:06 +02:00
Georgi Gerganov	35cd29ce1f	ggml : fix cross-compile Linux -> Window with mingw (#168)	2022-11-23 22:28:41 +02:00
katsu560	804f36aa2c	ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16	2022-11-23 22:16:33 +02:00
katsu560	83456076f0	add AVX support	2022-11-23 22:16:33 +02:00
Georgi Gerganov	2065572a11	ggml : fix Windows build	2022-11-20 22:47:03 +02:00
boolemancer	0bfe728b84	Fix the Windows pthread_create shim The current implementation doesn't actually set the out parameter, and it returns 0 on failure instead of on success.	2022-11-08 15:02:32 +02:00
Georgi Gerganov	75171c2b79	ggml : multi-thread the ggml_add operator	2022-11-03 20:53:44 +02:00
Georgi Gerganov	137321915f	ggml : fix the check for NEON support (#7) Was using the wrong preprocessor macro	2022-11-02 17:52:24 +02:00

71 Commits (bb6b54a03d442833dcc34fda6c09d585a112bbcf)