whisper.cpp

bobqianic 7e54df414e whisper : significantly improve the inference quality (#1148)	2023-08-27 19:51:33 +03:00

* Fix MSVC compile error C3688
  Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC.
* Significantly improve inference quality
  In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.
* Significantly improve inference quality
  At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.
* Addressed a few minor issues
  Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`.
* Significantly improve inference quality
  Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information.
* Add annotation and performance improvement
* Calculate FFT only when fft_in are not all zero
* Some minor performance improvement
* Fixed a bug impacting inference quality
* The first version after all the analysis is completed.
* Fix some bugs and add debug mode
* Fixed several bugs
* Temporarily disable speed-up mode and add debug mode.
* Add debug mode
* Disable speed-up mode and add debug mode
* Fix CI error (#1)
* Fix error
* Fix error
* Fixed several bugs including [BLANK_AUDIO] problem
* Remove Hard-coded hann window
* Some Final Fix (#2)
* Fix error
* Fix error
* Probably the last commit
* Probably the last commit
* whisper : minor coding style changes
* whisper : remove debug from public API

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
addon.node	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
bench	bench : fix Windows linkage by moving ggml benches in whisper lib ..	2023-01-18 21:19:50 +02:00
bench.wasm	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
command	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)"	2023-07-02 21:53:52 +03:00
command.wasm	examples : fix + refactor Levenshtein distance	2023-04-30 19:12:49 +03:00
main	whisper : significantly improve the inference quality (#1148)	2023-08-27 19:51:33 +03:00
quantize	quantize : fix load vocab crash when len is 128 (#1160)	2023-08-06 11:04:42 +03:00
stream	examples : add tinydiarization support for streaming (#1137)	2023-08-03 11:24:07 +03:00
stream.wasm	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
talk	Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)"	2023-07-02 21:53:52 +03:00
talk-llama	talk-llama : fix new rope interface	2023-07-03 19:24:01 +03:00
talk.wasm	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
whisper.android	whisper.android : migrate from ndk-build to CMake (#1204)	2023-08-27 19:35:16 +03:00
whisper.nvim	examples : add Vim plugin (#1131)	2023-07-25 18:34:23 +03:00
whisper.objc	whisper.objc : enable Core ML in example & fix segmentation fault (#910)	2023-05-14 09:47:02 +03:00
whisper.swiftui	whisper.swiftui : update README.md (#682)	2023-03-29 23:04:38 +03:00
whisper.wasm	whisper : add memory sizes for Q8_0 (close #846)	2023-05-01 10:03:56 +03:00
CMakeLists.txt	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
common-ggml.cpp	ggml : sync latest ggml lib	2023-06-25 14:30:44 +03:00
common-ggml.h	whisper : add integer quantization support (#540)	2023-04-30 18:51:57 +03:00
common-sdl.cpp	examples : refactor in order to reuse code and reduce duplication (#482)	2023-02-15 19:28:10 +02:00
common-sdl.h	examples : refactor in order to reuse code and reduce duplication (#482)	2023-02-15 19:28:10 +02:00
common.cpp	whisper : minor OpenVINO refactoring (#1037)	2023-07-04 20:28:27 +03:00
common.h	whisper : minor OpenVINO refactoring (#1037)	2023-07-04 20:28:27 +03:00
dr_wav.h	refactoring : move main + stream in examples + other stuff	2022-10-25 20:53:48 +03:00
generate-karaoke.sh	minor : add comment for using "generate_karaoke.sh"	2022-11-26 10:22:42 +02:00
helpers.js	whisper : add integer quantization support (#540 )	2023-04-30 18:51:57 +03:00
livestream.sh	livestream.sh : run main with model arg instead of default (#453)	2023-01-27 01:13:31 +02:00
twitch.sh	twitch.sh : various fixes and polishing	2022-12-08 19:20:04 +02:00
yt-wsp.sh	yt-wsp.sh : print help on empty args	2023-02-18 09:42:31 +02:00