whisper.cpp/examples
bobqianic 7e54df414e
whisper : significantly improve the inference quality (#1148)
* Fix MSVC compile error C3688

Instead of simply using `add_compile_options(/utf-8)` to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing `/utf-8` to NVCC.

* Significantly improve inference quality

In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference.

* Significantly improve inference quality

At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue.

* Addressed a few minor issues

Fixed `fft_out` growing continuously. Resolved the fallback caused by using `break` instead of setting `fft_in[j] = 0`.

* Significantly improve inference quality 

Thanks for your patience, everyone. It's finally sorted out. The right half of the FFT spectrum is now flipped over onto the left half, the amplitudes at corresponding positions are added together (the left half must be shifted by one position), and the sum is averaged. `fft_out[0]` is no longer discarded, so the limited space carries more information.

* Add annotation and performance improvement

* Calculate FFT only when `fft_in` is not all zeros

* Some minor performance improvement

* Fixed a bug impacting inference quality

* The first version after all the analysis is completed.

* Fix some bugs and add debug mode

* Fixed several bugs

* Temporarily disable speed-up mode and add debug mode.

* Add debug mode

* Disable speed-up mode and add debug mode

* Fix CI error (#1)

* Fix error

* Fix error

* Fixed several bugs including [BLANK_AUDIO] problem

* Remove hard-coded Hann window

* Some final fixes (#2)

* Fix error

* Fix error

* Probably the last commit

* Probably the last commit

* whisper : minor coding style changes

* whisper : remove debug from public API

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-27 19:51:33 +03:00
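The commit message above traces one quality problem to an out-of-bounds read while computing complex moduli in `log_mel_spectrogram_worker_thread`. A minimal sketch of an in-bounds version follows. It assumes the FFT output is stored interleaved as `[re0, im0, re1, im1, ...]`, and both that layout and the helper name `power_spectrum` are illustrative assumptions, not the whisper.cpp code. The key point is that an n-point FFT of real input has only n/2 + 1 distinct bins, so the loop can stop there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: compute |X[k]|^2 for the non-redundant bins
// k = 0..n/2 of an n-point FFT of real input. `fft_out` is assumed to
// hold n complex values interleaved as [re0, im0, re1, im1, ...].
std::vector<float> power_spectrum(const std::vector<float>& fft_out, std::size_t n) {
    assert(n % 2 == 0);               // sketch assumes an even FFT size
    assert(fft_out.size() >= 2 * n);  // interleaved complex buffer
    const std::size_t n_bins = n / 2 + 1;  // DC through Nyquist inclusive
    std::vector<float> power(n_bins);
    for (std::size_t k = 0; k < n_bins; ++k) {
        const float re = fft_out[2 * k + 0];
        const float im = fft_out[2 * k + 1];
        // Highest index read is 2*(n/2) + 1 = n + 1, well inside 2n.
        power[k] = re * re + im * im;
    }
    return power;
}
```

For example, with n = 4 the helper returns 3 values and never reads past index 5 of the buffer.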
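The folding described in the commit message relies on a property of real signals. Their DFT is conjugate-symmetric: X[n - k] is the complex conjugate of X[k], so |X[k]| = |X[n - k]| and the magnitudes mirror around the Nyquist bin n/2. The sketch below checks this property with a naive O(n^2) DFT. The names `naive_dft` and `magnitudes_are_mirrored` are hypothetical, and the loop is for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(n^2) DFT, used here only to illustrate the symmetry property.
std::vector<std::complex<double>> naive_dft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            sum += x[t] * std::polar(1.0, angle);  // x[t] * e^{-i*angle}
        }
        X[k] = sum;
    }
    return X;
}

// For real input, |X[k]| == |X[n - k]|, so magnitudes mirror around n/2.
// Bin 0 (DC) pairs with itself, which is why the mirror index is n - k
// rather than n - 1 - k.
bool magnitudes_are_mirrored(const std::vector<std::complex<double>>& X, double tol) {
    const std::size_t n = X.size();
    for (std::size_t k = 1; k < n; ++k) {
        if (std::abs(std::abs(X[k]) - std::abs(X[n - k])) > tol) return false;
    }
    return true;
}
```

Because the upper half carries no new magnitude information, only bins 0 through n/2 need to be kept.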
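The `break` fix mentioned in the commit message concerns frames that run past the end of the audio. The following is a hedged sketch of the zero-padding approach. The names are hypothetical: `fill_frame` and a caller-owned `frame` buffer that is sized once, so it does not keep growing. Every position past the input is written as 0. Breaking out of the loop instead would leave stale values from the previous frame in the reused buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame-filling helper. `frame` is reused across frames and
// sized once by the caller. Samples past the end of the input are written
// as 0; a `break` here would leave the previous frame's values in the tail.
void fill_frame(const std::vector<float>& samples, const std::vector<float>& window,
                std::size_t offset, std::vector<float>& frame) {
    assert(frame.size() == window.size());
    for (std::size_t j = 0; j < frame.size(); ++j) {
        const std::size_t idx = offset + j;
        frame[j] = idx < samples.size() ? samples[idx] * window[j] : 0.0f;
    }
}
```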
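The commit message describes a folding step in which the right half of the spectrum is flipped onto the left half and shifted by one position, after which the pairs are averaged and bin 0 is kept. The code below is one possible reading of that step and is not the committed code. The helper `fold_spectrum` is hypothetical and takes a full n-bin power spectrum. In this reading, the shift by one means pairing bin k with bin n - k, so bin 0 and the Nyquist bin each pair with themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mirror the upper half of an n-bin power spectrum
// onto the lower half, pairing bin k with bin n - k, and average each
// pair. Bin 0 maps to itself, so it is kept rather than discarded.
std::vector<float> fold_spectrum(const std::vector<float>& power) {
    const std::size_t n = power.size();
    assert(n % 2 == 0);
    std::vector<float> folded(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        const std::size_t mirror = (n - k) % n;  // k = 0 maps to 0
        folded[k] = 0.5f * (power[k] + power[mirror]);
    }
    return folded;
}
```

For exactly real input the paired bins are already equal, so averaging returns the same values.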
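One of the squashed commits calculates the FFT only when `fft_in` is not all zeros. The sketch below shows that check under a few assumptions. The wrapper `power_or_zero` and its `compute_power` callback are hypothetical stand-ins for whatever FFT-plus-modulus routine the caller uses. The reasoning is that the transform of an all-zero frame is all zeros, so the result can be returned without calling the FFT at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical wrapper: if every sample in the frame is zero (for example
// a fully zero-padded tail), return an all-zero spectrum and skip the FFT.
// `compute_power` stands in for the caller's FFT + modulus routine.
std::vector<float> power_or_zero(
        const std::vector<float>& frame, std::size_t n_bins,
        const std::function<std::vector<float>(const std::vector<float>&)>& compute_power) {
    const bool all_zero = std::all_of(frame.begin(), frame.end(),
                                      [](float v) { return v == 0.0f; });
    if (all_zero) {
        return std::vector<float>(n_bins, 0.0f);  // no FFT needed
    }
    return compute_power(frame);
}
```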
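Another squashed commit removes the hard-coded Hann window. The sketch below builds the window at runtime instead of storing a table. It assumes the periodic form w[i] = 0.5 * (1 - cos(2*pi*i / n)), and the function name `hann_window` is hypothetical. A symmetric variant would divide by n - 1 instead. Which variant a model expects depends on how its training features were computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a periodic Hann window of length n at runtime:
// w[i] = 0.5 * (1 - cos(2*pi*i / n)).
std::vector<float> hann_window(std::size_t n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * pi * double(i) / double(n))));
    }
    return w;
}
```

For example, `hann_window(400)` produces a 400-sample window that starts at 0 and peaks at 1 at index 200.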
addon.node whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
bench bench : fix Windows linkage by moving ggml benches in whisper lib .. 2023-01-18 21:19:50 +02:00
bench.wasm whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
command Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)" 2023-07-02 21:53:52 +03:00
command.wasm examples : fix + refactor Levenshtein distance 2023-04-30 19:12:49 +03:00
main whisper : significantly improve the inference quality (#1148) 2023-08-27 19:51:33 +03:00
quantize quantize : fix load vocab crash when len is 128 (#1160) 2023-08-06 11:04:42 +03:00
stream examples : add tinydiarization support for streaming (#1137) 2023-08-03 11:24:07 +03:00
stream.wasm whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
talk Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)" 2023-07-02 21:53:52 +03:00
talk-llama talk-llama : fix new rope interface 2023-07-03 19:24:01 +03:00
talk.wasm whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
whisper.android whisper.android : migrate from ndk-build to CMake (#1204) 2023-08-27 19:35:16 +03:00
whisper.nvim examples : add Vim plugin (#1131) 2023-07-25 18:34:23 +03:00
whisper.objc whisper.objc : enable Core ML in example & fix segmentation fault (#910) 2023-05-14 09:47:02 +03:00
whisper.swiftui whisper.swiftui : update README.md (#682) 2023-03-29 23:04:38 +03:00
whisper.wasm whisper : add memory sizes for Q8_0 (close #846) 2023-05-01 10:03:56 +03:00
CMakeLists.txt whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
common-ggml.cpp ggml : sync latest ggml lib 2023-06-25 14:30:44 +03:00
common-ggml.h whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
common-sdl.cpp examples : refactor in order to reuse code and reduce duplication (#482) 2023-02-15 19:28:10 +02:00
common-sdl.h examples : refactor in order to reuse code and reduce duplication (#482) 2023-02-15 19:28:10 +02:00
common.cpp whisper : minor OpenVINO refactoring (#1037) 2023-07-04 20:28:27 +03:00
common.h whisper : minor OpenVINO refactoring (#1037) 2023-07-04 20:28:27 +03:00
dr_wav.h refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00
generate-karaoke.sh minor : add comment for using "generate_karaoke.sh" 2022-11-26 10:22:42 +02:00
helpers.js whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
livestream.sh livestream.sh : run main with model arg instead of default (#453) 2023-01-27 01:13:31 +02:00
twitch.sh twitch.sh : various fixes and polishing 2022-12-08 19:20:04 +02:00
yt-wsp.sh yt-wsp.sh : print help on empty args 2023-02-18 09:42:31 +02:00