
340 Commits (9a0b59d990be319952a4a02b9164b3b2327cd454)

Author SHA1 Message Date
Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
Transcription accuracy is degraded, but for slow to normal speech it
still seems to be very good.

I think this could be useful for real-time transcription, i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
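As a rough illustration of why doubling the tempo can halve the Encoder's input, the sketch below counts spectrogram frames for a clip before and after a x2 speed-up. The 16 kHz sample rate and 160-sample hop are assumed values (the ones commonly cited for Whisper's preprocessing), and `n_frames` is a hypothetical helper. Nothing here is taken from the commit itself, and it does not implement a phase vocoder.

```python
# Sketch only: counts spectrogram frames; it does not change any audio.
SAMPLE_RATE = 16_000  # assumed input sample rate in Hz
HOP_LENGTH = 160      # assumed number of samples between consecutive frames


def n_frames(duration_s: float, speed: float = 1.0) -> int:
    """Frames produced for a clip whose tempo is multiplied by `speed`."""
    n_samples = int(duration_s * SAMPLE_RATE / speed)
    return n_samples // HOP_LENGTH


normal = n_frames(30.0)           # 30 s of audio at the original tempo
fast = n_frames(30.0, speed=2.0)  # the same audio played twice as fast
print(normal, fast, normal / fast)
```

Under these assumptions the sped-up clip yields half as many frames, which is consistent with the factor of 2 mentioned in the commit message.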
Alan 7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
Georgi Gerganov c30bffc8a5 ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov c71363f14c examples : add simple script for generating Karaoke video 2022-11-06 09:22:50 +02:00
Georgi Gerganov d42cf6d0df Update README.md 2022-11-04 22:26:08 +02:00
Georgi Gerganov ef47d77492 main : fix generated bash script 2022-11-04 18:30:38 +02:00
Georgi Gerganov d5afebd37c whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty well overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitle types. This
means that you can now specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters.
2022-11-02 21:45:54 +02:00
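To illustrate what a maximum line length means when timestamps are available per token, here is a minimal sketch that greedily packs timestamped tokens into lines of at most `max_len` characters. The `Token` type, the `split_lines` function, the sample tokens, and the greedy strategy are all hypothetical; this is not the algorithm in whisper.cpp.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str  # token text, including any leading space
    t0: float  # start time in seconds
    t1: float  # end time in seconds


def split_lines(tokens: List[Token], max_len: int) -> List[Tuple[float, float, str]]:
    """Greedily pack tokens into (start, end, text) lines of at most max_len chars."""
    lines: List[Tuple[float, float, str]] = []
    cur: List[Token] = []
    cur_len = 0
    for tok in tokens:
        # Start a new line when adding this token would exceed the limit.
        if cur and cur_len + len(tok.text) > max_len:
            lines.append((cur[0].t0, cur[-1].t1, "".join(t.text for t in cur).strip()))
            cur, cur_len = [], 0
        cur.append(tok)
        cur_len += len(tok.text)
    if cur:  # flush the final, possibly short, line
        lines.append((cur[0].t0, cur[-1].t1, "".join(t.text for t in cur).strip()))
    return lines


# Made-up tokens and timings, for illustration only.
tokens = [
    Token(" And", 0.0, 0.3),
    Token(" so", 0.3, 0.5),
    Token(" my", 0.5, 0.7),
    Token(" fellow", 0.7, 1.1),
    Token(" Americans", 1.1, 1.8),
]
lines = split_lines(tokens, max_len=16)
for start, end, text in lines:
    print(f"[{start:.1f} -> {end:.1f}] {text}")
```

Each line inherits the start time of its first token and the end time of its last one. A single token longer than `max_len` still gets a line of its own in this sketch, so the limit is best-effort.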
Georgi Gerganov 6fb98370ba main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00
Georgi Gerganov 0729da9a3b main : fix some edge cases for word-level timestamps 2022-11-01 22:09:25 +02:00
Georgi Gerganov 5dc74e3aff Update README.md 2022-10-31 22:06:05 +02:00
Georgi Gerganov ac8ef34039 Update README.md 2022-10-31 20:19:41 +02:00
Georgi Gerganov dc12994603 Update README.md 2022-10-30 17:11:37 +02:00
Georgi Gerganov 57fb46f307 main : add option for word-level timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov 5a9e4260a6 stream : add "--capture" option to select capture device (ref #10) 2022-10-30 08:27:04 +02:00
Georgi Gerganov 12fb303d9d whisper.wasm : update system info print 2022-10-29 20:32:41 +03:00
Georgi Gerganov 2827cbbbe8 main : merge parallel example in main 2022-10-29 19:37:19 +03:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov 72e9cdd6bf parallel : adding tool for parallel transformer inference 2022-10-29 19:37:19 +03:00
Georgi Gerganov b89f8960ca Update README.md 2022-10-28 21:40:52 +03:00
Georgi Gerganov 6f82320b05 Create README.md 2022-10-28 20:25:37 +03:00
Georgi Gerganov 2298310dd8 whisper.nvim : add helper script for the Neovim integration 2022-10-28 20:25:37 +03:00
Georgi Gerganov 8347a7bb6a stream : few updates to make it compatible for Vim usage (#99) 2022-10-27 22:10:50 +03:00
Georgi Gerganov ebb01b9e33 Print system info at start of program 2022-10-27 17:22:19 +03:00
Georgi Gerganov 2400660f3f Print system info in main 2022-10-26 22:54:09 +03:00
Georgi Gerganov a6c786d5dc Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 91dcf5f35b Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 113a4f06d8 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 47e78b7288 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov c6710efde2 refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00
Georgi Gerganov d4f94ce427 Update README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov a52ee08c1e objc : polishing the sample application 2022-10-24 18:23:07 +03:00
Georgi Gerganov b41f4a90eb Create README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov bb1ee266d2 ios : whisper.objc example 2022-10-24 18:23:07 +03:00
Georgi Gerganov 3e69a6071d Update README.md 2022-10-23 08:04:33 +03:00
Georgi Gerganov f4aa01c2f8 Update README.md 2022-10-22 19:30:35 +03:00
Georgi Gerganov 6b45e37b2b Update README.md and finalize the whisper.wasm example 2022-10-22 18:54:01 +03:00
Georgi Gerganov 491ecd7056 wip : polishing WASM example 2022-10-22 18:54:01 +03:00
Georgi Gerganov e905c6f827 wip : initial WASM port
Works, but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using the "tiny.en" model.
2022-10-22 18:54:01 +03:00