304 Commits (444349f4ec6fdc9d0d2bb3128add10da3c701a8a)

Author SHA1 Message Date
Georgi Gerganov 444349f4ec
talk : make compatible with c++11 2022-12-11 20:19:17 +02:00
Georgi Gerganov 37a93d2459
cmake : require c++11 instead of c++20 2022-12-11 20:04:05 +02:00
Roland Rabien e70d47baab
Remove C++20 requirement (#257)
* Remove C++20 requirement

* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
Lexevolution 6ed786957e
Add newline per segment for text output (#254) 2022-12-11 20:00:29 +02:00
Georgi Gerganov ea38ad6e70
bench : more concise representation of the results (#89) 2022-12-11 11:56:13 +02:00
Georgi Gerganov 054940e1f6
minor : fix .gitignore to not ignore examples 2022-12-11 11:39:46 +02:00
Georgi Gerganov fcf515de60
bench.wasm : same as "bench" but runs in the browser (#89) 2022-12-11 11:09:10 +02:00
Georgi Gerganov 85c9ac18b5
Update README.md 2022-12-10 16:54:57 +02:00
Georgi Gerganov b7c85d1ea6
talk : fix build for MSVC 2022-12-10 16:51:58 +02:00
Georgi Gerganov 3b1aacbe6d
talk : talk with AI in the terminal 2022-12-10 16:51:58 +02:00
bert hubert d1da35de06
fix potential bug reading model data into a small-string-optimized (SSO) string, which could lead to memory corruption. In an SSO string, you can't write data to &str[0] and expect it to work well.
Also added a small wrapper function to read model data more safely without having to get the sizeof right. I tested this on the tiny, base and large models; there was no change in behaviour.
2022-12-10 16:20:48 +02:00
Georgi Gerganov 603f97ba11
whisper : minor improvement in decoding strategy (#244)
Do not allow for text segments to go beyond end of audio.
This partially mitigates some issues when the last audio window is 1-2
seconds just before the end of the audio file and the decoding spirals
into a repetition of the last transcribed phrase.
2022-12-10 13:38:26 +02:00
Georgi Gerganov 50a061b313
ggml : add alternative cblas_sgemm call 2022-12-08 23:48:04 +02:00
Georgi Gerganov 832b4f34c9
make : indentation + .gitignore 2022-12-08 19:42:06 +02:00
Reinis Muiznieks 0f98755fc5
Flag for Position Independent Code 2022-12-08 19:41:01 +02:00
Georgi Gerganov 56822621a8
twitch.sh : various fixes and polishing
- check if streamlink is installed
- fix audio chunking
- change default threads to 4
2022-12-08 19:20:04 +02:00
keyehzy 9e5f3ddc16
Allow for Twitch.tv live transcription
We rely on streamlink library to give us a stream, then we proceed similarly to
the radio livestream example.
2022-12-08 19:20:04 +02:00
Kartik Saranathan d91c001120
Fix paths echoed after the download
Was using models path instead of root path
2022-12-08 09:23:52 +02:00
Al Hoang 04a16bbf11
fix compilation on haiku 2022-12-08 09:20:57 +02:00
Georgi Gerganov 47afb93c3c
yt-wsp.sh : improve usage instructions 2022-12-07 22:12:08 +02:00
Georgi Gerganov 575c53dc41
yt-wsp.sh : fix usage instruction + comment 2022-12-07 21:12:55 +02:00
Georgi Gerganov 3996ecc156
Update README.md 2022-12-07 05:15:46 +02:00
Georgi Gerganov faa85f9840
livestream.sh : remove obsolete comment 2022-12-07 04:41:43 +02:00
Georgi Gerganov b6597539f9
ggml : fix typo in previous commit 2022-12-06 22:12:57 +02:00
Georgi Gerganov 9a4b7a916e
ggml : use macros to inline FP16 <-> FP32 conversions 2022-12-06 22:09:26 +02:00
Georgi Gerganov f8ec718b76
ggml : add F16C CPU flag check 2022-12-06 21:56:56 +02:00
katsu560 35b40a93b9
add fp16/fp32 convert intrinsics 2022-12-06 21:44:24 +02:00
Georgi Gerganov 9fe7306f4b
models : add the new "large" model release by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2022-12-06 18:48:57 +02:00
Georgi Gerganov 13e8eb2346
bench : add commit hash to bench-all.sh results 2022-12-06 18:47:48 +02:00
Georgi Gerganov 78d13257be
Try to improve the token sampling strategy (#193)
* whisper : try to improve the token sampling strategy

- Add the "max_initial_timestamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past

* whisper : fix the max initial timestamp logic + fallback decoding
2022-12-02 21:51:50 +02:00
Georgi Gerganov 9b7df68753
tests : adding transcription tests 2022-12-02 21:40:02 +02:00
Georgi Gerganov 061fc81bd6
ggml : remove inline specifier from fp16 <-> fp32 converters 2022-12-01 22:15:12 +02:00
Georgi Gerganov 57e0e6b700
livestream : handle ffmpeg errors gracefully and stabilize transcript 2022-12-01 20:49:09 +02:00
Georgi Gerganov 4f7363077f
livestream : minor changes 2022-12-01 19:47:58 +02:00
semiformal-net 093c840dee
livestream : fix losing words across audio chunk (#195)
* improve livestream script

* Update examples/livestream.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-01 19:18:22 +02:00
Tienshiao Ma e7f09a0a61
Fix Darwin flags - was incorrectly always using the Linux else clause 2022-12-01 19:17:04 +02:00
Georgi Gerganov 4698dcdb52
whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
Georgi Gerganov 6fd5358dd0
Update README.md 2022-11-27 11:30:32 +02:00
Georgi Gerganov 164df0d447
whisper.objc : fix context + broken readme links 2022-11-27 10:52:27 +02:00
Georgi Gerganov e266cb0723
whisper.objc : add real-time processing (#97)
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
Georgi Gerganov c207eed431
whisper.objc : fix build warnings 2022-11-26 16:27:04 +02:00
Georgi Gerganov 67e819baf4
minor : remove "examples/" prefix from the README 2022-11-26 13:07:54 +02:00
Georgi Gerganov a425365b82
yt-wsp.sh : script to easily transcribe VODs
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818

Usage:

  cd whisper.cpp
  make

  ./examples/yt-wsp.sh <video-url>
2022-11-26 12:54:42 +02:00
Georgi Gerganov e0e864d9ca
Update README.md 2022-11-26 11:56:55 +02:00
Georgi Gerganov 68ecadbbc9
command.wasm : add voice assistant example for the Web (#171)
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
2022-11-26 11:40:06 +02:00
Georgi Gerganov c536ff4005
minor : add comment for using "generate_karaoke.sh" 2022-11-26 10:22:42 +02:00
Georgi Gerganov cb70b07db5
livestream.sh : simple tool to transcribe audio livestreams (#185) 2022-11-26 10:05:37 +02:00
Georgi Gerganov 3c390ffe38
stream.wasm : add web-based real-time transcription (#112) 2022-11-25 23:57:46 +02:00
Georgi Gerganov be16dfa038
whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
Georgi Gerganov 0f619b52ce
main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00