Commit graph

148 commits

Author SHA1 Message Date
Georgi Gerganov 5a5c5ddcca
Update README.md 2022-12-15 20:38:08 +02:00
Georgi Gerganov 34e0b4b9ef
stream : fix build 2022-12-15 20:15:36 +02:00
Georgi Gerganov b0f8013eb9
stream : add sliding window mode 2022-12-15 19:59:17 +02:00
Georgi Gerganov a613f16aec
talk : improve prompting 2022-12-12 23:44:36 +02:00
Georgi Gerganov f309f97df6
Node.js package (#260)
* npm : preparing infra for node package

* npm : package infra ready

* npm : initial version ready

* npm : change name to whisper.cpp

whisper.js is taken
2022-12-12 20:17:27 +02:00
Georgi Gerganov aa6adda26e
talk : make compatible with c++11 (part 2) 2022-12-11 20:34:04 +02:00
Georgi Gerganov 444349f4ec
talk : make compatible with c++11 2022-12-11 20:19:17 +02:00
Lexevolution 6ed786957e
Add newline per segment for text output (#254) 2022-12-11 20:00:29 +02:00
Georgi Gerganov fcf515de60
bench.wasm : same as "bench" but runs in the browser (#89) 2022-12-11 11:09:10 +02:00
Georgi Gerganov 85c9ac18b5
Update README.md 2022-12-10 16:54:57 +02:00
Georgi Gerganov b7c85d1ea6 talk : fix build for MSVC 2022-12-10 16:51:58 +02:00
Georgi Gerganov 3b1aacbe6d talk : talk with AI in the terminal 2022-12-10 16:51:58 +02:00
Georgi Gerganov 56822621a8 twitch.sh : various fixes and polishing
- check if streamlink is installed
- fix audio chunking
- change default threads to 4
2022-12-08 19:20:04 +02:00
keyehzy 9e5f3ddc16 Allow for Twitch.tv live transcription
We rely on streamlink library to give us a stream, then we proceed similarly to
the radio livestream example.
2022-12-08 19:20:04 +02:00
Georgi Gerganov 47afb93c3c
yt-wsp.sh : improve usage instructions 2022-12-07 22:12:08 +02:00
Georgi Gerganov 575c53dc41
yt-wsp.sh : fix usage instruction + comment 2022-12-07 21:12:55 +02:00
Georgi Gerganov faa85f9840 livestream.sh : remove obsolete comment 2022-12-07 04:41:43 +02:00
Georgi Gerganov 9fe7306f4b
models : add the new "large" model release by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2022-12-06 18:48:57 +02:00
Georgi Gerganov 57e0e6b700
livestream : handle ffmpeg errors gracefully and stabilize transcript 2022-12-01 20:49:09 +02:00
Georgi Gerganov 4f7363077f
livestream : minor changes 2022-12-01 19:47:58 +02:00
semiformal-net 093c840dee
livestream : fix losing words across audio chunk (#195)
* improve livestream script

* Update examples/livestream.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
2022-12-01 19:18:22 +02:00
Georgi Gerganov 4698dcdb52 whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
Georgi Gerganov 164df0d447
whisper.objc : fix context + broken readme links 2022-11-27 10:52:27 +02:00
Georgi Gerganov e266cb0723
whisper.objc : add real-time processing (#97)
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
Georgi Gerganov c207eed431
whisper.objc : fix build warnings 2022-11-26 16:27:04 +02:00
Georgi Gerganov a425365b82
yt-wsp.sh : script to easily transcribe VODs
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818

Usage:

  cd whisper.cpp
  make

  ./examples/yt-wsp.sh <video-url>
2022-11-26 12:54:42 +02:00
Georgi Gerganov 68ecadbbc9
command.wasm : add voice assistant example for the Web (#171)
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
2022-11-26 11:40:06 +02:00
Georgi Gerganov c536ff4005
minor : add comment for using "generate_karaoke.sh" 2022-11-26 10:22:42 +02:00
Georgi Gerganov cb70b07db5
livestream.sh : simple tool to transcribe audio livestreams (#185) 2022-11-26 10:05:37 +02:00
Georgi Gerganov 3c390ffe38
stream.wasm : add web-based real-time transcription (#112) 2022-11-25 23:57:46 +02:00
Georgi Gerganov be16dfa038
whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
Georgi Gerganov 0f619b52ce
main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
Georgi Gerganov 1246dd023e
command : add demonstration video 2022-11-25 20:23:58 +02:00
Georgi Gerganov 0be27bbd92
command : fix build + fix README + add bold printing 2022-11-25 19:53:50 +02:00
Georgi Gerganov bc88eb13c6
examples : add "command" tool (#171) 2022-11-25 19:36:57 +02:00
Georgi Gerganov b8ce25dec1
refactoring : more readable code 2022-11-25 19:28:04 +02:00
Georgi Gerganov e4805d9601
wasm : refactor wasm example + reuse fetch mechanism 2022-11-24 23:13:26 +02:00
Georgi Gerganov ff36415a86
talk.wasm : update video link + some minor fixes 2022-11-24 20:15:24 +02:00
Georgi Gerganov 025ff465b6
Update README.md
Use a less cringy video to demo talk.wasm lol
2022-11-24 20:09:45 +02:00
Georgi Gerganov abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
Georgi Gerganov 454b91de16
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
Georgi Gerganov d7024cf9dc
main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
Georgi Gerganov 37422ed733
talk.wasm : add audio pre-processing + bump memory 2022-11-24 00:34:00 +02:00
Georgi Gerganov be3b720f96
talk.wasm : refactoring + update README.md 2022-11-24 00:08:57 +02:00
Georgi Gerganov 49706a658a
minor : update a few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
Georgi Gerganov e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
Georgi Gerganov 326573de9a
talk.wasm : final touches 2022-11-22 22:22:17 +02:00
Georgi Gerganov 9aea96f774
talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
Georgi Gerganov 385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik 63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov 78116f8eda
talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
Georgi Gerganov a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov fb8d77f760 stream : add "audio_ctx" parameter
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
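Note on the "audio_ctx" arithmetic above: a minimal sketch of where the "about 10s" figure could come from, assuming the full encoder context is 1500 positions spanning the 30 s window (that constant is an assumption taken from the Whisper model design, not stated in this log). The function name and constants are hypothetical and are not part of the whisper.cpp API.

```python
# Assumption: the full encoder context is 1500 positions covering a 30 s window.
FULL_AUDIO_CTX = 1500
FULL_WINDOW_S = 30.0


def audio_seconds_covered(audio_ctx: int) -> float:
    """Return the seconds of audio spanned by an encoder context of `audio_ctx` positions."""
    if not 0 < audio_ctx <= FULL_AUDIO_CTX:
        raise ValueError(f"audio_ctx must be in 1..{FULL_AUDIO_CTX}, got {audio_ctx}")
    return FULL_WINDOW_S * audio_ctx / FULL_AUDIO_CTX


if __name__ == "__main__":
    ctx = 512
    covered = audio_seconds_covered(ctx)
    # Ratio of full context to reduced context; only a rough proxy for the speed-up,
    # since the real gain depends on how encoder cost scales with context size.
    ratio = FULL_AUDIO_CTX / ctx
    print(f"audio_ctx={ctx}: {covered:.2f} s of audio, context ratio {ratio:.2f}x")
```

Under these assumptions, 512 positions cover 10.24 s of audio and the context shrinks by a factor of roughly 2.9, which is consistent with the "about 10s" and "about 3 times faster" figures in the commit message; the measured speed-up itself is the author's observation and is not derived here.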
Georgi Gerganov 62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful for reducing word repetition when using a partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Alan 7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
Georgi Gerganov c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov c71363f14c
examples : add simple script for generating Karaoke video 2022-11-06 09:22:50 +02:00
Georgi Gerganov d42cf6d0df
Update README.md 2022-11-04 22:26:08 +02:00
Georgi Gerganov ef47d77492
main : fix generated bash script 2022-11-04 18:30:38 +02:00
Georgi Gerganov d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov 6fb98370ba
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00
Georgi Gerganov 0729da9a3b
main : fix some edge cases for word-level timestamps 2022-11-01 22:09:25 +02:00
Georgi Gerganov 5dc74e3aff
Update README.md 2022-10-31 22:06:05 +02:00
Georgi Gerganov ac8ef34039
Update README.md 2022-10-31 20:19:41 +02:00
Georgi Gerganov dc12994603
Update README.md 2022-10-30 17:11:37 +02:00
Georgi Gerganov 57fb46f307 main : add option for word-level timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov 5a9e4260a6
stream : add "--capture" option to select capture device (ref #10) 2022-10-30 08:27:04 +02:00
Georgi Gerganov 12fb303d9d
whisper.wasm : update system info print 2022-10-29 20:32:41 +03:00
Georgi Gerganov 2827cbbbe8 main : merge parallel example in main 2022-10-29 19:37:19 +03:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov 72e9cdd6bf parallel : adding tool for parallel transformer inference 2022-10-29 19:37:19 +03:00
Georgi Gerganov b89f8960ca
Update README.md 2022-10-28 21:40:52 +03:00
Georgi Gerganov 6f82320b05 Create README.md 2022-10-28 20:25:37 +03:00
Georgi Gerganov 2298310dd8 whisper.nvim : add helper script for the Neovim integration 2022-10-28 20:25:37 +03:00
Georgi Gerganov 8347a7bb6a
stream : few updates to make it compatible for Vim usage (#99) 2022-10-27 22:10:50 +03:00
Georgi Gerganov ebb01b9e33
Print system info at start of program 2022-10-27 17:22:19 +03:00
Georgi Gerganov 2400660f3f Print system info in main 2022-10-26 22:54:09 +03:00
Georgi Gerganov a6c786d5dc Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 91dcf5f35b Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 113a4f06d8 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 47e78b7288 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov c6710efde2 refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00
Georgi Gerganov d4f94ce427 Update README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov a52ee08c1e objc : polishing the sample application 2022-10-24 18:23:07 +03:00
Georgi Gerganov b41f4a90eb Create README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov bb1ee266d2 ios : whisper.objc example 2022-10-24 18:23:07 +03:00
Georgi Gerganov 3e69a6071d
Update README.md 2022-10-23 08:04:33 +03:00
Georgi Gerganov f4aa01c2f8
Update README.md 2022-10-22 19:30:35 +03:00
Georgi Gerganov 6b45e37b2b Update README.md and finalize the whisper.wasm example 2022-10-22 18:54:01 +03:00
Georgi Gerganov 491ecd7056 wip : polishing WASM example 2022-10-22 18:54:01 +03:00
Georgi Gerganov e905c6f827 wip : initial WASM port
Works but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using "tiny.en" model
2022-10-22 18:54:01 +03:00