whisper.cpp

Author	SHA1	Message	Date
Georgi Gerganov	5a5c5ddcca	Update README.md	2022-12-15 20:38:08 +02:00
Georgi Gerganov	34e0b4b9ef	stream : fix build	2022-12-15 20:15:36 +02:00
Georgi Gerganov	b0f8013eb9	stream : add sliding window mode	2022-12-15 19:59:17 +02:00
Georgi Gerganov	a613f16aec	talk : improve prompting	2022-12-12 23:44:36 +02:00
Georgi Gerganov	f309f97df6	Node.js package (#260) * npm : preparing infra for node package * npm : package infra ready * npm : initial version ready * npm : change name to whisper.cpp whisper.js is taken	2022-12-12 20:17:27 +02:00
Georgi Gerganov	aa6adda26e	talk : make compatible with c++11 (part 2)	2022-12-11 20:34:04 +02:00
Georgi Gerganov	444349f4ec	talk : make compatible with c++11	2022-12-11 20:19:17 +02:00
Lexevolution	6ed786957e	Add newline per segment for text output (#254)	2022-12-11 20:00:29 +02:00
Georgi Gerganov	fcf515de60	bench.wasm : same as "bench" but runs in the browser (#89)	2022-12-11 11:09:10 +02:00
Georgi Gerganov	85c9ac18b5	Update README.md	2022-12-10 16:54:57 +02:00
Georgi Gerganov	b7c85d1ea6	talk : fix build for MSVC	2022-12-10 16:51:58 +02:00
Georgi Gerganov	3b1aacbe6d	talk : talk with AI in the terminal	2022-12-10 16:51:58 +02:00
Georgi Gerganov	56822621a8	twitch.sh : various fixes and polishing - check if streamlink is installed - fix audio chunking - change default threads to 4	2022-12-08 19:20:04 +02:00
keyehzy	9e5f3ddc16	Allow for Twitch.tv live transcription We rely on streamlink library to give us a stream, then we proceed similarly to the radio livestream example.	2022-12-08 19:20:04 +02:00
Georgi Gerganov	47afb93c3c	yt-wsp.sh : improve usage instructions	2022-12-07 22:12:08 +02:00
Georgi Gerganov	575c53dc41	yt-wsp.sh : fix usage instruction + comment	2022-12-07 21:12:55 +02:00
Georgi Gerganov	faa85f9840	livestream.sh : remove obsolete comment	2022-12-07 04:41:43 +02:00
Georgi Gerganov	9fe7306f4b	models : add the new "large" model release by OpenAI The old "large" model is now renamed "large-v1". If you have been using it, make sure to rename it and download the new "large" model for best results.	2022-12-06 18:48:57 +02:00
Georgi Gerganov	57e0e6b700	livestream : handle ffmpeg errors gracefully and stabilize transcript	2022-12-01 20:49:09 +02:00
Georgi Gerganov	4f7363077f	livestream : minor changes	2022-12-01 19:47:58 +02:00
semiformal-net	093c840dee	livestream : fix losing words across audio chunk (#195) * improve livestream script * Update examples/livestream.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Paul Edwards <paul.edwards@semiformal.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2022-12-01 19:18:22 +02:00
Georgi Gerganov	4698dcdb52	whisper : add mechanism for aborting the whisper_full() computation	2022-11-27 20:42:45 +02:00
Georgi Gerganov	164df0d447	whisper.objc : fix context + broken readme links	2022-11-27 10:52:27 +02:00
Georgi Gerganov	e266cb0723	whisper.objc : add real-time processing (#97) Similar to the "stream" app	2022-11-26 18:32:46 +02:00
Georgi Gerganov	c207eed431	whisper.objc : fix build warnings	2022-11-26 16:27:04 +02:00
Georgi Gerganov	a425365b82	yt-wsp.sh : script to easily transcribe VODs Thanks to @DaniruKun ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818 Usage: cd whisper.cpp make ./examples/yt-wsp.sh <video-url>	2022-11-26 12:54:42 +02:00
Georgi Gerganov	68ecadbbc9	command.wasm : add voice assistant example for the Web (#171) Same as the command-line tool "command", but runs in the browser Also, added helper script "extra/deploy-wasm.sh" and fixed some timing constants for the WASM examples.	2022-11-26 11:40:06 +02:00
Georgi Gerganov	c536ff4005	minor : add comment for using "generate_karaoke.sh"	2022-11-26 10:22:42 +02:00
Georgi Gerganov	cb70b07db5	livestream.sh : simple tool to transcribe audio livestreams (#185)	2022-11-26 10:05:37 +02:00
Georgi Gerganov	3c390ffe38	stream.wasm : add web-based real-time transcription (#112)	2022-11-25 23:57:46 +02:00
Georgi Gerganov	be16dfa038	whisper.wasm : do not block page while processing (close #86)	2022-11-25 23:07:42 +02:00
Georgi Gerganov	0f619b52ce	main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio	2022-11-25 22:08:58 +02:00
Georgi Gerganov	1246dd023e	command : add demonstration video	2022-11-25 20:23:58 +02:00
Georgi Gerganov	0be27bbd92	command : fix build + fix README + add bold printing	2022-11-25 19:53:50 +02:00
Georgi Gerganov	bc88eb13c6	examples : add "command" tool (#171)	2022-11-25 19:36:57 +02:00
Georgi Gerganov	b8ce25dec1	refactoring : more readable code	2022-11-25 19:28:04 +02:00
Georgi Gerganov	e4805d9601	wasm : refactor wasm example + reuse fetch mechanism	2022-11-24 23:13:26 +02:00
Georgi Gerganov	ff36415a86	talk.wasm : update video link + some minor fixes	2022-11-24 20:15:24 +02:00
Georgi Gerganov	025ff465b6	Update README.md Use a less cringy video to demo talk.wasm lol	2022-11-24 20:09:45 +02:00
Georgi Gerganov	abce28ea99	talk.wasm : move to https://whisper.ggerganov.com/talk This way, we can share the same models across different WASM examples and not have to download them for each page	2022-11-24 18:24:06 +02:00
Georgi Gerganov	454b91de16	main : fix dangling pointer when using stdin for input (#65)	2022-11-24 17:53:51 +02:00
Georgi Gerganov	d7024cf9dc	main, stream : remove --verbose flag (#178)	2022-11-24 17:52:04 +02:00
Georgi Gerganov	37422ed733	talk.wasm : add audio pre-processing + bump memory	2022-11-24 00:34:00 +02:00
Georgi Gerganov	be3b720f96	talk.wasm : refactoring + update README.md	2022-11-24 00:08:57 +02:00
Georgi Gerganov	49706a658a	minor : updates few prints + fix buttons in whisper.wasm	2022-11-23 17:19:21 +02:00
Georgi Gerganov	e5dcdabbb8	unicode : fix character replacement (thanks to @tamo)	2022-11-23 08:24:29 +02:00
Georgi Gerganov	dad109c3f1	close #109 : add fetching of the model over HTTP (whisper.wasm)	2022-11-22 22:48:56 +02:00
Georgi Gerganov	326573de9a	talk.wasm : final touches	2022-11-22 22:22:17 +02:00
Georgi Gerganov	9aea96f774	talk.wasm : polishing + adding many AI personalities	2022-11-22 20:10:20 +02:00
Georgi Gerganov	385236d1d3	stream : "-kc" now enables context keeping from previous segment (#90) By default, the context keeping is disabled	2022-11-22 18:21:15 +02:00
M. Eren Akbiyik	63ae03b8e0	Prompt previous tokens for streaming (#163) * feat: prompt previous tokens for streaming I used a vector pointer instead of vector itself because it gave weird errors, and why not * convert vector to use with C api * feat: remove old refs, check for prompt size * feat: use better way of getting the pointer	2022-11-22 18:10:35 +02:00
Georgi Gerganov	78116f8eda	talk.wasm : update README.md	2022-11-21 22:42:29 +02:00
Georgi Gerganov	a4dfbeecf9	talk.wasm : GPT-2 meets Whisper in WebAssembly (#155) * talk : initial real-time transcription in the browser * talk : polishing the UI * talk : ready for beta testing * talk.wasm : rename example	2022-11-21 22:20:42 +02:00
Georgi Gerganov	f2df9bd768	stream : add "max_tokens" cli arg Controls the max tokens per segment for the stream example	2022-11-20 21:22:41 +02:00
Georgi Gerganov	fb8d77f760	stream : add "audio_ctx" parameter Used to overwrite the audio context size of the Encoder. For example, setting "audio_ctx = 512" will make it run about 3 times faster, processing about 10s of audio, instead of 30s. The transcription quality drops, but this can be used for real-time streaming purposes where performance is important.	2022-11-20 21:22:41 +02:00
Georgi Gerganov	62b5ff875c	stream : add "max_tokens" parameter Used to limit the number of tokens in a segment. Useful to battle with word repetition when using partial encoder context	2022-11-20 21:22:41 +02:00
Georgi Gerganov	d351771a4b	stream : add "single_segment" option Force the entire audio chunk to be transcribed into a single segment	2022-11-20 21:22:41 +02:00
Georgi Gerganov	c058aaf22e	stream : partial encoder experiments	2022-11-20 21:22:41 +02:00
Georgi Gerganov	83c742f1a7	whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example.	2022-11-13 16:25:43 +02:00
Alan	7519eabf65	Adds support for stdin wav input	2022-11-09 20:37:23 +02:00
Georgi Gerganov	c30bffc8a5	ref #22 : add "duration" option Can be used to partially process a recording	2022-11-07 20:14:52 +02:00
Georgi Gerganov	c71363f14c	examples : add simple script for generating Karaoke video	2022-11-06 09:22:50 +02:00
Georgi Gerganov	d42cf6d0df	Update README.md	2022-11-04 22:26:08 +02:00
Georgi Gerganov	ef47d77492	main : fix generated bash script	2022-11-04 18:30:38 +02:00
Georgi Gerganov	d5afebd37c	whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters	2022-11-02 21:45:54 +02:00
Georgi Gerganov	6fb98370ba	main : add some comments for the word-level timestamp algorithm	2022-11-01 22:35:21 +02:00
Georgi Gerganov	0729da9a3b	main : fix some edge cases for word-level timestamps	2022-11-01 22:09:25 +02:00
Georgi Gerganov	5dc74e3aff	Update README.md	2022-10-31 22:06:05 +02:00
Georgi Gerganov	ac8ef34039	Update README.md	2022-10-31 20:19:41 +02:00
Georgi Gerganov	dc12994603	Update README.md	2022-10-30 17:11:37 +02:00
Georgi Gerganov	57fb46f307	main : add option for word-level timestamps (very experimental)	2022-10-30 17:06:57 +02:00
Georgi Gerganov	5a9e4260a6	stream : add "--capture" option to select capture device (ref #10)	2022-10-30 08:27:04 +02:00
Georgi Gerganov	12fb303d9d	whisper.wasm : update system info print	2022-10-29 20:32:41 +03:00
Georgi Gerganov	2827cbbbe8	main : merge parallel example in main	2022-10-29 19:37:19 +03:00
Georgi Gerganov	0b2dc3c82c	parallel : working	2022-10-29 19:37:19 +03:00
Georgi Gerganov	85d6e1e1e7	main : fix sampling time + add max_context parameter	2022-10-29 19:37:19 +03:00
Georgi Gerganov	72e9cdd6bf	parallel : adding tool for parallel transformer inference	2022-10-29 19:37:19 +03:00
Georgi Gerganov	b89f8960ca	Update README.md	2022-10-28 21:40:52 +03:00
Georgi Gerganov	6f82320b05	Create README.md	2022-10-28 20:25:37 +03:00
Georgi Gerganov	2298310dd8	whisper.nvim : add helper script for the Neovim integration	2022-10-28 20:25:37 +03:00
Georgi Gerganov	8347a7bb6a	stream : few updates to make it compatible for Vim usage (#99)	2022-10-27 22:10:50 +03:00
Georgi Gerganov	ebb01b9e33	Print system info at start of program	2022-10-27 17:22:19 +03:00
Georgi Gerganov	2400660f3f	Print system info in main	2022-10-26 22:54:09 +03:00
Georgi Gerganov	a6c786d5dc	Update README.md	2022-10-25 20:53:48 +03:00
Georgi Gerganov	91dcf5f35b	Update README.md	2022-10-25 20:53:48 +03:00
Georgi Gerganov	113a4f06d8	Update README.md	2022-10-25 20:53:48 +03:00
Georgi Gerganov	47e78b7288	Update README.md	2022-10-25 20:53:48 +03:00
Georgi Gerganov	34bb3ab0cf	ggml : add system info functions	2022-10-25 20:53:48 +03:00
Georgi Gerganov	c6710efde2	refactoring : move main + stream in examples + other stuff	2022-10-25 20:53:48 +03:00
Georgi Gerganov	d4f94ce427	Update README.md	2022-10-24 18:23:07 +03:00
Georgi Gerganov	a52ee08c1e	objc : polishing the sample application	2022-10-24 18:23:07 +03:00
Georgi Gerganov	b41f4a90eb	Create README.md	2022-10-24 18:23:07 +03:00
Georgi Gerganov	bb1ee266d2	ios : whisper.objc example	2022-10-24 18:23:07 +03:00
Georgi Gerganov	3e69a6071d	Update README.md	2022-10-23 08:04:33 +03:00
Georgi Gerganov	f4aa01c2f8	Update README.md	2022-10-22 19:30:35 +03:00
Georgi Gerganov	6b45e37b2b	Update README.md and finalize the whisper.wasm example	2022-10-22 18:54:01 +03:00
Georgi Gerganov	491ecd7056	wip : polishing WASM example	2022-10-22 18:54:01 +03:00
Georgi Gerganov	e905c6f827	wip : initial WASM port Works but it is very slow because no SIMD is used. For example, jfk.wav is processed in ~23 seconds using "tiny.en" model	2022-10-22 18:54:01 +03:00

148 commits