Commit Graph

340 Commits (9a0b59d990be319952a4a02b9164b3b2327cd454)

Author SHA1 Message Date
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov 5fd1bdd7fc
whisper : add GPU support via cuBLAS (#834)
* make : add WHISPER_CUBLAS

* make : fix CUBLAS build

* whisper : disable Flash Attention + adjust memory buffers

* whisper : remove old commented code

* readme : add cuBLAS instructions

* cmake : add WHISPER_CUBLAS option

* gitignore : ignore build-cublas
2023-04-30 12:14:33 +03:00
Zollner 5cc17418c7
whisper.android : add some tips (#816) 2023-04-29 11:00:20 +03:00
Laytan Laats 70567eff23
main : escape quotes in csv output (#815) 2023-04-23 19:01:59 +03:00
Taras Glek 02ec83c5d5
stream : flush upon finishing inference (#811) 2023-04-23 17:00:30 +03:00
Philipp Zabel 2bd4b8d577
examples : add missing #include <cstdint> (#798)
common.cpp uses uint8_t and uint64_t, which are defined in <cstdint>.
2023-04-23 16:52:52 +03:00
Tauseef Mohiuddin eecf2c3d41
main : update escape_double_quotes() function (#776)
Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string.

Changes Made:

- Renamed the function to escape_quotes_and_backslashes

- Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes.

- Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash.

Resolves: #769
2023-04-23 16:47:30 +03:00
Georgi Gerganov f19e23fbd1
whisper : restore decoder temperature fallbacks
I disabled this because there were many complaints about slow decoding.
The current implementation does not allow batching the decoders when
using the "best of" or "beam size" parameters, so the decoding time is
proportional to the number of decoders, which is obviously not great.

However, now there are even more complaints about wrong decodings and
repetition.

So, making a compromise by re-enabling the fallbacks, but defaulting to
just 2 "best of" / "beam size" decoders. Also, the temperature step is
increased from 0.2 to 0.4 - i.e. from maximum of 5 fallbacks to maximum
of 2.

Also, the stream example now has fallbacks enabled by default.

close #471 #477 #508 #612 #719 #731
2023-04-15 16:12:55 +03:00
Bader-eddine Ouaich 2c856fb9e5
whisper : fix potential memory leaks (#740)
* fix potential memory leak if whisper_init_state failed

* fix potential memory leak if gpt2_init failed
2023-04-14 20:05:56 +03:00
Ali Alameh 2c4ac2627d
stream : support language auto-detect (#501)
#445  fix Language auto-detect "auto" flag does not work using the stream tool
2023-04-14 20:02:18 +03:00
DGdev91 001083a769
talk, talk-llama : add basic example script for eleven-labs tts (#728) 2023-04-14 19:53:58 +03:00
Maciek 78548dc03f
talk-llama : correct default speak.sh path (#720)
There is `speak.sh` file in `./examples/talk-llama` as described in README.
However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.
2023-04-14 19:36:09 +03:00
LittleLoli 66110dafcc
main : add lrc output support (#718)
* add lrc output support.

* fix wrong comment
2023-04-14 19:35:33 +03:00
Georgi Gerganov 514cd04452 whisper : fix bug in prompt processing (close #705)
Was dereferencing a dangling pointer
2023-04-14 19:17:07 +03:00
Georgi Gerganov 114df388fe
talk-llama : increase context to 2048 2023-04-10 23:09:15 +03:00
Georgi Gerganov ea36831459
talk-llama : update to latest llama.cpp (improved performance) 2023-04-10 22:59:13 +03:00
InconsolableCellist 5e6e2187a3
talk-llama : fixing usage message for talk-llama (#687)
"-ml" instead of "-mg" for specifying the llama file
2023-03-30 00:10:20 +03:00
Georgi Gerganov a7f1f33715
main : add <cstring> header 2023-03-29 23:59:45 +03:00
Lucas Zanek 86ecfc6333
whisper.addon : fixed test to new async implementation (#686)
* fixed blocking code on node addon

* modify the example to run async

* format

* added logic to see the whisper output

* added logic to see the whisper output

* removed extra function for more clean example

* fixed whisper test to new async implementation
2023-03-29 23:59:17 +03:00
Egor Egorov 0f759f125d
main : fix typo in JSON output (#648)
* typo in JSON output

* fix double quotes in JSON output
2023-03-29 23:26:39 +03:00
Jhen-Jie Hong eefed45e37
whisper : add initial_prompt param (#645) 2023-03-29 23:23:23 +03:00
Jonno 21c1e6afc5
whisper.swiftui : update README.md (#682)
- Slight tweaks to README for improved comprehension.
2023-03-29 23:04:38 +03:00
Evan Jones a47e812a54
talk-llama : add alpaca support (#668) 2023-03-29 23:01:14 +03:00
Georgi Gerganov e5c197d8aa
talk-llama : add discussion link 2023-03-28 10:11:34 +03:00
Georgi Gerganov 7cd1d3bc34
talk-llama : try to fix windows build .. 2023-03-27 22:40:59 +03:00
Georgi Gerganov 4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp (#664)
* talk-llama : talk with LLaMA AI

* talk.llama : disable EOS token

* talk-llama : add README instructions

* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Lucas Zanek 21165580a1
Nodejs Addon blocking main thread. Implemented Napi::AsyncWorker (#642)
* fixed blocking code on node addon

* modify the example to run async

* format

* added logic to see the whisper output

* added logic to see the whisper output

* removed extra function for more clean example
2023-03-22 22:19:22 +02:00
Jhen-Jie Hong 1d749919e3
whisper.objc : add `-O3 -DNDEBUG` in release mode (#640) 2023-03-22 22:16:04 +02:00
Leo Moll 8fcd1a3b32
main : provide option for creating JSON output (#615)
* examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614)

* main : remove leftovers

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-22 21:37:36 +02:00
Georgi Gerganov 1beff6f66d
models : change HF hosting from dataset to model 2023-03-22 20:44:56 +02:00
Takeshi Inoue 09e9068007
whisper.android : support benchmark for Android example. (#542)
* whisper.android: Support benchmark for Android example.

* whisper.android: update screenshot in README.

* update: Make text selectable for copy & paste.

* Update whisper.h to restore API name

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* whisper.android: Restore original API names.

---------

Co-authored-by: tinoue <tinoue@xevo.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-07 21:36:30 +02:00
venkr b597c5a779
qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569) 2023-03-06 19:18:11 +02:00
Takeshi Inoue a3fb6c507f
whisper.android : enable fp16 instrinsics (FP16_VA) which is supported by ARMv8.2 or later. (#572) 2023-03-06 19:15:57 +02:00
sandrohanea 59fdcd19c8
whisper : add whisper_state + default state on the whisper_context (#523)
* Added whisper state + default state on the whisper_context

* Fixed some examples and bindings

* Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state

* Fixed comments

* whisper : reuse kv_cache_free() and fix compiler warnings

* whisper : clean-up the API comments

---------

Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-05 21:42:19 +02:00
Georgi Gerganov 478289a4b3
whisper : set no_context == true by default (#537) 2023-03-05 20:53:43 +02:00
HY. Kelvin Lee 72af0f5697
main : add csv header (#552) 2023-03-02 18:32:16 +02:00
Georgi Gerganov f254e78737
yt-wsp.sh : print help on empty args 2023-02-18 09:42:31 +02:00
conradg 69e6e4644a
main : fix std in input (#503)
if we don't add this as an explicit check, then we get an "error: unknown argument: -" later on
2023-02-15 19:31:16 +02:00
Georgi Gerganov 09d7d2b68e
examples : refactor in order to reuse code and reduce duplication (#482)
* examples : refactor common code into a library

* examples : refactor common SDL code into a library

* make : update Makefile to use common libs

* common : fix MSVC M_PI ..

* addon.node : link common lib
2023-02-15 19:28:10 +02:00
genevera (she/her) 459753342d
yt-wsp.sh : add unique filename generation (#495)
Co-authored-by: genevera <genevera@noreply.users.github.com>
2023-02-14 20:12:51 +02:00
Qianhe Chen ab1916fc59
ci : add node addon test and optimize compilation configuration (#468)
* addon: implement node addon call whisper through cpp

* addon: modify the license to MIT

* addon: remove iostream

* addon: rename dir

* addon: fix typo

* addon: configure cmake to build when cmake-js is used

* ci: add addon.node test ci

* addon: remove build WHISPER_BUILD_TESTS

* addon: update build command

* addon: add test

* addon: add test file

* addon: adapt to compile on Windows

* addon: fix typo

* addon: reuse jfk.wav

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* addon: reuse jfk.wav

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-02-05 15:02:08 +02:00
Matija Pevec d012b5c7e4
whisper : add "split_on_word" flag when using using "max_len" option (#455)
* Update whisper.cpp

* fix: trim function

* feat: added flag to split on word

* fix: arguments for main
2023-02-05 14:44:23 +02:00
Georgi Gerganov f3ee4a9673
whisper : reduce memory usage during inference (#431)
* ggml : add "scratch" buffer support

* ggml : support for scratch ring-buffer

* ggml : bug fix in ggml_repeat()

* ggml : error on scratch buffer overflow

* whisper : use scratch buffers during inference (base model only)

* whisper : update memory usage for all models

* whisper : fix encoder memory usage

* whisper : use whisper_context functions instead of macros

* whisper : fix FF + remove it from README

* ggml : reuse ggml_new_i32

* ggml : refactor the scratch buffer storage

* whisper : reorder scratch buffers in the decoder

* main : add option to disable temp fallback

* Update README.md
2023-02-04 09:45:52 +02:00
Qianhe Chen c306a7fd89
addon.node : using whisper as a Node.js addon (#443)
* addon: implement node addon call whisper through cpp

* addon: modify the license to MIT

* addon: remove iostream

* addon: rename dir

* addon: fix typo

* addon: configure cmake to build when cmake-js is used
2023-02-04 09:10:25 +02:00
Taisei Mima 86ef64a855
wasm : fix typo in helper.js (#459) 2023-02-04 08:49:15 +02:00
Alex Bacart 3b1960520a
main : CSV format export trimmed spaces fix (#444)
* Update main.cpp

Removed string trimming

* Update main.cpp

* Update main.cpp

* Revert "Update main.cpp"

This reverts commit d8924fdcfe.

* Revert "Update main.cpp"

This reverts commit 252e508d85.
2023-02-04 08:48:35 +02:00
Eric Tendian 47737b2e82
livestream.sh : run main with model arg instead of default (#453)
Actually utilizes the $model var when calling ./main.
2023-01-27 01:13:31 +02:00
Georgi Gerganov 60337f5306
wasm : check if navigator.storage.estimate() is available
Safari does not support it
2023-01-25 20:00:59 +02:00
Ondrej Kokes 11f61cecd6
whisper.wasm : add labels for easier radio selection (#435) 2023-01-23 20:49:00 +02:00
Georgi Gerganov f583e2d2f5
main : we had accidentally disabled the temperature fallback .. (#291) 2023-01-18 22:51:41 +02:00
Georgi Gerganov 206fc93396
whisper.wasm : add small and small.en models 2023-01-18 21:58:55 +02:00
Chia-Hsiang Cheng 472a473fd1
main : add an option to accept optional output filenames (#424)
* Add an option to accept optional output filenames

* Format the file

Co-authored-by: Chia-Hsiang Cheng <gary.chiahsiang.cheng@gmail.com>
2023-01-18 21:26:31 +02:00
Georgi Gerganov 9ba66c2fad
stream : fix handling of --step == --length (#416) 2023-01-18 21:22:52 +02:00
Georgi Gerganov 1ccb8a46a5
bench : fix Windows linkage by moving ggml benches in whisper lib .. 2023-01-18 21:19:50 +02:00
Georgi Gerganov 1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks 2023-01-18 20:31:46 +02:00
Digipom 49b529ba74
whisper.android : add support for loading directly from asset in C (#415) 2023-01-16 21:57:35 +02:00
Georgi Gerganov c9aeb33676
stream : fix --keep_context argument to be used correctly (#354) 2023-01-16 19:37:40 +02:00
Georgi Gerganov c3991bbb24
Update README.md 2023-01-15 14:08:12 +02:00
Georgi Gerganov fafd78945d
bench.wasm : print system info 2023-01-15 11:34:03 +02:00
Georgi Gerganov 8de452c18b
Improve decoding (#291)
* whisper : prepare infra for new decoding strategies

* whisper : apply logit filters and compute logprobs

* whisper : add whisper_get_logits()

* whisper : separate self and cross attention memory

Initial step needed for supporting parallel decoders

* whisper : move probs_id buffer to whisper_context

* whisper : refactor kv cache into separate struct

* whisper : move self-attention kv cache to whisper_decoder

* whisper : wip decoding parameters + strategies

* whisper : wip decoding parameters + strategies (part 2)

* whisper : wip decoding parameters + strategies (part 3)

* whisper : wip decoding parameters + strategies (part 4)

* whisper : fix prompt_past update to not include prompt_init

* whisper : temperature + best_of support

* whisper : support for compression_ration_threshold

We actually use entropy, but it is similar

* command : fix example to use logits instead of obsolete probs

* whisper : handle empty sequence ranking

* whisper : add WHISPER_DEBUG + diagnostic prints + new main args

* whisper : minor fixes

* whisper : add beam-search support

* whisper : bug fix when there no previous context

* whisper : add comments

* stream : disable temperature fallback

For real-time processing, we always want a single decoder running at T=0

* whisper.swiftui : update example - fix paths + add empty folders
2023-01-15 11:29:57 +02:00
Georgi Gerganov a6dbd9188b
stream : fix a bug that inserted a lot of empty audio at the start
The quality was terrible due to this
2023-01-14 19:20:47 +02:00
Syahmi Azhar 1512545149
whisper : add loader class to allow loading from buffer and others (#353)
* whisper : add loader to allow loading from other than file

* whisper : rename whisper_init to whisper_init_from_file

* whisper : add whisper_init_from_buffer

* android : Delete local.properties

* android : load models directly from assets

* whisper : adding <stddef.h> needed for size_t + code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-08 13:03:33 +02:00
Georgi Gerganov 52a3e0c92a
ggml : improve vec_dot_f16 unrolling in flash_attn_f16 2023-01-08 11:41:18 +02:00
Georgi Gerganov d1ea1220ff
command : clean-up / refactoring / formatting (#383) 2023-01-07 21:43:24 +02:00
David 9c4a1522f6
command : always-prompt mode (#383) 2023-01-07 21:41:11 +02:00
Georgi Gerganov 87dd4a3081
talk.wasm : bump memory usage + update whisper.js 2023-01-06 21:13:44 +02:00
Georgi Gerganov 6b351bb669
command : add "guided-mode" video demo in the README.md 2023-01-06 18:59:26 +02:00
Georgi Gerganov b3c865083e
ci : add emscripten build 2023-01-05 22:10:20 +02:00
Georgi Gerganov a0d4f8e65c
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00
Georgi Gerganov 196d738974
minor : close #370 + Makefile build info print change 2023-01-05 21:35:45 +02:00
Andy Maloney 84c6b42e65
cmake : update to 3.19 (#351)
- update from 3.0 (from 2014) to 3.19 (from 2020)
- move some global setting onto the targets (through a cmake include)
2023-01-05 21:22:48 +02:00
Georgi Gerganov a466c3404d
stream : fix data race on bool + avoid division-by-zero 2023-01-02 10:20:50 +02:00
Andy Maloney f00509d57c
command : refactor to split command list & general transcription modes (#331)
This makes it easier to understand if you're looking for only one of the capabilities.
2022-12-31 14:08:57 +02:00
Niels Mayer a593b932e4
main : add -ocsv, aka --output-csv to output a CSV file
Adds -ocsv, aka --output-csv feature to examples/main, which outputs a CSV file containing lines formatted as follows <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, "<transcript-line-including-commas>".
2022-12-29 14:04:00 +02:00
Andy Maloney 331c0bbddc
examples : fix memory leak on failure to load gpt2 model (#323) 2022-12-23 20:19:07 +02:00
Andy Maloney dc90efd504
examples : small code cleanups (#322)
- remove unnecessary initialization of string to ""
- use empty() instead of checking size()
- use emplace_back instead of push_back
- use nullptr instead of NULL
- remove unnecessary call to .data() on string
- use character overload of find_first_of() instead of passing a string
2022-12-23 20:18:51 +02:00
Digipom 0f4227d9ee
examples : add whisper.swiftui demo app (#308)
* Add SwiftUI demo project.

* Add -DGGML_USE_ACCELERATE
2022-12-23 10:56:18 +02:00
Kevin Brothaler 91fc08c641 Build a vfpv4 library for armeabi-v7a and do runtime detection to select the right library 2022-12-22 16:47:54 +02:00
Kevin Brothaler e1432dd91a Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
Kevin Brothaler 22193cbfe8 Bump NDK version 2022-12-22 16:47:54 +02:00
Georgi Gerganov 90564f85f9
Update README.md 2022-12-19 22:09:21 +02:00
Georgi Gerganov 99da1e5cc8
cmake : enable and fix -Wall -Wextra -Wpedantic C++ warnings 2022-12-19 20:45:08 +02:00
Matheus de Sousa 8e3f129b4d
minor : resolves some of warnings when compiling with clang/clang++ (#294)
* Resolves some of warnings when compiling with clang/clang++

Mostly nit stuff that clang catches when compiling with -Wall -Wextra
-pedantic.

- Fix comparison between sign/unsigned integers.
- Passes a constant reference (const&) instead of copying each time.

* minor : normalize coding style

* minor : fix warning

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-19 20:19:01 +02:00
Georgi Gerganov fba10a4c68 whisper : language auto-detect (#59) 2022-12-17 18:49:44 +02:00
Georgi Gerganov 32fbc8cd04
main : add option to print the progress (#276) 2022-12-16 20:20:43 +02:00
Georgi Gerganov b8065d90f5
main : add "--prompt" command line argument (#90)
This allows to provide an initial prompt to be used at the start of the
processing.
2022-12-16 19:43:16 +02:00
Georgi Gerganov 4312995974 command : better indentation 2022-12-16 19:38:18 +02:00
Georgi Gerganov 5eeeb3412d command : update README, show how to use guided mode 2022-12-16 19:38:18 +02:00
Georgi Gerganov 6a69e3ae27 command : adding guided mode 2022-12-16 19:38:18 +02:00
Georgi Gerganov ea19ed33f1
Update README.md (#46)
Add references to the new Android app
2022-12-16 19:28:51 +02:00
Digipom 675e787171
Add Android sample (#277)
* Add Android sample

* Use main project C files

* Stop existing playback before starting new playback

* Make text scrollable

* Stop playback when starting to record

* Remove extra var
2022-12-16 19:20:13 +02:00
Georgi Gerganov a82d331034
stream : update README.md + comments 2022-12-16 18:04:19 +02:00
Georgi Gerganov 5a5c5ddcca
Update README.md 2022-12-15 20:38:08 +02:00
Georgi Gerganov 34e0b4b9ef
stream : fix build 2022-12-15 20:15:36 +02:00
Georgi Gerganov b0f8013eb9
stream : add sliding window mode 2022-12-15 19:59:17 +02:00
Georgi Gerganov a613f16aec
talk : improve prompting 2022-12-12 23:44:36 +02:00
Georgi Gerganov f309f97df6
Node.js package (#260)
* npm : preparing infra for node package

* npm : package infra ready

* npm : initial version ready

* npm : change name to whisper.cpp

whisper.js is taken
2022-12-12 20:17:27 +02:00
Georgi Gerganov aa6adda26e
talk : make compatible with c++11 (part 2) 2022-12-11 20:34:04 +02:00
Georgi Gerganov 444349f4ec
talk : make compatible with c++11 2022-12-11 20:19:17 +02:00
Lexevolution 6ed786957e
Add newline per segment for text output (#254) 2022-12-11 20:00:29 +02:00
Georgi Gerganov fcf515de60
bench.wasm : same as "bench" but runs in the browser (#89) 2022-12-11 11:09:10 +02:00
Georgi Gerganov 85c9ac18b5
Update README.md 2022-12-10 16:54:57 +02:00
Georgi Gerganov b7c85d1ea6 talk : fix build for MSVC 2022-12-10 16:51:58 +02:00
Georgi Gerganov 3b1aacbe6d talk : talk with AI in the terminal 2022-12-10 16:51:58 +02:00
Georgi Gerganov 56822621a8 twitch.sh : various fixes and polishing
- check if streamlink is installed
- fix audio chunking
- change default threads to 4
2022-12-08 19:20:04 +02:00
keyehzy 9e5f3ddc16 Allow for Twitch.tv live transcription
We rely on streamlink library to give us a stream, then we proceed similarly to
the radio livestream example.
2022-12-08 19:20:04 +02:00
Georgi Gerganov 47afb93c3c
yt-wsp.sh : improve usage instructions 2022-12-07 22:12:08 +02:00
Georgi Gerganov 575c53dc41
yt-wsp.sh : fix usage instruction + comment 2022-12-07 21:12:55 +02:00
Georgi Gerganov faa85f9840 livestream.sh : remove obsolete comment 2022-12-07 04:41:43 +02:00
Georgi Gerganov 9fe7306f4b
models : add the new "large" model release by OpenAI
The old "large" model is now renamed "large-v1".
If you have been using it, make sure to rename it and download the new
"large" model for best results.
2022-12-06 18:48:57 +02:00
Georgi Gerganov 57e0e6b700
livestream : handle ffmpeg errors gracefully and stabilize transcript 2022-12-01 20:49:09 +02:00
Georgi Gerganov 4f7363077f
livestream : minor changes 2022-12-01 19:47:58 +02:00
semiformal-net 093c840dee
livestream : fix losing words across audio chunk (#195)
* improve livestream script

* Update examples/livestream.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Paul Edwards <paul.edwards@semiformal.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-01 19:18:22 +02:00
Georgi Gerganov 4698dcdb52 whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:42:45 +02:00
Georgi Gerganov 164df0d447
whisper.objc : fix context + broken readme links 2022-11-27 10:52:27 +02:00
Georgi Gerganov e266cb0723
whisper.objc : add real-time processing (#97)
Similar to the "stream" app
2022-11-26 18:32:46 +02:00
Georgi Gerganov c207eed431
whisper.objc : fix build warnings 2022-11-26 16:27:04 +02:00
Georgi Gerganov a425365b82
yt-wsp.sh : script to easily transcribe VODs
Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818

Usage:

  cd whisper.cpp
  make

  ./examples/yt-wsp.sh <video-url>
2022-11-26 12:54:42 +02:00
Georgi Gerganov 68ecadbbc9
command.wasm : add voice assistant example for the Web (#171)
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
2022-11-26 11:40:06 +02:00
Georgi Gerganov c536ff4005
minor : add comment for using "generate_karaoke.sh" 2022-11-26 10:22:42 +02:00
Georgi Gerganov cb70b07db5
livestream.sh : simple tool to transcribe audio livestreams (#185) 2022-11-26 10:05:37 +02:00
Georgi Gerganov 3c390ffe38
stream.wasm : add web-based real-time transcription (#112) 2022-11-25 23:57:46 +02:00
Georgi Gerganov be16dfa038
whisper.wasm : do not block page while processing (close #86) 2022-11-25 23:07:42 +02:00
Georgi Gerganov 0f619b52ce
main : add stereo-channel-based diarization (#64)
Not tested - I don't have stereo dialog audio
2022-11-25 22:08:58 +02:00
Georgi Gerganov 1246dd023e
command : add demonstration video 2022-11-25 20:23:58 +02:00
Georgi Gerganov 0be27bbd92
command : fix build + fix README + add bold printing 2022-11-25 19:53:50 +02:00
Georgi Gerganov bc88eb13c6
examples : add "command" tool (#171) 2022-11-25 19:36:57 +02:00
Georgi Gerganov b8ce25dec1
refactoring : more readable code 2022-11-25 19:28:04 +02:00
Georgi Gerganov e4805d9601
wasm : refactor wasm example + reuse fetch mechanism 2022-11-24 23:13:26 +02:00
Georgi Gerganov ff36415a86
talk.wasm : update video link + some minor fixes 2022-11-24 20:15:24 +02:00
Georgi Gerganov 025ff465b6
Update README.md
Use a less cringy video to demo talk.wasm lol
2022-11-24 20:09:45 +02:00
Georgi Gerganov abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
Georgi Gerganov 454b91de16
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
Georgi Gerganov d7024cf9dc
main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
Georgi Gerganov 37422ed733
talk.wasm : add audio pre-processing + bump memory 2022-11-24 00:34:00 +02:00
Georgi Gerganov be3b720f96
talk.wasm : refactoring + update README.md 2022-11-24 00:08:57 +02:00
Georgi Gerganov 49706a658a
minor : updates few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
Georgi Gerganov e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
Georgi Gerganov 326573de9a
talk.wasm : final touches 2022-11-22 22:22:17 +02:00
Georgi Gerganov 9aea96f774
talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
Georgi Gerganov 385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik 63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov 78116f8eda
talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
Georgi Gerganov a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov fb8d77f760 stream : add "audio_ctx" parameter
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov 62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Alan 7519eabf65 Adds support for stdin wav input 2022-11-09 20:37:23 +02:00
Georgi Gerganov c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov c71363f14c
examples : add simple script for generating Karaoke video 2022-11-06 09:22:50 +02:00
Georgi Gerganov d42cf6d0df
Update README.md 2022-11-04 22:26:08 +02:00
Georgi Gerganov ef47d77492
main : fix generated bash script 2022-11-04 18:30:38 +02:00
Georgi Gerganov d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov 6fb98370ba
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00
Georgi Gerganov 0729da9a3b
main : fix some edge cases for word-level timestamps 2022-11-01 22:09:25 +02:00
Georgi Gerganov 5dc74e3aff
Update README.md 2022-10-31 22:06:05 +02:00
Georgi Gerganov ac8ef34039
Update README.md 2022-10-31 20:19:41 +02:00
Georgi Gerganov dc12994603
Update README.md 2022-10-30 17:11:37 +02:00
Georgi Gerganov 57fb46f307 main : add option for word-leve timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov 5a9e4260a6
stream : add "--capture" option to select capture device (ref #10) 2022-10-30 08:27:04 +02:00
Georgi Gerganov 12fb303d9d
whisper.wasm : update system info print 2022-10-29 20:32:41 +03:00
Georgi Gerganov 2827cbbbe8 main : merge parallel example in main 2022-10-29 19:37:19 +03:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov 72e9cdd6bf parallel : adding tool for parallel transformer inference 2022-10-29 19:37:19 +03:00
Georgi Gerganov b89f8960ca
Update README.md 2022-10-28 21:40:52 +03:00
Georgi Gerganov 6f82320b05 Create README.md 2022-10-28 20:25:37 +03:00
Georgi Gerganov 2298310dd8 whisper.nvim : add helper script for the Neovim integration 2022-10-28 20:25:37 +03:00
Georgi Gerganov 8347a7bb6a
stream : few updates to make it compatible for Vim usage (#99) 2022-10-27 22:10:50 +03:00
Georgi Gerganov ebb01b9e33
Print system info at start of program 2022-10-27 17:22:19 +03:00
Georgi Gerganov 2400660f3f Print system info in main 2022-10-26 22:54:09 +03:00
Georgi Gerganov a6c786d5dc Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 91dcf5f35b Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 113a4f06d8 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 47e78b7288 Update README.md 2022-10-25 20:53:48 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov c6710efde2 refactoring : move main + stream in examples + other stuff 2022-10-25 20:53:48 +03:00
Georgi Gerganov d4f94ce427 Update README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov a52ee08c1e objc : polishing the sample application 2022-10-24 18:23:07 +03:00
Georgi Gerganov b41f4a90eb Create README.md 2022-10-24 18:23:07 +03:00
Georgi Gerganov bb1ee266d2 ios : whisper.objc example 2022-10-24 18:23:07 +03:00
Georgi Gerganov 3e69a6071d
Update README.md 2022-10-23 08:04:33 +03:00
Georgi Gerganov f4aa01c2f8
Update README.md 2022-10-22 19:30:35 +03:00
Georgi Gerganov 6b45e37b2b Update README.md and finalize the whisper.wasm example 2022-10-22 18:54:01 +03:00
Georgi Gerganov 491ecd7056 wip : polishing WASM example 2022-10-22 18:54:01 +03:00
Georgi Gerganov e905c6f827 wip : initial WASM port
Works but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using "tiny.en" model
2022-10-22 18:54:01 +03:00