Commit Graph

248 Commits (ff36415a86640200328ac35defe03c0f7d874051)

Author SHA1 Message Date
Georgi Gerganov ff36415a86
talk.wasm : update video link + some minor fixes 2022-11-24 20:15:24 +02:00
Georgi Gerganov 025ff465b6
Update README.md
Use a less cringy video to demo talk.wasm lol
2022-11-24 20:09:45 +02:00
Georgi Gerganov 2c0501b38a
Update README.md 2022-11-24 20:06:51 +02:00
Georgi Gerganov abce28ea99
talk.wasm : move to https://whisper.ggerganov.com/talk
This way, we can share the same models across different WASM examples
and not have to download them for each page
2022-11-24 18:24:06 +02:00
Georgi Gerganov a2ecd54455
models : add instructions for using HF fine-tuned models 2022-11-24 17:54:41 +02:00
Georgi Gerganov 128aaadb93
whisper : improve printfs 2022-11-24 17:54:16 +02:00
Georgi Gerganov 454b91de16
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
Georgi Gerganov d7024cf9dc
main, stream : remove --verbose flag (#178) 2022-11-24 17:52:04 +02:00
Georgi Gerganov 37422ed733
talk.wasm : add audio pre-processing + bump memory 2022-11-24 00:34:00 +02:00
Georgi Gerganov be3b720f96
talk.wasm : refactoring + update README.md 2022-11-24 00:08:57 +02:00
Georgi Gerganov 00f46dbc1d
models : add usage comments to the HF convert script (#157) 2022-11-23 23:22:40 +02:00
Georgi Gerganov 5698bddbc9
models : fix HF fine-tuned model conversion script (#157)
It works now
2022-11-23 23:14:11 +02:00
Georgi Gerganov 388e9f79ad
ggml : fix the fix 2022-11-23 22:40:06 +02:00
Georgi Gerganov 35cd29ce1f
ggml : fix cross-compile Linux -> Window with mingw (#168) 2022-11-23 22:28:41 +02:00
Georgi Gerganov a156a358ca
Revert "update README.md"
This reverts commit 6a84147113.
2022-11-23 22:16:50 +02:00
katsu560 6a84147113 update README.md 2022-11-23 22:16:33 +02:00
katsu560 804f36aa2c ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16 2022-11-23 22:16:33 +02:00
katsu560 4b2f51b479 add gprof option 2022-11-23 22:16:33 +02:00
katsu560 800ae5b808 fix AVX,AVX2,FMA,F16C detection on Linux and add flags for OpenBLAS 2022-11-23 22:16:33 +02:00
katsu560 83456076f0 add AVX support 2022-11-23 22:16:33 +02:00
Tamotsu Takahashi 3df6c14fca Build with OpenBLAS and SDL2 on windows 2022-11-23 22:09:54 +02:00
Georgi Gerganov d64d6ca3fd
models : minor changes to the HF convert script (#157) 2022-11-23 22:07:20 +02:00
Georgi Gerganov 93482d0373
models : add "convert-h5-to-ggml.py" script (#157)
Converts transformers models to ggml.
Although the conversion is successful, it does not work for some reason.
Not sure why
2022-11-23 17:19:22 +02:00
Georgi Gerganov 49706a658a
minor : updates few prints + fix buttons in whisper.wasm 2022-11-23 17:19:21 +02:00
Georgi Gerganov 363a2dadec
Update README.md 2022-11-23 09:53:55 +02:00
Georgi Gerganov 623a486056
Update README.md 2022-11-23 09:52:36 +02:00
Tamotsu Takahashi 2f596f5b33 Find libopenblas.dll.a on windows
"lib" is needed for windows.

With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL.
1. extract a zip from https://github.com/xianyi/OpenBLAS/releases
2. copy the headers in (openblas)/include to the root directory of whisper.cpp
3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON
4. copy (openblas)/bin/libopenblas.dll to the same directory of whisper.dll after msbuild

https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258
2022-11-23 08:26:45 +02:00
Georgi Gerganov e5dcdabbb8
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00
Georgi Gerganov dad109c3f1
close #109 : add fetching of the model over HTTP (whisper.wasm) 2022-11-22 22:48:56 +02:00
Georgi Gerganov 326573de9a
talk.wasm : final touches 2022-11-22 22:22:17 +02:00
Georgi Gerganov 9aea96f774
talk.wasm : polishing + adding many AI personalities 2022-11-22 20:10:20 +02:00
Georgi Gerganov 385236d1d3
stream : "-kc" now enables context keeping from previous segment (#90)
By default, the context keeping is disabled
2022-11-22 18:21:15 +02:00
M. Eren Akbiyik 63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov 78116f8eda
talk.wasm : update README.md 2022-11-21 22:42:29 +02:00
Georgi Gerganov a4dfbeecf9
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example
2022-11-21 22:20:42 +02:00
Georgi Gerganov 2e311a2917
Update README.md 2022-11-21 18:52:20 +02:00
Georgi Gerganov 2065572a11 ggml : fix Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov 5c2176e314 ci : add Windows build 2022-11-20 22:47:03 +02:00
Georgi Gerganov f2df9bd768 stream : add "max_tokens" cli arg
Controls the max tokens per segment for the stream example
2022-11-20 21:22:41 +02:00
Georgi Gerganov fb8d77f760 stream : add "audio_ctx" parameter
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov 62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
greeshmay 2ba66360c9
fix: free ggml_context (close #149) (#150)
* fix: free ggml_context

* ggml : free the model's contexts in whisper_free()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-11-17 22:12:51 +02:00
Georgi Gerganov e70e5c8b53
models : simplify the conversion script
"transformers" dependency is not actually needed
2022-11-16 19:22:32 +02:00
Dody Suria Wijaya 55a0e1a64e Update download-ggml-model.sh
follow curl redirect to new hosting site
2022-11-16 18:59:44 +02:00
Georgi Gerganov 864a78a8d0
models : change default hosting to Hugging Face
My Linode is running out of monthly bandwidth due to the big interest in
the project
2022-11-15 19:47:06 +02:00
Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov 41b48ab7f1
make : add libwhisper.so target (#144) 2022-11-13 09:09:48 +02:00
Chidi Williams a728be9cdb
Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136)
* Check for AVX and AVX2 on Darwin

* Add AVX options to CMakeLists
2022-11-11 18:10:01 +02:00