whisper.cpp/examples/stream
Przemysław Pawełczyk b55b505690
build : do not use _GNU_SOURCE gratuitously (#1129)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions,
plus some stuff from BSD that is not specified in POSIX.1.

Well, that was true until NUMA support was added recently in ggml,
so enable GNU libc extensions for Linux builds to cover that.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by
SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h).

* make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK

* make : use BSD-specific FTMs to enable alloca on BSDs

* make : fix OpenBSD build by exposing newer POSIX definitions

* cmake : follow recent FTM improvements from Makefile
2023-09-07 12:36:14 +03:00
CMakeLists.txt whisper : add GPU support via cuBLAS (#834) 2023-04-30 12:14:33 +03:00
README.md stream : update README.md + comments 2022-12-16 18:04:19 +02:00
stream.cpp build : do not use _GNU_SOURCE gratuitously (#1129) 2023-09-07 12:36:14 +03:00

stream

This is a naive example of performing real-time inference on audio from your microphone. The stream tool samples the audio every half second and runs the transcription continuously. More info is available in issue #10.

./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000

https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
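To make the --step and --length values more concrete, here is a minimal sketch that converts them into sample counts. It assumes 16 kHz mono audio, the sample rate Whisper models expect. The names kSampleRate, ms_to_samples and steps_per_window are hypothetical helpers written for this illustration and are not taken from stream.cpp.

```cpp
#include <cstdint>

// Assumption: 16 kHz mono input, the sample rate Whisper models expect.
constexpr int kSampleRate = 16000;

// Convert a duration in milliseconds to a number of audio samples.
// Hypothetical helper, not part of stream.cpp.
constexpr int ms_to_samples(int ms) {
    return static_cast<int>(static_cast<std::int64_t>(ms) * kSampleRate / 1000);
}

// How many step-sized chunks fit into one window of length_ms.
// Returns 0 when step_ms is not positive (for example --step 0).
constexpr int steps_per_window(int step_ms, int length_ms) {
    return step_ms > 0 ? length_ms / step_ms : 0;
}
```

With the command above (--step 500 --length 5000), ms_to_samples(500) gives 8000 new samples per step, ms_to_samples(5000) gives an 80000-sample window, and steps_per_window(500, 5000) gives 10 steps per window.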

Sliding window mode with VAD

Setting the --step argument to 0 enables the sliding window mode:

./stream -m ./models/ggml-small.en.bin -t 6 --step 0 --length 30000 -vth 0.6

In this mode, the tool transcribes only after speech activity is detected. It uses a very basic voice activity detector (VAD), but in theory a more sophisticated approach could be added. The -vth argument sets the VAD threshold: higher values make it detect silence more often. It is best to tune it for the specific use case, but a value around 0.6 should be OK in general. When silence is detected, the tool transcribes the last --length milliseconds of audio and outputs a transcription block that is suitable for parsing.
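As an illustration of how a threshold like -vth can behave, the sketch below shows one simple energy-based silence check. It compares the average level of the most recent audio with the average level of the whole buffer. This is an assumption about what a basic detector could look like, not a copy of the tool's implementation; the function names mean_abs and tail_is_silent and the tail_ms parameter are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute amplitude of samples in the half-open range [begin, end).
static float mean_abs(const std::vector<float>& samples, std::size_t begin, std::size_t end) {
    if (end <= begin) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        sum += std::fabs(samples[i]);
    }
    return static_cast<float>(sum / static_cast<double>(end - begin));
}

// Hypothetical helper: returns true when the last tail_ms of audio is quiet
// relative to the whole buffer. A higher threshold makes this return true
// (silence) more often, matching the behavior described for -vth.
bool tail_is_silent(const std::vector<float>& samples, int sample_rate, int tail_ms, float threshold) {
    if (sample_rate <= 0 || tail_ms <= 0) return false;
    const std::size_t n = samples.size();
    const std::size_t n_tail =
        static_cast<std::size_t>(sample_rate) * static_cast<std::size_t>(tail_ms) / 1000;
    if (n_tail == 0 || n_tail >= n) return false; // not enough audio to compare

    const float energy_all  = mean_abs(samples, 0, n);
    const float energy_tail = mean_abs(samples, n - n_tail, n);
    return energy_tail <= threshold * energy_all;
}
```

A caller could evaluate tail_is_silent on the captured buffer after each new chunk and, once it returns true, pass the last --length milliseconds of samples to the transcriber. Because the comparison is relative to the buffer's own level, it adapts to microphone gain, though steady background noise can still call for tuning the threshold.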

Building

The stream tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on macOS
brew install sdl2

make stream

Web version

This tool can also run in the browser: examples/stream.wasm