whisper.cpp/examples/command
Przemysław Pawełczyk 3f7a03ebe3
ggml : do not use _GNU_SOURCE gratuitously (#1027)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

This is an attempt at fixing macOS build error coming from SDL2 relying
on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.
2023-06-25 16:34:30 +03:00
..
CMakeLists.txt whisper : add GPU support via cuBLAS (#834) 2023-04-30 12:14:33 +03:00
command.cpp ggml : do not use _GNU_SOURCE gratuitously (#1027) 2023-06-25 16:34:30 +03:00
commands.txt command : adding guided mode 2022-12-16 19:38:18 +02:00
README.md command : add "guided-mode" video demo in the README.md 2023-01-06 18:59:26 +02:00

command

This is a basic Voice Assistant example that accepts voice commands from the microphone. More info is available in issue #171.

# Run with default arguments and small model
./command -m ./models/ggml-small.en.bin -t 8

# On Raspberry Pi, use tiny or base models + "-ac 768" for better performance
./command -m ./models/ggml-tiny.en.bin -ac 768 -t 3 -c 0

https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4

Web version: examples/command.wasm

Guided mode

"Guided mode" allows you to specify a list of commands (i.e. strings) and the transcription will be guided to classify your command into one from the list. This can be useful in situations where a device is listening only for a small subset of commands.

Initial tests show that this approach might be extremely efficient in terms of performance, since it integrates very well with the "partial Encoder" idea from #137.

# Run in guided mode, the list of allowed commands is in commands.txt
./command -m ./models/ggml-base.en.bin -cmd ./examples/command/commands.txt

# On Raspberry Pi, in guided mode you can use "-ac 128" for extra performance
./command -m ./models/ggml-tiny.en.bin -cmd ./examples/command/commands.txt -ac 128 -t 3 -c 0

https://user-images.githubusercontent.com/1991296/207435352-8fc4ed3f-bde5-4555-9b8b-aeeb76bee969.mp4

Building

The command tool depends on SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on Mac OS
brew install sdl2

make command