
# talk-llama

Talk with an LLaMA AI in your terminal


## Building

The `talk-llama` tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:

```bash
# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on Mac OS
brew install sdl2

# Build the "talk-llama" executable
make talk-llama

# Run it
./talk-llama -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/13B/ggml-model-q4_0.bin -p "Georgi" -t 8
```
- The `-mw` argument specifies the Whisper model to use. The `base` or `small` models are recommended for a real-time experience.
- The `-ml` argument specifies the LLaMA model to use. See the instructions in https://github.com/ggerganov/llama.cpp for how to obtain a ggml-compatible LLaMA model.

## Session

The `talk-llama` tool supports session management, enabling more coherent and continuous conversations: by retaining context from previous interactions, it can understand and respond to requests more naturally.

To enable session support, use the `--session FILE` command-line option when running the program. The model state is saved to the specified file after each interaction. If the file does not exist, it is created; if it does exist, the state is loaded from it, allowing you to resume a previous session. This is especially helpful for maintaining context in long conversations or when interacting with the assistant across multiple runs.

Example usage:

```bash
./talk-llama --session ./my-session-file -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/13B/ggml-model-q4_0.bin -p "Georgi" -t 8
```

## TTS

For the best experience, this example needs a TTS tool to convert the generated text responses to speech. You can use any TTS engine you like: simply edit the `speak` script to your needs. By default, it is configured to use macOS's `say` or Windows' SpeechSynthesizer, but you can use whatever you wish.
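As an illustration of what a customized `speak` script might look like, here is a minimal sketch. The engine names, fallback order, and argument convention are assumptions for the example, not a description of the bundled script:

```shell
#!/bin/bash
# Hypothetical replacement for the bundled "speak" script.
# Assumed usage: ./speak <voice_id> <text>

# pick_tts prints the name of the first TTS command found on PATH, or "none".
pick_tts() {
  for cmd in say espeak festival; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return
    fi
  done
  echo "none"
}

engine=$(pick_tts)
text="$2"

# Only invoke an engine when there is actually something to say.
if [ -n "$text" ]; then
  case "$engine" in
    say)      say "$text" ;;                    # macOS built-in TTS
    espeak)   espeak "$text" ;;                 # common on Linux
    festival) echo "$text" | festival --tts ;;  # another Linux option
    none)     echo "speak: no TTS engine found" >&2 ;;
  esac
fi
```

Match the argument convention to however your build of `talk-llama` actually invokes the script before using something like this.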

## Discussion

If you have any feedback, please let us know in the following discussion: https://github.com/ggerganov/whisper.cpp/discussions/672?converting=1