whisper.cpp/examples/talk
Tamotsu Takahashi f18738f247
talk, talk-llama : pass text_to_speak as a file (#1865)
* talk-llama: pass file instead of arg

it is too hard to quote text in a portable way

* talk-llama: pass heard_ok as a file

* talk-llama: let eleven-labs.py accept options

Options: -v voice, -s savefile, -p (--play)

* talk-llama: check installed commands in "speak"

Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed

* talk-llama: pass voice_id again

in order to sync talk with talk-llama

* talk: sync with talk-llama

Passing text_to_speak as a file is safer and more portable
cf. https://stackoverflow.com/a/59036879/45375

* talk and talk-llama: get all installed voices in speak.ps1

* talk and talk-llama: get voices from api

* talk and talk-llama: add more options to eleven-labs.py

and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/)

```
usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE]

options:
  -q, --quick           skip checking the required library

action:
  TEXTFILE              read the text file (default: stdin)
  -l, --list            show the list of voices and exit
  -h, --help            show this help and exit

voice selection:
  -n NAME, --name NAME  get a voice object by name (default: Arnold)
  -v NUMBER, --voice NUMBER
                        get a voice object by number (see --list)
  -f KEY=VAL, --filter KEY=VAL
                        filter voices by labels (default: "use case=narration")
                        this option can be used multiple times
                        filtering will be disabled if the first -f has no "=" (e.g. -f "any")

output:
  -s FILE, --save FILE  save the TTS to a file (default: audio.mp3)
  -p, --play            play the TTS with ffplay
```

* examples: add speak_with_file()

as suggested in the review

* talk and talk-llama: ignore to_speak.txt
2024-02-24 09:24:47 +02:00
.gitignore talk, talk-llama : pass text_to_speak as a file (#1865) 2024-02-24 09:24:47 +02:00
CMakeLists.txt whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
README.md examples: Update the README for Talk - fixing the gpt2 URL (#1334) 2023-10-01 04:21:32 +08:00
eleven-labs.py talk, talk-llama : pass text_to_speak as a file (#1865) 2024-02-24 09:24:47 +02:00
gpt-2.cpp sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677) 2023-12-22 17:53:39 +02:00
gpt-2.h whisper : add integer quantization support (#540) 2023-04-30 18:51:57 +03:00
speak talk, talk-llama : pass text_to_speak as a file (#1865) 2024-02-24 09:24:47 +02:00
speak.bat `speak` scripts for Windows 2023-06-01 22:45:00 +10:00
speak.ps1 talk, talk-llama : pass text_to_speak as a file (#1865) 2024-02-24 09:24:47 +02:00
talk.cpp talk, talk-llama : pass text_to_speak as a file (#1865) 2024-02-24 09:24:47 +02:00

README.md

talk

Talk with an Artificial Intelligence in your terminal


Web version: examples/talk.wasm

Building

The talk tool depends on the SDL2 library to capture audio from the microphone. You can build it like this:

# Install SDL2 on Linux
sudo apt-get install libsdl2-dev

# Install SDL2 on macOS
brew install sdl2

# Build the "talk" executable
make talk

# Run it
./talk -p Santa

GPT-2

To run this, you will need a ggml GPT-2 model: instructions

Alternatively, you can simply download the smallest ggml GPT-2 117M model (240 MB) like this:

wget --quiet --show-progress -O models/ggml-gpt-2-117M.bin https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin

TTS

For the best experience, this example needs a TTS tool to convert the generated text responses to voice. You can use any TTS engine you like: simply edit the speak script to suit your needs. By default, it is configured to use say on macOS, espeak, or the Windows SpeechSynthesizer.
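
The commit notes at the top of this page say that the text to speak is passed to the speak script as a file, because quoting arbitrary text on a command line is hard to do portably. The following is a minimal sketch of that idea, not the repository's actual code. The helper names (`write_text_file`, `build_speak_command`, `speak_via_file`) are hypothetical, and the argument order `<command> <voice_id> <path>` is an assumption, so check it against your own speak script.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper: write `text` verbatim to `path`.
// Returns false if the file cannot be opened or written.
bool write_text_file(const std::string & path, const std::string & text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    out.close();
    return !out.fail();
}

// Hypothetical helper: build the command line "<command> <voice_id> <path>".
// Assumption: the speak script takes the voice id first and the text file second.
// The path is not quoted here, so it must not contain spaces or shell metacharacters.
std::string build_speak_command(const std::string & command, int voice_id, const std::string & path) {
    return command + " " + std::to_string(voice_id) + " " + path;
}

// Write the text to a file, then run the speak script on that file.
// Only the file path reaches the shell, so the text itself needs no escaping.
bool speak_via_file(const std::string & command, const std::string & text,
                    const std::string & path, int voice_id) {
    if (!write_text_file(path, text)) {
        fprintf(stderr, "failed to write '%s'\n", path.c_str());
        return false;
    }
    const int ret = std::system(build_speak_command(command, voice_id, path).c_str());
    if (ret != 0) {
        fprintf(stderr, "speak command failed (status %d)\n", ret);
        return false;
    }
    return true;
}
```

A call such as `speak_via_file("./speak", reply, "to_speak.txt", 2)` would then hand the script a file instead of a quoted string. The voice id `2` is only an illustrative value; the file name follows the `to_speak.txt` mentioned in the commit notes.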