From 78116f8eda6a14c4903b65de07b9636857cda175 Mon Sep 17 00:00:00 2001
From: Georgi Gerganov
Date: Mon, 21 Nov 2022 22:42:29 +0200
Subject: [PATCH] talk.wasm : update README.md

---
 examples/talk.wasm/README.md       | 39 +++++++++++++++++++++++++++---
 examples/talk.wasm/index-tmpl.html |  4 +++
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/examples/talk.wasm/README.md b/examples/talk.wasm/README.md
index f20bb01..43501cd 100644
--- a/examples/talk.wasm/README.md
+++ b/examples/talk.wasm/README.md
@@ -1,7 +1,38 @@
-# talk
+# talk.wasm
 
-WIP IN PROGRESS
+Talk with an Artificial Intelligence entity in your browser:
 
-ref: https://github.com/ggerganov/whisper.cpp/issues/154
+https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4
 
-demo: https://talk.ggerganov.com
+Online demo: https://talk.ggerganov.com
+
+## How does it work?
+
+This demo leverages 2 modern neural network models to create a high-quality voice chat directly in your browser:
+
+- [OpenAI's Whisper](https://github.com/openai/whisper) speech recognition model is used to process your voice and understand what you are saying
+- Upon receiving some voice input, the AI generates a text response using [OpenAI's GPT-2](https://github.com/openai/gpt-2) language model
+- The AI then vocalizes the response using the browser's [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)
+
+The web page does the processing locally on your machine. However, in order to run the models, it first needs to
+download the model data, which is about 350 MB. The model data is then stored in your browser's cache and can be reused
+in future visits without downloading it again.
+
+The processing of these heavy neural network models in the browser is possible by implementing them efficiently in C/C++
+and using WebAssembly SIMD capabilities for extra performance. For more detailed information, check out the
+[current repository](https://github.com/ggerganov/whisper.cpp).
+
+## Requirements
+
+In order to run this demo efficiently, you need to have the following:
+
+- Latest Chrome or Firefox browser (Safari is not supported)
+- Run this on a desktop or laptop with a modern CPU (a mobile phone will likely not be good enough)
+- Speak phrases that are no longer than 10 seconds - this is the audio context of the AI
+- The web page uses about 1.4 GB of RAM
+
+## Feedback
+
+If you have any comments or ideas for improvement, please drop a comment in the following discussion:
+
+https://github.com/ggerganov/whisper.cpp/discussions/167
diff --git a/examples/talk.wasm/index-tmpl.html b/examples/talk.wasm/index-tmpl.html
index abaea13..86e2cea 100644
--- a/examples/talk.wasm/index-tmpl.html
+++ b/examples/talk.wasm/index-tmpl.html
@@ -46,6 +46,10 @@
+ Select the models you would like to use and click the "Start" button to begin the conversation + +

+
Whisper model: