Commit Graph

78 Commits (9a0b59d990be319952a4a02b9164b3b2327cd454)

Author SHA1 Message Date
M. Eren Akbiyik 63ae03b8e0
Prompt previous tokens for streaming (#163)
* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer
2022-11-22 18:10:35 +02:00
Georgi Gerganov fb8d77f760 stream : add "audio_ctx" parameter
Used to overwrite the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
2022-11-20 21:22:41 +02:00
Georgi Gerganov 62b5ff875c stream : add "max_tokens" parameter
Used to limit the number of tokens in a segment.
Useful to battle with word repetition when using partial encoder context
2022-11-20 21:22:41 +02:00
Georgi Gerganov d351771a4b stream : add "single_segment" option
Force the entire audio chunk to be transcribed into a single segment
2022-11-20 21:22:41 +02:00
Georgi Gerganov c058aaf22e stream : partial encoder experiments 2022-11-20 21:22:41 +02:00
Georgi Gerganov 83c742f1a7 whisper : add option to speed up the audio tempo by x2
Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
2022-11-13 16:25:43 +02:00
Georgi Gerganov c30bffc8a5
ref #22 : add "duration" option
Can be used to partially process a recording
2022-11-07 20:14:52 +02:00
Georgi Gerganov d5afebd37c
whisper : token-level timestamp refactoring (#49, #120)
This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
2022-11-02 21:45:54 +02:00
Georgi Gerganov 57fb46f307 main : add option for word-leve timestamps (very experimental) 2022-10-30 17:06:57 +02:00
Georgi Gerganov eba62e0fa1
close #113 : fix struct whisper_token_data 2022-10-30 08:23:52 +02:00
Georgi Gerganov dec40be58f parallel : print time of audio boundaries + fix timings 2022-10-29 19:37:19 +03:00
Georgi Gerganov 0b2dc3c82c parallel : working 2022-10-29 19:37:19 +03:00
Georgi Gerganov 85d6e1e1e7 main : fix sampling time + add max_context parameter 2022-10-29 19:37:19 +03:00
Georgi Gerganov 72e9cdd6bf parallel : adding tool for parallel transformer inference 2022-10-29 19:37:19 +03:00
Georgi Gerganov 34bb3ab0cf ggml : add system info functions 2022-10-25 20:53:48 +03:00
Georgi Gerganov 7affd309d3 whisper : add new-segment callback
Can be used to process new segments as they are being generated.
Sample usage in main, for printing the resulting segments during the
inference.
2022-10-22 21:17:21 +03:00
Georgi Gerganov 31ff0c6a1f wip : experimental color coding of tokens based on probabilities 2022-10-22 21:17:21 +03:00
Georgi Gerganov 7eeef0358a
ref #52 : improve greedy sampling strategy
Force timestamp token to be sampled if the probability sum over all
timestamp tokens is above the probability of any other token
2022-10-18 19:48:15 +03:00
Georgi Gerganov 2d171ced32
close #32 : add comment about thread-safety of the C-style API 2022-10-18 18:27:57 +03:00
Georgi Gerganov e30cf83158
ref #57, #62, #63 : remove unions in C-api + remove designated initializers
We are not ready for designated initializers - many compilers do not
support this C++ feature yet, so removing it's non-trivial usages.
2022-10-18 18:17:24 +03:00
Georgi Gerganov 9d5723435f
ref #35 : add <stdbool.h> to whisper.h
"bool" type is not implicitly defined for some compilers.
2022-10-10 08:11:18 +03:00
Georgi Gerganov 9bbca3110f ref #9 : add API documentation in whisper.h 2022-10-08 18:09:56 +03:00
Georgi Gerganov 2f069335ab Adding sanitizer tests 2022-10-08 11:43:42 +03:00
Georgi Gerganov 481cd685d5
ref #10 : option to keep context in "stream" example
Seems the results become worse when we keep the context, so by default
this is not enabled
2022-10-07 22:30:44 +03:00
Georgi Gerganov 7787b878e1
ref #16, #22 : add "offset" argument
Allows to start processing the input audio at some offset from the
beginning. Useful for splitting a long job into multiple tasks.
2022-10-07 22:00:40 +03:00
Georgi Gerganov 6814cc9b02 Improve result printing 2022-10-04 23:18:15 +03:00
Georgi Gerganov eba33adadd Extend C-style API with full inference methods 2022-10-04 23:18:15 +03:00
Georgi Gerganov 6b77124e01 Initial C-style interface for whisper.cpp 2022-10-04 23:18:15 +03:00