a0cc3ab377
add sycl build
Abhilash Majumder
2024-02-13 17:24:58 +0530
3d42463845
models : add update py requirements
Georgi Gerganov
2024-02-13 11:51:32 +0200
0e2139e22e
add sycl abstraction
Abhilash Majumder
2024-02-13 13:54:29 +0530
37f075a922
add changes from llama upstream
Abhilash Majumder
2024-02-13 11:43:58 +0530
dac6533892
Add files via upload
bobqianic
2024-02-13 03:02:25 +0000
b668927591
Add files via upload
bobqianic
2024-02-13 03:01:57 +0000
0f6ad6c2f5
fix bugs
bobqianic
2024-02-13 02:51:28 +0000
3ffc83d90a
swift : package no longer use ggml dependency (#1861)
Georgi Gerganov
2024-02-12 19:54:11 +0200
e3c5e2cba8
whisper : fix external encoder (#1860)
Georgi Gerganov
2024-02-12 19:53:51 +0200
ae47fb835b
spm : add ggml.h
Georgi Gerganov
2024-02-12 19:43:34 +0200
65a213daaa
Revert "swift : update Package.swift to use ggml as package dependency (#1701)"
Georgi Gerganov
2024-02-12 19:16:57 +0200
c604bf4eae
whisper : fix external encoder
Georgi Gerganov
2024-02-12 19:08:54 +0200
b742f13e70
sync : ggml
Georgi Gerganov
2024-02-12 19:07:56 +0200
52c529eeb1
ggml-alloc : allocate all leafs as if they were inputs (ggml/731)
slaren
2024-02-12 18:07:14 +0100
f25edade2b
whisper : alternative way to handle the external encoders
gg/fix-external-encoder
Georgi Gerganov
2024-02-12 16:32:26 +0200
74c260fe34
whisper : fix usage of external encoders (e.g. CoreML)
Georgi Gerganov
2024-02-12 15:19:59 +0200
551529290d
talk-llama : sync llama.cpp
Georgi Gerganov
2024-02-12 10:39:58 +0200
25a90ffa38
sync : ggml
Georgi Gerganov
2024-02-12 09:32:15 +0200
866b67ca93
ggml-backend : sync remnant
Georgi Gerganov
2024-02-12 09:27:57 +0200
d7e9f58f7f
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434)
Johannes Gäßler
2024-02-11 19:08:39 +0100
04839bae22
vulkan: only use M-sized matmul on Apple GPUs (llama/5412)
Sergio López
2024-02-11 15:12:00 +0100
3cc6e04a52
ggml : fix compile warnings (unused vars) (llama/4966)
Georgi Gerganov
2024-02-11 15:33:01 +0200
b7ef178b9c
ggml : add mmla kernels for quantized GEMM (llama/4966)
snadampal
2024-02-11 07:22:33 -0600
47dfe9d4db
metal : use autoreleasepool to avoid memory leaks (llama/5437)
Ian Bull
2024-02-10 02:53:28 -0800
1d3270cc8f
ggml-alloc : v3 (ggml/727)
slaren
2024-02-11 13:37:58 +0100
b72a3c47ef
Merge 670d9202ca into a6fb6ab597
Didzis Gosko
2024-02-12 07:30:47 +0000
a6fb6ab597
examples : added audio_ctx argument to main and server (#1857)
dscripka
2024-02-12 02:19:07 -0500
f48e2ba26f
better default value (again)
dscripka
2024-02-11 12:44:55 -0500
d5f04c4390
Better default value
dscripka
2024-02-11 11:49:44 -0500
88ffadde04
added audio_ctx argument to main and server examples
dscripka
2024-02-11 11:36:00 -0500
99e5322a79
Apply suggestions from code review
bobqianic
2024-02-11 15:19:01 +0000
0cb356e2a0
Merge 476dff4544 into 163e74b6c3
bobqianic
2024-02-11 15:07:39 +0000
163e74b6c3
metal : option to embed MSL source into compiled binary (#1842)
Didzis Gosko
2024-02-11 16:41:41 +0200
f273e66dc6
examples : initialize context params properly (#1852)
Georgi Gerganov
2024-02-11 16:39:12 +0200
047ae5b51a
reduce error rate
bobqianic
2024-02-10 23:02:01 +0000
56a7a22080
Reduce error rate
bobqianic
2024-02-10 21:48:30 +0000
14fef7cc23
Add files via upload
bobqianic
2024-02-10 18:00:08 +0000
221d8d969b
Update Makefile
bobqianic
2024-02-10 17:58:12 +0000
f8c8d493af
Update CMakeLists.txt
bobqianic
2024-02-10 17:56:28 +0000
0806bc330e
Add files via upload
bobqianic
2024-02-10 17:55:11 +0000
a29a3c8c29
bpe_tokenizer implementation
bobqianic
2024-02-10 17:53:30 +0000
02b4c52c12
talk-llama : sync llama.cpp
Georgi Gerganov
2024-02-10 10:10:59 +0200
518199c09e
sync : ggml
Georgi Gerganov
2024-02-10 09:56:47 +0200
8b17a2f776
src : relocate new backend sources
Georgi Gerganov
2024-02-10 09:50:24 +0200
b6d2827914
ggml : fix `error C2078: too many initializers` for MSVC ARM64 (llama/5404)
Michael Podvitskiy
2024-02-09 10:56:43 +0100
9711bae0b3
CUDA: more warps for mmvq on NVIDIA (llama/5394)
Johannes Gäßler
2024-02-08 21:56:40 +0100
eec38f63bd
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)
Johannes Gäßler
2024-02-07 12:40:26 +0100
ef5e6b746f
Basic Vulkan Multi-GPU implementation (llama/5321)
0cc4m
2024-02-07 07:54:50 +0100
77bf6b5f56
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
Johannes Gäßler
2024-02-06 18:43:06 +0100
b562fff9d0
Slight quantization improvement for Q4_K and Q5_K (llama/5361)
Kawrakow
2024-02-06 17:28:02 +0200
b5dec374f4
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
Johannes Gäßler
2024-02-06 14:44:06 +0100
fa0dc6167c
ggml : make use of ggml-quants.h possible in C++ code (llama/5338)
Kawrakow
2024-02-05 14:09:47 +0200
55bcd62a4b
ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325)
Dr. Tom Murphy VII Ph.D
2024-02-05 06:13:57 -0500
0ed762d691
iq2_xxs: tune quantization (llama/5320)
Kawrakow
2024-02-05 10:46:06 +0200
1b5bb7792e
cuda : fix LLAMA_CUDA_F16 (llama/5262)
slaren
2024-02-01 18:30:17 +0100
9b735cea77
metal : add im2col F32 dst support (llama/5132)
Georgi Gerganov
2024-01-31 15:35:41 +0200
12c462d656
llava : add MobileVLM support (llama/5132)
JidongZhang-THU
2024-01-31 21:10:15 +0800
fc7b0e2c28
ggml : limit n_threads to the max n_tasks (llama/5238)
slaren
2024-01-31 13:43:03 +0100
f850a067ed
kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226)
Jared Van Bortel
2024-01-30 19:04:37 -0500
f75e1197f1
ggml : add abort_callback for cpu backend (ggml/725)
Michael Podvitskiy
2024-02-09 10:42:27 +0100
aa8a75e287
extra : update sync scripts
Georgi Gerganov
2024-02-10 09:55:19 +0200
08aa78cc27
generate Metal library embedding assembly on-the-fly during build process
Didzis Gosko
2024-02-10 09:33:05 +0200
3d34df2d1d
rename the preprocessor directive
Didzis Gosko
2024-02-10 03:16:28 +0200
670c3715f1
rename the build option
Didzis Gosko
2024-02-10 03:14:25 +0200
476dff4544
Merge pull request #8 from bobqianic/heuristic
bobqianic
2024-02-09 17:59:01 +0000
de4f87ffe0
Bug Fix 2
bobqianic
2024-02-09 17:58:35 +0000
e091189762
Add heuristic mode
bobqianic
2024-02-09 17:55:12 +0000
351252700d
Bug Fix
bobqianic
2024-02-09 17:54:22 +0000
b6d89b08ed
Add heuristic mode
bobqianic
2024-02-09 17:50:33 +0000
80e8a2ea39
server : allow CORS request with authorization headers (#1850)
Valentin Gosu
2024-02-09 16:42:41 +0100
19f8048139
whisper.android : how to build with CLBlast (#1809)
Neuman Vong
2024-02-10 02:39:05 +1100
0f80e5a80a
whisper : expose CUDA device setting in public API (#1840)
Didzis Gosko
2024-02-09 17:27:47 +0200
b6559333ff
make : add macOS deployment target option (#1839)
Didzis Gosko
2024-02-09 17:26:29 +0200
dca731c59e
Merge 3c3e649eee into 434b8f3b96
Pablo Duboue
2024-02-09 13:18:24 +0100
b942bfdbad
server: Allow CORS request with authorization headers
Valentin Gosu
2024-02-08 22:42:31 +0100
132a8f7837
@gpokat
Neuman Vong
2024-02-07 09:37:16 +0800
434b8f3b96
talk-llama : stream response (#1121)
Georgi Gerganov
2024-02-06 19:56:12 +0200
c0277e3e11
revert logsumexp implementation
bobqianic
2024-02-06 15:42:25 +0000
98d4b23baf
Don't wait so long to check whether the stream is running (to avoid missing audio once 'resume' is called); 2ms is consistent with the write wait time as well
Shane Lenagh
2024-02-06 06:45:57 -0600
11cd5602a1
whisper : allow to select GPU (CUDA) device from public API
Didzis Gosko
2024-02-06 10:57:12 +0200
af2b504b84
Makefile : allow to override CUDA_ARCH_FLAG
Didzis Gosko
2024-02-06 10:56:39 +0200
f10a7b43a5
ggml : add dynamic CUDA driver loader and static link against CUDA runtime
Didzis Gosko
2024-02-06 10:35:33 +0200
52c39c81f9
@ggerganov
Neuman Vong
2024-02-06 07:38:26 +0800
b2d0185a2d
Reduced gRPC existing write wait, to prevent excessive sleeping before next write
Shane Lenagh
2024-02-05 13:49:13 -0600
0c404d6ccb
Didn't change the top-level stream.cpp comment; changed it to indicate this is not the mic/SDL version (much of this code was intentionally copied from it; perhaps someday these can share a common stream.cpp class and swap 'implementations' of the async audio classes behind a common interface)
Shane Lenagh
2024-02-05 13:22:11 -0600
fc9431e589
Comments, cleanup (e.g., removing unused imports), and clarity/consistency changes
Shane Lenagh
2024-02-05 13:17:37 -0600
65bd9e5ba9
Fixed(ish) the time interval computation, basing it on a sample-pull-size decrement from the head audio sample timestamp, and decreased diffs from the existing stream.cpp
Shane Lenagh
2024-02-05 10:15:16 -0600
7baa7a6bd4
Merge pull request #7 from bobqianic/fix-go
bobqianic
2024-02-05 12:01:37 +0000
891a4539e3
Update interface.go
bobqianic
2024-02-05 11:59:26 +0000
49e7a7f1e3
Update context.go
bobqianic
2024-02-05 11:58:49 +0000
9fbe59fa7d
Merge pull request #6 from bobqianic/fix-binding
bobqianic
2024-02-05 11:33:06 +0000
4cc4b892d8
Update params.go
bobqianic
2024-02-05 11:29:45 +0000
0f5b5bea49
Update test_whisper.rb
bobqianic
2024-02-05 11:29:08 +0000
09a735ee74
Update ruby_whisper.cpp
bobqianic
2024-02-05 11:27:12 +0000
b3305eb01f
Add files via upload
bobqianic
2024-02-05 01:44:50 +0000
a0d4348b68
Merge pull request #5 from bobqianic/push
bobqianic
2024-02-05 01:38:18 +0000
7a5a2e9a3a
Add files via upload
bobqianic
2024-02-05 01:37:27 +0000
8a46034af7
Add files via upload
bobqianic
2024-02-05 01:36:51 +0000
8bcac3d28a
Adding README.md for stream.grpc
Shane Lenagh
2024-02-03 22:02:18 -0600
a6b15d98ed
Build (CMake) support for bidi GRPC streaming transcription example
Shane Lenagh
2024-02-03 21:33:39 -0600