Commit Graph

  • a0cc3ab377
    add sycl build Abhilash Majumder 2024-02-13 17:24:58 +0530
  • 3d42463845
    models : add update py requirements Georgi Gerganov 2024-02-13 11:51:32 +0200
  • 0e2139e22e
    add sycl abstraction Abhilash Majumder 2024-02-13 13:54:29 +0530
  • 37f075a922
    add changes from llama upstream Abhilash Majumder 2024-02-13 11:43:58 +0530
  • dac6533892
    Add files via upload bobqianic 2024-02-13 03:02:25 +0000
  • b668927591
    Add files via upload bobqianic 2024-02-13 03:01:57 +0000
  • 0f6ad6c2f5
    fix bugs bobqianic 2024-02-13 02:51:28 +0000
  • 3ffc83d90a
    swift : package no longer use ggml dependency (#1861) Georgi Gerganov 2024-02-12 19:54:11 +0200
  • e3c5e2cba8
    whisper : fix external encoder (#1860) Georgi Gerganov 2024-02-12 19:53:51 +0200
  • ae47fb835b
    spm : add ggml.h Georgi Gerganov 2024-02-12 19:43:34 +0200
  • 65a213daaa
    Revert "swift : update Package.swift to use ggml as package dependency (#1701)" Georgi Gerganov 2024-02-12 19:16:57 +0200
  • c604bf4eae
    whisper : fix external encoder Georgi Gerganov 2024-02-12 19:08:54 +0200
  • b742f13e70
    sync : ggml Georgi Gerganov 2024-02-12 19:07:56 +0200
  • 52c529eeb1
    ggml-alloc : allocate all leafs as if they were inputs (ggml/731) slaren 2024-02-12 18:07:14 +0100
  • f25edade2b
    whisper : alternative way to handle the external encoders gg/fix-external-encoder Georgi Gerganov 2024-02-12 16:32:26 +0200
  • 74c260fe34
    whisper : fix usage of external encoders (e.g. CoreML) Georgi Gerganov 2024-02-12 15:19:59 +0200
  • 551529290d
    talk-llama : sync llama.cpp Georgi Gerganov 2024-02-12 10:39:58 +0200
  • 25a90ffa38
    sync : ggml Georgi Gerganov 2024-02-12 09:32:15 +0200
  • 866b67ca93
    ggml-backend : sync remnant Georgi Gerganov 2024-02-12 09:27:57 +0200
  • d7e9f58f7f
    CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434) Johannes Gäßler 2024-02-11 19:08:39 +0100
  • 04839bae22
    vulkan: only use M-sized matmul on Apple GPUs (llama/5412) Sergio López 2024-02-11 15:12:00 +0100
  • 3cc6e04a52
    ggml : fix compile warnings (unused vars) (llama/4966) Georgi Gerganov 2024-02-11 15:33:01 +0200
  • b7ef178b9c
    ggml : add mmla kernels for quantized GEMM (llama/4966) snadampal 2024-02-11 07:22:33 -0600
  • 47dfe9d4db
    metal : use autoreleasepool to avoid memory leaks (llama/5437) Ian Bull 2024-02-10 02:53:28 -0800
  • 1d3270cc8f
    ggml-alloc : v3 (ggml/727) slaren 2024-02-11 13:37:58 +0100
  • b72a3c47ef
    Merge 670d9202ca into a6fb6ab597 Didzis Gosko 2024-02-12 07:30:47 +0000
  • a6fb6ab597
    examples : added audio_ctx argument to main and server (#1857) dscripka 2024-02-12 02:19:07 -0500
  • f48e2ba26f
    better default value (again) dscripka 2024-02-11 12:44:55 -0500
  • d5f04c4390
    Better default value dscripka 2024-02-11 11:49:44 -0500
  • 88ffadde04
    added audio_ctx argument to main and server examples dscripka 2024-02-11 11:36:00 -0500
  • 99e5322a79
    Apply suggestions from code review bobqianic 2024-02-11 15:19:01 +0000
  • 0cb356e2a0
    Merge 476dff4544 into 163e74b6c3 bobqianic 2024-02-11 15:07:39 +0000
  • 163e74b6c3
    metal : option to embed MSL source into compiled binary (#1842) Didzis Gosko 2024-02-11 16:41:41 +0200
  • f273e66dc6
    examples : initialize context params properly (#1852) Georgi Gerganov 2024-02-11 16:39:12 +0200
  • 047ae5b51a
    reduce error rate bobqianic 2024-02-10 23:02:01 +0000
  • 56a7a22080
    Reduce error rate bobqianic 2024-02-10 21:48:30 +0000
  • 14fef7cc23
    Add files via upload bobqianic 2024-02-10 18:00:08 +0000
  • 221d8d969b
    Update Makefile bobqianic 2024-02-10 17:58:12 +0000
  • f8c8d493af
    Update CMakeLists.txt bobqianic 2024-02-10 17:56:28 +0000
  • 0806bc330e
    Add files via upload bobqianic 2024-02-10 17:55:11 +0000
  • a29a3c8c29
    bpe_tokenizer implementation bobqianic 2024-02-10 17:53:30 +0000
  • 02b4c52c12
    talk-llama : sync llama.cpp Georgi Gerganov 2024-02-10 10:10:59 +0200
  • 518199c09e
    sync : ggml Georgi Gerganov 2024-02-10 09:56:47 +0200
  • 8b17a2f776
    src : relocate new backend sources Georgi Gerganov 2024-02-10 09:50:24 +0200
  • b6d2827914
    ggml : fix `error C2078: too many initializers` for MSVC ARM64 (llama/5404) Michael Podvitskiy 2024-02-09 10:56:43 +0100
  • 9711bae0b3
    CUDA: more warps for mmvq on NVIDIA (llama/5394) Johannes Gäßler 2024-02-08 21:56:40 +0100
  • eec38f63bd
    CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386) Johannes Gäßler 2024-02-07 12:40:26 +0100
  • ef5e6b746f
    Basic Vulkan Multi-GPU implementation (llama/5321) 0cc4m 2024-02-07 07:54:50 +0100
  • 77bf6b5f56
    CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370) Johannes Gäßler 2024-02-06 18:43:06 +0100
  • b562fff9d0
    Slight quantization improvement for Q4_K and Q5_K (llama/5361) Kawrakow 2024-02-06 17:28:02 +0200
  • b5dec374f4
    CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351) Johannes Gäßler 2024-02-06 14:44:06 +0100
  • fa0dc6167c
    ggml : make use of ggml-quants.h possible in C++ code (llama/5338) Kawrakow 2024-02-05 14:09:47 +0200
  • 55bcd62a4b
    ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325) Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -0500
  • 0ed762d691
    iq2_xxs: tune quantization (llama/5320) Kawrakow 2024-02-05 10:46:06 +0200
  • 1b5bb7792e
    cuda : fix LLAMA_CUDA_F16 (llama/5262) slaren 2024-02-01 18:30:17 +0100
  • 9b735cea77
    metal : add im2col F32 dst support (llama/5132) Georgi Gerganov 2024-01-31 15:35:41 +0200
  • 12c462d656
    llava : add MobileVLM support (llama/5132) JidongZhang-THU 2024-01-31 21:10:15 +0800
  • fc7b0e2c28
    ggml : limit n_threads to the max n_tasks (llama/5238) slaren 2024-01-31 13:43:03 +0100
  • f850a067ed
    kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226) Jared Van Bortel 2024-01-30 19:04:37 -0500
  • f75e1197f1
    ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +0100
  • aa8a75e287
    extra : update sync scripts Georgi Gerganov 2024-02-10 09:55:19 +0200
  • 08aa78cc27
    generate Metal library embedding assembly on-fly during build process Didzis Gosko 2024-02-10 09:33:05 +0200
  • 3d34df2d1d
    rename the preprocessor directive Didzis Gosko 2024-02-10 03:16:28 +0200
  • 670c3715f1
    rename the build option Didzis Gosko 2024-02-10 03:14:25 +0200
  • 476dff4544
    Merge pull request #8 from bobqianic/heuristic bobqianic 2024-02-09 17:59:01 +0000
  • de4f87ffe0
    Bug Fix 2 bobqianic 2024-02-09 17:58:35 +0000
  • e091189762
    Add heuristic mode bobqianic 2024-02-09 17:55:12 +0000
  • 351252700d
    Bug Fix bobqianic 2024-02-09 17:54:22 +0000
  • b6d89b08ed
    Add heuristic mode bobqianic 2024-02-09 17:50:33 +0000
  • 80e8a2ea39
    server : allow CORS request with authorization headers (#1850) Valentin Gosu 2024-02-09 16:42:41 +0100
  • 19f8048139
    whisper.android : how to build with CLBlast (#1809) Neuman Vong 2024-02-10 02:39:05 +1100
  • 0f80e5a80a
    whisper : expose CUDA device setting in public API (#1840) Didzis Gosko 2024-02-09 17:27:47 +0200
  • b6559333ff
    make : add macOS deployment target option (#1839) Didzis Gosko 2024-02-09 17:26:29 +0200
  • dca731c59e
    Merge 3c3e649eee into 434b8f3b96 Pablo Duboue 2024-02-09 13:18:24 +0100
  • b942bfdbad
    server: Allow CORS request with authorization headers Valentin Gosu 2024-02-08 22:42:31 +0100
  • 132a8f7837
    @gpokat Neuman Vong 2024-02-07 09:37:16 +0800
  • 434b8f3b96
    talk-llama : stream response (#1121) Georgi Gerganov 2024-02-06 19:56:12 +0200
  • c0277e3e11
    revert logsumexp implementation bobqianic 2024-02-06 15:42:25 +0000
  • 98d4b23baf
    Don't wait so long to check if stream is running (to avoid missing audio once 'resume' is called); 2ms is consistent with write wait time as well Shane Lenagh 2024-02-06 06:45:57 -0600
  • 11cd5602a1
    whisper : allow to select GPU (CUDA) device from public API Didzis Gosko 2024-02-06 10:57:12 +0200
  • af2b504b84
    Makefile : allow to override CUDA_ARCH_FLAG Didzis Gosko 2024-02-06 10:56:39 +0200
  • f10a7b43a5
    ggml : add dynamic CUDA driver loader and static link against CUDA runtime Didzis Gosko 2024-02-06 10:35:33 +0200
  • 52c39c81f9
    @ggerganov Neuman Vong 2024-02-06 07:38:26 +0800
  • b2d0185a2d
    Reduced gRPC existing write wait, to prevent excessive sleeping before next write Shane Lenagh 2024-02-05 13:49:13 -0600
  • 0c404d6ccb
    Didn't change top-level stream.cpp comment; changed to indicate this is not the mic/SDL version (obviously much of this code was copied from that, intentionally--perhaps someday these can all have a common stream.cpp class and swap 'implementations' of the asynch_audio classes that implement a common interface) Shane Lenagh 2024-02-05 13:22:11 -0600
  • fc9431e589
    Comments, cleanup (e.g., removing unused imports), and clarity/consistency changes Shane Lenagh 2024-02-05 13:17:37 -0600
  • 65bd9e5ba9
    Fixed(ish) the time interval computation, basing it on a sample-pull-size decrement from head of audio sample timestamp, and decreased diffs from existing stream.cpp Shane Lenagh 2024-02-05 10:15:16 -0600
  • 7baa7a6bd4
    Merge pull request #7 from bobqianic/fix-go bobqianic 2024-02-05 12:01:37 +0000
  • 891a4539e3
    Update interface.go bobqianic 2024-02-05 11:59:26 +0000
  • 49e7a7f1e3
    Update context.go bobqianic 2024-02-05 11:58:49 +0000
  • 9fbe59fa7d
    Merge pull request #6 from bobqianic/fix-binding bobqianic 2024-02-05 11:33:06 +0000
  • 4cc4b892d8
    Update params.go bobqianic 2024-02-05 11:29:45 +0000
  • 0f5b5bea49
    Update test_whisper.rb bobqianic 2024-02-05 11:29:08 +0000
  • 09a735ee74
    Update ruby_whisper.cpp bobqianic 2024-02-05 11:27:12 +0000
  • b3305eb01f
    Add files via upload bobqianic 2024-02-05 01:44:50 +0000
  • a0d4348b68
    Merge pull request #5 from bobqianic/push bobqianic 2024-02-05 01:38:18 +0000
  • 7a5a2e9a3a
    Add files via upload bobqianic 2024-02-05 01:37:27 +0000
  • 8a46034af7
    Add files via upload bobqianic 2024-02-05 01:36:51 +0000
  • 8bcac3d28a
    Adding README.md for stream.grpc Shane Lenagh 2024-02-03 22:02:18 -0600
  • a6b15d98ed
    Build (CMake) support for bidi GRPC streaming transcription example Shane Lenagh 2024-02-03 21:33:39 -0600