Commit Graph

  • 4f88940ff6
    Add q3_s and q1_s (llama/5886) Abhilash Majumder 2024-03-11 10:27:56 +0530
  • 7bdb1de9ec
    metal : move mm_id indices to shared mem (llama/5982) Georgi Gerganov 2024-03-10 23:12:48 +0200
  • 653d2e8ff9
    ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (llama/5951) Georgi Gerganov 2024-03-09 17:36:20 +0200
  • 2fef660d0a
    ggml : remove old quantization functions (llama/5942) Georgi Gerganov 2024-03-09 15:53:59 +0200
  • 24eba5a2ff
    ggml : add ggml-common.h to deduplicate shared code (llama/5940) Georgi Gerganov 2024-03-09 12:47:57 +0200
  • 6e9d3aa32d
    llama : support Mamba Selective State Space Models (llama/5328) compilade 2024-03-08 17:31:00 -0500
  • 9ae0d18856
    extra : update sync scripts after ggml-common.h Georgi Gerganov 2024-03-15 14:00:53 +0200
  • 56102531b1
    Fix aheads_masks_init for backend != CPU Dener Stassun 2024-03-14 18:29:13 +0000
  • 9fa298f9d5
    Fix incorrect n_frames passed to dtw when near end of audio Dener Stassun 2024-03-14 15:15:51 -0300
  • 3283ad1830
    return -1 to avoid confusion zhou.weiguo 2024-03-13 09:50:09 +0800
  • 10b0304a59
    dtw: cleanup Dener Stassun 2024-03-12 09:53:59 -0300
  • 87f2620788
    Copying cross QKs from decoder backend correctly Dener Stassun 2024-03-07 14:55:57 -0300
  • eb531c7d32
    Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads Dener Stassun 2024-03-06 11:31:28 -0300
  • 3016444b7c
    Calling median filter with ggml_map_custom1 Dener Stassun 2024-03-05 10:19:13 -0300
  • 9a19200e22
    decoder: save cross QKs only if requested Dener Stassun 2024-03-05 09:06:26 -0300
  • 641fb2c380
    Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function Dener Stassun 2024-03-01 13:28:07 -0300
  • 4de1ed4b40
    Fix issues related to changes in whisper.cpp Dener Stassun 2024-01-11 10:03:40 -0300
  • dfb24a4dab
    whisper: fix typo on alignment heads enum Dener Stassun 2023-12-14 14:22:33 -0300
  • 3a5f368ca4
    implement N_TOP_MOST and CUSTOM alignment heads setting Dener Stassun 2023-12-11 09:29:09 -0300
  • 93eb345b14
    Fix mistake causing incorrect alignment of dtw timestamps Dener Stassun 2023-12-07 16:20:53 -0300
  • f11ff92533
    Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. Dener Stassun 2023-12-07 08:40:57 -0300
  • b69f5b4de3
    WIP: producing and placing DTW timestamps on tokens Dener Stassun 2023-12-04 16:47:17 -0300
  • cd52c5ae10
    whisper.cpp: impl dtw algo Dener Stassun 2023-11-13 16:47:36 -0300
  • 4e6c281192
    Merge branch 'ggerganov:master' into feat/progress 华丽 2024-03-12 13:32:31 +0800
  • a56f435fd4
    whisper : document whisper_batch.n_seq_id (#1942) Josh Bleecher Snyder 2024-03-10 07:55:22 -0700
  • ec166499d8
    whisper : improve beam search candidate diversity (#1947) Josh Bleecher Snyder 2024-03-10 07:54:43 -0700
  • b204d7cc24
    whisper : document whisper_batch.n_seq_id Josh Bleecher Snyder 2024-03-08 09:33:19 -0800
  • f99b649a92
    whisper : improve beam search candidate diversity Josh Bleecher Snyder 2024-03-09 18:30:54 -0800
  • ccf022f970
    bindings/go : add linker flags to make metal work (#1944) Josh Bleecher Snyder 2024-03-09 08:50:44 -0800
  • 2852e1af55
    whisper : make beam candidate sort more stable (#1943) Josh Bleecher Snyder 2024-03-09 08:50:03 -0800
  • a22a8684cd
    fix typo in examples/bench/bench.cpp zhou.weiguo 2024-03-09 09:00:07 +0800
  • ce945b50c3
    ggml : try fix 32-bit arm compat (#1938) Georgi Gerganov 2024-03-08 23:45:07 +0200
  • c13b1dca84
    bindings/go : add linker flags to make metal work Josh Bleecher Snyder 2024-03-05 08:27:02 -0800
  • 7c040b440c
    whisper : make beam candidate sort more stable Josh Bleecher Snyder 2024-03-08 12:57:18 -0800
  • faba65159e
    ggml : fix cont Georgi Gerganov 2024-03-08 17:45:26 +0200
  • 2abc2d70f2
    ggml : try fix 32-bit arm compat Georgi Gerganov 2024-03-08 13:48:20 +0200
  • 2f5a5a66dd
    talk-llama : use llama_decode instead of llama_eval Georgi Gerganov 2024-03-08 12:04:43 +0200
  • 8e409d1113
    talk-llama : sync llama.cpp Georgi Gerganov 2024-03-08 11:55:50 +0200
  • 05d1b61af4
    talk-llama : sync llama.cpp Georgi Gerganov 2024-03-08 11:52:47 +0200
  • 647cae178a
    sync : ggml Georgi Gerganov 2024-03-08 11:39:34 +0200
  • bae7c23fbf
    Revert "[SYCL] fix error when set main gpu to non-zero (llama/5901)" (llama/5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +0800
  • 18ea187d42
    fix error when set main gpu to non-zero (llama/5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +0800
  • 1daeffca54
    ggml : use SYS_get_cpu if SYS_getcpu is not defined (llama/5906) Jared Van Bortel 2024-03-06 15:42:23 -0500
  • 2f6f1d4465
    ggml : use `uint8x16_t` return type for `ggml_vqtbl1q_u8` (llama/5894) bobqianic 2024-03-06 07:35:07 +0000
  • 7ff1894c34
    add wait() to make code stable (llama/5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +0800
  • 8edfc54c2b
    quants : use MM256_SET_M128I consistently to fix gcc 7 build (llama/5889) Jared Van Bortel 2024-03-05 11:56:37 -0500
  • 9c399689ec
    Vulkan Improvements (llama/5835) 0cc4m 2024-03-05 13:33:42 +0100
  • 9d9a405cfd
    fix mul_mat fault in CI/unit-test (llama/5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +0800
  • edd8b38a75
    ggml : fix unknown status (llama/0) Georgi Gerganov 2024-03-04 20:53:27 +0200
  • ed76818700
    whisper : fix compute helper return (ggml/750) Georgi Gerganov 2024-03-05 16:05:23 +0200
  • 9a0b59d990
    ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +0100
  • 93a84a143b
    cuda : fix data race in soft max (llama/5853) slaren 2024-03-03 14:26:18 +0100
  • bd26876267
    ggml : fix IQ3_S AVX implementation (llama/5834) Georgi Gerganov 2024-03-02 20:00:49 +0200
  • 21d295180d
    ggml : IQ3_S improvements (llama/5829) Kawrakow 2024-03-02 17:00:51 +0200
  • c3bfc9bfda
    Support multiple GPUs (split mode) on SYCL backend (llama/5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +0800
  • 422a6b16fc
    ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (llama/5813) ddpasa 2024-03-01 18:00:00 +0100
  • 11dd0d4482
    Use batched mul_mat pathway (llama/5591) AidanBeltonS 2024-03-01 07:36:47 +0000
  • 26dd2f06ac
    make portability_enumeration_ext apple only (llama/5757) Eve 2024-02-28 19:33:37 +0000
  • 8cee7c08b6
    add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +0800
  • 0be62becec
    add build scripts of bench.cpp to generate bench tool for android-based device zhou.weiguo 2024-03-07 20:27:14 +0800
  • a99e5dab66
    add build scripts of bench.cpp to generate bench tool for android-based device zhou.weiguo 2024-03-07 20:08:53 +0800
  • 2e2626b167
    examples : Auto lowercase language parameter in main.cpp (#1928) F1L1P 2024-03-06 23:25:10 +0100
  • c0c0ae2dea
    examples : fix typo in bench.cpp (#1933) zhouwg 2024-03-07 06:21:44 +0800
  • b40606b996
    bench:fix typo zhou.weiguo 2024-03-06 23:47:52 +0800
  • f03270a5e2
    add build scripts for bench.cpp zhou.weiguo 2024-03-06 20:30:13 +0800
  • 4b24c7d96e
    add build scripts for bench.cpp zhou.weiguo 2024-03-06 19:16:56 +0800
  • 6ebff08221
    add build scripts for bench.cpp zhou.weiguo 2024-03-06 18:17:26 +0800
  • a712a93d1c
    add build scripts for bench.cpp zhou.weiguo 2024-03-06 18:08:17 +0800
  • 67c0a9fca1
    add build scripts for bench.cpp zhou.weiguo 2024-03-06 18:02:11 +0800
  • 8aa2e6a226
    Update examples/main/main.cpp F1L1P 2024-03-05 21:07:17 +0100
  • 06e73da378
    refactor: typescript zcf0508 2024-03-05 23:12:46 +0800
  • 897412b5b6
    whisper : fix typo (#1925) zhouwg 2024-03-05 23:06:31 +0800
  • f22d27a385
    whisper.android.java : fix returns in JNI (#1929) zhouwg 2024-03-05 21:59:26 +0800
  • 98d895afa7
    fix SF in JNI zhou.weiguo 2024-03-05 21:05:59 +0800
  • 380c2bebfb
    Auto lowercase language parameter F1L1P 2024-03-05 11:46:07 +0100
  • 076827069c
    refine original android demo zhou.weiguo 2024-03-05 13:26:44 +0800
  • a9db18b329
    fix typo in whisper.cpp zhou.weiguo 2024-03-05 13:20:17 +0800
  • 8472186df2
    fix: avoid test fail zcf0508 2024-03-05 10:12:53 +0800
  • 19b8436ef1
    Merge branch 'master' into androidStreaming liam-mceneaney 2024-03-04 20:03:55 -0500
  • 5bc5434f71
    Android realtime whisper transcription attempt. liam-mceneaney 2024-03-04 19:36:42 -0500
  • ccd7c1d2da
    cmake : add library versioning (#1352) kennethge 2024-03-04 14:17:48 -0500
  • 2cc6a5c83c
    Merge branch 'master' into add-versioning Georgi Gerganov 2024-03-04 21:17:21 +0200
  • c713eb5e2a
    readme : recommend MacOS Sonoma for Core ML (#1917) Gavin Cai 2024-03-04 11:16:13 -0800
  • e7f9b70e56
    Update README to Recommend MacOS Sonoma for Core ML to avoid hallucination Dongcheng Cai 2024-03-03 16:52:37 -0800
  • 5ecffc9d8e
    feat: node addon support muti input files and callback function support provide progress zcf0508 2024-03-01 17:54:18 +0800
  • 25d313b38b
    talk-llama : sync llama.cpp Georgi Gerganov 2024-02-28 13:04:05 +0200
  • 3168dbf23b
    sync : ggml Georgi Gerganov 2024-02-28 13:01:33 +0200
  • 1711bb3881
    sync : llama.cpp (ggml/0) Georgi Gerganov 2024-02-28 12:59:11 +0200
  • 2533305596
    ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760) Kawrakow 2024-02-28 10:37:02 +0200
  • 0eca512ac8
    Attempt to fix android build (llama/5752) Kawrakow 2024-02-27 19:16:49 +0200
  • 013e394a4b
    IQ4_XS: a 4.25 bpw quantization (llama/5747) Kawrakow 2024-02-27 16:34:24 +0200
  • d83f371b5f
    cuda : replace remaining shfl_xor with calls to warp_reduce functions (llama/5744) Engininja2 2024-02-27 07:22:45 -0600
  • 1c71816eab
    ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (llama/5742) Engininja2 2024-02-27 06:50:18 -0600
  • 7b1d8ea7e0
    Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (llama/5721) Kawrakow 2024-02-26 18:28:38 +0200
  • b1f7223a0a
    CUDA: fix DEBUG_CUDA_MALLOC (llama/5729) Johannes Gäßler 2024-02-26 15:36:38 +0100
  • 8408a4be8e
    Add support for soft_max ALiBi (llama/5639) AidanBeltonS 2024-02-26 14:02:11 +0000
  • 72849c24ba
    ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (llama/5711) Radosław Gryta 2024-02-25 19:43:00 +0100
  • c19c28be71
    add google magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +0100
  • 0d8fd8483a
    stream.wasm : fix invalid memory access when no segments (#1902) Andrew S 2024-02-26 02:12:35 -0600
  • 3170841ed9
    talk-llama : sync llama.cpp Georgi Gerganov 2024-02-25 20:00:10 +0200
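The run of commits from cd52c5ae10 ("whisper.cpp: impl dtw algo") through 56102531b1 adds DTW-based token-level timestamps. As background only, here is a minimal sketch of the classic dynamic time warping recurrence those commits build on: accumulate a minimum-cost monotonic path through a token-by-frame cost matrix, then backtrack to recover the alignment. This is a generic illustration, not whisper.cpp's actual implementation; the function name and cost-matrix layout are assumptions for the example.

```python
def dtw(cost):
    """Return the minimum-cost monotonic alignment through `cost`,
    a 2-D list (rows = tokens, cols = frames), as a list of (i, j) pairs.

    Illustrative sketch of classic DTW, not whisper.cpp's code."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    # acc[i][j] = minimal accumulated cost of any path reaching cell (i, j)
    acc = [[INF] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:                 # step down (advance token)
                best = min(best, acc[i - 1][j])
            if j > 0:                 # step right (advance frame)
                best = min(best, acc[i][j - 1])
            if i > 0 and j > 0:       # diagonal step (advance both)
                best = min(best, acc[i - 1][j - 1])
            acc[i][j] = cost[i][j] + best
    # Backtrack from the bottom-right corner to recover the path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)     # follow the cheapest predecessor
        path.append((i, j))
    return list(reversed(path))
```

In the timestamping use case sketched above, each path cell (i, j) pairs token i with audio frame j, so a token's timestamp can be read off from the first frame its row is matched to.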