Commit Graph

  • fb466b3417
    ggml : sync ggml-metal.m Georgi Gerganov 2024-01-18 11:03:13 +0200
  • 1f50a7d29f
    sync : llama.cpp Georgi Gerganov 2024-01-17 21:23:33 +0200
  • 1de21b913d
    sync : ggml Georgi Gerganov 2024-01-17 21:22:38 +0200
  • 4aea058e5a
    ggml : add IQ2 to test-backend-ops + refactoring (llama/4990) Georgi Gerganov 2024-01-17 18:54:56 +0200
  • fd10234363
    imatrix : offload to GPU support (llama/4957) Georgi Gerganov 2024-01-17 18:46:30 +0200
  • 8fb5c6a409
    backend : add eval callback (llama/4935) Georgi Gerganov 2024-01-17 18:39:41 +0200
  • 2fe5fbfcc2
    metal : create autorelease pool during library build (llama/4970) Georgi Gerganov 2024-01-17 18:38:39 +0200
  • 01637e1a4c
    ggml : importance matrix support for legacy quants (llama/4969) Kawrakow 2024-01-16 19:51:26 +0200
  • 1b349eb1f9
    metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (llama/4936) Alex Azarov 2024-01-16 14:33:02 +0100
  • 138eaebead
    ggml : introduce GGML_CALL function annotation (llama/4850) Justine Tunney 2024-01-16 03:16:33 -0800
  • 61b9192f27
    cuda : fix dequantize kernel names (llama/4938) Georgi Gerganov 2024-01-15 13:27:00 +0200
  • 161b51d91a
    CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938) Kawrakow 2024-01-15 07:48:06 +0200
  • f904b31a7d
    Add ability to use importance matrix for all k-quants (llama/4930) Kawrakow 2024-01-14 16:21:12 +0200
  • 2676819cb5
    edit some comments bobqianic 2024-01-16 23:32:55 +0000
  • 41df3f010a
    Remove hallucination by using `token_nosp` bobqianic 2024-01-16 22:15:27 +0000
  • f6614155e4
    talk-llama : optional wake-up command and audio confirmation (#1765) Benjamin Heiniger 2024-01-16 14:52:01 +0100
  • 5ea1d91310
    Merge branch 'ggerganov:master' into fix-decoding bobqianic 2024-01-15 23:51:50 +0000
  • 271c321bc5
    Revert some changes bobqianic 2024-01-15 23:48:30 +0000
  • 80589d2bf2
    Revert some changes bobqianic 2024-01-15 23:46:43 +0000
  • b5c4d5cd46
    Add files via upload bobqianic 2024-01-15 21:47:05 +0000
  • 3818acbbcc
    Add files via upload bobqianic 2024-01-15 21:46:15 +0000
  • 4b3a21143e
    Fix ruby and go bindings bobqianic 2024-01-15 21:39:19 +0000
  • 5e2c820fd1
    Update Makefile bobqianic 2024-01-15 21:18:54 +0000
  • 96a9349f1a
    Add files via upload bobqianic 2024-01-15 20:24:20 +0000
  • 7047d32141
    Merge pull request #2 from bobqianic/patch bobqianic 2024-01-15 20:16:13 +0000
  • c8528a7c10
    Add files via upload bobqianic 2024-01-15 20:14:52 +0000
  • 9d0ebd193f
    Add files via upload bobqianic 2024-01-15 20:13:58 +0000
  • 6648641e1b
    Add files via upload bobqianic 2024-01-15 19:44:09 +0000
  • 7499e3c8ec
    Merge pull request #1 from bobqianic/bobqianic-patch-1 bobqianic 2024-01-15 19:42:08 +0000
  • dfef69ef49
    Delete server directory bobqianic 2024-01-15 19:41:44 +0000
  • c53c33b6b0
    revert change bobqianic 2024-01-15 19:41:10 +0000
  • 1226204af2
    Add files via upload bobqianic 2024-01-15 19:38:02 +0000
  • 28f10498af
    Update Makefile bobqianic 2024-01-15 18:23:47 +0000
  • f5f159c320
    server : fix building and simplify lib deps on Windows (#1772) Przemysław Pawełczyk 2024-01-15 14:48:13 +0100
  • 5036229f41
    cmake : simplify server example lib deps on Windows Przemyslaw Pawelczyk 2024-01-15 13:45:13 +0100
  • 346ea9304d
    make : fix server example building on MSYS2 environments (Windows) Przemyslaw Pawelczyk 2024-01-15 13:25:38 +0100
  • 7c15a462bc
    Merge branch 'ggerganov:master' into master Benjamin Heiniger 2024-01-14 20:42:53 +0100
  • 076d1e1d78
    talk-llama.cpp: fix Windows build Benjamin Heiniger 2024-01-14 20:41:17 +0100
  • 6ebba525f1
    talk-llama : sync llama.cpp Georgi Gerganov 2024-01-14 18:08:20 +0200
  • 8301f8874b
    Add files via upload bobqianic 2024-01-14 15:53:40 +0000
  • 71a65e7b7a
    Add files via upload bobqianic 2024-01-14 15:14:35 +0000
  • 2a5874441d
    talk-llama : llama.cpp Georgi Gerganov 2024-01-14 11:06:28 +0200
  • d08445c9ad
    sync : ggml Georgi Gerganov 2024-01-14 10:55:18 +0200
  • 4a945696cb
    metal : correctly set SIMD support flags on iOS (llama/4923) Alex Azarov 2024-01-14 09:44:39 +0100
  • dabc964d83
    2-bit quantizations (llama/4897) Kawrakow 2024-01-14 09:45:56 +0200
  • 654baf693d
    scripts : sync-ggml-am.sh add option to skip commits Georgi Gerganov 2024-01-14 10:53:19 +0200
  • 0644da442d
    talk-llama: fix small formatting issue in output Benjamin Heiniger 2024-01-14 04:09:39 +0100
  • f2c2ff9d67
    talk-llama: add optional audio confirmation before generating answer Benjamin Heiniger 2024-01-14 03:31:39 +0100
  • e93891833d
    talk-llama: add optional wake-word detection from command Benjamin Heiniger 2024-01-14 03:29:50 +0100
  • f001a3b7b6
    talk-llama : sync llama.cpp Georgi Gerganov 2024-01-14 00:13:17 +0200
  • c615f2c335
    sync : ggml Georgi Gerganov 2024-01-14 00:12:17 +0200
  • d839dd0242
    examples : adapt to metal API Georgi Gerganov 2024-01-14 00:09:26 +0200
  • 435847891c
    ggml: cache sin/cos for RoPE (llama/4908) Johannes Gäßler 2024-01-13 21:41:37 +0100
  • 182f290808
    metal : remove old API (llama/4919) Georgi Gerganov 2024-01-13 20:45:45 +0200
  • 447dfc11fc
    metal : disable log for loaded kernels (llama/4794) Georgi Gerganov 2024-01-13 18:46:37 +0200
  • 9aa9f3b84e
    gguf : fix potential infinite for-loop (llama/4600) texmex76 2024-01-13 17:06:20 +0100
  • 396ebd1e80
    metal : refactor kernel loading code (llama/4794) Georgi Gerganov 2024-01-13 18:03:45 +0200
  • 12490f4398
    CUDA: faster q8_0 -> f16 dequantization (llama/4895) Johannes Gäßler 2024-01-12 20:38:54 +0100
  • db078a9ba8
    talk-llama : add optional CLI arg to set the bot name (#1764) RhinoDevel 2024-01-13 19:51:35 +0100
  • 2d1b58f617
    Add optional commandline parameter to set the bot name. Marc 2024-01-13 19:38:17 +0100
  • a13a7da5ad
    examples : add python example for transcription (#1744) james wolf 2024-01-13 12:37:18 -0500
  • dcaba63a64
    moved python files to examples/python contractorwolf 2024-01-13 12:28:26 -0500
  • 519f8e8684
    whisper : load the model into multiple buffers of max size 1GB (#1763) Georgi Gerganov 2024-01-13 17:47:40 +0200
  • 49edad37d1
    whisper : load the model into multiple buffers of max size 1GB Georgi Gerganov 2024-01-13 15:45:38 +0200
  • 40ae0962f4
    talk-llama : sync llama.cpp Georgi Gerganov 2024-01-12 22:04:51 +0200
  • 1560288048
    sync : ggml Georgi Gerganov 2024-01-12 21:56:50 +0200
  • 1ad6fafd91
    backend_sched : fix assignments slaren 2024-01-12 20:38:34 +0100
  • 70840aed5f
    llama : ggml-backend integration (llama/4766) slaren 2024-01-12 20:07:38 +0100
  • b24d18feb9
    CUDA: fix softmax compile for old CUDA versions (llama/4862) Johannes Gäßler 2024-01-12 12:30:41 +0100
  • 3fa98f4395
    Importance Matrix calculation (llama/4861) Kawrakow 2024-01-12 06:59:57 +0100
  • d05b7ee90e
    models : make all scripts to be POSIX Compliant (#1725) Sơn Phan Trung 2024-01-12 19:11:04 +0700
  • 6dcee35129
    ggml : fix 32-bit ARM compat for IQ2_XS (#1758) Georgi Gerganov 2024-01-12 14:02:30 +0200
  • 563c7f1687
    ggml : fix fix fix Georgi Gerganov 2024-01-12 14:00:49 +0200
  • 2d55685bad
    ggml : fix fix Georgi Gerganov 2024-01-12 13:59:18 +0200
  • 4b87469aee
    ggml : fix 32-bit ARM compat Georgi Gerganov 2024-01-12 13:58:19 +0200
  • 5cb345f5e9
    go : add SetInitialPrompt method to bindings (#1753) Boris Bliznioukov 2024-01-12 14:44:50 +0300
  • fbcb52d3cd
    server : add more parameters to server api (#1754) George Hindle 2024-01-12 11:42:52 +0000
  • 6b01e3fedd
    whisper : fix segment length with params.no_timestamps == true Georgi Gerganov 2024-01-12 13:37:38 +0200
  • f7908f9bb8
    params : don't compute timestamps when not printing them (#1755) George Hindle 2024-01-12 11:24:38 +0000
  • dfc7c63124
    Merge branch 'ggerganov:master' into master Boris Bliznioukov 2024-01-12 10:16:26 +0300
  • 00b7a4be02
    talk-llama : sync llama.cpp Georgi Gerganov 2024-01-11 22:10:10 +0200
  • 04b0a768b8
    swift : remove local ggml.h reference Georgi Gerganov 2024-01-11 22:00:12 +0200
  • 87670425f2
    swift : track ggml release branch Georgi Gerganov 2024-01-11 21:57:40 +0200
  • 32e71a1861
    sync : ggml Georgi Gerganov 2024-01-11 21:54:17 +0200
  • 9c857cf280
    sync : llama.cpp Georgi Gerganov 2024-01-11 21:49:13 +0200
  • 97b12212dd
    ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856) Kawrakow 2024-01-11 20:39:39 +0100
  • 9fa34d79ec
    metal : put encoder debug group behind a define (llama/4873) Paul Tsochantaris 2024-01-11 14:31:52 +0000
  • a0a64a19dd
    metal : improve dequantize precision to match CPU (llama/4836) Georgi Gerganov 2024-01-09 19:37:08 +0200
  • bbc23611fa
    ggml : fix vld1q_s8_x4 32-bit compat (llama/4828) Georgi Gerganov 2024-01-09 10:42:06 +0200
  • e9783a1fb4
    CUDA: faster softmax via shared memory + fp16 math (llama/4742) Johannes Gäßler 2024-01-09 08:58:55 +0100
  • 9e0cc28792
    metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +0200
  • 73072a7c73
    ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -0500
  • a8ba1262ff
    metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -0800
  • e66a9a7806
    ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +0800
  • 338442d773
    Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -0500
  • 10651bddf6
    SOTA 2-bit quants (llama/4773) Kawrakow 2024-01-08 16:02:32 +0100
  • 53d4d0b30d
    CUDA: fixed redundant value dequantization (llama/4809) Johannes Gäßler 2024-01-07 17:24:08 +0100
  • 2865e4710b
    ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (llama/4787) Konstantin Zhuravlyov 2024-01-07 01:52:42 -0500
  • c46a74a19d
    ggml : do not sched_yield when calling BLAS (llama/4761) Georgi Gerganov 2024-01-05 15:18:21 +0200
  • 46dc49a6a1
    ggml : include stdlib.h before intrin.h (llama/4736) Georgi Gerganov 2024-01-04 10:12:26 +0200