218 commits

Author SHA1 Message Date
Georgi Gerganov b1306c4394
readme : update hot topics 2023-12-17 20:16:23 +02:00
BarfingLemurs 0353a18401
readme : update supported model list (#4457) 2023-12-14 09:38:49 +02:00
Georgi Gerganov 113f9942fc
readme : update hot topics 2023-12-13 14:05:38 +02:00
Georgi Gerganov bcc0eb4591
llama : per-layer KV cache + quantum K cache (#4309)
* per-layer KV

* remove unnecessary copies

* less code duplication, offload k and v separately

* llama : offload KV cache per-layer

* llama : offload K shift tensors

* llama : offload for rest of the model arches

* llama : enable offload debug temporarily

* llama : keep the KV related layers on the device

* llama : remove mirrors, perform Device -> Host when partial offload

* common : add command-line arg to disable KV cache offloading

* llama : update session save/load

* llama : support quantum K cache (#4312)

* llama : support quantum K cache (wip)

* metal : add F32 -> Q8_0 copy kernel

* cuda : add F32 -> Q8_0 copy kernel

ggml-ci

* cuda : use mmv kernel for quantum cache ops

* llama : pass KV cache type through API

* llama : fix build

ggml-ci

* metal : add F32 -> Q4_0 copy kernel

* metal : add F32 -> Q4_1 copy kernel

* cuda : wip

* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

* llama-bench : support type_k/type_v

* metal : use mm kernel only for quantum KV cache

* cuda : add comment

* llama : remove memory_f16 and kv_f16 flags

---------

Co-authored-by: slaren <slarengh@gmail.com>

* readme : add API change notice

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-12-07 13:03:17 +02:00
vodkaslime 524907aa76
readme : fix (#4135)
* fix: readme

* chore: resolve comments

* chore: resolve comments
2023-11-30 23:49:21 +02:00
Dawid Wysocki 74daabae69
readme : fix typo (#4253)
llama.cpp uses GitHub Actions, not Gitlab Actions.
2023-11-30 23:43:32 +02:00
Peter Sugihara 4fea3420ee
readme : add FreeChat (#4248) 2023-11-29 09:16:34 +02:00
Kasumi 0dab8cd7cc
readme : add Amica to UI list (#4230) 2023-11-27 19:39:42 +02:00
Georgi Gerganov 9656026b53
readme : update hot topics 2023-11-26 20:42:51 +02:00
Georgi Gerganov 04814e718e
readme : update hot topics 2023-11-25 12:02:13 +02:00
Aaryaman Vasishta b35f3d0def
readme : use PATH for Windows ROCm (#4195)
* Update README.md to use PATH for Windows ROCm

* Update README.md

* Update README.md
2023-11-24 09:52:39 +02:00
Georgi Gerganov d103d935c0
readme : update hot topics 2023-11-23 13:51:22 +02:00
Aaryaman Vasishta dfc7cd48b1
readme : update ROCm Windows instructions (#4122)
* Update README.md

* Update README.md

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-20 17:02:46 +02:00
Galunid 36eed0c42c
stablelm : StableLM support (#3586)
* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers
2023-11-14 11:17:12 +01:00
Georgi Gerganov c049b37d7b
readme : update hot topics 2023-11-13 14:18:08 +02:00
Richard Kiss 532dd74e38
Fix some documentation typos/grammar mistakes (#4032)
* typos

* Update examples/parallel/README.md

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-11 23:04:58 -07:00
Georgi Gerganov 224e7d5b14
readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Ian Scrivener 5a42a5f8e8
readme : remove unsupported node.js library (#3703)
- https://github.com/Atome-FE/llama-node is quite out of date
- doesn't support recent/current llama.cpp functionality
2023-10-22 21:16:43 +03:00
Georgi Gerganov d1031cf49c
sampling : refactor init to use llama_sampling_params (#3696)
* sampling : refactor init to use llama_sampling_params

* llama : combine repetition, frequency and presence penalties in 1 call

* examples : remove embd-input and gptneox-wip

* sampling : rename penalty params + reduce size of "prev" vector

* sampling : add llama_sampling_print helper

* sampling : hide prev behind API and apply #3661

ggml-ci
2023-10-20 21:07:23 +03:00
Georgi Gerganov 004797f6ac
readme : update hot topics 2023-10-18 21:44:43 +03:00
BarfingLemurs 8402566a7c
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md

* Update README.md

* Update README.md

* move "Running on Windows" section below "Prepare data and run"

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 21:13:21 +03:00
ldwang 5fe268a4d9
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 18:52:33 +03:00
Ian Scrivener f3040beaab
typo : it is --n-gpu-layers not --gpu-layers (#3592)
fixed a typo in the MacOS Metal run doco
2023-10-12 14:10:50 +03:00
Galunid 9f6ede19f3
Add MPT model to supported models in README.md (#3574) 2023-10-10 19:02:49 -04:00
Xingchen Song(宋星辰) c5b49360d0
readme : add bloom (#3570) 2023-10-10 19:28:50 +03:00
BarfingLemurs 1faaae8c2b
readme : update models, cuda + ppl instructions (#3510) 2023-10-06 22:13:36 +03:00
Georgi Gerganov beabc8cfb0
readme : add project status link 2023-10-04 16:50:44 +03:00
slaren 40e07a60f9
llama.cpp : add documentation about rope_freq_base and scale values (#3401)
* llama.cpp : add documentation about rope_freq_base and scale values

* add notice to hot topics
2023-09-29 18:42:32 +02:00
BarfingLemurs 0a4a4a0982
readme : update hot topics + model links (#3399) 2023-09-29 15:50:35 +03:00
Andrew Duffy 569550df20
readme : add link to grammars app (#3388)
* Add link to grammars app per @ggernagov suggestion

Adding a sentence in the Grammars section of README to point to grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211

* Update README.md
2023-09-29 14:15:57 +03:00
Pierre Alexandre SCHEMBRI 4aea3b846e
readme : add Mistral AI release 0.1 (#3362) 2023-09-28 15:13:37 +03:00
BarfingLemurs ffe88a36a9
readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340)
* Update README.md

* Update README.md

* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
2f38b454 1726f9626f
docs: Fix typo CLBlast_DIR var. (#3330) 2023-09-25 20:24:52 +02:00
Lee Drake bc9d3e3971
Update README.md (#3289)
* Update README.md

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
Georgi Gerganov 7eb41179ed
readme : update hot topics 2023-09-20 20:48:22 +03:00
Johannes Gäßler 111163e246
CUDA: enable peer access between devices (#2470) 2023-09-17 16:37:53 +02:00
dylan 980ab41afb
docker : add gpu image CI builds (#3103)
Enables the GPU enabled container images to be built and pushed
alongside the CPU containers.

Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
2023-09-14 19:47:00 +03:00
Ikko Eltociear Ashimine 7d99aca759
readme : fix typo (#3043)
* readme : fix typo

acceleation -> acceleration

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-08 19:04:32 +03:00
Georgi Gerganov 94f10b91ed
readme : update hot tpoics 2023-09-08 18:18:04 +03:00
Yui 6ff712a6d1
Update deprecated GGML TheBloke links to GGUF (#3079) 2023-09-08 12:32:55 +02:00
Georgi Gerganov e36ecdccc8
build : on Mac OS enable Metal by default (#2901)
* build : on Mac OS enable Metal by default

* make : try to fix build on Linux

* make : move targets back to the top

* make : fix target clean

* llama : enable GPU inference by default with Metal

* llama : fix vocab_only logic when GPU is enabled

* common : better `n_gpu_layers` assignment

* readme : update Metal instructions

* make : fix merge conflict remnants

* gitignore : metal
2023-09-04 22:26:24 +03:00
Ido S 340af42f09
docs : add catai to README.md (#2967) 2023-09-03 08:50:51 +03:00
bandoti 52315a4216
readme : update clblast instructions (#2903)
* Update Windows CLBlast instructions

* Update Windows CLBlast instructions

* Remove trailing whitespace
2023-09-02 15:53:18 +03:00
Konstantin Herud 49bb9cbe0f
docs : add java-llama.cpp to README.md (#2935) 2023-09-01 16:36:14 +03:00
Gilad S 35092fb547
docs : add node-llama-cpp to README.md (#2885) 2023-08-30 11:40:12 +03:00
slaren c03a243abf
remove outdated references to -eps and -gqa from README (#2881) 2023-08-29 23:17:34 +02:00
Jhen-Jie Hong 74e0caeb82
readme : add react-native binding (#2869) 2023-08-29 12:30:10 +03:00
Georgi Gerganov da7455d046
readme : fix headings 2023-08-27 15:52:34 +03:00
Georgi Gerganov c48c5bb0b0
readme : update hot topics 2023-08-27 14:44:35 +03:00
Henri Vasserman 6bbc598a63
ROCm Port (#1087)
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP

---------

Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
2023-08-25 12:09:42 +03:00