llama.cpp/examples
Georgi Gerganov ecb217db4f
llama : Metal inference (#1642)
* mtl : export the LLaMA computation graph

* ci : disable temporary

* mtl : adapt the MNIST example as starter

* mtl : no need for mtl-export tool, add cli arg for main instead

* mtl : export just a small part of the graph for now to make it easier

* mtl : move MSL code into separate file for easy editing

* mtl : initial get_rows_q4_0 kernel

* mtl : confirmed get_rows_q4_0 is working correctly

* mtl : add rms_norm kernel + confirm working

* mtl : add mul kernel + confirm working

* mtl : initial mul_mat Q4 kernel (wrong results)

* mtl : mul_mat fixes (still wrong)

* mtl : another mul_mat Q4 (still does not work)

* mtl : working mul_mat q4

* ggml : fix handling of "view" ops in ggml_graph_import()

* mtl : add rope kernel

* mtl : add reshape and transpose handling

* ggml : store offset as opt arg for ggml_view_xd() operators

* mtl : add cpy kernel + handle view ops

* mtl : confirm f16 x f32 attention mul mat

* mtl : add scale kernel

* mtl : add diag_mask_inf kernel

* mtl : fix soft_max kernel

* ggml : update ggml_nbytes() to handle non-contiguous tensors

* mtl : verify V tensor contents

* mtl : add f32 -> f32 cpy kernel

* mtl : add silu kernel

* mtl : add non-broadcast mul kernel

* mtl : full GPU inference of the computation graph

* mtl : optimize rms_norm and soft_max kernels

* mtl : add f16 mat x f32 vec multiplication kernel

* mtl : fix bug in f16 x f32 mul mat + speed-up computation

* mtl : faster mul_mat_q4_0_f32 kernel

* mtl : fix kernel signature + roll inner loop

* mtl : more threads for rms_norm + better timing

* mtl : remove printfs from inner loop

* mtl : simplify implementation

* mtl : add save/load vocab to ggml file

* mtl : plug Metal inference into llama.cpp (very quick-n-dirty)

* mtl : make it work with main example

Lots of hacks but at least now it generates text

* mtl : preparing for merge

* mtl : clean-up ggml mtl interface + suport scratch / inplace

* mtl : remove temp / debug code

* metal : final refactoring and simplification

* Revert "ci : disable temporary"

This reverts commit 98c267fc77.

* metal : add comments

* metal : clean-up stuff, fix typos

* readme : add Metal instructions

* readme : add example for main
2023-06-04 23:34:30 +03:00
..
baby-llama ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) 2023-05-13 15:56:40 +03:00
benchmark llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
embedding llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
jeopardy examples : add Jeopardy example (#1168) 2023-04-28 19:13:33 +03:00
main llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
metal llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
perplexity llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
quantize llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
quantize-stats Remove unused n_parts parameter (#1509) 2023-05-17 22:12:01 +00:00
save-load-state Remove unused n_parts parameter (#1509) 2023-05-17 22:12:01 +00:00
server Only show -ngl option when relevant + other doc/arg handling updates (#1625) 2023-05-28 11:48:57 -06:00
alpaca.sh examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) 2023-04-22 09:54:33 +03:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
chat-persistent.sh chat-persistent.sh : use bracket expressions in grep (#1564) 2023-05-24 09:16:22 +03:00
chat.sh If n_predict == -1, generate forever 2023-03-25 21:51:41 +02:00
CMakeLists.txt llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
common.cpp llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
common.h llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
gpt4all.sh examples : add -n to alpaca and gpt4all scripts (#706) 2023-04-13 16:03:39 +03:00
Miku.sh examples : various prompt and example fixes (#1298) 2023-05-03 18:26:47 +03:00
reason-act.sh add example of re-act pattern (#583) 2023-03-29 10:10:24 -05:00