llama.cpp

History

slaren 5bf3953d7e cuda : improve cuda pool efficiency using virtual memory (#4606)	2023-12-24 14:34:22 +01:00

* cuda : improve cuda pool efficiency using virtual memory
* fix mixtral
* fix cmake build
* check for vmm support, disable for hip ggml-ci
* fix hip build
* clarify granularity
* move all caps to g_device_caps
* refactor error checking
* add cuda_pool_alloc, refactor most pool allocations ggml-ci
* fix hip build
* CUBLAS_TF32_TENSOR_OP_MATH is not a macro
* more hip crap
* llama : fix msvc warnings
* ggml : fix msvc warnings
* minor
* minor
* cuda : fallback to CPU on host buffer alloc fail
* Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* ensure allocations are always aligned
* act_size -> actual_size

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
CMakeLists.txt	sync : ggml (new ops, tests, backend, etc.) (#4359)	2023-12-07 22:26:54 +02:00
test-backend-ops.cpp	ggml : change ggml_scale to take a float instead of tensor (#4573)	2023-12-21 23:20:49 +02:00
test-c.c	tests : add a C compliance test (#2848)	2023-08-30 09:20:26 +03:00
test-double-float.cpp	ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)	2023-10-30 19:19:15 +02:00
test-grad0.cpp	cuda : improve cuda pool efficiency using virtual memory (#4606)	2023-12-24 14:34:22 +01:00
test-grammar-parser.cpp	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
test-llama-grammar.cpp	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
test-opt.cpp	sync : ggml (backend v2) (#3912)	2023-11-13 14:16:23 +02:00
test-quantize-fns.cpp	ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)	2023-10-30 19:19:15 +02:00
test-quantize-perf.cpp	ggml : use ggml_row_size where possible (#4472)	2023-12-14 20:05:21 +01:00
test-rope.cpp	llama : custom attention mask + parallel decoding + no context swaps (#3228)	2023-09-28 19:04:36 +03:00
test-sampling.cpp	sampling : refactor init to use llama_sampling_params (#3696)	2023-10-20 21:07:23 +03:00
test-tokenizer-0-falcon.cpp	Minor improvements in GPT2 tokenizer (#3567)	2023-10-10 18:59:52 +02:00
test-tokenizer-0-falcon.py	ci : add flake8 to github actions (python linting) (#4129)	2023-11-20 11:35:47 +01:00
test-tokenizer-0-llama.cpp	Minor improvements in GPT2 tokenizer (#3567)	2023-10-10 18:59:52 +02:00
test-tokenizer-0-llama.py	ci : add flake8 to github actions (python linting) (#4129)	2023-11-20 11:35:47 +01:00
test-tokenizer-1-bpe.cpp	Add more tokenizer tests (#3742)	2023-10-24 09:17:17 +02:00
test-tokenizer-1-llama.cpp	Work on the BPE tokenizer (#3252)	2023-10-03 09:16:26 +02:00