Commit graph

538 commits

Author SHA1 Message Date
Stephan Walter 948d124837
AVX implementations (#1370) 2023-05-08 22:14:06 +03:00
Georgi Gerganov d155f0f865
scripts : add script for measuring the time per token 2023-05-08 22:06:54 +03:00
Georgi Gerganov 8fbf7777ce
ggml : fix Q5_0 quantization 2023-05-08 21:36:57 +03:00
Georgi Gerganov 60f62bbc85
ggml : minor formatting 2023-05-08 21:36:57 +03:00
Georgi Gerganov 7cdc08a5d1
ggml : remove Q4_2 mode 2023-05-08 21:36:55 +03:00
Georgi Gerganov b47bd2877f
ggml : update cuBLAS + normalize variable names 2023-05-08 21:35:52 +03:00
Georgi Gerganov c216656990
ggml : fix Q4_1 quantization 2023-05-08 21:35:52 +03:00
Georgi Gerganov 4991499a5a
ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit 2023-05-08 21:35:52 +03:00
Georgi Gerganov ba953d6e21
ggml : simplify scalar dot 2023-05-08 21:35:52 +03:00
Georgi Gerganov c7af9042b3
ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) 2023-05-08 21:35:52 +03:00
Georgi Gerganov 39bb8e7d19
ggml : 2x faster scalar implementations 2023-05-08 21:35:52 +03:00
Georgi Gerganov 796f8ae261
ggml : remove Q5_0 bit shuffling (ARM NEON) 2023-05-08 21:35:51 +03:00
Georgi Gerganov a6a1d96c91
ggml : remove Q4_2 bit shuffling (WIP, BROKEN) 2023-05-08 21:35:51 +03:00
Georgi Gerganov 086cfea11f
ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) 2023-05-08 21:35:51 +03:00
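The helpers named above pack two 4-bit values per byte. A hedged Python sketch of the packing idea (function names and layout here are illustrative; the actual ggml code is vectorized ARM NEON and works on fixed-size quantization blocks):

```python
def bytes_from_nibbles(nibbles):
    """Pack pairs of 4-bit values (0..15) into single bytes, low nibble first."""
    assert len(nibbles) % 2 == 0
    return bytes((nibbles[i] & 0x0F) | ((nibbles[i + 1] & 0x0F) << 4)
                 for i in range(0, len(nibbles), 2))

def nibbles_from_bytes(data):
    """Unpack each byte back into its low and high 4-bit halves."""
    out = []
    for b in data:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out
```

Round-tripping a list of nibbles through both helpers returns the original values.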
Georgi Gerganov edb6c8bb66
ggml : remove Q4_1 bit shuffling (ARM NEON + reference) 2023-05-08 21:35:51 +03:00
Georgi Gerganov a546dc6d60
ggml : remove Q4_0 bit shuffling (ARM NEON) 2023-05-08 21:35:50 +03:00
AlpinDale fe60904eef
readme : add TOC and Pygmalion instructions (#1359) 2023-05-08 19:33:30 +03:00
Pavol Rusnak 003ba2fb43
llama : fix hparams shadow (#1367)
fixes #1363
2023-05-08 17:48:21 +03:00
Georgi Gerganov f9a6364912
llama : require first token to be BOS (#1303)
* llama : require first token to be BOS

* scripts : add ppl-run-all.sh

* perplexity : add BOS for each chunk

* readme : update perplexity values after BOS fix

* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
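The BOS requirement above boils down to prepending the BOS token id before evaluation. A minimal Python sketch (the id 1 matches LLaMA's BOS token, but treat it as illustrative here):

```python
BOS_ID = 1  # LLaMA's BOS token id (illustrative constant)

def ensure_bos(tokens, bos_id=BOS_ID):
    """Require the first token to be BOS, prepending it if missing."""
    if not tokens or tokens[0] != bos_id:
        return [bos_id] + tokens
    return tokens
```

The same rule explains the perplexity change in this commit: each evaluated chunk gets a BOS token first.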
ubik2 95078cc554
convert: add ability to convert safetensors files (#1276)
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
2023-05-08 13:54:26 +02:00
Johannes Gäßler 1f48b0abcf
Documented CUDA reproducibility, added warning (#1346) 2023-05-08 02:42:01 +02:00
Henri Vasserman e1295513a4
CI: add Windows CLBlast and OpenBLAS builds (#1277)
* Add OpenCL and CLBlast support

* Add OpenBLAS support

* Remove testing from matrix

* change build name to 'clblast'
2023-05-07 13:20:09 +02:00
swittk 1b0fd45465
ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336)
Minor edit in ggml.c that previously prevented OpenCL from loading completely when GGML_USE_ACCELERATE was defined.
Gives a minor speedup in prompt eval time.
2023-05-06 23:03:23 -04:00
Jed Fox 3924088512
Remove default arguments from sampling functions (#1343) 2023-05-06 17:01:47 -04:00
DaniAndTheWeb 173d0e6419
makefile: automatic Arch Linux detection (#1332)
This commit ports a detection method used in koboldcpp's Makefile to automatically set the -lcblas option on Arch Linux.
2023-05-05 23:57:14 +02:00
Erik Scholz a3b85b28da
ci : add cublas to windows release (#1271) 2023-05-05 22:56:09 +02:00
Pavol Rusnak 921dcee00a
readme: add missing info (#1324) 2023-05-05 16:43:36 +02:00
Ionoclast Laboratories 2d13786e91
Fix for OpenCL / CLBlast builds on macOS. (#1329) 2023-05-05 14:18:21 +02:00
Benjamin Lecaillon a90e96b266
Convert.py @staticmethod (#1327)
* Line 698 has a #staticmethod and should not;

otherwise it throws an error at unpickle.load() as not callable

* Update convert.py

---------

Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com>
2023-05-05 03:17:07 +03:00
slaren 94c5652fc0
quantize: make output filename optional, default to ggml-model-<ftype>.bin (#1301) 2023-05-05 00:58:56 +02:00
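The default naming rule described above can be sketched as follows (a hedged illustration; the real quantize tool is C++ and its path handling may differ):

```python
import os

def default_quantize_output(model_path, ftype):
    """If no output name is given, place ggml-model-<ftype>.bin next to the input model."""
    return os.path.join(os.path.dirname(model_path), f"ggml-model-{ftype}.bin")
```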
Ivan Stepanov 34d9f22f44
Wrap exceptions in std::exception to give verbose output on exception. (#1316) 2023-05-04 18:56:27 +02:00
Ivan Stepanov d3e8093e9b
convert: support DT_BF16 tensors (#1309)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-05-04 18:54:37 +02:00
44670 360cfe5bec
readme : add OpenBuddy link (#1321) 2023-05-04 19:33:31 +03:00
44670 2edbdb0f99
main : add --in-suffix option (#1318)
* adding --in-suffix option

* print input suffix before generation
2023-05-04 18:41:12 +03:00
Ron Jailall 20fbf2a2a0
ggml : change immintrin.h to intrin.h for compatibility (#1307)
* change immintrin.h to intrin.h for compatibility

Building on Windows 11 ARM throws an error on this line. It seems that using intrin.h covers both x86 and ARM.

* conditional def of intrin.h

* fix typo in ggml.c
2023-05-04 18:05:59 +03:00
DannyDaemonic db1080876a
Only escape prompts when used with -e (#1311) 2023-05-04 05:08:25 -07:00
DannyDaemonic c65a7fbfa9
Update main's README.md with new features (#1296) 2023-05-04 03:02:59 -07:00
Tomas f647ce040f
fix #1224 reverse prompt and multi line (#1297)
* fix reverse prompt and multi line

* Code Formatting

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-04 03:02:30 -07:00
Georgi Gerganov 799fdc1b5d
ggml : vectorize Q8_0 quantization
https://github.com/ggerganov/ggml/pull/127#issuecomment-1533648531
2023-05-03 23:24:20 +03:00
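For reference, the Q8_0 scheme being vectorized above stores 32 values per block with a single scale d = amax/127 and one int8 per value. A scalar Python sketch of that scheme (block size and scale rule follow ggml's reference implementation; rounding details differ slightly from C's roundf):

```python
QK8_0 = 32  # elements per Q8_0 block, as in ggml

def quantize_row_q8_0_ref(xs):
    """Scalar Q8_0 sketch: per 32-element block, scale d = amax/127,
    each value stored as round(x/d)."""
    assert len(xs) % QK8_0 == 0
    blocks = []
    for i in range(0, len(xs), QK8_0):
        block = xs[i:i + QK8_0]
        amax = max(abs(x) for x in block)
        d = amax / 127.0 if amax else 0.0
        inv = 1.0 / d if d else 0.0
        blocks.append((d, [round(x * inv) for x in block]))
    return blocks
```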
khimaros 6daa09d879
examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
Georgi Gerganov bca9ad938a
minor : fix whitespaces (#1302) 2023-05-03 20:09:42 +03:00
Georgi Gerganov e2a937ca6a
minor : fix trailing whitespaces 2023-05-03 18:43:23 +03:00
KASR b0c71c7b6d
scripts : platform independent script to verify sha256 checksums (#1203)
* python script to verify the checksum of the llama models

Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.

* Update README.md

update to the readme for improved readability and to explain the usage of the python checksum verification script

* update the verification script

I've extended the script based on suggestions by @prusnak

The script now checks the available RAM; if there is enough to check the file at once, it will do so. If not, the file is read in chunks.

* minor improvement

Small change so that the available RAM is checked, not the total RAM.

* remove the part of the code that reads the file at once if enough ram is available

Based on suggestions from @prusnak, I removed the part of the code that checks whether the user has enough RAM to read the entire model at once. The file is now always read in chunks.

* Update verify-checksum-models.py

quick fix to pass the git check
2023-05-03 18:31:28 +03:00
CRD716 a8a2efdc81
examples : various prompt and example fixes (#1298)
* fix dan.txt

* miku prompt improvements

* use common characters
2023-05-03 18:26:47 +03:00
Evan Jones e216aa0463
llama : only copy used KV cache in get / set state (#1272)
* llama : only copy used KV cache in get / set state

* switch to ggml for copying k, v

* avoid designated initializers
2023-05-02 22:26:13 -04:00
DannyDaemonic 2485d7a4d3
Process escape sequences given in prompts (#1173) 2023-05-02 18:46:20 -07:00
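The escape processing above (later gated behind -e) replaces literal sequences like \n in the prompt string with real characters. A hedged Python sketch of the idea (the exact escape set handled in llama.cpp may differ):

```python
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}

def process_escapes(s):
    """Turn literal \\n, \\t, etc. in a prompt string into real characters."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in _ESCAPES:
            out.append(_ESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

Unrecognized sequences are passed through unchanged rather than rejected.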
DannyDaemonic 13b0c68ed7
Handle signals properly on Windows (#1123) 2023-05-02 18:01:57 -07:00
DannyDaemonic 55bc5f0900
Call sh on build-info.sh (#1294) 2023-05-02 17:52:35 -07:00
kuvaus 9daff419f6
fix build-info.h for git submodules (#1289)
* make git build info work with submodules

---------

Co-authored-by: Green Sky <green@g-s.xyz>
2023-05-03 02:43:43 +02:00
slaren bf4b22ffe4
fix missing parameters in llama_init_from_gpt_params (#1293) 2023-05-03 01:36:45 +02:00