Default Branch

master

784e11dea1 · README: add graphic for matrix multiplication (#6881) · Updated 2024-04-24 21:29:13 +02:00

Branches

sl/moe-rework-1

8c2f7b8169 · Update convert-hf-to-gguf.py · Updated 2024-03-31 19:52:46 +02:00

147 commits behind master, 11 ahead

sycl-refactor

a2e77e60d6 · clang-format · Updated 2024-03-31 11:00:52 +00:00

144 commits behind master, 3 ahead

gg/authors

805d705032 · license : add AUTHORS · Updated 2024-03-31 10:37:39 +03:00

144 commits behind master, 1 ahead

update_flake_lock_action

e74d82494d · flake.lock: Update · Updated 2024-03-31 00:18:05 +00:00

144 commits behind master, 1 ahead

sync

bd3d9f1bad · cuda : move GGML_CUDA_DMMV constants to dmmv.cuh · Updated 2024-03-29 16:01:44 +01:00

174 commits behind master, 2 ahead

gg/flash-attn-a

4c190ba676 · cuda : reduce registers · Updated 2024-03-28 21:17:08 +02:00

156 commits behind master, 77 ahead

gg/flash-attn

08e69c5008 · cuda : adapt soft_max to F16 mask and pos · Updated 2024-03-28 19:40:11 +02:00

156 commits behind master, 75 ahead

compilade/fix-command-r

64b7d85891 · llama : fix command-r inference · Updated 2024-03-28 06:22:24 -04:00

161 commits behind master, 1 ahead

gg/flash-attn-wip

6be02b5969 · cuda : fix build · Updated 2024-03-27 10:31:52 +02:00

178 commits behind master, 72 ahead

ceb/wpm-portable-tolower

87a6088ffe · rename unicodedata.{cpp,h} to unicode-data.{cpp,h} · Updated 2024-03-26 10:52:33 -04:00

193 commits behind master, 7 ahead

ik/quantize_with_kv_overrides

9c5fd6be14 · minor : spacing · Updated 2024-03-26 14:09:02 +02:00

191 commits behind master, 2 ahead

ik/test_quantize_fns

6f20e2672f · Include IQ2_XXS and IQ2_XS in teet-quantize-fns · Updated 2024-03-25 19:01:20 +02:00

195 commits behind master, 1 ahead

sl/cuda-f16-fix3

210e469114 · cuda : fix LLAMA_CUDA_F16 build · Updated 2024-03-25 15:31:10 +01:00

197 commits behind master, 1 ahead

ceb/fix-win-unicode-fpaths

d05c13b3b9 · llama : fix BPE LF token on MSVC · Updated 2024-03-23 14:03:16 -04:00

217 commits behind master, 3 ahead

hp/server/logs-flush

2187f34b4a · server: flush stdout after logging in both text and json layout · Updated 2024-03-23 10:50:09 +01:00

216 commits behind master, 1 ahead

gg/flash-attn-rebase

3a468e6f9f · llama : fix type of KQ_mask and KQ_pos · Updated 2024-03-22 17:12:17 +02:00

222 commits behind master, 68 ahead

ik/quantize_not_repeating

0e826d12a5 · quantize: be able to specify the token embedding tensor type · Updated 2024-03-22 16:27:34 +02:00

231 commits behind master, 2 ahead

gg/hf-args

8c3d5b5a79 · common : remove defaults · Updated 2024-03-22 15:33:24 +02:00

227 commits behind master, 2 ahead

patch-1

12aa74ba7d · minor : spacing · Updated 2024-03-22 15:24:57 +02:00

452 commits behind master, 6 ahead