llama.cpp/examples/parallel
Georgi Gerganov fcca0a7004
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements
2023-10-09 14:32:17 +03:00
CMakeLists.txt llama : custom attention mask + parallel decoding + no context swaps (#3228) 2023-09-28 19:04:36 +03:00
parallel.cpp refact : fix convert script + zero out KV cache to avoid nans (#3523) 2023-10-09 14:32:17 +03:00
README.md llama : custom attention mask + parallel decoding + no context swaps (#3228) 2023-09-28 19:04:36 +03:00

llama.cpp/examples/parallel

Simplified simulation for serving incoming requests in parallel
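
To make the idea concrete, here is a minimal toy sketch of such a simulation. It is not the code in parallel.cpp and calls no llama.cpp API: the names `Request`, `Slot`, `SimStats` and `simulate_parallel` are hypothetical, and it assumes that each decoding step produces exactly one token for every active request. Waiting requests are assigned to idle slots, all active slots are advanced together as one batch, and a slot is freed as soon as its request finishes.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical types for illustration only; none of these names come from llama.cpp.
struct Request {
    int id;
    int n_tokens;  // number of tokens the request needs to generate (assumed > 0)
};

struct Slot {
    int request_id = -1;  // -1 marks an idle slot
    int remaining  = 0;   // tokens still to generate for the current request
};

struct SimStats {
    int n_steps     = 0;  // number of batched decoding steps
    int n_tokens    = 0;  // total tokens generated across all requests
    int n_completed = 0;  // number of finished requests
};

// Serve every queued request using at most n_slots concurrent sequences.
// Assumption: one step yields one token per active slot.
SimStats simulate_parallel(std::deque<Request> queue, int n_slots) {
    assert(n_slots > 0);

    std::vector<Slot> slots(n_slots);
    SimStats stats;

    while (true) {
        // Hand waiting requests to idle slots.
        for (auto & slot : slots) {
            if (slot.request_id == -1 && !queue.empty()) {
                assert(queue.front().n_tokens > 0);
                slot.request_id = queue.front().id;
                slot.remaining  = queue.front().n_tokens;
                queue.pop_front();
            }
        }

        // Advance every active slot by one token; together they form one batch.
        int batch_size = 0;
        for (auto & slot : slots) {
            if (slot.request_id == -1) {
                continue;
            }
            batch_size++;
            if (--slot.remaining == 0) {
                stats.n_completed++;
                slot.request_id = -1;  // free the slot for the next request
            }
        }

        if (batch_size == 0) {
            break;  // no active slots and no waiting requests
        }

        stats.n_steps++;
        stats.n_tokens += batch_size;
    }

    return stats;
}
```

For example, three requests needing 3, 1 and 2 tokens finish in 3 batched steps with two slots, compared with 6 steps when they are served one at a time with a single slot.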