Files: CMakeLists.txt, parallel.cpp, README.md

llama.cpp/examples/parallel

Simplified simulation of serving incoming requests in parallel
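
Below is an illustrative invocation, sketched from the shared llama.cpp example options of this period. The model path is a placeholder, and the exact flag set (`-np` for parallel sequences, `-ns` for total sequences, `-cb` for continuous batching) is assumed from the common example options rather than confirmed against this exact revision.

```bash
# Hypothetical run: simulate 64 incoming client requests served across
# 8 parallel decoding slots, with continuous batching enabled so a
# finished slot is refilled with the next request immediately.
# Flags assumed from llama.cpp's common example options; the model
# path is a placeholder.
./parallel -m models/llama-2-7b/ggml-model-q4_0.gguf -n 128 -np 8 -ns 64 -cb
```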