
pudepiedj a8777ad84e parallel : add option to load external prompt file (#3416)	2023-10-06 16:16:38 +03:00
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /proimpts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
graph.py	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
jeopardy.sh	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
qasheet.csv	examples : add Jeopardy example (#1168)	2023-04-28 19:13:33 +03:00
questions.txt	examples : add Jeopardy example (#1168)	2023-04-28 19:13:33 +03:00
README.md	parallel : add option to load external prompt file (#3416)	2023-10-06 16:16:38 +03:00

README.md

llama.cpp/examples/jeopardy

This is pretty much just a straight port of aigoopy/llm-jeopardy/ with an added graph viewer.

The jeopardy test can be used to compare the factual knowledge of different models. This is in contrast to some other tests, which probe logical deduction, creativity, writing skills, etc.

Step 1: Open jeopardy.sh and modify the following:

MODEL=(path to your model)
MODEL_NAME=(name of your model)
prefix=(the prompt prefix your model expects: for vicuna it's Human: , for other models it might be User: , etc.)
opts=(add --instruct here if your model needs it, or any other options you want to test out)
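
As an illustration of Step 1, the variables might end up looking something like the sketch below. Every value here is a hypothetical placeholder rather than something taken from jeopardy.sh itself: the model path and model name are invented for the example, and opts is left empty because the right options depend on your model.

```shell
# Hypothetical example values; substitute your own.
MODEL="./models/example-7b/model.bin"   # assumed path, not from the README
MODEL_NAME="example-7b"                 # assumed name, used to label this run
prefix="Human: "                        # vicuna-style prefix, per the note above
opts=""                                 # add extra options here if your model needs them

# Print the settings so they can be checked before a long run.
echo "MODEL=$MODEL"
echo "MODEL_NAME=$MODEL_NAME"
echo "prefix='$prefix' opts='$opts'"
```

Printing the values first is optional; it is only a cheap way to catch a mistyped path before starting the test.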

Step 2: Run jeopardy.sh from the llama.cpp folder
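
A minimal sketch of Step 2, assuming the script lives at examples/jeopardy/jeopardy.sh relative to the llama.cpp folder (that location is an assumption based on the repository layout; adjust it if your checkout differs):

```shell
# Assumed location of the script relative to the llama.cpp folder.
script="./examples/jeopardy/jeopardy.sh"

if [ -f "$script" ]; then
    # Run the test from the current (llama.cpp) folder.
    bash "$script"
else
    # Most likely cause: the current directory is not the llama.cpp folder.
    echo "jeopardy.sh not found; change into the llama.cpp folder first" >&2
fi
```

The existence check only guards against starting from the wrong directory, which Step 2 specifically warns about.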

Step 3: Repeat steps 1 and 2 until you have all the results you need.

Step 4: Run graph.py, and follow the instructions. At the end, it will generate your final graph.

Note: The Human bar is based on the full, original set of 100 sample questions. If you change the number of questions or the questions themselves, the Human bar will no longer be a valid comparison.