Commit Graph

60 Commits (9a0b59d990be319952a4a02b9164b3b2327cd454)

Author SHA1 Message Date
Georgi Gerganov 3168dbf23b
sync : ggml 2024-02-28 13:01:33 +02:00
Georgi Gerganov 7a6e385c1b
sync : ggml 2024-02-25 19:59:34 +02:00
Georgi Gerganov 7b1ff212d9
sync : ggml 2024-02-22 23:25:38 +02:00
Georgi Gerganov 6b16927d18
sync : ggml 2024-02-22 15:15:38 +02:00
Georgi Gerganov b602819b6e
sync : ggml 2024-02-19 15:54:25 +02:00
Georgi Gerganov b742f13e70
sync : ggml 2024-02-12 19:07:56 +02:00
Georgi Gerganov 25a90ffa38
sync : ggml 2024-02-12 09:32:15 +02:00
Georgi Gerganov 518199c09e
sync : ggml 2024-02-10 09:56:47 +02:00
Georgi Gerganov aa8a75e287
extra : update sync scripts 2024-02-10 09:55:19 +02:00
Georgi Gerganov 7a74e929c8
sync : ggml (#0) 2024-01-30 21:30:26 +02:00
Georgi Gerganov bd41733db2
sync : ggml 2024-01-28 19:30:32 +02:00
Georgi Gerganov 7fe3ed5e00
sync : ggml 2024-01-27 17:23:25 +02:00
Georgi Gerganov 1de21b913d
sync : ggml 2024-01-17 21:22:38 +02:00
Georgi Gerganov d08445c9ad
sync : ggml 2024-01-14 10:55:18 +02:00
Georgi Gerganov 654baf693d
scripts : sync-ggml-am.sh add option to skip commits 2024-01-14 10:53:19 +02:00
Georgi Gerganov c615f2c335
sync : ggml 2024-01-14 00:12:17 +02:00
Georgi Gerganov 1560288048
sync : ggml 2024-01-12 21:56:50 +02:00
Georgi Gerganov 32e71a1861
sync : ggml 2024-01-11 21:54:17 +02:00
Erik Scholz 11b1b63b14
fix : cuda order of synchronization when setting a buffer (ggml/679)
* fix : cuda order of synchronization when setting a buffer

* also sync before memcpy

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-01-05 17:01:59 +02:00
Georgi Gerganov 0e26a6c92e
metal : switch back to default.metallib (ggml/681)
ggml-ci
2024-01-05 16:31:30 +02:00
Georgi Gerganov 14c57952f7 cuda : simplify expression
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov 620a223814 scripts : fix sync order + metal sed 2024-01-03 14:43:51 +02:00
Georgi Gerganov 2623640cd6
scripts : do not sync commits from this repo 2023-12-29 15:03:08 +02:00
Georgi Gerganov e77b27c331
sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) (#1691)
* scripts : add sync-ggml-am.sh

* sync : ggml (VMM, ARM dot prod fix, etc.)

* build : fix CUDA build

* ggml : fix some mul mat cases + add tests for src1 F16

dbd02958fa
2023-12-29 11:30:47 +02:00
Georgi Gerganov 3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)
* sync : ggml

* sync : llama.cpp

* talk-llama : fix obsolete param

* ggml-alloc : fix ggml_tallocr_is_own

* talk.wasm : update to new ggml

* ggml : fix type punning in ggml_scale

* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Alfredo Montesinos 9286d3f584
bench.py : add different large models (#1655)
Amend different large v1,v2,v3 models to benchmark.
2023-12-19 12:40:14 +02:00
Georgi Gerganov 94267df08e
bench-all : add distil models 2023-11-15 20:49:12 +02:00
Georgi Gerganov 57a60639bb
bench-all : indentations 2023-11-15 20:01:15 +02:00
Georgi Gerganov bfbaa4dce5
whisper : make large version explicit + fix data size units (#1493) 2023-11-15 19:42:25 +02:00
Georgi Gerganov b6c5f49b78
whisper : add batched decoding (#1486)
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove oboslete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
2023-11-15 16:12:52 +02:00
Georgi Gerganov b0502836b8
whisper : add full CUDA and Metal offloading (#1472)
* whisper : migrate to ggml-backend

* whisper : fix logit reading

* whisper : fix tensor allocation during load

* whisper : fix beam-search with CUDA

* whisper : free backends + fix compile warning

* whisper : print when CUDA is enabled

* whisper : fix CoreML

* make : clean-up

* talk : fix compile warning

* whisper : support ggml_conv with CUDA and Metal (#1473)

* ggml : add CUDA support for ggml_conv

* whisper : remove ggml_repeat for conv bias + single backend

* cuda : fix im2col kernel

* metal : add im2col support + mul mat-vec f16 x f16

* bench-all : add q4 models

* whisper : clean-up

* quantize-all : fix

* ggml : im2col opts

* whisper : avoid whisper_model_data wrapper

* whisper : add note that ggml_mul_mat_pad does not work with CUDA

* whisper : factor out graph compute in common function

* whisper : fixes

* whisper : fix UB with measure buffers

* whisper : try to fix the parallel whisper_state functionality (#1479)

* whisper : try to fix the parallel whisper_state functionality

* whisper : fix multi-state Metal

* whisper : free backend instances in whisper_state
2023-11-12 15:31:08 +02:00
Georgi Gerganov 2cdfc4e025
whisper : add support for large v3 (#1444)
* whisper : add support for large v3

* bench : fix build + fix go bindings

* bench : fix n_mels

* models : update readme
2023-11-07 15:30:18 +02:00
Georgi Gerganov 6d4d0b5b4b
cuda : fix HIPBLAS build 2023-11-05 19:41:15 +02:00
Georgi Gerganov f96e1c5b78
sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)

* metal : allow env metal variable to override resource path (#1415)

* Allow env variable to override resource path

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* sync : restore common / main from `master`

* sync : restore whisper from `master`

* talk-llama : update to latest llama.cpp

* ruby : fix build

* ggml : fix 32-bit ARM build

* ggml : fix MIN / MAX macro collisions + update ios bindings

* ggml : fix ifdefs and MIN / MAX again

* exampels : fix Obj-C and Swift examples

* ggml : fix 32-bit ARM compatibility

* ggml : one more attempt to fix 32-bit ARM compat

* whisper : fix support for larger graphs

---------

Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
Neil Chudleigh 9edbd0a204
extra: Add benchmark script implemented in Python (#1298)
* Create bench.py

* Various benchmark results

* Update benchmark script with hardware name, and file checks

* Remove old benchmark results

* Add git shorthash

* Round to 2 digits on calculated floats

* Fix the header reference when sorting results

* FIx order of models

* Parse file name

* Simplify filecheck

* Improve print run print statement

* Use simplified model name

* Update benchmark_results.csv

* Process single or lists of processors and threads

* Ignore benchmark results, dont check in

* Move bench.py to extra folder

* Readme section on how to use

* Move command to correct location

* Use separate list for models that exist

* Handle subprocess error in git short hash check

* Fix filtered models list initialization
2023-09-25 23:45:15 +08:00
Georgi Gerganov b8432f28f4
metal : add F32 support + update bench output 2023-09-15 13:56:08 +03:00
Georgi Gerganov 93935980f8
whisper : Metal and ggml-alloc support (#1270)
* metal : init

* whisper : factor out graph builds

* whisper : allocate encoder and decoder using ggml-alloc

* whisper : ggml-alloc is now supported

* whisper : CoreML support ggml-alloc

* build : fix ggml-alloc

* ios : update submodule

* extra : update sync-ggml.sh script to also sync ggml-alloc

* ci : see if this is causing the crash

* whisper : refactor ggml-alloc init

* whisper.android : try to fix build

* whisper : initial Metal version

* ci : try to debug vmem issue

* metal : decoder works on GPU!

* metal : add multi-decoder support

* ggml : fix ggml_nbytes (probably temp solution)

* metal : run "cross" step on the GPU

* whisper : remove ggml_repeat in the encoder

* whisper : offload the Encoder to Metal

* ggml : use simpler ggml_bytes() implementation

* ggml-alloc : try to make CI happy by reducing vram to 128GB

* whisper : add whisper_allocr to wrap ggml_allocr

* whisper : factor out alloc init in a function

* cmake : update to support Metal build

* whisper : add <functional> header

* objc : fix build (no Metal yet)

* ios : add Metal support

* swiftui : fix build

* metal : speed-up KQ multiplication

* metal : sync latest llama.cpp kernels

* readme : add Metal info

* ios : update submodule

* coreml : add code to toggle Core ML config (CPU, ANE, GPU)

* bench : fix timings by running a pre-heat

* bench : start benching the decoder

* whisper : add ggml_mul_mat_pad

* bench : fix uninitialized vars

* whisper : add comment for disabling mul-mat padding

* whisper : add description of ggml_mul_mat_pad

* whisper : clean-up ggml_mul_mat_pad

* metal : remove the "concurrent" flag

* bench : variable n_past

* ios : update SPM package
2023-09-15 12:18:18 +03:00
thefinaldegree 49c9472fa0
extra : update 'quantize-all.sh' to quantize all downloaded models (#1054)
Script will now do what it says: quantize everything except testing models in the 'models'  directory.
2023-06-28 22:07:02 +03:00
Georgi Gerganov f1c9df5806
metal : sync ggml-metal (ref #1047) 2023-06-25 15:40:39 +03:00
Georgi Gerganov 6c25fae1c4
opencl : sync latest ggml-opencl 2023-06-25 15:38:30 +03:00
Georgi Gerganov 2b6a074305
extra : update ggml sync script 2023-05-14 10:01:52 +03:00
Georgi Gerganov 0bcb64b184
ggml : sync ggml (clBLAST + tensor names) 2023-05-02 21:24:18 +03:00
Georgi Gerganov d375d73b2e
bench : improve benchmarks 2023-05-01 14:44:39 +03:00
Georgi Gerganov 794b162a46
whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov 3eaeb030ff
extra : add sync-ggml.sh script 2023-04-29 12:32:28 +03:00
Georgi Gerganov 5e47e223bd
whisper : add Core ML support (#566)
* coreml : use Core ML encoder inference

* coreml : simlpify whisper_encode + log messages

* whisper : resolve rebase conflicts

* coreml : add scripts for CoreML model generation

* bench-all : recognize COREML flag
2023-04-15 13:21:27 +03:00
Georgi Gerganov bb6b54a03d
bench-wts.sh : rename script + add execute permission 2023-03-06 21:02:24 +02:00
venkr b597c5a779
qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569) 2023-03-06 19:18:11 +02:00
Georgi Gerganov a6cf6f4c4a
bench : minor fixes 2023-01-18 21:40:10 +02:00
Georgi Gerganov 1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks 2023-01-18 20:31:46 +02:00