Commit Graph

1110 Commits (9a0b59d990be319952a4a02b9164b3b2327cd454)

Author SHA1 Message Date
Finn Voorhees a3d0aa73d1
ggml : add error handling to graph_compute (#1714) 2024-01-03 15:39:43 +02:00
Georgi Gerganov 14c57952f7 cuda : simplify expression
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov 6c369d6788 cuda : mark I16 and I32 ops as unsupported
ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov 4cdd9aad9b metal : add kernel_get_rows_i32
ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov f38c057503 metal : optimize ggml_mul_mat_id (faster Mixtral PP) (llama/4725)
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (llama/4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

* metal : optimizing ggml_mul_mat_id (wip)

* metal : minor fix

* metal : opt mul_mm_id
2024-01-03 14:43:51 +02:00
Georgi Gerganov 1e5544b39b metal : enable shader debugging (cmake option) (llama/4705)
* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (llama/4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci
2024-01-03 14:43:51 +02:00
Georgi Gerganov d5673af79f ggml : add ggml_vdotq_s32 alias (llama/4715)
ggml-ci
2024-01-03 14:43:51 +02:00
Johannes Gäßler a28dacec65 CUDA: fixed tensor cores not being used on RDNA3 (llama/4697) 2024-01-03 14:43:51 +02:00
automaticcat dbe29d4e33 ggml : add ggml_cpu_has_avx_vnni() (llama/4589)
* feat: add avx_vnni based on intel documents

* ggml: add avx vnni based on intel document

* llama: add avx vnni information display

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* Update ggml.c

Fix indentation upgate

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:43:51 +02:00
Johannes Gäßler fe3a67c546 CUDA: fix tensor core logic for Pascal and HIP (llama/4682) 2024-01-03 14:43:51 +02:00
hydai b138ff2be3 cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)
Signed-off-by: hydai <hydai@secondstate.io>
2024-01-03 14:43:51 +02:00
Guillaume Wenzek cf6f1e4181 ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)
* add more int ops

* ggml_compute_forward_dup_bytes

* add tests

* PR comments

* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov 620a223814 scripts : fix sync order + metal sed 2024-01-03 14:43:51 +02:00
Andreu Huguet f39f9690ec
examples : fix WASM Stack Overflow (#1713)
Fix for problem:

"""
RuntimeError: Aborted(Stack overflow! Stack cookie has been overwritten at 0x12be2b10, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x00000000 0x00000000)
"""

That appears when executing the WASM example with the newer versions.
2024-01-02 16:50:04 +00:00
bobqianic f9ca90256b
docker : fix the publishing of the CUDA Docker image (#1704) 2023-12-30 23:12:31 +02:00
Georgi Gerganov 2623640cd6
scripts : do not sync commits from this repo 2023-12-29 15:03:08 +02:00
Tamotsu Takahashi d87de61ae6
ci : build with CLBlast + ggml-opencl use GGML_API (#1576)
* Build with CLBlast

* Declare GGML_API

After rebasing, examples/talk-llama failed:

"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
2023-12-29 12:23:27 +02:00
bobqianic f5f485f899
whisper : replace `tensor->n_dims` with `ggml_n_dims(tensor)` (#1694) 2023-12-29 11:38:35 +02:00
Georgi Gerganov e77b27c331
sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) (#1691)
* scripts : add sync-ggml-am.sh

* sync : ggml (VMM, ARM dot prod fix, etc.)

* build : fix CUDA build

* ggml : fix some mul mat cases + add tests for src1 F16

dbd02958fa
2023-12-29 11:30:47 +02:00
Dimo a5cc3dc8a2
download : fix large q5 model name (#1695)
fixed typo in large-v3-q5-0 model name to match HF link
2023-12-29 11:14:32 +02:00
bobqianic 37a709f655
whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG (#1681) 2023-12-23 12:02:58 +00:00
Georgi Gerganov 3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)
* sync : ggml

* sync : llama.cpp

* talk-llama : fix obsolete param

* ggml-alloc : fix ggml_tallocr_is_own

* talk.wasm : update to new ggml

* ggml : fix type punning in ggml_scale

* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Chaoqun d2ee117a0a
docker : Dockerize whisper.cpp (#1674)
* build: add dockerfile for ci

* ci: add action to build/push docker image

* fix: lowercase repository to fix ci

* ci: update cuBLAS flag

* build: install curl and ffmped in image

* docs: add docker section

* fix: improve args check when download model
2023-12-22 11:16:02 +00:00
bobqianic db8ccdb850
CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 (#1672) 2023-12-21 22:39:46 +00:00
bobqianic d2419030b0
examples : Revert CMakeLists.txt for talk-llama (#1669) 2023-12-21 22:48:52 +02:00
bobqianic 8986690c2a
cmake : set default CUDA architectures (#1667) 2023-12-21 15:44:04 +02:00
Alfredo Montesinos 9286d3f584
bench.py : add different large models (#1655)
Amend different large v1,v2,v3 models to benchmark.
2023-12-19 12:40:14 +02:00
Georgi Gerganov 940de9dbe9
wchess : update README.md 2023-12-14 22:00:47 +02:00
Georgi Gerganov 88112c8afb
release : v1.5.2 2023-12-14 17:56:39 +02:00
Georgi Gerganov 375585c07c
wchess : update readme 2023-12-14 17:51:14 +02:00
fraxy-v fd99ece8e3
wchess : whisper assisted chess (#1595)
* wchess: whisper assisted chess

* wchess: fix allowed moves in check

* wchess: touchstart, touchend events

* wchess: css, disabled button

* wchess : html touches

* wchess : minor fixes and code style

* wchess : bump encoder context to 1280

* wchess : index.html

* wchess : fix CI warnings

* wchess : add array header

* wchess : build static library

* wchess : display grammar

* wchess : update UX

* wchess : add comment

* wchess : add README

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 15:58:26 +02:00
Georgi Gerganov 8171e621fc
sync : ggml (Metal fixes, new ops, tests) (#1633)
* sync : ggml (Metal fixes, new ops, tests)

* cuda : fix bin bcast when src1 and dst have different types
2023-12-13 21:55:03 +02:00
Kreijstal ec03661b20
cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk (#1617)
Since we use prefetchVirtualMemory we specify we target win 8 or above, otherwise other compilers will refuse to use the prefetchVirtualMemory api, (I understand you are loading it dynamically but the header definition has this limitation)
2023-12-12 11:35:00 +00:00
Kreijstal 6335933a5b
cmake : Fix bug in httplib.h for mingw (#1615)
Fix bug in httlib.h for mingw, please see https://github.com/yhirose/cpp-httplib/issues/1669
2023-12-10 17:47:52 +00:00
Finn Voorhees 885b5563d0
metal : fix `ggml_metal_log` vargs (#1606) 2023-12-08 13:50:50 +02:00
Georgi Gerganov 9521ba6801
whisper.objc : disable timestamps for real-time transcription 2023-12-08 13:43:37 +02:00
Georgi Gerganov 29511d33c7
whisper : more debug messages + fix fallback logic 2023-12-08 13:43:12 +02:00
Georgi Gerganov 7bc4d22337
metal : fix soft_max kernel src1 argument (#1602) 2023-12-08 13:39:32 +02:00
Georgi Gerganov afce6fa113
sync : ggml (new ops, new backend, etc) (#1602)
* sync : ggml (new ops, new backend, etc)

* whisper : remove obsolete broadcasting code

* ggml : remove backend self-registers + fix ggml_concat + n_task logic

* metal : fix assert

* metal : print resource path

* whisper : fix bug if metal init fails
2023-12-07 22:27:19 +02:00
Oleg Sidorov 3163090d89
server : pass max-len argument to the server (#1574)
This commit fixes the missing parameter binding for max-len between the input
arguments and wparams.
2023-12-05 23:01:45 +02:00
Finn Voorhees f0efd0202d
ios : Remove `#if arch(arm)` check for using Metal (#1561) 2023-12-05 01:14:26 +00:00
Digipom 3c28d1a571
ggml : Fix 32-bit compiler warning (#1575)
Warning about %lu on 32-bit targets. Updated to %zu.
2023-12-03 14:15:28 +00:00
Georgi Gerganov e369243ebd
ggml : re-enable blas for src0 != F32 (#1583) 2023-12-01 23:57:52 +02:00
Aleksander Andrzejewski a0ec3fac54
Server : Add support for .vtt format to Whisper server (#1578)
- The code comes from examples/main
- The output mimetype is set to text/vtt

Example usage:
```shell
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@samples/jfk.wav" \
-F temperature="0.2" \
-F response-format="vtt"
```
2023-11-30 23:44:26 +00:00
Oleg Sidorov 6559b538e5
server : backport .srt output format (#1565)
This commit adds a support of .srt format to Whisper server. The code is
effectively backported from examples/main. The output mimetype is set to
application/x-subrip as per https://en.wikipedia.org/wiki/SubRip.

Example usage:

  curl 127.0.0.1:8080/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@<file-path>" \
    -F temperature="0.2" \
    -F response-format="srt"
2023-11-28 15:42:58 +02:00
Gregor Jasny 73d5005880
cmake : install required ggml.h header (#1568) 2023-11-28 15:41:49 +02:00
Kasumi 6b094b6dfe
server : set default CORS headers to allow all (#1567) 2023-11-28 11:55:20 +02:00
Hang 641f2f4282
readme : update help (#1560) 2023-11-27 12:04:08 +02:00
bobqianic bfacd9f8ce
CI : Add CUDA 11.8.0 support (#1554)
* try to fix cublas build in CI

* add multiple cuda-toolkit version

* Update build.yml

* Disable CUDA-toolkit 10.2.89
2023-11-27 12:03:16 +02:00
bobqianic f52e74d4dc
CI : Rectify the Clang-Related workflow issues (#1551)
* fix bugs in workflow

* fix missing clang in workflow

* Update build.yml
2023-11-27 11:35:37 +02:00