koboldcpp/.gitignore
Reese Levine 7ca5991d2b
ggml webgpu: add support for emscripten builds (#17184)
* Faster tensors (#8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings
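
  A minimal sketch of that replacement pattern, with hypothetical names (the real replacement table lives in the WebGPU backend sources):

```cpp
#include <map>
#include <string>

// Hedged sketch: substitute placeholder tokens in a WGSL shader template.
// The function and placeholder names are illustrative, not the backend's.
static std::string apply_shader_replacements(std::string src, const std::map<std::string, std::string> & repls) {
    for (const auto & [key, val] : repls) {
        for (size_t pos = src.find(key); pos != std::string::npos; pos = src.find(key, pos + val.size())) {
            src.replace(pos, key.size(), val);
        }
    }
    return src;
}
```

  Adding a new placeholder is then one map entry (e.g. {"{{TILE_SIZE}}", "16"}) instead of another string pair threaded through every call site.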

* Wasm (#9)

* webgpu : fix build on emscripten

* More debugging stuff

* test-backend-ops: force single thread on wasm

* fix single-thread case for init_tensor_uniform

* Use JSPI

* Add pthread

* test: remember to set n_thread for cpu backend

* Add buffer label and enable dawn-specific toggles to turn off some checks

* Intermediate state

* Fast working f16/f32 vec4

* Working float fast mul mat

* Clean up naming of mul_mat to match logical model, start work on q mul_mat

* Setup for subgroup matrix mat mul

* Basic working subgroup matrix

* Working subgroup matrix tiling

* Handle weirder sg matrix sizes (but still a multiple of the sg matrix size)
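
  The size handling above boils down to rounding the dispatch grid up to whole tiles; a hedged sketch of that arithmetic (tile sizes and names are made up for illustration, the real constants come from the device's reported subgroup matrix configurations):

```cpp
#include <cstdint>

// Hedged sketch: workgroup grid for a tiled mul_mat dispatch.
static inline uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

constexpr uint32_t SG_MAT_M = 8, SG_MAT_N = 8; // hypothetical 8x8 subgroup matrix
constexpr uint32_t TILE_M   = 4 * SG_MAT_M;    // rows of dst produced per workgroup
constexpr uint32_t TILE_N   = 4 * SG_MAT_N;    // cols of dst produced per workgroup

void dispatch_dims(uint32_t m, uint32_t n, uint32_t & wg_x, uint32_t & wg_y) {
    // m and n may be any multiple of the sg matrix size; partial workgroup
    // tiles are covered by rounding up and bounds-checking stores in the shader.
    wg_x = ceil_div(n, TILE_N);
    wg_y = ceil_div(m, TILE_M);
}
```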

* Working start on gemv

* Working f16 accumulation with shared memory staging

* Print out available subgroup matrix configurations

* Vectorize dst stores for sg matrix shader

* Working scalar gemv

* Minor set_rows optimization (#4)

* Updated optimization, fixed errors

* Non-vectorized version now dispatches one thread per element
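
  "One thread per element" here just means sizing the dispatch by element count; a hedged sketch with hypothetical names (WG_SIZE is illustrative, not the shader's actual workgroup size):

```cpp
#include <cstdint>

// Hedged sketch: each invocation handles one element of the rows being set.
constexpr uint32_t WG_SIZE = 256;

uint32_t set_rows_workgroups(uint64_t n_rows, uint64_t row_elems) {
    const uint64_t n_threads = n_rows * row_elems; // one thread per element
    return (uint32_t) ((n_threads + WG_SIZE - 1) / WG_SIZE);
}
```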

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Comment on dawn toggles

* Working subgroup matrix code for (semi)generic sizes

* Remove some comments

* Cleanup code

* Update dawn version and move to portable subgroup size

* Try to fix new dawn release

* Update subgroup size comment

* Only check for subgroup matrix configs if they are supported

* Add toggles for subgroup matrix/f16 support on nvidia+vulkan

* Make row/col naming consistent

* Refactor shared memory loading

* Move sg matrix stores to correct file

* Working q4_0

* Formatting

* Work with emscripten builds

* Fix test-backend-ops emscripten for f16/quantized types

* Use emscripten memory64 to support get_memory
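
  memory64 matters here because the device-memory query reports sizes through size_t pointers: on a wasm32 target size_t is 32 bits, so anything above 4 GiB cannot be represented. A hedged sketch of the idea (the function is illustrative, not the backend's actual hook):

```cpp
#include <cstddef>
#include <cstdint>

// Hedged sketch: with -sMEMORY64, size_t is 64-bit, so large totals survive.
static void example_get_memory(size_t * free, size_t * total) {
    const uint64_t total_bytes = 8ull * 1024 * 1024 * 1024; // e.g. an 8 GiB limit
    *total = (size_t) total_bytes; // truncates on wasm32, exact under memory64
    *free  = *total;               // placeholder: real code subtracts allocations
}
```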

* Add build flags and try ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* Remove extra whitespace

* Move wasm single-thread logic out of test-backend-ops for cpu backend

* Disable multiple threads for emscripten single-thread builds in ggml_graph_plan
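
  The single-thread logic reduces to a compile-time clamp. A hedged sketch: __EMSCRIPTEN__ and __EMSCRIPTEN_PTHREADS__ are Emscripten's real predefined macros, but the surrounding function is hypothetical, not the actual ggml_graph_plan code:

```cpp
#include <thread>

// Hedged sketch: clamp the planned thread count for wasm builds.
// Without -pthread there are no worker threads, so the plan must stay at 1.
static int plan_n_threads(int requested) {
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
    (void) requested;
    return 1;
#else
    return requested > 0 ? requested : (int) std::thread::hardware_concurrency();
#endif
}
```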

* Fix .gitignore

* Add memory64 option and remove unneeded macros for setting threads to 1

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-12-03 10:25:34 +01:00

*.o
*.a
*.bin
*.tmp.yaml
*.Identifier
.DS_Store
.build/
.cache/
.ccls-cache/
.direnv/
.envrc
.swiftpm
.venv
.clang-tidy
.vs/
.vscode/
ggml-metal-embed.metal
ggml/src/ggml-vulkan-shaders.cpp
ggml/src/ggml-vulkan-shaders.hpp
vulkan-shaders-gen.exe
vulkan-shaders-gen
ggml/src/ggml-vulkan-shaders-noext.cpp
ggml/src/ggml-vulkan-shaders-noext.hpp
vulkan-shaders-gen-noext.exe
vulkan-shaders-gen-noext
vulkan-spv-tmp/*
vulkan-spv-noext-tmp/*
lcov-report/
gcovr-report/
build*/
out/
tmp/
autogen-*.md
models/*
models-mnt
/Pipfile
/baby-llama
/beam-search
/benchmark-matmult
/convert-llama2c-to-ggml
/embd-input-test
/embedding
/eval-callback
/gguf
/gguf-llama-simple
/gritlm
/imatrix
/infill
/libllama.so
/llama-bench
/lookahead
/lookup
/main
/metal
/passkey
/perplexity
/q8dot
/quantize
/quantize-stats
/result
/save-load-state
/server
/simple
/batched
/batched-bench
/export-lora
/finetune
/speculative
/parallel
/train-text-from-scratch
/tokenize
/vdot
/common/build-info.cpp
arm_neon.h
compile_commands.json
CMakeSettings.json
__pycache__
dist
*.spec
zig-out/
zig-cache/
ppl-*.txt
qnt-*.txt
perf-*.txt
poetry.lock
poetry.toml
ggml-metal-merged.metal
# Test binaries
/tests/test-llama-grammar
tests/test-double-float
tests/test-grad0
tests/test-opt
tests/test-quantize-fns
tests/test-quantize-perf
tests/test-sampling
tests/test-tokenizer-0
tests/test-tokenizer-0-llama
tests/test-tokenizer-0-falcon
tests/test-tokenizer-1-llama
tests/test-tokenizer-1-bpe
/tests/test-rope
/tests/test-backend-ops
/koboldcpp_default.so
/koboldcpp_failsafe.so
/koboldcpp_noavx2.so
/koboldcpp_clblast.so
/koboldcpp_clblast_noavx2.so
/koboldcpp_clblast_failsafe.so
/koboldcpp_cublas.so
/koboldcpp_vulkan.so
/koboldcpp_vulkan_noavx2.so
/koboldcpp_default.dll
/koboldcpp_failsafe.dll
/koboldcpp_noavx2.dll
/koboldcpp_clblast.dll
/koboldcpp_clblast_noavx2.dll
/koboldcpp_vulkan_noavx2.dll
/koboldcpp_clblast_failsafe.dll
/koboldcpp_cublas.dll
/koboldcpp_vulkan.dll
/cublas64_11.dll
/cublasLt64_11.dll
/cublas64_12.dll
/cublasLt64_12.dll
/rocblas/
rocblas.dll
hipblas.dll
koboldcpp_hipblas.so
koboldcpp_hipblas.dll
.tokenizer_configs
bin/
conda/
# Jetbrains idea folder
.idea/