this shouldn't do anything (+2 squashed commit)
Squashed commit:
[1cdcb302a] another attempt to tip the scales, part 2
[8f647b709] another attempt to tip the scales (volta)
* examples/finetune -opt SGD (stochastic gradient descent) memory opt
add unit tested GGML_OPT_OPTIMIZER_SGD to ggml - avoids allocating
m, v tensors.
support finetune.cpp arg -opt SGD (or sgd). (default adamw as before)
llama 3.2-1b-F32 result: observed 11gb gpu ram (41 sec/epoch)
when using SGD instead of 19gb (55 sec/epoch) using adamw.
(wikipedia 100 lines finetune)
(
using the same GPU memory, adamw can only do before OOM 512
batch/context, reaching:
train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00
val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00
SGD is superior, though it converges slower, with max before OOM 1728
batch/context (esp see the better validation perf):
train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00
val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00
)
note: when finetuning long enough (or w/ enough -lr),
validation accuracy *eventually* drops ('catastrophic forgetting')
-lr-half (halflife) option useful for SGD to avoid oscillation or
super slow underdamped learning (makes setting -lr more forgiving).
terminal -lr for now is set by lr-halvings i.e. if you want at most
1/8 the inital -lr you set -lr-halvings 3.
note: objective loss not directly comparable between adamw, sgd? -
check perplexity or accuracy or consider relative improvements
for convergence
new finetune args -wd 1e-9 to enable weight decay in sgd or adamw,
and max -epochs N (default 2 as before)
cache (1 - wd*alpha) in 'adamw' opt struct -
no noticeable perf benefit, disabled (still done
for new SGD though)
since opt. memory is pre-allocated, the ggml_opt_get_optimizer_params
would probably be able to change between SGD and AdamW with each epoch
but would need to use adamw for the first (unconfirmed - no cmdline arg
to set such a policy yet)
test-opt checks adamw as before and now sgd (except for a few disabled
tests for sgd only; probably just needs logging values and adding
alternate reference values); tolerance on the 'regression'
test is broader for sgd (so we don't need many more epochs)
* Vulkan: Implement GGML_OP_OPT_STEP_SGD
* tests: Fix OPT_STEP_SGD test-backend-ops
* SGD op param store weight-decay and not 1-alpha*wd
* minor + cosmetic changes
* fix vulkan sgd
* try CI fix
---------
Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually
arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC
i should be relaxing its the weekend
this should finally work (+21 squashed commit)
Squashed commit:
[5edac5b59] Revert "quick dbg"
This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.
[fd62a997c] quick dbg
[bcccae7e6] sanity check 2
[568e2eb08] sanity check
[2f30d573a] please work 2
[cf8765221] please work
[c535e60d9] try a small trick
[d4ba79b80] 2022 test
[3f146b000] t2
[4a3b9a9b4] revert and test
[4bdc9a149] reverted test2
[5081cb4a3] reverted test
[ea9a826f3] broken test
[3c11ae389] compare 2019
[8ecec4fec] not for cu12
[0be964f3a] added vs2019 for the other runners
[5d24641cb] debugging 4
[1dee79207] debugging 3
[ab172f133] more debugging 2
[b1a895e84] more debugging
[5d21d8bd0] vs2019 setup
Squashed commit:
[e050f83db] Revert "test cu11 build on win2022"
This reverts commit 1bf989f2b3789c99aa9883cfe70550de6c26db23.
[1bf989f2b] test cu11 build on win2022
[5dc94eae8] updated lite
Squashed commits:
[3c5112398] 117 (+10 squashed commit)
Squashed commit:
[4f01bb2d4] 117 graphs 80v
[7549034ea] 117 graphs
[dabf9cb99] checking if cuda 11.5.2 works
[ba7ccdb7a] another try cu11.7 only
[752cf2ae5] increase aria2c download log rate
[dc4f198fd] test send turing to wmma flash attention
[496a22e83] temp build test cu11.7.0
[ca759c424] temp build test cu11.7
[c46ada17c] test build: enable virtual80 for oldcpu
[3ccfd939a] test build: with cuda graphs for all
* cmake: Factor out compiler flag function from ggml
llama.cpps's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).
* cmake: Enable building against system ggml
This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
* initial porting of previous LLG patch
* update for new APIs
* build: integrate llguidance as an external project
* use '%llguidance' as marker to enable llg lark syntax
* add some docs
* clarify docs
* code style fixes
* remove llguidance.h from .gitignore
* fix tests when llg is enabled
* pass vocab not model to llama_sampler_init_llg()
* copy test-grammar-integration.cpp to test-llguidance.cpp
* clang fmt
* fix ref-count bug
* build and run test
* gbnf -> lark syntax
* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
* rename llguidance test file to test-grammar-llguidance.cpp
* add gh action for llg test
* align tests with LLG grammar syntax and JSON Schema spec
* llama_tokenizer() in fact requires valid utf8
* update llg
* format file
* add $LLGUIDANCE_LOG_LEVEL support
* fix whitespace
* fix warning
* include <cmath> for INFINITY
* add final newline
* fail llama_sampler_init_llg() at runtime
* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
* simplify #includes
* improve doc string for LLAMA_LLGUIDANCE
* typo in merge
* bump llguidance to 0.6.12