Concedo
d7fed4732f
fix for typical sampler
2023-09-01 15:24:00 +08:00
Concedo
fe4a233d79
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# llama.cpp
2023-09-01 00:47:06 +08:00
DannyDaemonic
e8422de39e
@vxiiduu's fix for PrefetchVirtualMemory ( #2930 )
...
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
2023-08-31 04:21:45 -07:00
Concedo
e2fd30b5d1
reverted the failsafe removal, since they dropped support for dll check
2023-08-31 15:39:32 +08:00
Johannes Gäßler
8afe228000
CUDA: mul_mat_q=true llama_context_params default ( #2912 )
2023-08-30 21:46:19 +02:00
Concedo
f2c02dd06d
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# tests/test-grad0.cpp
2023-08-30 10:51:28 +08:00
Kawrakow
e37e69dcc3
10X faster BPE tokenizer ( #2876 )
...
* 10X faster BPE tokenizer
* Remove comment that no longer applies
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 23:55:03 +03:00
xaedes
44c117f41e
train : mem usage and other improvements ( #2439 )
...
* fix track_max_mem in forward_batch_wo_cache_flash_attn_train
* remove unnecessary Adam(W) optimizer tensors.
reduces optimizer memory overhead from 7*modelsize to 2*modelsize.
additionally allows optimizing models with more than 2^31 parameters by replacing int with int64_t.
bumps the training checkpoint file version, but old checkpoints can still be read.
the new version with fewer tensors is saved.
* add gradient clipping to AdamW
* Fix reset of unused g->nodes and g->grads to NULL
* implement gradient checkpointing for training
reduces memory overhead from O(n_layer) to O(sqrt(n_layer))
as explained in readme of https://github.com/cybertronai/gradient-checkpointing
* remove unused compute buffer 3
* add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);
* change AdamW decay parameter to work like the torch AdamW decay parameter
It is now relative to Adam learning rate `alpha*sched`.
Before that it was relative to `sched` only.
`alpha` being the maximum learning rate and `sched` being a scaling parameter in [0..1]
* change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT
* change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW
btw: the default weight decay parameter for torch.optim.AdamW is 0.01
* bug fixes for cross entropy loss
ggml_cross_entropy_loss: sums were not correctly added in the workload of each thread
ggml_cross_entropy_loss_back: simplify backward process, reducing numerical issues
guard usage of exp f16 lookup in cross entropy by #define GGML_CROSS_ENTROPY_EXP_FP16
cross entropy loss is only used once during training, but it is quite sensitive to numerical errors introduced by exp-f16-lookup.
so exp-f16-lookup for cross entropy loss is disabled by default, trading better gradients for very slightly worse runtime performance.
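A minimal C sketch of the guard pattern described above, using the GGML_CROSS_ENTROPY_EXP_FP16 flag named in the commit; exp_f16_lookup() is a hypothetical stand-in for the f16 exp table, not the actual ggml code:
```
#include <math.h>

/* sketch only: exp_f16_lookup() is a hypothetical helper standing in for ggml's f16 exp table */
static float cross_entropy_exp(float x) {
#ifdef GGML_CROSS_ENTROPY_EXP_FP16
    return exp_f16_lookup(x);   /* fast path: table-based f16 exp, slightly noisier gradients */
#else
    return expf(x);             /* default: full-precision exp for better gradients */
#endif
}
```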
* fix test-grad0 for cross_entropy_loss
the second argument to cross_entropy_loss must sum up to 1 for each row
* fix test-grad0 for soft_max
don't use only sum as aggregation, because the sum of softmax is always 1 -> finite differences would not work
instead use sum(log(soft_max()*(1-eps)+eps)); use eps to avoid log(0)
* improve finite differences of test-grad0 by using double instead of float
* change cross_entropy_loss to output average over all rows
this helps keep the loss and gradients in a sane range
* improve gradient checkpointing
sqrt(n_layers) is only the best checkpoint step when mem size of checkpoints and mem size of layers are equal.
since layers require more memory than the single-tensor checkpoints we use, the optimal value is computed differently:
```
given: n, u, v
objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
b=n/a
minimize(a*u+v*n/a)
diff(a*u+v*n/a, a) = u - (v*n/a)/a
diff(a*u+v*n/a, a) == 0
u - (v*n/a)/a == 0
u == v*n/(a*a)
u*a*a = v*n
a*a = v*n/u
a = sqrt(n*v/u)
```
this change results in more checkpoints, requiring fewer layers to be stored between checkpoints, overall improving memory usage.
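A minimal C sketch of this computation; the names n_layers, mem_per_checkpoint and mem_per_layer are illustrative assumptions, not the commit's identifiers:
```
#include <math.h>

/* pick the number of checkpoints a that minimizes a*u + (n/a)*v,
   where n = layers, u = memory per checkpoint, v = memory per layer: a = sqrt(n*v/u) */
static int optimal_n_checkpoints(int n_layers, double mem_per_checkpoint, double mem_per_layer) {
    double a = sqrt((double) n_layers * mem_per_layer / mem_per_checkpoint);
    return a < 1.0 ? 1 : (int) ceil(a);
}
```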
* disable gradient checkpointing debug output
* llama : fix rope usage in train-text-from-scratch after ChatGLM change
* add more training parameters:
--enable-restart N Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N Adam minimum learning rate alpha, usually 0.1 * alpha
* replace memcpy with reshape operation so that the graph is not cut at the input
this makes it possible to store other values into the input tensor and then simply recompute the graph without rebuilding it
* remove unused function argument from get_example_targets_batch
* measure and print total training time
* add optimization callback to ggml_opt_resume_g
this callback is called before each iteration with custom data and pointer to learning schedule parameter (only used in Adam(W)).
can be used for dynamic learning schedule and setting input data for batches before each iteration
* use optimization callback in training
allows dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters
reduces runtime by avoiding restart of optimization function and improves training convergence by providing a different batch for each iteration
* add minimum number of tensor dimensions to apply weight decay (default 2)
this allows weight decay to be skipped for bias parameters
* rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup
* fix increase of model.train_samples and model.train_tokens
now that each optimizer iteration gets its own batch we need to multiply by number of opt iterations
* change sampling parameters for prediction after training to defaults of common.h
and clarify what is context for prediction and what are generated tokens
* tighten abs error bounds for cross_entropy_loss in test-grad0
* add conditional compilation of using F16 exp in flash attention
uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable usage of f16 exp in flash attention
* tighten abs error bounds for flash_attn in test-grad0
* tighten abs error bounds for sqrt in test-grad0
* remove out-commented vectorized code of opt_adam
the vectorized code might be a bit faster for a low number of parameters, but it had a big memory usage overhead
* ggml : update ggml_rms_norm_back with configurable eps
* llama training : fix ggml_rms_norm_back calls to pass configurable eps
* remove trailing whitespace
* add train function using automatic gradient checkpointing backward pass and allocator
* in train function replace add_inplace by regular add
because using add_inplace seems to result in different gradients
* don't use allocate hash_map on context
because the context has no_alloc=True when using memory allocator resulting in NULL data pointers
* correctly clone reshape and permute operations by also cloning tensor->nb values
* fix variable name and add missing type cast
* terminate recursive tensor cloning when reaching tensor without src tensors
* correctly clone view tensors by setting data pointers
without this the checkpointing would only work when being used together with memory allocator
* fix variable names
* swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`
* add input tensors as checkpoints
so that recursive tensor cloning of gradient checkpointing terminates on input tensors
* fix variable name and add missing boolean negation
* make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
output and parameter gradient tensors need to be available at the end of the graph execution
parameter gradient tensors also need to be available before the graph execution because they are set to zero before each optimizer iteration
checkpoint tensors are allocated all together to reduce memory allocator fragmentation
afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs
* fix ASSERT to work with zero layers
* add training options whether to use allocator and/or unified training function
* integrate unified training function which may use memory allocator
the unified training function also supports arguments whether to use flash attention and/or gradient checkpointing
* format name of cloned tensors with " (clone)" suffix
* set names for tensors in unified train function for easier debugging
* allocate graph on context using ggml_new_graph
* remove handwritten training functions
* remove unused training parameters "use_scratch" and "use_unified"
* remove trailing whitespace
* remove unused train params: mem_compute1_gb & mem_compute2_gb
mem_compute_gb is used for compute when automatic memory allocator is not enabled, otherwise it can be very small to only hold the tensor definitions
mem_compute0_gb is used for automatic memory allocator (as long as measurement of max required size is not implemented)
* remove unused forward_batch function
* add debug asserts in ggml_allocr_alloc to some common pitfalls when using this function directly
* only use ggml_allocr_alloc when tensor has NULL data and is no view
* fix test when to create temporary backward graph
temporary backward graph is only necessary when using checkpointing
* fix memory "leak" in optimizers
each iteration a new cplan with new memory for work data was allocated.
now cplan creation only happens at the start of optimization, with each iteration reusing the cplan and its work data.
* reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
with this loop order, gradient checkpointing with the allocator saves 13% memory on a 16-layer model and 2% memory on a 2-layer model.
the computation results are the same
* add missing lctx argument to get_example_targets_batch
* implement llama model file saving using gguf
checkpoint loading and saving disabled, to be replaced by loading and saving via gguf
* implement loading/saving of checkpointing files using GGUF
* bug fixes
* add checkpoint file version for future compatibility
* update readme with gguf filenames
* save & load opt->just_initialized value
* add first draft for checkpoint conversion script
* add gguf arch and ftype
* save opt parameter counter as uint64
* add gguf key and tensor names for optimizer and training
* add layer_norm_rms_eps to checkpoint convert script
* use same GGUF_GET_KEY macro as in llama.cpp
* use norm_rms_eps, and rope parameters and command line options to set them
* fix memory corruption bug in gguf
ctx->kv and ctx->infos were reallocated using a non-aligned realloc, but freed with an aligned free.
to fix this a GGML_ALIGNED_REALLOC was added, but there is no posix_memalign_realloc function.
so on non-windows and non-mingw32 platforms we fall back to an aligned malloc, followed by copying
and freeing the old data.
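A minimal C sketch of the fallback just described (function and parameter names are illustrative, not the commit's code): allocate a new aligned block, copy the old contents, then free the old block:
```
#include <stdlib.h>
#include <string.h>

/* sketch: aligned "realloc" for POSIX platforms that lack an aligned realloc */
static void * aligned_realloc_fallback(void * old_ptr, size_t old_size, size_t new_size, size_t alignment) {
    void * new_ptr = NULL;
    if (posix_memalign(&new_ptr, alignment, new_size) != 0) {
        return NULL;
    }
    if (old_ptr != NULL) {
        memcpy(new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
        free(old_ptr);
    }
    return new_ptr;
}
```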
* add gguf example cmake file
* bug fixes in tokenize_file
* bug fixes in load_llama_model_gguf
* bug fix: init model when no checkpoint was loaded
* bug fix in read_tensor_by_name
* bug fix in load_opt_context_gguf
avoid printing lots of spaces in the unusual case that the loss becomes NaN
* set name of tensors with empty name from what was read from gguf
* remove trailing whitespace
* print data checksums before saving and after loading to verify correctness
* bug fixes for convert-train-checkpoint-to-gguf
* temporarily add code to write old checkpoint files
used to verify that old checkpoint files are correctly converted to gguf
* bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0
* remove code used to verify correctness of checkpoint file conversion
* remove trailing whitespace
* remove prediction related code
use main for prediction, it is better optimized
* update train-text-from-scratch README.md
* fix non-windows GGML_ALIGNED_REALLOC
* add missing blank line at end of file
* remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gguf ctx->kv & ctx->infos
* train : fix compile warnings
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-28 22:51:47 +03:00
Johannes Gäßler
6b73ef1201
YAML result logging + preset script ( #2657 )
2023-08-28 17:59:39 +02:00
grahameth
be475f60af
llama.cpp : fix wrong vsnprintf call in MS compiler ( #2856 )
...
Co-authored-by: grahameth <->
2023-08-28 18:38:12 +03:00
YellowRoseCx
cf5d918073
Koboldcpp-ROCm Port ( #399 )
...
* koboldcpp-ROCm Port
commit 3416c986d9d9a31c3cdefd7e7bd4d9438d72ba35
Merge: 5eb17f0 4c4e435
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Aug 25 13:46:56 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 5eb17f02c8638e003bb91bddf95ccf54d2ad0c12
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Aug 25 13:38:21 2023 -0500
ROCm Port update
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5 )
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP
---------
Co-Authored-By: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-Authored-By: ardfork <134447697+ardfork@users.noreply.github.com>
Co-Authored-By: funnbot <22226942+funnbot@users.noreply.github.com>
Co-Authored-By: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-Authored-By: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-Authored-By: jammm <2500920+jammm@users.noreply.github.com>
Co-Authored-By: jdecourval <7315817+jdecourval@users.noreply.github.com>
commit b34f4bd2724733e188ec4f6074042f66a5ed28c9
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Aug 19 17:12:52 2023 -0500
Update README.md
commit 7d1196108ad330b32845546fb3472c2172a0b6b8
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Aug 14 23:03:12 2023 -0500
remove force DMMV
commit cd61aa0d9e16627935c7978adf488a679ddfa745
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Aug 12 17:24:31 2023 -0500
restore main_gpu parameter
commit 4a042f326830271a4c31104051b7b08e08ac234e
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Aug 12 10:51:46 2023 +0300
gfx1100 support
---------
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
commit 8913bc6fea97d3cb860937b0461f455c6abe3ea1
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Aug 11 10:16:02 2023 +0300
Allow overriding CC_TURING
commit e77a4c37a756c002e97173f4122e088fb304e18a
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Aug 11 10:00:07 2023 +0300
Merge 'origin/master' into hipblas
commit cc4c4e355cd553b1557d5fba2562e824db93f9b4
Author: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Date: Fri Aug 11 09:43:14 2023 +0300
New __dp4a assembly
Now compatible with gfx900 and faster as well.
commit 1a03b709848ce68d5bf5966237756167e2cac540
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Aug 11 09:30:28 2023 +0300
Undo mess
---------
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
commit 4366ff9ba1b1f12e494118ef9b5198479022fcc5
Author: DannyDaemonic <DannyDaemonic@gmail.com>
Date: Thu Aug 10 13:11:36 2023 -0700
Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.
commit 811ff855a24323cafddc95c1b8aca711fef05f76
Author: Christian Demsar <crasm@git.vczf.us>
Date: Thu Aug 10 10:28:27 2023 -0400
Add --n-predict -2 for stopping generation on full context (#2565 )
commit 37c9717aaa6815b6a5be21aaab970212f20fe6bf
Author: Martin Krasser <krasserm@googlemail.com>
Date: Thu Aug 10 12:16:38 2023 +0200
Fix grammar-based sampling issue in server (#2566 )
commit d18ecd5b9e5dde58ae08a3eef1637406159ddaca
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Aug 10 13:19:41 2023 -0500
make mmq gen faster for amd
commit 243894a952147a4fac5b6aee748861a0df6cc2c6
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Aug 10 12:14:40 2023 +0300
ws fix
commit ac2f14da445ea87d73539adbd29d19ff2c9eba58
Author: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Date: Thu Aug 10 12:11:27 2023 +0300
AMD assembly optimized __dp4a
Doesn't seem to work for gfx900, so commented out.
commit 9dba0c985f140ddded8cbb671f139e81fff82eed
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Aug 10 12:09:28 2023 +0300
Fix merge
---------
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
commit f570b5cb1070591527a82d94bba408927b37778d
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 22:11:20 2023 -0500
Revert "revert cuda changes as they are bugggy"
This reverts commit 1541bf879772aeeed8ff646bfc52185c2a88b79b.
commit 1541bf879772aeeed8ff646bfc52185c2a88b79b
Author: Concedo <39025047+LostRuins@users.noreply.github.com>
Date: Wed Aug 9 22:36:41 2023 +0800
revert cuda changes as they are buggy
commit bacc20203efb1839aa313858a04d75255bb4b7f4
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 20:37:17 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit b7cb4cfd109986bd66e8fd382d1e2516eaddfebb
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 20:00:52 2023 -0500
additional fixes
commit fadae727baa3735ad3e0667384d6e05ca056b3ef
Merge: 518eb2a 8f8ab6c
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 18:45:50 2023 -0500
Merge branch 'hipblas' into develop4Main
commit 518eb2af9225f8300a108c4244c7eb0a2217c3bc
Merge: bda0215 cae6a84
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 18:32:10 2023 -0500
Merge remote-tracking branch 'upstream/concedo' into develop2Main
commit bda0215b413bafc49890aa23fc35f96a191fb3e0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 18:17:54 2023 -0500
update makefile to multisystem path
commit 8f8ab6c4c049df501e9a5ed8fef3aa0fc0691421
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 9 18:05:03 2023 -0500
hipLDFLAG Path change Unix to multisystem in Makefile
changed the hardcoded linux distro hipblas LD path from -L/opt/rocm/lib to use the defined ROCM_PATH variable to be flexible with ROCm on non-Linux OS
commit 610ba4cfc460ed65c4adc32d3365a216690384d5
Merge: 4024f91 25d43e0
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Aug 9 23:54:58 2023 +0300
Merge 'origin/master' into hipblas
commit 4024f91a665d83b6de8658d45ec9d004c5d90c79
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Aug 9 01:56:44 2023 +0300
Add intrinsics polyfills for AMD
---------
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
commit ab6212864ce8e9af200bcedb3e0126ee49aa8d0a
Merge: d91456a f5bfea0
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Aug 9 00:37:01 2023 +0300
Merge 'origin/master' into hipblas
commit ee9fa2aca4f2e6645b99702935b34a5f8ec8f05d
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Aug 2 01:53:58 2023 -0500
Update Makefile
commit d91456aaf138566fa0aa3d507964049c8a09499b
Author: ardfork <134447697+ardfork@users.noreply.github.com>
Date: Mon Jul 31 20:35:00 2023 +0300
fix half2 decomposition
commit c1cb70d64d307d3fd9b7b9f61bb574e36520499a
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Jul 31 19:56:44 2023 +0300
new build arg LLAMA_CUDA_MMQ_Y
commit c1664a00ae98059df863a88cbcb13eeca3025742
Merge: 4336231 0728c5a
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Jul 31 19:32:27 2023 +0300
Merge 'origin/master' into hipblas
commit 848558d7d95a5036ac057efdefa9b2a2e6fb61b7
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 30 20:02:52 2023 -0500
import vars logic fix
commit b650b849d52aac65364558521f76e75ded7ea590
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 30 00:21:36 2023 -0500
Update easy_KCPP-ROCm_install.sh
commit 8573a67a29e813d82e7f032912a8c221cd199505
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 21:31:12 2023 -0500
remove duplicate code and fix typo
remove duplicate tooltip
commit 430986e3f68f599fd7a11ea4b2b8e45ef33da643
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 21:07:34 2023 -0500
hide "missing" if all are built
move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "
commit dd0db7265dbc0b0699ca861291006808b662b0e4
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 20:52:31 2023 -0500
hide "missing" if all are built
move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
commit 43fffb66d8a30cbd776c3682f8a104c3644206b1
Merge: 0ed65a4 b40550c
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 19:13:15 2023 -0500
Merge branch 'concedo'
commit 0ed65a44a5fdb529611730f276a4b910cbf70ae0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 18:34:21 2023 -0500
Hide unavailable backends & Add tooltip over backend count
Hides unavailable backends from the user; if the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command
Add tooltip when hovering over backend count label
hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built
commit 2a263983ab35024a95c411995963182ada06ed6f
Merge: cee2e9d 31486eb
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 29 15:16:33 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 4336231a32a0c6168da5d79801752289622e9e58
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Jul 29 18:35:56 2023 +0300
add hipBLAS to README
---------
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
commit f8e3fc6c746b37d69656fb5ae6af8e411d85dbca
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Jul 29 14:16:46 2023 +0300
rocblas init stuff
commit d2ade639f4339e786311effb3eafca8bfc360d56
Merge: cde52d6 8a88e58
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Jul 29 12:59:48 2023 +0300
Merge 'origin/master' into hipblas
commit cee2e9d76740fd8e8f50b612078f3e7658460f29
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 26 23:36:55 2023 -0500
Only Show Available Backends in GUI
Hides unavailable backends from the user; if the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command
commit 78636109fc2ded79ee3e9a44d2e3c2d63a8de70e
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 26 13:27:22 2023 -0500
Update easy_KCPP-ROCm_install.sh
commit 731cd6e2ab9bb722e211142bb633e7018ccdb31b
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Tue Jul 25 22:39:50 2023 -0500
Create easy_rocm_install.sh
commit f154685bbdc79b5ace752fbc179e32f2f7806bdb
Merge: cbdc1f3 94e0a06
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Tue Jul 25 22:25:10 2023 -0500
Merge branch 'concedo_experimentalMAIN'
commit cbdc1f3fb91969e79bc8640e0cebfc3247e200df
Merge: 5b838d4 9731682
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 16:53:21 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit cde52d6a63f13f46d6403cc2957f4b4c34ddf4e2
Merge: 8e8054a 84e09a7
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Jul 24 12:22:58 2023 +0300
Merge 'origin/master' into hipblas
commit 8e8054ad83e794b261914ad4f337d43e2c76882d
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Jul 24 12:20:49 2023 +0300
Add rocblas to build files
commit 1f6294dc4473701b5be791d47e4b3733f95dbc0a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 03:52:01 2023 -0500
Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5 )
* initialize rocblas
commit 5b838d47874536ebffc2f6cb25877e0476a9402d
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 03:10:35 2023 -0500
amd multigpu full layer offload w/o vram scratch
commit 9bfb2fdd68000670bda85c4e9748d72f5af09764
Merge: b379f9d 66328fc
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 03:07:44 2023 -0500
Merge branch 'concedo_experimental'
commit b379f9d6fac570c220c928ff5f4ba4ed1ca7c051
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 03:07:00 2023 -0500
Revert "amd multigpu full layer offload w/o vram scratch"
This reverts commit 9adfc8e33f7116d6ae2e0992920733f783b70d08.
commit 9adfc8e33f7116d6ae2e0992920733f783b70d08
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 02:56:40 2023 -0500
amd multigpu full layer offload w/o vram scratch
commit 05c792e622a1d9838f9343e04f79ddf2bb63ae96
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 24 00:18:48 2023 -0500
initialize rocblas
commit ade68d09d7b63d3344e18b6193043b378671eb12
Merge: 521ad6b 56995ca
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 23 20:25:05 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 521ad6b5cb2a107ad7b972025aeb0f353e0cac67
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jul 20 21:42:33 2023 -0500
lazy import_var error handling for saves
commit 9553e52e7e4eabe46312729f6c4effeef6390df7
Merge: cac6650 f036109
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jul 20 19:59:41 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit cac6650754502208abfead61ba169fefc5ae84ac
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 17 23:05:02 2023 -0500
Makefile fix! Allows hip/clblast build together
commit 3db70b5f0a1a4a1207041ddc5f2c5e25306bad4d
Merge: 2ec4466 7568d1a
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jul 18 01:54:17 2023 +0300
Merge 'origin/master' into hipblas
commit f208670ffb6cdbb1e225adfb2fd80a67a6dc5055
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Jul 14 02:56:03 2023 -0500
improve error handling with gpu names
commit 860e73845f61fe0afb6a26cc8054d8be1f9e3669
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Jul 14 00:33:03 2023 -0500
Show GPU names in GUI, Only show GPUs that exist
changed the pre-set 1,2,3 and 1,2,3,all settings that the GPU selector had and replaced them with a function that grabs the GPU names and sets the names as the values for the selector boxes.
commit 2ec4466db54fd2f42f2ab7713cc1061e0cf59bf3
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Jul 13 13:44:02 2023 +0300
Update build flags.
GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y
so update your build instructions.
GGML_CUDA_FORCE_DMMV is always enabled.
---------
Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
commit cd36b185ff6de91abbfd1b80366dd79a1303a878
Merge: afcb8fe 1cbf561
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Jul 13 13:03:01 2023 +0300
Merge 'origin/master' into hipblas
commit ac7ebc3ac1deedfbc2940443b26774f1b4c85fae
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 18:32:18 2023 -0500
add hipBLAS name scheme to GUI and update README
commit 7f85cc5ac30f2f300ca817a489ef209c995c634b
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 17:35:54 2023 -0500
update makefile and ggml.c
commit 6ca3499275ba168320424f06ab3301ec329a6a83
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 15:43:45 2023 -0500
ggml.c fix
commit 770e674aa5b2a1a9ffff2888a12e27b04ccfc7ef
Merge: 2b289cd 5941514
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 15:24:36 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 2b289cde558310c6c67dfc8d508c04e634595716
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 14:30:00 2023 -0500
Update c-cpp.yml
commit 5dae95a9bb486c7f720789dffde1cfb470bffce0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 14:28:51 2023 -0500
Update c-cpp.yml
commit b37cd738c84debb53b149f5a9fb73de958f263fd
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 12 14:27:04 2023 -0500
Create c-cpp.yml to test Actions
commit afcb8fe0c4f5e918422ea41d08824653d58575ed
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jul 11 18:09:27 2023 +0300
Add new config option
commit 8c2c4978a32d671253809d8f0f09d98af2dd18ab
Merge: e610466 2347463
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jul 11 17:53:54 2023 +0300
Merge 'origin/master' into hipblas
commit e610466307abc8f8bae641682ab3f91dbc33930e
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jul 11 17:53:14 2023 +0300
Expand arch list and make it overrideable
commit 80e4e548bfbace2a966a58cb57dd1720ad7216b2
Merge: 7735c5a 1d16309
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Jul 10 02:09:28 2023 +0300
Merge 'origin/master' into hipblas
commit 8432e9d5dc8d080535243467f8d380271e8d9489
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 9 16:55:30 2023 -0500
Update Makefile
commit b58c1893fa839c0f35df96f6a8b026a7f2576762
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 9 16:20:00 2023 -0500
Add multi-gpu CuBLAS support to new GUI
commit 0c1c71b9927127b45030fe88283dfbdd23853d34
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 8 07:56:57 2023 -0500
Update Makefile
commit f864f60cd8e563e2594cee5a7da7e9aebed494f9
Author: Johannes Gäßler <johannesg@5d6.de>
Date: Sat Jul 8 00:25:15 2023 +0200
CUDA: add __restrict__ to mul mat vec kernels (#2140 )
commit 4539bc2761a7a23b588b5420b9d3fd1962ff63e5
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 8 01:36:14 2023 -0500
update makefile for changes
commit 912e31ec523eac9ef308f0d28bc2d93aab7c3ecb
Merge: 74e2703 ddaa4f2
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Jul 7 23:15:37 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 74e2703ac3b1557f107e540657d0919db115f913
Merge: cf65429 f9108ba
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jul 5 15:16:49 2023 -0500
Merge branch 'LostRuins:concedo' into main
commit 7735c5a9af58f6713b54fd5a4b6463f3b116d44d
Merge: c3e3733 7ee76e4
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jul 4 17:09:16 2023 +0300
Merge 'origin/master' into hipblas
commit cf65429c3832d32a8c17c7ed5ab47066d7511fbe
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 3 16:56:40 2023 -0500
print cuda or opencl based on what's used
commit 72c16d2310b2e4c44018e2084aeb79e68c0b8709
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 3 16:45:39 2023 -0500
Revert "fix my mistake that broke other arches"
This reverts commit 777aed5e69e240a54e7d3da962d8520855f072b9.
commit 777aed5e69e240a54e7d3da962d8520855f072b9
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jul 3 15:53:32 2023 -0500
fix my mistake that broke other arches
commit 27780a987a8dabb18689038c0397e16f2f219c7e
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 16:03:27 2023 -0500
rocm fixes
commit f52c7d439770c1ea0bebc1f895b74d6aeea5f0a6
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 16:02:58 2023 -0500
Revert "rocm fixes"
This reverts commit 2fe9927353a1e53353623f850d3d534da88f5154.
commit 2fe9927353a1e53353623f850d3d534da88f5154
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 15:58:21 2023 -0500
rocm fixes
commit efe7560c83a497f5e750bbe27922babd4233bda9
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 15:55:43 2023 -0500
Revert "move HIPBLAS definitions into ggml-cuda.h"
This reverts commit bf49a93d63f833b7871ba6e60f8fe207562678ee.
commit 4fc0181e44685019dcd309d4bb345cac7a5fef87
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 15:55:36 2023 -0500
Revert "move hipblas definitions to header files"
This reverts commit 2741ffb70464a71fd138484de4b41da05622e027.
commit 89eb576f2771bd81a3a6274348b47535dfdd5f63
Merge: 2741ffb 3d2907d
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jul 2 14:44:13 2023 -0500
Merge branch 'LostRuins:concedo' into main
commit c3e3733c61f7705ea00fd593ee94527da8c12f1b
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Jul 2 15:51:31 2023 +0300
ROCm fixes
commit 15db19ae7b70d2a6350063e633b898a89ad78cbc
Merge: 04419f1 46088f7
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Jul 2 15:39:57 2023 +0300
Merge 'origin/master' into hipblas
commit 2741ffb70464a71fd138484de4b41da05622e027
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 1 17:07:42 2023 -0500
move hipblas definitions to header files
commit bf49a93d63f833b7871ba6e60f8fe207562678ee
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 1 16:38:50 2023 -0500
move HIPBLAS definitions into ggml-cuda.h
commit 540f4e05f4e95378f46a83e2919d3962c0ef9eac
Merge: 2c3b46f eda663f
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jul 1 14:58:32 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 2c3b46f8a80ca9d94b2d3d06e1af6b6f7b791914
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 29 18:43:43 2023 -0500
changes to fix build
commit c9e1103da0d72fd39a36391ac4b5d941a133598a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 29 18:20:07 2023 -0500
Update ggml_v2-cuda-legacy.cu for ROCM
commit b858fc5db80ed545a6fbeae3d551bddb47955598
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 29 17:49:39 2023 -0500
changes to work with upstream
commit 69a0c2534bb8825f4009760b12d9bd44d108c6ed
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 29 16:59:06 2023 -0500
Merge remote-tracking branch 'upstream/concedo'
commit 04419f18947e7b0dc43c07869eac3965f22b34cf
Merge: bb16eff d3494bb
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Jun 28 23:30:10 2023 +0300
Merge 'origin/master' into hipblas
commit bb16effc750e2706050f5d4ec89cecc42cc13882
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 28 15:27:10 2023 -0500
headers fix; add kquants_iter for hipblas and add gfx803 (#1 )
* kquants_iter for hipblas and add gfx803
* Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
* remove dmmv_f16 for now
commit 096f0b055e11b7d930842f86146d0e5013c5dce6
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 28 15:27:02 2023 -0500
revert unnecessary hipblas conditionals
commit d81e81adffd6eb59e280ae1885864bb5fbd9bba6
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 28 14:48:23 2023 -0500
Update Makefile hipblas nvcc correction
commit c8ae94524a8bd7dca891b6b711cb5598a30fcf74
Merge: c1e5c83 0be54f7
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 27 10:50:37 2023 +0300
Merge 'origin/master' into hipblas
commit 2579ecf8db9569d7756161f05ce7b0f5f23174b0
Merge: abed427 d2034ce
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 25 17:50:04 2023 -0500
Merge branch 'LostRuins:concedo' into main
commit c1e5c8345eca45563d382d9417b84ed5f0ab77ff
Merge: 35a6031 447ccbe
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Jun 25 21:40:05 2023 +0300
Merge 'origin/master' into hipblas
commit 35a603161a17ddeb6128e9d4718b8fab5e34b558
Merge: df7346c 66a2555
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Jun 25 10:57:48 2023 +0300
Merge 'origin/master' into hipblas
commit abed427b6f370698fe8e8409e7980f238aad03ef
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jun 24 19:16:30 2023 -0500
reorganize If statements to include proper headers
commit 06c3bf03b92c2e00fc4bcd27f0c34f32c58b19a9
Merge: ea6d320 8342fe8
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sat Jun 24 16:57:20 2023 -0500
Merge branch 'LostRuins:concedo' into main
commit ea6d3208dcdc0b05e2c164dde8ee0bfc6a02ad09
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Fri Jun 23 01:53:28 2023 -0500
Update README.md
commit 4d56ad8158595d1e835cb379939dc5526deb39e2
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 22 16:19:43 2023 -0500
Update README.md
commit 21f930872b6e232679fe02eac9e429367365c6af
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 22 15:42:05 2023 -0500
kquants_iter for hipblas and add gfx803
commit df7346ccd52bc0368eeeb878e31a284e01eac61a
Merge: 5dd2fbe 7487137
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Jun 22 20:51:09 2023 +0300
Merge 'origin/master' into hipblas
commit b6ff89066bbf2de23dab90bc8bbf9f63d8d1e070
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Thu Jun 22 12:42:09 2023 -0500
Merge branch 'LostRuins:concedo' into main
commit eb094f043f9b0b94e7db028ca36e96ce479b0369
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 21 23:59:18 2023 -0500
lowvram parameter description
commit 3a5dfeb568d543376910180caa9a99b081fef9d4
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 21 16:53:03 2023 -0500
Merge branch 'LostRuins:concedo' into koboldcpp-rocm
commit 665cc1136b188e7ff5c1aa1359118c999ff6d162
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Wed Jun 21 01:13:19 2023 -0500
add lowvram parameter
commit 222cbbb141f7ce79884cafb6bcebd860ae27cc04
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Tue Jun 20 19:03:28 2023 -0500
add additional hipblas conditions for cublas
commit e1f958124ec99525cb58d8c534f9d1789377544e
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Tue Jun 20 16:51:59 2023 -0500
Add hip def for cuda v2
commit 3bff5c0f0defd9d49b770c5ce107c71e5cba8003
Merge: a7e74b3 266d47a
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Tue Jun 20 13:38:06 2023 -0500
Merge branch 'LostRuins:concedo' into koboldcpp-rocm
commit a7e74b39fe5eedf85d955fe5ea5f4c546322a9b0
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jun 19 22:04:18 2023 -0500
Update README.md
commit 5e99b3cb72d83f45b3f7904ffb8f242e743a142c
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jun 19 22:03:42 2023 -0500
Update Makefile
commit 9190b17432ebdc489ab05b71df6c3b8d5e7f5895
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Mon Jun 19 21:47:10 2023 -0500
Update README.md
commit 5dd2fbe6ea87f78e38d888844a3820302a297048
Merge: 67e229b 20568fe
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 20 01:23:12 2023 +0300
Merge 'origin/master' into hipblas
commit 2780ea292b1e9c6ead274de3afb34337716be08f
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 18 15:48:00 2023 -0500
Update Makefile
commit 04a3e64807a92c2e105af92f16dd6db2ea024d39
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 18 14:33:39 2023 -0500
remove extra line
commit cccbca9dea3780e797a3b4972ba211e0c762fdc1
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 18 14:31:17 2023 -0500
attempt adding ROCM hipblas
commit a44a1d4b90ed11d83d622eb976a945ff26a8974e
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 18 14:31:01 2023 -0500
attempt adding ROCM hipblas
commit b08818416972f83349bc4d6479bccc55ee31436d
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date: Sun Jun 18 14:30:54 2023 -0500
attempt adding ROCM hipblas
commit 67e229b7ca0a51f367c1e1495a15c261d0893d25
Merge: 6f7c156 b241649
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Jun 18 00:36:54 2023 +0300
Merge 'origin/master' into hipblas
commit 6f7c15637a8ed60d5d5dade24aaf63a296bc32a6
Merge: 61df8e9 fc45a81
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Jun 17 16:53:22 2023 +0300
Merge 'origin/master' into hipblas
commit 61df8e92179b84af9041e53f61d0194dfd791de0
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Jun 14 22:46:10 2023 +0300
add cudaMemset
commit a836529996845343dfb96becb4fd48e3f55da55c
Merge: 85f902d 254a7a7
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Jun 14 22:41:55 2023 +0300
Merge 'origin/master' into hipblas
commit 85f902d5c44cee18858812212dad850b9409c7f9
Merge: 4362e80 b50b570
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Jun 8 10:50:28 2023 +0300
Merge 'origin/master' into hipblas
commit 4362e805a4b0bd80d0cff0e3d8d0b1162cc8043c
Merge: fa5b3d7 17366df
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 6 23:14:40 2023 +0300
Merge 'origin/master' into hipblas
commit fa5b3d7365266a9903450c1105551ffec7f51d92
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 6 18:47:00 2023 +0300
fix makefile.
commit 1ba4ce4ad792f9672eecc37bf982386d3a007914
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 6 18:41:08 2023 +0300
Revert "warp size fixes"
It seems like 32 is faster for me, at least, and it won't cause so many conflicts.
This reverts commit 5d6eb72164e5ae000d07dd725e635faa7a2f723d.
commit 5d6eb72164e5ae000d07dd725e635faa7a2f723d
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 6 18:32:41 2023 +0300
warp size fixes
commit 33091a9bd3bb3ecf59b0f5535b084f443f6a20b6
Merge: 9fdaa1d 2d43387
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Jun 6 16:19:23 2023 +0300
Merge 'origin/master' into hipblas
commit 9fdaa1d2501a2c4a030af6d34e97b2e4766b27c4
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 27 19:17:53 2023 +0300
Add more defs
For forward compatibility #1607
commit a4648c1e7c70b4985393ec0851403ef7fb8d1ffc
Merge: 4c8b3fb 0ecb1bb
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 27 18:22:39 2023 +0300
Merge 'origin/master' into hipblas
commit 4c8b3fb1071dff0cd0c4b4f96e506294ba6473f4
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 26 01:08:53 2023 +0300
add configurable vars
commit 30d921af3e0b21f511652c98448ccb631434d0d4
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 26 01:03:56 2023 +0300
and makefile
commit a593a4f6c24389528a5eed8e6dc86eb06ced38b8
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 26 00:55:28 2023 +0300
Add missing parameters
commit 174bf6a86d045a30b1253cbe3cc773808b202186
Merge: f80ce7a 1fcdcc2
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 26 00:44:23 2023 +0300
Merge 'origin/master' into hipblas
commit f80ce7a4e00b33adf6b13d231689dbf3a33ec475
Merge: 600ace3 ac7876a
Author: Henri Vasserman <henv@hot.ee>
Date: Thu May 25 00:02:50 2023 +0300
Merge branch 'origin/master' into hipblas
commit 600ace39c8f1d311b8f3c49003f5a6448a44b18e
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 20 23:42:20 2023 +0300
update warp size
commit b19fefef943d974db2eda8a8908e67e1d08e317c
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 20 23:28:08 2023 +0300
Forwardcompat
commit c66115b833178ea3711543ddbbd4eb2b21ab523e
Merge: a0b2d5f b8ee340
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 20 18:29:31 2023 +0300
Merge 'origin/master' into hipblas
commit a0b2d5f291
Merge: 8bab456 2a5ee02
Author: Henri Vasserman <henv@hot.ee>
Date: Tue May 16 17:08:29 2023 +0300
Merge 'origin/master' into hipblas
commit 8bab45611e
Merge: 2956630 b5c9295
Author: Henri Vasserman <henv@hot.ee>
Date: Mon May 15 00:01:12 2023 +0300
Merge 'origin/master' into hipblas
commit 2956630a3d
Merge: 0fe6384 f048af0
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 13 13:12:52 2023 +0300
Merge 'origin/master' into hipblas
commit 0fe6384755
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 12 17:22:11 2023 +0300
fix makefile
commit 605560d9ec
Merge: 127f68e 089b1c9
Author: Henri Vasserman <henv@hot.ee>
Date: Fri May 12 16:12:53 2023 +0300
Merge 'origin/master' into hipblas
commit 127f68eb5a
Merge: 070cbcc b608b55
Author: Henri Vasserman <henv@hot.ee>
Date: Thu May 11 20:21:27 2023 +0300
Merge 'origin/master' into hipblas
commit 070cbcc1bd
Author: Henri Vasserman <henv@hot.ee>
Date: Sun May 7 18:10:56 2023 +0300
occupancy function
commit a3296d50aa
Merge: 0aefa6a e129551
Author: Henri Vasserman <henv@hot.ee>
Date: Sun May 7 18:06:04 2023 +0300
Merge 'origin/master' into hipblas
commit 0aefa6ab71
Merge: baeb482 1b0fd45
Author: Henri Vasserman <henv@hot.ee>
Date: Sun May 7 12:24:41 2023 +0300
Merge 'origin/master' into hipblas
commit baeb482a94
Author: Henri Vasserman <henv@hot.ee>
Date: Sun May 7 12:24:12 2023 +0300
Revert to default copy
commit 289073a532
Merge: 1107194 173d0e6
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 6 19:59:41 2023 +0300
Merge 'origin/master' into hipblas
commit 1107194e6b
Merge: 04c0d48 a3b85b2
Author: Henri Vasserman <henv@hot.ee>
Date: Sat May 6 00:38:20 2023 +0300
Merge 'origin/master' into hipblas
commit 04c0d480d7
Author: Henri Vasserman <henv@hot.ee>
Date: Thu May 4 12:31:16 2023 +0300
Move all HIP stuff to ggml-cuda.cu
commit d83cfbad0c
Merge: b67cc50 799fdc1
Author: Henri Vasserman <henv@hot.ee>
Date: Thu May 4 11:31:16 2023 +0300
Merge 'origin/master' into hipblas
commit b67cc50dad
Merge: fcbc262 e216aa0
Author: Henri Vasserman <henv@hot.ee>
Date: Wed May 3 15:04:51 2023 +0300
Merge 'origin/master' into hipblas
commit fcbc262eb9
Merge: c73def1 f4cef87
Author: Henri Vasserman <henv@hot.ee>
Date: Mon May 1 22:45:29 2023 +0300
Merge 'origin/master' into hipblas
commit c73def129a
Merge: d8ea75e f0d70f1
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Apr 30 18:40:42 2023 +0300
Merge 'origin/master' into hipblas
commit d8ea75e952
Merge: d194586 334637e
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Apr 29 11:25:51 2023 +0300
Merge 'origin/master' into hipblas
commit d194586f65
Merge: 2ab9d11 7f15c5c
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 28 23:03:52 2023 +0300
Merge 'origin/master' into hipblas
commit 2ab9d11f37
Merge: 3b4a531 04aaae1
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 28 16:30:05 2023 +0300
Merge 'origin/master' into hipblas
commit 3b4a53138f
Merge: a1caa48 0b2da20
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 28 10:08:41 2023 +0300
Merge 'origin/master' into hipblas
commit a1caa48611
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 28 10:08:21 2023 +0300
add more cuda defines
This is so 'slaren/cuda-f16f32' would merge.
commit ecc056519f
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 28 01:58:27 2023 +0300
only the .cu file needs to be compiled as device code
commit ef51e9ecac
Merge: d571d16 4afcc37
Author: Henri Vasserman <henv@hot.ee>
Date: Wed Apr 26 12:46:26 2023 +0300
Merge branch 'ggerganov:master' into hipblas
commit d571d1629f
Merge: 608aa33 dd0eabc
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Apr 25 21:15:33 2023 +0300
Merge 'origin/master' into hipblas
commit 608aa33d9f
Author: Henri Vasserman <henv@hot.ee>
Date: Tue Apr 25 21:15:04 2023 +0300
change default GPU arch to match CMake
commit 3a004b2a01
Author: Henri Vasserman <henv@hot.ee>
Date: Mon Apr 24 02:24:54 2023 +0300
add rpath
commit db7a01297e
Merge: 3677235 284685f
Author: Henri Vasserman <henv@hot.ee>
Date: Sun Apr 23 21:49:28 2023 +0300
Merge 'origin/master' into hipblas
commit 367723544c
Author: Henri Vasserman <henv@hot.ee>
Date: Sat Apr 22 23:28:00 2023 +0300
More build file changes
commit d3e1984ce0
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 21 03:32:06 2023 +0300
add rpath
commit 0e005f7793
Author: Henri Vasserman <henv@hot.ee>
Date: Fri Apr 21 02:13:00 2023 +0300
Build file changes
Now HIP Clang is not required; the CMake scripts will configure the
needed compiler, which can be the system clang++. Other code can
still use GCC, but CMake will force clang to do the linking.
commit 54a63c10e8
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Apr 20 22:19:22 2023 +0300
Update Makefile for the Cuda kernels
commit 0fd8363adc
Author: Henri Vasserman <henv@hot.ee>
Date: Thu Apr 20 02:04:00 2023 +0300
use hipblas based on cublas
* Merge Fixes
* readme merge fix
* remove old ggmlv2 changes
* bring ggml v2_cuda up to date with AMD changes
* Revert ggml v2_cuda changes because they weren't needed
This reverts commit 3385dd4240.
* avoid launching subprocesses to get device names for now, but other than that seems to be working
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-08-28 17:05:06 +08:00
Concedo
4b00916ac7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .dockerignore
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# flake.nix
# tests/CMakeLists.txt
2023-08-28 14:19:05 +08:00
Georgi Gerganov
c10704d01e
llama : fix MPI threads ( close #2827 )
2023-08-27 18:55:41 +03:00
Kawrakow
463173a6c0
llama : speedup tokenization ( #2831 )
...
* Speedup tokenization
On current master it takes ~3.2 seconds to tokenize
Wikitext. With this change it becomes ~525 ms.
* Fixit: it was missing the piece after the last found occurrence
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-27 16:50:33 +03:00
Georgi Gerganov
eaa13a48ff
falcon : fix CUDA inference by making K and Q contiguous ( #2830 )
...
* falcon : fix CUDA inference by making K and Q contiguous
ggml-ci
* cuda : add assert to guard from non-cont ropes
2023-08-27 16:40:48 +03:00
Kawrakow
a6d1189fdd
k_quants tuning for Falcon-7b ( #2816 )
...
* Make ggml-cuda.cu build with QK_K = 64
Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces
a meaningful result.
* k_quants tuning for Falcon-7b
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-27 15:19:59 +03:00
Georgi Gerganov
d0cee0d36d
gguf : add 64-bit support (GGUF v2) ( #2821 )
...
* gguf : bump version to 2
* gguf : add support for 64-bit (no backwards comp yet)
* gguf : v1 backwards comp
* gguf.py : bump GGUF version
* gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t
* gguf.py : string lengths uint32_t
* gguf : update all counts to 64-bit
* gguf.py : string len uint64_t and n_dims uint32_t
* gguf : fix typo
* llama.cpp : print gguf version
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
2023-08-27 14:19:54 +03:00
Georgi Gerganov
edd4c14817
llama : more tokenizer fixes ( #2810 )
...
* tests : write a Python tokenizer test (wip)
* llama : prefix input text for tokenization with whitespace
* llama : distinguish pieces from decoded text + fix detokenization
* common : add comments
* examples : no longer manually add leading space when tokenizing
* tests : use Python to generate tokenizer tests for C++
* tests : add option to tokenize text files
ggml-ci
* tests : add test-tokenizer-1.py
* llama.cpp : fix LF token
* hellaswag : move the concat space for clarity
* tests : add falcon tests (py + cpp, currently do not pass Unicode)
ggml-ci
* common : temporary separate llama_detokenize calls for SPM and BPE
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
2023-08-27 14:19:19 +03:00
Przemysław Pawełczyk
1591e2e590
ggml : detect SSSE3 ( #2825 )
...
* ggml : add ggml_cpu_has_ssse3
* llama : show SSSE3 in system info
2023-08-27 11:10:25 +03:00
Tim Miller
c7d92e6dfe
llama : use Unicode Escape Sequence to replace encoded characters ( #2814 )
...
The use of special characters within source files can break compiling on some computers with different region and language settings. Using Unicode escape sequences should allow the code to be compiled on all setups without needing to change your computer's settings or switch regions.
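For illustration, a C sketch of the idea (the variable name is an assumption): spell a character such as U+2581 with escape sequences instead of pasting the raw character into the source, so the file compiles regardless of the compiler's locale and encoding settings:
```
/* UTF-8 bytes of U+2581 written as escapes instead of the raw character */
static const char * ws_symbol = "\xe2\x96\x81";
```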
2023-08-26 21:27:07 +03:00
Cebtenzzre
741ca7dd1c
llama : move #includes out of _GNU_SOURCE conditional ( #2817 )
2023-08-26 21:17:51 +03:00
Cebtenzzre
50526f37eb
llama : use std::abs in llama_sample_tail_free ( #2800 )
...
Plain 'abs' casts the input to int.
2023-08-26 19:53:52 +03:00
Georgi Gerganov
04f4b1eb10
k-quants : remove unnecessary tensor shape restrictions ( #2811 )
2023-08-26 17:37:35 +03:00
Kawrakow
7592375403
Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B ( #2807 )
...
* Better perplexity for 2- and 3-bit quantization for the 70B model
* PR comment
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-26 17:27:49 +03:00
klosax
2ba83c8685
Fix spm whitespaces ( #2806 )
...
* llama.cpp : fix spm whitespace escaping + clean up
* main.cpp : spm - add whitespace in front of prompt
* test-tokenizer-0.cpp : spm - add whitespace in front of prompt
2023-08-26 13:45:53 +02:00
Matt Pulver
c82742ac9c
llama : add llama_beam_search() ( #2267 )
...
* Add llama_beam_search().
* Add '// Beam search' heading to llama.{h,cpp} after llama_grammar_accept_token().
* Add space around * pointers and & references.
* Add spaces around comparison and assignment operators.
* Prefer west const.
* Use llama_ prefix for structs in global namespace.
* Delete obsolete comment from an earlier revision.
* Change eos to eob in llama_beam and llama_beam_view structs.
2023-08-25 18:18:48 +03:00
slaren
154725c543
llama-bench : add model sizes ( #2771 )
...
* llama-bench : add model sizes
* more compact markdown output
* back to GiB
* adjust column sizes
2023-08-25 15:16:19 +02:00
Henri Vasserman
6bbc598a63
ROCm Port ( #1087 )
...
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5 )
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP
---------
Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
2023-08-25 12:09:42 +03:00
Georgi Gerganov
3f460a2b72
cuda : add RoPE kernel for mode == 2 (NeoX) ( #2760 )
...
* cuda : add RoPE kernel for mode == 2 (NeoX)
* falcon : do not offload the embeddings layer
2023-08-25 11:55:59 +03:00
slaren
0d3094f0c7
gguf : add rope_freq_base parameter for CodeLlama ( #2769 )
2023-08-24 21:04:05 +03:00
Shouzheng Liu
38b16dfca6
metal : bug-fix when enable ggml-alloc ( #2757 )
...
* metal: better memory alloc w/ concurrency dispatch
The ggml-alloc should only free tensors at memory barriers.
* ggml-alloc: avoid return silently
In certain cases, the allocate_node() function may silently return
without performing any memory allocation.
2023-08-24 19:27:25 +03:00
slaren
fea95c682d
fix convert.py for codellama, add llama 34B to the list of recognized models ( #2768 )
2023-08-24 17:44:11 +02:00
Georgi Gerganov
c3e53b421a
llama : escape all U+2581 in a string ( #2750 )
2023-08-24 12:26:01 +03:00
Concedo
b8372d4466
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .gitignore
# README.md
# tests/CMakeLists.txt
2023-08-24 15:21:24 +08:00
Evan Jones
6e91a1b070
llama : fix grammar sometimes generating null char ( #2756 )
2023-08-24 07:07:13 +03:00
Georgi Gerganov
cf658adc83
llm : add Falcon support ( #2717 )
...
* llama : refactor GGUF constants into static maps
* llama : check if model architecture is known
* llama : refactor llama_model_load_internal()
* gguf : add KV constant maps
* llm : read arch-specific KVs
* convert : add dummy scores + types
* falcon : load tensor data (CPU only)
* llama : fix loading progress bar
* llama : add arch member to llama_model
* falcon : CPU inference working
* falcon : support non-40B models
* falcon : minor
* llama : minor updates
ggml-ci
* convert-falcon-hf-to-gguf.py : fix special token mapping
* llama.cpp : llama default UNK token = id 0
* llama.cpp : fix bpe tokenizer
* llama.cpp : fix the fix of bpe tokenizer
* ggml : pass eps to ggml_norm
* metal : implement RoPE (mode = 2) + avoid ggml_repeat
* ggml : ggml_repeat always creates new tensor
* falcon : copy-paste self-attention from LLaMA
* metal : print extra compute pipeline info
* falcon : minor changes (still chasing the Metal problem)
* llama.cpp : fix linefeed token
* metal : fix GELU kernel numerical stability by using precise::tanh
* metal : temporary workaround for the concurrency optimization bug
* falcon : add CUDA offloading (#2739 )
* llama : better model naming and size reporting
* llama : prep new tokenizer support
* llama : advanced BPE tokenizer based on ggllm.cpp implementation
* llama : remove obsolete comment
ggml-ci
* common : remove obsolete BPE API + disable test-tokenizer-1
* llama : revert BPE special-case in llama_byte_to_token()
* cuda : add TODOs for RoPE NeoX implementation
* llama : default special tokens based on vocab type
* perplexity : add log for start of tokenization
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-23 23:08:04 +03:00
Concedo
af170fc2db
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
# llama.cpp
# scripts/sync-ggml.sh
# tests/test-tokenizer-0.cpp
2023-08-23 17:08:09 +08:00
Kerfuffle
777f42ba18
Improve handling of special tokens in GGML to GGUF converter ( #2725 )
...
* Improve UNK, BOS, EOS token handling when converting without metadata.
* Allow importing as a module.
* Remove some obsolete code and minor cleanups.
* Set default UNK token mapping from -1 to 0 in llama.cpp
* Try to handle overflow due to buggy Windows Python with a better error message
2023-08-22 17:39:39 -06:00
goerch
46ef5b5fcf
llama : fix whitespace escaping in tokenizer ( #2724 )
2023-08-23 00:10:42 +03:00
Georgi Gerganov
deb7dfca4b
gguf : add ftype meta info to the model ( #2710 )
...
* llama : add ftype meta info to the model
ggml-ci
* convert.py : add ftype when converting (does not work)
* convert.py : fix Enum to IntEnum
ggml-ci
2023-08-22 20:05:59 +03:00
Kawrakow
bac66994cf
Quantization improvements for k_quants ( #2707 )
...
* Improve LLaMA-2 2-, 3- and 4-bit quantization
* Q3_K_S: use Q5_K for 1st 2 layers of attention.wv and feed_forward.w2
* Q4_K_S: use Q6_K for 1st 2 layers of attention.wv and feed_forward.w2
* Q2_K and Q3_K_M: use Q5_K instead of Q4_K for 1st 2 layers of
attention.wv and feed_forward.w2
This leads to a slight model size increase as follows:
Q2_K : 2.684G vs 2.670G
Q3_K_S: 2.775G vs 2.745G
Q3_K_M: 3.071G vs 3.057G
Q4_K_S: 3.592G vs 3.563G
LLaMA-2 PPL for context 512 changes as follows:
Q2_K : 6.6691 vs 6.8201
Q3_K_S: 6.2129 vs 6.2584
Q3_K_M: 6.0387 vs 6.1371
Q4_K_S: 5.9138 vs 6.0041
There are improvements for LLaMA-1 as well, but they are
way smaller than the above.
* Minor 4-bit quantization improvement
For the same model size as the previous commit, we get
PPL = 5.9069 vs 5.9138.
* Some more fine tuning
* Adding make_qkx2_quants
With it, we get PPL = 5.8828 for L2-7B Q4_K_S.
* Another minor improvement
* Q2_K improvement
Smaller model, lower perplexity.
7B: file size = 2.632G, PPL = 6.3772 vs original 2.670G PPL = 6.8201
12B: file size = 5.056G, PPL = 5.4577 vs original 5.130G PPL = 5.7178
It is mostly Q3_K except for tok_embeddings, attention.wq, attention.wk,
which are Q2_K
* Iterating
* Revert Q5_K back to make_qkx1_quants
* Better Q6_K
* make_qkx2_quants is better for Q5_K after all
* Fix after rebasing on master
* Fix for changed tensor names
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-22 19:14:09 +03:00
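A hedged sketch of the layer-dependent type selection described in the bullets above (illustrative names and enum, not the actual llama.cpp quantizer code): the first two attention.wv / feed_forward.w2 layers are kept at a higher-precision k-quant than the base type used for the rest of the model.

```cpp
#include <cstring>

enum qtype { Q2_K, Q3_K, Q4_K, Q5_K, Q6_K }; // illustrative subset

static qtype pick_type(const char * tensor_name, int i_layer, qtype base) {
    const bool sensitive = strstr(tensor_name, "attention.wv")    != nullptr ||
                           strstr(tensor_name, "feed_forward.w2") != nullptr;
    if (sensitive && i_layer < 2) {
        // bump the first two layers of these tensors one precision level up
        return base == Q4_K ? Q6_K : Q5_K;
    }
    return base;
}
```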
Concedo
39cc83e8c9
incomplete merge, compiles but generates rubbish
2023-08-22 23:12:47 +08:00
slaren
1123f7fbdf
ggml-cuda : use graph allocator ( #2684 )
...
use a different function for no_alloc to avoid breaking backwards compat, fixes lora
remove 512 n_batch limit
fixed 2048 batch size
cleanup
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-08-22 15:25:19 +02:00
Concedo
2d17c22437
functional commit before gguf merge
2023-08-22 18:20:06 +08:00
Georgi Gerganov
6381d4e110
gguf : new file format with flexible meta data (beta) ( #2398 )
...
* gguf : first API pass
* gguf : read header + meta data
* gguf : read tensor info
* gguf : initial model loading - not tested
* gguf : add gguf_get_tensor_name()
* gguf : do not support passing existing ggml_context to gguf_init
* gguf : simplify gguf_get_val
* gguf : gguf.c is now part of ggml.c
* gguf : read / write sample models
* gguf : add comments
* refactor : reduce code duplication and better API (#2415 )
* gguf : expose the gguf_type enum through the API for now
* gguf : add array support
* gguf.py : some code style changes
* convert.py : start a new simplified implementation by removing old stuff
* convert.py : remove GGML vocab + other obsolete stuff
* GGUF : write tensor (#2426 )
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
* gguf : add gguf_find_key (#2438 )
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
* gguf : fix writing tensors
* gguf : do not hardcode tensor names to read
* gguf : write sample tensors to read
* gguf : add tokenization constants
* quick and dirty conversion example
* gguf : fix writing gguf arrays
* gguf : write tensors one by one and code reuse
* gguf : fix writing gguf arrays
* gguf : write tensors one by one
* gguf : write tensors one by one
* gguf : write tokenizer data
* gguf : upd gguf conversion script
* Update convert-llama-h5-to-gguf.py
* gguf : handle already encoded string
* ggml.h : get array str and f32
* ggml.c : get arr str and f32
* gguf.py : support any type
* Update convert-llama-h5-to-gguf.py
* gguf : fix set is not subscriptable
* gguf : update convert-llama-h5-to-gguf.py
* constants.py : add layer norm eps
* gguf.py : add layer norm eps and merges
* ggml.h : increase GGML_MAX_NAME to 64
* ggml.c : add gguf_get_arr_n
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Makefile : add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* gguf : support custom alignment value
* gguf : fix typo in function call
* gguf : mmap tensor data example
* fix : update convert-llama-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* convert-gptneox-h5-to-gguf.py : Special tokens
* gptneox-main.cpp : special tokens
* Update gptneox-main.cpp
* constants.py : special tokens
* gguf.py : accumulate kv and tensor info data + special tokens
* convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens
* gguf : gguf counterpart of llama-util.h
* gguf-util.h : update note
* convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens
* convert-llama-h5-to-gguf.py : special tokens
* Delete gptneox-common.cpp
* Delete gptneox-common.h
* convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer
* gptneox-main.cpp : gpt2 bpe tokenizer
* gpt2 bpe tokenizer (handles merges and unicode)
* Makefile : remove gptneox-common
* gguf.py : bytesarray for gpt2bpe tokenizer
* cmpnct_gpt2bpe.hpp : comments
* gguf.py : use custom alignment if present
* gguf : minor stuff
* Update gptneox-main.cpp
* map tensor names
* convert-gptneox-h5-to-gguf.py : map tensor names
* convert-llama-h5-to-gguf.py : map tensor names
* gptneox-main.cpp : map tensor names
* gguf : start implementing libllama in GGUF (WIP)
* gguf : start implementing libllama in GGUF (WIP)
* rm binary committed by mistake
* upd .gitignore
* gguf : calculate n_mult
* gguf : inference with 7B model working (WIP)
* gguf : rm deprecated function
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : add gguf_get_kv_type
* gguf : add gguf_get_kv_type
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver
* gguf : rm references to old file formats
* gguf : shorter name for member variable
* gguf : rm redundant method
* gguf : get rid of n_mult, read n_ff from file
* Update gguf_tensor_map.py
* Update gptneox-main.cpp
* gguf : rm references to old file magics
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : quantization is working
* gguf : proper closing of file
* gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : no need to convert tensors twice
* convert-llama-h5-to-gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : simplify nbytes
* convert-llama-h5-to-gguf.py : simplify nbytes
* gptneox-main.cpp : n_layer --> n_block
* constants.py : n_layer --> n_block
* gguf.py : n_layer --> n_block
* convert-gptneox-h5-to-gguf.py : n_layer --> n_block
* convert-llama-h5-to-gguf.py : n_layer --> n_block
* gptneox-main.cpp : n_layer --> n_block
* Update gguf_tensor_map.py
* convert-gptneox-h5-to-gguf.py : load model in parts to save memory
* convert-llama-h5-to-gguf.py : load model in parts to save memory
* convert : write more metadata for LLaMA
* convert : rm quantization version
* convert-gptneox-h5-to-gguf.py : add file_type key
* gptneox-main.cpp : add file_type key
* fix conflicts
* gguf : add todos and comments
* convert-gptneox-h5-to-gguf.py : tensor name map changes
* Create gguf_namemap.py : tensor name map changes
* Delete gguf_tensor_map.py
* gptneox-main.cpp : tensor name map changes
* convert-llama-h5-to-gguf.py : fixes
* gguf.py : dont add empty strings
* simple : minor style changes
* gguf : use UNIX line ending
* Create convert-llama-7b-pth-to-gguf.py
* llama : sync gguf-llama.cpp with latest llama.cpp (#2608 )
* llama : sync gguf-llama.cpp with latest llama.cpp
* minor : indentation + assert
* llama : refactor gguf_buffer and gguf_ctx_buffer
* llama : minor
* gitignore : add gptneox-main
* llama : tokenizer fixes (#2549 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* convert : update convert-new.py with tokenizer fixes (#2614 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* llama : sync gguf-llama with llama (#2613 )
* llama : sync gguf-llama with llama
* tests : fix build + warnings (test-tokenizer-1 still fails)
* tests : fix wstring_convert
* convert : fix layer names
* llama : sync gguf-llama.cpp
* convert : update HF converter to new tokenizer voodoo magics
* llama : update tokenizer style
* convert-llama-h5-to-gguf.py : add token types
* constants.py : add token types
* gguf.py : add token types
* convert-llama-7b-pth-to-gguf.py : add token types
* gguf-llama.cpp : fix n_head_kv
* convert-llama-h5-to-gguf.py : add 70b gqa support
* gguf.py : add tensor data layout
* convert-llama-h5-to-gguf.py : add tensor data layout
* convert-llama-7b-pth-to-gguf.py : add tensor data layout
* gptneox-main.cpp : add tensor data layout
* convert-llama-h5-to-gguf.py : clarify the reverse permute
* llama : refactor model loading code (#2620 )
* llama : style formatting + remove helper methods
* llama : fix quantization using gguf tool
* llama : simplify gguf_file_saver
* llama : fix method names
* llama : simplify write_header()
* llama : no need to pass full file loader to the file saver
just gguf_ctx
* llama : gguf_file_saver write I32
* llama : refactor tensor names (#2622 )
* gguf: update tensor names searched in quantization
* gguf : define tensor names as constants
* gguf : initial write API (not tested yet)
* gguf : write to file API (not tested)
* gguf : initial write API ready + example
* gguf : fix header write
* gguf : fixes + simplify example + add ggml_nbytes_pad()
* gguf : minor
* llama : replace gguf_file_saver with new gguf write API
* gguf : streaming support when writing files
* gguf : remove obsolete write methods
* gguf : remove obsolete gguf_get_arr_xxx API
* llama : simplify gguf_file_loader
* llama : move hparams and vocab from gguf_file_loader to llama_model_loader
* llama : merge gguf-util.h in llama.cpp
* llama : reorder definitions in .cpp to match .h
* llama : minor simplifications
* llama : refactor llama_model_loader (WIP)
wip : remove ggml_ctx from llama_model_loader
wip : merge gguf_file_loader in llama_model_loader
* llama : fix shape prints
* llama : fix Windows build + fix norm_rms_eps key
* llama : throw error on missing KV pairs in model meta data
* llama : improve printing + log meta data
* llama : switch print order of meta data
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
* gguf : deduplicate (#2629 )
* gguf : better type names
* dedup : CPU + Metal is working
* ggml : fix warnings about unused results
* llama.cpp : fix line feed and compiler warning
* llama : fix strncpy warning + note token_to_str does not write null
* llama : restore the original load/save session implementation
Will migrate this to GGUF in the future
* convert-llama-h5-to-gguf.py : support alt ctx param name
* ggml : assert when using ggml_mul with non-F32 src1
* examples : dedup simple
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
* gguf.py : merge all files in gguf.py
* convert-new.py : pick #2427 for HF 70B support
* examples/gguf : no need to keep q option for quantization any more
* llama.cpp : print actual model size
* llama.cpp : use ggml_elements()
* convert-new.py : output gguf (#2635 )
* convert-new.py : output gguf (WIP)
* convert-new.py : add gguf key-value pairs
* llama : add hparams.ctx_train + no longer print ftype
* convert-new.py : minor fixes
* convert-new.py : vocab-only option should work now
* llama : fix tokenizer to use llama_char_to_byte
* tests : add new ggml-vocab-llama.gguf
* convert-new.py : tensor name mapping
* convert-new.py : add map for skipping tensor serialization
* convert-new.py : convert script now works
* gguf.py : pick some of the refactoring from #2644
* convert-new.py : minor fixes
* convert.py : update to support GGUF output
* Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40386540c2c15226300ad998ecd871ea.
* convert.py : n_head_kv optional and .gguf file extension
* convert.py : better always have n_head_kv and default it to n_head
* llama : sync with recent PRs on master
* editorconfig : ignore models folder
ggml-ci
* ci : update ".bin" to ".gguf" extension
ggml-ci
* llama : fix llama_model_loader memory leak
* gptneox : move as a WIP example
* llama : fix lambda capture
ggml-ci
* ggml : fix bug in gguf_set_kv
ggml-ci
* common.h : .bin --> .gguf
* quantize-stats.cpp : .bin --> .gguf
* convert.py : fix HF tensor permuting / unpacking
ggml-ci
* llama.cpp : typo
* llama : throw error if gguf fails to init from file
ggml-ci
* llama : fix tensor name grepping during quantization
ggml-ci
* gguf.py : write tensors in a single pass (#2644 )
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : style fixes in simple conversion script
* gguf : refactor gptneox conversion script
* gguf : rename h5 to hf (for HuggingFace)
* gguf : refactor pth to gguf conversion script
* gguf : rm file_type key and method
* gguf.py : fix vertical alignment
* gguf.py : indentation
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* convert-gptneox-hf-to-gguf.py : fixes
* gguf.py : gptneox mapping
* convert-llama-hf-to-gguf.py : fixes
* convert-llama-7b-pth-to-gguf.py : fixes
* ggml.h : reverse GGUF_MAGIC
* gguf.py : reverse GGUF_MAGIC
* test-tokenizer-0.cpp : fix warning
* llama.cpp : print kv general.name
* llama.cpp : get special token kv and linefeed token id
* llama : print number of tensors per type + print arch + style
* tests : update vocab file with new magic
* editorconfig : fix whitespaces
* llama : re-order functions
* llama : remove C++ API + reorganize common source in /common dir
* llama : minor API updates
* llama : avoid hardcoded special tokens
* llama : fix MPI build
ggml-ci
* llama : introduce enum llama_vocab_type + remove hardcoded string constants
* convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested
* falcon-main.cpp : falcon inference example
* convert-falcon-hf-to-gguf.py : remove extra kv
* convert-gptneox-hf-to-gguf.py : remove extra kv
* convert-llama-7b-pth-to-gguf.py : remove extra kv
* convert-llama-hf-to-gguf.py : remove extra kv
* gguf.py : fix for falcon 40b
* falcon-main.cpp : fix for falcon 40b
* convert-falcon-hf-to-gguf.py : update ref
* convert-falcon-hf-to-gguf.py : add tensor data layout
* cmpnct_gpt2bpe.hpp : fixes
* falcon-main.cpp : fixes
* gptneox-main.cpp : fixes
* cmpnct_gpt2bpe.hpp : remove non-general stuff
* Update examples/server/README.md
Co-authored-by: slaren <slarengh@gmail.com>
* cmpnct_gpt2bpe.hpp : cleanup
* convert-llama-hf-to-gguf.py : special tokens
* convert-llama-7b-pth-to-gguf.py : special tokens
* convert-permute-debug.py : permute debug print
* convert-permute-debug-master.py : permute debug for master
* convert-permute-debug.py : change permute type of attn_q
* convert.py : 70b model working (change attn_q permute)
* Delete convert-permute-debug-master.py
* Delete convert-permute-debug.py
* convert-llama-hf-to-gguf.py : fix attn_q permute
* gguf.py : fix rope scale kv
* convert-llama-hf-to-gguf.py : rope scale and added tokens
* convert-llama-7b-pth-to-gguf.py : rope scale and added tokens
* llama.cpp : use rope scale kv
* convert-llama-7b-pth-to-gguf.py : rope scale fix
* convert-llama-hf-to-gguf.py : rope scale fix
* py : fix whitespace
* gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682 )
* First pass at converting GGMLv3 LLaMA models to GGUF
* Cleanups, better output during conversion
* Fix vocab space conversion logic
* More vocab conversion fixes
* Add description to converted GGUF files
* Improve help text, expand warning
* Allow specifying name and description for output GGUF
* Allow overriding vocab and hyperparams from original model metadata
* Use correct params override var name
* Fix wrong type size for Q8_K
Better handling of original style metadata
* Set default value for gguf add_tensor raw_shape KW arg
* llama : improve token type support (#2668 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* llama : add API for token type
ggml-ci
* tests : use new tokenizer type API (#2692 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* Improve commentary
* Use token type API in test-tokenizer-1.cpp
* py : cosmetics
* readme : add notice about new file format
ggml-ci
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: goerch <jhr.walter@t-online.de>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-08-21 23:07:43 +03:00
slaren
097e121e2f
llama : add benchmark example ( #2626 )
...
* llama : add benchmark example
* add to examples CMakeLists.txt
* fix msvc build
* add missing include
* add Bessel's correction to stdev calculation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* improve markdown formatting
* add missing include
* print warning if NDEBUG is not defined
* remove n_prompt and n_gen from the matrix, use each value separately instead
* better checks for non-optimized builds
* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call
* fix json formatting
* add sql output
* add basic cpu and gpu info (linux/cuda only)
* markdown: also show values that differ from the default
* markdown: add build id
* cleanup
* improve formatting
* formatting
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-08-18 12:44:58 +02:00
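A small sketch of the Bessel-corrected sample standard deviation mentioned in the bullet list above (the helper name is an assumption, not the benchmark's actual function): dividing the summed squared deviations by n - 1 instead of n gives an unbiased variance estimate for a sample of timings.

```cpp
#include <cmath>
#include <vector>

static double stdev_sample(const std::vector<double> & v) {
    if (v.size() < 2) {
        return 0.0;
    }
    double mean = 0.0;
    for (double x : v) mean += x;
    mean /= (double) v.size();
    double sq = 0.0;
    for (double x : v) sq += (x - mean) * (x - mean);
    return std::sqrt(sq / (double) (v.size() - 1)); // Bessel's correction: n - 1
}
```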
Evan Jones
604b8bdfa6
Fix unicode in grammars ( fixes #2501 ) ( #2553 )
...
* Fix unicode in grammars (fixes #2501 )
* add more comments
* fix test-llama-grammar
2023-08-17 19:54:44 -04:00
Georgi Gerganov
a73ccf1aa3
llama : replace (permute + reshape + view_1d) with (view_3d) ( #2538 )
...
ggml-ci
2023-08-17 10:47:09 +03:00
Shouzheng Liu
fc8ef549e5
metal : enable ggml-alloc ( #2627 )
...
* metal: enable ggml-alloc
Make ggml-alloc work with concurrent dispatch.
* style-fix
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-16 23:08:28 +03:00
Shouzheng Liu
bf83bff674
metal : matrix-matrix multiplication kernel ( #2615 )
...
* metal: matrix-matrix multiplication kernel
This commit removes MPS and uses custom matrix-matrix multiplication
kernels for all quantization types. This commit also adds grouped-query
attention to support llama2 70B.
* metal: fix performance degradation from gqa
Integers are slow on the GPU, and 64-bit divides are extremely slow.
In the context of GQA, we introduce a 64-bit divide that cannot be
optimized out by the compiler, which results in a decrease of ~8% in
inference performance. This commit fixes that issue by calculating a
part of the offset with a 32-bit divide. Naturally, this limits the
size of a single matrix to ~4GB. However, this limitation should
suffice for the near future.
* metal: fix bugs for GQA and perplexity test.
I mixed up ne02 and nb02 in previous commit.
2023-08-16 23:07:04 +03:00
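A hedged sketch of the GQA offset trick described above (illustrative, not the actual Metal kernel code): perform the division on 32-bit head indices and widen only when forming the final byte offset, which assumes a single matrix stays under roughly 4 GiB as the commit notes.

```cpp
#include <cstdint>

static inline uint64_t kv_byte_offset(uint32_t head, uint32_t heads_per_kv, uint64_t nb_per_kv_head) {
    const uint32_t kv_head = head / heads_per_kv; // cheap 32-bit divide on the GPU
    return (uint64_t) kv_head * nb_per_kv_head;   // widen only for the final byte offset
}
```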