Li, Zonghang | 1b3b6a506f | fix: add warm-up in profiling to prevent init delay | 2025-06-03 17:10:09 +04:00
Lizonghang | 421b3deca5 | fix llama-cli pos sync | 2025-05-19 18:08:27 +04:00
Lizonghang | c54a6a0132 | fix context shifting | 2025-05-19 16:58:35 +04:00
Lizonghang | 2fbc0c8da3 | fix: reset -ngl to 0 when GPU is not used and reformat code | 2025-05-14 13:27:20 +04:00
DeEMO | 168c14f4e8 | remove unnecessary profile when --lw is specified | 2025-04-17 13:49:09 +00:00
leeetao | fc1e2d3fc6 | Added support for IQ1_S and IQ1_M quantization types | 2025-04-17 10:27:53 +00:00
Zonghang Li | 63b45a4c26 | add args -k and --force | 2025-03-11 22:09:39 +04:00
Zonghang Li | bcfdace59b | add args -k and --force | 2025-03-11 20:44:36 +04:00
leeetao | 45ec52c2cb | Added support for IQ1_M and IQ2_XXS quantization types | 2025-03-07 16:56:16 +00:00
leeetao | 6a416534c8 | Fixed the alignment of the device performance display | 2025-03-07 07:46:30 +00:00
leeetao | 2f049b8428 | Added support for Q2_K, IQ1_S, IQ4_NL quantization types | 2025-03-04 15:22:55 +00:00
leeetao | e2cda4cfa0 | Removed support for GGML_TYPE_Q4_0_4_4, GGML_TYPE_Q4_0_4_8, and GGML_TYPE_Q4_0_8_8 (GGUF no longer supports these types) | 2025-03-01 14:31:38 +00:00
Lizonghang | 9cbdf01645 | fix support for Q5_0 | 2025-02-27 22:25:03 +04:00
Lizonghang | 550fdcbc4f | add support for Q5_0 | 2025-02-27 21:47:14 +04:00
leeetao | 42da179d66 | Added parameter display for the DeepSeek-Qwen distilled model | 2025-02-24 13:24:56 +00:00
leeetao | 7bf1b743fb | Merge branch 'dev' into lt_test | 2025-02-23 08:35:45 +00:00
leeetao | f99e08b9fe | Added inference support for the DeepSeek distilled model | 2025-02-23 08:27:37 +00:00
Lizonghang | e219fada4e | disable timer | 2025-02-19 16:24:12 +04:00
Lizonghang | c84f9d29fe | use arg prefetch and remove arg unload | 2025-02-12 17:04:41 +04:00
Lizonghang | 708b1d8c89 | disable force fetching | 2025-02-12 16:55:44 +04:00
Lizonghang | ea0e655a8b | disable force fetching | 2025-02-12 16:55:21 +04:00
Lizonghang | b163918b46 | disable prefetch in standalone mode | 2025-02-12 00:17:33 +04:00
Lizonghang | 6a50d494d2 | increase prefetch density | 2025-02-11 17:25:06 +04:00
Lizonghang | 65ad14140a | do not check loaded tensors due to increased latency | 2025-02-11 17:10:11 +04:00
Lizonghang | 3dd3138207 | ignore tensors already in page cache when prefetching | 2025-02-11 17:00:17 +04:00
Zonghang Li | 261c88f058 | skip tensors on CUDA in manage_graph_tensors | 2025-02-11 09:49:17 +04:00
Zonghang Li | c4c6a642fc | manage_graph_tensors: fix segment prefetch | 2025-02-08 22:44:38 +04:00
Lizonghang | 8e41362af0 | fix prefetch | 2025-02-04 17:38:41 +04:00
Lizonghang | 215151918f | fix GPU mem limit | 2025-01-31 18:52:13 +04:00
Lizonghang | 17cd8ba618 | reserve 300 MiB for the Metal kernel | 2025-01-31 16:24:44 +04:00
Lizonghang | dd632ee6df | ignore the first 5 evals due to preheat | 2025-01-31 08:53:51 +04:00
Lizonghang | b680cb74fe | set POSIX_MADV_WILLNEED for the next subgraph | 2025-01-30 13:29:34 +04:00
Lizonghang | f9b4c46b74 | ignore the first eval to make the time test more accurate | 2025-01-30 11:12:26 +04:00
Lizonghang | 849b47ccd0 | fix auto schedule logic | 2025-01-29 13:13:37 +04:00
Lizonghang | 631daadd92 | test | 2025-01-28 16:36:47 +04:00
Lizonghang | 2934cf3e8e | reserve 200 MiB for internal GPU usage | 2025-01-27 22:14:12 +04:00
Lizonghang | 1ca9a43bd1 | keep the output layer weights in shared memory by default | 2025-01-25 23:31:43 +04:00
Lizonghang | f3dd5776eb | fix kappa and memory bounds, account for look-up table and input/output layer delay | 2025-01-25 22:31:40 +04:00
Lizonghang | 1c0087e919 | rename arg --keep-inp-out-in-metal to --keep-out-in-metal | 2025-01-23 23:17:06 +04:00
Lizonghang | fb05f80f89 | remove token_embd from Metal mem | 2025-01-23 16:58:25 +04:00
Lizonghang | 78a544d716 | add Metal mem limit | 2025-01-23 16:08:52 +04:00
Zonghang Li | 33429ec4e1 | add option --keep-inp-out-in-metal | 2025-01-22 11:25:09 +04:00
Lizonghang | facb4ea736 | add option --keep-inp-out-in-metal and fix bugs in unmap | 2025-01-22 11:15:19 +04:00
Lizonghang | ce2ef9699f | fix unmap_fragment mapping error | 2025-01-21 22:43:51 +04:00
Lizonghang | 189ed92cba | segment the mmap range on Metal shared memory to avoid memory waste | 2025-01-21 21:07:02 +04:00
Lizonghang | e7fae2acdb | fix CUDA mem limitation | 2025-01-16 09:48:08 +04:00
Zonghang Li | 46e99218b4 | add arg --cuda-mem | 2025-01-16 09:15:34 +04:00
Lizonghang | 1e1ba5bb91 | add API llama_model_set_n_gpu_layers | 2025-01-15 10:47:53 +04:00
Lizonghang | 9279a2e3ff | fix error in llama_context_n_gpu_layers | 2025-01-15 10:08:41 +04:00
Lizonghang | 5d9aadf3d5 | use HiGHS to solve the allocation program | 2025-01-15 10:04:04 +04:00