DannyDaemonic
41654efea8
Interface improvements and --multiline-input (previously --author-mode) ( #1040 )
...
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
2023-05-08 19:45:48 -07:00
Georgi Gerganov
f9a6364912
llama : require first token to be BOS ( #1303 )
...
* llama : require first token to be BOS
* scripts : add ppl-run-all.sh
* perplexity : add BOS for each chunk
* readme : update perplexity values after BOS fix
* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
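The BOS-requirement change above can be illustrated with a minimal sketch (the helper name is hypothetical, not llama.cpp's API; LLaMA's BOS token id is 1):

```python
def ensure_bos(tokens, bos_id=1):
    """Prepend the BOS token if the sequence does not already start with it."""
    if tokens and tokens[0] == bos_id:
        return tokens
    return [bos_id] + tokens

print(ensure_bos([15043, 3186]))  # -> [1, 15043, 3186]
print(ensure_bos([1, 15043]))     # -> [1, 15043] (already has BOS, unchanged)
```

The perplexity fix in the same commit applies the same idea per chunk: each evaluated chunk gets a leading BOS.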
Concedo
1083876a1b
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
2023-05-08 11:12:42 +08:00
Johannes Gäßler
1f48b0abcf
Documented CUDA reproducibility, added warning ( #1346 )
2023-05-08 02:42:01 +02:00
Concedo
62beded0e7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# Makefile
# README.md
2023-05-07 19:10:01 +08:00
Jed Fox
3924088512
Remove default arguments from sampling functions ( #1343 )
2023-05-06 17:01:47 -04:00
Concedo
39f3d1cf48
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# examples/quantize/quantize.cpp
2023-05-05 21:34:33 +08:00
slaren
94c5652fc0
quantize: make output filename optional, default to ggml-model-<ftype>.bin ( #1301 )
2023-05-05 00:58:56 +02:00
44670
2edbdb0f99
main : add --in-suffix option ( #1318 )
...
* adding --in-suffix option
* print input suffix before generation
2023-05-04 18:41:12 +03:00
DannyDaemonic
db1080876a
Only escape prompts when used with -e ( #1311 )
2023-05-04 05:08:25 -07:00
DannyDaemonic
c65a7fbfa9
Update main's README.md with new features ( #1296 )
2023-05-04 03:02:59 -07:00
Tomas
f647ce040f
fix #1224 reverse prompt and multi line ( #1297 )
...
* fix reverse prompt and multi line
* Code Formatting
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-04 03:02:30 -07:00
Concedo
e01dc631f7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
2023-05-04 14:04:41 +08:00
khimaros
6daa09d879
examples : read chat prompts from a template file ( #1196 )
2023-05-03 20:58:11 +03:00
Concedo
ede8e4edbb
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
2023-05-03 23:34:50 +08:00
CRD716
a8a2efdc81
examples : various prompt and example fixes ( #1298 )
...
* fix dan.txt
* miku prompt improvements
* use common characters
2023-05-03 18:26:47 +03:00
DannyDaemonic
2485d7a4d3
Process escape sequences given in prompts ( #1173 )
2023-05-02 18:46:20 -07:00
DannyDaemonic
13b0c68ed7
Handle signals properly on Windows ( #1123 )
2023-05-02 18:01:57 -07:00
slaren
bf4b22ffe4
fix missing parameters in llama_init_from_gpt_params ( #1293 )
2023-05-03 01:36:45 +02:00
Ron Evans
67c77799e0
examples : add llama_init_from_gpt_params() common function ( #1290 )
...
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 23:39:51 +03:00
Georgi Gerganov
0e6cbff1b7
llama : fix compile warnings
2023-05-02 23:09:08 +03:00
Ron Evans
8c9be35ff9
examples : improve vertical alignment of a few variables ( #1286 )
...
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 20:53:52 +03:00
Robert Brisita
2bb992f034
llama : allow 0 as a seed number. ( #1275 )
2023-05-02 19:23:44 +03:00
Ron Evans
e2cd506999
main : switch input_noecho to input_echo to remove negation ( #979 )
...
Signed-off-by: deadprogram <ron@hybridgroup.com>
2023-05-02 19:13:26 +03:00
Concedo
94827172e0
Merge branch 'master' into concedo
...
# Conflicts:
# CMakeLists.txt
# Makefile
# ggml-cuda.cu
# ggml-cuda.h
2023-05-02 14:38:31 +08:00
DannyDaemonic
f4cef87edf
Add git-based build information for better issue tracking ( #1232 )
...
* Add git-based build information for better issue tracking
* macOS fix
* "build (hash)" and "CMAKE_SOURCE_DIR" changes
* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages
* Fix conditional dependency on missing target
* Broke out build-info.cmake, added find_package fallback, added build info to all examples, and added dependencies to Makefile
* 4 space indenting for cmake, attempt to clean up my mess in Makefile
* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
Georgi Gerganov
70269cae37
llama : fix session load / save ( #1263 )
2023-05-01 14:54:59 +03:00
Concedo
3de34ee492
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# ggml-opencl.c
2023-05-01 12:03:46 +08:00
jon-chuang
a5d30b1f53
common : better default number of threads ( #934 )
...
* commit
* fix
* try-catch
* apply code review
* improve
* improve
* add macos headers
* done
* remove color
* fix windows
* minor
* fix
* Apply suggestions from code review
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
* remove
* minor
* minor
---------
Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark ( #1253 )
2023-04-30 12:32:37 +00:00
Concedo
0061b90ec6
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
2023-04-30 10:35:02 +08:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h
2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h
2023-04-29 13:48:11 +03:00
Concedo
da0c34b028
Merge branch 'master' into concedo_experimental
2023-04-29 18:27:06 +08:00
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 ( #1223 )
2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms ( #1126 )
...
* Sample interface, new samplers.
New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat
Ignore EOS fix: -inf should be used.
* mirostat
* Added --logit-bias and --no-penalize-nl, removed std::span
* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)
* Save and load example adjust
* Tests
* Windows build fix
* Windows test fix
2023-04-29 08:34:41 +03:00
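One of the samplers added above, the frequency and presence penalty, can be sketched roughly as follows (a simplified illustration with hypothetical names, not the actual llama.cpp implementation):

```python
def apply_freq_presence_penalty(logits, counts, alpha_freq, alpha_pres):
    """Penalize tokens that already appeared in the output:
    each occurrence subtracts alpha_freq from the logit, and any
    appearance at all subtracts alpha_pres once."""
    out = dict(logits)
    for tok, n in counts.items():
        if n > 0 and tok in out:
            out[tok] -= n * alpha_freq + alpha_pres
    return out

# Token 'a' appeared 3 times, so its logit drops; 'b' is untouched.
penalized = apply_freq_presence_penalty({"a": 2.0, "b": 1.0}, {"a": 3},
                                        alpha_freq=0.1, alpha_pres=0.5)
```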
Concedo
bb282a4ecf
reinstated the q4_3 format for backwards compatibility
2023-04-29 11:42:04 +08:00
Stephan Walter
36d19a603b
Remove Q4_3 which is no better than Q5 ( #1218 )
2023-04-28 23:10:43 +00:00
CRD716
5fba3c016b
examples : add Jeopardy example ( #1168 )
...
* Basic Setup
* Prevent Results.txt from coming up
* Prefixes, Line separators, etc
* editorcheck
* introduction to give more consistent results
* Basic graph thing
* Grading, ready for testing!
* Y'all ready to get funky?
* fix column removal stuff
* missed a few
2023-04-28 19:13:33 +03:00
Evan Jones
1481a9cf25
llama : add session file format and saved sessions in main ( #1169 )
2023-04-28 18:59:37 +03:00
Georgi Gerganov
574406dc7e
ggml : add Q5_0 and Q5_1 quantization ( #1187 )
...
* ggml : add Q5_0 quantization (cuBLAS only)
* ggml : fix Q5_0 qh -> uint32_t
* ggml : fix q5_0 histogram stats
* ggml : q5_0 scalar dot product
* ggml : q5_0 ARM NEON dot
* ggml : q5_0 more efficient ARM NEON using uint64_t masks
* ggml : rename Q5_0 -> Q5_1
* ggml : adding Q5_0 mode
* quantize : add Q5_0 and Q5_1 to map
* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195 )
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-26 23:14:13 +03:00
Pavol Rusnak
859fee6dfb
quantize : use map to assign quantization type from string ( #1191 )
...
instead of `int` (the `int` option is still supported)
This allows the following usage:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`
instead of:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
2023-04-26 18:43:27 +02:00
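The string-to-type mapping described in this commit can be sketched like so (a simplified illustration; the numeric ids shown match the commit's own example, where `2` means `q4_0`, but the map name and helper are hypothetical):

```python
# Subset of the name -> ftype id map; the real map covers all supported types.
QUANT_TYPES = {"q4_0": 2, "q4_1": 3}

def parse_quant_type(arg):
    """Accept either a type name (e.g. "q4_0") or the legacy integer id."""
    if arg in QUANT_TYPES:
        return QUANT_TYPES[arg]
    return int(arg)  # fall back to the numeric form, as before

print(parse_quant_type("q4_0"))  # -> 2
print(parse_quant_type("2"))     # -> 2
```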
Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) ( #1179 )
...
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00
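The Q8_0 format added here stores each block of 32 weights as one float scale plus 32 signed 8-bit values; the "255 values out of 256" fix above corresponds to restricting the quants to [-127, 127]. A minimal sketch of the idea (pure-Python illustration, not the ggml C implementation):

```python
def quantize_q8_0_block(xs):
    """Quantize one block of 32 floats: d = amax / 127, q = round(x / d),
    so every quant lands in [-127, 127] (255 of the 256 int8 values)."""
    assert len(xs) == 32
    amax = max(abs(x) for x in xs)
    d = amax / 127.0 if amax > 0 else 0.0
    qs = [int(round(x / d)) if d else 0 for x in xs]
    return d, qs

def dequantize_q8_0_block(d, qs):
    """Recover approximate floats from the scale and int8 quants."""
    return [d * q for q in qs]
```

Round-trip error per value is bounded by half the scale `d`, which is why Q8_0 is accurate enough to serve as the activation format for the q4_0_q8_0 dot products mentioned above.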
xaedes
0c5692345d
examples : add save_load_state example ( #1150 )
...
* add save_load_state example
* use <cstdio> instead of <iostream> and fprintf / printf instead of cout
* renamed save-load-state example files replacing underscores by dashes
2023-04-24 19:23:31 +03:00
mgroeber9110
9b0a4d4214
examples/main README improvements and some light refactoring ( #1131 )
2023-04-24 15:45:32 +00:00
slaren
1d78fecdab
Fix LoRA acronym ( #1145 )
2023-04-23 23:03:44 +02:00
DannyDaemonic
edce63baa9
Added README.md for main with examples and explanations ( #1139 )
2023-04-23 15:37:02 +00:00
Stephan Walter
c50b628810
Fix CI: ARM NEON, quantization unit tests, editorconfig ( #1122 )
2023-04-22 10:54:13 +00:00
wbpxre150
36b4f7e064
llama : print timings on ctrl+c exit ( #1021 )
...
* print timings on ctrl+c exit
* remove redundant free memory call.
* add global pointer to ctx.
2023-04-22 11:56:35 +03:00
eiery
10f19c1121
llama : have n_batch default to 512 ( #1091 )
...
* set default n_batch to 512 when using BLAS
* spacing
* alternate implementation of setting different n_batch for BLAS
* set n_batch to 512 for all cases
2023-04-22 11:27:05 +03:00
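The effect of the n_batch default above: the prompt is evaluated in chunks of at most n_batch tokens, so a larger default lets BLAS-backed builds process bigger chunks per call. A minimal sketch of that chunking (hypothetical helper, not the actual llama.cpp code):

```python
def batch_prompt(tokens, n_batch=512):
    """Split a token list into consecutive chunks of at most n_batch tokens."""
    return [tokens[i:i + n_batch] for i in range(0, len(tokens), n_batch)]

# A 1100-token prompt is evaluated as chunks of 512, 512, and 76 tokens.
chunks = batch_prompt(list(range(1100)))
print([len(c) for c in chunks])  # -> [512, 512, 76]
```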