Commit graph

12 commits

Author SHA1 Message Date
Concedo
f6ece6fd37 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/check-vendor.yml
#	.github/workflows/close-issue.yml
#	.github/workflows/editorconfig.yml
#	.github/workflows/gguf-publish.yml
#	.github/workflows/labeler.yml
#	.github/workflows/pre-tokenizer-hashes.yml
#	.github/workflows/python-check-requirements.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/python-type-check.yml
#	.github/workflows/server.yml
#	.github/workflows/update-ops-docs.yml
#	README.md
#	docs/build.md
#	examples/model-conversion/scripts/utils/perplexity-gen.sh
#	examples/model-conversion/scripts/utils/perplexity-run-simple.sh
#	examples/model-conversion/scripts/utils/perplexity-run.sh
#	examples/model-conversion/scripts/utils/quantize.sh
#	examples/model-conversion/scripts/utils/run-embedding-server.sh
#	ggml/src/ggml-cpu/ggml-cpu.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32.cl
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/compare-llama-bench.py
#	tests/test-backend-ops.cpp
#	tests/test-gguf.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-01-27 23:06:13 +08:00
Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models (#19045) 2026-01-25 09:12:50 +02:00
Concedo
66ccf8f6b8 Merge commit 'f14f4e421b' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	AGENTS.md
#	CONTRIBUTING.md
#	docs/build.md
#	examples/llama.android/app/build.gradle.kts
#	examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt
#	examples/llama.android/app/src/main/res/layout/activity_main.xml
#	examples/llama.android/gradle/libs.versions.toml
#	examples/llama.android/lib/src/main/cpp/ai_chat.cpp
#	examples/llama.android/lib/src/main/java/com/arm/aichat/InferenceEngine.kt
#	examples/llama.android/lib/src/main/java/com/arm/aichat/internal/InferenceEngineImpl.kt
#	examples/model-conversion/scripts/causal/compare-embeddings-logits.sh
#	examples/model-conversion/scripts/embedding/run-original-model.py
#	examples/retrieval/retrieval.cpp
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/kleidiai/kernels.cpp
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-cuda/mmq.cu
#	ggml/src/ggml-cuda/mmq.cuh
#	src/CMakeLists.txt
#	tools/llama-bench/llama-bench.cpp
#	tools/server/CMakeLists.txt
2026-01-01 15:20:56 +08:00
o7si
daa242dfc8
common: fix return value check for setpriority (#18412)
* common: fix return value check for setpriority

* tools: add logging for process priority setting
2025-12-29 11:07:49 +02:00
Concedo
1f2c9f6b62 gpt4v not working correctly 2025-12-17 21:02:16 +08:00
Concedo
050a5b1f52 Merge commit '4aced7a631' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.devops/tools.sh
#	.devops/vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitignore
#	docs/ops.md
#	docs/ops/SYCL.csv
#	examples/batched/batched.cpp
#	examples/eval-callback/eval-callback.cpp
#	examples/gen-docs/gen-docs.cpp
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup-create.cpp
#	examples/lookup/lookup-stats.cpp
#	examples/lookup/lookup.cpp
#	examples/model-conversion/scripts/causal/compare-logits.py
#	examples/model-conversion/scripts/causal/run-org-model.py
#	examples/model-conversion/scripts/utils/check-nmse.py
#	examples/parallel/parallel.cpp
#	examples/retrieval/retrieval.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	examples/training/finetune.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/dpct/helper.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/pad.cpp
#	ggml/src/ggml-sycl/ssm_conv.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	pyrightconfig.json
#	scripts/sync-ggml.last
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/imatrix.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/perplexity/perplexity.cpp
#	tools/server/README.md
2025-12-16 23:14:12 +08:00
Xuan-Son Nguyen
7b1db3d3b7
arg: clarify auto kvu/np being set on server (#17997)
* arg: clarify auto kvu/np being set on server

* improve docs

* use invalid_argument
2025-12-16 12:01:27 +01:00
Concedo
e88bf41fdc Merge commit '12280ae905' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	common/CMakeLists.txt
#	docs/docker.md
#	examples/model-conversion/scripts/causal/compare-logits.py
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	tests/test-backend-ops.cpp
#	tests/test-barrier.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2025-12-16 16:29:01 +08:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes (#17937)
* common : refactor common_sampler + grammar logic changes

* tests : increase max_tokens to get needed response

* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Concedo
010995c967 Merge commit '4df6e859e9' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	ci/run.sh
#	examples/gen-docs/gen-docs.cpp
#	scripts/snapdragon/adb/run-cli.sh
#	tests/test-lora-conversion-inference.sh
#	tools/CMakeLists.txt
#	tools/completion/CMakeLists.txt
#	tools/completion/README.md
#	tools/server/CMakeLists.txt
2025-12-12 17:23:25 +08:00
Xuan-Son Nguyen
34a6d86982
cli: enable jinja by default (#17911)
* cli: enable jinja by default

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-10 22:19:42 +01:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writter

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Renamed from tools/main/main.cpp (Browse further)