Concedo
|
f6ece6fd37
|
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/check-vendor.yml
# .github/workflows/close-issue.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/labeler.yml
# .github/workflows/pre-tokenizer-hashes.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/python-type-check.yml
# .github/workflows/server.yml
# .github/workflows/update-ops-docs.yml
# README.md
# docs/build.md
# examples/model-conversion/scripts/utils/perplexity-gen.sh
# examples/model-conversion/scripts/utils/perplexity-run-simple.sh
# examples/model-conversion/scripts/utils/perplexity-run.sh
# examples/model-conversion/scripts/utils/quantize.sh
# examples/model-conversion/scripts/utils/run-embedding-server.sh
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32.cl
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/compare-llama-bench.py
# tests/test-backend-ops.cpp
# tests/test-gguf.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
|
2026-01-27 23:06:13 +08:00 |
|
Georgi Gerganov
|
080b161995
|
completion : fix prompt cache for recurrent models (#19045)
|
2026-01-25 09:12:50 +02:00 |
|
Concedo
|
66ccf8f6b8
|
Merge commit 'f14f4e421b' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# AGENTS.md
# CONTRIBUTING.md
# docs/build.md
# examples/llama.android/app/build.gradle.kts
# examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt
# examples/llama.android/app/src/main/res/layout/activity_main.xml
# examples/llama.android/gradle/libs.versions.toml
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# examples/llama.android/lib/src/main/java/com/arm/aichat/InferenceEngine.kt
# examples/llama.android/lib/src/main/java/com/arm/aichat/internal/InferenceEngineImpl.kt
# examples/model-conversion/scripts/causal/compare-embeddings-logits.sh
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/retrieval/retrieval.cpp
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-cuda/mmq.cu
# ggml/src/ggml-cuda/mmq.cuh
# src/CMakeLists.txt
# tools/llama-bench/llama-bench.cpp
# tools/server/CMakeLists.txt
|
2026-01-01 15:20:56 +08:00 |
|
o7si
|
daa242dfc8
|
common: fix return value check for setpriority (#18412)
* common: fix return value check for setpriority
* tools: add logging for process priority setting
|
2025-12-29 11:07:49 +02:00 |
|
Concedo
|
1f2c9f6b62
|
gpt4v not working correctly
|
2025-12-17 21:02:16 +08:00 |
|
Concedo
|
050a5b1f52
|
Merge commit '4aced7a631' into concedo_experimental
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/tools.sh
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# docs/ops.md
# docs/ops/SYCL.csv
# examples/batched/batched.cpp
# examples/eval-callback/eval-callback.cpp
# examples/gen-docs/gen-docs.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/parallel/parallel.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/training/finetune.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/pad.cpp
# ggml/src/ggml-sycl/ssm_conv.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# pyrightconfig.json
# scripts/sync-ggml.last
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/imatrix.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/clip.cpp
# tools/perplexity/perplexity.cpp
# tools/server/README.md
|
2025-12-16 23:14:12 +08:00 |
|
Xuan-Son Nguyen
|
7b1db3d3b7
|
arg: clarify auto kvu/np being set on server (#17997)
* arg: clarify auto kvu/np being set on server
* improve docs
* use invalid_argument
|
2025-12-16 12:01:27 +01:00 |
|
Concedo
|
e88bf41fdc
|
Merge commit '12280ae905' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# common/CMakeLists.txt
# docs/docker.md
# examples/model-conversion/scripts/causal/compare-logits.py
# ggml/src/ggml-hexagon/htp/rope-ops.c
# tests/test-backend-ops.cpp
# tests/test-barrier.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
|
2025-12-16 16:29:01 +08:00 |
|
Georgi Gerganov
|
254098a279
|
common : refactor common_sampler + grammar logic changes (#17937)
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
|
2025-12-14 10:11:13 +02:00 |
|
Concedo
|
010995c967
|
Merge commit '4df6e859e9' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# README.md
# ci/run.sh
# examples/gen-docs/gen-docs.cpp
# scripts/snapdragon/adb/run-cli.sh
# tests/test-lora-conversion-inference.sh
# tools/CMakeLists.txt
# tools/completion/CMakeLists.txt
# tools/completion/README.md
# tools/server/CMakeLists.txt
|
2025-12-12 17:23:25 +08:00 |
|
Xuan-Son Nguyen
|
34a6d86982
|
cli: enable jinja by default (#17911)
* cli: enable jinja by default
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2025-12-10 22:19:42 +01:00 |
|
Xuan-Son Nguyen
|
6c2131773c
|
cli: new CLI experience (#17824)
* wip
* wip
* fix logging, add display info
* handle commands
* add args
* wip
* move old cli to llama-completion
* rm deprecation notice
* move server to a shared library
* move ci to llama-completion
* add loading animation
* add --show-timings arg
* add /read command, improve LOG_ERR
* add args for speculative decoding, enable show timings by default
* add arg --image and --audio
* fix windows build
* support reasoning_content
* fix llama2c workflow
* color default is auto
* fix merge conflicts
* properly fix color problem
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
* better loading spinner
* make sure to clean color on force-exit
* also clear input files on "/clear"
* simplify common_log_flush
* add warning in mtmd-cli
* implement console writer
* fix data race
* add attribute
* fix llama-completion and mtmd-cli
* add some notes about console::log
* fix compilation
---------
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
|
2025-12-10 15:28:59 +01:00 |
|