Commit graph

  • afcda09d15
    vocab : fix HybridDNA tokenizer (#23466) [upstream] Kashif Rasul 2026-05-22 11:17:31 +02:00
  • bbce619adb
    cmake : add install() for impl libraries + fix apple builds (#23511) Georgi Gerganov 2026-05-22 11:46:26 +03:00
  • de6b8f9369 increase ctx slider granularity [concedo_experimental] Concedo 2026-05-22 16:17:54 +08:00
  • 7e754dc7f0
    Merge e7f386ceb6 into 718dc159b6 Wagner Bruna 2026-05-21 21:36:46 -03:00
  • e7f386ceb6 sd: sync to master-642-3a8788c Wagner Bruna 2026-05-21 20:46:14 -03:00
  • f27795cef0 sd: sync to master-637-ef92a00 Wagner Bruna 2026-05-20 22:42:01 -03:00
  • 627e317cd7 sd: sync to master-633-5b0267e Wagner Bruna 2026-05-19 23:37:17 -03:00
  • 4b8d631da8 sd: sync to master-621-baf7eda Wagner Bruna 2026-05-17 19:45:44 -03:00
  • 4f0e43da6f
    CUDA: fix PDL CC check for JIT compilation (#23471) Johannes Gäßler 2026-05-21 23:35:29 +02:00
  • bb28c1fe24
    cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462) Georgi Gerganov 2026-05-21 21:13:59 +03:00
  • ee7c30578a
    Update WebGPU support and add link to blog/demo (#23483) Reese Levine 2026-05-21 11:00:27 -07:00
  • 47c0eda9d4
    vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855) Pascal 2026-05-21 19:39:42 +02:00
  • 718dc159b6 Merge branch 'upstream' into concedo_experimental Concedo 2026-05-21 23:47:21 +08:00
  • 54af9aada9 Merge commit 'e6b4acfe86' into concedo_experimental Concedo 2026-05-21 23:31:32 +08:00
  • 5306f4b3b5
    fix(flash-attn): replace f32 with kv_type and q_type (#23372) Chen Yuan 2026-05-21 10:58:49 -04:00
  • 2451feaf69 an easy way to toggle thinking for jinja Concedo 2026-05-21 22:45:33 +08:00
  • 40d5358d3c
    tests : move save-load-state from examples to tests (#23336) Georgi Gerganov 2026-05-21 14:41:50 +03:00
  • b65bb4baae
    server: expose prompt token counts in /slots endpoint (#23454) ScrewTSW 2026-05-21 13:29:13 +02:00
  • a1a69f777a
    metal : optimize concat kernel and fix set kernel threads (#23411) Georgi Gerganov 2026-05-21 13:34:08 +03:00
  • e8bf5b9c6c fixed a potential vuln with onready when combined with admin Concedo 2026-05-21 16:11:28 +08:00
  • 52fb93a2bd
    server : free draft/MTP resources on sleep to fix VRAM leak (#23461) Aman Gupta 2026-05-21 16:11:11 +08:00
  • c9021714e8
    server: re-inject subcommand when router spawns children under unified binary (#23442) Pascal 2026-05-21 10:09:19 +02:00
  • 1d7ab2b947
    app : add batched-bench, fit-params, quantize & perplexity (#23459) Adrien Gallouët 2026-05-21 09:29:44 +02:00
  • 12e5d99078
    mtp: use inp_out_ids for skipping logit computation (#23433) Aman Gupta 2026-05-21 15:23:14 +08:00
  • 7ea23ddf7b
    vocab : add Carbon-3B (HybridDNATokenizer) support (#23410) Kashif Rasul 2026-05-21 08:34:32 +02:00
  • 2fc8d1851e
    doc: fix spec mtp typo (#23435) Ruixiang Wang 2026-05-21 08:30:55 +02:00
  • 5e932a1c8d
    ui: Improve Git Hooks for UI development (#23403) Aleksander Grygier 2026-05-21 08:27:50 +02:00
  • 2754ce1b3e
    ggml : Check the right iface method before using the fallback 2d get (#23306) Matt Corallo 2026-05-21 06:24:40 +00:00
  • eeeaf6180b
    llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131) Daniel Elliott 2026-05-20 23:20:51 -07:00
  • 0be84685bd
    hexagon: ssm-conv fix for large prompts (#23307) Todor Boinovski 2026-05-20 22:14:13 -07:00
  • ce02093fdd
    app : show version (#23426) Adrien Gallouët 2026-05-21 06:21:13 +02:00
  • 471989a272
    Merge a2b8dd1614 into f85a747dc0 Wagner Bruna 2026-05-21 11:52:04 +08:00
  • f85a747dc0
    sd: add backend support for max_vram (#2221) Wagner Bruna 2026-05-21 00:51:00 -03:00
  • 6a257d4463
    mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329) wendadawen 2026-05-21 06:35:37 +08:00
  • 3a479c9132
    ui: Add max image size option (#22849) stduhpf 2026-05-21 00:00:09 +02:00
  • ad27757261
    Move to backend sampling for MTP draft path (#23287) Gaurav Garg 2026-05-20 22:34:45 +05:30
  • 3a6db741a8
    opencl: refactor backend initilization (#23318) lhez 2026-05-20 09:57:36 -07:00
  • 510b5c2a35
    common/speculative : fix nullptr crash in get_devices_str (#23386) Georgi Gerganov 2026-05-20 19:44:30 +03:00
  • a8681a0ed2
    mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345) Saba Fallah 2026-05-20 17:37:10 +02:00
  • 095bf63b58 prep for rpc Concedo 2026-05-20 23:29:49 +08:00
  • acd604fb27
    vulkan: optimize operations in the IM2COL shader (#22685) Daniele 2026-05-20 17:15:13 +02:00
  • 6ce96713de
    feat: Add WAV MIME type variants and improve audio format detection (#23396) Aleksander Grygier 2026-05-20 16:55:24 +02:00
  • c9872a2575
    hexagon: HMX quantized matmul rework (#23368) Max Krasnyansky 2026-05-20 07:39:01 -07:00
  • e947228222
    Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) (#22522) Andreas Kieslinger 2026-05-20 13:59:02 +02:00
  • 29f1482221
    app : introduce the llama unified executable (#23296) Adrien Gallouët 2026-05-20 13:22:22 +02:00
  • e6b4acfe86
    refactor: Move text attachments up before the message content in chat completions payload (#23406) Aleksander Grygier 2026-05-20 13:04:01 +02:00
  • 7d987af23a Merge branch 'upstream' into concedo_experimental Concedo 2026-05-20 18:48:34 +08:00
  • d5ac109874 sd: add backend support for max_vram Wagner Bruna 2026-05-20 07:24:32 -03:00
  • e2b129e1bf
    mtmd: fit_params now take into account mmproj (#21489) Xuan-Son Nguyen 2026-05-20 11:27:44 +02:00
  • 7e50ef7d79
    docker : copy conversion files (#23370) Sigbjørn Skjæret 2026-05-20 11:03:18 +02:00
  • 5028447384
    ui: Refactor isMobile as reactive value in viewport store (#23330) Aleksander Grygier 2026-05-20 10:52:00 +02:00
  • 585080d310
    fix: Div wrapper no pointer events on hidden (#23390) Aleksander Grygier 2026-05-20 09:46:31 +02:00
  • 643862ac7d
    sd: device selection fixes and improvements (#2220) Wagner Bruna 2026-05-20 04:31:19 -03:00
  • b909b32135 remove off by 1 offset for sd Concedo 2026-05-20 15:23:56 +08:00
  • 57ebaf4edd
    metal : optimize pad + cpy (#23354) Georgi Gerganov 2026-05-20 09:42:00 +03:00
  • 871b0b70f8
    snapdragon: update toolchain to v0.6 (#23369) Max Krasnyansky 2026-05-19 22:04:04 -07:00
  • b39a7bf1b0
    ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (#23349) ravel7524 2026-05-20 03:52:21 +02:00
  • b28a2f372a
    opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303) shaofeiqi 2026-05-19 14:29:00 -07:00
  • 17d22a35b2
    hexagon: add MROPE and IMROPE support in HTP rope op (#23317) Aparna M P 2026-05-20 02:40:13 +05:30
  • 773f9c5ba7 sd: change sdmaingpu control to match other sd devices Wagner Bruna 2026-05-19 17:52:07 -03:00
  • 67ace021da
    refactor: Chat Screen UI rendering (#23333) Aleksander Grygier 2026-05-19 22:38:42 +02:00
  • a8078675a6
    github: mention --log-file in issue templates (#23277) Johannes Gäßler 2026-05-19 21:35:10 +02:00
  • 57cb35c886
    common: fix --help for --verbosity (#23278) Johannes Gäßler 2026-05-19 21:34:04 +02:00
  • 7256fce047
    common: fix --fit verbosity with --verbosity 4 (#23282) Johannes Gäßler 2026-05-19 21:33:23 +02:00
  • b7393a4d19
    convert : update mtp related help (#23334) Sigbjørn Skjæret 2026-05-19 21:16:58 +02:00
  • da1c06c33d sd: cleanup and document sd device mappings Wagner Bruna 2026-05-19 13:32:25 -03:00
  • cf3d374f14 sd: remove device offset for zero-based indexes Wagner Bruna 2026-05-19 13:31:32 -03:00
  • ac76808e4d
    hexagon: enable support for NORM op (#23319) Aparna M P 2026-05-19 22:18:21 +05:30
  • baf3cc6e1d
    model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338) Daniel Bevenius 2026-05-19 18:41:44 +02:00
  • c95d25abe2 gui visual breaking change - make GPU ID list 0-based index instead of 1-based index to match CLI Concedo 2026-05-19 22:00:43 +08:00
  • 592d12d0a3
    sd: support for CLIP and VAE on different devices (#2184) Wagner Bruna 2026-05-19 10:51:23 -03:00
  • c7b3a07034 add deprecated flags to avoid breaking old cli args Concedo 2026-05-19 21:46:58 +08:00
  • 7232096c11 Merge branch 'concedo' into concedo_experimental Concedo 2026-05-19 20:35:11 +08:00
  • 8bdef0bcfe hotfix 1.113.2 bugs [concedo] [v1.113.2] Concedo 2026-05-19 20:33:57 +08:00
  • d14ce3dab4
    llama : MTP clean-up (#23269) Georgi Gerganov 2026-05-19 15:32:58 +03:00
  • 6db130445d
    ui: Bump packages + address build warnings (#23300) Aleksander Grygier 2026-05-19 10:16:04 +02:00
  • 4b262ab662
    ci : install libssl-dev (#23325) Sigbjørn Skjæret 2026-05-19 10:11:04 +02:00
  • 00c461ce1a
    ci : install server kleidiai runner dependencies (#23259) Sigbjørn Skjæret 2026-05-19 09:06:56 +02:00
  • ccee426426
    server-context: guarantee there is at least 1 token to decode (#23280) Pascal 2026-05-19 08:49:01 +02:00
  • 3c81c8deea
    server : print graphs reused in slot timings (#23279) Georgi Gerganov 2026-05-19 09:46:58 +03:00
  • cd963fee6a
    save-load-state : refactor tests and improve readability (#23196) Georgi Gerganov 2026-05-19 09:46:34 +03:00
  • d2e179a477
    llama-eval : add per-task summary stats (#23151) Georgi Gerganov 2026-05-19 09:46:05 +03:00
  • c85a242ed0
    ggml-webgpu : extend GDN for K>1 (#23299) Reese Levine 2026-05-18 23:45:41 -07:00
  • aabee047d8
    [SCYL] add chapter for performance reference in SYCL.md (#23315) Neo Zhang 2026-05-19 14:44:51 +08:00
  • f1c1c5c057
    convert : filter lora tensor names (#23077) Sigbjørn Skjæret 2026-05-19 08:44:25 +02:00
  • 439f1b193d
    sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (#22153) Intel AI Get-to Market Customer Success and Solutions 2026-05-18 23:44:02 -07:00
  • c3e9ade6dd
    rpc : keep last_graph_uid in the device context (#23273) Radoslav Gerganov 2026-05-19 09:42:36 +03:00
  • 9a532ae4ba
    hexagon: add support for TRI op (#22822) Pranav Dhinakar 2026-05-18 14:04:57 -07:00
  • b7340443d4
    ggml-hexagon: add PAD op HVX kernel (#23078) Pranav Dhinakar 2026-05-18 13:39:36 -07:00
  • 5cbaa5e69e
    docker : add OCI image labels for version and build date (#21653) SamareshSingh 2026-05-18 15:14:45 -05:00
  • 45b455e66f
    common : remove hf cache migration (#23266) Adrien Gallouët 2026-05-18 17:11:47 +02:00
  • 712ee6be64 try fix recent segfault on SIGINT https://github.com/LostRuins/koboldcpp/issues/2215 Concedo 2026-05-18 22:37:14 +08:00
  • 3a9c1b854d
    ui: Update KaTeX package and clean up logs from sass warnings (#23275) Aleksander Grygier 2026-05-18 16:26:01 +02:00
  • 7e08e8d8b4 add some rpc dependencies (+1 squashed commits) Concedo 2026-05-18 22:03:50 +08:00
  • b9a2170fce
    feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270) Aleksander Grygier 2026-05-18 16:17:21 +02:00
  • 1ff0fc1384
    ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236) Aleksander Grygier 2026-05-18 16:09:40 +02:00
  • a2b8dd1614 sd: adjust VAE tile size according to sdtiledvae Wagner Bruna 2026-05-14 21:01:23 -03:00
  • 3b6678afc9 sd: frontend support for multi-device selection Wagner Bruna 2026-05-16 18:10:24 -03:00
  • 4c2f145392 sd: backend support for multi-device selection Wagner Bruna 2026-05-16 17:53:28 -03:00
  • 8ab6d5db31 sd: generalize internal interfaces to place generation on CPU Wagner Bruna 2026-05-03 08:16:12 -03:00