* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remote accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The model is called "dots.llm1" (I decided to shorten it to dots1 or
DOTS1 in the code generally) architecture.
The only models that exist as of writing of this commit that follow this
architecture are "dots.llm1.inst" and "dots.llm1.base" from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
Currently when a model generates output which looks like a tool call,
but is invalid an exception is thrown and not handled, causing the cli
or llama-server to bail. Instead, handle the chat parser exception and
simply return the generated text in such cases.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
Squashed commits:
[fe14845cc] Allow override config for gguf files when reloading in admin mode, updated lite (+2 squashed commit)
Squashed commit:
[9ded66aa5] Allow override config for gguf files when reloading in admin mode
[9597f6a34] update lite
there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually
arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC
i should be relaxing its the weekend
* refactor image gen configuration screen
* make image size limit configurable
* fix resolution limits and keep dimensions closer to the original ratio
* use 0.0 for the configured default image size limit
This prevents the current default value from being saved into the
config files, in case we later decide to adopt a different value.
* export image model version when loading
* restore model-specific default image size limit
* change the image area restriction to be specified by a square side
* move image resolution limits down to the C++ level
* Revert "export image model version when loading"
This reverts commit fa65b23de3.
* Linting Fixes:
PY:
- Inconsistent var name sd_restrict_square -> sd_restrict_square_var
- GUI swap back to using absolute row numbers for now.
- fstring fix
- size_limit -> side_limit inconsistency
C++:
- roundup_64 standalone function
- refactor sd_fix_resolution variable names for clarity
- move "anti crashing" hard total megapixel limit always to be applied after soft total megapixel limit instead of conditionally only when sd_restrict_square is unset
* allow unsafe resolutions if debugmode is on
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>