wang jiahao
a1162eea01
Merge pull request #1158 from Creeper-MZ/function_call
...
Book-CI / test (push) Failing after 9s
Deploy / deploy (ubuntu-latest) (push) Failing after 2s
Deploy / deploy (windows-latest) (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Update Function call
2025-04-19 16:31:37 +08:00
Creeper-MZ
133ba746e9
Optimize the prompt and fix some DeepSeek R1 compatibility issues
...
Optimize the prompt and fix some DeepSeek R1 compatibility issues
fix non stream
2025-04-19 01:20:27 -04:00
wang jiahao
0892d37d2d
Merge pull request #1172 from kvcache-ai/move_create_sched
...
Book-CI / test (push) Failing after 4s
Deploy / deploy (ubuntu-latest) (push) Failing after 2s
Deploy / deploy (windows-latest) (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Move KV cache creation to balance_serve
2025-04-18 18:19:29 +08:00
qiyuxinlin
38e841900d
Move KV cache creation to balance_serve
2025-04-18 10:10:07 +00:00
Atream
e44c45e782
Merge pull request #1163 from cyhasuka/main
...
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
2025-04-18 00:50:58 -06:00
Atream
e6fb4d5a58
remove hard code max_length
2025-04-18 12:11:18 +08:00
Creeper-MZ
62c4023160
Fixed #1155
2025-04-17 10:21:51 -04:00
Creeper-MZ
4fb19bfcae
Update chat.py
2025-04-17 09:19:14 -04:00
Yuhao Tsui
8ce34b3b5c
Modify the performance calculation module
...
Change the performance data calculation from estimating values to reading them from `raw_usage`.
2025-04-17 16:57:53 +08:00
wang jiahao
6e4da83d4b
Merge pull request #978 from cyhasuka/main
...
Deploy / deploy (ubuntu-latest) (push) Failing after 3s
Book-CI / test (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
Feat: Support Non-streaming chat in Ollama backend
2025-04-17 14:34:35 +08:00
wang jiahao
b055132369
Merge pull request #1154 from 344303947/features/add-function-calling
...
Fix the error raised when the client does not pass temperature and top_p, leaving them empty
2025-04-17 14:31:02 +08:00
Creeper-MZ
cb266c98d4
Fix a bug
2025-04-16 23:31:33 -04:00
Creeper-MZ
6bc2e85343
Update chat.py
2025-04-16 15:54:23 -04:00
Creeper-MZ
88f688e2c8
Change the token injection logic and reduce the number of injected tokens to prevent forgetting
...
Update chat.py
Update chat.py
Update chat.py
2025-04-16 15:52:24 -04:00
root
921061666c
fix some bugs
2025-04-17 00:48:09 +08:00
kevin
c8db24d5eb
Update config.py
...
Update config.py
2025-04-16 17:32:08 +08:00
sean.su
8699109129
Refactor the chat interface to support tool calling and parameter processing
...
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support the passing of tool parameters.
Modified the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support the transmission of tool call information.
These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
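The sampling-parameter handling mentioned in commit 8699109129 (default values and edge cases) could look roughly like the sketch below. This is an illustration only, not the code in balance_serve.py: the function name `resolve_sampling_params`, the `default_temperature` value, and the `eps` floor are all hypothetical. Only the top_p default of 1.0 comes from the commit message.

```python
def resolve_sampling_params(temperature=None, top_p=None, *,
                            default_temperature=0.6,  # hypothetical default
                            default_top_p=1.0,        # default named in the commit
                            eps=1e-4):                # hypothetical lower bound
    """Return a (temperature, top_p) pair that is safe to hand to a sampler.

    Missing values fall back to defaults, and zero or negative values are
    raised to a small positive floor so the sampler never divides by zero.
    """
    t = default_temperature if temperature is None else float(temperature)
    p = default_top_p if top_p is None else float(top_p)
    t = max(t, eps)             # temperature must stay strictly positive
    p = min(max(p, eps), 1.0)   # top_p is kept inside (0, 1]
    return t, p


if __name__ == "__main__":
    print(resolve_sampling_params())               # client sent neither value
    print(resolve_sampling_params(temperature=0))  # greedy-style request
```

Clamping to a small floor is one possible way to treat temperature=0 or top_p=0; switching to greedy decoding for those requests would be an equally reasonable choice.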
Creeper-MZ
a7e8d7c1af
update function_call
2025-04-13 23:48:51 -04:00
wangkuigang-yewu-cmss
4538bdae97
prevent rpc process from crashing on long prompt
...
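When the prompt exceeds cache_len, the rpc process crashes and the whole service becomes unavailable.
This adds a check so that overly long prompts are filtered out early in the request.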
2025-04-13 16:13:16 +08:00
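The early check described in commit 4538bdae97 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `PromptTooLongError` and `check_prompt_length` are hypothetical names, and the only thing taken from the commit message is the idea of comparing the prompt length against `cache_len` before the request reaches the rpc process.

```python
class PromptTooLongError(ValueError):
    """Raised when a prompt cannot fit into the KV cache (hypothetical type)."""


def check_prompt_length(prompt_token_ids, cache_len):
    """Reject a prompt whose token count exceeds cache_len.

    Returns the token count when the prompt fits, so the caller can reuse it.
    """
    n_tokens = len(prompt_token_ids)
    if n_tokens > cache_len:
        raise PromptTooLongError(
            f"prompt has {n_tokens} tokens, but cache_len is {cache_len}"
        )
    return n_tokens


if __name__ == "__main__":
    print(check_prompt_length([1, 2, 3], cache_len=8))
    try:
        check_prompt_length(list(range(10)), cache_len=8)
    except PromptTooLongError as exc:
        print("rejected:", exc)
```

Raising in the request handler lets the server return an error to the client while the worker process keeps serving other requests.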
Yuhao Tsui
877aec858e
Merge branch 'kvcache-ai:main' into main
2025-04-09 11:46:39 +08:00
Atream
3b9e16cec7
Update attention.py
2025-04-09 10:54:00 +08:00
qiyuxinlin
64de784328
format kvc2, delete quant_configs, move model_configs to ~/.ktransformers
2025-04-08 10:06:07 +00:00
Azure
77c6cc82ac
Merge pull request #1063 from aubreyli/KLinearCPUInfer.forward-fix
...
Fix TypeError when invoking KLinearCPUInfer.forward()
2025-04-07 15:10:46 +08:00
dongjw
ec03bcbd7f
fix temperature=0, flashinfer sample error
2025-04-07 12:30:47 +08:00
Aubrey Li
12a4c631df
Fix TypeError when invoking KLinearCPUInfer.forward()
...
Fix the following error:
  File "/home/aubrey/work/ktransformers/ktransformers/operators/linear.py", line 825, in forward
    y = self.generate_linear.forward(x, bsz_tensor)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: KLinearCPUInfer.forward() takes 2 positional arguments but 3 were given
2025-04-07 12:03:35 +08:00
ZiWei Yuan
a5608dcb80
🔖 release v0.2.4post1
2025-04-04 16:01:25 +08:00
dongjw
be84d04253
Fix bug with non-base-multiple chunk_size, update test examples, and resolve issue with writing model_config. Hugging Face URL input is still unsupported.
2025-04-04 15:41:07 +08:00
liam
b151a98cab
🔧 update config.yaml setting default config
2025-04-03 11:55:50 +00:00
Atream
e36ddc36a8
Update modeling_deepseek_v3.py
2025-04-03 17:13:06 +08:00
Qin's repo
2c3a3a1e1c
solve [Bug] #1023
...
Only modified the mixed single and double quotes in server/config/config.py
2025-04-03 14:37:32 +08:00
dongjw
1b7672937b
update install doc and fix local_chat bug
2025-04-03 12:42:41 +08:00
dongjw
56a18ad02c
change tag v0.2.4
2025-04-01 21:07:13 +08:00
dongjw
5c7ed7b579
fix top_p = 0 bug
2025-04-01 20:38:33 +08:00
Azure-Tang
31677181c3
Fix ktransformers-server flashinfer wrapper position arg issue;
...
Fix db position issue
2025-04-01 07:30:23 +00:00
Azure-Tang
203b853c75
rm KMoEGateDeepSeekV3, fall back to KMoEGate
2025-04-01 07:13:05 +00:00
Azure-Tang
3a5330b215
Merge branch 'main' into work-concurrent
2025-04-01 06:48:19 +00:00
Atream
25cee5810e
add balance-serve, support concurrency
2025-03-31 22:55:32 +08:00
Atream
8d0292aa44
refactor folders
2025-03-31 22:45:37 +08:00
Yuhao Tsui
84164f584c
Update completions.py
2025-03-26 15:39:46 +08:00
Yuhao Tsui
52fa671c10
Merge branch 'kvcache-ai:main' into main
2025-03-26 11:06:00 +08:00
Aubrey Li
f4d52d1f0c
Restore CPU offloading capability
2025-03-21 10:04:31 +08:00
Jiaqi Liao
05f6cede37
Merge pull request #943 from SkqLiao/main
...
fix benchmark params for human eval benchmark
2025-03-20 18:49:34 +08:00
SkqLiao
6d4626a5d9
fix params
2025-03-20 18:48:51 +08:00
Atream
633af5d235
Update gate.py
2025-03-20 14:54:01 +08:00
SkqLiao
8cc4df980e
use DeepSeek V3 instead of R1 for benchmarking
2025-03-20 11:59:03 +08:00
Jiaqi Liao
32a91c78c1
Merge pull request #935 from SkqLiao/main
...
Fix benchmarking slow issue on self-hosted actions
2025-03-20 10:14:37 +08:00
SkqLiao
19c824f9d0
change cpu-infer to match the actual number of CPU cores on the self-hosted server.
2025-03-20 10:10:52 +08:00
Jiaqi Liao
649489dc67
Merge pull request #931 from SkqLiao/main
...
Add Human Eval Benchmark Test for CI/CD
2025-03-19 21:35:24 +08:00
SkqLiao
bc369b256c
add CI/CD for human eval score benchmarking
2025-03-19 21:25:21 +08:00
Atream
b453333f60
Update gate.py
2025-03-19 16:14:54 +08:00