vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-10 23:34:35 +00:00

Author	SHA1	Message	Date
wang jiahao	a1162eea01	Merge pull request #1158 from Creeper-MZ/function_call Update Function call	2025-04-19 16:31:37 +08:00
Creeper-MZ	133ba746e9	Optimize prompts to fix some DeepSeek R1 compatibility issues; fix non-streaming	2025-04-19 01:20:27 -04:00
wang jiahao	0892d37d2d	Merge pull request #1172 from kvcache-ai/move_create_sched Move KV cache creation to balance_serve	2025-04-18 18:19:29 +08:00
qiyuxinlin	38e841900d	Move KV cache creation to balance_serve	2025-04-18 10:10:07 +00:00
Atream	e44c45e782	Merge pull request #1163 from cyhasuka/main Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation	2025-04-18 00:50:58 -06:00
Atream	e6fb4d5a58	remove hard code max_length	2025-04-18 12:11:18 +08:00
Creeper-MZ	62c4023160	Fixed #1155	2025-04-17 10:21:51 -04:00
Creeper-MZ	4fb19bfcae	Update chat.py	2025-04-17 09:19:14 -04:00
Yuhao Tsui	8ce34b3b5c	Modify the performance calculation module: retrieve performance data from `raw_usage` instead of estimating it.	2025-04-17 16:57:53 +08:00
wang jiahao	6e4da83d4b	Merge pull request #978 from cyhasuka/main Feat: Support Non-streaming chat in Ollama backend	2025-04-17 14:34:35 +08:00
wang jiahao	b055132369	Merge pull request #1154 from 344303947/features/add-function-calling Fix the error raised when the client does not pass temperature and top_p, leaving them empty	2025-04-17 14:31:02 +08:00
Creeper-MZ	cb266c98d4	Fix a bug	2025-04-16 23:31:33 -04:00
Creeper-MZ	6bc2e85343	Update chat.py	2025-04-16 15:54:23 -04:00
Creeper-MZ	88f688e2c8	Change the token injection logic to inject fewer tokens and prevent forgetting; Update chat.py	2025-04-16 15:52:24 -04:00
root	921061666c	fix some bugs	2025-04-17 00:48:09 +08:00
kevin	c8db24d5eb	Update config.py	2025-04-16 17:32:08 +08:00
sean.su	8699109129	Refactor the chat interface to support tool calling and parameter processing. Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling. Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations. Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases. Updated ktransformers.py and transformers.py to support the passing of tool parameters. Modified the default value of top_p in config.py to 1.0 to increase generation diversity. Extended the message model in chat.py to support the transmission of tool call information. These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.	2025-04-14 15:23:37 +08:00
Creeper-MZ	a7e8d7c1af	Update function_call	2025-04-13 23:48:51 -04:00
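wangkuigang-yewu-cmss	4538bdae97	Prevent RPC process from crashing on long prompts. When a prompt exceeds cache_len, the RPC process crashes and the whole service becomes unavailable. This adds a check so that overly long prompts are filtered out early in the request.	2025-04-13 16:13:16 +08:00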
Yuhao Tsui	877aec858e	Merge branch 'kvcache-ai:main' into main	2025-04-09 11:46:39 +08:00
Atream	3b9e16cec7	Update attention.py	2025-04-09 10:54:00 +08:00
qiyuxinlin	64de784328	format kvc2, delete quant_configs, move model_configs to ~/.ktransformers	2025-04-08 10:06:07 +00:00
Azure	77c6cc82ac	Merge pull request #1063 from aubreyli/KLinearCPUInfer.forward-fix Fix TypeError when invoking KLinearCPUInfer.forward()	2025-04-07 15:10:46 +08:00
dongjw	ec03bcbd7f	Fix flashinfer sampling error when temperature=0	2025-04-07 12:30:47 +08:00
Aubrey Li	12a4c631df	Fix TypeError when invoking KLinearCPUInfer.forward(). Fix the following error: File "/home/aubrey/work/ktransformers/ktransformers/operators/linear.py", line 825, in forward y = self.generate_linear.forward(x, bsz_tensor) TypeError: KLinearCPUInfer.forward() takes 2 positional arguments but 3 were given	2025-04-07 12:03:35 +08:00
ZiWei Yuan	a5608dcb80	🔖 release v0.2.4post1	2025-04-04 16:01:25 +08:00
dongjw	be84d04253	Fix bug with non-base-multiple chunk_size, update test examples, and resolve issue with writing model_config. Hugging Face URL input is still unsupported.	2025-04-04 15:41:07 +08:00
liam	b151a98cab	🔧 update config.yaml: set default config	2025-04-03 11:55:50 +00:00
Atream	e36ddc36a8	Update modeling_deepseek_v3.py	2025-04-03 17:13:06 +08:00
Qin's repo	2c3a3a1e1c	Solve [Bug] #1023: only fixed the mixed single and double quotes in server/config/config.py	2025-04-03 14:37:32 +08:00
dongjw	1b7672937b	update install doc and fix local_chat bug	2025-04-03 12:42:41 +08:00
dongjw	56a18ad02c	change tag v0.2.4	2025-04-01 21:07:13 +08:00
dongjw	5c7ed7b579	fix top_p = 0 bug	2025-04-01 20:38:33 +08:00
Azure-Tang	31677181c3	Fix ktransformers-server flashinfer wrapper position arg issue; Fix db position issue	2025-04-01 07:30:23 +00:00
Azure-Tang	203b853c75	rm KMoEGateDeepSeekV3, fall back to KMoEGate	2025-04-01 07:13:05 +00:00
Azure-Tang	3a5330b215	Merge branch 'main' into work-concurrent	2025-04-01 06:48:19 +00:00
Atream	25cee5810e	add balance-serve, support concurrency	2025-03-31 22:55:32 +08:00
Atream	8d0292aa44	refactor folders	2025-03-31 22:45:37 +08:00
Yuhao Tsui	84164f584c	Update completions.py	2025-03-26 15:39:46 +08:00
Yuhao Tsui	52fa671c10	Merge branch 'kvcache-ai:main' into main	2025-03-26 11:06:00 +08:00
Aubrey Li	f4d52d1f0c	Restore CPU offloading capability	2025-03-21 10:04:31 +08:00
Jiaqi Liao	05f6cede37	Merge pull request #943 from SkqLiao/main fix benchmark params for human eval benchmark	2025-03-20 18:49:34 +08:00
SkqLiao	6d4626a5d9	fix params	2025-03-20 18:48:51 +08:00
Atream	633af5d235	Update gate.py	2025-03-20 14:54:01 +08:00
SkqLiao	8cc4df980e	use DeepSeek V3 instead of R1 for benchmarking	2025-03-20 11:59:03 +08:00
Jiaqi Liao	32a91c78c1	Merge pull request #935 from SkqLiao/main Fix benchmarking slow issue on self-hosted actions	2025-03-20 10:14:37 +08:00
SkqLiao	19c824f9d0	Change cpu-infer to match the actual number of CPU cores on the self-hosted server.	2025-03-20 10:10:52 +08:00
Jiaqi Liao	649489dc67	Merge pull request #931 from SkqLiao/main Add Human Eval Benchmark Test for CI/CD	2025-03-19 21:35:24 +08:00
SkqLiao	bc369b256c	add CI/CD for human eval score benchmarking	2025-03-19 21:25:21 +08:00
Atream	b453333f60	Update gate.py	2025-03-19 16:14:54 +08:00

323 commits