Modified RoPE with linear scaling

When the context size is greater than the maximum context size used during training, scale the position given to RoPE by training context / n_ctx.
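The scaling described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the ggml side of the change is not shown in this excerpt, and `rope_linear_scale`, `rope_pair`, and `n_ctx_train` are hypothetical names chosen for the example. It assumes the usual RoPE frequency base of 10000 and that positions are left unchanged when n_ctx fits within the training context.

```cpp
#include <cmath>

// Hypothetical helper (not from the commit): factor applied to token positions
// before RoPE. n_ctx_train is an assumed name for the maximum context size
// used during training.
static float rope_linear_scale(int n_ctx, int n_ctx_train) {
    if (n_ctx <= n_ctx_train) {
        return 1.0f; // within the trained range: positions stay as they are
    }
    return (float) n_ctx_train / (float) n_ctx;
}

// Hypothetical helper: rotate one (x0, x1) pair at dimension-pair index i
// for token position pos, using the scaled position pos * scale.
static void rope_pair(float & x0, float & x1, int pos, int i, int n_rot, float scale) {
    const float p     = (float) pos * scale; // linearly scaled position
    const float theta = p * std::pow(10000.0f, -2.0f * (float) i / (float) n_rot);
    const float c  = std::cos(theta);
    const float s  = std::sin(theta);
    const float y0 = x0 * c - x1 * s;
    const float y1 = x0 * s + x1 * c;
    x0 = y0;
    x1 = y1;
}
```

With a training context of 2048 and n_ctx = 4096, the scale is 0.5, so a token at position 4 is rotated as if it were at position 2, which keeps all rotation angles inside the range seen during training.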
2025-09-10 17:14:36 +00:00 · 2023-06-27 15:00:22 +03:00 · cda30038e4
commit cda30038e4
parent 0be54f75a6
6 changed files with 34 additions and 3 deletions
--- a/llama.cpp
+++ b/llama.cpp
@@ -1491,11 +1491,11 @@ static bool llama_eval_internal(
            offload_func_kq(tmpq);
            ggml_set_name(tmpq, "tmpq");

-            struct ggml_tensor * Kcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
+            struct ggml_tensor * Kcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd/n_head, n_head, N), n_past, n_rot, 0, n_ctx);
            offload_func_kq(Kcur);
            ggml_set_name(Kcur, "Kcur");

-            struct ggml_tensor * Qcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, 0);
+            struct ggml_tensor * Qcur = ggml_rope_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd/n_head, n_head, N), n_past, n_rot, 0, n_ctx);
            offload_func_kq(Qcur);
            ggml_set_name(Qcur, "Qcur");