ggml : add ggml_set_rows #14274
Conversation
So far so good: #14285
```cpp
switch (src0->type) {
    case GGML_TYPE_F32:
        {
            ggml_compute_forward_set_rows_f32(params, dst);
        } break;
```
We should aim to reuse the existing `cpy`/`dup` code in order to support F32 -> any type, not just F16.
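A possible shape for that reuse on the CPU side, assuming the `from_float` callback from ggml's CPU type traits can be applied here (illustrative sketch, not the actual patch):

```cpp
#include "ggml-cpu.h"

// Hypothetical sketch: convert one F32 source row into a destination row of
// any type by reusing ggml's existing conversion callbacks, instead of
// hard-coding an F32 -> F16 path.
static void set_rows_row_generic(
        const float  * src_row,   // one row of 'b' (always F32)
        void         * dst_row,   // the destination row inside 'a'
        int64_t        ne0,       // number of elements per row
        enum ggml_type dst_type) {
    const ggml_from_float_t from_float = ggml_get_type_traits_cpu(dst_type)->from_float;

    // converts/quantizes the whole row at once; for quantized destination
    // types, ne0 must be a multiple of the block size
    from_float(src_row, dst_row, ne0);
}
```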
Do we need support for quantized data types, or just F16, F32, BF16?
We should support all possible KV cache quantization types (lines 819 to 830 in 28ee6d2):
```cpp
const std::vector<ggml_type> kv_cache_types = {
    GGML_TYPE_F32,
    GGML_TYPE_F16,
    GGML_TYPE_BF16,
    GGML_TYPE_Q8_0,
    GGML_TYPE_Q4_0,
    GGML_TYPE_Q4_1,
    GGML_TYPE_IQ4_NL,
    GGML_TYPE_Q5_0,
    GGML_TYPE_Q5_1,
};
```
Ideally, it should work for all ggml types that also work with `ggml_cpy`.
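For example, coverage could be checked with a loop over that same table (sketch only; `test_set_rows` is a hypothetical helper, not part of this PR, that builds and runs a small graph for one destination type):

```cpp
// Sketch: exercise the new op for every KV cache type.
for (ggml_type type : kv_cache_types) {
    if (!test_set_rows(type)) {
        fprintf(stderr, "ggml_set_rows failed for type %s\n", ggml_type_name(type));
    }
}
```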
```cpp
struct ggml_tensor * ggml_set_rows(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b,
        struct ggml_tensor  * c) {
    GGML_ASSERT(b->ne[2] == c->ne[1]);
    GGML_ASSERT(c->ne[3] == 1);
    GGML_ASSERT(a->type == GGML_TYPE_F16);
    GGML_ASSERT(b->type == GGML_TYPE_F32);
    GGML_ASSERT(c->type == GGML_TYPE_I64);
```
We might want to allow broadcasting `c` into `b`. It would avoid this `ggml_repeat_4d` here (llama.cpp/src/llama-kv-cache-unified.cpp, lines 795 to 799 in a0c0fb6):
```cpp
v_cur = ggml_cont_3d(ctx, v_cur, 1, v_cur->ne[0], v_cur->ne[1]);
kv_idxs = ggml_repeat_4d(ctx, kv_idxs, v_cur->ne[1], v_cur->ne[2], 1, 1);
return ggml_set_rows(ctx, v_view, v_cur, kv_idxs);
```
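For reference, the broadcast could be expressed by relaxing the shape asserts roughly like this (a sketch of the idea only, mirroring how other ggml ops check broadcastability; the exact rule is an assumption):

```cpp
// Hypothetical relaxed checks: let a smaller 'c' be repeated across 'b'
// instead of requiring an explicit ggml_repeat_4d on the caller side.
GGML_ASSERT(c->ne[3] == 1);
GGML_ASSERT(b->ne[2] % c->ne[1] == 0); // c's rows broadcast over b's dim 2
GGML_ASSERT(b->ne[3] % c->ne[2] == 0); // and over b's dim 3
```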
Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using indices from 'c'.

ref: ggml-org#8366
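A minimal usage sketch of the op as asserted above (F16 destination, F32 source, I64 indices); this only builds the graph node, and the sizes are arbitrary:

```cpp
#include "ggml.h"

// Build (but do not compute) a node that writes 8 F32 rows into an F16
// destination at the row positions given by 'c'.
struct ggml_init_params ip = {
    /*.mem_size   =*/ 16*1024*1024,
    /*.mem_buffer =*/ NULL,
    /*.no_alloc   =*/ false,
};
struct ggml_context * ctx = ggml_init(ip);

struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 64, 128); // dst: 128 rows of 64
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64,   8); // src:   8 rows of 64
struct ggml_tensor * c = ggml_new_tensor_1d(ctx, GGML_TYPE_I64,  8);      // dst row index per src row

// after filling b and c, each row b[i] is copied into a at row c[i]
struct ggml_tensor * out = ggml_set_rows(ctx, a, b, c);
```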