From 74d4979aa57b0af1e3a4c6e0ecf7fbe8e5ec79b5 Mon Sep 17 00:00:00 2001
From: phantomfjh
Date: Mon, 14 Jul 2025 15:21:10 +0800
Subject: [PATCH] Fix inaccurate description of RoPE paper findings

- Corrected the description to clarify that the paper shows removing (not
  rotating) lowest frequencies improves performance
- The original paper (arXiv:2410.06205) proposes p-RoPE which removes lowest
  frequencies, not rotates them
- This addresses the misleading summary that suggested rotation of low
  frequencies was beneficial
---
 designing-positional-encoding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/designing-positional-encoding.md b/designing-positional-encoding.md
index 72b939e6e6..2ee9c9c71d 100644
--- a/designing-positional-encoding.md
+++ b/designing-positional-encoding.md
@@ -406,7 +406,7 @@ of values from our input vector. For \\(2D\\) data, we need to encode both horiz
 
 ## The future of positional encoding
 
-Is RoPE the final incarnation of positional encoding? This [recent paper](https://arxiv.org/pdf/2410.06205) from DeepMind deeply analyses RoPE and highlights some fundamental problems. TLDR: RoPE isn't a perfect solution, and the models mostly focus on the lower frequencies and the rotation for a certain percent of low frequencies improves performance on Gemma 2B!
+Is RoPE the final incarnation of positional encoding? This [recent paper](https://arxiv.org/pdf/2410.06205) from DeepMind deeply analyses RoPE and highlights some fundamental problems. TLDR: RoPE isn't a perfect solution, and the models mostly focus on the lower frequencies, but the paper shows that **removing** (not rotating) the lowest frequencies improves performance on Gemma 2B!
 
 I anticipate some future breakthroughs, perhaps taking inspiration from signal processing with ideas like wavelets or hierarchical implementations. As models