Rotary Position Embeddings for Long Context Length

Dec 22, 2025 - 21:00
This article is divided into two parts; they are:

• Simple RoPE
• RoPE for Long Context Length

Compared to the sinusoidal position embeddings in the original Transformer paper, RoPE transforms the input tensor with a rotation matrix:

$$
\begin{aligned}
X'_{n,i} &= X_{n,i} \cos(n\theta_i) - X_{n,\frac{d}{2}+i} \sin(n\theta_i) \\
X'_{n,\frac{d}{2}+i} &= X_{n,i} \sin(n\theta_i) + X_{n,\frac{d}{2}+i} \cos(n\theta_i)
\end{aligned}
$$

where $X_{n,i}$ is the $i$-th element of the vector at the $n$-th position of the sequence in tensor $X$, and $X'$ is the rotated result. The two updates are applied simultaneously, pairing element $i$ with element $\frac{d}{2}+i$ of each position vector.
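As a minimal sketch of the rotation above, the following PyTorch function pairs element $i$ with element $d/2+i$ and applies the per-position angles $n\theta_i$. It assumes the conventional RoPE frequencies $\theta_i = 10000^{-2i/d}$, which the excerpt does not specify; the function name `apply_rope` and the `base` parameter are illustrative only.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Uses the rotate-half pairing from the equations above: element i is
    rotated together with element d/2 + i. The base of 10000 is the
    common RoPE default, assumed here for illustration.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # theta_i = base^(-2i/d) for i = 0 .. d/2 - 1
    theta = base ** (-2 * torch.arange(half, dtype=x.dtype) / dim)
    # angles[n, i] = n * theta_i for every position n and frequency i
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * theta[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # X'_{n,i}     = X_{n,i} cos(n theta_i) - X_{n,d/2+i} sin(n theta_i)
    # X'_{n,d/2+i} = X_{n,i} sin(n theta_i) + X_{n,d/2+i} cos(n theta_i)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Usage: rotate a toy tensor of 8 positions with dimension 16
x = torch.randn(8, 16)
x_rotated = apply_rope(x)
print(x_rotated.shape)  # torch.Size([8, 16])
```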
