Rotary Position Embeddings for Long Context Length

Dec 22, 2025 - 21:00
This article is divided into two parts; they are:

• Simple RoPE
• RoPE for Long Context Length

Compared to the sinusoidal position embeddings in the original Transformer paper, RoPE transforms the input tensor with a rotation matrix:

$$
\begin{aligned}
X'_{n,i} &= X_{n,i} \cos(n\theta_i) - X_{n,\frac{d}{2}+i} \sin(n\theta_i) \\
X'_{n,\frac{d}{2}+i} &= X_{n,i} \sin(n\theta_i) + X_{n,\frac{d}{2}+i} \cos(n\theta_i)
\end{aligned}
$$

where $X_{n,i}$ is the $i$-th element of the vector at the $n$-th position of the sequence in tensor $X$, and $X'$ is the rotated result. The two updates are applied simultaneously, pairing element $i$ with element $\frac{d}{2}+i$ of each position vector.
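As a minimal sketch of the rotation above, the following PyTorch function pairs element $i$ with element $d/2+i$ and applies the per-position angles $n\theta_i$. It assumes the conventional RoPE frequencies $\theta_i = 10000^{-2i/d}$, which the excerpt does not specify; the function name `apply_rope` and the `base` parameter are illustrative only.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Uses the rotate-half pairing from the equations above: element i is
    rotated together with element d/2 + i. The base of 10000 is the
    common RoPE default, assumed here for illustration.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # theta_i = base^(-2i/d) for i = 0 .. d/2 - 1
    theta = base ** (-2 * torch.arange(half, dtype=x.dtype) / dim)
    # angles[n, i] = n * theta_i for every position n and frequency i
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * theta[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # X'_{n,i}     = X_{n,i} cos(n theta_i) - X_{n,d/2+i} sin(n theta_i)
    # X'_{n,d/2+i} = X_{n,i} sin(n theta_i) + X_{n,d/2+i} cos(n theta_i)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Usage: rotate a toy tensor of 8 positions with dimension 16
x = torch.randn(8, 16)
x_rotated = apply_rope(x)
print(x_rotated.shape)  # torch.Size([8, 16])
```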
