Positional Encodings, Part 10: Exercises to References
10. Exercises
- (*) Compute a sinusoidal position row.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
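A minimal NumPy sketch for this exercise, assuming the standard sin/cos scheme from Vaswani et al.; `pos = 3` and `d_model = 8` are placeholder values:

```python
import numpy as np

def sinusoidal_row(pos: int, d_model: int) -> np.ndarray:
    """One absolute-position row: sin on even dims, cos on odd dims."""
    i = np.arange(d_model // 2)                    # pair index
    angles = pos / (10000.0 ** (2 * i / d_model))  # pos / 10000^(2i/d_model)
    row = np.empty(d_model)
    row[0::2] = np.sin(angles)
    row[1::2] = np.cos(angles)
    return row

print(np.round(sinusoidal_row(pos=3, d_model=8), 4))
```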
- (*) Show token-plus-position addition.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
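A sketch of the additive scheme, with random stand-in token embeddings and illustrative sizes `seq_len = 4`, `d_model = 8`:

```python
import numpy as np

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
tok = rng.normal(size=(seq_len, d_model))  # stand-in token embeddings

# sinusoidal table for positions 0..seq_len-1: sin on even dims, cos on odd dims
positions = np.arange(seq_len)[:, None]
pair = np.arange(d_model // 2)[None, :]
angles = positions / (10000.0 ** (2 * pair / d_model))
pe = np.empty((seq_len, d_model))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

x = tok + pe      # additive absolute positions, same shape as the embeddings
print(x.shape)    # (4, 8)
```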
- (*) Build a relative distance matrix.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
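One way to build the signed distance matrix with NumPy broadcasting; `seq_len = 5` is a placeholder:

```python
import numpy as np

seq_len = 5
pos = np.arange(seq_len)
dist = pos[None, :] - pos[:, None]  # dist[q, k] = k - q: negative = past, positive = future
print(dist)
```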
- (**) Create a relative attention bias matrix.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
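A sketch in the spirit of Shaw et al., assuming one learned scalar per clipped offset; the table values here are random stand-ins:

```python
import numpy as np

seq_len, max_rel = 5, 2
rng = np.random.default_rng(0)
table = rng.normal(size=2 * max_rel + 1)  # one learned scalar per clipped offset

pos = np.arange(seq_len)
rel = np.clip(pos[None, :] - pos[:, None], -max_rel, max_rel)
bias = table[rel + max_rel]               # (seq_len, seq_len), added to attention logits
print(np.round(bias, 3))
```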
- (**) Apply a RoPE rotation and check norm preservation.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
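A two-dimensional sketch; `theta = 1.0` and the vector values are arbitrary choices for the check:

```python
import numpy as np

def rotate_pair(vec, pos, theta=1.0):
    """Rotate one (x, y) feature pair by angle pos * theta."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    x, y = vec
    return np.array([c * x - s * y, s * x + c * y])

v = np.array([3.0, 4.0])
r = rotate_pair(v, pos=2)
# both print 5.0 (up to floating-point error): rotation preserves norm
print(np.linalg.norm(v), np.linalg.norm(r))
```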
- (**) Verify a RoPE relative-offset dot-product identity in two dimensions.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
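A numeric check of the two-dimensional identity dot(R_m q, R_n k) = dot(q, R_(n-m) k); the vectors, positions, and `theta` here are arbitrary choices:

```python
import numpy as np

def rot(pos, theta=0.5):
    """2-D rotation matrix for angle pos * theta."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([[c, -s], [s, c]])

q = np.array([1.0, 2.0])
k = np.array([0.5, -1.0])
m, n = 7, 3
lhs = np.dot(rot(m) @ q, rot(n) @ k)  # score with both vectors rotated
rhs = np.dot(q, rot(n - m) @ k)       # same score from the offset n - m alone
print(np.isclose(lhs, rhs))           # True
```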
- (**) Build an ALiBi matrix.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
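A sketch assuming 8 heads, for which the slope schedule from Press et al. is 1/2 through 1/256; `seq_len = 5` is a placeholder:

```python
import numpy as np

seq_len, n_heads = 5, 8
slopes = 2.0 ** -np.arange(1, n_heads + 1)       # 1/2 ... 1/256 for 8 heads
pos = np.arange(seq_len)
rel = pos[None, :] - pos[:, None]                # k - q, <= 0 for visible past keys
alibi = slopes[:, None, None] * rel[None, :, :]  # (heads, q, k); future entries are masked anyway
print(np.round(alibi[0], 3))
```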
- (***) Count the parameters of a learned position table.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
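A one-line parameter count; the `max_positions` and `d_model` values are placeholders:

```python
max_positions, d_model = 2048, 4096       # placeholder sizes
params = max_positions * d_model          # one learned row per absolute position
print(f"{params:,} position-table parameters")  # 8,388,608
```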
- (***) Check decode position ids for a KV cache.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
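A sketch of position-id bookkeeping during prefill and per-token decode; the token ids are stand-ins:

```python
prompt = [101, 7592, 2088]  # stand-in token ids
cache_len = 0

prefill_pos = list(range(cache_len, cache_len + len(prompt)))
print("prefill position ids:", prefill_pos)  # [0, 1, 2]
cache_len += len(prompt)

for step in range(3):        # one new token per decode step
    next_pos = cache_len     # 3, then 4, then 5: continues from the cache length
    print(f"decode step {step}: position id {next_pos}")
    cache_len += 1
```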
- (***) Design a long-context position diagnostic.
  - (a) State the scheme.
  - (b) Work through a small numeric example.
  - (c) Explain the consequence for LLMs.
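A skeleton of a needle-at-depth probe; `ask_model`, the filler text, and the character-level insertion are hypothetical simplifications, not a real evaluation harness:

```python
context_len = 8000                   # placeholder context size
depths = [0.0, 0.25, 0.5, 0.75, 1.0]  # relative needle depth

def run_probe(ask_model, needle: str, filler: str):
    """Insert the needle at each depth, query it, record retrieval per depth."""
    scores = {}
    for depth in depths:
        insert_at = int(depth * context_len)
        prompt = filler[:insert_at] + needle + filler[insert_at:]
        answer = ask_model(prompt + "\nWhat was the needle?")
        scores[depth] = needle in answer  # crude exact-match retrieval check
    return scores
```

Plotting retrieval against depth is what exposes lost-in-the-middle behavior: a dip at intermediate depths with strong ends.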
11. Why This Matters for AI
| Concept | AI impact |
|---|---|
| Sinusoidal encodings | Provide fixed absolute order features without learned position rows. |
| Learned position tables | Work well in-range but tie the model to trained maximum positions. |
| Relative biases | Let attention reason about pairwise distances. |
| RoPE | Supports relative offset behavior through rotations and is common in modern decoder LLMs. |
| ALiBi | Adds simple distance penalties that extrapolate without a learned position table. |
| Position ids | Matter for KV-cache decoding and long-context serving correctness. |
| Long-context diagnostics | Expose lost-in-the-middle, recency bias, and extrapolation failures. |
| Mask interaction | Ensures order signals do not override causal or padding visibility. |
12. Conceptual Bridge
The backward bridge is attention. Attention computes content-based interactions, but position mechanisms determine whether those interactions know sequence order and distance.
The forward bridge is language-model probability. In next-token prediction, position affects which prefix states are visible, how generated tokens receive ids, and whether the model can use long contexts reliably.
+--------------+      +------------------+      +----------------------+
| attention    | ---> | position signal  | ---> | ordered next-token   |
| content mix  |      | absolute / rel   |      | prediction           |
+--------------+      +------------------+      +----------------------+
The practical habit is to test length behavior, not only maximum accepted length. Position encodings can be mathematically valid and still behave poorly outside the training regime.
13. References
- Vaswani et al. Attention Is All You Need. https://arxiv.org/abs/1706.03762
- Shaw, Uszkoreit, and Vaswani. Self-Attention with Relative Position Representations. https://arxiv.org/abs/1803.02155
- Dai et al. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. https://arxiv.org/abs/1901.02860
- Su et al. RoFormer: Enhanced Transformer with Rotary Position Embedding. https://arxiv.org/abs/2104.09864
- Press et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. https://arxiv.org/abs/2108.12409