Attention
The mechanism that lets a model weigh the relevance of different tokens to each other.
Self-attention compares every token with every other token in the input.
The mechanism that lets a model weigh the relevance of different tokens to each other.
Self-attention compares every token with every other token in the input.