Transformer

The transformer architecture introduced self-attention, which lets the model weigh the relevance of every token in an input sequence against every other token simultaneously rather than one by one. This parallel processing made it practical to train on massive datasets and gave rise to large language models like those available through Amazon Bedrock. The encoder portion learns contextual representations useful for tasks such as classification, while the decoder generates new tokens one at a time, and many LLMs use a decoder-only design. A key exam nuance: transformers replaced recurrent networks (RNNs and LSTMs), which processed tokens sequentially and struggled with long-range dependencies.

Related terms