Technical Terms
Attention Head
Definition
One of several parallel attention pathways inside a transformer block; each head can specialise in a different kind of token relationship or pattern (e.g. syntax, positional offsets, coreference).
In Plain English
One of the model's separate mini-focus channels inside an attention layer.
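To make the "parallel pathways" idea concrete, here is a minimal NumPy sketch of multi-head attention. The weights are random toy values (a real model learns them), and the function name and shapes are illustrative, not from any particular library: the model dimension is split evenly across heads, each head runs scaled dot-product attention on its own slice, and the head outputs are concatenated back together.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    # x: (seq_len, d_model); d_model is split evenly across the heads
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # toy random projection weights (a trained model learns these)
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # reshape so each head works on its own d_head-sized slice
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # each head computes scaled dot-product attention independently
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)
    # concatenate the per-head outputs back into d_model
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))       # 5 tokens, model width 16
y = multi_head_attention(x, num_heads=4, rng=rng)
print(y.shape)                         # (5, 16)
```

Because the four heads each attend over the full sequence using different projections, each one can weight the tokens differently, which is what lets heads pick up on distinct relationships.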