Intuitions for Transformer Circuits
In my view, the most important concepts to understand from this paper are the residual stream, attention, circuits, and induction heads.
This will cause data from the key’s residual stream to be moved into the query’s residual stream.
It reads in from the residual stream and writes out to the residual stream via the W_O matrix.
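To make the read/write picture concrete, here is a minimal NumPy sketch of a single attention head. It is an illustration under assumptions, not the paper's or any library's implementation: the names `W_Q`, `W_K`, `W_V`, the causal mask, the scaling by the square root of the head dimension, and the toy sizes are all chosen here for the example. Only `W_O` and the residual stream come from the text above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(resid, W_Q, W_K, W_V, W_O):
    """Apply one attention head to a residual stream of shape (seq_len, d_model).

    Returns the updated residual stream and the attention pattern.
    All weight names except W_O are assumptions made for this sketch.
    """
    seq_len = resid.shape[0]
    d_head = W_Q.shape[1]

    # Read from the residual stream.
    q = resid @ W_Q  # one query per position
    k = resid @ W_K  # one key per position
    v = resid @ W_V  # the data that may be moved

    # Score every (query, key) pair; mask out keys that come after the query.
    scores = q @ k.T / np.sqrt(d_head)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    pattern = softmax(scores, axis=-1)  # row i: how much query i attends to each key

    # Move data from key positions to query positions, then write back via W_O.
    head_out = pattern @ v @ W_O
    return resid + head_out, pattern

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
resid = rng.normal(size=(seq_len, d_model))
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_head)) for _ in range(3))
W_O = rng.normal(size=(d_head, d_model))

new_resid, pattern = attention_head(resid, W_Q, W_K, W_V, W_O)
```

Each row of `pattern` is a set of weights over key positions, so a large entry at row i, column j means data from position j's residual stream is copied (after `W_V` and `W_O`) into position i's residual stream. The head's output is added to the stream rather than replacing it, which is what lets later components read what earlier ones wrote.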
In this case, this would be the first ‘ on’ token (token 4 above).
Conclusion
Hopefully you now have a better intuition for how the different components in a transformer interact with each other through the residual stream.
1 hour ago @ connorjdavis.com
