Full Model Flow Visualization

Model: knock_6_1_36_words
Probe Sentence: "knock knock whos there cat"

Input Embeddings

For each token of the probe sentence ('knock' at pos 0, 'knock' at pos 1, 'whos' at pos 2, 'there' at pos 3, 'cat' at pos 4), this table showed three columns: the word, its token embedding, and the combined embedding (token embedding plus positional embedding) that is the input to Block 0. The embedding heatmaps themselves are not reproduced here.
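
As a reference point, here is a minimal NumPy sketch of how the combined embedding is typically formed: the token embedding plus a learned positional embedding, one vector per position. The sizes, token ids, and random parameters below are placeholders, not values read from knock_6_1_36_words.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: the model name knock_6_1_36_words suggests a 36-word
# vocabulary; max_len and d_model here are illustrative placeholders.
vocab_size, max_len, d_model = 36, 16, 32

token_embedding = rng.normal(size=(vocab_size, d_model))     # learned token table
position_embedding = rng.normal(size=(max_len, d_model))     # learned position table

# Hypothetical token ids for the probe "knock knock whos there cat".
token_ids = np.array([5, 5, 7, 9, 3])
positions = np.arange(len(token_ids))

# Combined Embedding (Input to Block 0): token vector + positional vector.
x = token_embedding[token_ids] + position_embedding[positions]
print(x.shape)  # (5, 32): one combined vector per token of the probe sentence
```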

Transformer Block 0

Each block's table showed per-token heatmaps for Input (x), After LN1, Query (q), Key (k), Value (v), Attn Out (z), Attention, Attn Proj, After Resid1, After LN2, MLP Out, and Block Output. The heatmaps are not reproduced here; the attention weights are, with each row showing how strongly that query position attends to each key position (rows sum to 1).

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.096     0.064     0.244     0.309     0.288
'knock' (pos 1)     0.104     0.060     0.199     0.335     0.302
'whos'  (pos 2)     0.090     0.060     0.196     0.358     0.296
'there' (pos 3)     0.066     0.042     0.259     0.198     0.435
'cat'   (pos 4)     0.119     0.086     0.295     0.299     0.201
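
For orientation, the sketch below walks through one block in the order of the column headings above: LN1, then the q/k/v projections, the attention weights, the z output, the attention projection plus residual, LN2, the MLP, and the final residual that becomes the Block Output. It assumes a pre-LN, single-head block (the model name suggests one head) with a ReLU MLP and random placeholder parameters; no causal mask is applied, matching the tables above, where each row spreads its weight over all five positions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-position layer normalisation (learned scale/shift omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def block_forward(x, p):
    """One single-head block; intermediate names follow the column headings above."""
    h1 = layer_norm(x)                                        # After LN1
    q, k, v = h1 @ p["Wq"], h1 @ p["Wk"], h1 @ p["Wv"]        # Query (q), Key (k), Value (v)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # Attention: one row per query position
    z = attn @ v                                              # Attn Out (z)
    x = x + z @ p["Wo"]                                       # Attn Proj, then After Resid1
    h2 = layer_norm(x)                                        # After LN2
    mlp = np.maximum(h2 @ p["W1"], 0.0) @ p["W2"]             # MLP Out (ReLU assumed)
    return x + mlp, attn                                      # Block Output and attention weights

# Toy usage with random parameters (d_model and d_ff are placeholders).
rng = np.random.default_rng(0)
d_model, d_ff = 32, 128
shapes = {"Wq": (d_model, d_model), "Wk": (d_model, d_model), "Wv": (d_model, d_model),
          "Wo": (d_model, d_model), "W1": (d_model, d_ff), "W2": (d_ff, d_model)}
p = {name: rng.normal(size=shape) * 0.1 for name, shape in shapes.items()}

x = rng.normal(size=(5, d_model))      # 5 tokens, as in the probe sentence
out, attn = block_forward(x, p)
print(attn.shape, attn.sum(axis=-1))   # (5, 5); each row sums to 1, like the tables above
```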

Transformer Block 1

Attention weights (per-token heatmap columns not reproduced):

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.195     0.119     0.105     0.186     0.395
'knock' (pos 1)     0.222     0.111     0.086     0.206     0.374
'whos'  (pos 2)     0.153     0.088     0.085     0.153     0.520
'there' (pos 3)     0.171     0.147     0.137     0.124     0.420
'cat'   (pos 4)     0.197     0.123     0.162     0.204     0.314

Transformer Block 2

Attention weights (per-token heatmap columns not reproduced):

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.195     0.142     0.036     0.124     0.503
'knock' (pos 1)     0.215     0.149     0.030     0.149     0.457
'whos'  (pos 2)     0.201     0.160     0.037     0.082     0.520
'there' (pos 3)     0.093     0.059     0.006     0.006     0.835
'cat'   (pos 4)     0.102     0.069     0.050     0.050     0.729

Transformer Block 3

Attention weights (per-token heatmap columns not reproduced):

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.159     0.120     0.039     0.093     0.588
'knock' (pos 1)     0.164     0.122     0.031     0.100     0.584
'whos'  (pos 2)     0.196     0.149     0.020     0.068     0.566
'there' (pos 3)     0.072     0.043     0.003     0.009     0.872
'cat'   (pos 4)     0.073     0.047     0.024     0.056     0.800

Transformer Block 4

Attention weights (per-token heatmap columns not reproduced):

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.063     0.088     0.033     0.758     0.057
'knock' (pos 1)     0.033     0.050     0.014     0.885     0.018
'whos'  (pos 2)     0.157     0.207     0.189     0.265     0.181
'there' (pos 3)     0.049     0.022     0.031     0.006     0.892
'cat'   (pos 4)     0.127     0.089     0.066     0.109     0.610

Transformer Block 5

Attention weights (per-token heatmap columns not reproduced):

Query            knock(0)  knock(1)   whos(2)  there(3)    cat(4)
'knock' (pos 0)     0.159     0.091     0.192     0.102     0.456
'knock' (pos 1)     0.170     0.095     0.221     0.054     0.461
'whos'  (pos 2)     0.102     0.062     0.062     0.186     0.589
'there' (pos 3)     0.090     0.085     0.010     0.097     0.718
'cat'   (pos 4)     0.033     0.011     0.066     0.277     0.612
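
Schematically, the six blocks chain together: each Block Output becomes the next block's Input (x), and the last Block Output goes on to the final projection. The sketch below is only a wiring diagram; block_forward and final_projection stand for the per-block and final-projection sketches shown elsewhere on this page and are passed in as callables.

```python
def model_forward(x, block_params, block_forward, final_projection):
    """Hypothetical end-to-end pass over Blocks 0-5 followed by the final projection."""
    attention_maps = []
    for p in block_params:                  # parameters for Blocks 0 .. 5
        x, attn = block_forward(x, p)       # Block Output feeds the next block's Input (x)
        attention_maps.append(attn)         # one 5x5 attention table per block, as shown above
    logits = final_projection(x)            # see the Final Projection section below
    return logits, attention_maps
```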

Final Projection

For each token position ('knock' at pos 0 through 'cat' at pos 4), this table showed its state after the final layer normalisation, the output of the final linear layer (one logit per vocabulary word), and a dot-product breakdown of how that position's normalised state aligns with the embedding of 'cat', the top prediction. These heatmaps are not reproduced here.
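
A minimal sketch of this step, assuming the final linear layer is tied to the token embedding matrix, which is consistent with the 'Dot Product Breakdown' comparison described in the next section (the logit for a token is then the dot product of the normalised final state with that token's embedding). The layer-norm scale and shift are again omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def final_projection(block5_output, token_embedding):
    """Final layer norm, then a dot product against every token embedding.

    With tied weights (an assumption here), the logit for 'cat' at a position is
    the dot product of that position's normalised final state with cat's embedding
    vector, which is what the 'Dot Product Breakdown for cat' column visualised.
    """
    h = layer_norm(block5_output)        # After Final Layer Normalisation
    logits = h @ token_embedding.T       # Final Linear Layer: one score per vocabulary word
    return logits
```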

Next Token Prediction

Below are the top 10 candidate next tokens and their probabilities (the raw embedding heatmap shown for each token is not reproduced here). The top prediction's embedding can be compared with the 'Dot Product Breakdown' visualization in the 'Final Projection' section above, which shows how the model's final internal state aligns with a token's embedding to produce a high logit score.

Token    Probability
cat           12.78%
dog            2.79%
catch          2.50%
can            2.43%
like           1.44%
dead           1.40%
hot            1.35%
day            1.26%
new            1.18%
as             1.08%
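
The probabilities above would typically come from a softmax over the last position's logits; a small sketch follows, with a placeholder vocabulary and no real logits supplied.

```python
import numpy as np

def top_k_predictions(last_position_logits, vocab, k=10):
    """Softmax over the last position's logits, then the k most probable tokens."""
    z = last_position_logits - last_position_logits.max()   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1][:k]
    return [(vocab[i], float(probs[i])) for i in order]

# With the real model's logits and vocabulary, the first entry would be
# ('cat', 0.1278), matching the 12.78% shown in the table above.
```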