Model: knock_6_1_36_words
Probe Sentence: "knock knock whos there cat"
Word | Token Embedding | Combined Embedding (Input to Block 0) |
---|---|---|
'knock' (pos 0) | ![]() | ![]() |
'knock' (pos 1) | ![]() | ![]() |
'whos' (pos 2) | ![]() | ![]() |
'there' (pos 3) | ![]() | ![]() |
'cat' (pos 4) | ![]() | ![]() |
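
As a rough illustration of the second column, GPT-style models commonly form the input to Block 0 by adding a learned positional embedding to each token embedding. The sketch below assumes that scheme; the source does not state how the combined embedding is computed. The 4-dimensional vectors and the names `token_emb`, `pos_emb`, and `combine_embeddings` are hypothetical stand-ins, not values taken from `knock_6_1_36_words`.

```python
# Hypothetical toy tables; the real model's embedding values are not given here.
token_emb = {
    "knock": [0.5, -1.0, 0.25, 0.0],
    "whos":  [1.0, 0.5, -0.5, 0.75],
    "there": [-0.25, 0.0, 1.0, 0.5],
    "cat":   [0.0, 1.0, 0.5, -1.0],
}
pos_emb = [
    [0.0, 0.0, 0.0, 0.0],     # pos 0
    [0.25, 0.0, -0.25, 0.0],  # pos 1
    [0.5, 0.0, -0.5, 0.0],    # pos 2
    [0.75, 0.0, -0.75, 0.0],  # pos 3
    [1.0, 0.0, -1.0, 0.0],    # pos 4
]

def combine_embeddings(words, token_emb, pos_emb):
    """Assumed scheme: combined[i] = token_emb[words[i]] + pos_emb[i]."""
    return [
        [t + p for t, p in zip(token_emb[word], pos_emb[i])]
        for i, word in enumerate(words)
    ]

words = "knock knock whos there cat".split()
combined = combine_embeddings(words, token_emb, pos_emb)
```

Under this assumption the two 'knock' rows share one token embedding but receive different combined embeddings, which would explain why the table lists them separately by position.
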

Block 0

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
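
The column headings in the table above read like the order of operations in a GPT-style pre-layer-norm transformer block. The following is a minimal sketch under that assumption, not the model's actual code: a single attention head, a causal mask, layer norm without learned scale or shift, a ReLU MLP, and no bias terms are all simplifications chosen for illustration, and every function name and weight matrix here is hypothetical.

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalise one vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def block_forward(xs, Wq, Wk, Wv, Wo, W1, W2):
    """Run one assumed pre-LN, single-head, causally masked block over all positions."""
    ln1 = [layer_norm(x) for x in xs]                   # After LN1
    q = [matvec(Wq, h) for h in ln1]                    # Query (q)
    k = [matvec(Wk, h) for h in ln1]                    # Key (k)
    v = [matvec(Wv, h) for h in ln1]                    # Value (v)
    scale = math.sqrt(len(q[0]))
    outputs, attention = [], []
    for i in range(len(xs)):
        # Causal mask: position i only scores positions 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale
                  for j in range(i + 1)]
        w = softmax(scores)                             # Attention
        z = [sum(w[j] * v[j][d] for j in range(i + 1))
             for d in range(len(v[0]))]                 # Attn Out (z)
        proj = matvec(Wo, z)                            # Attn Proj
        resid1 = [a + b for a, b in zip(xs[i], proj)]   # After Resid1
        ln2 = layer_norm(resid1)                        # After LN2
        hidden = [max(0.0, a) for a in matvec(W1, ln2)] # ReLU is an assumption
        mlp = matvec(W2, hidden)                        # MLP Out
        outputs.append([a + b for a, b in zip(resid1, mlp)])  # Block Output
        attention.append(w)
    return outputs, attention

def identity(n):
    """Identity matrix, used here only as a placeholder weight."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

# Toy inputs: three positions, four dimensions, identity weights throughout.
I = identity(4)
xs = [[1.0, 0.0, -1.0, 0.5], [0.0, 2.0, 1.0, -1.0], [0.5, 0.5, 0.0, 1.0]]
block_out, attention = block_forward(xs, I, I, I, I, I, I)
```

Because of the causal mask, the first position attends only to itself, so `attention[0]` is `[1.0]`; each later row of `attention` has one more entry and sums to 1.
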

Block 1

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

Block 2

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

Block 3

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

Block 4

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

Block 5

Word | Input (x) | After LN1 | Query (q) | Key (k) | Value (v) | Attn Out (z) | Attention | Attn Proj | After Resid1 | After LN2 | MLP Out | Block Output |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |

Final Projection

Word | After Final Layer Normalisation | Final Linear Layer | Dot Product Breakdown for cat |
---|---|---|---|
'knock' (pos 0) | ![]() | ![]() | |
'knock' (pos 1) | ![]() | ![]() | |
'whos' (pos 2) | ![]() | ![]() | |
'there' (pos 3) | ![]() | ![]() | |
'cat' (pos 4) | ![]() | ![]() | ![]() |
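
To make the 'Dot Product Breakdown' column in the table above concrete: a logit can be viewed as the sum of per-dimension products between the final hidden state and one token's vector. The sketch below assumes the final linear layer has no bias and that each token corresponds to one row of that layer; the 4-dimensional numbers and the names `final_state`, `cat_vector`, and `dot_product_breakdown` are hypothetical, not values from the model.

```python
def dot_product_breakdown(hidden, token_vector):
    """Return the per-dimension products and their sum (the logit, ignoring any bias)."""
    terms = [h * w for h, w in zip(hidden, token_vector)]
    return terms, sum(terms)

# Hypothetical toy vectors; the real model's values are not given here.
final_state = [0.5, -1.0, 2.0, 0.25]   # last position after the final layer norm
cat_vector = [1.0, -0.5, 0.5, 4.0]     # assumed row of the final linear layer for 'cat'

terms, logit = dot_product_breakdown(final_state, cat_vector)
# terms == [0.5, 0.5, 1.0, 1.0], logit == 3.0
```

Dimensions where both vectors have the same sign contribute positive terms and raise the logit; dimensions with opposite signs lower it.
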
Below are the raw embedding representations for the top 10 potential next tokens. You can compare the visualization for the top prediction with the 'Dot Product Breakdown' visualization in the 'Final Projection' section above. The 'Dot Product Breakdown' shows how the model's final internal state aligns with a token's embedding to produce a high logit score.
Token | Probability | Embedding Visualization |
---|---|---|
cat | | ![]() |
dog | | ![]() |
catch | | ![]() |
can | | ![]() |
like | | ![]() |
dead | | ![]() |
hot | | ![]() |
day | | ![]() |
new | | ![]() |
as | | ![]() |
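
A 'Probability' column like the one above is usually obtained by applying a softmax to the logits and keeping the highest-scoring tokens. The following is a minimal sketch of that step, assuming a plain softmax with no temperature; the logit values and the helper name `top_k_probabilities` are made up for illustration and are not the model's outputs.

```python
import math

def top_k_probabilities(logits, k=10):
    """Softmax over a {token: logit} dict, then return the k most probable tokens."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - m) for tok, val in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical logits for a four-token vocabulary.
logits = {"cat": 3.0, "dog": 2.0, "catch": 1.0, "can": 0.0}
top = top_k_probabilities(logits, k=3)
```

Here `top` lists `cat`, `dog`, and `catch` in that order, because the softmax preserves the ranking of the logits.
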