2022 January 22
I visualized the token and position embeddings for GPT-2. The plot below reduces them to two dimensions with t-SNE. Click and drag to zoom in.
My favorite cluster is at (-51, 110) where there’s a strip of years in increasing order along with “Watergate” and “postwar” near the mid-twentieth century.
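For anyone who wants to poke at this themselves, here's a rough sketch of how the embeddings can be pulled out of GPT-2 and reduced with t-SNE. It assumes Hugging Face transformers, scikit-learn, and matplotlib, and it isn't the linked code (which produces the interactive plot), just the general idea:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

# wte holds the token embeddings (50257 x 768); wpe holds the position
# embeddings (1024 x 768). Both live in the same 768-dimensional space.
token_emb = model.wte.weight.detach().numpy()
pos_emb = model.wpe.weight.detach().numpy()

# t-SNE is slow on the full 50k-token vocabulary, so this sketch takes a
# subset of tokens plus all 1024 positions.
emb = np.concatenate([token_emb[:5000], pos_emb])
coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(emb)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.title("t-SNE of GPT-2 token and position embeddings (subset)")
plt.show()
```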
The next plot projects the embedding vectors onto their first two principal components. These two components explain only 3.1% of the variance.
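A similar sketch, with the same assumptions, for the PCA projection; `explained_variance_ratio_` reports the fraction of variance each of the two components captures:

```python
import numpy as np
from sklearn.decomposition import PCA
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
emb = np.concatenate(
    [model.wte.weight.detach().numpy(), model.wpe.weight.detach().numpy()]
)

pca = PCA(n_components=2)
coords = pca.fit_transform(emb)  # (n_embeddings, 2) coordinates for the plot
print(pca.explained_variance_ratio_)  # two fractions; their sum is the variance explained
```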
The code to generate these plots is here. Thanks to Nat for collaborating on this.