2022 January 22
I visualized the token and position embeddings for GPT-2. The plot below reduces them to two dimensions with t-SNE. Click and drag to zoom in.
My favorite cluster is at (-51, 110) where there’s a strip of years in increasing order along with “Watergate” and “postwar” near the mid-twentieth century.
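For anyone who wants to poke at this themselves, here's a rough sketch of how the embeddings can be pulled out of GPT-2 and reduced with t-SNE. It assumes Hugging Face transformers, scikit-learn, and matplotlib, and it isn't the linked code (which produces the interactive plot), just the general idea:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

# wte holds the token embeddings (50257 x 768); wpe holds the position
# embeddings (1024 x 768). Both live in the same 768-dimensional space.
token_emb = model.wte.weight.detach().numpy()
pos_emb = model.wpe.weight.detach().numpy()

# t-SNE is slow on the full 50k-token vocabulary, so this sketch takes a
# subset of tokens plus all 1024 positions.
emb = np.concatenate([token_emb[:5000], pos_emb])
coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(emb)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.title("t-SNE of GPT-2 token and position embeddings (subset)")
plt.show()
```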
The next plot projects the embedding vectors onto their first two principal components. These two components explain only 3.1% of the variance.
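A similar sketch, with the same assumptions, for the PCA projection; `explained_variance_ratio_` reports the fraction of variance each of the two components captures:

```python
import numpy as np
from sklearn.decomposition import PCA
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
emb = np.concatenate(
    [model.wte.weight.detach().numpy(), model.wpe.weight.detach().numpy()]
)

pca = PCA(n_components=2)
coords = pca.fit_transform(emb)  # (n_embeddings, 2) coordinates for the plot
print(pca.explained_variance_ratio_)  # two fractions; their sum is the variance explained
```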
The code to generate these plots is here. Thanks to Nat for collaborating on this.