3D Visualization of Matrix Multiplications
#10: Vision transformers need registers, Demystifying CLIP data
Visualizing Matrix Multiplication, Attention & Beyond
“mm” - a visualization tool for matrix multiplications and compositions.
It uses 3D to visualize matrix multiplication expressions, including attention heads with real weights.
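To see why a third dimension helps, here is a minimal sketch in plain Python. It is not code from the mm tool: the function names `product_cube` and `collapse` are hypothetical, and it assumes the usual cube reading of a matrix product, where each scalar product `a[i][k] * b[k][j]` occupies one cell of a 3D volume and summing along the shared axis gives the result.

```python
def product_cube(a, b):
    """Return cube[i][j][k] = a[i][k] * b[k][j] for every index triple."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [[a[i][k] * b[k][j] for k in range(inner)] for j in range(cols)]
        for i in range(len(a))
    ]


def collapse(cube):
    """Sum each cell's depth axis (k) to obtain the ordinary matrix product."""
    return [[sum(cell) for cell in row] for row in cube]


a = [[1, 2, 3], [4, 5, 6]]         # shape 2 x 3
b = [[7, 8], [9, 10], [11, 12]]    # shape 3 x 2

cube = product_cube(a, b)          # shape 2 x 2 x 3
result = collapse(cube)            # shape 2 x 2
print(result)                      # [[58, 64], [139, 154]]
```

Each entry of `result` is the sum of one depth-wise column of the cube, which is the structure a 3D rendering can show directly.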
Paper: Vision Transformers Need Registers
Artifacts are identified and characterised in the feature maps of both supervised and self-supervised ViT networks.
The artifacts correspond to high-norm tokens that appear primarily in low-information background areas of images.
The paper proposes adding extra tokens (registers) to the input sequence of the Vision Transformer to remove these artifacts.
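The two ideas above can be sketched in plain Python: flagging tokens with unusually high norm, and extending the input sequence with extra register tokens. This is an illustrative sketch, not the paper's implementation. The function names, the threshold value, the toy vectors, and the ordering ([CLS], patches, registers) are all assumptions, and dropping the register outputs before the features are used reflects my reading of the approach.

```python
import math


def token_norms(tokens):
    """L2 norm of each token vector."""
    return [math.sqrt(sum(x * x for x in tok)) for tok in tokens]


def flag_high_norm(tokens, threshold):
    """Indices of tokens whose norm exceeds a (hypothetical) threshold."""
    return [i for i, n in enumerate(token_norms(tokens)) if n > threshold]


def add_registers(cls_token, patch_tokens, register_tokens):
    """Build the input sequence: [CLS] + patch tokens + register tokens."""
    return [cls_token] + list(patch_tokens) + list(register_tokens)


def drop_registers(outputs, num_registers):
    """Discard the trailing register outputs, keeping [CLS] and patches."""
    return outputs[: len(outputs) - num_registers]


dim = 4
cls_token = [0.0] * dim
patches = [
    [0.1, 0.2, 0.1, 0.0],
    [5.0, 6.0, 4.0, 7.0],   # toy stand-in for a high-norm artifact token
    [0.2, 0.1, 0.0, 0.1],
]
# In a real model these would be learned parameters; zeros are placeholders.
registers = [[0.0] * dim for _ in range(2)]

outliers = flag_high_norm(patches, threshold=5.0)
sequence = add_registers(cls_token, patches, registers)
kept = drop_registers(sequence, len(registers))
print(outliers, len(sequence), len(kept))
```

In a real model the sequence would pass through the transformer blocks between `add_registers` and `drop_registers`; that step is omitted here to keep the sketch self-contained.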
Paper: Demystifying CLIP Data
CLIP provides very limited information about its data, which has led to works that aim to reproduce it.
This paper intends to reveal CLIP's data curation approach, make it open and introduce Metadata-Curated Language-Image Pre-training (MetaCLIP).
MetaCLIP data outperforms CLIP's original data on multiple benchmarks, reaching 70.8% accuracy on the ImageNet classification task compared with CLIP's 68.3%.
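As a rough illustration of what metadata-based curation can look like, the sketch below matches captions against a list of metadata entries by substring and then balances the result so that very common entries do not dominate. It reflects my reading of the approach rather than the authors' code. The function name `curate`, the threshold `t`, the seed, and all example data are hypothetical.

```python
import random


def curate(texts, metadata, t, seed=0):
    """Keep texts that match metadata, down-sampling over-represented entries."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Step 1: substring matching of each caption against the metadata entries.
    matches = [[entry for entry in metadata if entry in text] for text in texts]

    # Step 2: count how many captions matched each entry.
    counts = {entry: 0 for entry in metadata}
    for matched in matches:
        for entry in matched:
            counts[entry] += 1

    # Step 3: balancing. An entry with at most t matches always keeps its
    # captions; a more frequent entry keeps each with probability t / count.
    kept = []
    for text, matched in zip(texts, matches):
        if any(rng.random() < t / counts[entry] for entry in matched):
            kept.append(text)
    return kept


metadata = ["dog", "axolotl"]
texts = [f"photo {i} of a dog" for i in range(6)]
texts += ["a rare axolotl", "IMG_0042.jpg"]

curated = curate(texts, metadata, t=2)
print(curated)
```

Captions that match no metadata entry are dropped, rare entries are kept in full, and frequent entries are thinned out, which flattens the distribution over entries.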