Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
ICML 2020
Paper: [arXiv] [ICML 2020]
Code: [General] [ImageNet with MoCo Variant]
Abstract
Contrastive representation learning has been outstandingly successful in practice. In this work, we identify two key properties related to the contrastive loss: (1) alignment (closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the (normalized) features on the hypersphere. We prove that, asymptotically, the contrastive loss optimizes these properties, and analyze their positive effects on downstream tasks. Empirically, we introduce an optimizable metric to quantify each property. Extensive experiments on standard vision and language datasets confirm the strong agreement between both metrics and downstream task performance. Remarkably, directly optimizing for these two metrics leads to representations with comparable or better performance at downstream tasks than contrastive learning.
Alignment and Uniformity of Representations
Theoretical insight:
In the limit of infinite negative samples, the popular contrastive loss decomposes into two terms, optimizing for alignment and uniformity respectively.
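A sketch of that limiting decomposition (notation assumed here: f is the L2-normalized encoder, τ the temperature, M the number of negative samples, p_pos the positive-pair distribution, and p_data the data distribution; see the paper's Theorem 1 for the precise statement):

```latex
\lim_{M \to \infty} \Big[ \mathcal{L}_{\text{contrastive}}(f; \tau, M) - \log M \Big]
  = \underbrace{-\frac{1}{\tau}\, \mathbb{E}_{(x,y) \sim p_{\text{pos}}}\big[ f(x)^\top f(y) \big]}_{\text{alignment: positive pairs stay close}}
  \;+\;
  \underbrace{\mathbb{E}_{x \sim p_{\text{data}}}\Big[ \log \mathbb{E}_{x^- \sim p_{\text{data}}}\big[ e^{f(x^-)^\top f(x)/\tau} \big] \Big]}_{\text{uniformity: features spread on the hypersphere}}
```

The first term is minimized when each positive pair maps to identical features; the second is asymptotically minimized when features are uniformly distributed on the unit hypersphere.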
Optimizable Metrics for Alignment and Uniformity
```python
import torch

# bsz : batch size (number of positive pairs)
# d   : latent dim
# x   : Tensor, shape=[bsz, d]
#       latents for one side of positive pairs
# y   : Tensor, shape=[bsz, d]
#       latents for the other side of positive pairs

def align_loss(x, y, alpha=2):
    # Mean (squared, for alpha=2) Euclidean distance between positive pairs.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Log of the mean pairwise Gaussian potential over all feature pairs.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```
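A minimal usage sketch, assuming both sides of each positive pair have already been encoded and L2-normalized onto the unit hypersphere (the random tensors and the equal weighting of the two terms below are purely illustrative):

```python
import torch
import torch.nn.functional as F

bsz, d = 128, 32
# Stand-ins for encoder outputs on the two augmented views of each example.
x = F.normalize(torch.randn(bsz, d), dim=1)
y = F.normalize(torch.randn(bsz, d), dim=1)

# Jointly optimizable objective: align positives, spread each side uniformly.
loss = align_loss(x, y, alpha=2) + (uniform_loss(x, t=2) + uniform_loss(y, t=2)) / 2
print(f"alignment+uniformity loss: {loss.item():.4f}")
```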
Alignment and Uniformity Metrics Agree Well with Representation Quality
- Encoders are evaluated on alignment (y-axis), uniformity (x-axis), and downstream task performance (color). Blue is better for all tasks.
- Aligned and uniform encoders (lower-left corners) consistently perform better; a minimal evaluation sketch follows the figure summary below.
[Figure: alignment vs. uniformity scatter plots for STL-10, NYU-Depth-V2, the ImageNet 100-class subset, and Movie Review Sentence Polarity (encoders trained on BookCorpus).]
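To place a trained encoder on such a plot, the two metrics can be computed over held-out positive pairs. A hedged sketch (the `encoder` and a loader yielding paired views are assumptions; it reuses `align_loss` and `uniform_loss` from above):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate_metrics(encoder, pair_loader, device="cpu"):
    # pair_loader is assumed to yield (view1, view2) batches of positive pairs.
    xs, ys = [], []
    for view1, view2 in pair_loader:
        xs.append(encoder(view1.to(device)))
        ys.append(encoder(view2.to(device)))
    x = F.normalize(torch.cat(xs), dim=1)  # project features onto the hypersphere
    y = F.normalize(torch.cat(ys), dim=1)
    return align_loss(x, y).item(), uniform_loss(x).item()
```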
Paper: ICML 2020. arXiv 2005.10242.
Citation: Tongzhou Wang, Phillip Isola. "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere." International Conference on Machine Learning (ICML), 2020.
Code: [General] [ImageNet with MoCo Variant]
BibTeX entry:

```bibtex
@inproceedings{tongzhouw2020hypersphere,
  title={Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere},
  author={Wang, Tongzhou and Isola, Phillip},
  booktitle={International Conference on Machine Learning},
  organization={PMLR},
  pages={9929--9939},
  year={2020}
}
```
Acknowledgements
We thank Philip Bachman, Ching-Yao Chuang, Justin Solomon, Yonglong Tian, and Zhenyang Zhang for many helpful comments and suggestions. Tongzhou Wang was supported by the MIT EECS Merrill Lynch Graduate Fellowship.