Wei Wang, Liangzhu Ge, Jingqiao Zhang, Cheng Yang
Following SimCSE, contrastive-learning-based methods have achieved state-of-the-art (SOTA) performance in learning sentence embeddings. However, unsupervised contrastive learning methods still lag far behind their supervised counterparts. We attribute this gap to the quality of positive and negative samples, and we aim to improve both. For positive samples, we propose switch-case augmentation, which flips the case of the first letter of randomly selected words in a sentence; this counteracts the intrinsic bias of pre-trained token embeddings toward word frequency, letter case, and subword tokenization. For negative samples, we sample hard negatives from the whole dataset using a pre-trained language model. Combining these two methods with SimCSE, our proposed Contrastive learning with Augmented and Retrieved Data for Sentence embedding (CARDS) method significantly surpasses the current SOTA on STS benchmarks in the unsupervised setting.
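The abstract describes two concrete mechanisms: flipping the leading letter's case in randomly chosen words, and retrieving hard negatives by similarity search over the dataset. The sketch below illustrates both under stated assumptions; the selection probability `p`, the function names, and the use of pre-normalized embeddings are hypothetical choices for illustration, not details taken from the paper.

```python
import random
import numpy as np

def switch_case_augment(sentence: str, p: float = 0.15, rng=random) -> str:
    """Flip the case of the first letter of randomly selected words.

    Minimal sketch of the switch-case augmentation described in the
    abstract; the selection probability `p` is an assumed parameter,
    not a value reported by the authors.
    """
    out = []
    for word in sentence.split():
        if word and word[0].isalpha() and rng.random() < p:
            # Flip only the leading character, keep the rest unchanged.
            word = word[0].swapcase() + word[1:]
        out.append(word)
    return " ".join(out)

def retrieve_hard_negatives(query_emb: np.ndarray,
                            corpus_embs: np.ndarray,
                            k: int = 5) -> np.ndarray:
    """Return indices of the k most similar corpus sentences.

    Sketch of dataset-wide hard-negative retrieval: both arguments are
    assumed to be L2-normalized sentence embeddings produced by a
    pre-trained language model, so the dot product equals cosine
    similarity. In practice the query sentence itself (and any
    near-duplicates) would be excluded from the result.
    """
    scores = corpus_embs @ query_emb
    return np.argsort(-scores)[:k]

print(switch_case_augment("The quick brown fox jumps over the lazy dog", p=0.3))
```

In this reading, the augmented sentence serves as the positive view of the original in the contrastive objective, while the retrieved nearest neighbors act as hard negatives alongside the usual in-batch negatives.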
Wenxiao Liu, Zihong Yang, Chaozhuo Li, Zijin Hong, Jianfeng Ma, Zhiquan Liu, Litian Zhang, Feiran Huang
Zhangchi Feng, Richong Zhang, Zhijie Nie
Qinyuan Cheng, Xiaogui Yang, Tianxiang Sun, Linyang Li, Xipeng Qiu