CONFERENCE PAPER

Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives

Wei Wang, Liangzhu Ge, Jingqiao Zhang, Cheng Yang

Year: 2022
Venue: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 2159-2165

Abstract

Following SimCSE, contrastive learning based methods have achieved the state-of-the-art (SOTA) performance in learning sentence embeddings. However, the unsupervised contrastive learning methods still lag far behind the supervised counterparts. We attribute this to the quality of positive and negative samples, and aim to improve both. Specifically, for positive samples, we propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence. This is to counteract the intrinsic bias of pre-trained token embeddings to frequency, word cases and subwords. For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model. Combining the above two methods with SimCSE, our proposed Contrastive learning with Augmented and Retrieved Data for Sentence embedding (CARDS) method significantly surpasses the current SOTA on STS benchmarks in the unsupervised setting.

Keywords:
Sentence, Computer science, Artificial intelligence, Word, Natural language processing, Embedding, Unsupervised learning, False positive, Sample, Token, Speech recognition, Mathematics

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 1.41
References: 62
Citation Normalized Percentile: 0.82

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)