JOURNAL ARTICLE

ClusCSE: Clustering-Based Contrastive Learning of Sentence Embeddings

Abstract

We propose ClusCSE, an unsupervised sentence embedding framework. Contrastive learning has been widely studied for learning universal sentence embeddings in natural language processing. Contrastive methods typically apply well-designed transformations to raw sentences to construct positive pairs and pair different raw sentences to construct negative pairs. Following this paradigm, unsup-SimCSE advanced the state of the art in unsupervised sentence embeddings by using dropout as a minimal data augmentation strategy. Its training objective maximizes the similarity of positive pairs while minimizing the similarity of negative pairs. However, even different raw sentences can be highly semantically similar, so indiscriminately reducing the similarity of negative pairs is problematic: sentence embeddings learned by unsup-SimCSE may encode false knowledge about the relationships between different sentences. To alleviate this problem, we introduce online clustering into unsup-SimCSE and thus propose ClusCSE. Instead of only comparing sentences, ClusCSE also enforces consistency between cluster assignments, which makes the embeddings aware of groups of similar sentences. Our evaluations on semantic textual similarity tasks demonstrate that ClusCSE outperforms unsup-SimCSE, improving the average Spearman's correlation by 1.19% on BERT-base.
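The false-negative issue described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so this assumes the standard in-batch InfoNCE form of the contrastive objective; the function names, the temperature of 0.05, and the toy 2-D vectors are all illustrative assumptions, not the authors' implementation. It covers only the contrastive term, not the online clustering component.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(views_a, views_b, temperature=0.05):
    """Mean in-batch contrastive loss (assumed InfoNCE form).

    views_a[i] and views_b[i] are two views of sentence i (a positive pair);
    views_a[i] and views_b[j] with j != i are treated as negatives.
    """
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical embeddings of two sentences with clearly different meanings.
distinct_a = [[1.0, 0.0], [0.0, 1.0]]
distinct_b = [[0.9, 0.1], [0.1, 0.9]]

# Hypothetical embeddings where the second sentence is a near-paraphrase
# of the first, yet is still treated as a negative.
similar_a = [[1.0, 0.0], [0.95, 0.05]]
similar_b = [[0.9, 0.1], [0.9, 0.12]]

loss_distinct = info_nce_loss(distinct_a, distinct_b)
loss_similar = info_nce_loss(similar_a, similar_b)
print(loss_distinct < loss_similar)
```

In this toy setting the loss is much larger for the near-paraphrase batch, so gradient descent would push the two semantically similar sentences apart. This is the behavior that the abstract argues cluster-assignment consistency is meant to counteract.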

Keywords:
Computer science; Cluster analysis; Artificial intelligence; Natural language processing; Sentence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 45
Citation Normalized Percentile: 0.20

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings

Che Liu, Rui Wang, Jinghua Liu, Jian Sun, Fei Huang, Luo Si

Journal: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Year: 2021

JOURNAL ARTICLE

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Scott Yih, Yoon Kim, James Glass

Journal: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Year: 2022
Pages: 4207-4218