We propose ClusCSE, an unsupervised sentence embedding framework. Contrastive learning has been widely studied for learning universal sentence embeddings in natural language processing. Contrastive methods typically apply well-designed transformations to raw sentences to construct positive pairs and combine different raw sentences to construct negative pairs. Following this paradigm, unsup-SimCSE advanced the state of the art in unsupervised sentence embeddings by using dropout as a minimal data augmentation strategy. Its training objective maximizes the similarity of positive pairs while minimizing the similarity of negative pairs. However, even different raw sentences can be highly semantically similar, so simply pushing apart all negative pairs of embeddings is unrealistic: sentence embeddings learned by unsup-SimCSE may encode false relationships between sentences. To alleviate this, we introduce online clustering into unsup-SimCSE and propose ClusCSE. Beyond comparing individual sentences, ClusCSE also enforces consistency between cluster assignments, making the embeddings aware of groups of similar sentences. Evaluations on semantic textual similarity tasks show that ClusCSE outperforms unsup-SimCSE, improving average Spearman's correlation by 1.19% on BERT-base.
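To make the contrastive objective concrete, the following is a minimal sketch of the SimCSE-style setup described above: the same batch of sentences is encoded twice, dropout noise makes the two views of each sentence a positive pair, and all other in-batch sentences serve as negatives under an InfoNCE loss. The encoder, the function names, and the temperature value here are illustrative stand-ins, not the paper's actual implementation; ClusCSE's additional cluster-assignment consistency term is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(batch, drop_p=0.1):
    """Toy 'encoder': a fixed linear projection followed by random dropout,
    standing in for BERT with dropout as the minimal data augmentation."""
    W = np.linspace(-1.0, 1.0, batch.shape[1] * 8).reshape(batch.shape[1], 8)
    h = batch @ W
    mask = rng.random(h.shape) > drop_p          # dropout noise -> positive pair
    return h * mask / (1.0 - drop_p)

def info_nce(z1, z2, tau=0.05):
    """InfoNCE loss: a sentence's two dropout views are a positive pair;
    all other in-batch sentences act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # pairwise cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives lie on the diagonal

X = rng.random((4, 6))                           # 4 toy "sentence" features
loss = info_nce(encode(X), encode(X))            # two dropout views of X
print(float(loss))
```

Minimizing this loss pulls the two dropout views of each sentence together while pushing apart different sentences; the abstract's point is that the second effect is too blunt when distinct sentences are semantically close, which is what the clustering term is meant to correct.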