Recently, significant progress has been made in generating high-quality sentence representations through contrastive learning. SimCSE-like models improve the uniformity of the representation space by pulling positive pairs together and pushing negative pairs apart. However, these models often suffer from semantic monotonicity, sampling bias, and training performance that depends on batch size. To address these problems, this paper proposes CEUR, a contrastive framework for enhancing unsupervised sentence representation learning. CEUR adopts a sample augmentation method based on linguistic knowledge: positive samples are generated by synonym repetition, and negative samples by antonym replacement. To improve the consistency of the representation space, CEUR applies an instance weighting method that reduces sampling bias. Going a step further, CEUR employs momentum contrast to increase the number of trainable negative samples. Extensive experiments show that CEUR outperforms existing baseline models in overall performance on seven semantic textual similarity tasks.
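The augmentation idea described above can be illustrated with a minimal sketch. Note the toy synonym/antonym lexicon below is purely illustrative; the paper draws on a linguistic knowledge base, and the function names (`positive_sample`, `negative_sample`) are assumptions, not the authors' implementation.

```python
import random

# Toy stand-in for a linguistic knowledge base (illustrative only).
SYNONYMS = {"good": ["great"], "fast": ["quick"]}
ANTONYMS = {"good": ["bad"], "fast": ["slow"]}

def positive_sample(tokens):
    """Synonym repetition: repeat a word's synonym right after it,
    preserving the sentence meaning while perturbing its surface form."""
    out = []
    for tok in tokens:
        out.append(tok)
        if tok in SYNONYMS:
            out.append(random.choice(SYNONYMS[tok]))
    return out

def negative_sample(tokens):
    """Antonym replacement: swap words for antonyms so the sentence is
    lexically similar to the original but semantically opposed."""
    return [random.choice(ANTONYMS[t]) if t in ANTONYMS else t
            for t in tokens]

sent = "the service was good and fast".split()
print(positive_sample(sent))
print(negative_sample(sent))
```

Such pairs are hard for a contrastive model: the negative shares most tokens with the anchor yet differs in meaning, which is what makes antonym-based negatives more informative than randomly sampled sentences.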