Extractive opinion summarization selects sentences from users’ reviews to represent the prevalent opinions about a product or service. However, the extracted sentences can be redundant and may miss some important aspects, especially for centroid-based extractive summarization models (Radev et al., 2004). To alleviate these issues, we introduce TokenCluster, a method for unsupervised extractive opinion summarization that automatically identifies the aspects described in the review sentences and then extracts sentences based on their aspects. It identifies the underlying aspects of the review sentences using the roots of the noun phrases and the adjectives that appear in them. Empirical evaluation shows that TokenCluster improves aspect coverage in summaries and achieves strong performance on multiple opinion summarization datasets, for both general and aspect-specific summarization. We also perform extensive ablation and human evaluation studies to validate the design choices of our method. The implementation of our work is available at https://github.com/leehaoyuan/TokenCluster.
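The idea of extracting sentences by aspect rather than by closeness to a centroid can be illustrated with a minimal sketch. This is not the TokenCluster implementation, which is in the repository above. It is a simplified greedy selection, and it assumes that the aspect tokens (noun-phrase roots and adjectives) have already been produced by a parser. The names `SENTENCES`, `group_by_token`, and `summarize`, the hand-written token lists, and the `budget` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each review sentence is paired with aspect tokens
# (noun-phrase roots and adjectives). In practice a parser would supply
# these; here they are written by hand for illustration.
SENTENCES = [
    ("The room was clean and spacious.", ["room", "clean", "spacious"]),
    ("Our room was very clean.", ["room", "clean"]),
    ("The staff were friendly.", ["staff", "friendly"]),
    ("Friendly staff at the front desk.", ["staff", "friendly", "desk"]),
    ("Breakfast was cold.", ["breakfast", "cold"]),
]


def group_by_token(sentences):
    """Map each aspect token to the indices of the sentences containing it."""
    groups = defaultdict(list)
    for idx, (_, tokens) in enumerate(sentences):
        for tok in set(tokens):
            groups[tok].append(idx)
    return groups


def summarize(sentences, budget=2):
    """Greedily pick one sentence per not-yet-covered token, most common first."""
    groups = group_by_token(sentences)
    # Rank tokens by how many sentences mention them; break ties alphabetically.
    ranked = sorted(groups, key=lambda t: (-len(groups[t]), t))
    chosen, covered = [], set()
    for tok in ranked:
        if len(chosen) == budget:
            break
        if tok in covered:
            continue  # a selected sentence already mentions this token
        best = groups[tok][0]  # assumption: take the earliest sentence in the group
        chosen.append(best)
        covered.update(sentences[best][1])
    return [sentences[i][0] for i in sorted(chosen)]


print(summarize(SENTENCES, budget=2))
# ['The room was clean and spacious.', 'The staff were friendly.']
```

Because a selected sentence marks all of its tokens as covered, the second pick skips the near-duplicate "Our room was very clean." and moves on to a different aspect. That is the redundancy and coverage behavior the abstract describes, shown here in a deliberately simplified form.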
Somnath Basu Roy Chowdhury, Chao Zhao, Snigdha Chaturvedi
Zhihao Fan, Huiyong Li, Shasha Mo, Jianwei Niu
Mengli Zhang, Gang Zhou, Ningbo Huang, Peng He, Wanting Yu, Wenfen Liu
Miao Li, Jey Han Lau, Eduard Hovy, Mirella Lapata