Longhui Wang, Yong Wang, Yudong Xie
Word embedding models treat all words as having equal status, which neglects the hierarchical semantic relationships between words (e.g., ‘green’ – ‘color’ and ‘cat’ – ‘mammal’). To build a hierarchical structure of words from raw text data, we propose an unsupervised model that learns word hierarchical representations (WHRs), an extension of ordinary word representations. Globally, a WHR describes a word in terms of several other words that represent its basic attributes. The WHR model extends the continuous bag-of-words (CBOW) neural language model with perceptual grouping and attention mechanisms. We further use WHRs to generate document representations that are more expressive than widely used document models, such as latent topic and deep learning models. Experimental results demonstrate that our model outperforms state-of-the-art baselines on document retrieval, document classification, and sentiment analysis.
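The abstract characterizes the WHR model as a CBOW extension with attention. As a rough illustration only, not the authors' implementation, the following minimal PyTorch sketch shows a CBOW-style predictor in which the usual mean-pooling of context embeddings is replaced by attention-weighted pooling; all names, layers, and dimensions here are hypothetical, and the paper's perceptual grouping component is not modeled:

```python
# Hypothetical sketch: CBOW with attention over context words.
# The class name, layers, and sizes are illustrative assumptions,
# not the WHR architecture from the paper.
import torch
import torch.nn as nn

class AttentiveCBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # input embeddings
        self.attn = nn.Linear(embed_dim, 1)               # scores each context word
        self.out = nn.Linear(embed_dim, vocab_size)       # predicts the target word

    def forward(self, context):                  # context: (batch, window)
        e = self.embed(context)                  # (batch, window, dim)
        # Plain CBOW would use e.mean(dim=1); here attention weights
        # the context words before pooling.
        w = torch.softmax(self.attn(e), dim=1)   # (batch, window, 1)
        pooled = (w * e).sum(dim=1)              # (batch, dim)
        return self.out(pooled)                  # logits over the vocabulary

model = AttentiveCBOW(vocab_size=10_000, embed_dim=128)
ctx = torch.randint(0, 10_000, (4, 6))           # 4 examples, window of 6
print(model(ctx).shape)                          # torch.Size([4, 10000])
```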
Enes Burak Dündar, Ethem Alpaydın
Jakub Nowak, Marcin Korytkowski, Slava Voloshynovskiy, Rafał Scherer