Αϊβάτογλου, Γεώργιος Χαράλαμπου
Due to the exponential growth of data in recent years, many businesses and organizations seek innovative techniques to collect and understand data. Because they are chiefly interested in public opinion about their products and services, they implement methodologies for sentiment classification. These techniques belong to Sentiment Analysis and Natural Language Processing, and their goal is to automatically identify the sentiment expressed in a corpus. Although the proliferation of text documents caused problems over the last decade, mostly related to computational resources, data can now be analyzed more efficiently thanks to Deep Learning architectures. However, many languages are characterized as low-resource languages because of the limited data available online for analysis. For this reason, this thesis presents an aspect-based Sentiment Analysis methodology that aims to classify the aspects of a sentence into pre-defined sentiment categories. Specifically, the dataset of this study is written in Greek and was collected from social networks such as Twitter. The labels, both aspects and sentiments, were manually assigned by expert annotators. Various language models and deep learning architectures were developed and tested. Finally, the results of the best architecture, a combination of neural machine translation and a language-model ensemble methodology, demonstrate the necessity of neural machine translation for imbalanced data and the superiority of Transformer-based ensemble architectures, which achieve promising results for aspect-based Sentiment Analysis in low-resource languages.
Guangmin Li, Hui Wang, Yi Ding, Kangan Zhou, Xiaowei Yan