Liping YangAlan M. MacEachrenPrasenjit Mitra
Deep learning can discover intricate patterns hidden in big data, and has much better scalability than traditional machine learning when the volume of data increases dramatically. Thus, deep learning has gained many successes in various domains and applications such as image classification, text classification, and machine translation. In this paper, we use deep learning to classify geographical features (e.g., mountains, rivers, landmarks, and cities) from text, using geolocated Wikipedia entries as the case study application. We employ one of the most commonly used deep learning architectures, convolutional neural networks (CNNs) and its integration with active learning (creating what we call active CNNs), to train the geographical feature classifiers on the Wikipedia text data set obtained from GeoNames (which provides the feature type for each geolocated entity). We evaluate the performance of CNNs and active CNNs with multiple metrics (i.e., accuracy, F1 score, and confusion matrix). Our experiment results demonstrated that CNNs and active CNNs can effectively classify geo-referenced text entities into predefined geographical features. In addition, our experiment results show that active CNNs outperform CNNs for hard to distinguish classes. In our experiment, we also compared results for hierarchical multi-class classification and flat multiclass classification, and the results show that hierarchical multiclass classification significantly outperforms flat multi-class classification for the data set we used.
Sara Muslih MishalMurtadha M. Hamad
Mark HughesIrene LiSpyros KotoulasToyotaro Suzumura
Simon BakerAnna-Leena KorhonenSampo Pyysalo
Ignazio GalloShah NawazAlessandro Calefati