JOURNAL ARTICLE

Geographical feature classification from text using (active) convolutional neural networks

Abstract

Deep learning can discover intricate patterns hidden in big data and scales much better than traditional machine learning as the volume of data increases dramatically. It has therefore been applied successfully in many domains and applications, such as image classification, text classification, and machine translation. In this paper, we use deep learning to classify geographical features (e.g., mountains, rivers, landmarks, and cities) from text, using geolocated Wikipedia entries as the case study application. We employ one of the most commonly used deep learning architectures, convolutional neural networks (CNNs), and their integration with active learning (creating what we call active CNNs) to train the geographical feature classifiers on the Wikipedia text data set obtained from GeoNames (which provides the feature type for each geolocated entity). We evaluate the performance of CNNs and active CNNs with multiple metrics (i.e., accuracy, F1 score, and confusion matrix). Our experimental results demonstrate that CNNs and active CNNs can effectively classify geo-referenced text entities into predefined geographical features. In addition, they show that active CNNs outperform CNNs for hard-to-distinguish classes. We also compared hierarchical multi-class classification with flat multi-class classification, and the results show that hierarchical multi-class classification significantly outperforms flat multi-class classification on the data set we used.
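The three evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The toy labels and predictions below are hypothetical, not data from the paper. The abstract does not say how F1 is averaged across classes, so the macro average used here is an assumption.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_f1(matrix):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # true class i, predicted as another class
        fp = sum(row[i] for row in matrix) - tp  # another class, predicted as class i
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


# Hypothetical toy example with three feature classes.
labels = ["mountain", "river", "city"]
y_true = ["mountain", "mountain", "river", "river", "city", "city"]
y_pred = ["mountain", "river", "river", "river", "city", "mountain"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(accuracy(cm), per_class_f1(cm), macro_f1(cm))
```

The off-diagonal cells of the confusion matrix show which pairs of classes are confused with each other, which is the information needed to identify hard-to-distinguish classes.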

Keywords:
Computer science; Artificial intelligence; Convolutional neural network; Confusion matrix; Feature (linguistics); Deep learning; Machine learning; Class (philosophy); Set (abstract data type); Pattern recognition (psychology)

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.59
Refs: 74
Citation Normalized Percentile: 0.74

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Data-Driven Disease Surveillance
Health Sciences →  Medicine →  Epidemiology
Geographic Information Systems Studies
Social Sciences →  Social Sciences →  Geography, Planning and Development