DISSERTATION

Cluster-based semi-supervised ensemble learning

Abstract

Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data in order to classify test instances. The cluster assumption is one of the possible relationships between the true classes and the data distribution that semi-supervised algorithms rely on in order to exploit unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised settings. In this thesis, we focus on the cluster assumption and study ensemble learning built on a new cluster regularisation technique for multi-class semi-supervised classification. First, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (ClusterReg) algorithm. ClusterReg employs a new regularisation mechanism based on the posterior probabilities produced by a clustering algorithm, so that it avoids placing decision boundaries that traverse high-density regions. When the data satisfy the cluster assumption, the method is robust to overlapping classes and to scarce labelled instances in uncertain, low-density regions. Second, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which applies the proposed cluster regularisation to ensemble learning and uses ClusterReg as its base learner. CBoost can overcome possibly incorrect pseudo-labels and generalises better than existing classifiers. Finally, since datasets often contain a large number of unlabelled instances, we propose Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexity than state-of-the-art algorithms. It employs a sampling procedure to reduce the training set of each base learner, an efficient clustering algorithm, and an approximate nearest-neighbour technique that avoids computing the pairwise distance matrix. Hence, ECB enables semi-supervised classification on large-scale datasets.
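
The abstract does not give ClusterReg's equations, so the following is only a hedged illustration of the general idea of regularising with clustering posteriors. It assumes the clustering algorithm is a soft k-means (a Gaussian mixture would serve equally well), converts its posteriors P(cluster | x) into class posteriors using the few labelled points, and scores a classifier's predicted class probabilities with a KL-divergence penalty that grows when they disagree with the cluster structure. The function names, the temperature `beta`, the smoothing parameter `alpha`, and the KL form of the penalty are all hypothetical choices, not the thesis's method.

```python
import numpy as np


def soft_kmeans(X, k, beta=5.0, n_iter=50):
    """Soft k-means on all points; returns an (n, k) matrix of P(cluster | x)."""
    # Deterministic farthest-point initialisation so the sketch is reproducible.
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        # Each centre becomes the responsibility-weighted mean of the points.
        centres = (R.T @ X) / (R.sum(axis=0)[:, None] + 1e-12)
    return R


def cluster_class_posteriors(R, lab_idx, y_lab, n_classes, alpha=1.0):
    """Hypothetical mapping from cluster posteriors to class posteriors.

    P(class | cluster) comes from responsibility-weighted label counts with
    Laplace smoothing alpha; then P(class | x) = sum_j P(j | x) P(class | j).
    """
    counts = np.full((R.shape[1], n_classes), alpha)
    for i, c in zip(lab_idx, y_lab):
        counts[:, c] += R[i]
    class_given_cluster = counts / counts.sum(axis=1, keepdims=True)
    return R @ class_given_cluster


def cluster_regulariser(P_model, P_cluster, eps=1e-12):
    """Hypothetical penalty: mean KL(P_cluster || P_model) over all points."""
    kl = P_cluster * (np.log(P_cluster + eps) - np.log(P_model + eps))
    return float(kl.sum(axis=1).mean())


# Usage: two well-separated blobs with only one labelled point each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
lab_idx, y_lab = np.array([0, 50]), np.array([0, 1])
R = soft_kmeans(X, k=2)
P_cluster = cluster_class_posteriors(R, lab_idx, y_lab, n_classes=2)
uniform = np.full_like(P_cluster, 0.5)  # a classifier that ignores the clusters
print(P_cluster.argmax(axis=1)[[0, 99]], cluster_regulariser(uniform, P_cluster))
```

In this sketch the cluster-derived class posteriors are nearly constant inside each dense cluster, so a classifier that keeps the penalty small is discouraged from drawing a decision boundary through the middle of a cluster, which is the intuition the cluster assumption expresses.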

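ECB is described as avoiding the full pairwise distance matrix by approximating nearest neighbours, but the abstract does not name the approximation technique. Purely as an illustration, the sketch below swaps in Euclidean locality-sensitive hashing (p-stable random projections): points are hashed into buckets by several random projections, and exact distances are computed only between points that share a bucket in at least one table. The function `lsh_knn` and its parameters (`n_tables`, `n_hashes`, bucket width `w`) are hypothetical and not taken from the thesis.

```python
from collections import defaultdict

import numpy as np


def lsh_knn(X, k=5, n_tables=8, n_hashes=4, w=1.0, seed=0):
    """Approximate k-nearest neighbours with Euclidean (p-stable) LSH.

    Each table hashes x to floor((A^T x + b) / w) for n_hashes random
    Gaussian directions; exact distances are computed only between points
    that share a bucket in at least one table, not for all n^2 pairs.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    candidates = [set() for _ in range(n)]
    for _ in range(n_tables):
        A = rng.normal(size=(d, n_hashes))
        b = rng.uniform(0.0, w, size=n_hashes)
        keys = np.floor((X @ A + b) / w).astype(np.int64)
        buckets = defaultdict(list)
        for i, key in enumerate(map(tuple, keys)):
            buckets[key].append(i)
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)
    neighbours = []
    for i in range(n):
        cand = np.array(sorted(candidates[i] - {i}), dtype=np.int64)
        if cand.size:
            d2 = ((X[cand] - X[i]) ** 2).sum(axis=1)
            cand = cand[np.argsort(d2)[:k]]  # k closest candidates, nearest first
        neighbours.append(cand)
    return neighbours


# Usage: compare against brute force on a small random dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
approx = lsh_knn(X, k=5)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # brute force, for checking only
np.fill_diagonal(D2, np.inf)
exact = np.argsort(D2, axis=1)[:, :5]
recall = np.mean([len(set(a) & set(e)) / 5 for a, e in zip(approx, exact)])
print(f"recall@5 = {recall:.2f}")
```

Wider buckets (larger `w`) or more tables enlarge the candidate sets and push recall towards that of exact search at the cost of more distance computations; if every point lands in a single bucket, the method degenerates to brute force.
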
Keywords:
Boosting (machine learning), cluster analysis, ensemble learning, classifier, artificial intelligence, computer science, machine learning, pattern recognition, robustness, data mining
