JOURNAL ARTICLE

Multi-objective Feature Selection Algorithm Based on Apache Spark and Particle Swarm Optimization

Abstract

Feature selection algorithms based on multi-objective particle swarm optimization (MOPSO) have been applied effectively in a number of applications. However, existing algorithms struggle with large, high-dimensional datasets because of their high computational complexity. With the advancement of distributed parallel computing, data-parallel frameworks have been proposed to address this problem. Increasing parallelism, however, can lead to uneven data distribution, which reduces the effectiveness of the cluster. In this paper, we propose a feature selection approach based on a parallel algorithm, Spark-MOPSO, implemented on the Apache Spark platform. First, the entire dataset is divided into multiple partitions, and distributed, in-memory computation is performed with Apache Spark. The local fitness values of the particles are updated iteratively and independently through data-parallel computation on each partition. During the iterative process, the partitions are independent and do not affect one another, which reduces both the need to transfer large amounts of data between nodes and the network communication. In addition, every m iterations the local external archives produced by the particle swarms are pulled back to the main node and broadcast to the partitions as the global external archive, which guides the search for the optimal particle in each partition and mitigates the impact of imbalanced data distribution on the results. Experimental results, assessed with multi-objective evaluation indicators, show that the algorithm obtains a Pareto front of good quality and diversity, finds solutions faster, and exhibits better performance and parallel computing capability.
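
The archive exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in plain Python without Spark, and it assumes two minimized objectives (classification error and number of selected features), which the abstract does not specify. The names `dominates`, `non_dominated`, and `merge_local_archives` are hypothetical helpers introduced here. The sketch shows only the step in which local external archives from several partitions are pooled on the main node and reduced to their non-dominated members to form a global external archive.

```python
from typing import List, Tuple

# Assumption: each archive entry is a pair of minimized objectives,
# (classification error, number of selected features).
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    unique = list(dict.fromkeys(points))  # drop duplicates, preserve order
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def merge_local_archives(local_archives: List[List[Objectives]]) -> List[Objectives]:
    """Pool the local external archives of all partitions and filter them
    down to a single global external archive."""
    pooled = [p for archive in local_archives for p in archive]
    return non_dominated(pooled)


# Hypothetical local archives from three partitions after m iterations.
local_archives = [
    [(0.10, 8), (0.15, 5)],   # partition 0
    [(0.12, 8), (0.09, 12)],  # partition 1
    [(0.15, 5), (0.20, 3)],   # partition 2
]

global_archive = merge_local_archives(local_archives)
print(sorted(global_archive))
```

In this example the entry (0.12, 8) is discarded because (0.10, 8) has a lower error with the same number of features, and the duplicate (0.15, 5) is kept only once. On a Spark cluster, the pooling step would plausibly correspond to collecting the per-partition archives on the driver and redistributing the result with a broadcast variable, but that mapping is an assumption rather than something the abstract states.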

Keywords:
Computer science; Particle swarm optimization; SPARK (programming language); Partition (number theory); Speedup; Feature selection; Big data; Pareto principle; Algorithm; Estimation of distribution algorithm; Parallel computing; Mathematical optimization; Data mining; Artificial intelligence; Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.26
Refs: 12
Citation Normalized Percentile: 0.53

Topics

Metaheuristic Optimization Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Multi-Objective Optimization Algorithms
Physical Sciences →  Computer Science →  Computational Theory and Mathematics
