This work presents an alternative preprocessing procedure for consolidated data. Two methods are proposed. The first performs feature selection based on an ensemble of machine learning algorithms. The second imputes missing data by combining functional dependencies with association rules. Ensemble methods for processing multimodal data are developed, based on a hierarchical classifier, a set of weak classifiers, and several methods for selecting important features; they achieve much higher accuracy on unbalanced data sets than existing machine learning methods. The methods are validated on a medical dataset. The percentage of recovered data improves by 1.2% compared with association rules. The proposed missing data imputation method creates additional data values using the base domain and functional dependencies, and adds these values to the available training data. The correctness of the filled-in values is verified with a predictor built on the original dataset. The proposed PPD method performs 12% better than the RF and EM models when 30% of the data are missing.
Karima Benhamza, Nadjette Benhamida, Mohamed Ilyes BOURAHDOUN, Bilel BOUDJAHEM
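
The imputation idea summarized in the abstract, filling a missing value from a functional dependency, can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the functional-dependency step and leaves out the association rules, and the function name `impute_by_fd` and the column names `diagnosis_code` and `diagnosis_group` are hypothetical, chosen only for illustration. It assumes a dependency of the form determinant → dependent and fills a gap only when the complete rows map the determinant to exactly one dependent value.

```python
from collections import defaultdict


def impute_by_fd(rows, determinant, dependent):
    """Fill missing `dependent` values using the dependency determinant -> dependent.

    `rows` is a list of dicts; a missing value is represented by None.
    Returns new dicts and leaves the input rows unchanged.
    """
    # Collect every dependent value observed for each determinant value.
    seen = defaultdict(set)
    for row in rows:
        key, value = row.get(determinant), row.get(dependent)
        if key is not None and value is not None:
            seen[key].add(value)

    # Keep only unambiguous mappings, i.e. where the dependency actually holds.
    lookup = {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(dependent) is None and new_row.get(determinant) in lookup:
            new_row[dependent] = lookup[new_row[determinant]]
        filled.append(new_row)
    return filled


# Hypothetical records; None marks a missing value.
records = [
    {"diagnosis_code": "E11", "diagnosis_group": "diabetes"},
    {"diagnosis_code": "E11", "diagnosis_group": None},
    {"diagnosis_code": "I10", "diagnosis_group": "hypertension"},
    {"diagnosis_code": "J45", "diagnosis_group": None},
]
completed = impute_by_fd(records, "diagnosis_code", "diagnosis_group")
```

In this sketch the second record is filled because `E11` maps to a single group in the complete rows, while the last record stays missing because no complete row carries `J45`. Values left unfilled at this stage are the ones a second mechanism, such as the association rules named in the abstract, would have to handle.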