For high-dimensional data in which the number of variables greatly exceeds the number of observations, selecting important variables while maintaining the required heredity conditions can be challenging. This dissertation is structured into three interconnected parts. In the first part, we propose a variable selection method based on a well-known optimization technique, the Genetic Algorithm, and we develop an R package to simplify its implementation and use. In the second part, we propose another variable selection method by extending the study from the Genetic Algorithm to a different but related optimization technique, Simulated Annealing. We consider three different hierarchical structures in both studies and compare the performance and efficiency of the two proposed algorithms in multiple simulation studies. In the last part, we propose a transfer learning-inspired algorithm with a specific focus on studying microbiome-metabolome interactions. We compare the proposed method with existing methods in terms of mean squared error, type I error, and power. An application of this method to real-world data reveals biologically significant interactions between gut microbes and various bile acids.
Zuharah Jaafar, Norazlina Ismail