Many bioinformatics studies aim to identify markers, that is, features that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. Although the posterior is available in closed form under Gaussian models with block covariance structures, the optimal feature selection algorithm for this model remains intractable, because it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, prior work proposed a simple suboptimal algorithm, 2MNC-Robust, that performs robustly across the space of block structures. Here, we present three new heuristic feature selection algorithms that outperform 2MNC-Robust on synthetic data. Enrichment analysis on real cancer data indicates that they also select many of the genes and pathways linked to the cancers under study.
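To give a rough sense of why exhaustive evaluation of the posterior is intractable, the sketch below counts candidate configurations for a small number of features $D$. The number of ways to partition a set of $k$ features into covariance blocks is the Bell number $B_k$. Purely for illustration, we assume (this is a simplifying assumption, not the exact model of the framework) that a configuration consists of a selected feature set together with an independent block partition of the selected features and of the unselected features, giving $\sum_{k=0}^{D} \binom{D}{k} B_k B_{D-k}$ configurations. The function names are hypothetical.

```python
from math import comb


def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, ..., B_n_max], computed with the Bell triangle.

    B_k counts the ways to partition k features into non-empty blocks.
    """
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every following entry adds the entry above-left of it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def count_configurations(d: int) -> int:
    """Hypothetical count of (selected set, block partition of selected
    features, block partition of unselected features) for d features.

    ASSUMPTION: selected and unselected features are partitioned into
    blocks independently; this is an illustrative simplification.
    """
    bells = bell_numbers(d)
    return sum(comb(d, k) * bells[k] * bells[d - k] for k in range(d + 1))


if __name__ == "__main__":
    for d in (5, 10, 20):
        print(f"D = {d:2d}: {count_configurations(d):,} configurations")
```

Even under this simplified count, the number of configurations grows super-exponentially in $D$, whereas gene expression studies typically involve thousands of features, which is what motivates heuristic search.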
Ali Foroughi pour, Lori A. Dalton
Francisco Carvalho, João T. Mexia, Ricardo Covas