The rapid advancement of high-throughput sequencing technologies has produced unprecedented amounts and types of omic data. Predicting clinical outcomes from genomic features such as gene expression, methylation, and genotypes is becoming increasingly important for individualized risk assessment and treatment. Associated with these genomic features is also a rich set of meta-features, such as functional annotations, pathway information, and knowledge from previous studies, that constitutes valuable additional information. Traditionally, such meta-feature information is used in a post-hoc manner to enhance model explainability. For example, after a model is fit, one can formally assess whether the selected gene features are enriched in particular metabolic pathways or gene ontology annotations. This kind of post-hoc analysis can provide biological insight and validation for a prediction model. In this dissertation, we propose novel methods that exploit genomic meta-features a priori rather than post hoc, to better identify important markers and improve prediction performance. We aim to address one central question: how can we predict an outcome of interest and identify relevant features while taking additional information about the features into account?

Since genomic data sets are typically high-dimensional, penalized regression methods are commonly used to select relevant features and build predictive models. Standard penalized regression applies a single penalty parameter to all features, ignoring structural differences or heterogeneity among the features. Motivated by this, we integrate meta-features into penalized regression by making the penalty parameters meta-feature-driven: the penalty parameters are modeled as a log-linear function of the meta-features and are estimated from the data using an approximate empirical Bayes approach.

This dissertation is structured as follows.
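To make the log-linear penalty model concrete, the following is a minimal toy sketch, not the dissertation's implementation. It assumes a meta-feature matrix Z (p features by q meta-features) and fixed hyperparameters alpha; in the proposed method alpha would be estimated by approximate empirical Bayes, but here it is set by hand for illustration. Feature-specific penalties lambda_j = exp(z_j' alpha) are then absorbed into a standard lasso fit via column rescaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy data: n samples, p genomic features, q meta-features per feature.
n, p, q = 100, 50, 2
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                           # only the first 5 features matter
y = X @ beta + rng.standard_normal(n)

# Z: p x q meta-feature matrix, e.g. a pathway-membership indicator
# (column 0) plus an intercept (column 1).
Z = np.zeros((p, q))
Z[:5, 0] = 1.0
Z[:, 1] = 1.0

# Hypothetical hyperparameters; the method would estimate these from the
# data by empirical Bayes. A negative coefficient lowers the penalty for
# features in the annotated pathway.
alpha = np.array([-1.0, 0.0])

# Log-linear model for feature-specific penalties: lam_j = exp(z_j' alpha).
lam = np.exp(Z @ alpha)

# Weighted lasso via rescaling: penalizing |beta_j| with weight lam_j is
# equivalent to a standard lasso on the rescaled columns X_j / lam_j.
X_tilde = X / lam
fit = Lasso(alpha=0.01).fit(X_tilde, y)
beta_hat = fit.coef_ / lam               # map back to the original scale
```

The rescaling trick works because substituting w_j = lam_j * beta_j into the standard lasso objective recovers the feature-weighted penalty sum over lam_j * |beta_j|; features with informative meta-features receive smaller lam_j and are therefore penalized less.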
Chapter 1 introduces how penalized regression techniques can be used to solve high-dimensional data problems. Chapter 2 describes an empirical Bayes approach to selecting the penalty parameter(s) in penalized regression. Chapter 3 discusses our method for incorporating meta-features into LASSO linear regression. Chapter 4 is devoted to the optimization algorithms for marginal likelihood maximization. Chapter 5 extends the model to Ridge and Elastic-Net linear and logistic regression. Finally, Chapter 6 presents the R package we developed to implement our method.
Chubing Zeng, Duncan C. Thomas, Juan Pablo Lewinger