DISSERTATION

Incorporating prior knowledge into regularized regression

Zeng, Chubing (author)

Year: 2020
University: University of Southern California Digital Library

Abstract

The rapid advancement of high-throughput sequencing technologies has produced unprecedented amounts and types of omic data. Predicting clinical outcomes based on genomic features such as gene expression, methylation, and genotypes is becoming increasingly important for individualized risk assessment and treatment. Alongside genomic features, there is also a rich set of meta-features, such as functional annotations, pathway information, and knowledge from previous studies, that comprises valuable additional information. Traditionally, such meta-feature information is used in a post-hoc manner to enhance model explainability. For example, after a model is fit, analyses can be conducted to formally assess whether the selected gene features are enriched in particular metabolic pathways or gene ontology annotations. This kind of post-hoc analysis can provide biological insight and validation for a prediction model. In this dissertation, we propose novel methods that exploit genomic meta-features a priori rather than post hoc, to better identify important markers and improve prediction performance. We aim to address one central question: how can we predict an outcome of interest and identify relevant features while taking additional information on the features into account?

Since genomic data sets are typically high-dimensional, penalized regression methods are commonly used to select relevant features and build predictive models. Standard penalized regression applies one penalty parameter to all features, ignoring structural differences or heterogeneity among features. We therefore integrate meta-features into penalized regression by making the penalty parameters meta-feature-driven: the penalty parameters are modeled as a log-linear function of the meta-features and are estimated from the data using an approximate empirical Bayes approach.

This dissertation is structured as follows.
Chapter 1 introduces how penalized regression techniques can be used to solve high dimensional data problems. Chapter 2 describes an empirical Bayes approach to select the penalty parameter(s) in penalized regression. Chapter 3 discusses our method for incorporating meta-features into LASSO linear regression. Chapter 4 is devoted to the optimization algorithms for marginal likelihood maximization. Chapter 5 extends the model to Ridge and Elastic-Net linear and logistic regression. Finally, Chapter 6 presents the R package we developed to implement our method.
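To make the log-linear penalty model concrete, the following is a minimal sketch in Python, not the dissertation's actual implementation (which is provided as an R package). All names (`Z`, `alpha_meta`, the simulated data) are illustrative assumptions. It uses the standard equivalence that a lasso with feature-specific penalty weights w_j is an ordinary lasso applied to rescaled columns X_j / w_j, with coefficients mapped back by the same factors; in the dissertation the weights are estimated by empirical Bayes rather than fixed by hand as here.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, q = 100, 50, 2                    # samples, genomic features, meta-features
X = rng.normal(size=(n, p))             # genomic feature matrix
Z = rng.normal(size=(p, q))             # meta-features: one row per genomic feature
beta = np.zeros(p)
beta[:5] = 2.0                          # a few truly relevant features
y = X @ beta + rng.normal(size=n)

# Hypothetical meta-feature coefficients; in the proposed method these are
# estimated from the data via an approximate empirical Bayes approach.
alpha_meta = np.array([0.3, -0.2])
lam = np.exp(Z @ alpha_meta)            # feature-specific penalty weights,
                                        # log-linear in the meta-features

# Weighted lasso via rescaling: penalizing beta_j by lam_j is equivalent to
# an ordinary lasso on X_j / lam_j, then dividing the fitted coefficients
# by lam_j to map them back to the original scale.
X_scaled = X / lam
fit = Lasso(alpha=0.1).fit(X_scaled, y)
beta_hat = fit.coef_ / lam
```

Features whose meta-features predict relevance receive a smaller lam_j and are penalized less, so they are more likely to survive selection; uninformative features receive a larger penalty.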

Keywords:
Exploit, Regression, Bayes' theorem, Set (abstract data type), Regression analysis, Predictive modelling, Function (biology), Penalty method, Data set

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Statistical Methods and Inference
Physical Sciences →  Mathematics →  Statistics and Probability
Gene expression and cancer classification
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Statistical Methods in Epidemiology
Physical Sciences →  Mathematics →  Statistics and Probability

Related Documents

JOURNAL ARTICLE

Incorporating prior knowledge into regularized regression

Chubing Zeng, Duncan C. Thomas, Juan Pablo Lewinger

Journal: Bioinformatics Year: 2020 Vol: 37 (4) Pages: 514-521
JOURNAL ARTICLE

Incorporating Prior Knowledge into Kernel Based Regression

Zhe Sun, Zengke Zhang, Huangang Wang

Journal: Acta Automatica Sinica Year: 2008 Vol: 34 (12) Pages: 1515-1521
JOURNAL ARTICLE

Incorporating prior knowledge in support vector regression

Fabien Lauer, Gérard Bloch

Journal: Machine Learning Year: 2007 Vol: 70 (1) Pages: 89-118