High-dimensional models, in which the number of parameters or features can exceed the number of observations, are now encountered regularly thanks to advances in modern computation. For example, gene expression studies often involve at most a few hundred observations but thousands of gene predictors, and one goal is to identify the genes that are relevant to the expression. Another example is model compression, which aims to alleviate the costs of large model sizes. The former is a variable (or feature) selection problem, while the latter is a model selection problem. In the Bayesian framework, we often specify shrinkage priors that induce sparsity in the model. A sparsity-inducing prior has high concentration around zero, to identify the zero coefficients, and heavy tails, to capture the non-zero elements. In this thesis, we first provide an overview of the most well-known sparsity-inducing priors. We then propose to use the $L_{\frac{1}{2}}$ prior with a partially collapsed Gibbs (PCG) sampler to explore the high-dimensional parameter space in linear regression models, with variable selection achieved through credible intervals. We also develop a coordinate-wise optimization algorithm for posterior mode search with theoretical guarantees. We then extend the PCG sampler to a scalable ordinal regression model, with a real application to the study of student evaluation surveys. Next, we move to modern deep learning. We introduce a constrained variational Adam (CVA) algorithm to optimize Bayesian neural networks and discuss its connection to stochastic gradient Hamiltonian Monte Carlo. We then generalize the algorithm to constrained variational Adam with expectation maximization (CVA-EM), which incorporates a spike-and-slab prior to capture the sparsity of the neural network; both nonlinear high-dimensional variable selection and network pruning can be achieved by this algorithm. We further show that CVA-EM extends to graph neural networks, producing both sparse graphs and sparse weights. Finally, we discuss a sparse VAE with the $L_{\frac{1}{2}}$ prior as potential future work.
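For concreteness, the two prior families named above typically take the following standard forms; this is an illustrative sketch with generic hyperparameters ($\lambda$, $w$, $\sigma^2$ are not taken from the thesis):

$$
\pi(\beta_j) \propto \exp\!\left(-\lambda\, |\beta_j|^{1/2}\right), \qquad
\pi(\theta_j) = (1-w)\,\delta_0(\theta_j) + w\,\mathcal{N}(\theta_j \mid 0, \sigma^2),
$$

where the left-hand density is the $L_{\frac{1}{2}}$ (exponential power) prior, whose cusp at zero induces strong shrinkage and whose tails are heavier than those of the Laplace ($L_1$) prior, and the right-hand density is the spike-and-slab prior, mixing a point mass at zero ($\delta_0$) with a Gaussian slab according to the prior inclusion probability $w$.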