JOURNAL ARTICLE

High throughput nonparametric probability density estimation

Jenny Farmer, Donald J. Jacobs

Year: 2018 Journal: PLoS ONE Vol: 13 (5) Pages: e0196937-e0196937 Publisher: Public Library of Science

Abstract

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite having only minimal information about data characteristics, and without relying on human subjectivity. An automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a universal scoring function that is invariant to sample size. A probability density estimate is then determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function, which identifies atypical fluctuations. This criterion resists both underfitting and overfitting the data and serves as an alternative to the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities, including cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate that the method has general applicability for high throughput statistical inference.
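
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a trial cumulative distribution function maps the sorted data to values u_(k) that should look uniform on (0, 1), that each u_(k) is scored under its single order statistic law Beta(k, N-k+1), and that the scaled quantile residual is sqrt(N+2) * (u_(k) - k/(N+1)). The function names, the averaging over N, and the exponential test data are hypothetical choices made for illustration.

```python
import math
import random

EPS = 1e-12  # keeps log() finite if a trial CDF returns exactly 0 or 1


def to_sorted_uniform(sample, trial_cdf):
    """Map data through a trial CDF and sort the result.

    If the trial CDF is a good estimate, the output resembles sorted
    uniform random data on (0, 1).
    """
    return sorted(min(max(trial_cdf(x), EPS), 1.0 - EPS) for x in sample)


def quasi_log_likelihood(u_sorted):
    """Average log-density of each u_(k) under Beta(k, n - k + 1).

    Scoring the order statistics one at a time, as if they were
    independent, is the assumption that makes this a "quasi" likelihood.
    """
    n = len(u_sorted)
    total = 0.0
    for k, u in enumerate(u_sorted, start=1):
        a, b = k, n - k + 1
        log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        total += (a - 1) * math.log(u) + (b - 1) * math.log1p(-u) - log_norm
    return total / n


def scaled_quantile_residuals(u_sorted):
    """Deviation of u_(k) from its mean k/(n+1), scaled by sqrt(n+2) (assumed form)."""
    n = len(u_sorted)
    scale = math.sqrt(n + 2)
    return [scale * (u - k / (n + 1)) for k, u in enumerate(u_sorted, start=1)]


random.seed(0)
data = [random.expovariate(1.0) for _ in range(500)]  # hypothetical test data

u_true = to_sorted_uniform(data, lambda x: 1.0 - math.exp(-x))         # correct CDF
u_wrong = to_sorted_uniform(data, lambda x: 1.0 - math.exp(-3.0 * x))  # wrong rate
u_even = [k / 501 for k in range(1, 501)]  # perfectly even spacing, no fluctuations

score_true = quasi_log_likelihood(u_true)
score_wrong = quasi_log_likelihood(u_wrong)
score_even = quasi_log_likelihood(u_even)

print("wrong CDF  :", score_wrong)
print("true CDF   :", score_true)
print("even spaced:", score_even)
print("largest |residual| for true CDF:",
      max(abs(r) for r in scaled_quantile_residuals(u_true)))
```

In this sketch a badly chosen trial CDF scores far below the true one, while perfectly even spacing scores above it. That ordering is consistent with the abstract's point that atypical fluctuations in either direction signal underfitting or overfitting, so a good estimate should land in the range typical of genuinely uniform random data rather than at the maximum.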

Keywords:
Probability density function; Statistics; Probability distribution; Akaike information criterion; Mathematics; Density estimation; Principle of maximum entropy; Random variable; Order statistic; Kernel density estimation; Independent and identically distributed random variables; Cumulative distribution function; Sample size determination; Statistical inference; Estimator

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 1.77
Refs: 71
Citation Normalized Percentile: 0.85

Topics

Forecasting Techniques and Applications
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Advanced Statistical Methods and Models
Physical Sciences →  Mathematics →  Statistics and Probability
Financial Risk and Volatility Modeling
Social Sciences →  Economics, Econometrics and Finance →  Finance
