Detecting outliers in multivariate data while controlling false alarm rate

André Achim

doi:10.20982/tqmp.08.2.p108

JOURNAL ARTICLE

Year: 2012
Journal: Tutorials in Quantitative Methods for Psychology
Vol: 8 (2)
Pages: 108-121
Publisher: University of Ottawa

Abstract

Outlier identification often implies inspecting each z-transformed variable and adding a Mahalanobis D². Multiple outliers may mask each other by increasing variance estimates. Caroni & Prescott (1992) proposed an extension of Rosner’s (1983) technique to circumvent masking, taking sample size into account to keep the false alarm risk below, say, α = .05. Simulation studies here compare the single approach to multiple-univariate plus multivariate tests, each at a Bonferroni-corrected α level, in terms of power at detecting outliers. Results suggest the former is better only up to about 12 variables. Macros in an Excel spreadsheet implement these techniques.

The impetus of the present work was to identify, in the context of a graduate course in statistics, sound statistical procedures to recommend for examining data for outliers, assuming normal distributions. The basic consideration is that the statistical criterion beyond which a piece of data would be considered an outlier must take into account both the number of cases (subjects) inspected and, if the variables are inspected one by one, the number of variables examined. This is required to adequately control the risk of falsely rejecting at least one case that actually belongs to the population. In particular, a fixed critical z-score, irrespective of number of variables or of sample size, can hardly be recommended. Beyond controlling for false alarm (FA) rate,
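To illustrate why a fixed critical z-score is hard to defend, the following minimal Python sketch computes a Bonferroni-corrected two-sided critical |z| from the number of cases and the number of variables inspected. It is not the article's Excel macro, nor the Caroni & Prescott procedure: the function name `bonferroni_critical_z` is hypothetical, and the sketch assumes that z-scores can be treated as standard normal, which is only an approximation when the mean and standard deviation are estimated from the same sample.

```python
from statistics import NormalDist


def bonferroni_critical_z(n_cases, n_variables, alpha=0.05):
    """Two-sided critical |z| after a Bonferroni correction.

    Illustrative assumption: every one of the n_cases * n_variables
    z-scores is tested separately, so each test gets alpha divided by
    that count (and half of it per tail).
    """
    if n_cases < 1 or n_variables < 1:
        raise ValueError("n_cases and n_variables must be at least 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n_tests = n_cases * n_variables
    per_test_alpha = alpha / n_tests
    # Upper-tail quantile of the standard normal distribution.
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


# The criterion grows with both sample size and number of variables.
for n, p in [(20, 1), (100, 1), (100, 10)]:
    print(f"n={n:4d}  p={p:3d}  critical |z| = {bonferroni_critical_z(n, p):.2f}")
```

With a single case and a single variable the function reduces to the familiar two-sided criterion of about 1.96; as more cases or variables are inspected, the criterion rises, which is the dependence on sample size and number of variables described above.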

Keywords:

Outlier; Univariate; Statistics; Bonferroni correction; Sample size determination; Multivariate statistics; Anomaly detection; Context (archaeology); Mahalanobis distance; Constant false alarm rate; Computer science; Mathematics; Data mining; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.32

Citation Normalized Percentile: 0.64

Topics

Advanced Statistical Methods and Models

Physical Sciences → Mathematics → Statistics and Probability

Anomaly Detection Techniques and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Statistical Process Monitoring

Social Sciences → Decision Sciences → Statistics, Probability and Uncertainty

Related Documents

Detecting Outliers in Multivariate Laboratory Data

Removing Outliers from 3D Macrotexture Data by Controlling False Discovery Rate

DETECTING MULTIVARIATE OUTLIERS IN ARTEFACT COMPOSITIONAL DATA*

Incremental Methods for Detecting Outliers from Multivariate Data Stream

Detecting outliers in multivariate data and visualization-R scripts