Replicates the experiment presented in Cerioli et al. (2009), Table 1, for a wider variety of estimators.
table1sim.parallel(cl, p, nn, N, B = 10000, alpha = c(0.01, 0.025, 0.05), cutoff.method = "GM14", lgf = "")
cl | A cluster object, e.g., returned from |
p | The dimension of the data used in each simulated run. |
nn | The number of observations used in each simulated run. |
N | The number of simulations to run. |
B | The batch/block size: the number of simulations to run in each block. This is useful when running very large simulation runs ( |
alpha | The significance level to use for detecting outliers. Can be a vector; the outlier detection tests will be run at each level. |
cutoff.method | String indicating which asymptotic distribution to use for the MCD-based distances. Valid values are |
lgf | Path to log file into which logging information should be written. |
This is a work function designed for use in replicating Table 1 of Cerioli et al. (2009), page XXX, but using the asymptotic method of Green and Martin (2014) instead of the Hardin-Rocke method. The experiment investigates how many false positives certain Mahalanobis-based tests of outlyingness produce, compared to the nominal Type I error rate α.
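The idea of comparing an empirical false-positive rate against the nominal level α can be illustrated with a minimal base-R sketch. This is not the package's procedure: it assumes classical (non-robust) mean and covariance estimates and a chi-squared cutoff, and it reports a per-observation flag rate, which may differ from the quantity tabulated in the paper. The variable names (n.runs, fp.rate) are hypothetical.

```r
# Sketch only: classical Mahalanobis distances with a chi-squared cutoff,
# applied to clean multivariate normal data (so every flag is a false positive).
set.seed(42)
p      <- 4      # dimension of the data
nn     <- 300    # observations per simulated data set
alpha  <- 0.05   # nominal Type I error rate
n.runs <- 200    # number of simulated data sets (hypothetical name)

fp.rate <- replicate(n.runs, {
  x  <- matrix(rnorm(nn * p), nrow = nn, ncol = p)
  # squared Mahalanobis distances from the sample mean and covariance
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
  # fraction of observations flagged as outliers at level alpha
  mean(d2 > qchisq(1 - alpha, df = p))
})

# average false-positive rate across runs, to compare with alpha
mean(fp.rate)
```

Replacing the classical estimates with a robust estimator changes the distribution of the distances, which is why the choice of cutoff matters in the experiment.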
Internally the simulation function does B runs at a time. Blocks of size B are distributed across the cluster. Set B smaller if your machines have less memory or you have lots of cluster nodes.
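The blocking scheme can be pictured with a short sketch. It is an assumption about the general pattern, not the package's actual implementation: make.blocks and run.block are hypothetical helpers, and run.block only draws noise in place of real simulation results.

```r
# Hypothetical helper: split N runs into blocks of at most B runs each.
make.blocks <- function(N, B) {
  sizes <- rep(B, N %/% B)
  if (N %% B > 0) sizes <- c(sizes, N %% B)  # final partial block, if any
  sizes
}

# Hypothetical stand-in for one block of simulations: one row per run.
run.block <- function(b) matrix(rnorm(b * 2), nrow = b, ncol = 2)

block.sizes <- make.blocks(N = 500, B = 50)

# Run the blocks serially here; with a cluster object cl one could use
# parallel::parLapply(cl, block.sizes, run.block) instead.
block.results <- lapply(block.sizes, run.block)

# Stack the per-block results so there is one row per simulation run.
combined <- do.call(rbind, block.results)
dim(combined)
```

A smaller B means each node holds fewer runs in memory at once and the work is spread over more blocks, at the cost of more communication with the cluster.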
An array of dimension 3:
The results of each of the N simulation runs appear along the first dimension.
The various estimators and tests appear along the second dimension. Currently the results appear in the following order.
Column Name | Covariance Estimate | Test Statistic |
"OGK" | OGK estimate | chi-squared |
"ROGK" | Reweighted OGK estimate | chi-squared |
"SEST.BS" | S-estimate using bisquare | chi-squared |
"SEST.RK" | S-estimate using Rocke | chi-squared |
"MCD50.RAW" | MCD(0.5) | chi-squared |
"MCD50.HRRAW" | MCD(0.5) | Hardin-Rocke |
"MCD50.HRADJ" | MCD(0.5) | Hardin-Rocke (adj.) |
"RMCD50" | reweighted MCD(0.5) | chi-squared |
"MCD75.RAW" | MCD(0.75) | chi-squared |
"MCD75.HRRAW" | MCD(0.75) | Hardin-Rocke |
"MCD75.HRADJ" | MCD(0.75) | Hardin-Rocke (adj.) |
"RMCD75" | reweighted MCD(0.75) | chi-squared |
"MCD95.RAW" | MCD(0.95) | chi-squared |
"MCD95.HRRAW" | MCD(0.95) | Hardin-Rocke |
"MCD95.HRADJ" | MCD(0.95) | Hardin-Rocke (adj.) |
"RMCD95" | reweighted MCD(0.95) | chi-squared |
The adjusted versions of the Hardin-Rocke tests remove the finite sample correction when the sample size is 100 or greater. (WHY DID WE DO THIS)
The specified values of alpha correspond to the third dimension; the dimnames will be of the form “alpha” + alpha.
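To make the layout concrete, the following sketch builds a small mock array with the same three-dimensional structure and summarizes it. The values are random placeholders, only three of the column names are used, and the exact dimnames formatting (here produced by paste0) is an assumption based on the description above.

```r
# Mock stand-in for the returned array: 5 runs x 3 estimators x 3 alpha levels.
alpha     <- c(0.01, 0.025, 0.05)
est.names <- c("OGK", "ROGK", "MCD50.RAW")

set.seed(1)
results <- array(
  runif(5 * length(est.names) * length(alpha)),
  dim      = c(5, length(est.names), length(alpha)),
  dimnames = list(NULL, est.names, paste0("alpha", alpha))
)

# Names along the third dimension (assumed format: "alpha" followed by the level).
dimnames(results)[[3]]

# Average over the runs (first dimension) for each estimator and alpha level.
mean.by.test <- apply(results, c(2, 3), mean)
dim(mean.by.test)

# All runs for one estimator at one level.
results[, "MCD50.RAW", "alpha0.05"]
```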
This version is deprecated.
Written and maintained by Christopher G. Green <christopher.g.green@gmail.com>
Andrea Cerioli, Marco Riani, and Anthony C. Atkinson. Controlling the size of multivariate outlier tests with the MCD estimator of scatter. Statistics and Computing, 19:341-353, 2009.
C. G. Green and R. Douglas Martin. An extension of a method of Hardin and Rocke, with an application to multivariate outlier detection via the IRMCD method of Cerioli. Working Paper, 2014. Available from http://students.washington.edu/cggreen/uwstat/papers/cerioli_extension.pdf
J. Hardin and D. M. Rocke. The distribution of robust distances. Journal of Computational and Graphical Statistics, 14:928-946, 2005.
## Not run:
# this runs an experiment
# assumes a cluster
# the vignette provides a better recipe for
# replicating Cerioli et al. (2009)
require( parallel )
require( CerioliOutlierDetection )
require( HardinRockeExtensionSimulations )
# we use a socket cluster on Windows,
# change to your preferred method of
# creating a cluster
thecluster <- makePSOCKcluster(4)
N.SIM <- 500
B.SIM <- 50
# initialize each node
tmp.rv <- clusterEvalQ( cl = thecluster, {
  require(abind, quietly=TRUE)
  require(rrcov, quietly=TRUE)
  require(mvtnorm, quietly=TRUE)
  require(CerioliOutlierDetection, quietly=TRUE)
  require(HardinRockeExtensionSimulations, quietly=TRUE)
  Sys.sleep(30)
  invisible(NULL)
})
# "" is the default value of lgf; set to a file path to write a log
logfile <- ""
results <- table1sim.parallel(cl=thecluster, p = 4, nn = 300,
  N=N.SIM, B=B.SIM, lgf=logfile)
stopCluster(thecluster)
# calculate some statistics
apply(results,c(2,3),mean)
apply(results,c(2,3),sd)
## End(Not run)