
`hrSimParallel` is used to replicate Tables 1–3 of Hardin and Rocke (2005), page 942.

```
hrSimParallel(cl, p, n, N, B = 10000, alpha = 0.05,
  mcd.alpha = max.bdp.mcd.alpha(n, p), lgf = "")
```

| Argument | Description |
| --- | --- |
| `cl` | A cluster object, e.g., returned from |
| `p` | The dimension of the data used in each simulated run. |
| `n` | The number of observations used in each simulated run. |
| `N` | The number of simulations to run. |
| `B` | The batch/block size: the number of simulations to run in each block. This is useful when running very large simulation runs ( |
| `alpha` | The significance level to use for detecting outliers. |
| `mcd.alpha` | The fraction of the data to use in computing the MCD. Defaults to the maximum breakdown point fraction. |
| `lgf` | Path to log file into which logging information should be written. |
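As a point of reference for the `mcd.alpha` default, the maximum-breakdown MCD is commonly defined with a subset of size h = floor((n + p + 1) / 2), which corresponds to the fraction h / n. The sketch below computes that fraction with a hypothetical helper, `max_bdp_fraction`. It is an assumption that `max.bdp.mcd.alpha` follows this formula, so check the package source before relying on it.

```r
# Hypothetical helper (not part of the package): fraction of observations
# in the maximum-breakdown MCD subset, assuming h = floor((n + p + 1) / 2).
max_bdp_fraction <- function(n, p) {
  stopifnot(n > p, p >= 1)
  h <- floor((n + p + 1) / 2)
  h / n
}

max_bdp_fraction(n = 500, p = 5)  # 253 / 500 = 0.506
```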

This is a work function designed for use in replicating Tables 1–3 of Hardin and Rocke (2005), pages 942-944. Use different values of `alpha` to replicate each of the tables.

Internally the simulation function does `B` runs at a time. Set `B` smaller if your machine has less memory.

This function performs the same calculation as `hrSim`, but does so using internal parallelism: multiple blocks of size `B` are run in parallel.
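The block-wise parallel pattern described above can be sketched with the base `parallel` package. This is only an illustration under stated assumptions, not the package's implementation: `run_one`, `do_block`, and `run_in_blocks` are hypothetical names, and the per-run statistic is a toy stand-in for the MCD-based calculation.

```r
library(parallel)

# Hypothetical per-run statistic: fraction of n standard-normal draws
# whose absolute value exceeds the two-sided alpha cutoff.
run_one <- function(n, alpha) {
  mean(abs(rnorm(n)) > qnorm(1 - alpha / 2))
}

# Run one block of b simulations and return one value per run.
do_block <- function(b, n, alpha, run_one) {
  vapply(seq_len(b), function(i) run_one(n, alpha), numeric(1))
}

# Run N simulations in blocks of at most B runs. With cl = NULL the blocks
# run sequentially; otherwise they are distributed over the cluster.
run_in_blocks <- function(cl = NULL, n, N, B = 100, alpha = 0.05) {
  block_sizes <- rep(B, N %/% B)
  if (N %% B > 0) block_sizes <- c(block_sizes, N %% B)
  blocks <- if (is.null(cl)) {
    lapply(block_sizes, do_block, n = n, alpha = alpha, run_one = run_one)
  } else {
    parLapply(cl, block_sizes, do_block, n = n, alpha = alpha, run_one = run_one)
  }
  unlist(blocks)
}

set.seed(1)
res <- run_in_blocks(cl = NULL, n = 200, N = 250, B = 100)
length(res)  # 250: blocks of 100, 100, and 50 runs
```

Passing a cluster created with `makePSOCKcluster()` as `cl` distributes the blocks over the workers. For reproducible results on a cluster, `clusterSetRNGStream()` can be called before the run.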

The function returns a matrix with `N` rows, one for each simulation, and, at present, 9 columns: each column reports the fraction of observations in a simulation run that exceeded a given threshold (i.e., were flagged as outliers).

The first three test Mahalanobis distances (MD) against a chi-squared quantile (prefix is “CHI2”);

the next three test MDs against the asymptotic cutoff used in Hardin and Rocke (2005) (prefix is “HRASY”); and

the last three test MDs against the cutoff predicted in Hardin and Rocke (2005) (prefix is “HRPRED”).

Within each group of three, the first entry (suffix “RAW”) uses raw MDs without the consistency correction or the small sample correction; the second entry (suffix “CON”) uses MDs with the consistency correction but without the small sample correction; and the third entry (suffix “SM”) uses MDs with both correction factors. (It was not clear to the package author whether Hardin and Rocke (2005) used these correction factors in their calculations, so all variants were calculated and examined. Empirically, the “CON” approach appears to be the best match for their results.)
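The thresholding step behind the “CHI2” columns can be sketched in base R. Note that this sketch swaps in the classical mean and covariance for the MCD estimates and applies no correction factors, so it illustrates only how distances are compared against a chi-squared quantile, not the package's calculation.

```r
set.seed(42)
p <- 5
n <- 500
alpha <- 0.05

# Simulate outlier-free multivariate normal data (identity covariance).
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Squared Mahalanobis distances; classical estimates stand in for the MCD here.
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Flag observations whose squared distance exceeds the chi-squared quantile.
cutoff <- qchisq(1 - alpha, df = p)
flagged <- d2 > cutoff

# Fraction of observations flagged in this single simulated run.
mean(flagged)
```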

Look at the column means of the resulting matrix to see the average fraction of outliers detected (which is an estimate of the Type 1 error rate of the procedure, since the simulated data had no outliers).
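A minimal sketch of this summary step, using a mock matrix in place of real output: the column names below are hypothetical, built from the documented prefixes and suffixes, and the entries are simulated binomial fractions rather than results from the function.

```r
set.seed(7)
N <- 1000
n <- 500
alpha <- 0.05

# Hypothetical column names built from the documented prefixes and suffixes;
# the actual names returned by the function may be formatted differently.
col_names <- as.vector(outer(c("RAW", "CON", "SM"),
                             c("CHI2", "HRASY", "HRPRED"),
                             function(s, pre) paste(pre, s, sep = ".")))

# Mock results: each entry is the fraction of n observations flagged in one run.
mock <- matrix(rbinom(N * 9, size = n, prob = alpha) / n,
               nrow = N, ncol = 9, dimnames = list(NULL, col_names))

# Column means estimate the Type 1 error rate for each test variant.
round(colMeans(mock), 4)
```

With real output, a column mean well above the nominal `alpha` would suggest that the corresponding cutoff flags too many observations in outlier-free data.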

The vignette “HardinRocke” provides a detailed example of how to replicate the data in Hardin and Rocke (2005).

Written and maintained by Christopher G. Green.

J. Hardin and D. M. Rocke. The distribution of robust distances. Journal of Computational and Graphical Statistics, 14:928-946, 2005.

```
## Not run:
# Example of how to replicate some of the
# calculations in Hardin and Rocke (2005).
# On Windows you must use socket clusters;
# Linux/UNIX supports other types of clusters.
#
# Change '4' to reflect the number of
# cores/processors you want to use.
library(parallel)
thecluster <- makePSOCKcluster(4)
# Initialize each node by loading the required packages
tmp.rv <- clusterEvalQ(cl = thecluster, {
  require(CerioliOutlierDetection)
  require(HardinRockeExtensionSimulations)
  invisible(NULL)
})
# Compare to Hardin and Rocke, Table 1
results <- hrSimParallel(cl = thecluster, p = 5, n = 500,
  N = 5000, B = 125, lgf = "logfile.txt")
# Average fraction of observations flagged, per column
colMeans(results)
# Shut down the worker processes
stopCluster(thecluster)
## End(Not run)
```
