emve: Extended Minimum Volume Ellipsoid (EMVE) in the presence of...

View source: R/emve.R

emve R Documentation

Extended Minimum Volume Ellipsoid (EMVE) in the presence of missing data

Description

Computes the Extended S-Estimate (ESE) version of the minimum volume ellipsoid (EMVE), which is used by default as the initial estimator in the Generalized S-Estimator (GSE) for missing data.

Usage

emve(x, maxits=5, sampling=c("uniform","cluster"), n.resample, n.sub.size, seed) 

Arguments

x

a matrix or data frame. May contain missing values, but cannot contain columns with completely missing entries.

maxits

integer indicating the maximum number of iterations of the Gaussian MLE calculation for each subsample. Default is 5.

sampling

which sampling scheme to use: 'uniform' or 'cluster' (see Leung and Zamar, 2016). Default is 'uniform'.

n.resample

integer indicating the number of subsamples. Default is 15 for clustering-based subsampling and 500 for uniform subsampling.

n.sub.size

integer indicating the size of each subsample. Default is 2(p+1)/a for clustering-based subsampling and (p+1)/a for uniform subsampling, where p is the number of columns of x and a is the proportion of non-missing cells.

seed

optional starting value for the random number generator. Default is seed = 1000.
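The default subsample sizes described under n.sub.size can be reproduced by hand. The following is a minimal sketch in base R on simulated data; the use of ceiling() to obtain an integer is an assumption of this sketch, since the rounding rule is not stated here.

```r
# Toy data with missing cells (simulated purely for illustration).
set.seed(1)
x <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
x[sample(length(x), 40)] <- NA    # remove 40 of the 400 cells

p <- ncol(x)
a <- mean(!is.na(x))              # proportion of non-missing cells

# Documented defaults: (p+1)/a for uniform, 2(p+1)/a for cluster.
# ceiling() is an assumption; the package may round differently.
n0_uniform <- ceiling((p + 1) / a)
n0_cluster <- ceiling(2 * (p + 1) / a)

c(a = a, uniform = n0_uniform, cluster = n0_cluster)
```

The more cells are missing (smaller a), the larger the default subsample becomes, which compensates for the information lost to missingness.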

Details

This function computes EMVE as described in Danilov et al. (2012). Two subsampling schemes can be used for computing EMVE: uniform subsampling and the clustering-based subsampling described in Leung and Zamar (2016). For uniform subsampling, the number of subsamples must be large to ensure a high breakdown point. For clustering-based subsampling, the number of subsamples can be smaller. The subsample size n_0 must be larger than p to avoid singularity.
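Why uniform subsampling needs many subsamples can be illustrated with the standard back-of-the-envelope argument from the robust statistics literature (it is not taken from this package's code): if a fraction eps of the cases is contaminated, a subsample of n_0 cases drawn independently is clean with probability (1 - eps)^n_0, so many draws are needed before at least one clean subsample is likely. The function names p_clean and n_needed below are hypothetical helpers for this sketch.

```r
# Probability that at least one of N subsamples of size n0 contains no
# contaminated case, assuming independent draws and contamination rate eps.
p_clean <- function(N, n0, eps) 1 - (1 - (1 - eps)^n0)^N

# Smallest N for which that probability reaches `target`.
n_needed <- function(n0, eps, target = 0.99) {
  ceiling(log(1 - target) / log(1 - (1 - eps)^n0))
}

p_clean(N = 500, n0 = 6, eps = 0.2)   # default 500 uniform subsamples
n_needed(n0 = 6, eps = 0.2)           # small subsamples: few draws suffice
n_needed(n0 = 11, eps = 0.2)          # larger subsamples: more draws needed
```

The required number of subsamples grows quickly with the subsample size and the contamination rate, which is consistent with the larger default for n.resample under uniform subsampling.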

The algorithm includes a concentration step in which the Gaussian MLE is computed for 50% of the data points using the classical EM algorithm and then multiplied by a scalar factor. This step is repeated for each subsample. Because the computation becomes heavy as the number of subsamples increases, the maximum number of iterations of the classical EM algorithm (i.e. maxits) is set to 5 by default. See Danilov et al. (2012) for details of the algorithm and Rubin and Little (2002) for the classical EM algorithm for missing data.
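To give a feel for the concentration step, here is a rough complete-data analogue in base R. It is only a sketch under stated assumptions and not the package's implementation: there are no missing values, so the EM algorithm is replaced by the plain sample mean and covariance, and the scalar factor is assumed to be the classical calibration that matches the median squared distance to the chi-square median. The name concentration_step is hypothetical.

```r
# One concentration step on complete data (illustrative analogue only).
concentration_step <- function(x, mu, S) {
  p    <- ncol(x)
  d2   <- mahalanobis(x, center = mu, cov = S)   # squared distances
  h    <- ceiling(nrow(x) / 2)
  keep <- order(d2)[seq_len(h)]                  # the 50% closest points
  mu_new <- colMeans(x[keep, , drop = FALSE])
  S_new  <- cov(x[keep, , drop = FALSE])
  # Assumed scalar factor: rescale so that the median squared distance
  # over all points equals the chi-square median with p degrees of freedom.
  d2_new <- mahalanobis(x, center = mu_new, cov = S_new)
  S_new  <- S_new * median(d2_new) / qchisq(0.5, df = p)
  list(mu = mu_new, S = S_new)
}

set.seed(2)
x <- matrix(rnorm(200 * 3), ncol = 3)
x[1:20, ] <- x[1:20, ] + 10                      # 10% casewise outliers
idx <- sample(nrow(x), 8)                        # subsample with n_0 > p
fit <- concentration_step(x, colMeans(x[idx, ]), cov(x[idx, ]))
fit$mu
```

In the missing-data setting described above, the mean and covariance of the retained half would instead come from a few EM iterations (at most maxits), and the distances would be partial Mahalanobis distances.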

Value

An S4 object of class emve-class which is a subclass of the virtual class CovRobMissSc-class. The output S4 object contains the following slots:

mu         Estimated location. Can be accessed via getLocation.
S          Estimated scatter matrix. Can be accessed via getScatter.
sc         Estimated EMVE scale. Can be accessed via getScale.
pmd        Squared partial Mahalanobis distances. Can be accessed via getDist.
pmd.adj    Adjusted squared partial Mahalanobis distances. Can be accessed via getDistAdj.
pu         Dimension of the observed entries for each case. Can be accessed via getDim.
call       Object of class "language". Not meant to be accessed.
x          Input data matrix. Not meant to be accessed.
p          Column dimension of input data matrix. Not meant to be accessed.
estimator  Character string of the name of the estimator used. Not meant to be accessed.
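A short usage sketch on simulated data shows how the accessors fit together. The data and the missingness pattern are made up for illustration, and the call is guarded so that the snippet still runs when the GSE package is not installed.

```r
# Simulated data: 100 cases, 4 variables, one missing cell in each of the
# first 40 rows, so no row or column is completely missing.
set.seed(3)
x <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
x[cbind(1:40, rep(1:4, times = 10))] <- NA

if (requireNamespace("GSE", quietly = TRUE)) {
  fit <- GSE::emve(x, sampling = "uniform")   # documented default scheme
  print(GSE::getLocation(fit))                # slot mu
  print(GSE::getScatter(fit))                 # slot S
  print(GSE::getScale(fit))                   # slot sc
  print(head(GSE::getDist(fit)))              # slot pmd
}
```

Using the accessor functions rather than reading the slots directly keeps code independent of the internal layout of the S4 object.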

Author(s)

Andy Leung andy.leung@stat.ubc.ca, Ruben H. Zamar, Mike Danilov, Victor J. Yohai

References

Danilov, M., Yohai, V.J., Zamar, R.H. (2012). Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data. Journal of the American Statistical Association 107, 1178–1186.

Leung, A. and Zamar, R.H. (2016). Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination. Submitted.

Rubin, D.B. and Little, R.J.A. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.

See Also

GSE, emve-class


GSE documentation built on Dec. 28, 2022, 1:31 a.m.
