View source: R/plotFRBmultiregDiag.R
Diagnostic plots for objects of class FRBmultireg, FRBpca and FRBhot. The plots show robust distances and allow detection of multivariate outliers.
x: an R object of class FRBmultireg, FRBpca or FRBhot
Xdist: (for FRBmultireg) logical; if TRUE, the plot shows the robust distance versus the distance in the space of the explanatory variables; if FALSE, it plots the robust distance versus the index of the observation
EIF: (for FRBpca) logical; if TRUE, the plot shows the robust distance versus an influence measure for each point; if FALSE, it plots the robust distance versus the index of the observation
...: potentially more arguments to be passed
The diagnostic plots are based on the robust distances of the observations. In a multivariate sample X_n={x_1,...,x_n}, the robust distance d_i of observation i is given by d_i^2 = (x_i-μ)'Σ^(-1)(x_i-μ), where μ and Σ are robust estimates of location and covariance. Observations with a large robust distance are considered outlying.
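A minimal base R sketch of this formula is given here. It assumes μ and Σ have already been computed by some robust estimator and are simply passed in; the helper names robust_distances and distance_cutoff are hypothetical and not part of FRB. Note that stats::mahalanobis() returns the squared distances d_i^2, so the square root is taken; the cutoff is the square root of the .975 chi-squared quantile, as used for the plots described on this page.

```r
# Robust distances d_i = sqrt((x_i - mu)' Sigma^{-1} (x_i - mu)).
# ASSUMPTION: mu and Sigma are robust location/scatter estimates computed
# elsewhere; this hypothetical helper only evaluates the quadratic form.
robust_distances <- function(X, mu, Sigma) {
  X <- as.matrix(X)
  # stats::mahalanobis() returns the squared distances d_i^2
  sqrt(mahalanobis(X, center = mu, cov = Sigma))
}

# Square root of the .975 chi-squared quantile with p degrees of freedom
distance_cutoff <- function(p, level = 0.975) {
  sqrt(qchisq(level, df = p))
}

# Toy usage with known (not estimated) parameters
X <- rbind(c(0, 0), c(1, 1), c(6, -6))
mu <- c(0, 0)
Sigma <- diag(2)
d <- robust_distances(X, mu, Sigma)
d                                     # 0, sqrt(2), sqrt(72)
which(d > distance_cutoff(ncol(X)))   # 3
```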
The default diagnostic plot in the multivariate regression setting (i.e. for objects of type FRBmultireg with Xdist=TRUE) shows the residual distances (i.e. the robust distances of the multivariate residuals) based on the estimates in x, versus the distances within the space of the explanatory variables. The latter are based on robust estimates of location and scatter for the data matrix x$X (without intercept). Computing these robust estimates may take an appreciable amount of time. The estimator used corresponds to the one that was used to obtain x (with, for example, the same breakdown point and the same control parameters).
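To illustrate the distances in the space of the explanatory variables, the sketch below drops the intercept column of a design matrix and computes robust distances from a robust location and scatter estimate. As a clearly flagged substitution, it uses the MCD estimator from MASS::cov.rob() (MASS ships with R as a recommended package) instead of the S-, MM- or GS-estimator that would correspond to x; the simulated data are made up for illustration.

```r
# Distances in the space of the explanatory variables (sketch).
# ASSUMPTION: MASS::cov.rob(method = "mcd") stands in for the estimator
# used to obtain the fit; the data below are simulated.
set.seed(1)
n <- 100
Z <- matrix(rnorm(n * 2), ncol = 2)
Z[1, ] <- c(20, 20)                # one planted high-leverage point
X <- cbind(1, Z)                   # design matrix with intercept column

Xnoint <- X[, -1, drop = FALSE]    # drop the intercept before estimating
rob <- MASS::cov.rob(Xnoint, method = "mcd")
x_dist <- sqrt(mahalanobis(Xnoint, rob$center, rob$cov))

# Vertical cutoff: df = number of covariates, not including the intercept
v_cut <- sqrt(qchisq(0.975, df = ncol(Xnoint)))
which(x_dist > v_cut)              # includes observation 1
```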
A horizontal cutoff line is drawn at the square root of the .975 quantile of the chi-squared distribution with degrees of freedom equal to the number of response variables. A vertical cutoff line is drawn at the square root of the same quantile, but now with degrees of freedom equal to the number of covariates (not including the intercept).
Points to the right of the vertical cutoff can be viewed as high-leverage points. These are classified as so-called 'bad' or 'good' leverage points depending on whether they lie above or below the horizontal cutoff. Points above the horizontal cutoff but to the left of the vertical cutoff are sometimes called vertical outliers.
See, for example, Van Aelst and Willems (2005).
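The classification just described can be written as a small rule. This is a sketch, not FRB's plotting code: classify_points is a hypothetical helper, both distances are assumed to be already computed, and treating a point exactly on a cutoff as lying below or to the left of it (strict inequality) is an assumption.

```r
# Classify observations of a regression diagnostic plot (sketch).
# res_dist: residual distances (vertical axis), q = number of responses
# x_dist:   distances in the space of the explanatory variables
#           (horizontal axis), p = number of covariates without intercept
classify_points <- function(res_dist, x_dist, q, p, level = 0.975) {
  h_cut <- sqrt(qchisq(level, df = q))  # horizontal cutoff line
  v_cut <- sqrt(qchisq(level, df = p))  # vertical cutoff line
  ifelse(x_dist > v_cut,
         ifelse(res_dist > h_cut, "bad leverage", "good leverage"),
         ifelse(res_dist > h_cut, "vertical outlier", "regular"))
}

# Toy usage: 3 responses, 2 covariates
classify_points(res_dist = c(1, 5, 1, 5), x_dist = c(1, 1, 5, 5), q = 3, p = 2)
# "regular" "vertical outlier" "good leverage" "bad leverage"
```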
To avoid the additional computation time, one can choose Xdist=FALSE, in which case the residual distances are simply plotted versus the index of the observation.
The default plot in the context of PCA (i.e. for objects of type FRBpca with EIF=TRUE) is a plot proposed by Pison and Van Aelst (2004). It shows the robust distance versus a measure of the overall empirical influence of the observation on the (classical) principal components. The empirical influences are obtained from the influence function of the eigenvectors of the empirical (classical) shape estimator at the normal model, with the robust estimates substituted for the population parameters. The overall influence value is then defined as the average of the squared influences over all coefficients of the eigenvectors.
The vertical line on the plot is an indicative cutoff value obtained through simulation; this simulation takes a few moments of computation time.
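For a rough idea of how the overall influence measure can be computed, the sketch below uses the standard influence function of the eigenvector v_j of the classical covariance matrix at the normal model, IF(x; v_j) = sum over k != j of z_k z_j / (λ_j - λ_k) v_k, with z_k = v_k'(x - μ), where λ_k and v_k are the eigenvalues and eigenvectors of Σ. A shape matrix is a rescaled covariance matrix, so its eigenvectors and their influence function are the same. Robust estimates of μ and Σ are passed in as plug-ins. The helper name overall_eif is hypothetical, distinct eigenvalues are assumed, this is not FRB's own implementation, and the simulated cutoff is not reproduced.

```r
# Overall empirical influence of one observation x on the eigenvectors of
# the classical covariance/shape matrix (sketch, not FRB's code).
# mu, Sigma: robust estimates plugged in for the population parameters.
# ASSUMPTION: distinct eigenvalues (the influence function needs lambda_j != lambda_k).
overall_eif <- function(x, mu, Sigma) {
  e <- eigen(Sigma, symmetric = TRUE)
  lambda <- e$values
  V <- e$vectors                     # columns are eigenvectors v_1, ..., v_p
  p <- length(lambda)
  z <- drop(crossprod(V, x - mu))    # z_k = v_k'(x - mu)
  IF <- matrix(0, p, p)              # column j holds IF(x; v_j)
  for (j in seq_len(p)) {
    for (k in seq_len(p)[-j]) {
      IF[, j] <- IF[, j] + z[k] * z[j] / (lambda[j] - lambda[k]) * V[, k]
    }
  }
  mean(IF^2)                         # average squared influence over all p*p coefficients
}

mu <- c(0, 0)
Sigma <- diag(c(2, 1))
overall_eif(c(1, 1), mu, Sigma)      # 0.5
overall_eif(c(3, 0), mu, Sigma)      # essentially 0: point on the first principal axis
```

Squaring makes the measure insensitive to the arbitrary signs that eigen() assigns to the eigenvectors.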
Again, to avoid the additional computation time, one can choose EIF=FALSE, in which case the robust distances are simply plotted versus the index of the observation.
For the result of the robust Hotelling test (i.e. for objects of type FRBhot), the method plots the robust distance versus the index. In the two-sample case, the robust distances use a separate location estimate μ for each group and a common covariance estimate Σ; the indices are within-sample, and a vertical line separates the two groups.
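The two-sample distances can be sketched as follows, assuming robust estimates mu1 and mu2 (one location per group) and a common Sigma are already available; two_sample_distances is a hypothetical helper. The position column places group 2 after group 1 along the horizontal axis while index keeps the within-sample numbering; this layout is an assumption about how such a plot could be drawn, not a description of FRB's code.

```r
# Two-sample robust distances (sketch): each group uses its own center,
# both groups share one covariance estimate Sigma.
two_sample_distances <- function(X1, X2, mu1, mu2, Sigma) {
  d1 <- sqrt(mahalanobis(as.matrix(X1), mu1, Sigma))
  d2 <- sqrt(mahalanobis(as.matrix(X2), mu2, Sigma))
  n1 <- length(d1)
  n2 <- length(d2)
  data.frame(
    group    = rep(1:2, c(n1, n2)),
    index    = c(seq_len(n1), seq_len(n2)),  # within-sample index
    position = seq_len(n1 + n2),             # horizontal position in the plot
    distance = c(d1, d2)
  )
}

X1 <- rbind(c(0, 0), c(1, 0))
X2 <- rbind(c(5, 5), c(5, 8))
res <- two_sample_distances(X1, X2, mu1 = c(0, 0), mu2 = c(5, 5), Sigma = diag(2))
res$distance   # 0 1 0 3

# One possible way to draw it with base graphics:
# plot(res$position, res$distance, xaxt = "n", xlab = "index", ylab = "robust distance")
# axis(1, at = res$position, labels = res$index)
# abline(v = sum(res$group == 1) + 0.5)   # separates the two groups
```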
Gert Willems and Ella Roelant
G. Pison and S. Van Aelst (2004). Diagnostic Plots for Robust Multivariate Methods. Journal of Computational and Graphical Statistics, 13, 310–329.
S. Van Aelst and G. Willems (2005). Multivariate Regression S-Estimators for Robust Estimation and Inference. Statistica Sinica, 15, 981–1001.
S. Van Aelst and G. Willems (2013). Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB. Journal of Statistical Software, 53(3), 1–32. URL: http://www.jstatsoft.org/v53/i03/.
FRBmultiregS, FRBmultiregMM, FRBmultiregGS, FRBpcaS, FRBpcaMM, FRBhotellingS, FRBhotellingMM
# for multivariate regression:
data(schooldata)
MMres <- MMest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata)
diagplot(MMres)
# a large 'bad leverage' outlier should be noticeable (observation 59)
# for PCA:
## Not run:
data(ForgedBankNotes)
MMres <- FRBpcaMM(ForgedBankNotes)
diagplot(MMres)
## End(Not run)
# a group of 15 fairly strong outliers can be seen which apparently would have
# a large general influence on a classical PCA analysis
# for Hotelling tests (two-sample)
## Not run:
data(hemophilia)
MMres <- FRBhotellingMM(cbind(AHFactivity,AHFantigen)~gr,data=hemophilia)
diagplot(MMres)
## End(Not run)
# the data seem practically outlier-free