Computes the root mean square distance between predicted and corresponding
observed values in an orthogonal multivariate space. This value is the mean
Mahalanobis distance between observed and imputed values in a space defined by
the observations and variables where observed and predicted values are defined.
The statistic provides a way to compare imputation (or prediction) results.
While it is designed to work with imputation, the function can be used with
objects that inherit from lm or with matrices and data frames that follow the
column naming convention described in the details.
... 
objects created by any combination of

ancillaryData 
a data frame that defines variables, passed to impute.yai

vars 
a list of variable names you want to include; if NULL all available
variables are included (note that if impute.yai is called, the
Y-variables are returned when
wts 
A vector of weights used to compute the mean distances, see details below. 
rtnVectors 
If TRUE, the vectors of individual distances are returned (see Value) rather than the mean Mahalanobis distance. 
This function is designed to compute the root mean square distance between observed
and predicted observations over several variables at once. It is the Mahalanobis
distance between observed and predicted but the name emphasizes the similarities
to root mean square difference (or error, see rmsd).
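A minimal base R sketch of the statistic as described here may help fix ideas. It is not the yaImpute implementation: the helper name grmsd_sketch is hypothetical, and the sketch assumes that the covariance of the observed values defines the space and that the mean squared Mahalanobis distance is divided by the number of variables before the square root is taken.

```r
# Hypothetical helper (not part of yaImpute): root mean square Mahalanobis
# distance between observed and predicted values, averaged over variables.
grmsd_sketch <- function(obs, pred) {
  obs  <- as.matrix(obs)
  pred <- as.matrix(pred)
  stopifnot(identical(dim(obs), dim(pred)))
  S <- cov(obs)   # unbiased covariance of the observed values (assumption)
  # squared Mahalanobis length of each row of differences
  d2 <- mahalanobis(obs - pred, center = rep(0, ncol(obs)), cov = S)
  sqrt(mean(d2) / ncol(obs))
}

obs <- iris[, c("Petal.Length", "Petal.Width")]

# predictions identical to the observations
same <- grmsd_sketch(obs, obs)          # 0

# column means used as predictions
means  <- matrix(colMeans(obs), nrow(obs), ncol(obs), byrow = TRUE)
atMean <- grmsd_sketch(obs, means)      # sqrt(149/150), about 0.9967

# univariate case: root mean square difference divided by the standard deviation
o   <- iris$Petal.Length
p   <- rev(o)                           # arbitrary "predictions"
uni <- grmsd_sketch(matrix(o), matrix(p))
```

With n = 150 rows, the mean-imputation value is sqrt((n-1)/n) because cov uses the unbiased estimate.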
Here are some notable characteristics.
In the univariate case this function returns the same value as rmsd with
scale=TRUE. In that case the root mean square difference is computed after
scale has been called on the variable.
Like rmsd, grmsd is zero if the imputed values are
exactly the same as the observed values over all variables.
Like rmsd, grmsd is ~1.0 when the mean of each
variable is imputed in place of a near neighbor (it would be exactly 1.0 if
the maximum likelihood estimate of the covariance were used rather than
the unbiased estimate; it approaches 1 as n gets large).
This situation corresponds to regression where the slope is zero.
It indicates that the imputation error is, overall, the same as it
would be if the means of the variables were imputed rather than near
neighbors (it does not signal that the means were imputed).
Like rmsd, values of grmsd > 1.0 indicate that, on average,
the errors in the imputation are greater than they would be if the mean
of the corresponding variables were imputed for each observation.
Note that individual rmsd values can be computed even when
the variance of the variable is zero. In contrast, grmsd can
only be computed in the situation where the observed data matrix is full rank.
Rank is determined using qr and columns are removed from the
analysis to create this condition if necessary (with a warning).
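The rank check in the last point can be illustrated with base R alone. This is a sketch under assumptions, not the code used by grmsd: the helper name drop_dependent_columns is hypothetical, and qr is applied here to the matrix as given, whereas the package may prepare the observed data differently first.

```r
# Hypothetical helper (not part of yaImpute): keep a full-rank set of columns.
drop_dependent_columns <- function(m) {
  m <- as.matrix(m)
  q <- qr(m)                                  # default LINPACK-based QR
  if (q$rank < ncol(m)) {
    # qr moves near-collinear columns to the end of the pivot vector
    keep <- sort(q$pivot[seq_len(q$rank)])
    warning("columns removed to obtain full rank: ",
            paste(colnames(m)[-keep], collapse = ", "))
    m <- m[, keep, drop = FALSE]
  }
  m
}

# column b is exactly twice column a, so the matrix has rank 2
X <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(1, 0, 1, 0))
reduced <- drop_dependent_columns(X)          # warns that b was removed
colnames(reduced)
```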
Observed and predicted are matched using the column names. Column names
that have the ".o" suffix (observed values) are matched to those that do not
(predicted values). Columns that do not
have matching observed and imputed (predicted) values are ignored.
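The naming convention above can be sketched in a few lines of base R. The helper match_obs_pred is hypothetical and only illustrates the pairing rule; it is not how yaImpute is known to implement it.

```r
# Hypothetical helper (not part of yaImpute): pair each "<name>.o" column
# (observed) with the column "<name>" (predicted); unmatched columns are ignored.
match_obs_pred <- function(df) {
  obs_cols  <- grep("\\.o$", colnames(df), value = TRUE)
  pred_cols <- sub("\\.o$", "", obs_cols)
  keep <- pred_cols %in% colnames(df)
  data.frame(observed  = obs_cols[keep],
             predicted = pred_cols[keep],
             stringsAsFactors = FALSE)
}

df <- data.frame(Petal.Length   = c(1.4, 4.7, 6.0),
                 Petal.Length.o = c(1.5, 4.5, 5.9),
                 Sepal.Width.o  = c(3.5, 3.2, 3.0),  # no predicted partner
                 x              = c(1, 2, 3))        # no observed partner
pairs <- match_obs_pred(df)
pairs
```

Only the Petal.Length pair survives in this example; Sepal.Width.o and x are dropped because they have no counterpart.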
Several objects may be passed as "...". Function impute.yai is
called for any objects that were created by yai;
ancillaryData and vars are passed to impute.yai when it is used.
When objects inherit from lm, a suitable matrix is formed
by calling the predict and resid functions.
Factors, if found, are removed (with a warning).
When argument wts is defined there must be one value for each pair of
observed and predicted variables. If the values are named (preferred), then
the names are matched to the names of predicted variables (no .o suffix).
The matched values effectively scale the axes in which distances are computed.
When this is done, the resulting distances are not Mahalanobis distances.
When rtnVectors=FALSE, a sorted named vector of mean distances
is returned; the names are taken from the arguments.
When rtnVectors=TRUE the function returns vectors of distances, sorted and
named as is done when this argument is FALSE.
Nicholas L. Crookston
yai, impute.yai, rmsd.yai, notablyDifferent
require(yaImpute)
data(iris)
set.seed(12345)
# form some test data
refs=sample(rownames(iris),50)
x <- iris[,1:2] # Sepal.Length Sepal.Width
y <- iris[refs,3:4] # Petal.Length Petal.Width
# build yai objects using 2 methods
msn <- yai(x=x,y=y)
mal <- yai(x=x,y=y,method="mahalanobis")
# compute the average distances between observed and imputed (predicted)
grmsd(msn,mal,lmFit=lm(as.matrix(y) ~ ., data=x[refs,]))
# use all the variables and observations in iris
# Species is a factor and is automatically deleted with a warning
grmsd(msn,mal,ancillaryData=iris)
# here is an example using lm, and another using column
# means as predictions.
impMean <- y
colnames(impMean) <- paste0(colnames(impMean),".o")
impMean <- cbind(impMean,y)
# set the predictions to the means of the variables
impMean[,"Petal.Length"] <- mean(impMean[,"Petal.Length"])
impMean[,"Petal.Width"] <- mean(impMean[,"Petal.Width"])
grmsd(msn, mal, lmFit=lm(as.matrix(y) ~ ., data=x[refs,]), impMean )
# compare to using function rmsd (values match):
msnimp <- na.omit(impute(msn))
grmsd(msnimp[,c("Petal.Length","Petal.Length.o")])
rmsd(msnimp[,c("Petal.Length","Petal.Length.o")],scale=TRUE)
# these are multivariate cases and they don't match
# because the covariance of the two variables is > 0.
grmsd(msnimp)
colSums(rmsd(msnimp,scale=TRUE))/2
# get the vectors and make a boxplot, identify outliers
stats <- boxplot(grmsd(msn,mal,ancillaryData=iris[,-5],rtnVectors=TRUE),
ylab="Mahalanobis distance")
stats$out
# 118 132
#2.231373 1.990961

lmFit mal msn
0.7208731 1.0072846 1.2464372
mal msn
0.8645804 1.1095280
Warning messages:
1: In grmsd(msn, mal, ancillaryData = iris) :
factor(s) have been removed from msn: Species
2: In grmsd(msn, mal, ancillaryData = iris) :
factor(s) have been removed from mal: Species
lmFit impMean mal msn
0.7208731 0.9899495 1.0072846 1.2464372
msnimp[, c("Petal.Length", "Petal.Length.o")]
0.5196872
rmsdS
Petal.Length 0.5196872
msnimp
1.246437
rmsdS
0.7030801
118 132
2.231373 1.990961