Description Usage Arguments Details Value Author(s) See Also Examples
Given a set of observations, yai
separates the observations into reference and target observations,
applies the specified method to project X-variables into a Euclidean space (not
always, see argument method), and
finds the k nearest neighbors within the reference observations and between the reference and target observations.
An alternative method using randomForest
classification and regression trees is provided for steps 2 and 3.
Target observations are those with values for X-variables and
not for Y-variables, while reference observations are those
with no missing values for X- and Y-variables (see Details for the
exception).
x 
1) a matrix or data frame containing the X-variables for all
observations, with row names serving as the identification for the observations, or 2) a
one-sided formula defining the X-variables as a linear formula. If
a formula is coded for x, one must be used for y as well.
y 
1) a matrix or data frame containing the Y-variables for the reference observations, or 2) a one-sided formula defining the Y-variables as a linear formula. 
data 
when x and y are formulas, a data frame or matrix that contains the variables referenced in the formulas. 
k 
the number of nearest neighbors; default is 1. 
noTrgs 
when TRUE, skip finding neighbors for target observations. 
noRefs 
when TRUE, skip finding neighbors for reference observations. 
nVec 
number of canonical vectors to use (methods msn and msn2); the default is all vectors that are significant at level pVal. 
pVal 
significance level for canonical vectors, used when method is msn or msn2. 
method 
is the strategy used for computing distance and therefore for finding neighbors; the options are quoted key words (see Details):
euclidean, raw, mahalanobis, ica, msn, msn2, msnPP, gnn, randomForest, random, and gower.
ann 
TRUE if the approximate nearest neighbor search (ann) is used to find neighbors; FALSE if a slow exact search is used. 
mtry 
the number of X-variables picked at random when method is randomForest; see help on randomForest. 
ntree 
the number of classification and regression trees when method is randomForest; when more than one Y-variable is used, the trees are divided among the variables. 
rfMode 
when rfMode="buildClasses" (the default) and method is randomForest, continuous Y-variables are internally converted to classes, forcing randomForest to build classification trees; set rfMode="regression" to override this default. 
bootstrap 
if TRUE, the reference observations are sampled with replacement (see Details). 
ppControl 
used to control how canonical correlation analysis via projection pursuit is done; see Details. 
sampleVars 
the X- and/or Y-variables will be sampled (without replacement) if this is not NULL and greater than zero. If specified as a single unnamed value, that value controls the sample size of both X- and Y-variables. If two unnamed values, the first is taken for X-variables and the second for Y-variables. If zero, no sampling is done. Values less than 1.0 are taken as the proportion of the number of variables; values greater than or equal to 1 are the number of variables to include in the sample. Specifying a large number causes the sequence of variables to be randomized. 
rfXsubsets 
a named list of character vectors where there is one vector for each
Y-variable (see Details); only applies when method="randomForest". 
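As an illustration of sampleVars, here is a minimal sketch (a hypothetical variation on the iris-based setup used in the Examples below, not from the original documentation) that keeps 2 of 3 X-variables and the single Y-variable:

```r
require(yaImpute)
data(iris)
set.seed(1)
refs <- sample(rownames(iris), 50)
x <- iris[,1:3]              # three X-variables
y <- iris[refs,4,drop=FALSE] # one Y-variable
# first value samples X-variables, second samples Y-variables
samp <- yai(x=x, y=y, method="mahalanobis", sampleVars=c(2,1))
xvars(samp)  # the two randomly chosen X-variable names
```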
See the paper at http://www.jstatsoft.org/v23/i10 (it includes examples).
The following information is in addition to the content in the papers.
You need not have any Y-variables to run yai for the following methods:
euclidean, raw, mahalanobis, ica, random, and
randomForest (in which case unsupervised classification is
performed). However, normally yai classifies reference
observations as those with no missing values for X- and Y-variables and
target observations as those with values for X-variables and
missing data for Y-variables. When Y is NULL (there are no Y-variables),
all the observations are considered references. See
newtargets for an example of how to use yai in this
situation.
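A minimal sketch of this unsupervised case, assuming the iris data used in the Examples below: with no Y-variables, every observation becomes a reference.

```r
require(yaImpute)
data(iris)
# no y argument: all observations are treated as references
unsup <- yai(x=iris[,1:4], method="mahalanobis", k=2)
head(unsup$neiIdsRefs)  # nearest reference neighbors of each reference
```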
When bootstrap=TRUE, the reference observations are sampled with replacement. The
sample size is set to the number of reference observations. Normally, about a third of
the reference observations are left out of the sample; they are often called out-of-bag
samples. The out-of-bag observations are then treated as targets.
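A minimal sketch of bootstrap=TRUE, assuming the iris-based x and y from the Examples below (the seed and column choices are illustrative):

```r
require(yaImpute)
data(iris)
set.seed(1)
refs <- sample(rownames(iris), 50)
x <- iris[,1:2]
y <- iris[refs,3:4]
# references are resampled; out-of-bag references become targets
boot <- yai(x=x, y=y, method="mahalanobis", bootstrap=TRUE)
length(boot$bootstrap)  # row names making up the bootstrap sample
```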
When method="msnPP", projection pursuit from ccaPP is used. The method is
further controlled using argument ppControl to specify a character vector that
has two named components:
method: one of "spearman", "kendall", "quadrant", "M", "pearson"; the default is "spearman".
search: if "data" or "proj", then ccaProj is used; otherwise the default ccaGrid is used.
Here are some details on argument rfXsubsets. When method="randomForest",
one call to randomForest is generated for each Y-variable. When
argument rfXsubsets is left NULL, all the X-variables are used for each of
the Y-variables. However, sometimes better results can be achieved by using specific subsets
of X-variables for each Y-variable. This is done by setting rfXsubsets equal
to a named list of character vectors. The names correspond to the Y-variable names and the
character vectors hold the list of X-variables for the corresponding Y-variable.
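A minimal sketch of rfXsubsets, again assuming the iris-based x and y from the Examples below; the subsets chosen here are illustrative, not a recommendation.

```r
require(yaImpute)
require(randomForest)
data(iris)
set.seed(1)
refs <- sample(rownames(iris), 50)
x <- iris[,1:2]     # Sepal.Length, Sepal.Width
y <- iris[refs,3:4] # Petal.Length, Petal.Width
# one X-variable subset per Y-variable, keyed by Y-variable name
sub <- list(Petal.Length=c("Sepal.Length","Sepal.Width"),
            Petal.Width =c("Sepal.Length"))
rfSub <- yai(x=x, y=y, method="randomForest", rfXsubsets=sub)
```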
An object of class yai
, which is a list with
the following tags:
call 
the call. 
yRefs, xRefs 
matrices of the X- and Y-variables for just the reference observations (unscaled). The scale factors are attached as attributes. 
obsDropped 
a list of the row names for observations dropped for various reasons (missing data). 
trgRows 
a list of the row names for target observations as a subset of all observations. 
xall 
the X-variables for all observations. 
cancor 
returned from the cancor function when method msn or msn2 is used (NULL otherwise). 
ccaVegan 
an object of class cca (from package vegan) when method gnn is used. 
ftest 
a list containing partial F statistics and a vector of Pr>F (pgf) corresponding to the canonical correlation coefficients when method msn or msn2 is used (NULL otherwise). 
yScale, xScale 
scale data used on yRefs and xRefs as needed. 
k 
the value of k. 
pVal 
as input; only used when method msn or msn2 is used. 
projector 
NULL when not used. For methods msn, msn2, msnPP, gnn, and mahalanobis, the matrix used to project the normalized X-variables. 
nVec 
number of canonical vectors used (methods msn and msn2). 
method 
as input, the method used. 
ranForest 
a list of the forests if method randomForest is used, one forest for each Y-variable. 
ICA 
a list of information from fastICA when method ica is used. 
ann 
the value of ann; TRUE when ann is used to find the neighbors, FALSE otherwise. 
xlevels 
NULL if no factors are used as predictors; otherwise a list
of predictors that have factors and their levels (see lm). 
neiDstTrgs 
a matrix of distances between a target (identified by its row name) and the k references. There are k columns. 
neiIdsTrgs 
a matrix of reference identifications that correspond to neiDstTrgs. 
neiDstRefs, neiIdsRefs 
counterparts for references. 
bootstrap 
a vector of reference row names that constitute the bootstrap sample,
or the value FALSE when bootstrap=FALSE. 
Nicholas L. Crookston [email protected]
John Coulston [email protected]
Andrew O. Finley [email protected]
require(yaImpute)
data(iris)
# set the random number seed so that example results are consistent
# normally, leave out this command
set.seed(12345)
# form some test data, y's are defined only for reference
# observations.
refs=sample(rownames(iris),50)
x <- iris[,1:2] # Sepal.Length Sepal.Width
y <- iris[refs,3:4] # Petal.Length Petal.Width
# build yai objects using 2 methods
msn <- yai(x=x,y=y)
mal <- yai(x=x,y=y,method="mahalanobis")
# compare these results using the generalized mean distances. mal wins!
grmsd(mal,msn)
# use projection pursuit and specify ppControl (loads package ccaPP)
if (require(ccaPP))
{
msnPP <- yai(x=x,y=y,method="msnPP",ppControl=c(method="kendall",search="proj"))
grmsd(mal,msnPP,msn)
}
#############
data(MoscowMtStJoe)
# convert polar slope and aspect measurements to cartesian
# (which is the same as Stage's (1976) transformation).
polar <- MoscowMtStJoe[,40:41]
polar[,1] <- polar[,1]*.01 # slope proportion
polar[,2] <- polar[,2]*(pi/180) # aspect radians
cartesian <- t(apply(polar,1,function (x)
{return (c(x[1]*cos(x[2]),x[1]*sin(x[2]))) }))
colnames(cartesian) <- c("xSlAsp","ySlAsp")
x <- cbind(MoscowMtStJoe[,37:39],cartesian,MoscowMtStJoe[,42:64])
y <- MoscowMtStJoe[,1:35]
msn <- yai(x=x, y=y, method="msn", k=1)
mal <- yai(x=x, y=y, method="mahalanobis", k=1)
# the results can be plotted.
plot(mal,vars=yvars(mal)[1:16])
# compare these results using the generalized mean distances..
grmsd(mal,msn)
# try method="gower"
if (require(gower))
{
gow <- yai(x=x, y=y, method="gower", k=1)
# compare these results using the generalized mean distances..
grmsd(mal,msn,gow)
}
# try method="randomForest"
if (require(randomForest))
{
# reduce the plant community data for randomForest.
yba <- MoscowMtStJoe[,1:17]
ybaB <- whatsMax(yba,nbig=7) # see help on whatsMax
rf <- yai(x=x, y=ybaB, method="randomForest", k=1)
# build the imputations for the original y's
rforig <- impute(rf,ancillaryData=y)
# compare the results using individual rmsd's
compare.yai(mal,msn,rforig)
plot(compare.yai(mal,msn,rforig))
# build another randomForest case forcing regression
# to be used for continuous variables. The answers differ
# but one is not clearly better than the other.
rf2 <- yai(x=x, y=ybaB, method="randomForest", rfMode="regression")
rforig2 <- impute(rf2,ancillaryData=y)
compare.yai(rforig2,rforig)
}
