rda (R Documentation)
Builds a classification rule using regularized group covariance matrices that are supposed to be more robust against multicollinearity in the data.
rda(x, ...)
## Default S3 method:
rda(x, grouping = NULL, prior = NULL, gamma = NA,
lambda = NA, regularization = c(gamma = gamma, lambda = lambda),
crossval = TRUE, fold = 10, train.fraction = 0.5,
estimate.error = TRUE, output = FALSE, startsimplex = NULL,
max.iter = 100, trafo = TRUE, simAnn = FALSE, schedule = 2,
T.start = 0.1, halflife = 50, zero.temp = 0.01, alpha = 2,
K = 100, ...)
## S3 method for class 'formula'
rda(formula, data, ...)
| Argument | Description |
|---|---|
| x | Matrix or data frame containing the explanatory variables (required if no ‘formula’ is given). |
| formula | Formula of the form ‘groups ~ x1 + x2 + ...’. |
| data | A data frame (or matrix) containing the explanatory variables. |
| grouping | (Optional) a vector specifying the class for each observation; if not specified, the first column of the data is taken. |
| prior | (Optional) prior probabilities for the classes. Default: proportional to training sample sizes. |
| gamma, lambda, regularization | One or both of the rda parameters may be fixed manually. Unspecified parameters are determined by minimizing the estimated error rate (see below). |
| crossval | Logical. If ‘TRUE’, the Cross-Validation error rate is used to determine gamma and lambda; otherwise Bootstrapping is used. |
| fold | The number of Cross-Validation or Bootstrap samples to be drawn. |
| train.fraction | In case of Bootstrapping: the fraction of the data used for training in each Bootstrap sample; the remainder is used to estimate the misclassification rate. |
| estimate.error | Logical. If ‘TRUE’, the apparent error rate for the final parameter set is estimated. |
| output | Logical flag indicating whether text output during computation is desired. |
| startsimplex | (Optional) a starting simplex for the Nelder-Mead minimization. |
| max.iter | Maximum number of iterations for Nelder-Mead. |
| trafo | Logical; indicates whether minimization is carried out using transformed parameters. |
| simAnn | Logical; indicates whether Simulated Annealing is used. |
| schedule | Annealing schedule 1 or 2 (exponential or polynomial). |
| T.start | Starting temperature for Simulated Annealing. |
| halflife | Number of iterations until the temperature is reduced by half (schedule 1). |
| zero.temp | Temperature at which it is set to zero (schedule 1). |
| alpha | Power of temperature reduction (linear, quadratic, cubic, ...) (schedule 2). |
| K | Number of iterations until the temperature reaches 0 (schedule 2). |
| ... | Currently unused. |
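The annealing arguments above only name the shape of each schedule. The R sketch below shows one plausible reading of them; the helper ‘temperature’ and its exact formulas are assumptions made for illustration, not the code used inside rda().

```r
# Hypothetical temperature schedules for Simulated Annealing.
# ASSUMPTION: these formulas illustrate the argument descriptions only;
# they are not taken from the rda() source code.
temperature <- function(iter, schedule = 2, T.start = 0.1, halflife = 50,
                        zero.temp = 0.01, alpha = 2, K = 100) {
  if (schedule == 1) {
    # exponential: the temperature halves every 'halflife' iterations ...
    temp <- T.start * 0.5^(iter / halflife)
    # ... and is set to zero once it drops below 'zero.temp'
    ifelse(temp < zero.temp, 0, temp)
  } else {
    # polynomial: reduction of power 'alpha' (1 = linear, 2 = quadratic, ...),
    # reaching zero after 'K' iterations
    T.start * pmax(0, 1 - iter / K)^alpha
  }
}

temperature(c(0, 50, 100, 200), schedule = 1)  # 0.1, 0.05, 0.025 and 0
temperature(c(0, 50, 100, 200), schedule = 2)  # 0.1, 0.025, 0 and 0
```

Under these assumed formulas, schedule 1 never quite reaches zero on its own, which is why a cut-off such as ‘zero.temp’ is needed, whereas schedule 2 reaches zero exactly after ‘K’ iterations.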
J.H. Friedman (see references below) suggested a method to fix almost singular covariance matrices in discriminant analysis. Basically, individual covariances as in QDA are used, but depending on two parameters ($\gamma$ and $\lambda$), these can be shifted towards a diagonal matrix (a multiple of the identity) and/or the pooled covariance matrix. For ($\gamma=0$, $\lambda=0$) it equals QDA; for ($\gamma=0$, $\lambda=1$) it equals LDA.
You may fix these parameters at given values or let the function search for “optimal” values. If one parameter is given, the other one is determined using the R function ‘optimize’. If no parameter is given, both are determined numerically by the Nelder-Mead (simplex) algorithm, optionally combined with Simulated Annealing.
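To make the two search strategies concrete, here is a minimal base-R sketch: a Gaussian classifier using the regularized covariances defined on this page, a 10-fold Cross-Validation error, and a search with ‘optimize’ (one free parameter) or ‘optim’ with method "Nelder-Mead" (both free). The helpers ‘rda_fit’, ‘rda_predict’ and ‘cv_error’ are hypothetical, the plogis() transformation is our own choice, Simulated Annealing is omitted, and none of this is claimed to reproduce what rda() does internally.

```r
# Hypothetical base-R sketch of RDA; NOT the klaR implementation.
rda_fit <- function(x, g, gamma, lambda) {
  x <- as.matrix(x); g <- factor(g); lev <- levels(g); d <- ncol(x)
  n_k <- as.vector(table(g))
  means <- lapply(lev, function(k) colMeans(x[g == k, , drop = FALSE]))
  covs <- lapply(lev, function(k) cov(x[g == k, , drop = FALSE]))
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, n_k)) /
    (sum(n_k) - length(lev))                               # pooled covariance
  covs <- lapply(covs, function(S) {
    S_l <- (1 - lambda) * S + lambda * pooled              # towards pooled
    (1 - gamma) * S_l + gamma * mean(diag(S_l)) * diag(d)  # towards scaled I
  })
  list(lev = lev, means = means, covs = covs, prior = n_k / sum(n_k))
}

rda_predict <- function(fit, x) {
  x <- as.matrix(x)
  scores <- sapply(seq_along(fit$lev), function(k) {
    S <- fit$covs[[k]]
    # Gaussian log-density up to a constant, plus log prior
    -0.5 * mahalanobis(x, fit$means[[k]], S) -
      0.5 * as.numeric(determinant(S)$modulus) + log(fit$prior[k])
  })
  scores <- matrix(scores, nrow = nrow(x))
  factor(fit$lev[max.col(scores, ties.method = "first")], levels = fit$lev)
}

cv_error <- function(x, g, gamma, lambda, fold = 10) {
  set.seed(1)  # same folds for every (gamma, lambda), so errors are comparable
  folds <- sample(rep_len(seq_len(fold), nrow(x)))
  wrong <- 0
  for (f in seq_len(fold)) {
    test <- folds == f
    fit <- rda_fit(x[!test, , drop = FALSE], g[!test], gamma, lambda)
    wrong <- wrong + sum(rda_predict(fit, x[test, , drop = FALSE]) != g[test])
  }
  wrong / nrow(x)
}

x <- iris[, 1:4]; g <- iris$Species

# gamma fixed: one-dimensional search for lambda on [0, 1] with optimize()
opt_l <- optimize(function(l) cv_error(x, g, gamma = 0.05, lambda = l), c(0, 1))
opt_l$minimum

# both free: Nelder-Mead via optim(); plogis() maps the real line to (0, 1),
# so every simplex point yields valid parameters (our own transformation)
opt_both <- optim(c(0, 0), function(p) cv_error(x, g, plogis(p[1]), plogis(p[2])),
                  method = "Nelder-Mead", control = list(maxit = 100))
plogis(opt_both$par)  # (gamma, lambda)
```

Because the error rate is a step function of the parameters, a different fold assignment (another seed in ‘cv_error’) can lead to a noticeably different optimum, which is one reason to repeat the optimization.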
The objective function to be minimized is the (estimated) misclassification rate, which is estimated either by Cross-Validation or by repeatedly dividing the data into training and test sets (Bootstrapping).
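The two ways of estimating the misclassification rate can be sketched as follows. ‘estimate_error’ and the stand-in classifier ‘nearest_mean’ are hypothetical helpers; in rda() the classifier is the regularized discriminant rule itself, and the exact way the splits are drawn may differ. ‘fold’ and ‘train.fraction’ play the roles described for the corresponding arguments.

```r
# Hypothetical sketch of the two error-estimation schemes.
# 'classify' is any function(train_x, train_g, test_x) returning class labels.
estimate_error <- function(x, g, classify, crossval = TRUE, fold = 10,
                           train.fraction = 0.5) {
  n <- nrow(x)
  if (crossval) {
    # k-fold Cross-Validation: each observation is predicted exactly once
    folds <- sample(rep_len(seq_len(fold), n))
    wrong <- sapply(seq_len(fold), function(f) {
      test <- folds == f
      sum(classify(x[!test, , drop = FALSE], g[!test],
                   x[test, , drop = FALSE]) != g[test])
    })
    sum(wrong) / n
  } else {
    # 'fold' random splits: train on 'train.fraction' of the data, test on the rest
    errs <- sapply(seq_len(fold), function(b) {
      train <- seq_len(n) %in% sample(n, round(train.fraction * n))
      mean(classify(x[train, , drop = FALSE], g[train],
                    x[!train, , drop = FALSE]) != g[!train])
    })
    mean(errs)
  }
}

# Stand-in classifier (hypothetical): nearest group mean in Euclidean distance
nearest_mean <- function(train_x, train_g, test_x) {
  train_x <- as.matrix(train_x); test_x <- as.matrix(test_x)
  lev <- levels(factor(train_g))          # groups present in the training data
  d2 <- sapply(lev, function(k) {
    mu <- colMeans(train_x[train_g == k, , drop = FALSE])
    colSums((t(test_x) - mu)^2)           # squared distances to this group mean
  })
  d2 <- matrix(d2, nrow = nrow(test_x))
  factor(lev[max.col(-d2, ties.method = "first")], levels = levels(train_g))
}

set.seed(42)
x <- iris[, 1:4]; g <- iris$Species
c(crossval  = estimate_error(x, g, nearest_mean, crossval = TRUE),
  bootstrap = estimate_error(x, g, nearest_mean, crossval = FALSE))
```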
Warning: If these sets are small, the optimization is expected to produce almost random results; in such a case we recommend adjusting the parameters manually. In all other cases, it is recommended to run the optimization several times to check whether the results are stable.
Since the Nelder-Mead algorithm is intended for continuous functions, while the observed error rate is by nature discrete, a greater number of Bootstrap samples may improve the optimization by making the response surface smoother (and, of course, by reducing variance and bias). If a set of parameters leads to singular covariance matrices, a penalty term is added to the misclassification rate, which will hopefully help the search maneuver back out of singularity (so do not worry about error rates greater than one during optimization).
A list of class ‘rda’ containing the following components:

| Component | Description |
|---|---|
| call | The (matched) function call. |
| regularization | Vector containing the two regularization parameters (gamma, lambda). |
| classes | The names of the classes. |
| prior | The prior probabilities for the classes. |
| error.rate | The apparent error rate (if its computation was not suppressed) and, if any optimization took place, the final (Cross-Validation or Bootstrap) error rate estimate as well. |
| means | Group means. |
| covariances | Array of group covariances. |
| covpooled | Pooled covariance. |
| converged | (Logical) indicator of convergence (only for Nelder-Mead). |
| iter | Number of iterations actually performed (only for Nelder-Mead). |
The explicit definition of $\gamma$, $\lambda$ and the resulting covariance estimates is as follows:
The pooled covariance estimate $\hat{\Sigma}$ is given, as well as the individual covariance estimates $\hat{\Sigma}_k$ for each group.
First, using $\lambda$, a convex combination of these two is computed:

$$\hat{\Sigma}_k(\lambda) := (1-\lambda)\,\hat{\Sigma}_k + \lambda\,\hat{\Sigma}.$$
Then, another convex combination is constructed using the above estimate and a (scaled) identity matrix:

$$\hat{\Sigma}_k(\lambda,\gamma) = (1-\gamma)\,\hat{\Sigma}_k(\lambda) + \gamma\,\frac{1}{d}\,\mathrm{tr}\big[\hat{\Sigma}_k(\lambda)\big]\,\mathrm{I}.$$
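The two convex combinations translate directly into R. The sketch below assumes the pooled estimate is the usual average of the group covariances weighted by $(n_k - 1)$, which this page does not spell out; ‘regularize_cov’ is an illustrative name, not part of klaR.

```r
# Regularized covariance for one group, following the two equations above.
# 'regularize_cov' is a hypothetical helper, not a klaR function.
regularize_cov <- function(S_k, S_pooled, gamma, lambda) {
  d <- nrow(S_k)
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled   # first combination
  (1 - gamma) * S_lambda +
    gamma * (sum(diag(S_lambda)) / d) * diag(d)         # second combination
}

# Group covariances and (assumed) pooled covariance for the iris data
x <- iris[, 1:4]; g <- iris$Species
S_k <- lapply(split(x, g), cov)
n_k <- as.vector(table(g))
S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_k, n_k)) /
  (nrow(x) - nlevels(g))

regularize_cov(S_k$setosa, S_pooled, gamma = 0.05, lambda = 0.2)
```

Note that the second combination leaves the trace unchanged, so $\gamma$ only redistributes variance between the diagonal and off-diagonal structure; it does not rescale the overall variance.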
The factor $\frac{1}{d}\mathrm{tr}[\hat{\Sigma}_k(\lambda)]$ in front of the identity matrix $\mathrm{I}$ is the mean of the diagonal elements of $\hat{\Sigma}_k(\lambda)$, so it is the mean variance of all $d$ variables under the group covariance $\hat{\Sigma}_k(\lambda)$.
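A quick numerical check of this identity on one group of the iris data (a sketch; the choice of data is arbitrary):

```r
# tr(S)/d equals the mean of the diagonal of S, i.e. the average variance
# of the d variables (checked on the setosa group of the iris data).
X <- as.matrix(iris[iris$Species == "setosa", 1:4])
S <- cov(X)
d <- ncol(S)
c(trace_over_d  = sum(diag(S)) / d,
  mean_diagonal = mean(diag(S)),
  mean_variance = mean(apply(X, 2, var)))
```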
For the four extremes of ($\gamma$, $\lambda$), the covariance structure reduces to special cases:

($\gamma=0$, $\lambda=0$): QDA, an individual covariance matrix for each group.

($\gamma=0$, $\lambda=1$): LDA, a common covariance matrix.

($\gamma=1$, $\lambda=0$): conditionally independent variables, similar to Naive Bayes, but within each group all variables have the same variance (equal main diagonal elements).

($\gamma=1$, $\lambda=1$): classification using Euclidean distance, as in the previous case but with the same variance for all groups. Objects are assigned to the group with the nearest mean (when the priors are equal).
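The four extremes can be verified numerically with the same formulas. The helper ‘reg’ is hypothetical, and the pooled estimate is again assumed to be the $(n_k - 1)$-weighted average of the group covariances.

```r
# Numerical check of the four extreme cases on the iris data.
x <- iris[, 1:4]; g <- iris$Species
S_k <- lapply(split(x, g), cov)
S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_k, as.vector(table(g)))) /
  (nrow(x) - nlevels(g))

reg <- function(S, gamma, lambda) {  # hypothetical helper
  S_l <- (1 - lambda) * S + lambda * S_pooled
  (1 - gamma) * S_l + gamma * mean(diag(S_l)) * diag(ncol(S))
}

# (0, 1): every group gets the pooled matrix (LDA)
all.equal(reg(S_k$setosa, 0, 1), reg(S_k$virginica, 0, 1))
# (1, 0): diagonal matrices, one common variance per group (columns differ)
sapply(S_k, function(S) diag(reg(S, 1, 0)))
# (1, 1): the same multiple of the identity for all groups
sapply(S_k, function(S) diag(reg(S, 1, 1)))
```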
Christian Röver, roever@statistik.tu-dortmund.de
Friedman, J.H. (1989): Regularized Discriminant Analysis. In: Journal of the American Statistical Association 84, 165-175.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1992): Numerical Recipes in C. Cambridge: Cambridge University Press.
predict.rda, lda, qda
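library(klaR)  # rda() is provided by the klaR package
data(iris)
# both regularization parameters are fixed, so no optimization takes place
x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
predict(x, iris)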