LDA: Perform linear discriminant analysis


Description

Perform linear discriminant analysis

Usage

LDA(formula, data = NULL, subset = NULL, weights = NULL,
  prior = "Observed", missing = "Exclude cases with missing data",
  output = "Means", outcome.color = "#5B9BD5",
  predictors.color = "#ED7D31", variance = "moment", seed = 12321,
  auxiliary.data = NULL, show.labels = FALSE, ...)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... The response is the grouping factor, and the right-hand side specifies the (non-factor) discriminators. Any transformations, interactions, or other non-additive operators, apart from ., are ignored.

data

A data.frame from which the variables specified in formula are preferentially taken.

subset

An optional vector specifying a subset of observations to be used in the fitting process, or the name of a variable in data. It cannot be an expression.

weights

An optional vector of sampling weights, or the name of a variable in data. It cannot be an expression.

prior

The assumed probability of each group (each value of the outcome) occurring in the population. By default this is set to "Observed", and the probabilities are computed from the observed data. If set to "Equal", the prior is the same for each group (this is the default in SPSS). Alternatively, a vector of probabilities can be provided.
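As an illustration of the two named options, the sketch below computes an "Observed" and an "Equal" prior for a small grouping factor in base R. It follows the description above and is not necessarily how LDA computes its prior internally (for example, it ignores weights); the factor groups is a made-up example.

```r
# Hypothetical grouping factor: 3 cases in "a", 1 in "b", 2 in "c"
groups <- factor(c("a", "a", "a", "b", "c", "c"))

# "Observed": the proportion of cases falling in each group
observed.prior <- as.vector(table(groups)) / length(groups)

# "Equal": the same probability for every group
equal.prior <- rep(1 / nlevels(groups), nlevels(groups))

print(observed.prior)  # 0.5000000 0.1666667 0.3333333
print(equal.prior)     # 0.3333333 0.3333333 0.3333333
```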

missing

How missing data is to be treated in the analysis. Options: "Error if missing data", "Exclude cases with missing data", or "Imputation (replace missing values with estimates)".

output

One of "Means", "Prediction-Accuracy Table", "Detail", "Scatterplot", "Moonplot", or "Discriminant Functions".

outcome.color

Color used to display centroids in "Scatterplot" output.

predictors.color

Color used to display variable correlations in "Scatterplot" output.

variance

The method used to estimate the variance: either "moment" for the method of moments or "mle" for maximum likelihood estimation.
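The practical difference between the two methods is easiest to see in the pooled within-group covariance matrix. The sketch below assumes the same convention as MASS::lda, where "moment" divides the within-group sums of squares and cross-products by n - g (unbiased) and "mle" divides by n; this page does not state the divisors LDA uses, so treat this as an assumption. It uses the built-in iris data.

```r
x <- as.matrix(iris[, 1:4])   # numeric discriminators
g <- iris$Species             # grouping factor
n <- nrow(x)                  # number of cases
k <- nlevels(g)               # number of groups

# Replace each value by the mean of its own group, column by column
group.means <- apply(x, 2, function(col) ave(col, g))

# Deviations from the group means
centred <- x - group.means

# Within-group sums of squares and cross-products
W <- crossprod(centred)

# Assumed convention: unbiased (moment) vs maximum likelihood divisor
sigma.moment <- W / (n - k)
sigma.mle <- W / n
```

With n = 150 cases and g = 3 groups, the two estimates differ only by the factor 147/150, so the choice matters mainly for small samples.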

seed

The random number seed used in imputation.

auxiliary.data

A data.frame containing additional variables to be used in imputation (if required). While adding more variables may improve the quality of the imputation, it can dramatically increase the time needed for estimation. Factors and character variables with a large number of categories should not be included, as they slow down the imputation and are unlikely to be useful.

show.labels

Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., attr(foo, "label")).
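For readers unfamiliar with label attributes, the snippet below shows how such a label is attached to a variable in base R. The variable income and its label text are hypothetical; with show.labels = TRUE, outputs would presumably display the label rather than the name income.

```r
# A hypothetical numeric variable
income <- c(52, 61, 47)

# Attach a descriptive label as an attribute; the name stays "income"
attr(income, "label") <- "Annual household income ($000)"

attr(income, "label")  # "Annual household income ($000)"
```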

...

Additional arguments to be passed to LDA.formula.

Details

Imputation (replace missing values with estimates): All selected outcome and predictor variables, along with all auxiliary.data, are included in the imputation. Cases excluded via subset or with invalid weights are left out, but cases with missing values in the outcome variable are included. After imputation, cases with missing values in the outcome variable are excluded from the analysis (von Hippel 2007). See Imputation.
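The order of operations described above (impute using every case, including those with a missing outcome, then drop those cases) can be sketched as follows. The imputation method itself is not described on this page, so a simple mean/mode fill in the hypothetical helper simple.impute stands in for it; only the surrounding steps reflect the text.

```r
# Hypothetical data: case 2 has a missing outcome, case 3 a missing predictor
df <- data.frame(
  group = factor(c("a", NA, "b", "a", "b")),
  x1 = c(1.2, 2.3, NA, 0.8, 1.9),
  x2 = c(3.1, NA, 2.2, 2.9, 3.5)
)

# Record which cases had a missing outcome before imputing
missing.outcome <- is.na(df$group)

# Placeholder imputation (hypothetical): mean for numeric, mode for factors
simple.impute <- function(d) {
  for (v in names(d)) {
    miss <- is.na(d[[v]])
    if (!any(miss)) next
    if (is.numeric(d[[v]])) {
      d[[v]][miss] <- mean(d[[v]], na.rm = TRUE)
    } else {
      counts <- table(d[[v]])
      d[[v]][miss] <- names(counts)[which.max(counts)]
    }
  }
  d
}

# Every case, including case 2, informs the imputation
imputed <- simple.impute(df)

# Only cases with an observed outcome enter the analysis
analysis.data <- imputed[!missing.outcome, ]
```

Case 2 contributes its observed x1 value to the imputation of case 3, but is itself dropped, leaving four cases for the discriminant analysis.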

References

von Hippel, P. T. (2007). Regression with missing Y's: An improved strategy for analyzing multiply imputed data. Sociological Methodology, 37, 83-117.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.

Long, J. S. and Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54(3), 217-224.


19900321/flipMultivariates documentation built on May 29, 2019, 8:33 a.m.