LDA: Perform linear discriminant analysis



Perform linear discriminant analysis


LDA(formula, data = NULL, subset = NULL, weights = NULL,
  prior = "Observed", missing = "Exclude cases with missing data",
  output = "Means", outcome.color = "#5B9BD5",
  predictors.color = "#ED7D31", variance = "moment", seed = 12321,
  auxiliary.data = NULL, show.labels = FALSE, ...)
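A minimal call might look like the sketch below. It assumes the flipMultivariates package is installed and uses R's built-in iris data purely as a stand-in; the variable name lda.formula is illustrative. Only arguments that appear in the signature above are used, and the structure of the returned object is not covered here.

```r
# Illustrative only: iris is a stand-in dataset, not part of this documentation.
lda.formula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# Run only if the package is available, so the sketch fails gracefully otherwise.
if (requireNamespace("flipMultivariates", quietly = TRUE)) {
  fit <- flipMultivariates::LDA(
    lda.formula,
    data = iris,
    prior = "Equal",                        # equal prior for each group
    output = "Prediction-Accuracy Table"    # one of the documented outputs
  )
  print(fit)
}
```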



formula: A formula of the form groups ~ x1 + x2 + ... The response is the grouping factor and the right-hand side specifies the (non-factor) discriminators. Any transformations, interactions, or other non-additive operators apart from . are ignored.


data: A data.frame from which the variables specified in formula are preferentially taken.


subset: An optional vector specifying a subset of observations to be used in the fitting process, or the name of a variable in data. It may not be an expression.


weights: An optional vector of sampling weights, or the name of a variable in data. It may not be an expression.


prior: The assumed probability of each group (each value of the outcome variable) occurring in the population. By default this is set to "Observed", and the probabilities are computed from the observed data. If set to "Equal", the prior is equal for each group (this is the default in SPSS). Alternatively, a vector of probabilities can be provided.
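The two named settings can be illustrated in base R. This is a sketch of the general idea rather than the package's internal code: it ignores sampling weights, and the subsample of iris is chosen only to make the group sizes unequal.

```r
# Grouping factor with unequal group sizes: 50, 30 and 20 cases.
groups <- iris$Species[c(1:50, 51:80, 101:120)]

# "Observed": the proportion of cases falling in each group.
observed.prior <- as.numeric(prop.table(table(groups)))
names(observed.prior) <- levels(groups)

# "Equal": the same probability for every group.
equal.prior <- rep(1 / nlevels(groups), nlevels(groups))
names(equal.prior) <- levels(groups)

print(observed.prior)
print(equal.prior)
```

A user-supplied vector of probabilities would take the place of either of these, with one entry per group.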


missing: How missing data is to be treated in the analysis. Options: "Error if missing data", "Exclude cases with missing data", or "Imputation (replace missing values with estimates)".


output: One of "Means", "Prediction-Accuracy Table", "Detail", "Scatterplot", "Moonplot", or "Discriminant Functions".


outcome.color: Color used to display centroids in "Scatterplot" output.


predictors.color: Color used to display variable correlations in "Scatterplot" output.


variance: The method used to estimate the variance; either "moment" for the method of moments or "mle" for maximum likelihood estimation.
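In the usual textbook treatment, the two estimators differ only in the divisor applied to the within-group sum of squares. The sketch below shows this for a single variable from the built-in iris data. Whether LDA uses exactly these divisors internally is an assumption, not something this page states.

```r
x <- iris$Sepal.Length   # one discriminator (stand-in data)
g <- iris$Species        # grouping factor
n <- length(x)           # number of cases
k <- nlevels(g)          # number of groups

# Within-group sum of squared deviations from each group's own mean.
within.ss <- sum((x - ave(x, g))^2)

# Method of moments (unbiased) estimate: divide by n - k.
var.moment <- within.ss / (n - k)

# Maximum likelihood estimate: divide by n.
var.mle <- within.ss / n

print(c(moment = var.moment, mle = var.mle))
```

The maximum likelihood estimate is always the smaller of the two, and the gap shrinks as the number of cases grows relative to the number of groups.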


seed: The random number seed used in imputation.


auxiliary.data: A data.frame containing additional variables to be used in imputation (if required). While adding more variables will improve the quality of the imputation, it will dramatically increase estimation time. Factors and character variables with a large number of categories should not be included, as they will slow down estimation and are unlikely to be useful.


show.labels: Shows the variable labels, as opposed to the variable names, in the outputs, where a variable's label is an attribute (e.g., attr(foo, "label")).
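For reference, a label attribute of this kind can be attached and read back in base R as follows. The vector foo and the label text are hypothetical.

```r
# A variable whose name is "foo" but whose label is more descriptive.
foo <- c(23, 35, 41)
attr(foo, "label") <- "Age of respondent"   # hypothetical label text

# Read the label back; a variable without one returns NULL.
foo.label <- attr(foo, "label")
no.label <- attr(c(1, 2, 3), "label")

print(foo.label)
```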


...: Additional arguments to be passed to LDA.formula.


Imputation (replace missing values with estimates): All selected outcome and predictor variables are included in the imputation, along with all auxiliary.data, excluding cases that are excluded via subset or have invalid weights, but including cases with missing values of the outcome variable. Then, cases with missing values in the outcome variable are excluded from the analysis (von Hippel 2007). See Imputation.
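The order of operations described above can be sketched with toy data. Everything here is illustrative: the data, the keep vector standing in for subset, the definition of an invalid weight as missing or non-positive, and above all the use of simple mean imputation as a placeholder for whatever imputation model the package actually fits (which, unlike this placeholder, also uses the outcome and any auxiliary.data).

```r
# Toy data: y is the outcome, x1 and x2 are predictors (all hypothetical).
dat <- data.frame(
  y  = factor(c("a", "b", NA, "a", "b", "a")),
  x1 = c(1, NA, 3, 4, 5, 6),
  x2 = c(2, 1, NA, 3, 5, 4)
)
keep <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)  # stand-in for subset
w <- rep(1, 6)                                  # stand-in sampling weights

# Step 1: drop cases excluded by subset or with invalid weights
# (assumed here to mean missing or non-positive).
eligible <- dat[keep & !is.na(w) & w > 0, ]

# Step 2: impute using all eligible cases, including the case with missing y.
# Mean imputation is a placeholder, not the package's method.
imputed <- eligible
for (v in c("x1", "x2")) {
  imputed[[v]][is.na(imputed[[v]])] <- mean(eligible[[v]], na.rm = TRUE)
}

# Step 3: exclude cases whose outcome was missing before analysis.
analysis.data <- imputed[!is.na(eligible$y), ]
print(analysis.data)
```

The point of the ordering is that a case with a missing outcome still contributes information about the predictors during imputation, but is not carried into the discriminant analysis itself.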


von Hippel, Paul T. 2007. "Regression With Missing Y's: An Improved Strategy for Analyzing Multiply Imputed Data." Sociological Methodology 37:83-117.

White, H. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48:817-838.

Long, J. S. and Ervin, L. H. 2000. "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model." The American Statistician 54(3):217-224.

19900321/flipMultivariates documentation built on May 29, 2019, 8:33 a.m.