R/ldaPlus.R
In multiUS: Functions for the Courses Multivariate Analysis and Computer Intensive Methods

Documented in ldaPlus

#' Linear discriminant analysis
#'
#' @description The function performs a linear discriminant analysis (by using the \code{MASS::lda} function).
#' Compared to the \code{MASS::lda} function, the \code{ldaPlus}
#' function enable to consider the prior probabilities to predict the values of a categorical variable, it
#' provides with predicted values and with (Jack-knife) classification table and also with statistical test of canonical correlations
#' between the variable that represents groups and numeric variables.
#' @param x A data frame with values of numeric variables.
#' @param grouping Categorical variable that defines groups.
#' @param pred Whether to return the predicted values based on the model. Default is \code{TRUE}.
#' @param CV Whether to do cross-validation in addition to "ordinary" analysis, default is \code{TRUE}.
#' @param usePriorBetweenGroups Whether to use prior probabilities also in estimating the model (compared to only in prediction); default is \code{TRUE}.
#' @param \dots Arguments passed to function \code{MASS::lda}.
#' @return The following objects are also a part of what is returned by the \code{MASS::lda} function.
#' \itemize{
#' \item \code{prior} - Prior probabilities of class membership taken to estimate the model (it can be estimated based on the sample data or it can be provided by a reseacher).
#' \item \code{counts} - Number of units in each category of categorical variable taken to estimate the model.
#' \item \code{means} - Group means.
#' \item \code{scaling} - Matrix that transforms observations to discriminant functions, normalized so that within groups covariance matrix is spherical.
#' \item \code{lev} - Levels (groups) of the categorical variable.
#' \item \code{svd} - Singular values, that give the ratio of the between-group and within-group standard deviations on linear discriminant variables. Their squares are the canonical F-statistics.
#' \item \code{N} - Number of observations used.
#' \item \code{call} - 	the (matched) function call.
#' }
#' The additional following objects are generated by the \code{multiUS::ldaPlus} function.
#' \itemize{
#' \item \code{standCoefWithin} - Standardized coefficients (within groups) of discriminant function.
#' \item \code{standCoefTotal} - Standardized coefficients of discriminant function.
#' \item \code{betweenGroupsWeights} - Proportions/priors used when estimating the model.
#' \item \code{sigTest} - Test of canonical correlations between the variable that represent groups (binary variable) and numeric variables (see function \code{testCC} for more details) (Ho: The current and all the later canonical correlations equal to zero.).
#' \item \code{eigModel} - Table with eigenvalues and canonical correlations (see function \code{testCC} for more details).
#' \item \code{centroids} - Means of discriminant variables by levels of categorical variable (not predicted, but actual).
#' \item \code{corr} - Pooled correlations within groups (correlations between values of numerical variables and values of linear discriminat function(s)).
#' \item \code{pred}
#' \itemize{
#' \item \code{class} - Predicted values of categorical variable
#' \item \code{posterior} - Posterior probabilities (the values of the Fisher's calcification linear discrimination function)
#' \item \code{x} - Estimated values of discriminat function(s) for each unit
#' }
#' \item \code{class} - Classification table:
#' \itemize{
#' \item \code{orgTab} - Frequency table.
#' \item \code{perTab} - Percentages.
#' \item \code{corPer} - Percentage of correctly predicted values (alternatively, percentage of correctly classified units).
#' }
#' \item \code{classCV} - Similar to \code{class} but based on cross validation (Jack-knife).
#' }
#' @examples
#' ldaPlus(x = mtcars[,c(1, 3, 4, 5, 6)], grouping = mtcars[,10])
#' @details The specified \code{prior} is not taken into account when computing eigenvalues and all statistics based on them (everything in components \code{eigModel} and \code{sigTest} of the returned value).
#' @author Aleš Žiberna
#' @references
#' R Data Analysis Examples: Canonical Correlation Analysis, UCLA: Statistical Consulting Group. From http://www.ats.ucla.edu/stat/r/dae/canonical.htm (accessed Decembar 27, 2013).
#' @export

ldaPlus <- function(x,grouping,pred=TRUE, CV=TRUE,usePriorBetweenGroups=TRUE,...){
  call<-match.call()
  #if(!require(MASS)) stop("MASS Package is required but not installed!")
  LDA<-lda(x=x,grouping=grouping, CV=FALSE,usePriorBetweenGroups=usePriorBetweenGroups,...)
  LDA$call<-call
  n<-dim(x)[1]
  p<-dim(x)[2]
  dummy<-stats::model.matrix(~ grouping)[,-1,drop=FALSE]
  colnames(dummy)<-sub(pattern="^grouping",replacement="",x=colnames(dummy))
  q<-dim(dummy)[2]

  if(!is.null(list(...)$prior) & usePriorBetweenGroups) warning("The specified prior is not taken into account when computing eigenvalues and all statistics based on them (everything in components 'eigModel' and 'sigTest' of the returned value).")
  cca<-stats::cancor(x=x,y=dummy)
  tmp<-testCC(cor=cca$cor, n=n, p=p, q=q)

  res<-c(LDA,tmp)
  class(res)<-class(LDA)

  if(pred){
    res$pred<-predict.ldaPlus(LDA)

    res$centroids<-stats::aggregate(res$pred$x,by = list(grouping),FUN=mean)
    rownames(res$centroids)<-res$centroids[,1]
    res$centroids<-res$centroids[,-1]
    tmp<-by(cbind(x,res$pred$x),INDICES=grouping,FUN=stats::cov,simplify=TRUE)
    tmp<-simplify2array(tmp)
    d1<-dim(tmp)[1]

    tmp<-sweep(tmp,MARGIN = 3,STATS = res$counts-1,FUN = "*")
    covw<-apply(tmp,MARGIN = 1:2,FUN = sum)/(res$N - length(res$counts))
    res$corr<-stats::cov2cor(covw)[-((p+1):d1),((p+1):d1),drop=FALSE]


    tab<-table(orig=grouping,pred=res$pred$class)
    res$class<-list(
      orgTab=stats::addmargins(tab,margin = 1:2),
      # % of correct predictions by groups
      perTab =  stats::addmargins(prop.table(tab,1)*100,margin=2),
      # % of correct predictions
      corPer=sum(diag(prop.table(tab)))*100
    )
  }

  if(CV){
    cv<-lda(x=x,grouping=grouping, CV=TRUE,usePriorBetweenGroups=usePriorBetweenGroups,...)
    tab<-table(orig=grouping,pred=cv$class)
    res$classCV<-list(
      orgTab=stats::addmargins(tab,margin = 1:2),
      # % of correct predictions by groups
      perTab =  stats::addmargins(prop.table(tab,1)*100,margin=2),
      # % of correct predictions
      corPer=sum(diag(prop.table(tab)))*100
    )
  }
  return(res)
}

Any scripts or data that you put into this service are public.

multiUS documentation built on Jan. 23, 2023, 5:40 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

multiUS
Functions for the Courses Multivariate Analysis and Computer Intensive Methods

R/ldaPlus.R
In multiUS: Functions for the Courses Multivariate Analysis and Computer Intensive Methods

Defines functions ldaPlus

Documented in ldaPlus

Try the multiUS package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

multiUS Functions for the Courses Multivariate Analysis and Computer Intensive Methods

R/ldaPlus.R In multiUS: Functions for the Courses Multivariate Analysis and Computer Intensive Methods

Defines functions ldaPlus

Documented in ldaPlus

Try the multiUS package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

multiUS
Functions for the Courses Multivariate Analysis and Computer Intensive Methods

R/ldaPlus.R
In multiUS: Functions for the Courses Multivariate Analysis and Computer Intensive Methods