# R/nScree.R In nFactors: Parallel Analysis and Other Non Graphical Solutions to the Cattell Scree Test

#' Non Graphical Cattel's Scree Test
#'
#' The \code{nScree} function returns an analysis of the number of component or
#' factors to retain in an exploratory principal component or factor analysis.
#' The function also returns information about the number of components/factors
#' to retain with the Kaiser rule and the parallel analysis.
#'
#' The \code{nScree} function returns an analysis of the number of
#' components/factors to retain in an exploratory principal component or factor
#' analysis. Different solutions are given. The classical ones are the Kaiser
#' rule, the parallel analysis, and the usual scree test
#' (\code{\link{plotuScree}}).  Non graphical solutions to the Cattell
#' subjective scree test are also proposed: an acceleration factor (\emph{af})
#' and the optimal coordinates index \emph{oc}. The acceleration factor
#' indicates where the elbow of the scree plot appears. It corresponds to the
#' acceleration of the curve, i.e. the second derivative.  The optimal
#' coordinates are the extrapolated coordinates of the previous eigenvalue that
#' allow the observed eigenvalue to go beyond this extrapolation. The
#' extrapolation is made by a linear regression using the last eigenvalue
#' coordinates and the \eqn{k+1} eigenvalue coordinates. There are \eqn{k-2}
#' regression lines like this.  The Kaiser rule or a parallel analysis
#' criterion (\code{\link{parallel}}) must also be simultaneously satisfied to
#' retain the components/factors, whether for the acceleration factor, or for
#' the optimal coordinates.
#'
#' If \eqn{\lambda_i} is the \eqn{i^{th}} eigenvalue, and \eqn{LS_i} is a
#' location statistics like the mean or a centile (generally the followings:
#' \eqn{1^{st}, \ 5^{th}, \ 95^{th}, \ or \ 99^{th}}).
#'
#' The Kaiser rule is computed as: \deqn{ n_{Kaiser} = \sum_{i} (\lambda_{i}
#' \ge \bar{\lambda}).} Note that \eqn{\bar{\lambda}} is equal to 1 when a
#' correlation matrix is used.
#'
#' The parallel analysis is computed as: \deqn{n_{parallel} = \sum_{i}
#' (\lambda_{i} \ge LS_i).}
#'
#' The acceleration factor (\eqn{AF}) corresponds to a numerical solution to
#' the elbow of the scree plot: \deqn{n_{AF} \equiv \ If \ \left[ (\lambda_{i}
#' \ge LS_i) \ and \ max(AF_i) \right].}
#'
#' The optimal coordinates (\eqn{OC}) corresponds to an extrapolation of the
#' preceeding eigenvalue by a regression line between the eigenvalue
#' coordinates and the last eigenvalue coordinates: \deqn{n_{OC} = \sum_i
#' \left[(\lambda_i \ge LS_i) \cap (\lambda_i \ge (\lambda_{i \ predicted})
#' \right].}
#'
#'
#' @param eig depreciated parameter (use x instead): eigenvalues to analyse
#' @param x numeric: a \code{vector} of eigenvalues, a \code{matrix} of
#' correlations or of covariances or a \code{data.frame} of data
#' @param aparallel numeric: results of a parallel analysis.  Defaults
#' eigenvalues fixed at \eqn{\lambda >= \bar{\lambda}} (Kaiser and related
#' rule) or \eqn{\lambda >= 0} (CFA analysis)
#' @param cor logical: if \code{TRUE} computes eigenvalues from a correlation
#' matrix, else from a covariance matrix
#' @param model character: \code{"components"} or \code{"factors"}
#' @param criteria numeric: by default fixed at \eqn{\bar{\lambda}}.  When the
#' \eqn{\lambda}s are computed from a principal component analysis on a
#' correlation matrix, it corresponds to the usual Kaiser \eqn{\lambda >= 1}
#' rule. On a covariance matrix or from a factor analysis, it is simply the
#' mean.  To apply \eqn{\lambda >= 0}, sometimes used with factor analysis, fix
#' the criteria to \eqn{0}.
#' @param ...  variabe: additionnal parameters to give to the \code{cor} or
#' \code{cov} functions
#'
#'
#' @return
#' \item{Components }{ Data frame for the number of components/factors
#' according to different rules } \item{Components$noc }{ Number of #' components/factors to retain according to optimal coordinates \emph{oc}} #' \item{Components$naf }{ Number of components/factors to retain according to
#' the acceleration factor \emph{af}} \item{Components$npar.analysis }{Number #' of components/factors to retain according to parallel analysis } #' \item{Components$nkaiser }{ Number of components/factors to retain according
#' to the Kaiser rule } \item{Analysis }{ Data frame of vectors linked to the
#' different rules } \item{Analysis$Eigenvalues }{ Eigenvalues } #' \item{Analysis$Prop }{ Proportion of variance accounted by eigenvalues }
#' \item{Analysis$Cumu }{ Cumulative proportion of variance accounted by #' eigenvalues } \item{Analysis$Par.Analysis }{ Centiles of the random
#' eigenvalues generated by the parallel analysis. } \item{Analysis$Pred.eig }{ #' Predicted eigenvalues by each optimal coordinate regression line } #' \item{Analysis$OC}{ Critical optimal coordinates \emph{oc}}
#' \item{Analysis$Acc.factor }{ Acceleration factor \emph{af}} #' \item{Analysis$AF}{ Critical acceleration factor \emph{af}} Otherwise,
#' returns a summary of the analysis.
#'
#' @author Gilles Raiche \cr Centre sur les Applications des Modeles de
#' Reponses aux Items (CAMRI) \cr Universite du Quebec a Montreal\cr
#' \email{raiche.gilles@@uqam.ca}
#'
#'
#' @references
#' Cattell, R. B. (1966). The scree test for the number of factors.
#' \emph{Multivariate Behavioral Research, 1}, 245-276.
#'
#' Dinno, A. (2009). \emph{Gently clarifying the application of Horn's parallel
#' analysis to principal component analysis versus factor analysis}.  Portland,
#' Oregon: Portland Sate University.
#'
#' Guttman, L. (1954). Some necessary conditions for common factor analysis.
#' \emph{Psychometrika, 19, 149-162}.
#'
#' Horn, J. L. (1965). A rationale for the number of factors in factor
#' analysis.  \emph{Psychometrika, 30}, 179-185.
#'
#' Kaiser, H. F. (1960). The application of electronic computer to factor
#' analysis.  \emph{Educational and Psychological Measurement, 20}, 141-151.
#'
#' Raiche, G., Walls, T. A., Magis, D., Riopel, M. and Blais, J.-G. (2013). Non-graphical solutions
#' for Cattell's scree test. Methodology, 9(1), 23-29.
#'
# #' @family nScree
#' @export
#' @importFrom stats lm coef
#' @keywords multivariate
#' @examples
#'
#' ## INITIALISATION
#'  data(dFactors)                      # Load the nFactors dataset
#'  attach(dFactors)
#'  vect         <- Raiche              # Uses the example from Raiche
#'  eigenvalues  <- vect$eigenvalues # Extracts the observed eigenvalues #' nsubjects <- vect$nsubjects      # Extracts the number of subjects
#'  variables    <- length(eigenvalues) # Computes the number of variables
#'  rep          <- 100                 # Number of replications for PA analysis
#'  cent         <- 0.95                # Centile value of PA analysis
#'
#' ## PARALLEL ANALYSIS (qevpea for the centile criterion, mevpea for the
#' ## mean criterion)
#'  aparallel    <- parallel(var     = variables,
#'                           subject = nsubjects,
#'                           rep     = rep,
#'                           cent    = cent
#'                           )$eigen$qevpea  # The 95 centile
#'
#' ## NUMBER OF FACTORS RETAINED ACCORDING TO DIFFERENT RULES
#'  results      <- nScree(x=eigenvalues, aparallel=aparallel)
#'  results
#'  summary(results)
#'
#' ## PLOT ACCORDING TO THE nScree CLASS
#'  plotnScree(results)
#'
"nScree" <-
function(eig=NULL, x=eig, aparallel = NULL, cor=TRUE, model="components", criteria=NULL, ...) {
# Initialisation
eig        <- eigenComputes(x, cor=cor, model=model, ...)
if (is.null(aparallel)) aparallel <- rep(1,length(eig))  # default to 1 in the diagonal
nk         <- length(eig)
k          <- 1:nk
proportion <- eig/sum(eig)
cumulative <- proportion
if (is.null(criteria)) criteria <- mean(eig)
for (i in 2:nk) cumulative[i] = cumulative[i-1] + proportion[i]
proportion[proportion < 0] <- 0# To constraint negative proportions to be zero
cond1      <- TRUE; cond2 <- TRUE; i <- 0; pred.eig <- af <- rep(NA,nk)
while ((cond1 == TRUE) && (cond2 == TRUE) && (i < nk)) {
i           <- i + 1
ind         <- k[c(i+1,nk)]
#### Optimal coordinate based on the next eigenvalue regression (scree)
vp.p        <- lm(eig[c(i+1,nk)] ~ ind)
vp.prec     <- pred.eig[i] <- sum(c(1,i)* coef(vp.p))
cond1       <- (eig[i] >=  vp.prec)
cond2       <- (eig[i] >= aparallel[i])
nc          <- i-1
}
# Second derivative at the i eigenvalue (acceleration factor, elbow)
# See Yakowitz and Szidarovszky (1986, p. 84)
tag       <- 1
for (j in 2:(nk-1)) {
if (eig[j-1] >= aparallel[j-1]) {
af[j]   <- (eig[j+1] -2* eig[j]) + eig[j-1]
}
}

if (model == "components") p.vec <- which(eig >= aparallel,TRUE) else p.vec <- which((eig-aparallel)>=0 & eig >= criteria)
###if (model == "components") p.vec <- which(eig >= aparallel,TRUE) else p.vec <- which((eig-aparallel)>=0 & eig > 0)
npar      <- sum(p.vec == (1:length(p.vec)))
nkaiser <- sum(eig >= rep(criteria,nk))
#### if (model == "components") nkaiser <- sum(eig >= rep(criteria,nk)) else nkaiser <- sum(eig >= rep(0,nk))
#### if (model == "components") nkaiser <- sum(eig >= rep(1,nk))        else nkaiser <- sum(eig >= rep(mean(eig),nk))
naf       <- which(af == max(af,na.rm=TRUE),TRUE) - 1

# Assure that all the optimal coordinates will be computed
for (i in (nc+1):(nk-2)) {
ind        <- k[c(i+1,nk)]
vp.p       <- lm(eig[c(i+1,nk)] ~ ind)
vp.prec    <- pred.eig[i] <- sum(c(1,i)* coef(vp.p))
}

# Assure that all the acceleration factors will be computed
for (j in 2:(nk-1)) af[j] <- (eig[j+1] - 2 * eig[j]) + eig[j-1]

# Return values by the function
coc        <- rep("",nk); coc[nc]  = "(< OC)"
caf        <- rep("",nk); caf[naf] = "(< AF)"
result     <- (list(Components = data.frame(noc          = nc,
naf          = naf,
nparallel    = npar,
nkaiser      = nkaiser),
Analysis   = data.frame(Eigenvalues  = eig,
Prop         = proportion,
Cumu         = cumulative,
Par.Analysis = aparallel,
Pred.eig     = pred.eig,
OC           = coc,
Acc.factor   = af,
AF           = caf),
Model      = model))

class(result) <- 'nScree'
return(result)
}


## Try the nFactors package in your browser

Any scripts or data that you put into this service are public.

nFactors documentation built on Oct. 10, 2022, 5:07 p.m.