Nothing
#' Principal Component Analysis
#'
#' @description
#' \code{pca} is used to build and explore a principal component analysis (PCA) model.
#'
#' @param x
#' calibration data (matrix or data frame).
#' @param ncomp
#' maximum number of components to calculate.
#' @param center
#' logical, do mean centering of data or not.
#' @param scale
#' logical, do standardization of data or not.
#' @param exclrows
#' rows to be excluded from calculations (numbers, names or vector with logical values)
#' @param exclcols
#' columns to be excluded from calculations (numbers, names or vector with logical values)
#' @param x.test
#' test data (matrix or data frame).
#' @param method
#' method to compute principal components ("svd", "nipals").
#' @param rand
#' vector with parameters for randomized PCA methods (if NULL, conventional PCA is used instead)
#' @param lim.type
#' which method to use for calculation of critical limits for residual distances (see details)
#' @param alpha
#' significance level for extreme limits for T2 and Q disances.
#' @param gamma
#' significance level for outlier limits for T2 and Q distances.
#' @param info
#' a short text with model description.
#'
#' @details
#'
#' Note, that from v. 0.10.0 cross-validation is no more supported in PCA.
#'
#' If number of components is not specified, a minimum of number of objects - 1 and number of
#' variables in calibration set is used. One can also specified an optimal number of component,
#' once model is calibrated (\code{ncomp.selected}). The optimal number of components is used to
#' build a residuals distance plot, as well as for SIMCA classification.
#'
#' If some of rows of calibration set should be excluded from calculations (e.g. because they are
#' outliers) you can provide row numbers, names, or logical values as parameter \code{exclrows}. In
#' this case they will be completely ignored we model is calibrated. However, score and residuls
#' distances will be computed for these rows as well and then hidden. You can show them
#' on corresponding plots by using parameter \code{show.excluded = TRUE}.
#'
#' It is also possible to exclude selected columns from calculations by provideing parameter
#' \code{exclcols} in form of column numbers, names or logical values. In this case loading matrix
#' will have zeros for these columns. This allows to compute PCA models for selected variables
#' without removing them physically from a dataset.
#'
#' Take into account that if you see other packages to make plots (e.g. ggplot2) you will
#' not be able to distinguish between hidden and normal objects.
#'
#' By default loadings are computed for the original dataset using either SVD or NIPALS algorithm.
#' However, for datasets with large number of rows (e.g. hyperspectral images), there is a
#' possibility to run algorithms based on random permutations [1, 2]. In this case you have
#' to define parameter \code{rand} as a vector with two values: \code{p} - oversampling parameter
#' and \code{k} - number of iterations. Usually \code{rand = c(15, 0)} or \code{rand = c(5, 1)}
#' are good options, which give quite almost precise solution but much faster.
#'
#' There are several ways to calculate critical limits for orthogonal (Q, q) and score (T2, h)
#' distances. In \code{mdatools} you can specify one of the following methods via parameter
#' \code{lim.type}: \code{"jm"} Jackson-Mudholkar approach [3], \code{"chisq"} - method based on
#' chi-square distribution [4], \code{"ddmoments"} and \code{"ddrobust"} - related to data
#' driven method proposed in [5]. The \code{"ddmoments"} is based on method of moments for
#' estimation of distribution parameters (also known as "classical" approach) while
#' \code{"ddrobust"} is based in robust estimation.
#'
#' If \code{lim.type="chisq"} or \code{lim.type="jm"} is used, only limits for Q-distances are
#' computed based on corresponding approach, limits for T2-distances are computed using
#' Hotelling's T-squared distribution. The methods utilizing the data driven approach calculate
#' limits for combination of the distances bases on chi-square distribution and parameters
#' estimated from the calibration data.
#'
#' The critical limits are calculated for a significance level defined by parameter \code{'alpha'}.
#' You can also specify another parameter, \code{'gamma'}, which is used to calculate acceptance
#' limit for outliers (shown as dashed line on residual distance plot).
#'
#' You can also recalculate the limits for existent model by using different values for alpha and
#' gamme, without recomputing the model itself. In this case use the following code (it is assumed
#' that you current PCA/SIMCA model is stored in variable \code{m}):
#' \code{m = setDistanceLimits(m, lim.type, alpha, gamma)}.
#'
#' In case of PCA the critical limits are just shown on residual plot as lines and can be used for
#' detection of extreme objects (solid line) and outliers (dashed line). When PCA model is used for
#' classification in SIMCA (see \code{\link{simca}}) the limits are also employed for
#' classification of objects.
#'
#' @return
#' Returns an object of \code{pca} class with following fields:
#' \item{ncomp }{number of components included to the model.}
#' \item{ncomp.selected }{selected (optimal) number of components.}
#' \item{loadings }{matrix with loading values (nvar x ncomp).}
#' \item{eigenvals }{vector with eigenvalues for all existent components.}
#' \item{expvar }{vector with explained variance for each component (in percent).}
#' \item{cumexpvar }{vector with cumulative explained variance for each component (in percent).}
#' \item{T2lim }{statistical limit for T2 distance.}
#' \item{Qlim }{statistical limit for Q residuals.}
#' \item{info }{information about the model, provided by user when build the model.}
#' \item{calres }{an object of class \code{\link{pcares}} with PCA results for a calibration data.}
#' \item{testres }{an object of class \code{\link{pcares}} with PCA results for a test data, if it
#' was provided.}
#'
#' More details and examples can be found in the Bookdown tutorial.
#'
#' @references
#' 1. N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: probabilistic
#' algorithms for constructing approximate matrix decompositions. SIAM Review, 53 (2010) pp.
#' 217-288.
#'
#' 2. S. Kucheryavskiy, Blessing of randomness against the curse of dimensionality,
#' Journal of Chemometrics, 32 (2018).
#'
#' 3. J.E. Jackson, A User's Guide to Principal Components, John Wiley & Sons, New York, NY (1991).
#'
#' 4. A.L. Pomerantsev, Acceptance areas for multivariate classification derived by projection
#' methods, Journal of Chemometrics, 22 (2008) pp. 601-609.
#'
#' 5. A.L. Pomerantsev, O.Ye. Rodionova, Concept and role of extreme objects in PCA/SIMCA,
#' Journal of Chemometrics, 28 (2014) pp. 429-438.
#'
#' @author
#' Sergey Kucheryavskiy (svkucheryavski@@gmail.com)
#'
#' @seealso
#' Methods for \code{pca} objects:
#' \tabular{ll}{
#' \code{plot.pca} \tab makes an overview of PCA model with four plots.\cr
#' \code{summary.pca} \tab shows some statistics for the model.\cr
#' \code{categorize.pca} \tab categorize data rows as "normal", "extreme" or "outliers".\cr
#' \code{\link{selectCompNum.pca}} \tab set number of optimal components in the model\cr
#' \code{\link{setDistanceLimits.pca}} \tab set critical limits for residuals\cr
#' \code{\link{predict.pca}} \tab applies PCA model to a new data.\cr
#' }
#'
#' Plotting methods for \code{pca} objects:
#' \tabular{ll}{
#' \code{\link{plotScores.pca}} \tab shows scores plot.\cr
#' \code{\link{plotLoadings.pca}} \tab shows loadings plot.\cr
#' \code{\link{plotVariance.pca}} \tab shows explained variance plot.\cr
#' \code{\link{plotCumVariance.pca}} \tab shows cumulative explained variance plot.\cr
#' \code{\link{plotResiduals.pca}} \tab shows plot for residual distances (Q vs. T2).\cr
#' \code{\link{plotBiplot.pca}} \tab shows bi-plot.\cr
#' \code{\link{plotExtreme.pca}} \tab shows extreme plot.\cr
#' \code{\link{plotT2DoF}} \tab plot with degrees of freedom for score distance.\cr
#' \code{\link{plotQDoF}} \tab plot with degrees of freedom for orthogonal distance.\cr
#' \code{\link{plotDistDoF}} \tab plot with degrees of freedom for both distances.\cr
#' }
#'
#' Most of the methods for plotting data are also available for PCA results (\code{\link{pcares}})
#' objects. Also check \code{\link{pca.mvreplace}}, which replaces missing values in a data matrix
#' with approximated using iterative PCA decomposition.
#'
#' @examples
#' library(mdatools)
#' ### Examples for PCA class
#'
#' ## 1. Make PCA model for People data with autoscaling
#'
#' data(people)
#' model = pca(people, scale = TRUE, info = "Simple PCA model")
#' model = selectCompNum(model, 4)
#' summary(model)
#' plot(model, show.labels = TRUE)
#'
#' ## 2. Show scores and loadings plots for the model
#'
#' par(mfrow = c(2, 2))
#' plotScores(model, comp = c(1, 3), show.labels = TRUE)
#' plotScores(model, comp = 2, type = "h", show.labels = TRUE)
#' plotLoadings(model, comp = c(1, 3), show.labels = TRUE)
#' plotLoadings(model, comp = c(1, 2), type = "h", show.labels = TRUE)
#' par(mfrow = c(1, 1))
#'
#' ## 3. Show residual distance and variance plots for the model
#' par(mfrow = c(2, 2))
#' plotVariance(model, type = "h")
#' plotCumVariance(model, show.labels = TRUE, legend.position = "bottomright")
#' plotResiduals(model, show.labels = TRUE)
#' plotResiduals(model, ncomp = 2, show.labels = TRUE)
#' par(mfrow = c(1, 1))
#'
#' @export
pca <- function(x, ncomp = min(nrow(x) - 1, ncol(x), 20), center = TRUE, scale = FALSE,
exclrows = NULL, exclcols = NULL, x.test = NULL, method = "svd", rand = NULL,
lim.type = "ddmoments", alpha = 0.05, gamma = 0.01, info = "") {
# check calibration data and process excluded rows and columns
x <- prepCalData(x, exclrows = exclrows, exclcols = exclcols, min.nrows = 2, min.ncols = 2)
# calibrate model and set distance limits
model <- pca.cal(x, ncomp, center = center, scale = scale, method = method, rand = rand)
model$info <- info
model$call <- match.call()
# apply the model to calibration set
model$res <- list()
model$res[["cal"]] <- predict.pca(model, x)
model$calres <- model$res[["cal"]]
# compute critical limit parameters
model$limParams <- list(
"Q" = ldecomp.getLimParams(model$res[["cal"]]$Q),
"T2" = ldecomp.getLimParams(model$res[["cal"]]$T2)
)
# apply model to test set if provided
if (!is.null(x.test)) {
model$res[["test"]] <- predict.pca(model, x.test)
model$testres <- model$res[["test"]]
}
# set distance limits
model <- setDistanceLimits(model, lim.type = lim.type, alpha = alpha, gamma = gamma)
class(model) <- c("pca")
return(model)
}
#' Select optimal number of components for PCA model
#'
#' @description
#' Allows user to select optimal number of components for a PCA model
#'
#' @param obj
#' PCA model (object of class \code{pca})
#' @param ncomp
#' number of components to select
#' @param ...
#' other parameters if any
#'
#' @return
#' the same model with selected number of components
#'
#' @export
selectCompNum.pca <- function(obj, ncomp, ...) {
if (ncomp < 1 || ncomp > obj$ncomp) {
stop("Wrong number of selected components.")
}
obj$ncomp.selected <- ncomp
obj$res[["cal"]]$ncomp.selected <- ncomp
obj$calres <- obj$res[["cal"]]
if (!is.null(obj$res$test)) {
obj$res[["test"]]$ncomp.selected <- ncomp
obj$testres <- obj$res[["test"]]
}
return(obj)
}
#' Compute and set statistical limits for Q and T2 residual distances.
#'
#' @description
#' Computes statisticsl limits for orthogonal and score distances (based on calibration set)
#' and assign the calculated values as model properties.
#'
#' @param obj
#' object with PCA model
#' @param lim.type
#' type of limits ("jm", "chisq", "ddmoments", "ddrobust")
#' @param alpha
#' significance level for detection of extreme objects
#' @param gamma
#' significance level for detection of outliers (for data driven approach)
#' @param ...
#' other arguments
#'
#' @details
#'
#' The limits can be accessed as fields of model objects: \code{$Qlim} and \code{$T2lim}. Each
#' is a matrix with four rows and \code{ncomp} columns. First row contains critical limits for
#' extremes, second row - for outliers, third row contains mean value for corresponding distance
#' (or its robust estimate in case of \code{lim.type = "ddrobust"}) and last row contains the
#' degrees of freedom.
#'
#' @return
#' Object models with the two fields updated.
#'
#' @export
setDistanceLimits.pca <- function(obj, lim.type = obj$lim.type, alpha = obj$alpha,
gamma = obj$gamma, ...) {
obj$T2lim <- ldecomp.getT2Limits(lim.type, alpha, gamma, obj$limParams)
obj$Qlim <- ldecomp.getQLimits(lim.type, alpha, gamma, obj$limParams,
obj$res[["cal"]]$residuals, obj$eigenvals)
obj$alpha <- alpha
obj$gamma <- gamma
obj$lim.type <- lim.type
attr(obj$res[["cal"]]$Q, "u0") <- obj$Qlim[3, ]
attr(obj$res[["cal"]]$Q, "Nu") <- obj$Qlim[4, ]
attr(obj$res[["cal"]]$T2, "u0") <- obj$T2lim[3, ]
attr(obj$res[["cal"]]$T2, "Nu") <- obj$T2lim[4, ]
obj$calres <- obj$res[["cal"]]
if (!is.null(obj$res$test)) {
attr(obj$res[["test"]]$Q, "u0") <- obj$Qlim[3, ]
attr(obj$res[["test"]]$Q, "Nu") <- obj$Qlim[4, ]
attr(obj$res[["test"]]$T2, "u0") <- obj$T2lim[3, ]
attr(obj$res[["test"]]$T2, "Nu") <- obj$T2lim[4, ]
obj$testres <- obj$res[["test"]]
}
return(obj)
}
#' Probabilities for residual distances
#'
#' @details
#' Computes p-value for every object being from the same populaion as calibration set
#' based on its orthogonal and score distances.
#'
#' @param obj
#' object with PCA model
#' @param ncomp
#' number of components to compute the probability for
#' @param q
#' vector with squared orthogonal distances for given number of components
#' @param h
#' vector with score distances for given number of components
#' @param ...
#' other parameters
#'
#' @export
getProbabilities.pca <- function(obj, ncomp, q, h, ...) {
# if chisq / hotelling
if (obj$lim.type == "chisq") {
nobj <- obj$T2lim[4, ncomp]
return(pmax(chisq.prob(q, obj$Qlim[3:4, ncomp]), hotelling.prob(h, ncomp, nobj)))
}
if (obj$lim.type == "jm") {
nobj <- obj$T2lim[4, ncomp]
return(pmax(jm.prob(q, attr(obj$Qlim, "eigenvals"), ncomp), hotelling.prob(h, ncomp, nobj)))
}
# if data driven
h0 <- obj$T2lim[3, ncomp]
Nh <- round(obj$T2lim[4, ncomp])
q0 <- obj$Qlim[3, ncomp]
Nq <- round(obj$Qlim[4, ncomp])
f <- Nh * h / h0 + Nq * q / q0
return(pchisq(f, Nh + Nq))
}
#' Returns matrix with original calibration data
#'
#' @param obj
#' object with PCA model
#'
#' @export
getCalibrationData.pca <- function(obj) {
x <- obj$res[["cal"]]$scores %*% t(obj$loadings) + obj$res[["cal"]]$residuals
if (!is.logical(obj$scale) && length(obj$scale) == ncol(x)) {
x <- sweep(x, 2, obj$scale, "*")
}
if (!is.logical(obj$center) && length(obj$center) == ncol(x)) {
x <- sweep(x, 2, obj$center, "+")
}
return(x)
}
#' Categorize PCA results based on orthogonal and score distances.
#'
#' @description
#' The method compares score and orthogonal distances of PCA results from \code{res} with
#' critical limits computed for the PCA model and categorizes the corresponding objects as
#' "regular", "extreme" or "outlier".
#'
#' @param obj
#' object with PCA model
#' @param res
#' object with PCA results
#' @param ncomp
#' number of components to use for the categorization
#' @param ...
#' other parameters
#'
#' @details
#' The method does not categorize hidden values if any.
#'
#' @return
#' vector (factor) with results of categorization.
#'
#' @export
categorize.pca <- function(obj, res = obj$res$cal, ncomp = obj$ncomp.selected, ...) {
create_categories <- function(nobj, extremes_ind, outliers_ind) {
categories <- rep(1, nobj)
categories[extremes_ind] <- 2
categories[outliers_ind] <- 3
return(factor(categories, levels = 1:3, labels = c("regular", "extreme", "outlier")))
}
# get distance values for selected number of components
h <- res$T2[, ncomp]
q <- res$Q[, ncomp]
# remove excluded values if any
rows_excluded <- attr(res$T2, "exclrows")
if (length(rows_excluded) > 0) {
h <- h[-rows_excluded]
q <- q[-rows_excluded]
}
nobj <- length(h)
# if chisq / hotelling
if (obj$lim.type %in% c("jm", "chisq")) {
outliers_ind <- (h >= obj$T2lim[2, ncomp] | q >= obj$Qlim[2, ncomp])
extremes_ind <- (h >= obj$T2lim[1, ncomp] | q >= obj$Qlim[1, ncomp])
return(create_categories(nobj, extremes_ind, outliers_ind))
}
# if data driven
h0 <- obj$T2lim[3, ncomp]
Nh <- obj$T2lim[4, ncomp]
q0 <- obj$Qlim[3, ncomp]
Nq <- obj$Qlim[4, ncomp]
f <- Nh * h / h0 + Nq * q / q0
outliers_ind <- f > (obj$T2lim[2, ncomp] * Nh / h0)
extremes_ind <- f > (obj$T2lim[1, ncomp] * Nh / h0)
return(create_categories(nobj, extremes_ind, outliers_ind))
}
#' PCA predictions
#'
#' @description
#' Applies PCA model to a new data set.
#'
#' @param object
#' a PCA model (object of class \code{pca}).
#' @param x
#' data values (matrix or data frame).
#' @param ...
#' other arguments.
#'
#' @return
#' PCA results (an object of class \code{pcares})
#'
#' @export
predict.pca <- function(object, x, ...) {
# check datasets and convert to matrix if needed
attrs <- attributes(x)
rownames <- rownames(x)
x <- prepCalData(x, min.nrows = 1, min.ncols = nrow(object$loadings) - length(attrs$exclcols))
if (ncol(x) != nrow(object$loadings)) {
stop("Number and type of data columns should be the same as in calibration dataset.")
}
# compute scores and residuals
x <- prep.autoscale(x, center = object$center, scale = object$scale)
scores <- x %*% object$loadings
residuals <- x - tcrossprod(scores, object$loadings)
# set names
rownames(scores) <- rownames(residuals) <- rownames
colnames(scores) <- colnames(object$loadings)
colnames(residuals) <- attrs$dimnames[[2]]
# set attributes
scores <- mda.setattr(scores, attrs, "row")
residuals <- mda.setattr(residuals, attrs)
attr(scores, "name") <- "Scores"
attr(scores, "xaxis.name") <- "Components"
attr(residuals, "name") <- "Residuals"
if (is.null(attrs$yaxis.name)) {
attr(scores, "yaxis.name") <- "Objects"
}
# create and return the results object
res <- pcares(scores, object$loadings, residuals, object$eigenvals, object$ncomp.selected)
attr(res$Q, "u0") <- object$Qlim[3, ]
attr(res$Q, "Nu") <- object$Qlim[4, ]
attr(res$T2, "u0") <- object$T2lim[3, ]
attr(res$T2, "Nu") <- object$T2lim[4, ]
return(res)
}
#' Print method for PCA model object
#'
#' @description
#' Prints information about the object structure
#'
#' @param x
#' a PCA model (object of class \code{pca})
#' @param ...
#' other arguments
#'
#' @export
print.pca <- function(x, ...) {
cat("\nPCA model (class pca)\n")
if (length(x$info) > 1) {
cat("\nInfo:\n")
cat(x$info)
}
cat("\n\nCall:\n")
print(x$call)
cat("\nMajor model fields:\n")
cat("$loadings - matrix with loadings\n")
cat("$eigenvals - eigenvalues for components\n")
cat("$ncomp - number of calculated components\n")
cat("$ncomp.selected - number of selected components\n")
cat("$center - values for centering data\n")
cat("$scale - values for scaling data\n")
cat("$alpha - significance level for critical limits\n")
cat("$gamma - significance level for outlier limits\n")
cat("$Qlim - critical values and parameters for orthogonal distances\n")
cat("$T2lim - critical values and parameters for score distances\n")
cat("$res - list with model results (calibration, test)\n")
}
#' Summary method for PCA model object
#'
#' @description
#' Shows some statistics (explained variance, eigenvalues) for the model.
#'
#' @param object
#' a PCA model (object of class \code{pca})
#' @param ...
#' other arguments
#'
#' @export
summary.pca <- function(object, ...) {
cat("\nSummary for PCA model (class pca)\n")
if (length(object$info) > 1) {
fprintf("\nInfo:\n%s\n", object$info)
}
if (!is.null(object$rand)) {
fprintf("\nParameters for randomized algorithm: q = %d, p = %d\n",
object$rand[1], object$rand[2])
}
if (length(object$exclrows) > 0) {
fprintf("Excluded rows: %d\n", length(object$exclrows))
}
if (length(object$exclcols) > 0) {
fprintf("Excluded coumns: %d\n", length(object$exclcols))
}
fprintf("Type of limits: %s\n", object$lim.type)
fprintf("Alpha: %s\n", object$alpha)
fprintf("Gamma: %s\n", object$gamma)
cat("\n")
data <- cbind(
round(object$eigenvals, 3),
round(object$calres$expvar, 2),
round(object$calres$cumexpvar, 2),
object$Qlim[4, ],
object$T2lim[4, ]
)
colnames(data) <- c("Eigenvals", "Expvar", "Cumexpvar", "Nq", "Nh")
rownames(data) <- colnames(object$loadings)
show(data)
}
################################
# Static methods #
################################
#' Replace missing values in data
#'
#' @description
#' \code{pca.mvreplace} is used to replace missing values in a data matrix with
#' approximated by iterative PCA decomposition.
#'
#' @param x
#' a matrix with data, containing missing values.
#' @param center
#' logical, do centering of data values or not.
#' @param scale
#' logical, do standardization of data values or not.
#' @param maxncomp
#' maximum number of components in PCA model.
#' @param expvarlim
#' minimum amount of variance, explained by chosen components (used for selection of optimal number
#' of components in PCA models).
#' @param covlim
#' convergence criterion.
#' @param maxiter
#' maximum number of iterations if convergence criterion is not met.
#'
#' @details
#' The function uses iterative PCA modeling of the data to approximate and impute missing values.
#' The result is most optimal for data sets with low or moderate level of noise and with number of
#' missing values less than 10\% for small dataset and up to 20\% for large data.
#'
#' @return
#' Returns the same matrix \code{x} where missing values are replaced with approximated.
#'
#' @references
#' Philip R.C. Nelson, Paul A. Taylor, John F. MacGregor. Missing data methods in PCA and PLS:
#' Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory
#' Systems, 35 (1), 1996.
#'
#' @author
#' Sergey Kucheryavskiy (svkucheryavski@@gmail.com)
#'
#' @examples
#' library(mdatools)
#'
#' ## A very simple example of imputing missing values in a data with no noise
#'
#' # generate a matrix with values
#' s = 1:6
#' odata = cbind(s, 2*s, 4*s)
#'
#' # make a matrix with missing values
#' mdata = odata
#' mdata[5, 2] = mdata[2, 3] = NA
#'
#' # replace missing values with approximated
#' rdata = pca.mvreplace(mdata, scale = TRUE)
#'
#' # show all matrices together
#' show(cbind(odata, mdata, round(rdata, 2)))
#'
#' @export
pca.mvreplace <- function(x, center = TRUE, scale = FALSE, maxncomp = 10, expvarlim = 0.95,
covlim = 10^-6, maxiter = 100) {
# save original values and indices of missing ones
xo <- x
mvidx <- is.na(x)
# check if any column has more than 20% values
nmv_cols <- colSums(mvidx) / nrow(x)
if (any(nmv_cols > 0.2)) {
stop("Some of columns have more than 20% missing values.")
}
# make initial estimates with mean values for each column
col_means <- matrix(apply(xo, 2, mean, na.rm = TRUE), byrow = TRUE, nrow(x), ncol(x))
xo[mvidx] <- col_means[mvidx]
# autoscale
xo <- scale(xo, center = center, scale = scale)
if (scale) {
gsd <- attr(xo, "scaled:scale")
}
if (center) {
gmean <- attr(xo, "scaled:center")
}
n <- 1
scoresp <- 0
scores <- 1
cond <- 1
maxncomp <- min(maxncomp, nrow(x) - 1, ncol(x))
x <- xo
while (cond > covlim && n < maxiter) {
# recenter data on every iteration
x <- scale(x, center = TRUE, scale = FALSE)
lmean <- attr(x, "scaled:center")
# compute PCA decomposition annd find number of components
res <- pca.svd(x, maxncomp)
expvar <- cumsum(res$eigenvals / sum(res$eigenvals))
ncomp <- min(which(expvar >= expvarlim), maxncomp)
# correct number of components for border cases
if (ncomp == length(expvar)) ncomp <- ncomp - 1
if (ncomp == 0) ncomp <- 1
# get and trancate scores and loadings and reestimate the values
scoresp <- scores
loadings <- res$loadings[, seq_len(ncomp)]
scores <- x %*% loadings
x_new <- tcrossprod(scores, loadings)
# remove centering
x_new <- sweep(x_new, 2L, lmean, "+", check.margin = FALSE)
# replace missing values by the calculated
x <- xo
x[mvidx] <- x_new[mvidx]
if (n > 2) {
# calculate difference between scores for convergence
ncompcond <- min(ncol(scores), ncol(scoresp))
cond <- sum((scores[, seq_len(ncompcond)] - scoresp[, seq_len(ncompcond)])^2)
}
n <- n + 1
}
# rescale the data back and return
if (scale) {
x <- sweep(x, 2L, gsd, "*", check.margin = FALSE)
}
if (center) {
x <- sweep(x, 2L, gmean, "+", check.margin = FALSE)
}
attr(x, "scaled:center") <- NULL
attr(x, "scaled:scale") <- NULL
return(x)
}
#' Low-dimensional approximation of data matrix X
#'
#' @param X
#' data matrix
#' @param k
#' rank of X (number of components)
#' @param rand
#' a vector with two values - number of iterations (q) and oversmapling parameter (p)
#' @param dist
#' distribution for generating random numbers, 'unif' or 'norm'
#'
#' @import stats
pca.getB <- function(X, k = NULL, rand = NULL, dist = "unif") {
if (is.null(rand)) {
return(X)
}
ncols <- ncol(X)
q <- rand[1]
p <- rand[2]
k <- if (is.null(k)) ncols else 2 * k
l <- k + p
Y <- if (dist == "unif")
X %*% matrix(runif(ncols * l, -1, 1), ncols, l)
else
X %*% matrix(rnorm(ncols * l), ncols, l)
Q <- qr.Q(qr(Y))
if (q > 0) {
for (i in seq_len(q)) {
Y <- crossprod(X, Q)
Q <- qr.Q(qr(Y))
Y <- X %*% Q
Q <- qr.Q(qr(Y))
}
}
return(crossprod(Q, X))
}
#' Singular Values Decomposition based PCA algorithm
#'
#' @description
#' Computes principal component space using Singular Values Decomposition
#'
#' @param x
#' a matrix with data values (preprocessed)
#' @param ncomp
#' number of components to calculate
#'
#' @return
#' a list with scores, loadings and eigenvalues for the components
#'
pca.svd <- function(x, ncomp = min(ncol(x), nrow(x) - 1)) {
s <- svd(x, nu = ncomp, nv = ncomp)
lambda <- s$d[seq_len(ncomp)]
return(
list(
loadings = s$v,
scores = s$u %*% diag(lambda, ncomp, ncomp),
eigenvals = lambda^2 / (nrow(x) - 1),
ncomp = ncomp
)
)
}
#' NIPALS based PCA algorithm
#'
#' @description
#' Calculates principal component space using non-linear iterative partial least squares algorithm
#' (NIPALS)
#'
#' @param x
#' a matrix with data values (preprocessed)
#' @param ncomp
#' number of components to calculate
#' @param tol
#' tolerance (if difference in eigenvalues is smaller - convergence achieved)
#'
#' @return
#' a list with scores, loadings and eigenvalues for the components
#'
#' @references
#' Geladi, Paul; Kowalski, Bruce (1986), "Partial Least Squares
#' Regression:A Tutorial", Analytica Chimica Acta 185: 1-17
#'
pca.nipals <- function(x, ncomp = min(ncol(x), nrow(x) - 1), tol = 10^-10) {
nobj <- nrow(x)
nvar <- ncol(x)
if (ncomp < 1 || ncomp > min(nobj - 1, nvar)) {
stop("Wrong number of components")
}
scores <- matrix(0, nrow = nobj, ncol = ncomp)
loadings <- matrix(0, nrow = nvar, ncol = ncomp)
E <- x
for (i in seq_len(ncomp)) {
ind <- which.max(apply(E, 2, sd))
t <- E[, ind, drop = FALSE]
tau <- th <- 99999999
while (th > tol * tau) {
p <- crossprod(E, t) / as.vector(crossprod(t))
p <- p / as.vector(crossprod(p)) ^ 0.5
t <- (E %*% p) / as.vector(crossprod(p))
th <- abs(tau - as.vector(crossprod(t)))
tau <- as.vector(crossprod(t))
}
E <- E - tcrossprod(t, p)
scores[, i] <- t
loadings[, i] <- p
}
return(
list(
loadings = loadings,
scores = scores,
eigenvals = colSums(scores^2) / (nobj - 1)
)
)
}
#' Runs one of the selected PCA methods
#'
#' @param x
#' data matrix
#' @param ncomp
#' number of components
#' @param method
#' name of PCA methods ('svd', 'nipals')
#' @param rand
#' parameters for randomized algorithm (if not NULL)
#'
#' @export
pca.run <- function(x, ncomp, method, rand = NULL) {
# define which function to use depending on method name
f <- switch(
method,
"svd" = pca.svd,
"nipals" = pca.nipals,
stop("Wrong value for PCA method!")
)
# use proper function to compute scores, loadings and eigenvalues
res <- f(pca.getB(x, k = ncomp, rand = rand), ncomp)
# recompute scores and eigenvalues if randomized algorithm was used
if (!is.null(rand)) {
res$scores <- x %*% res$loadings
res$eigenvals <- colSums(res$scores^2) / (nrow(x) - 1)
}
# compute and add residuals
res$residuals <- x - tcrossprod(res$scores, res$loadings)
return(res)
}
#' PCA model calibration
#'
#' @description
#' Calibrates (builds) a PCA model for given data and parameters
#'
#' @param x
#' matrix with data values
#' @param ncomp
#' number of principal components to calculate
#' @param center
#' logical, do mean centering or not
#' @param scale
#' logical, do standardization or not
#' @param method
#' algorithm for compiting PC space (only 'svd' and 'nipals' are supported so far)
#' @param rand
#' vector with parameters for randomized PCA methods (if NULL, conventional PCA is used instead)
#'
#' @return
#' an object with calibrated PCA model
#'
pca.cal <- function(x, ncomp, center, scale, method, rand = NULL) {
# check if data has missing values
if (any(is.na(x))) {
stop("Data has missing values, try to fix this using pca.mvreplace.")
}
# prepare empty list for model object
model <- list()
# save data attributes
attrs <- mda.getattr(x)
# convert data to a matrix
x <- mda.df2mat(x)
nrows_full <- nrow(x)
ncols_full <- ncol(x)
# correct maximum number of components
ncols <- ncols_full - length(attrs$exclcols)
nrows <- nrows_full - length(attrs$exclrows)
# make sure that ncomp is correct
ncomp <- min(ncomp, ncols, nrows - 1)
# prepare data for model calibration and cross-validation
x_cal <- mda.purgeRows(x)
# autoscale and save the mean and std values for predictions
x_cal <- prep.autoscale(x_cal, center = center, scale = scale)
model$center <- attr(x_cal, "prep:center")
model$scale <- attr(x_cal, "prep:scale")
# remove excluded columns
x_cal <- mda.purgeCols(x_cal)
# compute loadings, scores and eigenvalues for data without excluded elements
res <- pca.run(x_cal, ncomp, method, rand)
# correct loadings for missing columns in x
# corresponding rows in loadings will be set to 0 and excluded
if (length(attrs$exclcols) > 0) {
loadings <- matrix(0, nrow = ncols_full, ncol = ncomp)
loadings[-attrs$exclcols, ] <- res$loadings
loadings <- mda.exclrows(loadings, attrs$exclcols)
} else {
loadings <- res$loadings
}
if (is.null(dim(loadings))) {
loadings <- matrix(loadings, ncol = ncomp)
}
if (is.null(attrs$xaxis.name)) {
attrs$xaxis.name <- "Variables"
}
# set names and attributes for the loadings
rownames(loadings) <- colnames(x)
colnames(loadings) <- paste("Comp", seq_len(ncol(loadings)))
attr(loadings, "name") <- "Loadings"
attr(loadings, "xaxis.name") <- "Components"
attr(loadings, "yaxis.name") <- attrs$xaxis.name
attr(loadings, "yaxis.values") <- attrs$xaxis.values
model$loadings <- loadings
# organize eigen values
model$eigenvals <- res$eigenvals
names(model$eigenvals) <- colnames(loadings)
# finalize model
model$method <- method
model$rand <- rand
# setup other fields and return the model
model$ncomp <- ncomp
model$ncomp.selected <- model$ncomp
# save excluded columns and rows
model$exclcols <- attrs$exclcols
model$exclrows <- attrs$exclrows
class(model) <- "pca"
return(model)
}
################################
# Plotting methods #
################################
#' Explained variance plot for PCA model
#'
#' @description
#' Shows a plot with explained variance or cumulative explained variance for components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param type
#' type of the plot ("b", "l", "h")
#' @param labels
#' what to use as labels (if \code{show.labels = TRUE})
#' @param variance
#' which variance to show
#' @param xticks
#' vector with ticks for x-axis
#' @param res
#' list with result objects to show the variance for
#' @param ylab
#' label for y-axis
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plotVariance.pca <- function(obj, type = "b", labels = "values", variance = "expvar",
xticks = seq_len(obj$ncomp), res = obj$res, ylab = "Explained variance, %", ...) {
res <- getRes(res, "ldecomp")
plot_data <- lapply(res, plotVariance, variance = variance, show.plot = FALSE)
mdaplotg(plot_data, xticks = xticks, labels = labels, type = type, ylab = ylab, ...)
}
#' Cumulative explained variance plot for PCA model
#'
#' @description
#' Shows a plot with cumulative explained variance for components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param legend.position
#' position of the legend
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plotCumVariance.pca <- function(obj, legend.position = "bottomright", ...) {
plotVariance(obj, variance = "cumexpvar", legend.position = legend.position, ...)
}
#' Scores plot for PCA model
#'
#' @description
#' Shows a scores plot for selected components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param comp
#' a value or vector with several values - number of components to show the plot for
#' @param type
#' type of the plot ("p", "l", "b", "h")
#' @param show.axes
#' logical, show or not a axes lines crossing origin (0,0)
#' @param show.legend
#' logical, show or not a legend on the plot
#' @param res
#' list with result objects to show the variance for
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' If plot is created only for one result object (e.g. calibration set), then the behaviour and
#' all settings for the scores plot are identical to \code{\link{plotScores.ldecomp}}. In this case
#' you can show scores as a scatter, line or bar plot for any number of components.
#'
#' Otherwise (e.g. if model contains results for calibration and test set) the plot is a group
#' plot created using \code{\link{mdaplotg}} method and only scatter plot can be used.
#'
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plotScores.pca <- function(obj, comp = if (obj$ncomp > 1) c(1, 2) else 1, type = "p", show.axes = TRUE,
show.legend = TRUE, res = obj$res, ...) {
if (min(comp) < 1 || max(comp) > ncol(obj$loadings)) {
stop("Wrong values for 'comp' parameter.")
}
res <- getRes(res, "ldecomp")
if (length(res) == 1) {
return(plotScores(res[[1]], comp = comp, type = type, ...))
}
if (type != "p") {
stop("You have several result objects in model, only scatter plot is available for scores.")
}
# set up values for showing axes lines
show.lines <- FALSE
if (show.axes) {
show.lines <- if (length(comp) == 2 && type == "p") c(0, 0) else c(NA, 0)
}
plot_data <- lapply(res, plotScores, comp = comp, type = type, show.plot = FALSE)
mdaplotg(plot_data, type = type, show.lines = show.lines, show.legend = show.legend, ...)
}
#' Residuals distance plot for PCA model
#'
#' @description
#' Shows a plot with score (T2, h) vs orthogonal (Q, q) distances and corresponding critical
#' limits for given number of components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param ncomp
#' how many components to use (by default optimal value selected for the model will be used)
#' @param log
#' logical, apply log tranformation to the distances or not (see details)
#' @param norm
#' logical, normalize distance values or not (see details)
#' @param cgroup
#' color grouping of plot points (works only if one result object is available)
#' @param xlim
#' limits for x-axis
#' @param ylim
#' limits for y-axis
#' @param show.legend
#' logical, show or not a legend on the plot (needed if several result objects are available)
#' @param show.limits
#' logical, show or not lines/curves with critical limits for the distances
#' @param lim.col
#' vector with two values - line color for extreme and outlier limits
#' @param lim.lwd
#' vector with two values - line width for extreme and outlier limits
#' @param lim.lty
#' vector with two values - line type for extreme and outlier limits
#' @param res
#' list with result objects to show the plot for (by defaul, model results are used)
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' The function is a bit more advanced version of \code{\link{plotResiduals.ldecomp}}. It allows to
#' show distance values for several result objects (e.g. calibration and test set or calibration
#' and new prediction set) as well as display the correspondng critical limits in form of lines
#' or curves.
#'
#' Depending on how many result objects your model has or how many you specified manually,
#' using the \code{res} parameter, the plot behaves in a bit different way.
#'
#' If only one result object is provided, then it allows to colorise the points using \code{cgroup}
#' parameter. If you specify \code{cgroup = "categories"} then it will show points as three groups:
#' normal, extreme and outliers. If two or more result objects are provided, then the function show
#' distances in groups, and adds corresponding legend.
#'
#' The function can show distance values normalised (h/h0 and q/q0) as well as with log
#' transformation (log(1 + h/h0), log(1 + q/q0)). The latter is useful if distribution of the
#' points is skewed and most of them are densely located around bottom left corner.
#'
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plotResiduals.pca <- function(obj, ncomp = obj$ncomp.selected, log = FALSE,
norm = TRUE, cgroup = NULL, xlim = NULL, ylim = NULL, show.limits = TRUE,
lim.col = c("darkgray", "darkgray"), lim.lwd = c(1, 1), lim.lty = c(2, 3),
res = obj$res, show.legend = TRUE, ...) {
# generate values for cgroup if categories should be used
if (length(cgroup) == 1 && cgroup == "categories") {
cgroup <- categorize(obj, res[[1]], ncomp = ncomp)
}
ldecomp.plotResiduals(res, obj$Qlim, obj$T2lim, ncomp = ncomp, log = log, norm = norm,
cgroup = cgroup, xlim = xlim, ylim = ylim, show.limits = show.limits, lim.col = lim.col,
lim.lwd = lim.lwd, show.legend = show.legend, ...)
}
#' Loadings plot for PCA model
#'
#' @description
#' Shows a loadings plot for selected components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param comp
#' a value or vector with several values - number of components to show the plot for
#' @param type
#' type of the plot ('b', 'l', 'h')
#' @param show.legend
#' logical, show or not a legend on the plot
#' @param show.axes
#' logical, show or not a axes lines crossing origin (0,0)
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plotLoadings.pca <- function(obj, comp = if (obj$ncomp > 1) c(1, 2) else 1,
type = (if (length(comp == 2)) "p" else "l"),
show.legend = TRUE, show.axes = TRUE, ...) {
if (min(comp) < 1 || max(comp) > ncol(obj$loadings)) {
stop("Wrong values for 'comp' parameter.")
}
plot_data <- mda.subset(obj$loadings, select = comp)
colnames(plot_data) <- paste0("Comp ", comp, " (", round(obj$res[["cal"]]$expvar[comp], 2), "%)")
attr(plot_data, "name") <- "Loadings"
# set up values for showing axes lines
show.lines <- FALSE
if (show.axes) {
show.lines <- if (length(comp) == 2 && type == "p") c(0, 0) else c(NA, 0)
}
if (type == "p") {
return(mdaplot(plot_data, type = type, show.lines = show.lines, ...))
}
plot_data <- mda.t(plot_data)
attr(plot_data, "yaxis.name") <- "Loading"
mdaplotg(plot_data, show.legend = show.legend, type = type, show.lines = show.lines, ...)
}
#' PCA biplot
#'
#' @description
#' Shows a biplot for selected components.
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param comp
#' a value or vector with several values - number of components to show the plot for
#' @param pch
#' a vector with two values - markers for scores and loadings
#' @param col
#' a vector with two colors for scores and loadings
#' @param main
#' main title for the plot
#' @param lty
#' line type for loadings
#' @param lwd
#' line width for loadings
#' @param show.labels
#' logical, show or not labels for the plot objects
#' @param show.axes
#' logical, show or not a axes lines crossing origin (0,0)
#' @param show.excluded
#' logical, show or hide rows marked as excluded (attribute `exclrows`)
#' @param lab.col
#' a vector with two colors for scores and loadings labels
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#' @export
plotBiplot.pca <- function(obj, comp = c(1, 2), pch = c(16, NA), col = mdaplot.getColors(2),
main = "Biplot", lty = 1, lwd = 1, show.labels = FALSE, show.axes = TRUE,
show.excluded = FALSE, lab.col = adjustcolor(col, alpha.f = 0.5), ...) {
if (length(comp) != 2) {
stop("Biplot can be made only for two principal components!")
}
show.lines <- if (show.axes) c(0, 0) else FALSE
loadings <- mda.subset(obj$loadings, select = comp)
scores <- mda.subset(obj$calres$scores, select = comp)
attrs <- mda.getattr(scores)
loadsScaleFactor <- sqrt(max(rowSums(loadings^2)))
scoresScaleFactor <- max(abs(scores))
scores <- (scores / scoresScaleFactor) * loadsScaleFactor
scores <- mda.setattr(scores, attrs)
colnames(scores) <- paste0("Comp ", comp, " (", round(obj$res[["cal"]]$expvar[comp], 2), "%)")
mdaplotg(list(scores = scores, loadings = loadings), type = "p", pch = pch,
show.legend = FALSE, show.labels = show.labels, lab.col = lab.col,
main = main, colmap = col, show.lines = show.lines, show.excluded = show.excluded, ...)
if (show.excluded && length(attr(loadings, "exclrows")) > 0) {
loadings <- loadings[-attr(loadings, "exclrows"), , drop = FALSE]
}
segments(0, 0, loadings[, 1], loadings[, 2], col = col[2], lty = lty, lwd = lwd)
}
#' Degrees of freedom plot for score distance (Nh)
#'
#' @description
#' Shows a plot with degrees of freedom computed for score distances at given number
#' of components using data driven approach ("ddmoments" or "ddrobust").
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param type
#' type of the plot ("b", "l", "h")
#' @param labels
#' what to show as data points labels
#' @param xticks
#' vector with tick values for x-axis
#' @param ylab
#' label for y-axis
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' Work only if parameter \code{lim.type} equal to "ddmoments" or "ddrobust".
#'
#' @export
plotT2DoF <- function(obj, type = "b", labels = "values", xticks = seq_len(obj$ncomp), ylab = "Nh", ...) {
if (!(obj$lim.type %in% c("ddrobust", "ddmoments", "chisq"))) {
stop("This plot can not be made for selected 'lim.type' method.")
}
plot_data <- mda.subset(obj$T2lim, subset = 4)
attr(plot_data, "name") <- "Degrees of freedom"
attr(plot_data, "xaxis.name") <- attr(obj$loadings, "xaxis.name")
mdaplot(plot_data, xticks = xticks, labels = labels, type = type, ylab = ylab, ...)
}
#' Degrees of freedom plot for orthogonal distance (Nh)
#'
#' @description
#' Shows a plot with degrees of freedom computed for score distances at given number
#' of components using data driven approach ("ddmoments" or "ddrobust").
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param type
#' type of the plot ("b", "l", "h")
#' @param labels
#' what to show as data points labels
#' @param xticks
#' vector with tick values for x-axis
#' @param ylab
#' label for y-axis
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' Work only if parameter \code{lim.type} equal to "ddmoments" or "ddrobust".
#'
#' @export
plotQDoF <- function(obj, type = "b", labels = "values", xticks = seq_len(obj$ncomp), ylab = "Nq", ...) {
if (!(obj$lim.type %in% c("ddrobust", "ddmoments", "chisq"))) {
stop("This plot can not be made for selected 'lim.type' method.")
}
plot_data <- mda.subset(obj$Qlim, subset = 4)
attr(plot_data, "name") <- "Degrees of freedom"
attr(plot_data, "xaxis.name") <- attr(obj$loadings, "xaxis.name")
mdaplot(plot_data, type = type, labels = labels, xticks = xticks, ylab = ylab, ...)
}
#' Degrees of freedom plot for both distances
#'
#' @description
#' Shows a plot with degrees of freedom computed for score and orthogonal distances at given number
#' of components using data driven approach ("ddmoments" or "ddrobust").
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param type
#' type of the plot ("b", "l", "h")
#' @param labels
#' what to show as data points labels
#' @param xticks
#' vector with tick values for x-axis
#' @param ...
#' other plot parameters (see \code{mdaplotg} for details)
#'
#' @details
#' Work only if parameter \code{lim.type} equal to "ddmoments" or "ddrobust".
#'
#' @export
plotDistDoF <- function(obj, type = "b", labels = "values", xticks = seq_len(obj$ncomp), ...) {
if (!(obj$lim.type %in% c("ddrobust", "ddmoments", "chisq"))) {
stop("This plot can not be made for selected 'lim.type' method.")
}
plot_data <- rbind(
mda.subset(obj$T2lim, subset = 4),
mda.subset(obj$Qlim, subset = 4)
)
rownames(plot_data) <- c("Nh", "Nq")
attr(plot_data, "xaxis.name") <- attr(obj$loadings, "xaxis.name")
attr(plot_data, "name") <- "Degrees of freedom"
mdaplotyy(plot_data, type = type, labels = labels, xticks = xticks, ...)
}
#' Extreme plot
#'
#' @description
#' Shows a plot with number of expected vs. number of observed extreme objects for different
#' significance levels (alpha values)
#'
#' @param obj
#' a PCA model (object of class \code{pca})
#' @param res
#' object with PCA results to show the plot for (e.g. calibration, test, etc)
#' @param comp
#' vector, number of components to show the plot for
#' @param main
#' plot title
#' @param xlab
#' label for x-axis
#' @param ylab
#' label for y-axis
#' @param pch
#' vector with values for \code{pch} parameter for each number of components
#' @param col
#' vector with color values for series of points
#' @param bg
#' vector with background color values for series of points (if pch=21:25)
#' @param lwd
#' line width for point symbols
#' @param cex
#' scale factor for data points
#' @param ellipse.col
#' color for tolerance ellipse
#' @param legend.position
#' position of the legend
#' @param ...
#' other arguments
#'
#' @export
plotExtreme.pca <- function(obj, res = obj$res[["cal"]], comp = obj$ncomp.selected,
main = "Extreme plot", xlab = "Expected", ylab = "Observed", pch = rep(21, length(comp)),
bg = mdaplot.getColors(length(comp)), col = rep("white", length(comp)),
lwd = ifelse(pch %in% 21:25, 0.25, 1), cex = rep(1.2, length(comp)),
ellipse.col = "#cceeff", legend.position = "bottomright", ...) {
if (min(comp) < 1 || max(comp) > obj$ncomp) {
stop("Wrong value for parameter 'ncomp'.")
}
if (is.null(res$T2)) {
stop("Wrong value for 'res' parameter.")
}
# remove excluded values if any
T2 <- res$T2
Q <- res$Q
rows_excluded <- attr(res$T2, "exclrows")
if (length(rows_excluded) > 0) {
T2 <- T2[-rows_excluded, , drop = FALSE]
Q <- Q[-rows_excluded, , drop = FALSE]
}
nobj <- nrow(T2)
expected <- seq_len(nobj)
# show axes, grid and diagonal
par(mar = c(5, 4, 4, 2) + 0.1)
plot(0, type = "n", xlim = c(0, nobj), ylim = c(0, nobj), main = main, xlab = xlab, ylab = ylab)
grid()
lines(c(0, expected), c(0, expected), type = "l", col = ellipse.col)
# compute and show the tolerance ellipse
i <- 1:nobj
alpha <- i / nobj
D <- 2 * sqrt(i * (1 - alpha))
Nm <- i - D
Np <- i + D
segments(expected, Nm, expected, Np, col = ellipse.col)
lines(c(0, expected), c(0, Nm), col = ellipse.col)
lines(c(0, expected), c(0, Np), col = ellipse.col)
# check length of main plot parameters and correct if necessary
ncomp <- length(comp)
correct.param <- function(param) if (length(param) == 1) rep(param, ncomp) else param
pch <- correct.param(pch)
col <- correct.param(col)
cex <- correct.param(cex)
lwd <- correct.param(lwd)
bg <- correct.param(bg)
# show the plints
alpha_mat <- matrix(alpha, byrow = TRUE, ncol = nobj, nrow = nobj)
for (j in seq_along(comp)) {
p <- getProbabilities.pca(obj, comp[j], Q[, comp[j]], T2[, comp[j]])
p_mat <- matrix((1 - p), ncol = nobj, nrow = nobj)
observed <- colSums(p_mat < alpha_mat)
points(expected, observed, pch = pch[j], lwd = lwd[j], cex = cex[j], col = col[j],
bg = bg[j])
}
legend <- paste0(comp, " PC", ifelse(comp > 1, "s", ""))
mdaplotg.showLegend(legend, pt.bg = bg, lty = NA, col = col, pch = pch, lwd = lwd, cex = cex,
position = legend.position)
}
#' Model overview plot for PCA
#'
#' @description
#' Shows a set of plots (scores, loadings, residuals and explained variance) for PCA model.
#'
#' @param x
#' a PCA model (object of class \code{pca})
#' @param comp
#' vector with two values - number of components to show the scores and loadings plots for
#' @param ncomp
#' number of components to show the residuals plot for
#' @param show.labels
#' logical, show or not labels for the plot objects
#' @param show.legend
#' logical, show or not a legend on the plot
#' @param ...
#' other arguments
#'
#' @details
#' See examples in help for \code{\link{pca}} function.
#'
#' @export
plot.pca <- function(x, comp = c(1, 2), ncomp = x$ncomp.selected,
show.labels = FALSE, show.legend = TRUE, ...) {
par(mfrow = c(2, 2))
plotScores(x, comp = comp, show.labels = show.labels, show.legend = show.legend)
plotLoadings(x, comp = comp, show.labels = show.labels, show.legend = show.legend)
plotResiduals(x, ncomp = ncomp, show.labels = show.labels, show.legend = show.legend)
plotCumVariance(x, show.legend = show.legend)
par(mfrow = c(1, 1))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.