R/PLS_fit.R

Defines functions pls_major_axis summary.pls_fit pls

Documented in pls pls_major_axis

#' Partial least squares (PLS) analysis
#'
#' Performs a two-block PLS analysis, optionally allowing
#' for tests of significance using permutations
#'
#' This function performs a PLS analysis (sensu Rohlf & Corti 2000).
#' Given two blocks of variables (shape or other variables) scored on
#' the same observations (specimens), this analysis finds a series of
#' pairs of axis accounting for maximal covariance between the two
#' blocks.
#' If tests of significance with permutations are selected, three
#' different significance tests are performed:
#'  \describe{
#'   \item{Global significance}{Tested using Escoufier RV}
#'   \item{Axis-specific significance (singular value)}{Same test
#'   described in Rohlf & Corti 2000}
#'   \item{Axis-specific significance (correlation of PLS scores)}{For
#'   each pair of PLS (singular) axes, tests the correlation between
#'   scores of the first and second block}
#'  }
#' The object of class pls_fit returned by the function has print()
#' and summary() methods associated to it.
#' This means that using these generic functions on an object created
#' by this function (see examples), it is possible to obtain
#' information on the results.
#' In particular, print() returns a more basic set of results on the
#' global association, whereas summary() returns (only if permutation
#' tests are used) results for each pair of singular axes.
#'
#' Optionally, analysis using multiple cores to speedup permutation testin
#' can be performed using the future framework (requires package future.apply
#' and setting a plan with multiple workers)
#'
#'
#' @section Notice:
#' \itemize{
#'   \item The function does NOT perform GPA when applied to separate
#'   configurations of points.
#'   \item When using the Escoufier RV, notice that the value reported
#'   is the observed value without rarefaction. For a description of
#'   the problem, please see Fruciano et al 2013. To obtain rarefied
#'   estimates of Escoufier RV and their confidence interval, use the
#'   function RVrarefied.
#'   \item In the permutation test, rows of Y are permuted, so using
#'   the block with fewer variables as Y may speed up computations and
#'   substantially reduce memory usage.
#'   \item When using the print() and summary() on the pls_fit objects
#'   obtained with this function, some of the values are rounded for
#'   ease of interpretation. The non-rounded values can be obtained
#'   accessing individual elements of the object (see examples).
#' }
#'
#' @section Citation:
#' If you use this function to perform the PLS analysis and test for
#' significance, cite Rohlf & Corti 2000 (or earlier references
#' outside of geometric morphometrics).
#' If you report the test of significance based on the Escoufier RV
#' coefficient, also cite Escoufier 1975.
#' If you also use the major axis approach to obtain estimates of the
#' shape (or other variable) predicted by each pair of axes, please
#' cite Fruciano et al. 2020
#'
#' @param X,Y Matrices or data frames
#' containing each block of variables
#' (observations in rows, variables in columns).
#' @param perm number of permutations to use for hypothesis testing
#' @param global_RV_test logical - whether global significance of the
#'   association should be tested
#'
#' @seealso \code{\link{RVrarefied}}
#' @return The function outputs an object of class "pls_fit" and
#'   "list" with the following elements:
#'  \describe{
#'   \item{XScores}{Scores along each singular (PLS) axis for the
#'   first block of variables (X)}
#'   \item{YScores}{Scores along each singular (PLS) axis for the
#'   second block of variables (Y)}
#'   \item{U}{Left singular axes}
#'   \item{V}{Right singular axes}
#'   \item{D}{Singular values}
#'   \item{percentage_squared_covariance}{Percented squared covariance
#'   accounted by each pair of axes}
#'   \item{global_significance_RV}{(only if perm>0 and global_RV_test
#'   is TRUE) Observed value of Escoufier RV coefficient and p value
#'   obtained from the permutation test}
#'   \item{singular_axis_significance}{(only if perm>0) For each pair
#'   of singular (PLS) axis, the singular value, the correlation
#'   between scores, their significance level based on permutation,
#'   and the proportion of squared covariance accounted are reported}
#'   \item{OriginalData}{Data used in the analysis}
#'   \item{x_center}{Values used to center data in the X block}
#'   \item{y_center}{Values used to center data in the Y block}
#'}
#'
#' @references Escoufier Y. 1973. Le Traitement des Variables
#' Vectorielles. Biometrics 29:751-760.
#' @references Fruciano C, Colangelo P, Castiglia R, Franchini P.
#' 2020. Does divergence from normal patterns of integration increase
#' as chromosomal fusions increase in number? A test on a house mouse
#' hybrid zone. Current Zoology 66:527–538.
#' @references Rohlf FJ, Corti M. 2000. Use of Two-Block Partial
#' Least-Squares to Study Covariation in Shape. Systematic Biology
#' 49:740-753.
#' @examples
#'
#' ##############################
#' ### Example 1: random data ###
#' ##############################
#'
#' library(MASS)
#' set.seed(123)
#' A=as.data.frame(mvrnorm(100,mu=rep(0,20), Sigma=diag(20)))
#' B=as.data.frame(mvrnorm(100,mu=rep(0,10), Sigma=diag(10)))
#' # Create two blocks of, respectively, 20 and 10 variables
#' # for 100 observations.
#' # This simulates two different blocks of data (shape or otherwise)
#' # measured on the same individuals.
#' # Note that, as we are simulating them independently,
#' # we don't expect substantial covariation between blocks
#'
#' PLS_AB=pls(A, B, perm=99)
#' # Perform PLS analysis and use 99 permutations for testing
#' # (notice that in a real analysis, normally one uses more permutations)
#' print(PLS_AB)
#' # As expected, we do not find significant covariation between the
#' # two blocks
#'
#' summary(PLS_AB)
#' # The same happens when we look at the results for each of the axes
#'
#' # Notice that both for print() and summary() some values are
#' # rounded for ease of visualization
#' # However, the correct values can be always obtained from the
#' # object created by the function
#' # e.g.,
#' PLS_AB$singular_axis_significance
#'
#'
#' ######################################
#' ### Example 2: using the classical ###
#' ### iris data set as a toy example ###
#' ######################################
#'
#' data(iris)
#'# Import the iris dataset
#' set.seed(123)
#'
#'versicolor_data=iris[iris$Species=="versicolor",]
#'# Select only the specimens belonging to the species Iris versicolor
#'versicolor_sepal=versicolor_data[,grep(
#'  "Sepal", colnames(versicolor_data)
#')]
#'versicolor_petal=versicolor_data[,grep(
#'  "Petal", colnames(versicolor_data)
#')]
#'# Separate sepal and petal data
#'PLS_sepal_petal=pls(versicolor_sepal, versicolor_petal, perm=99)
#'# Perform PLS with permutation test
#'# (again, chosen few permutations)
#'
#'print(PLS_sepal_petal)
#'summary(PLS_sepal_petal)
#'# Global results and results for each axis
#'# (suggesting significant association)
#'
#'
#' @export
pls=function(X, Y, perm=999, global_RV_test=TRUE) {

  X = as.matrix(X)
  Y = as.matrix(Y)
  # Convert to matrix, ensuring 2D

  if (is.null(dim(X))) X = matrix(X, ncol=1)
  if (is.null(dim(Y))) Y = matrix(Y, ncol=1)
  # If X or Y is a vector, convert it to a single-column matrix

  if (nrow(X) != nrow(Y)) {
    stop("X and Y must have the same number of rows")
  }
  if (ncol(X) == 0 || ncol(Y) == 0) {
    stop("X and Y must have at least one column")
  }


  if (!(length(intersect(class(X), c("matrix", "data.frame")))==1 &&
        length(intersect(class(Y), c("matrix", "data.frame")))==1)) {
    stop(
      "X and Y should be matrices or data.frames ",
      "(observations in rows, variables in columns)"
    )
  }
  if (nrow(X)!=nrow(Y)) {
    stop(
      "X and Y should have the same number of rows ",
      "(different variables/shape scored on the same individuals)"
    )
  }
  # Checks of class and consistent number of rows

  if (perm>0) {
    Results=pls_perm(X, Y, perm=perm, global_RV_test=global_RV_test)
  } else {
    Results=pls_base(X, Y)
  }
return(Results)
}
# Wrapper for using pls_base or pls_perm, depending on the presence
# of permutations


#' @export
print.pls_fit = function (x, ...) {
  cat("Two-block PLS analysis")
  cat("\n")
  cat("======================")
  cat("\n")
  if ("global_significance_RV" %in% names(x)) {
   cat(
     "Global significance based on permuting the Escoufier RV ",
     "coefficient"
   )
    cat("\n")
    cat(paste0(
      "Observed RV=",
      round(x$global_significance_RV["observed_RV"], 2)
    ))
    cat("\n")
    cat(paste0("p value= ", x$global_significance_RV["RV_p_value"]))
    cat("\n")
    cat("Note that the observed RV is not rarefied")
    cat("\n")
    cat("\n")
    cat("For axis-specific results, please use summary()")
    cat("\n")
  } else {
    if ("percentage_squared_covariance" %in% names (x)) {
      Res=data.frame(Axis=seq_along(x$D),
                     percentage_squared_covariance=
                       x$percentage_squared_covariance,
                     singular_value=x$D,
                     correlation_between_scores=x$CorrXScoresYScores)

    } else {

    Res=data.frame(Axis=seq_along(x$D),
                   percentage_squared_covariance=
                     x$singular_axis_significance$percentage_squared_covariance,
                   singular_value=x$D,
                   correlation_between_scores=x$CorrXScoresYScores)
    }
    print(Res)
  }

  cat("======================")

}

# print method for S3 class pls_fit
#
#' @export
summary.pls_fit = function(object, ...) {
  if ("global_significance_RV" %in% names(object)) {
  cat("Two-block PLS analysis")
  cat("\n")
  cat("======================")
  cat("\n")

    cat(
      "Global significance based on permuting the Escoufier RV ",
      "coefficient"
    )
    cat("\n")
    cat(paste0(
      "Observed RV=",
      round(object$global_significance_RV["observed_RV"], 2)
    ))
    cat("\n")
    cat(paste0("p value= ", object$global_significance_RV["RV_p_value"]))
    cat("\n")
    cat("Note that the observed RV is not rarefied")
    cat("\n")
    cat("======================")
    cat("\n")
  }
  if (!("percentage_squared_covariance" %in% names (object))) {
    cat("Significance of each singular axis")
    cat("\n")
    SA_sign=object$singular_axis_significance
    SA_sign$correlation_PLS_scores=round(
      SA_sign$correlation_PLS_scores, 2
    )
    SA_sign$percentage_squared_covariance=round(
      SA_sign$percentage_squared_covariance, 2
    )
    if (all(SA_sign$singular_values>0.001)) {
      SA_sign$singular_values=round(SA_sign$singular_values, 3)
    }
    print(SA_sign)
    cat("======================")
  } else {
    print(object)
  }
}


# Function to perform the PLS fit
pls_base = function (x,y) {

  xor=x
  yor=y
  # original data

  x=scale(x, center = TRUE, scale = FALSE)
  y=scale(y, center = TRUE, scale = FALSE)
  # centered original data

  S12 = cov(x, y)
  # Obtain only the between-block covariance matrix

  SVD=corpcor::fast.svd(S12)
  # Use the fast.svd algorithm on the between-block covariance matrix

  XScores <- as.matrix(x) %*% SVD$u
  YScores <- as.matrix(y) %*% SVD$v
  # Compute scores for x and y


  CorrXScoresYScores=unlist(lapply(
    seq_len(length(SVD$d)),
    function(x) cor(XScores[,x], YScores[,x])
  ))

  PLS=list(XScores = XScores, YScores = YScores,
           U=SVD$u, V=SVD$v, D=SVD$d,
           percentage_squared_covariance=(SVD$d^2/sum(SVD$d^2))*100,
           CorrXScoresYScores=CorrXScoresYScores,
           OriginalData=list(x=cbind(xor), y=cbind(yor)),
           x_center=attributes(x)$"scaled:center",
           y_center=attributes(y)$"scaled:center")
  class(PLS)=c("pls_fit", "list")
return(PLS)
}


# Function to run permutation tests
pls_perm = function (x, y, perm=999, global_RV_test=TRUE) {
  ObsPLS=pls_base(x,y)
  # Observed PLS analysis

  nobs=nrow(x)
  nsingval=length(ObsPLS$D)
  nvarX=ncol(x)
  nvarY=ncol(y)
  # Number of observations and singular values

  xmat=as.matrix(x)
  ymat=as.matrix(y)

  perm_results=future.apply::future_lapply(seq_len(perm), function(k) {
    Yperm_k=ymat[sample(seq_len(nobs), nobs), , drop=FALSE]
    Temp=pls_base(xmat, Yperm_k)
    res=list(D=Temp$D, Corr=Temp$CorrXScoresYScores)
    if (global_RV_test==TRUE) {
      res$RV=EscoufierRV(xmat, Yperm_k)
    }
    res
  }, future.seed=TRUE)
  # For each permutation: generate a permuted Y, compute PLS, extract
  # singular values and score correlations; optionally compute RV.
  # Permuted Y matrices are never stored simultaneously — each is
  # created, used, and discarded within its iteration.

  PLSpermD    = do.call(cbind, lapply(perm_results, `[[`, "D"))
  PLSpermCorr = do.call(cbind, lapply(perm_results, `[[`, "Corr"))
  # Reassemble per-permutation results into matrices
  # (singular values / correlations in columns, axes in rows)

  D_p.values=rep(NA,nsingval)
  for (i in seq_len(nsingval)) {
    D_p.values[i]=(length(which(PLSpermD[i,]>=ObsPLS$D[i]))+1)/(perm+1)
  }
  # Compute p values as the proportion of permuted singular values larger
  # or equal to the one observed
  CorrXScoresYScores_p.values=rep(NA,nsingval)
  for (i in seq_len(nsingval)) {
    CorrXScoresYScores_p.values[i]=(
      length(which(
        PLSpermCorr[i,]>=ObsPLS$CorrXScoresYScores[i]
      ))+1
    )/(perm+1)
  }
  # Same for the correlation between X scores and Y scores

  if (global_RV_test==TRUE) {
  ObsRV=EscoufierRV(x, y)
  permRV=unlist(lapply(perm_results, `[[`, "RV"))
  RV_p_value=(length(which(permRV>=ObsRV))+1)/(perm+1)
  # Overall significance using Escoufier RV

  Results=list(XScores = ObsPLS$XScores,
               YScores = ObsPLS$YScores,
               U=ObsPLS$U, V=ObsPLS$V, D=ObsPLS$D,
               CorrXScoresYScores=ObsPLS$CorrXScoresYScores,
               global_significance_RV=c(
                 observed_RV=ObsRV, RV_p_value=RV_p_value
               ),
               singular_axis_significance=data.frame(
                 correlation_PLS_scores=ObsPLS$CorrXScoresYScores,
                 pvalue_correlation_PLS_scores=
                   CorrXScoresYScores_p.values,
                 singular_values=ObsPLS$D,
                 pvalue_singular_values=D_p.values,
                 percentage_squared_covariance=
                   (ObsPLS$D^2/sum(ObsPLS$D^2))*100
               ),
               OriginalData=list(
                 x=ObsPLS$OriginalData$x, y=ObsPLS$OriginalData$y
               ),
               x_center=ObsPLS$x_center,
               y_center=ObsPLS$y_center
  )

  } else {
    ObsRV=NA
    RV_p_value=NA

    Results=list(XScores = ObsPLS$XScores,
                 YScores = ObsPLS$YScores,
                 U=ObsPLS$U, V=ObsPLS$V, D=ObsPLS$D,
                 CorrXScoresYScores=ObsPLS$CorrXScoresYScores,
                 singular_axis_significance=data.frame(
                   correlation_PLS_scores=ObsPLS$CorrXScoresYScores,
                   pvalue_correlation_PLS_scores=
                     CorrXScoresYScores_p.values,
                   singular_values=ObsPLS$D,
                   pvalue_singular_values=D_p.values,
                   percentage_squared_covariance=
                     (ObsPLS$D^2/sum(ObsPLS$D^2))*100
                 ),
                 OriginalData=list(
                   x=ObsPLS$OriginalData$x, y=ObsPLS$OriginalData$y
                 ),
                 x_center=ObsPLS$x_center,
                 y_center=ObsPLS$y_center
    )

  }


  class(Results)=c("pls_fit", "list")
  return(Results)
}


#' Major axis predictions for partial least squares (PLS) analysis
#'
#' Project data on the major axis of PLS scores and
#' obtain associated predictions
#'
#'
#' This function acts on a pls_fit object obtained from the function
#' pls. More in detail, the function:
#'  \itemize{
#'   \item Projects the original data onto the major axis for each
#'   pair of PLS axes (obtaining for each observation of the original
#'   data a score along this axis).
#'   \item For each observation (specimen) of the original data,
#'   obtains the shape predicted by its score along the major axis.
#'   \item (Optionally) if new data is provided, these data are first
#'   projected in the original PLS space and then the two operations
#'   above are performed on the new data.
#'  }
#' A more in-depth explanation with a figure which allows for a more
#' intuitive understanding is provided in Fruciano et al 2020
#' The idea is to obtain individual-level estimates of the shape
#' predicted by a PLS model.
#' This can be useful, for instance, to quantify to which extent the
#' shape of a given individual from one group resembles the shape that
#' individual would have according to the model computed in another
#' group.
#' This can be done by obtaining predictions with this function and
#' then computing the distance between the actual shape observed for
#' each individual and its prediction obtained from this function.
#' This is, indeed, how this approach has been used in Fruciano et al
#' 2020.
#'
#' The function returns a list with two or three main elements which
#' are themselves lists.
#' The most useful elements for the final user are highlighted in
#' boldface.
#'
#' \emph{original_major_axis_projection} is a list containing as many
#' elements as specified in axes_to_use (default 1).
#' Each of this elements contains the details of the computation of
#' the major axis (as a PCA of PLS scores for a pair of axes), and in
#' particular:
#' \describe{
#'   \item{major_axis_rotation}{Eigenvector}
#'   \item{mean_pls_scores}{Mean scores for that axis pair used in
#'   the computation}
#'   \item{pls_scale}{Scaling factor used}
#'   \item{\strong{original_data_PLS_projection}}{Scores of the
#'   original data on the major axis}
#' }
#'
#' \emph{original_major_axis_predictions_reversed} contains the
#' predictions of the PLS model for the original data,
#' back-transformed to the original space (i.e., if the original data
#' was shape, this will be shape).
#' If axes_to_use > 1, these predictions will be based on the major
#' axis computed for all pairs of axes considered.
#' This element has two sub-elements:
#' \describe{
#'   \item{\strong{Block1}}{Prediction for block 1}
#'   \item{\strong{Block2}}{Prediction for block 2}
#' }
#'
#' \emph{new_data_results} is only returned when new data is provided
#' and contains the results of the analyses obtained using a previous
#' PLS model on new data and, in particular:
#' \describe{
#'   \item{\strong{new_data_Xscores}}{PLS scores of the new data
#'   using the old model for the first block}
#'   \item{\strong{new_data_Yscores}}{PLS scores of the new data
#'   using the old model for the second block}
#'   \item{\strong{new_data_major_axis_proj}}{Scores of the new data
#'   on the major axis computed using the PLS model provided in
#'   pls_object. If axes_to_use > 1, each column corresponds to a
#'   separate major axis}
#'   \item{\strong{new_data_Block1_proj_prediction_revert}}{
#'   Predictions for Block 1 of the new data obtained by first
#'   computing the major axis projections (element
#'   new_data_major_axis_proj) and then back-transforming these
#'   projections to the original space (e.g., shape)}
#'   \item{\strong{new_data_Block2_proj_prediction_revert}}{
#'   Predictions for Block 2 of the new data obtained by first
#'   computing the major axis projections (element
#'   new_data_major_axis_proj) and then back-transforming these
#'   projections to the original space (e.g., shape)}
#' }
#'
#'
#' @section Citation:
#' If you use this function, please cite Fruciano et al. 2020.
#'
#' @param pls_object object of class "pls_fit" obtained from the
#' function pls
#' @param new_data_x,new_data_y (optional) matrices or data frames
#' containing new data
#' @param axes_to_use number of pairs of PLS axes to use in the
#' computation (by default, this is performed only on the first axis)
#' @param scale_PLS logical indicating whether PLS scores for
#' different blocks should be scaled prior to computing the major axis
#'
#' @seealso \code{\link{pls}}
#'
#' @return The function outputs a list with the following elements
#' (please, see the Details section for explanations on their
#' sub-elements):
#'  \describe{
#'   \item{original_major_axis_projection}{For each PLS axis pair,
#'   results of the computation of major axis and projection of the
#'   original data on each axis}
#'   \item{original_major_axis_predictions_reversed}{Data obtained
#'   back-transforming the scores on the major axis into the original
#'   space (e.g., shape)}
#'   \item{new_data_results}{(only if new data has been provided) PLS
#'   scores for the new data, scores of the new data on the major
#'   axis, preditions for the new data back-transformed into the
#'   original space (e.g., shape)}
#' }
#'
#'
#'
#' @references Fruciano C, Colangelo P, Castiglia R, Franchini P.
#' 2020. Does divergence from normal patterns of integration increase
#' as chromosomal fusions increase in number? A test on a house mouse
#' hybrid zone. Current Zoology 66:527–538.
#'
#' @section Notice:
#' \itemize{
#'   \item If new data is provided, this is first centered to the
#'   same average as in the original analysis, then it is translated
#'   back to the original scale.
#' }
#'
#' @examples
#'
#'
#'
#' ######################################
#' ### Example using the classical    ###
#' ### iris data set as a toy example ###
#' ######################################
#'
#' data(iris)
#' # Import the iris dataset
#' versicolor_data=iris[iris$Species=="versicolor",]
#' # Select only the specimens belonging to the species Iris versicolor
#' versicolor_sepal=versicolor_data[,grep("Sepal",
#'                                        colnames(versicolor_data))]
#' versicolor_petal=versicolor_data[,grep("Petal",
#'                                        colnames(versicolor_data))]
#' # Separate sepal and petal data for I. versicolor
#'
#'
#' PLS_sepal_petal_versicolor=pls(versicolor_sepal,
#'                                versicolor_petal,
#'                                perm=99)
#' summary(PLS_sepal_petal_versicolor)
#' # Compute the PLS for I. versicolor
#'
#'
#' plot(PLS_sepal_petal_versicolor$XScores[,1],
#'      PLS_sepal_petal_versicolor$YScores[,1],
#'      asp = 1,
#'      xlab = "PLS 1 Block 1 scores",
#'      ylab = "PLS 1 Block 2 scores")
#' # Plot the scores for the original data on the first pair of PLS
#' # axes (one axis per block)
#' # This is the data based on which we will compute the major axis
#' # direction
#' # Imagine fitting a line through those point, that is the major axis
#'
#' Pred_major_axis_versicolor=pls_major_axis(PLS_sepal_petal_versicolor,
#'                                           axes_to_use=1)
#' # Compute for I. versicolor the projections to the major axis
#' # using only the first pair of PLS axes (and scaling the scores
#' # prior to the computation)
#'
#' hist(Pred_major_axis_versicolor$original_major_axis_projection[[1]]$original_data_PLS_projection,
#'      main="Original data - projections on the major axis - based on the first pair of PLS axes",
#'      xlab="Major axis score")
#' # Plot distribution of PLS scores for each individual in the
#' # original data (I. versicolor)
#' # projected on the major axis for the first pair of PLS axis
#'
#' Pred_major_axis_versicolor$original_major_axis_predictions_reversed$Block1
#' Pred_major_axis_versicolor$original_major_axis_predictions_reversed$Block2
#' # Shape for each individual of the original data (I. versicolor)
#' # predicted by its position along the major axis
#'
#' # Now we will use the data from new species (I. setosa and I
#' # virginica) and obtain predictions from the PLS model obtained for
#' # I. versicolor
#'
#' # The easiest is to use the data for all three species
#' # as if they were both new data
#' # (using versicolor as new data is not going to affect the model)
#'
#'
#' all_sepal=iris[,grep("Sepal", colnames(iris))]
#' all_petal=iris[,grep("Petal", colnames(iris))]
#' # Separate sepals and petals (they are the two blocks)
#'
#' Pred_major_axis_versicolor_newdata=pls_major_axis(
#'   pls_object=PLS_sepal_petal_versicolor,
#'   new_data_x = all_sepal,
#'   new_data_y = all_petal,
#'   axes_to_use=1)
#' # Perform the major axis computation using new data
#' # Notice that:
#' # - we are using the old PLS model (computed on versicolor only)
#' # - we are adding the new data in the same order as in the original
#' #   model (i.e., new_data_x is sepal data, new_data_y is petal data)
#'
#'
#' plot(Pred_major_axis_versicolor_newdata$new_data_results$new_data_Xscores[,1],
#'      Pred_major_axis_versicolor_newdata$new_data_results$new_data_Yscores[,1],
#'      col=iris$Species, asp=1,
#'      xlab = "Old PLS, Axis 1, Block 1",
#'      ylab = "Old PLS, Axis 1, Block 2")
#' # Plot the new data (both versicolor and setosa)
#' # in the space of the first pair of PLS axes computed only on
#' # versicolor
#' # The three species follow a quite similar trajectories
#' # but they have different average value on the major axis
#'
#' # To visualize this better, we can plot the scores along the major
#' # axis for the three species
#' boxplot(Pred_major_axis_versicolor_newdata$new_data_results$new_data_major_axis_proj[,1]~
#'         iris$Species,
#'         xlab="Species",
#'         ylab="Score on the major axis")
#'
#' # We can also visualize the deviations from the major axis
#' # For instance by putting the predictions of the two blocks together
#' # Computing differences and then performing a PCA
#' predictions_all_data=cbind(
#'   Pred_major_axis_versicolor_newdata$new_data_results$new_data_Block1_proj_prediction_revert,
#'   Pred_major_axis_versicolor_newdata$new_data_results$new_data_Block2_proj_prediction_revert)
#' # Get the predictions for the two blocks (sepals and petals)
#' # and put them back together
#'
#' Euc_dist_from_predictions=unlist(lapply(seq(nrow(iris)),
#'                                          function(i)
#'   dist(rbind(iris[i,1:4],predictions_all_data[i,]))))
#' # for each flower, compute the Euclidean distance between
#' # the original values and what is predicted by the model
#'
#' boxplot(Euc_dist_from_predictions~iris$Species,
#'         xlab="Species",
#'         ylab="Euclidean distance from prediction")
#' # I. setosa is the one which deviates the most from the prediction
#'
#'
#'
#' @export
pls_major_axis=function(pls_object, new_data_x=NULL, new_data_y=NULL,
                        axes_to_use=1, scale_PLS=TRUE) {
  if (!("pls_fit" %in% class(pls_object)))
    stop("pls_object should be an object of class 'pls_fit'")
  if (axes_to_use<1 | axes_to_use>ncol(pls_object$U))
    stop(paste("The axes_to_use should be comprised in the number of",
               "axes identified by the PLS analysis"))
  # Initial checks

  PLS_scores_major_axis_find=lapply(seq(axes_to_use), function(i)
  {
    PLS_scores_temp=cbind(pls_object$XScores[,i],
                          pls_object$YScores[,i])
    if (scale_PLS==TRUE) {
      PCA_PLS_scores_temp=prcomp(PLS_scores_temp, scale. = TRUE)
    } else {
      PCA_PLS_scores_temp=prcomp(PLS_scores_temp, scale. = FALSE)
    }

    list(major_axis_rotation=cbind(PCA_PLS_scores_temp$rotation[,1]),
         mean_pls_scores=PCA_PLS_scores_temp$center,
         pls_scale=PCA_PLS_scores_temp$scale,
         original_data_PLS_projection=cbind(PCA_PLS_scores_temp$x[,1]))
  })
  names(PLS_scores_major_axis_find)=paste0("PLS_axis_pair_",
                                           seq(axes_to_use))

  # For each PLS axis in axes_to_use, perform a PCA of
  # PLS scores of one block and the other
  # to identify the major axis, as well as the projection
  # of the original PLS scores in each of these axes


  ### Now a double-reversion of the original data
  ### from projections on the major axis back to the original (shape) variables



  if (scale_PLS==TRUE) {
    revertPCAPLS=lapply(PLS_scores_major_axis_find, function(X) {
      Scores=cbind(X$original_data_PLS_projection)
      Eigenvectors=cbind(X$major_axis_rotation)
    ZEt = Scores %*% t(Eigenvectors)
    for (i in seq_len(ncol(ZEt))) {
      ZEt[,i]=ZEt[,i]*X$pls_scale[i]
    }
    rows = nrow(Scores)
  ZEt + rep_row(X$mean_pls_scores, rows)
    })
  } else {

  revertPCAPLS=lapply(PLS_scores_major_axis_find, function(X) {
    reversePCA(Scores = cbind(X$original_data_PLS_projection),
               Eigenvectors = cbind(X$major_axis_rotation),
               Mean = X$mean_pls_scores
    )
  })
  }
  # Get back scores in the PLS space for the major axis projections


  PLS_scores_major_axis_X=do.call("cbind",
                                  lapply(revertPCAPLS,
                                         function(X) X[,1]))
  PLS_scores_major_axis_Y=do.call("cbind",
                                  lapply(revertPCAPLS,
                                         function(X) X[,2]))
  # Separate X and Y (the two blocks) and put them together as column
  # vectors

  Block1_revert=reversePCA(
    cbind(PLS_scores_major_axis_X),
    Eigenvectors = cbind(pls_object$U[,seq(axes_to_use)]),
    Mean = pls_object$x_center)
  Block2_revert=reversePCA(
    cbind(PLS_scores_major_axis_Y),
    Eigenvectors = cbind(pls_object$V[,seq(axes_to_use)]),
    Mean = pls_object$y_center)
  # Revert back to the original space (e.g., shape)

  Results=list(
    original_major_axis_projection=PLS_scores_major_axis_find,
    original_major_axis_predictions_reversed=list(Block1=Block1_revert,
                                                   Block2=Block2_revert)
  )

  # Now the analysis in case there is new data to project on the original
  # PLS

  if(!is.null(new_data_x) && !is.null(new_data_y)) {

    if ((ncol(new_data_x)!=ncol(pls_object$OriginalData$x)) |
        (ncol(new_data_y)!=ncol(pls_object$OriginalData$y)) ) {
      stop(paste("The number of variables of the new data should be",
                 "the same as the number of variables in the",
                 "original data"))
    }
    # Checks on consistency on the number of variables for new data

    new_data_x_center=as.matrix(sweep(cbind(new_data_x), 2,
                                      pls_object$x_center, "-"))
    new_data_y_center=as.matrix(sweep(cbind(new_data_y), 2,
                                      pls_object$y_center, "-"))
    # Translate the new data to the same origin as the original data


    new_data_Xscores=cbind(new_data_x_center %*% pls_object$U)
    new_data_Yscores=cbind(new_data_y_center %*% pls_object$V)
    # Compute PLS scores for the new data in the old PLS

    new_data_major_axis_proj=lapply(seq(axes_to_use), function(i)
    {
      PLS_scores_temp=cbind(new_data_Xscores[,i],
                            new_data_Yscores[,i])
      if (scale_PLS==TRUE){
        PLS_scores_temp=sweep(PLS_scores_temp, 2,
                              PLS_scores_major_axis_find[[i]]$mean_pls_scores,
                              "-")
        PLS_scores_temp=sweep(PLS_scores_temp, 2,
                              PLS_scores_major_axis_find[[i]]$pls_scale,
                              "/")
      }
      new_data_mjp=PLS_scores_temp %*%
        PLS_scores_major_axis_find[[i]]$major_axis_rotation
    })
    names(new_data_major_axis_proj)=paste0("PLS_axis_pair_",
                                           seq(axes_to_use))
    # Compute projection of the PLS scores for the new data on the major
    # axis of the old PLS

    new_data_revertPCAPLS=lapply(seq(axes_to_use), function(i) {
      if (scale_PLS==TRUE) {
        # For scaled PCA, manually reverse: multiply scores by rotation,
        # then scale, then add mean
        ZEt = new_data_major_axis_proj[[i]] %*%
          t(PLS_scores_major_axis_find[[i]]$major_axis_rotation)
        ZEt = sweep(ZEt, 2,
                    PLS_scores_major_axis_find[[i]]$pls_scale, "*")
        ZEt = sweep(ZEt, 2,
                    PLS_scores_major_axis_find[[i]]$mean_pls_scores, "+")
        return(ZEt)
      } else {
        reversePCA(
          Scores = new_data_major_axis_proj[[i]],
          Eigenvectors = cbind(
            PLS_scores_major_axis_find[[i]]$major_axis_rotation),
          Mean = PLS_scores_major_axis_find[[i]]$mean_pls_scores
        )
      }
    })
    # Get back scores in the PLS space for the major axis projections of
    # new data

    new_data_revertPCAPLS_major_axis_X=do.call("cbind",
                                               lapply(new_data_revertPCAPLS,
                                                      function(X) X[,1]))
    new_data_revertPCAPLS_major_axis_Y=do.call("cbind",
                                               lapply(new_data_revertPCAPLS,
                                                      function(X) X[,2]))
    # Separate X and Y (the two blocks) and put them together as column
    # vectors

    new_data_Block1_revert=reversePCA(
      cbind(new_data_revertPCAPLS_major_axis_X),
      Eigenvectors = cbind(pls_object$U[,seq(axes_to_use)]),
      Mean = pls_object$x_center)
    new_data_Block2_revert=reversePCA(
      cbind(new_data_revertPCAPLS_major_axis_Y),
      Eigenvectors = cbind(pls_object$V[,seq(axes_to_use)]),
      Mean = pls_object$y_center)
    # Revert back to the original space (e.g., shape)

    Results$new_data_results=list(
      new_data_Xscores=new_data_Xscores,
      new_data_Yscores=new_data_Yscores,
      new_data_major_axis_proj=do.call("cbind",
                                       new_data_major_axis_proj),
      new_data_Block1_proj_prediction_revert=new_data_Block1_revert,
      new_data_Block2_proj_prediction_revert=new_data_Block2_revert
    )

  }


  return(Results)
}

Try the GeometricMorphometricsMix package in your browser

Any scripts or data that you put into this service are public.

GeometricMorphometricsMix documentation built on April 18, 2026, 1:06 a.m.