R/ipls.R
In mdatools: Multivariate Data Analysis for Chemometrics

Documented in ipls ipls.backward ipls.forward plot.ipls plotRMSE.ipls plotSelection.ipls print.ipls summary.ipls

## class and methods for iPLS ##

#' Variable selection with interval PLS
#'
#' @description
#' Applies iPLS algorithm to find variable intervals most important for
#' prediction.
#'
#' @param x
#' a matrix with predictor values.
#' @param y
#' a vector with response values.
#' @param glob.ncomp
#' maximum number of components for a global PLS model.
#' @param center
#' logical, center or not the data values.
#' @param scale
#' logical, standardize or not the data values.
#' @param cv
#' cross-validation settings (see details).
#' @param exclcols
#' columns of x to be excluded from calculations (numbers, names or vector with logical values).
#' @param exclrows
#' rows to be excluded from calculations (numbers, names or vector with logical values).
#' @param int.ncomp
#' maximum number of components for interval PLS models.
#' @param int.num
#' number of intervals.
#' @param int.width
#' width of intervals.
#' @param int.limits
#' a two column matrix with manual intervals specification.
#' @param int.niter
#' maximum number of iterations (if NULL it will be the smallest of two values: number of intervals and 30).
#' @param ncomp.selcrit
#' criterion for selecting optimal number of components ('min' for minimum of RMSECV).
#' @param method
#' iPLS method (\code{'forward'} or \code{'backward'}).
#' @param x.test
#' matrix with predictors for test set (by default is NULL, if specified, is used instead of cv).
#' @param y.test
#' matrix with responses for test set.
#' @param silent
#' logical, show or not information about selection process.
#' @param full
#' logical, if TRUE the procedure will continue even if no improvements is observed.
#' @param cv.scope
#' scope for center/scale operations inside CV loop: 'global' — using globally computed mean and std
#' or 'local' — recompute new for each local calibration set.
#'
#' @return
#' object of 'ipls' class with several fields, including:
#'    \item{var.selected}{a vector with indices of selected variables}
#'    \item{int.selected}{a vector with indices of selected intervals }
#'    \item{int.num}{total number of intervals}
#'    \item{int.width}{width of the intervals}
#'    \item{int.limits}{a matrix with limits for each interval}
#'    \item{int.stat}{a data frame with statistics for the selection algorithm}
#'    \item{glob.stat}{a data frame with statistics for the first step (individual intervals)}
#'    \item{gm}{global PLS model with all variables included}
#'    \item{om}{optimized PLS model with selected variables}
#'
#' @details
#' The algorithm splits the predictors into several intervals and tries to find a combination
#' of the intervals, which gives best prediction performance. There are two selection methods:
#' "forward" when the intervals are successively included, and "backward" when the intervals
#' are successively excluded from a model. On the first step the algorithm finds the best
#' (forward) or the worst (backward) individual interval. Then it tests the others to find the
#' one which gives the best model in a combination with the already selected/excluded one. The
#' procedure continues until no improvements is observed or the maximum number of iteration
#' is reached.
#'
#' There are several ways to specify the intervals. First of all either number of intervals
#' (\code{int.num}) or width of the intervals (\code{int.width}) can be provided. Alternatively
#' one can specify the limits (first and last variable number) of the intervals manually
#' with \code{int.limits}.
#'
#' Cross-validation settings, \code{cv}, can be a number or a list. If \code{cv} is a number, it
#' will be used as a number of segments for random cross-validation (if \code{cv = 1}, full
#' cross-validation will be preformed). If it is a list, the following syntax can be used:
#' \code{cv = list('rand', nseg, nrep)} for random repeated cross-validation with \code{nseg}
#' segments and \code{nrep} repetitions or \code{cv = list('ven', nseg)} for systematic splits
#' to \code{nseg} segments ('venetian blinds').
#'
#' @references
#' [1] Lars Noergaard at al.  Interval partial least-squares regression (iPLS): a
#' comparative chemometric study with an example from near-infrared spectroscopy.
#' Appl.Spec. 2000; 54: 413-419
#'
#' @examples
#' library(mdatools)
#'
#' ## forward selection for simdata
#'
#' data(simdata)
#' Xc = simdata$spectra.c
#' yc = simdata$conc.c[, 3, drop = FALSE]
#'
#' # run iPLS and show results
#' im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
#' summary(im)
#' plot(im)
#'
#' # show "developing" of RMSECV during the algorithm execution
#' plotRMSE(im)
#'
#' # plot predictions before and after selection
#' par(mfrow = c(1, 2))
#' plotPredictions(im$gm)
#' plotPredictions(im$om)
#'
#' # show selected intervals on spectral plot
#' ind = im$var.selected
#' mspectrum = apply(Xc, 2, mean)
#' plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
#' points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')
#'
#' @export
ipls <- function(x, y, glob.ncomp = 10, center = TRUE, scale = FALSE, cv = list("ven", 10),
   exclcols = NULL, exclrows = NULL,  int.ncomp = glob.ncomp, int.num = NULL, int.width = NULL,
   int.limits = NULL, int.niter = NULL, ncomp.selcrit = "min", method = "forward",
   x.test = NULL, y.test = NULL, silent = FALSE, full = FALSE, cv.scope = "local") {

   # process names and values for xaxis
   x <- mda.df2mat(x)
   xaxis.name <- attr(x, "xaxis.name")
   xaxis.values <- attr(x, "xaxis.values")

   if (is.null(xaxis.name)) xaxis.name <- "Variables"
   if (is.null(xaxis.values)) xaxis.values <- seq_len(ncol(x))
   if (is.null(dim(y))) dim(y) <- c(length(y), 1)

   # if test set is provided check it and set cv to NULL
   if (!is.null(x.test) && !is.null(y.test)) {
      if (is.null(dim(y.test))) dim(y.test) <- c(length(y.test), 1)
      if (nrow(x.test) != nrow(y.test)) stop("Number of rows in 'x.test' and 'y.test' should be the same")
      if (ncol(x.test) != ncol(x)) stop("Number of columns in 'x.test' and 'x' should be the same")
      if (ncol(y.test) != ncol(y)) stop("Number of columns in 'y.test' and 'y' should be the same")
      cv <- NULL
   }

   # remove excluded columns and rows
   if (length(exclcols) > 0) {
      x <- mda.exclcols(x, exclcols)
      exclcols <- attr(x, "exclcols")
      x <- x[, -exclcols, drop = FALSE]
      xaxis.values <- xaxis.values[-exclcols]
      attr(x, "exclcols") <- NULL

      if (!is.null(x.test)) {
         x.test <- x.test[, -exclcols, drop = FALSE]
         attr(x.test, "exclcols") <- NULL
      }
   }

   if (length(exclrows) > 0) {
      x <- mda.exclrows(x, exclrows)
      exclrows <- attr(x, "exclrows")
      x <- x[-exclrows, , drop = FALSE]
      y <- y[-exclrows, , drop = FALSE]
      attr(x, "exclrows") <- NULL
      attr(y, "exclrows") <- NULL
   }

   attr(x, "xaxis.values") <- xaxis.values
   attr(x, "xaxis.name") <- xaxis.name

   if (length(dim(y)) != 2) {
      stop("Response variable (y) should be a matrix or a sequence.")
   }

   if (ncol(y) > 1) {
      warning("iPLS can work with one y-variable at time, selecting first column.")
      y <- y[, 1, drop = FALSE]
   }

   # add name to the single column
   if (is.null(colnames(y))) colnames(y) <- "y1"

   # get number of predictors
   npred <- ncol(x)

   # get matrix with limits
   int.limits <- ipls.getintlimits(int.limits, int.width, int.num, npred)

   # check limits
   if (ncol(int.limits) != 2 || nrow(int.limits) < 2) {
      stop("Interval limits shall be provided as a matrix with two columns.")
   }

   # get difference (size of intervals)
   df <- apply(int.limits, 1, diff)

   # check that difference and interval values are also fine
   if (min(int.limits) < 1 || max(int.limits) > npred || any(df < 0)) {
      stop("Wrong values for interval limits.")
   }

   # recompute interval numbers and width (average)
   int.num <- nrow(int.limits)
   int.width <- max(df) + 1

   # add names
   rownames(int.limits) <- seq_len(int.num)
   colnames(int.limits) <- c("Left", "Right")

   # define number of iterations
   if (is.null(int.niter)) {
      int.niter <- min(30, int.num)
   }

   if (int.niter > int.num) {
      int.niter <- int.num
   }

   # build an object
   obj <- list(
      cv = cv,
      glob.ncomp = glob.ncomp,
      int.ncomp = int.ncomp,
      xmean = apply(x, 2, "mean"),
      int.width = int.width,
      int.num = int.num,
      int.niter = int.niter,
      int.limits = int.limits,
      center = center,
      scale = scale,
      method = method,
      ncomp.selcrit = ncomp.selcrit,
      silent = silent,
      xaxis.name = xaxis.name,
      xaxis.values = xaxis.values,
      x.test = x.test,
      y.test = y.test
   )

   # make a global model
   obj$gm <- pls(x, y, ncomp = obj$glob.ncomp, center = obj$center, scale = obj$scale, cv = obj$cv,
      ncomp.selcrit = obj$ncomp.selcrit, x.test = obj$x.test, y.test = obj$y.test, cv.scope = cv.scope)


   # get statistics for global model
   gmres <- if (is.null(obj$cv)) obj$gm$res$test else obj$gm$res$cv

   glob.stat <- data.frame(
      "n" = 0,
      "start" = 1,
      "end" = ncol(x),
      "nComp" = obj$gm$ncomp.selected,
      "RMSE" = gmres$rmse[1, obj$gm$ncomp.selected],
      "R2" = gmres$r2[1, obj$gm$ncomp.selected]
   )

   # initialize statistics for local models
   int.stat <- data.frame(
      "n" = 0,
      "start" = 1,
      "end" = ncol(x),
      "selected" = FALSE,
      "nComp" = obj$gm$ncomp.selected,
      "RMSE" = gmres$rmse[1, obj$gm$ncomp.selected],
      "R2" = gmres$r2[1, obj$gm$ncomp.selected]
   )

   # do iPLS
   obj <- switch(method,
      "forward" = ipls.forward(x, y, obj, int.stat, glob.stat, full, cv.scope),
      "backward" = ipls.backward(x, y, obj, int.stat, glob.stat, full, cv.scope),
      stop("Wrong value for parameter 'method'.")
   )

   # correct columns of int.stat
   if (!is.null(obj$int.stat)) {
      rownames(obj$int.stat) <- seq_len(nrow(obj$int.stat))
      obj$int.stat$R2 <- round(obj$int.stat$R2, 3)
      obj$int.stat$selected <- obj$int.stat$n %in% obj$int.selected
   }


   # prepare vector with selected variables
   int.limits <- obj$int.limits[obj$int.selected, , drop = FALSE]
   obj$var.selected <- apply(int.limits, 1, function(x) seq(x[1], x[2]))
   if (is.list(obj$var.selected)) obj$var.selected <- do.call(c, obj$var.selected)
   names(obj$var.selected) <- dim(obj$var.selected) <- NULL

   obj$om <- pls(x[, obj$var.selected, drop = FALSE], y, ncomp = obj$int.ncomp, center = obj$center,
      scale = obj$scale, cv = obj$cv, ncomp.selcrit = obj$ncomp.selcrit,
      x.test = obj$x.test[, obj$var.selected, drop = FALSE], y.test = obj$y.test, cv.scope = cv.scope)

   obj$call <- match.call()
   class(obj) <- "ipls"

   return(obj)
}

ipls.getintlimits <- function(int.limits, int.width, int.num, npred) {

   if (is.null(c(int.limits, int.width, int.num))) {
      stop("Specify either interval width or number of intervals.")
   }

   if (!is.null(int.limits)) {
      return(int.limits)
   }

   int.width <- if (is.null(int.num)) int.width else round(npred / int.num)
   int.num <- round(npred / int.width)

   # generate interval limits similar to the way from [1]
   if (int.num == npred) {
      return(cbind(seq_len(npred), seq_len(npred)))
   }

   varrem <- npred %% int.num
   nvarrem <- trunc(npred / int.num)

   int1 <- if (varrem == 0) seq(1, npred, by = nvarrem) else
         c(
            seq(1, (varrem - 1) * (nvarrem + 1) + 1, by = nvarrem + 1),
            seq((varrem - 1) * (nvarrem + 1) + 2 + nvarrem, npred, by = nvarrem)
         )
   int2 <- c(int1[2:int.num] - 1, npred)

   return(cbind(int1, int2))
}


#' Runs the forward iPLS algorithm
#'
#' @param x
#' a matrix with predictor values.
#' @param y
#' a vector with response values.
#' @param obj
#' object with initial settings for iPLS algorithm.
#' @param int.stat
#' data frame with initial interval statistics.
#' @param glob.stat
#' data frame with initial global statistics.
#' @param full
#' logical, if TRUE the procedure will continue even if no improvements is observed.
#' @param cv.scope
#' scope for center/scale operations inside CV loop: 'global' — using globally computed mean and std
#' or 'local' — recompute new for each local calibration set.
#'
ipls.forward <- function(x, y, obj, int.stat, glob.stat, full, cv.scope) {

   # define vectors with status, selected and non-selected intervals
   int.nonselected <- seq_len(obj$int.num)
   int.selected <- NULL


   if (!obj$silent) {
      fprintf("\nModel with all intervals: RMSE = %f, nLV = %d\n",
         int.stat$RMSE[1], int.stat$nComp[1])
   }

   # do loop for max number of intervals
   selind <- NULL
   rmse <- Inf
   for (i in seq_len(obj$int.niter)) {
      if (!obj$silent) fprintf("Iteration %3d/%3d... ", i, obj$int.niter)

      sel <- NULL
      rmse_loc <- Inf
      for (l in int.nonselected) {

         # combine already selected intervals with the current
         ind <- obj$int.limits[l, 1]:obj$int.limits[l, 2]
         xc <- x[, c(selind, ind), drop = FALSE]
         xt <- if (!is.null(obj$x.test)) obj$x.test[, c(selind, ind), drop = FALSE] else NULL

         # build a model
         m <- pls(xc, y, ncomp = obj$int.ncomp, center = obj$center, scale = obj$scale, cv = obj$cv,
            ncomp.selcrit = obj$ncomp.selcrit, x.test = xt, y.test = obj$y.test, cv.scope = cv.scope)
         lres <- if (is.null(obj$cv)) m$res$test else m$res$cv

         # if first round, build a data frame with statistics for each interval
         if (i == 1) {
            glob.stat <- rbind(glob.stat, data.frame(
               "n" = l,
               "start" = obj$int.limits[l, 1],
               "end" = obj$int.limits[l, 2],
               "nComp" = m$ncomp.selected,
               "RMSE" = lres$rmse[1, m$ncomp.selected],
               "R2" = lres$r2[1, m$ncomp.selected]
            ))
         }

         # else check if rmse has been improved
         if (rmse_loc > lres$rmse[1, m$ncomp.selected]) {
            ncomp <- m$ncomp.selected
            rmse_loc <- lres$rmse[1, m$ncomp.selected]
            r2 <- lres$r2[1, m$ncomp.selected]
            sel <- l
         }
      }

      # if no global improvements and parameter "full" is FALSE - stop
      if (rmse_loc > rmse && !full) {
         if (!obj$silent) cat("no improvements, stop.\n\n")
         break
      }

      # if do not stop set new RMSE value and continue
      rmse  <- rmse_loc

      selind <- c(selind, obj$int.limits[sel, 1]:obj$int.limits[sel, 2])
      int.nonselected <- int.nonselected[int.nonselected != sel]
      int.selected <- c(int.selected, sel)
      int.stat <- rbind(int.stat, data.frame(
         "n" = sel,
         "start" = obj$int.limits[sel, 1],
         "end" = obj$int.limits[sel, 2],
         "selected" = TRUE,
         "nComp" = ncomp,
         "RMSE" = rmse,
         "R2" = r2
      ))

      if (!obj$silent) {
         fprintf("selected interval %3d (RMSE = %f, nLV = %d)\n", sel, rmse, ncomp)
      }

   }

   # return the selection results
   obj$glob.stat <- glob.stat
   obj$int.stat <- int.stat

   # take only intervals which result in global RMSE minimum
   iter <- which.min(int.stat$RMSE[2:nrow(int.stat)])
   obj$int.selected <- int.stat$n[2:(iter + 1)]

   return(obj)
}

#' Runs the backward iPLS algorithm
#'
#' @param x
#' a matrix with predictor values.
#' @param y
#' a vector with response values.
#' @param obj
#' object with initial settings for iPLS algorithm.
#' @param int.stat
#' data frame with initial interval statistics.
#' @param glob.stat
#' data frame with initial global statistics.
#' @param full
#' logical, if TRUE the procedure will continue even if no improvements is observed.
#' @param cv.scope
#' scope for center/scale operations inside CV loop: 'global' — using globally computed mean and std
#' or 'local' — recompute new for each local calibration set.
#'
ipls.backward <- function(x, y, obj, int.stat, glob.stat, full, cv.scope) {

   # define vectors with status, selected and non-selected intervals
   int.selected <- seq_len(obj$int.num)
   int.nonselected <- NULL

   if (!obj$silent) {
      fprintf("\nModel with all intervals: RMSE = %f, nLV = %d\n",
         int.stat$RMSE[1], int.stat$nComp[1])
   }

   # do loop for max number of intervals
   unselind <- NULL
   rmse <- Inf
   for (i in seq_len(obj$int.niter)) {

      # if only one interval is left - stop
      if (length(int.selected) == 1) break

      if (!obj$silent) {
         fprintf("Iteration %3d/%3d... ", i, obj$int.niter)
      }

      # do loop to select an interval
      unsel <- NULL
      rmse_loc <- Inf
      for (l in int.selected) {
         ind <- obj$int.limits[l, 1]:obj$int.limits[l, 2]

         # combine already selected intervals with the current
         xc <- x[, -c(unselind, ind), drop = FALSE]
         xt <- if (!is.null(obj$x.test)) obj$x.test[, -c(unselind, ind), drop = FALSE] else NULL

         # build a model
         m <- pls(xc, y, ncomp = obj$int.ncomp, center = obj$center, scale = obj$scale, cv = obj$cv,
            ncomp.selcrit = obj$ncomp.selcrit, x.test = xt, y.test = obj$y.test, cv.scope = cv.scope)

         lres <- if (is.null(obj$cv)) m$res$test else m$res$cv

         # if first round, build a data frame with statistics for each interval
         if (i == 1) {
            glob.stat <- rbind(glob.stat, data.frame(
               "n" = l,
               "start" = obj$int.limits[l, 1],
               "end" = obj$int.limits[l, 2],
               "nComp" = m$ncomp.selected,
               "RMSE" = lres$rmse[1, m$ncomp.selected],
               "R2" = lres$r2[1, m$ncomp.selected]
            ))
         }

         # else check if rmse has been improved
         if (rmse_loc > lres$rmse[1, m$ncomp.selected]) {
            ncomp <- m$ncomp.selected
            rmse_loc <- lres$rmse[1, m$ncomp.selected]
            r2 <- lres$r2[1, m$ncomp.selected]
            unsel <- l
         }
      }

      # if no global improvements and parameter "full" is FALSE - stop
      if (rmse_loc > rmse && !full) {
         if (!obj$silent) cat("no improvements, stop.\n\n")
         break
      }

      # if unsel is not set - stop
      if (is.null(unsel)) break

      # if do not stop set new RMSE value and continue
      rmse  <- rmse_loc

      unselind <- c(unselind, obj$int.limits[unsel, 1]:obj$int.limits[unsel, 2])
      int.selected <- int.selected[int.selected != unsel]
      int.nonselected <- c(int.nonselected, unsel)

      int.stat <- rbind(int.stat, data.frame(
         "n" = unsel,
         "start" = obj$int.limits[unsel, 1],
         "end" = obj$int.limits[unsel, 2],
         "selected" = FALSE,
         "nComp" = ncomp,
         "RMSE" = rmse,
         "R2" = r2
      ))

      if (!obj$silent) {
         fprintf("excluded interval %3d (RMSE = %f, nLV = %d)\n", unsel, rmse, m$ncomp.selected)
      }
   }

   # return the selection results
   obj$glob.stat <- glob.stat
   obj$int.stat <- int.stat

   # take only intervals which result in global RMSE minimum
   iter <- which.min(int.stat$RMSE[2:nrow(int.stat)])
   obj$int.selected <- seq_len(obj$int.num)[-int.stat$n[2:(iter + 1)]]

   return(obj)
}

#' iPLS performance plot
#'
#' @description
#' Shows PLS performance for each selected or excluded intervals at the
#' first iteration.
#'
#' @param obj
#' iPLS results (object of class ipls).
#' @param glob.ncomp
#' number of components for global PLS model with all intervals.
#' @param main
#' main title for the plot.
#' @param xlab
#' label for x-axis.
#' @param ylab
#' label for y-axis.
#' @param xlim
#' limits for x-axis.
#' @param ylim
#' limits for y-axis.
#' @param ...
#' other arguments.
#'
#' @details
#' The plot shows intervals as bars, which height corresponds to RMSECV obtained when particular
#' interval was selected (forward) or excluded (backward) from a model at the first iteration.
#' The intervals found optimal after backward/forward iPLS selection are shown with green color
#' while the other intervals are gray.
#'
#' See examples in help for \code{\link{ipls}} function.
#'
#'  @seealso
#'  \code{\link{summary.ipls}}, \code{\link{plotRMSE.ipls}}
#'
#' @export
plotSelection.ipls <- function(obj, glob.ncomp = obj$gm$ncomp.selected, main = "iPLS results",
   xlab = obj$xaxis.name, ylab = if (is.null(obj$cv)) "RMSEP" else "RMSECV", xlim = NULL, ylim = NULL, ...) {

   if (glob.ncomp < 1 || glob.ncomp > obj$gm$ncomp) {
      stop("Wrong value for number of components.")
   }

   int <- obj$int.limits
   xlabels <- if (is.null(obj$xaxis.values)) seq_len(max(int)) else obj$xaxis.values
   if (!is.numeric(xlabels)) {
      stop("Parameter 'xlabels' should be a vector with numbers.")
   }

   if (length(xlabels) != (max(int) - min(int) + 1)) {
      stop("Wrong values for 'xlabels' parameter.")
   }

   # redefine intervals using xlabels
   int[, 1] <- xlabels[int[, 1]]
   int[, 2] <- xlabels[int[, 2]]

   # compute values for barplot
   mids <- apply(int, 1, mean)
   rmse <- obj$glob.stat$RMSE[2:nrow(obj$glob.stat)]
   ncomp <- obj$glob.stat$nComp[2:nrow(obj$glob.stat)]

   # adjust axis limits
   if (is.null(xlim)) xlim <- c(min(int), max(int))
   if (is.null(ylim)) ylim <- c(0, max(rmse) * 1.1)

   # rescale mean X values to fit the plot
   xmean <- (obj$xmean - min(obj$xmean)) / (max(obj$xmean) - min(obj$xmean)) * (ylim[2] - ylim[1])

   # make plot
   plot(0, 0, type = "n", main = main, xlab = xlab, ylab = ylab, xlim = xlim, ylim = ylim, ...)

   # show intervals as bars (gray - non selected, green - selected)
   nint <- obj$int.num
   selected <- rep(FALSE, nint)
   selected[obj$int.selected] <- TRUE

   rect(int[, 1], 0, c(int[2:nint, 1], int[nint, 2]), rmse,
      border = ifelse(selected, rgb(0.70, 0.80, 0.70), rgb(0.80, 0.80, 0.80)),
      col = ifelse(selected, rgb(0.5, 1.0, 0.6), rgb(0.9, 0.9, 0.9))
   )


   # show mean signal
   lines(xlabels, xmean, col = rgb(1.0, 0.7, 0.7), lwd = 2)

   # number of components for each interval
   text(mids,  matrix(0.05 * ylim[2], ncol = length(mids)), ncomp,
      col = rgb(0.4, 0.4, 0.4), cex = 0.85)

   dx <- diff(xlim) / 50
   abline(h = obj$gm$cvres$rmse[1, glob.ncomp], lty = 2, col = rgb(0.5, 0.5, 0.5))
   text(xlim[2] + dx, obj$gm$cvres$rmse[1, glob.ncomp], glob.ncomp, cex = 0.85,
        col = rgb(0.3, 0.3, 0.3), font = 2, pos = 3)
}

#' RMSE development plot
#'
#' @description
#' Shows how RMSE develops for each iteration of iPLS selection
#' algorithm.
#'
#' @param obj
#' iPLS results (object of class ipls).
#' @param glob.ncomp
#' number of components for global PLS model with all intervals.
#' @param main
#' main title for the plot.
#' @param xlab
#' label for x-axis.
#' @param ylab
#' label for y-axis.
#' @param xlim
#' limits for x-axis.
#' @param ylim
#' limits for y-axis.
#' @param ...
#' other arguments.
#'
#' @details
#' The plot shows RMSE values obtained at each iteration of the iPLS algorithm as bars. The first
#' bar correspond to the global model with all variables included, second - to the model obtained
#' at the first iteration and so on. Number at the bottom of each bar corresponds to the interval
#' included or excluded at the particular iteration.
#'
#' @seealso
#' \code{\link{summary.ipls}}, \code{\link{plotSelection.ipls}}
#'
#' @export
plotRMSE.ipls <- function(obj, glob.ncomp = obj$gm$ncomp.selected, main = "RMSE development",
   xlab = "Iterations", ylab = if (is.null(obj$cv)) "RMSEP" else "RMSECV", xlim = NULL, ylim = NULL, ...) {

   if (glob.ncomp < 1 || glob.ncomp > obj$gm$ncomp) {
      stop("Wrong value for number of components.")
   }

   rmse <- obj$int.stat$RMSE
   n <- obj$int.stat$n
   mids <- 0:(length(n) - 1)

   if (is.null(xlim)) xlim <- c(min(mids) - 0.5, max(mids) + 0.5)
   if (is.null(ylim)) ylim <- c(0, max(rmse) * 1.1)

   # make plot
   plot(0, 0, type = "n", main = main, xlab = xlab, ylab = ylab, xlim = xlim, ylim = ylim, ...)
   abline(h = rmse[1], lty = 2, col = "gray")
   lines(mids, rmse, type = "b", col = mdaplot.getColors(1), pch = 16)
}

#' Overview plot for iPLS results
#'
#' @description
#' Shows a plot for iPLS results.
#'
#' @param x
#' a  (object of class \code{pca}).
#' @param ...
#' other arguments.
#'
#' @details
#' See details for \code{\link{plotSelection.ipls}}.
#'
#' @export
plot.ipls <- function(x, ...) {
   plotSelection(x, ...)
}

#' Print method for iPLS
#'
#' @description
#' Prints information about the iPLS object structure
#'
#' @param x
#' a iPLS (object of class \code{ipls})
#' @param ...
#' other arguments
#'
#' @export
print.ipls <- function(x, ...) {
   cat("\niPLS results (class ipls)\n")
   cat("\nCall:\n")
   print(x$call)
   cat("\nMajor fields:\n")
   cat("$var.selected - vector with selected variables\n")
   cat("$int.selected - vector with selected intervals\n")
   cat("$int.num - number of intervals\n")
   cat("$int.width - width of the intervals\n")
   cat("$int.limits - limits for the intervals\n")
   cat("$int.stat - table with statistics for the interval selection results\n")
   cat("$glob.stat - table with statistics for the first iteration of the algorithm\n")
   cat("\nTry summary(obj) and plot(obj) to see details.\n")
}

#' Summary for iPLS results
#'
#' @description
#' Shows statistics and algorithm parameters for iPLS results.
#'
#' @param object
#' a iPLS (object of class \code{ipls}).
#' @param glob.ncomp
#' number of components for global PLS model with all intervals.
#' @param ...
#' other arguments.
#'
#' @details
#' The method shows information on the algorithm parameters as well as a table with selected or
#' excluded interval. The table has the following columns: 'step' showing on which iteration
#' an interval was selected or excluded, 'start and 'end' show variable indices for the interval,
#' 'nComp' is a number of components used in a model, 'RMSE' is RMSECV for the model and 'R2' is
#' coefficient of determination for the same model.
#'
#' @export
summary.ipls <- function(object, glob.ncomp = object$gm$ncomp.selected, ...) {

   if (glob.ncomp < 1 || glob.ncomp > object$gm$ncomp) {
      stop("Wrong value for number of components.")
   }

   glob.rmse <- object$gm$cvres$rmse[1, glob.ncomp]
   opt.ncomp <- object$om$ncomp.selected
   opt.rmse <- object$om$cvres$rmse[1, opt.ncomp]
   rmse.suffix <- if (is.null(object$cv)) "P" else "CV"

   cat("\niPLS variable selection results\n")
   fprintf("  Method: %s\n", object$method)
   fprintf("  Validation: %s\n", if (is.null(object$cv)) "test set" else crossval.str(object$cv))
   fprintf("  Number of intervals: %d\n", object$int.num)
   fprintf("  Number of selected intervals: %d\n", length(object$int.selected))
   fprintf("  RMSE%s for global model: %f (%d LVs)\n", rmse.suffix, glob.rmse, glob.ncomp)
   fprintf("  RMSE%s for optimized model: %f (%d LVs)\n", rmse.suffix, opt.rmse, opt.ncomp)
   cat("\nSummary for selection procedure:\n")
   show(object$int.stat)
   cat("\n")
}
Any scripts or data that you put into this service are public.
mdatools documentation built on Sept. 11, 2024, 7:59 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
mdatools
Multivariate Data Analysis for Chemometrics

R/ipls.R
In mdatools: Multivariate Data Analysis for Chemometrics

Defines functions summary.ipls print.ipls plot.ipls plotRMSE.ipls plotSelection.ipls ipls.backward ipls.forward ipls.getintlimits ipls

Documented in ipls ipls.backward ipls.forward plot.ipls plotRMSE.ipls plotSelection.ipls print.ipls summary.ipls

Try the mdatools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

mdatools Multivariate Data Analysis for Chemometrics

R/ipls.R In mdatools: Multivariate Data Analysis for Chemometrics

Defines functions summary.ipls print.ipls plot.ipls plotRMSE.ipls plotSelection.ipls ipls.backward ipls.forward ipls.getintlimits ipls

Documented in ipls ipls.backward ipls.forward plot.ipls plotRMSE.ipls plotSelection.ipls print.ipls summary.ipls

Try the mdatools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

mdatools
Multivariate Data Analysis for Chemometrics

R/ipls.R
In mdatools: Multivariate Data Analysis for Chemometrics