plsmod-package: parsnip methods for partial least squares (PLS)

Description

plsmod offers a function to fit ordinary, sparse, and discriminant analysis PLS models.

Details

The model function works with the tidymodels infrastructure so that the model can be resampled, tuned, tidied, etc.
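As an illustration of the tuning side, here is a minimal sketch that marks both main arguments of pls() with tune() and searches a small grid with 5-fold cross-validation on the meats data used below. It assumes the rsample, dials, and tune packages that library(tidymodels) attaches, and assumes dials provides a predictor_prop() parameter object. The grid ranges, the seed, and the object names are illustrative choices, not values recommended by plsmod.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

# Mark both main arguments as tuning parameters
tune_spec <-
  pls(num_comp = tune(), predictor_prop = tune()) %>%
  set_engine("mixOmics") %>%
  set_mode("regression")

set.seed(123)
meats_folds <- vfold_cv(meats, v = 5)

# 4 x 4 regular grid; the ranges are illustrative, not package defaults
pls_grid <- grid_regular(
  num_comp(range = c(1L, 10L)),
  predictor_prop(range = c(0.1, 1)),
  levels = 4
)

# Fit every grid point on every resample, using a formula as the preprocessor
pls_res <- tune_grid(
  tune_spec,
  protein ~ .,
  resamples = meats_folds,
  grid = pls_grid
)

# Top candidates ranked by resampled RMSE
best_pls <- show_best(pls_res, metric = "rmse")
best_pls
```

The same pattern works with a workflow in place of the formula if preprocessing steps are needed.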

Examples

For regression, let’s use the Tecator data in the modeldata package:

library(tidymodels)
library(plsmod)
tidymodels_prefer()
theme_set(theme_bw())

data(meats, package = "modeldata")

Note that calling tidymodels_prefer() means that simply running pls() gets you parsnip::pls() rather than mixOmics::pls().
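If you would rather not call tidymodels_prefer(), a namespace-qualified call avoids the ambiguity just as well. This is a small sketch; pls_spec is a hypothetical name used only here.

```r
library(tidymodels)
library(plsmod)

# The explicit namespace selects parsnip's pls() even if another attached
# package (such as mixOmics) also exports a function called pls()
pls_spec <- parsnip::pls(num_comp = 10, predictor_prop = 1/3)
pls_spec
```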

Although plsmod can fit multivariate models, we’ll concentrate on a univariate model that predicts the percentage of protein in the samples.

meats <- meats %>% select(-water, -fat)

We define a sparse PLS model by setting the predictor_prop argument to a value less than one. This allows the model fitting process to set certain loadings to zero via regularization.
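To see how the proportion turns into a count of retained predictors, here is a back-of-the-envelope calculation. After the select() call above, meats has 100 absorbance predictors, and one third of 100 is 33.3. The fitted model printed below reports 34 selected X variables on each of the 10 components, which is consistent with rounding up. The ceiling() rule is an assumption for illustration; plsmod's internal conversion to mixOmics' keepX argument may be written differently.

```r
# Hypothetical illustration: proportion of predictors -> predictors kept per component
# Assumption: the count is rounded up (ceiling); plsmod's exact rule may differ.
n_predictors   <- 100   # absorbance columns left in `meats` after dropping water and fat
predictor_prop <- 1/3
num_comp       <- 10

keep_x <- rep(ceiling(predictor_prop * n_predictors), num_comp)
keep_x
```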

sparse_pls_spec <- 
  pls(num_comp = 10, predictor_prop = 1/3) %>% 
  set_engine("mixOmics") %>% 
  set_mode("regression")

The model is fit either with a formula or by passing the predictors and outcomes separately:

form_fit <- 
  sparse_pls_spec %>% 
  fit(protein ~ ., data = meats)
form_fit
## parsnip model object
## 
## 
## Call:
##  mixOmics::spls(X = x, Y = y, ncomp = ncomp, keepX = keepX) 
## 
##  sPLS with a 'regression' mode with 10 sPLS components. 
##  You entered data X of dimensions: 215 100 
##  You entered data Y of dimensions: 215 1 
## 
##  Selection of [34] [34] [34] [34] [34] [34] [34] [34] [34] [34] variables on each of the sPLS components on the X data set. 
##  Selection of [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] variables on each of the sPLS components on the Y data set. 
## 
##  Main numerical outputs: 
##  -------------------- 
##  loading vectors: see object$loadings 
##  variates: see object$variates 
##  variable names: see object$names 
## 
##  Functions to visualise samples: 
##  -------------------- 
##  plotIndiv, plotArrow 
## 
##  Functions to visualise variables: 
##  -------------------- 
##  plotVar, plotLoadings, network, cim
# or 

sparse_pls_spec %>% 
  fit_xy(x = meats %>% select(-protein), y = meats$protein)
## parsnip model object
## 
## 
## Call:
##  mixOmics::spls(X = x, Y = y, ncomp = ncomp, keepX = keepX) 
## 
##  sPLS with a 'regression' mode with 10 sPLS components. 
##  You entered data X of dimensions: 215 100 
##  You entered data Y of dimensions: 215 1 
## 
##  Selection of [34] [34] [34] [34] [34] [34] [34] [34] [34] [34] variables on each of the sPLS components on the X data set. 
##  Selection of [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] variables on each of the sPLS components on the Y data set. 
## 
##  Main numerical outputs: 
##  -------------------- 
##  loading vectors: see object$loadings 
##  variates: see object$variates 
##  variable names: see object$names 
## 
##  Functions to visualise samples: 
##  -------------------- 
##  plotIndiv, plotArrow 
## 
##  Functions to visualise variables: 
##  -------------------- 
##  plotVar, plotLoadings, network, cim
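Once fitted, the object works with parsnip’s predict(). The sketch below holds out 20% of the samples with rsample’s initial_split(), refits the same sparse specification on the rest, and scores the held-out protein values with yardstick’s rmse(). The split proportion, the seed, and the object names are illustrative assumptions.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

# Hold out 20% of the samples to check the model on new data
set.seed(2)
meats_split <- initial_split(meats, prop = 0.8)
meats_train <- training(meats_split)
meats_test  <- testing(meats_split)

sparse_pls_fit <-
  pls(num_comp = 10, predictor_prop = 1/3) %>%
  set_engine("mixOmics") %>%
  set_mode("regression") %>%
  fit(protein ~ ., data = meats_train)

# Numeric predictions come back as a tibble with a `.pred` column
test_pred <- predict(sparse_pls_fit, new_data = meats_test)

# Root mean squared error on the held-out samples
test_pred %>%
  bind_cols(meats_test %>% select(protein)) %>%
  rmse(truth = protein, estimate = .pred)
```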

The pls() function can also be used with categorical outcomes by setting the mode to "classification".
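For instance, here is a minimal sketch of a discriminant analysis PLS model on R’s built-in iris data, which has a three-level Species factor and four numeric predictors. The choice of data set, the two components, and the object names are assumptions for illustration. The type = "prob" call assumes the classification engine registers class probability predictions.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

# iris has four numeric predictors, so at most four components are possible
plsda_spec <-
  pls(num_comp = 2) %>%
  set_engine("mixOmics") %>%
  set_mode("classification")

plsda_fit <- plsda_spec %>% fit(Species ~ ., data = iris)

# Hard class predictions (.pred_class) and class probabilities (.pred_<level>)
class_pred <- predict(plsda_fit, new_data = head(iris))
class_prob <- predict(plsda_fit, new_data = head(iris), type = "prob")
class_pred
class_prob
```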

Author(s)

Maintainer: Max Kuhn <max@rstudio.com>

Other contributors:

  • RStudio [copyright holder]

topepo/projections documentation built on Sept. 17, 2022, 12:03 p.m.