The plsmod package is an extension to the parsnip package that enables tidymodels users to fit several types of partial least squares (PLS) models. If you are unfamiliar with tidymodels, please see the introductory help pages at tidymodels.org.

With tidymodels, several engines may be available for fitting a particular model. For PLS, the possible engines are:

library(plsmod)
show_engines("pls")

An example

For a demonstration of regression modeling, we'll use the Tecator data in the modeldata package:

library(dplyr)
data(meats, package = "modeldata")

Note that using tidymodels_prefer() will result in getting parsnip::pls() instead of mixOmics::pls() when simply running pls().

Although plsmod can fit multivariate models, we'll concentrate on a univariate model that predicts the percentage of protein in the samples.

meats <- meats %>% select(-water, -fat)

We define a sparse PLS model by setting the predictor_prop argument to a value less than one. This allows the model fitting process to set certain loadings to zero via regularization.

sparse_pls_spec <- 
  pls(num_comp = 10, predictor_prop = 1 / 3) %>% 
  set_engine("mixOmics") %>% 
  set_mode("regression")

The model is fit either with a formula or by passing the predictors and outcomes separately:

form_fit <- 
  sparse_pls_spec %>% 
  fit(protein ~ ., data = meats)
form_fit

# or 

sparse_pls_spec %>% 
  fit_xy(x = meats %>% select(-protein), y = meats$protein)
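
To see the effect of predictor_prop on the fitted model, one option is to count the non-zero predictor loadings per component. The sketch below is not part of the plsmod documentation: it assumes that the `fit` element of the parsnip object is the underlying mixOmics object and that its `loadings$X` element is a predictors-by-components matrix. The names `sparse_fit`, `x_loadings`, and `nonzero_per_comp` are chosen here for illustration.

```r
library(plsmod)
library(dplyr)

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

# Same sparse specification as above, fit with the formula interface
sparse_fit <-
  pls(num_comp = 10, predictor_prop = 1 / 3) %>%
  set_engine("mixOmics") %>%
  set_mode("regression") %>%
  fit(protein ~ ., data = meats)

# Assumption: `$fit` holds the mixOmics object and `loadings$X` is a
# matrix with one row per predictor and one column per component.
x_loadings <- sparse_fit$fit$loadings$X

# Number of predictors with a non-zero loading on each component
nonzero_per_comp <- colSums(x_loadings != 0)
nonzero_per_comp
```

If the assumption holds, each component should retain only a fraction of the predictors, with the remaining loadings equal to zero.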

The pls() function can also be used with categorical outcomes, provided that the outcome column is an R factor vector.
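
As a minimal sketch of the classification case, the example below uses the built-in iris data, which is not used elsewhere in this document, because its Species column is already a factor. The object names are illustrative, and `type = "prob"` is the usual parsnip option for class probabilities; it is assumed here that the mixOmics engine supports it.

```r
library(plsmod)
library(dplyr)

# Species is a factor with three levels, so the mode is "classification"
pls_da_spec <-
  pls(num_comp = 2) %>%
  set_engine("mixOmics") %>%
  set_mode("classification")

pls_da_fit <-
  pls_da_spec %>%
  fit(Species ~ ., data = iris)

# Hard class predictions
class_pred <- predict(pls_da_fit, head(iris))
class_pred

# Class probabilities (assumed to be available for this engine)
prob_pred <- predict(pls_da_fit, head(iris), type = "prob")
prob_pred
```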

The number of components and the amount of sparsity can be optimized using the tune package. See Chapter 17 of Tidy Modeling with R for more information and examples on how to tune model hyperparameters using tidymodels.
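
One possible way to set this up is a small grid search over both parameters with cross-validation. This is a sketch rather than a recommended workflow: the fold count, the seed, and the grid values are arbitrary choices made here, and the object names are illustrative.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

# Mark both parameters for tuning
pls_tune_spec <-
  pls(num_comp = tune(), predictor_prop = tune()) %>%
  set_engine("mixOmics") %>%
  set_mode("regression")

# 5-fold cross-validation (arbitrary choice for illustration)
set.seed(123)
folds <- vfold_cv(meats, v = 5)

# A small, hand-picked grid of candidate values
pls_grid <- expand.grid(
  num_comp = c(2L, 5L, 10L),
  predictor_prop = c(0.25, 0.5, 1)
)

pls_res <- tune_grid(
  pls_tune_spec,
  protein ~ .,
  resamples = folds,
  grid = pls_grid
)

# Candidates ranked by cross-validated RMSE
show_best(pls_res, metric = "rmse")
```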

For prediction, the basic predict() method can be used:

predict(form_fit, head(meats))
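
The predictions come back as a tibble, so they can be placed next to the observed values. The sketch below assumes the univariate regression predictions are in a column named `.pred`, the usual parsnip convention. It computes the RMSE on the same data used for fitting, which is an optimistic estimate and is shown only to illustrate the mechanics.

```r
library(plsmod)
library(dplyr)

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

form_fit <-
  pls(num_comp = 10, predictor_prop = 1 / 3) %>%
  set_engine("mixOmics") %>%
  set_mode("regression") %>%
  fit(protein ~ ., data = meats)

# Predictions for every sample, side by side with the observed protein values
results <-
  predict(form_fit, meats) %>%
  bind_cols(meats %>% select(protein))

# Resubstitution RMSE (computed on the training data itself)
train_rmse <- sqrt(mean((results$protein - results$.pred)^2))
train_rmse
```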


topepo/projections documentation built on April 13, 2024, 12:16 a.m.