step_spca: Sparse Principal Components Analysis Variable Reduction
In brian-j-smith/MachineShop: Machine Learning Models and Tools

Description

Creates a specification of a recipe step that will derive sparse principal components from one or more numeric variables.

Usage

step_spca(
  recipe,
  ...,
  num_comp = 5,
  sparsity = 0,
  num_var = integer(),
  shrinkage = 1e-06,
  center = TRUE,
  scale = TRUE,
  max_iter = 200,
  tol = 0.001,
  replace = TRUE,
  prefix = "SPCA",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("spca")
)

## S3 method for class 'step_spca'
tunable(x, ...)

Arguments

`recipe`	recipe object to which the step will be added.
`...`	one or more selector functions to choose which variables will be used to compute the components. See `selections` for more details. These are not currently used by the `tidy` method.
`num_comp`	number of components to derive. The value of `num_comp` will be constrained to a minimum of 1 and maximum of the number of original variables when `prep` is run.
`sparsity`, `num_var`	sparsity (L1 norm) penalty for each component (`sparsity`), or number of variables with non-zero loadings in each component (`num_var`). Larger sparsity values produce more zero loadings. Argument `sparsity` is ignored if `num_var` is given. Either argument may be a single number applied to all components or a vector of component-specific numbers.
`shrinkage`	numeric shrinkage (quadratic) penalty for the components to improve conditioning; larger values produce more shrinkage of component loadings toward zero.
`center`, `scale`	logicals indicating whether to mean center and standard deviation scale the original variables prior to deriving components, or functions or names of functions for the centering and scaling.
`max_iter`	maximum number of algorithm iterations allowed.
`tol`	numeric tolerance for the convergence criterion.
`replace`	logical indicating whether to replace the original variables.
`prefix`	character string prefix added to a sequence of zero-padded integers to generate names for the resulting new variables.
`role`	analysis role that added step variables should be assigned. By default, they are designated as model predictors.
`skip`	logical indicating whether to skip the step when the recipe is baked. While all operations are baked when `prep` is run, some operations may not be applicable to new data (e.g. processing outcome variables). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.
`id`	unique character string to identify the step.
`x`	`step_spca` object.
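
The two ways of controlling sparsity described above can be sketched as follows. This is a minimal illustration, assuming that packages MachineShop, recipes, and elasticnet are installed; the object names (`rec_penalty`, `rec_numvar`, `prepped`) and the particular values are chosen only for the example.

```r
library(recipes)
library(MachineShop)  # provides step_spca; elasticnet must also be installed

rec <- recipe(rating ~ ., data = attitude)

# A single L1 penalty shared by all components
rec_penalty <- rec %>%
  step_spca(all_predictors(), num_comp = 2, sparsity = 1)

# Component-specific numbers of non-zero loadings: 3 variables for the
# first component and 2 for the second; sparsity is ignored when
# num_var is given
rec_numvar <- rec %>%
  step_spca(all_predictors(), num_comp = 2, num_var = c(3, 2))

# Components are estimated only when the recipe is prepped
prepped <- prep(rec_numvar, training = attitude)
tidy(prepped, number = 1)
```

Specifying `num_var` may be easier to interpret than a penalty value, since it states directly how many variables enter each component.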

Details

Sparse principal components analysis (SPCA) is a variant of PCA in which the original variables may have zero loadings in the linear combinations that form the components.
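
To make the idea of zero loadings concrete, the base R sketch below soft-thresholds ordinary PCA loadings. It is an illustration only and is not the penalized algorithm of Zou et al. (2006) or the computation performed by `step_spca`; the helper `soft_threshold` and the threshold value 0.3 are hypothetical choices made for the example.

```r
# Predictors of the attitude data, centered and scaled
x <- scale(attitude[, -1])

# Ordinary PCA: every variable generally has a non-zero loading
pca <- prcomp(x)
dense <- pca$rotation[, 1:2]

# Hypothetical helper: shrink each loading toward zero by lambda and
# set loadings smaller than lambda in magnitude to exactly zero
soft_threshold <- function(v, lambda) sign(v) * pmax(abs(v) - lambda, 0)

sparse <- apply(dense, 2, soft_threshold, lambda = 0.3)

# Rescale each column of loadings back to unit length
sparse <- sweep(sparse, 2, sqrt(colSums(sparse^2)), "/")

colSums(dense != 0)   # number of variables contributing to each dense component
colSums(sparse != 0)  # number of variables contributing after thresholding

# Component scores are linear combinations of the original variables;
# variables with zero loadings drop out of the combination
scores <- x %*% sparse
```

Variables whose loadings are exactly zero play no part in a component, which is what makes sparse components easier to interpret than ordinary ones.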

Value

Function `step_spca` creates a new step whose class is of the same name and inherits from `step_lincomp`, adds it to the sequence of existing steps (if any) in the recipe, and returns the updated recipe. The `tidy` method returns a tibble with columns `terms` (selectors or variables selected), `weight` (loading of each variable in the components), and `name` (names of the new variables), and with attribute `pev` containing the proportions of explained variation.
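
As a brief sketch of how the returned values might be inspected, assuming MachineShop, recipes, and elasticnet are installed (the object names `spca_prep` and `loadings` are illustrative only):

```r
library(recipes)
library(MachineShop)  # provides step_spca; elasticnet must also be installed

# Add the step and estimate the components on the training data
spca_prep <- recipe(rating ~ ., data = attitude) %>%
  step_spca(all_predictors(), num_comp = 3, sparsity = 1) %>%
  prep(training = attitude)

# Loadings of each original variable in each component
loadings <- tidy(spca_prep, number = 1)
loadings

# Proportions of explained variation, stored as an attribute
attr(loadings, "pev")

# Data with the derived components in place of the original predictors
# (replace = TRUE by default)
bake(spca_prep, attitude)
```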

References

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.

Examples


## Requires prior installation of suggested package elasticnet to run

library(recipes)
library(MachineShop)  # provides step_spca

rec <- recipe(rating ~ ., data = attitude)
spca_rec <- rec %>%
  step_spca(all_predictors(), num_comp = 5, sparsity = 1)
spca_prep <- prep(spca_rec, training = attitude)
spca_data <- bake(spca_prep, attitude)

pairs(spca_data, lower.panel = NULL)

tidy(spca_rec, number = 1)
tidy(spca_prep, number = 1)

brian-j-smith/MachineShop documentation built on June 12, 2025, 3:52 a.m.