spca: Perform supervised principal component analysis (SPCA) on the...

View source: R/dimension_reduction.R

spca R Documentation

Description

Perform supervised principal component analysis (SPCA) on the data. The optimal threshold is estimated using cross-validation, and that threshold determines which subset of the columns is retained. See the original paper (https://tibshirani.su.domains/ftp/spca.pdf) for details.
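
For orientation, the core steps of SPCA at a single fixed threshold can be sketched as below. This is a minimal illustration on synthetic data, not this package's implementation: the scoring rule (absolute t-statistic from a univariate linear regression), the threshold value, and all variable names are assumptions made for the example.

```r
# Minimal SPCA sketch on synthetic data (assumed scoring rule, not the package's code).
set.seed(1)
n <- 200
p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
# Only the first three columns drive the response.
y <- as.numeric(X[, 1:3] %*% c(2, -2, 1.5) + rnorm(n))

# Step 1: one univariate score per column, here the absolute t-statistic
# of the slope in a simple linear regression of y on that column.
scores <- apply(X, 2, function(x) {
  abs(summary(lm(y ~ x))$coefficients["x", "t value"])
})

# Step 2: keep only the columns whose score exceeds the threshold.
threshold <- 3
keep <- names(scores)[scores > threshold]

# Step 3: run PCA on the retained columns and regress y on the first component.
pc1 <- prcomp(X[, keep, drop = FALSE], center = TRUE, scale. = TRUE)$x[, 1]
fit <- lm(y ~ pc1)
```

Raising the threshold keeps fewer columns, which is why its value has to be tuned rather than fixed in advance.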

Usage

spca(
  df,
  X_cols,
  y_col = "CASE_CNTL",
  min.threshold = 0.1,
  max.threshold = 5,
  num.thresholds = 100,
  test.size = 0.3,
  verbose = TRUE
)

Arguments

df

(data.frame) NEBCS data

X_cols

(array<character>) column names for the explanatory variables

y_col

(character) (default='CASE_CNTL') column name for the response variable

min.threshold

(number) (default=0.1) minimum threshold for the dimension reduction process

max.threshold

(number) (default=5) maximum threshold for the dimension reduction process

num.thresholds

(integer) (default=100) number of thresholds to use

test.size

(number) (default=0.3) fraction of the data (0.3 = 30%) to use for testing during cross-validation

verbose

(logical) (default=TRUE) whether to print progress to the console
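
One plausible reading of the threshold and test.size arguments is sketched below: an evenly spaced grid of candidate thresholds and a random holdout split. Both the even spacing and the use of a single random split are assumptions; the package source may construct them differently. The row count n is a stand-in for nrow(df).

```r
# Hypothetical interpretation of the arguments; the package may differ.
min.threshold <- 0.1
max.threshold <- 5
num.thresholds <- 100
test.size <- 0.3

# Evenly spaced candidate thresholds between the two bounds (inclusive).
thresholds <- seq(min.threshold, max.threshold, length.out = num.thresholds)

# Random holdout split: test.size is treated as a fraction of the rows.
set.seed(42)
n <- 250  # stand-in for nrow(df)
test_idx <- sample(n, size = round(test.size * n))
train_idx <- setdiff(seq_len(n), test_idx)
```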

Value

(list) Named list with the following components:

- cols (number) optimal columns (lowest mse)
- model (number) optimal model (lowest mse)
- mse (number) optimal mean squared error
- threshold (number) optimal threshold
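
To show how a result of this shape could arise, the sketch below scans a threshold grid on synthetic data and keeps the candidate with the lowest held-out mean squared error. It is an illustration under assumptions, not the package's code: the scoring rule, the single holdout split, the use of only the first principal component, and the linear model are all choices made for the example.

```r
# Sketch: pick the threshold with the lowest held-out MSE (assumed procedure).
set.seed(7)
n <- 300
p <- 8
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- as.numeric(X[, 1:2] %*% c(2, 2) + rnorm(n))

# Hold out 30% of the rows for evaluation.
test_idx <- sample(n, size = round(0.3 * n))
Xtr <- X[-test_idx, ]
ytr <- y[-test_idx]
Xte <- X[test_idx, ]
yte <- y[test_idx]

# Univariate scores are computed on the training rows only.
scores <- apply(Xtr, 2, function(x) {
  abs(summary(lm(ytr ~ x))$coefficients["x", "t value"])
})

best <- list(cols = NULL, model = NULL, mse = Inf, threshold = NA_real_)
for (th in seq(0.1, 5, length.out = 100)) {
  cols <- names(scores)[scores > th]
  if (length(cols) == 0) next  # nothing survives this threshold
  pca <- prcomp(Xtr[, cols, drop = FALSE], center = TRUE, scale. = TRUE)
  model <- lm(y ~ pc1, data = data.frame(y = ytr, pc1 = pca$x[, 1]))
  # Project the held-out rows onto the same component before predicting.
  pc1_te <- predict(pca, newdata = Xte[, cols, drop = FALSE])[, 1]
  mse <- mean((yte - predict(model, newdata = data.frame(pc1 = pc1_te)))^2)
  if (mse < best$mse) {
    best <- list(cols = cols, model = model, mse = mse, threshold = th)
  }
}
```

Here best mirrors the component names listed above, so best$cols, best$model, best$mse and best$threshold can be read the same way as the fields of the returned list.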

Examples

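# Fit SPCA; y_col and the threshold arguments keep their defaults
res <- spca(df, X_cols=c("ARSENIC", ...))
res$model  # optimal model (lowest mse)
res$cols   # columns retained at the optimal threshold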

paulsavala/nebcs documentation built on March 20, 2022, 9:24 a.m.