spca: Supervised (and unsupervised) principal components
In jpiironen/dimreduce: Supervised Dimension Reduction


View source: R/spca.R

Computes a dimension reduction based on the supervised principal components algorithm. In essence, the algorithm performs a screening step based on univariate scores for the features, and then computes standard PCA using only the retained features.
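
The two-step idea (screen, then PCA) can be sketched in base R. This is only an illustration under stated assumptions: `screen_then_pca` is a hypothetical helper that is not part of dimreduce, the univariate score is taken to be the absolute correlation with `y`, and a fixed number of top-scoring features is kept. The scores and screening rule used by `spca` itself may differ.

```r
# Hypothetical helper, not part of dimreduce: screen features by a
# univariate score, then run ordinary PCA on the retained columns only.
screen_then_pca <- function(x, y, nfeat = 10, ncomp = 2) {
  # Assumed score: absolute correlation between each column of x and y
  score <- abs(drop(cor(x, y)))
  # Indices of the nfeat highest-scoring features
  keep <- order(score, decreasing = TRUE)[seq_len(min(nfeat, ncol(x)))]
  # Standard PCA on the centered and scaled retained features
  pca <- prcomp(x[, keep, drop = FALSE], center = TRUE, scale. = TRUE)
  list(keep = keep, z = pca$x[, seq_len(ncomp), drop = FALSE])
}

# Simulated data in which only the first two features carry signal
set.seed(1)
n <- 100
p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + x[, 2] + rnorm(n, sd = 0.5)

fit <- screen_then_pca(x, y, nfeat = 5, ncomp = 2)
fit$keep      # columns that survived the screening
dim(fit$z)    # n rows, ncomp latent features
```

Because the PCA only sees the screened columns, the leading components are driven by features that are related to `y` rather than by the overall variance in `x`.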

spca(
  x,
  y = NULL,
  nctot = NULL,
  ncsup = NULL,
  window = 500,
  exclude = NULL,
  verbose = TRUE,
  normalize = FALSE,
  preprocess = TRUE,
  alpha = NULL,
  perms = 1000,
  screenthresh = NULL,
  nfeat = NULL,
  sup.only = FALSE,
  ...
)

`x`	The original feature matrix, columns denoting the features and rows the instances.
`y`	A vector with the observed target values we try to predict using `x`. Can be a factor for classification problems. If missing, this function computes standard unsupervised principal components.
`nctot`	Total number of latent features to extract.
`ncsup`	Maximum number of latent features to extract that use supervision. If `nctot > ncsup`, then the remaining `nctot-ncsup` features are computed in an unsupervised manner (or ignored if `sup.only=TRUE`).
`window`	Maximum number of features that will survive the screening and from which the supervised components are computed. Also affects how the `screenthresh` argument is interpreted.
`exclude`	Columns (variables) in `x` to ignore when extracting the new features.
`verbose`	Whether to print some messages along the way.
`normalize`	Whether to scale the extracted features so that they all have standard deviation of one.
`preprocess`	Whether to center and scale the features before extracting the new features.
`alpha`	Significance level for the p-values of the univariate scores used to determine which features survive the screening and are used to compute the supervised components.
`perms`	Number of permutations to estimate the p-values for univariate scores.
`screenthresh`	Value between 0 and 1 (or `NULL`). If not `NULL`, no permutation tests are run, and the supervised components are computed from those features whose univariate score is equal to or larger than this value. A value of 1 means that only the feature with the highest score survives the screening, whereas a value of 0 means that the top `min(window, ncol(x))` features survive the screening. Also overrides the `nfeat` argument.
`nfeat`	Number of features to retain in the screening step. If this option is used, then the algorithm does not perform the permutation tests for the p-values, but instead computes the supervised components from those features that have their univariate score among the `nfeat` highest scores (in this case `perms` and `alpha` are ignored).
`sup.only`	If `TRUE`, then no unsupervised components are ever computed, even if fewer than `nctot` supervised components can be extracted.
`...`	Currently ignored.

An spca object, similar to the object returned by prcomp. The object has the following elements:

w: The projection (or rotation) matrix W that transforms the original data X into the new features Z = XW.
z: The extracted latent features corresponding to the training inputs X.
v: Matrix V that is used to compute W when combining supervised and unsupervised components (see Piironen and Vehtari (2018) for more information).
sdev: Standard deviations of the new features.
centers: Mean values for the original variables.
scales: Scales of the original variables.
exclude: Excluded variables.
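
The relation Z = XW can be checked with `prcomp`, which the returned object is said to resemble. The sketch below uses only base R and assumes that X stands for the centered and scaled data. Whether `spca` applies its `centers` and `scales` elements in exactly this way is an assumption, not something this page confirms.

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Preprocess x with the stored centers and scales, then project with W
xs <- scale(x, center = pca$center, scale = pca$scale)
w <- pca$rotation          # plays the role of the projection matrix W
z <- xs %*% w              # latent features Z = XW

# Both differences should be zero up to floating point error
max_diff <- max(abs(z - pca$x))
sdev_diff <- max(abs(apply(z, 2, sd) - pca$sdev))
```

In `prcomp` the stored scores `pca$x` and standard deviations `pca$sdev` agree with the manual projection, which is the same relationship described above for the `z`, `w`, and `sdev` elements.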

In the original paper (Bair et al., 2006), the authors proposed estimating the screening threshold using cross-validation for the model obtained when the extracted features are used for regression or classification. This implementation instead performs the screening based on estimated p-values for the univariate scores (estimated using a permutation test), and the screening step retains only those features with a p-value less than the specified level `alpha`.
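
A generic permutation test for univariate scores can be sketched as follows. This is a minimal illustration, not the package's code: `perm_pvalues` is a hypothetical helper, the score is assumed to be the absolute correlation with `y`, and the p-value uses the common add-one correction. The estimator used inside `spca` may differ in these details.

```r
# Hypothetical helper, not part of dimreduce: permutation p-values for
# the absolute correlation between each column of x and y.
perm_pvalues <- function(x, y, perms = 200) {
  score <- function(yy) abs(drop(cor(x, yy)))
  observed <- score(y)
  exceed <- numeric(ncol(x))
  for (b in seq_len(perms)) {
    # Shuffling y breaks any association with x (the null hypothesis)
    exceed <- exceed + (score(sample(y)) >= observed)
  }
  # Add-one correction keeps the estimated p-values strictly positive
  (exceed + 1) / (perms + 1)
}

# Simulated data in which only the first feature is related to y
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] + rnorm(n)

pval <- perm_pvalues(x, y, perms = 200)
retained <- which(pval < 0.05)   # features that would survive screening
```

With `perms` permutations the smallest attainable p-value is `1 / (perms + 1)`, so the number of permutations limits how small a level `alpha` can meaningfully be.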

Bair, E., Hastie, T., Paul, D., and Tibshirani, R. (2006). Prediction by supervised principal components. Journal of the American Statistical Association, 101(473):119-137.

Piironen, J. and Vehtari, A. (2018). Iterative supervised principal components. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 84:106-114.

###

# load data
data("ovarian", package = "dimreduce")
x <- ovarian$x
y <- ovarian$y

# dimension reduction
dr <- spca(x, y, nctot = 2)
z <- predict(dr, x) # the latent features