spca: Supervised (and unsupervised) principal components



Description

Computes a dimension reduction based on the supervised principal components algorithm. In essence, the algorithm performs a screening step based on univariate scores for the features, and then computes standard PCA using only the retained features.
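The two steps described above (screening, then PCA on the survivors) can be illustrated with a minimal base-R sketch. This is not the dimreduce implementation: the function name spca_sketch and its argument nkeep are hypothetical, and the univariate score is assumed to be the absolute Pearson correlation between each feature and a numeric target.

```r
# Hypothetical sketch of supervised PCA, not the dimreduce code.
# Assumption: univariate score = |cor(feature, y)|, and the `nkeep`
# highest-scoring features survive the screening.
spca_sketch <- function(x, y, nkeep, ncomp = 1) {
  x <- scale(x)                                   # center and scale each feature
  scores <- abs(cor(x, y))[, 1]                   # one univariate score per feature
  keep <- order(scores, decreasing = TRUE)[seq_len(nkeep)]
  # standard PCA computed only on the retained features
  pca <- prcomp(x[, keep, drop = FALSE], center = FALSE, scale. = FALSE)
  list(keep = keep, z = pca$x[, seq_len(ncomp), drop = FALSE])
}

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 20), n, 20)
y <- x[, 1] + x[, 2] + rnorm(n, sd = 0.5)         # only features 1 and 2 are relevant
fit <- spca_sketch(x, y, nkeep = 2)
fit$keep                                          # indices of the screened features
```

With a strong signal the screening should keep features 1 and 2, and the single latent feature is then the first principal component of just those two columns.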

Usage

spca(
  x,
  y = NULL,
  nctot = NULL,
  ncsup = NULL,
  window = 500,
  exclude = NULL,
  verbose = TRUE,
  normalize = FALSE,
  preprocess = TRUE,
  alpha = NULL,
  perms = 1000,
  screenthresh = NULL,
  nfeat = NULL,
  sup.only = FALSE,
  ...
)

Arguments

x

The original feature matrix, with columns denoting the features and rows the instances.

y

A vector of observed target values to be predicted from x. Can be a factor for classification problems. If missing (NULL), the function computes standard unsupervised principal components.

nctot

Total number of latent features to extract.

ncsup

Maximum number of latent features to extract that use supervision. If nctot > ncsup, the remaining nctot-ncsup features are computed in an unsupervised manner (or ignored if sup.only=TRUE).

window

Maximum number of features that survive the screening and from which the supervised components are computed. Also affects how the screenthresh argument is interpreted.

exclude

Columns (variables) in x to ignore when extracting the new features.

verbose

Whether to print some messages along the way.

normalize

Whether to scale the extracted features so that they all have standard deviation of one.

preprocess

Whether to center and scale the features before extracting the new features.

alpha

Significance level for the p-values of the univariate scores used to determine which features survive the screening and are used to compute the supervised components.

perms

Number of permutations used to estimate the p-values of the univariate scores.

screenthresh

Value between 0 and 1 (or NULL). If not NULL, no permutation tests are run, and the supervised components are computed from those features whose univariate score is equal to or larger than this value. A value of 1 means that only the feature with the highest score survives the screening, whereas a value of 0 means that the top min(window, ncol(x)) features survive. Also overrides the nfeat argument.

nfeat

Number of features to retain in the screening step. If this option is used, then the algorithm does not perform the permutation tests for the p-values, but instead computes the supervised components from those features that have their univariate score among the nfeat highest scores (in this case perms and alpha are ignored).

sup.only

If TRUE, no unsupervised components are ever computed, even if the number of supervised components that could be extracted is less than nctot.

...

Currently ignored.
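The interplay of window, screenthresh and nfeat described above can be sketched as a small selection rule. This is a hypothetical illustration, not the package code: the function screen_sketch is invented here, and it assumes non-negative scores that are divided by their maximum, so that the top score equals 1 (which matches the documented behaviour of screenthresh = 1 and screenthresh = 0).

```r
# Hypothetical screening rule; not taken from dimreduce.
# Assumption: scores are non-negative and rescaled by their maximum.
screen_sketch <- function(scores, window = 500, screenthresh = NULL, nfeat = NULL) {
  ord <- order(scores, decreasing = TRUE)
  top <- ord[seq_len(min(window, length(scores)))]  # at most `window` candidates
  if (!is.null(screenthresh)) {
    rel <- scores[top] / max(scores)                 # relative score in [0, 1]
    return(top[rel >= screenthresh])                 # screenthresh overrides nfeat
  }
  if (!is.null(nfeat)) {
    return(top[seq_len(min(nfeat, length(top)))])    # the nfeat highest scores
  }
  stop("this sketch covers only the screenthresh and nfeat rules")
}

scores <- c(0.9, 0.1, 0.5, 0.3)
screen_sketch(scores, screenthresh = 1)       # 1 (only the top-scoring feature)
screen_sketch(scores, screenthresh = 0)       # 1 3 4 2 (all candidates)
screen_sketch(scores, window = 2, nfeat = 3)  # 1 3 (capped by window)
```

The permutation-test rule (alpha and perms) is left out here because, per the descriptions above, it is skipped whenever screenthresh or nfeat is given.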

Value

An spca object, similar to the object returned by prcomp, with the following elements:

w

The projection (or rotation) matrix W that transforms the original data X into the new features Z = XW.

z

The extracted latent features corresponding to the training inputs X.

v

Matrix V that is used to compute W when combining supervised and unsupervised components (see Piironen and Vehtari (2018) for more information).

sdev

Standard deviations of the new features.

centers

Mean values for the original variables.

scales

Scales of the original variables.

exclude

Excluded variables.
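Because the returned object mirrors prcomp, the relation between w, z, sdev, centers and scales can be illustrated with base R's prcomp as a stand-in. This is an assumption-laden sketch: it assumes that, as in prcomp, X is centered and scaled with the stored means and scales before multiplying by W, and that normalize = TRUE amounts to dividing each latent feature by its standard deviation.

```r
# prcomp used as a stand-in for an spca fit (assumption, not dimreduce code).
set.seed(2)
x <- matrix(rnorm(50 * 5), 50, 5)
pc <- prcomp(x, center = TRUE, scale. = TRUE)

w  <- pc$rotation                                     # projection matrix W
xs <- scale(x, center = pc$center, scale = pc$scale)  # preprocessed X
z  <- xs %*% w                                        # Z = XW

all.equal(unname(z), unname(pc$x))                    # TRUE: same latent features
all.equal(unname(apply(z, 2, sd)), pc$sdev)           # TRUE: sdev = sd of each column of Z

zn <- sweep(z, 2, pc$sdev, "/")                       # unit-variance latent features
```

Dividing by sdev leaves the directions unchanged and only rescales each column, which is why such a normalization can be applied after the projection.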

Details

In the original paper (Bair et al., 2006), the authors proposed estimating the screening threshold by cross-validation of the model obtained when the extracted features are used for regression or classification. This implementation instead performs the screening based on the estimated p-values of the univariate scores (estimated with a permutation test), and the screening step retains only those features with a p-value less than the specified level alpha.
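A minimal permutation-test sketch of this screening is shown below. It is not the dimreduce code: the function perm_pvalues is hypothetical, the univariate score is assumed to be the absolute correlation with y, and each p-value is taken as the fraction of permuted scores at least as large as the observed one, with the common +1 correction so that no p-value is exactly zero.

```r
# Hypothetical permutation p-values for univariate scores.
perm_pvalues <- function(x, y, perms = 1000) {
  obs <- abs(cor(x, y))[, 1]                      # observed score per feature
  exceed <- numeric(length(obs))
  for (b in seq_len(perms)) {
    yperm <- sample(y)                            # permuting y breaks the x-y link
    exceed <- exceed + (abs(cor(x, yperm))[, 1] >= obs)
  }
  (exceed + 1) / (perms + 1)
}

set.seed(3)
n <- 80
x <- matrix(rnorm(n * 10), n, 10)
y <- 2 * x[, 1] + rnorm(n)                        # only feature 1 is relevant
p <- perm_pvalues(x, y, perms = 200)
alpha <- 0.05
which(p < alpha)                                  # features surviving the screening
```

Note that with perms permutations the smallest attainable p-value is 1 / (perms + 1), so alpha should not be set below that resolution.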

References

Bair, E., Hastie, T., Paul, D., and Tibshirani, R. (2006). Prediction by supervised principal components. Journal of the American Statistical Association, 101(473):119-137.

Piironen, J. and Vehtari, A. (2018). Iterative supervised principal components. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 84:106-114.

Examples

# load the package
library(dimreduce)

# load data
data("ovarian", package = "dimreduce")
x <- ovarian$x
y <- ovarian$y

# dimension reduction
dr <- spca(x, y, nctot = 2)
z <- predict(dr, x) # the latent features

jpiironen/dimreduce documentation built on March 18, 2021, 11:52 p.m.