ispca: Iterative supervised principal components


View source: R/ispca.R

Description

Computes dimension reduction based on the iterative supervised principal components algorithm.

Usage

ispca(
  x,
  y,
  nctot = NULL,
  ncsup = NULL,
  exclude = NULL,
  nthresh = NULL,
  thresh = NULL,
  window = 500,
  verbose = TRUE,
  min_score = 1e-04,
  normalize = FALSE,
  center = TRUE,
  scale = TRUE,
  permtest = TRUE,
  permtest_type = "max-marginal",
  alpha = 0.1,
  perms = 500,
  method = "svd",
  ...
)

Arguments

x

The original feature matrix, columns denoting the features and rows the instances.

y

A vector with the observed target values we try to predict using x. Can be a factor for classification problems.

nctot

Total number of latent features to extract.

ncsup

Maximum number of latent features to extract that use supervision.

exclude

Columns (variables) in x to ignore when extracting the new features.

nthresh

Number of evaluations when finding the optimal screening threshold at each supervised iteration. Increasing this number can make the supervised iterations more accurate but also increases the computation time.

thresh

Instead of specifying nthresh, one can specify the candidate screening thresholds explicitly. These are numbers between 0 and 1, relative to the highest univariate score. By default seq(0, 1-eps, len=nthresh), where eps = 1e-6 (see the sketch after this argument list).

window

Maximum number of features to consider when computing each supervised component. Lowering this number makes the computation faster, but can make the algorithm less accurate if there are more potentially relevant features than this number.

verbose

Whether to print some messages along the way.

min_score

Terminate the computation, at the latest, when the maximum univariate score drops below this value.

normalize

Whether to scale the extracted features so that they all have standard deviation of one.

center

Whether to center the original features before the computation.

scale

Whether to scale the original features to have unit variance before the computation.

permtest

Whether to use a permutation test to decide the number of supervised components.

permtest_type

Either 'max-marginal' or 'marginal'.

alpha

Significance level used in the permutation test to decide whether to continue supervised iteration.

perms

Number of permutations to estimate the p-values for univariate scores.

method

Method to compute the principal components. Either 'svd' or 'power'. 'power' can sometimes be slightly faster but in some cases can have very slow convergence.

...

Currently ignored.
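
To make the screening and stopping arguments concrete, the following is a hedged sketch of a call that sets them explicitly (with x and y as in the Examples below; the numeric values are illustrative, not recommendations):

# candidate screening thresholds, relative to the highest univariate score;
# mirrors the documented default seq(0, 1 - eps, len = nthresh) with eps = 1e-6
my_thresh <- seq(0, 1 - 1e-6, length.out = 20)

dr <- ispca(
  x, y,
  nctot    = 5,          # extract at most five latent features in total
  ncsup    = 3,          # at most three of them may use supervision
  thresh   = my_thresh,  # explicit candidate thresholds instead of nthresh
  permtest = TRUE,       # permutation test decides when supervision stops
  alpha    = 0.05,       # stricter significance level than the default 0.1
  perms    = 1000        # more permutations for more stable p-values
)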

Value

An ispca object, similar in spirit to the object returned by prcomp. The object has the following elements:

w

The projection (or rotation) matrix W that transforms the original data X into the new features Z = X W (see the sketch after this list).

z

The extracted latent features corresponding to the training inputs X.

v

Matrix V that is used to compute W. The columns of V indicate which variables become active at each iteration (see the paper below for more information).

sdev

Standard deviations of the new features.

ncsup

How many supervised components were extracted (the rest are computed in an unsupervised manner).

centers

Mean values for the original variables.

scales

Scales of the original variables.

exclude

Excluded variables.
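
The following sketch illustrates how these elements fit together. It uses x and y as in the Examples below, assumes the defaults exclude = NULL and normalize = FALSE, and assumes the stored centers and scales are applied before the projection (as is typical for prcomp-style objects); the predict() method shown in the Examples is the supported interface.

dr <- ispca(x, y, nctot = 2)
# project the centered and scaled training data with the returned rotation W
z_manual <- scale(x, center = dr$centers, scale = dr$scales) %*% dr$w
# this should agree (up to attributes) with the supported interface
z <- predict(dr, x)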

References

Piironen, J. and Vehtari, A. (2018). Iterative supervised principal components. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 84:106-114.

Examples

###

# load data
data("ovarian", package = "dimreduce")
x <- ovarian$x
y <- ovarian$y

# dimension reduction
dr <- ispca(x, y, nctot = 2)
z <- predict(dr, x) # the latent features
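
# A possible continuation: inspect the elements documented under Value
dim(dr$w)   # projection matrix W, one column per extracted feature
dr$sdev     # standard deviations of the new features
dr$ncsup    # how many components were extracted with supervision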

jpiironen/dimreduce documentation built on March 18, 2021, 11:52 p.m.