Feature selection for supervised principal components

Description

Forms reduced models to approximate the supervised principal component predictor.

Usage

superpc.predict.red(fit, data, data.test, threshold, n.components = 3,
                    n.shrinkage = 20, shrinkages = NULL, compute.lrtest = TRUE,
                    sign.wt = "both", prediction.type = c("continuous", "discrete"),
                    n.class = 2)

Arguments

fit

Object returned by superpc.train

data

Training data object, of the form described in the superpc.train documentation

data.test

Test data object, in the same form as the training data

threshold

Feature score threshold; usually estimated from superpc.cv

n.components

Number of principal components to examine: 1, 2, and so on, up to the number of components used in training. Default 3.

n.shrinkage

Number of shrinkage values to consider. Default 20.

shrinkages

Shrinkage values to consider. Default NULL.

compute.lrtest

Should the likelihood ratio test be computed? Default TRUE

sign.wt

Signs of feature weights allowed: "both", "pos", or "neg"

prediction.type

Type of prediction: "continuous" (default) or "discrete". In the latter case, the superpc score is divided into n.class groups

n.class

Number of groups for discrete predictor. Default 2.
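
As an illustration of what prediction.type = "discrete" means, the sketch below splits a continuous score into n.class groups. It assumes the cut points are equally spaced quantiles of the score; this page does not say how superpc chooses them, so the rule here is an assumption. The variable score is hypothetical data, not output from the package.

```r
# Illustrative only: split a continuous score into n.class groups.
# Assumption: groups are formed at equally spaced quantiles of the score.
set.seed(1)
score <- rnorm(40)   # hypothetical continuous predictor, one value per observation
n.class <- 2

# n.class + 1 break points, from the minimum to the maximum of the score
breaks <- quantile(score, probs = seq(0, 1, length.out = n.class + 1))

# Integer group label (1..n.class) for each observation
groups <- cut(score, breaks = breaks, include.lowest = TRUE, labels = FALSE)
print(table(groups))
```

With n.class = 2 this amounts to a split at the median of the score.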

Details

Soft-thresholding by each of the "shrinkages" values is applied to the PC loadings. This reduces the number of features used in the model. The reduced predictor is then used in place of the supervised PC predictor.
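
A minimal sketch of the idea, using the generic soft-thresholding operator sign(w) * max(|w| - s, 0). This is not the package's internal code: the function soft_threshold, the loadings vector, and the shrinkage grid are all hypothetical and chosen only to show how larger shrinkage values leave fewer features with nonzero weight.

```r
# Generic soft-thresholding operator (illustrative; not superpc's internal code).
soft_threshold <- function(w, s) sign(w) * pmax(abs(w) - s, 0)

# Hypothetical loadings for six features.
loadings <- c(0.9, -0.5, 0.2, -0.1, 0.05, 0.0)

# A grid of shrinkage values from 0 up to the largest absolute loading.
shrinkages <- seq(0, max(abs(loadings)), length.out = 5)

# One column per shrinkage value; a zero entry means the feature is dropped.
reduced <- sapply(shrinkages, function(s) soft_threshold(loadings, s))

# Number of features kept at each shrinkage value.
num_features <- colSums(reduced != 0)
print(num_features)
```

The count of retained features can only stay the same or fall as the shrinkage value grows, which is the sense in which each shrinkage value defines a reduced model.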

Value

shrinkages

Shrinkage values used

lrtest.reduced

Likelihood ratio tests for reduced models

num.features

Number of features used in each reduced model

feature.list

List of features used in each reduced model

coef

Least squares coefficients for each reduced model

import

Importance scores for features

wt

Weight for each feature, in constructing the reduced predictor

v.test

Outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)

v.test.1df

Combined outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)

n.components

Number of principal components used

type

Type of outcome

call

Calling sequence

Author(s)

Eric Bair and Robert Tibshirani

Examples

library(superpc)

set.seed(332)

# Generate some data: 1000 features (rows) by 40 samples (columns)
x <- matrix(rnorm(1000 * 40), ncol = 40)
y <- 10 + svd(x[1:60, ])$v[, 1] + 0.1 * rnorm(40)
ytest <- 10 + svd(x[1:60, ])$v[, 1] + 0.1 * rnorm(40)
censoring.status <- sample(c(rep(1, 30), rep(0, 10)))
censoring.status.test <- sample(c(rep(1, 30), rep(0, 10)))

featurenames <- paste("feature", as.character(1:1000), sep = "")
data <- list(x = x, y = y, censoring.status = censoring.status,
             featurenames = featurenames)
data.test <- list(x = x, y = ytest, censoring.status = censoring.status.test,
                  featurenames = featurenames)

# Train the supervised principal components model
a <- superpc.train(data, type = "survival")

# Predict with the full supervised PC predictor
fit <- superpc.predict(a, data, data.test, threshold = 1.0, n.components = 1,
                       prediction.type = "continuous")

# Form the reduced models and plot their likelihood ratio statistics
fit.red <- superpc.predict.red(a, data, data.test, threshold = 0.6)
superpc.plotred.lrtest(fit.red)