fit.rrfe: Reweighted Recursive Feature Elimination (RRFE)



Description

Implementation of the Reweighted Recursive Feature Elimination (RRFE) algorithm. mapping must be a data.frame with at least two columns, named 'probesetID' and 'graphID'. 'probesetID' is the probe set ID used in the expression matrix (i.e. colnames(x)), and 'graphID' is any ID that identifies the nodes in the graph (i.e. colnames(Gsub) or rownames(Gsub)). The mapping is needed because a gene or protein in the network may be represented by more than one probe set on the chip, so the algorithm must know which gene or protein in the network belongs to which probe set. Features without a counterpart in Gsub can still be used by setting the parameter useAllFeatures to TRUE. In that case, RRFE assigns the minimal weight returned by GeneRank to those genes which are not present in Gsub.
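The required shape of mapping can be illustrated with a small toy setup. This is only a sketch: the probe set names (ps1 to ps5), the node names (geneA to geneC) and the tiny adjacency matrix are made up for illustration and do not come from pathClass.

```r
# Toy expression matrix: 4 samples (rows) x 5 probe sets (columns).
set.seed(1)
x <- matrix(rnorm(20), nrow = 4,
            dimnames = list(paste0("sample", 1:4),
                            c("ps1", "ps2", "ps3", "ps4", "ps5")))

# Toy adjacency matrix over three hypothetical graph nodes.
nodes <- c("geneA", "geneB", "geneC")
Gsub <- matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 1, 0),
               nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# ps1 and ps2 both measure geneA; ps5 has no node in the graph.
mapping <- data.frame(probesetID = c("ps1", "ps2", "ps3", "ps4"),
                      graphID    = c("geneA", "geneA", "geneB", "geneC"),
                      stringsAsFactors = FALSE)

# Checks that mirror the requirements stated in the description.
stopifnot(all(c("probesetID", "graphID") %in% colnames(mapping)))
stopifnot(all(mapping$probesetID %in% colnames(x)))
stopifnot(all(mapping$graphID %in% colnames(Gsub)))

# Probe sets without a graph node (relevant for useAllFeatures).
unmapped <- setdiff(colnames(x), mapping$probesetID)
print(unmapped)
```

Running such checks before calling fit.rrfe can catch ID mismatches between the chip annotation and the network early.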

Usage

  fit.rrfe(x, y, DEBUG = FALSE,
    scale = c("center", "scale"), Cs = 10^c(-3:3),
    stepsize = 0.1, useAllFeatures = FALSE, mapping, Gsub,
    d = 0.5)

Arguments

x

a p x n matrix of expression measurements with p samples and n genes.

y

a factor of length p comprising the class labels.

DEBUG

logical; should debugging information be plotted? Defaults to FALSE.

scale

a character vector defining if the data should be centered and/or scaled. Possible values are center and/or scale. Defaults to c('center', 'scale').
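As a rough illustration of what the two values mean, the sketch below maps the character vector onto the arguments of base R's scale(). That fit.rrfe preprocesses the data in exactly this way is an assumption; the variable scale_opt and the toy matrix are made up for the example.

```r
# Toy matrix: 4 samples x 3 probe sets, deliberately off-centre.
set.seed(42)
x <- matrix(rnorm(12, mean = 5, sd = 2), nrow = 4,
            dimnames = list(NULL, c("ps1", "ps2", "ps3")))

# Hypothetical translation of the 'scale' argument into base::scale().
scale_opt <- c("center", "scale")
x_scaled <- scale(x,
                  center = "center" %in% scale_opt,
                  scale  = "scale" %in% scale_opt)

# With both options, each column has mean 0 and standard deviation 1.
col_means <- colMeans(x_scaled)
col_sds   <- apply(x_scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)
```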

Cs

soft-margin tuning parameter of the SVM. Defaults to 10^c(-3:3).
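The default expression expands to seven candidate values spaced evenly on a log10 scale, which can be checked directly. How fit.rrfe selects among the candidates is not described on this page, so the snippet only shows the grid itself.

```r
# The default grid of soft-margin parameters.
Cs <- 10^c(-3:3)
print(Cs)

# Number of candidate values and their spacing on the log10 scale.
n_candidates <- length(Cs)
log_steps <- diff(log10(Cs))
```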

stepsize

fraction of features discarded in each step of the feature elimination. Defaults to 0.1 (10%).
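To get a feeling for how quickly features are eliminated, the following sketch computes an elimination schedule. It assumes that the fraction applies to the features still remaining and that at least one feature is dropped per step; the exact rounding used inside fit.rrfe may differ.

```r
n_features <- 100   # hypothetical number of starting features
stepsize   <- 0.1

remaining <- n_features
schedule  <- remaining
while (remaining > 1) {
  # Drop a fraction of the remaining features, but at least one.
  n_drop    <- max(1, floor(stepsize * remaining))
  remaining <- remaining - n_drop
  schedule  <- c(schedule, remaining)
}

# Number of features left after each elimination step.
print(schedule)
```

A larger stepsize shortens the schedule and therefore the run time, at the price of a coarser search over feature subset sizes.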

useAllFeatures

should all features be used for classification (see also Description). Defaults to FALSE.
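The effect of useAllFeatures = TRUE described above can be sketched as follows. The weights, probe set names and node names are invented, and the lookup is simplified to one graph node per probe set; the package's internal handling is not shown on this page.

```r
# Hypothetical GeneRank weights for the nodes of the graph.
gene_weight <- c(geneA = 0.9, geneB = 0.4, geneC = 0.7)

# Probe sets on the chip; ps5 has no corresponding node in the graph.
probesets <- c("ps1", "ps2", "ps3", "ps4", "ps5")
mapping <- data.frame(probesetID = c("ps1", "ps2", "ps3", "ps4"),
                      graphID    = c("geneA", "geneA", "geneB", "geneC"),
                      stringsAsFactors = FALSE)

# Look up the weight of each probe set via its graph node (NA if unmapped).
node <- mapping$graphID[match(probesets, mapping$probesetID)]
w <- setNames(unname(gene_weight[node]), probesets)

# Unmapped probe sets receive the minimal weight instead of being dropped.
w[is.na(w)] <- min(gene_weight)
print(w)
```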

mapping

a mapping that defines how probe sets are summarized to genes.

Gsub

an adjacency matrix that represents the underlying biological network.

d

the damping factor, which controls the relative influence of the network data and the fold change on the ranking of the genes. Defaults to 0.5.
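The role of d can be illustrated with a small sketch of the GeneRank linear system as it is commonly stated, r = (1 - d) * ex + d * W %*% D^-1 %*% r, where W is the adjacency matrix, D the diagonal matrix of node degrees and ex the vector of fold changes. The function generank_sketch and the toy data are hypothetical and not taken from the pathClass source.

```r
# Solve (I - d * W %*% D^-1) r = (1 - d) * ex for the ranking r.
generank_sketch <- function(W, ex, d = 0.5) {
  n   <- nrow(W)
  deg <- colSums(W)
  deg[deg == 0] <- 1                      # guard against isolated nodes
  A <- diag(n) - d * W %*% diag(1 / deg, nrow = n)
  r <- drop(solve(A, (1 - d) * ex))
  names(r) <- rownames(W)
  r
}

# Toy path graph geneA - geneB - geneC; only geneA has a large fold change.
nodes <- c("geneA", "geneB", "geneC")
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))
ex <- c(2.0, 0.1, 0.1)

r_expr_only <- generank_sketch(W, ex, d = 0)    # network ignored
r_mixed     <- generank_sketch(W, ex, d = 0.5)  # network and fold change
print(r_expr_only)
print(r_mixed)
```

With d = 0 the ranking equals the fold changes. With d = 0.5, geneB is ranked above geneC although both have the same fold change, because geneB is a direct neighbour of the strongly changed geneA.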

Value

An RRFE fit object with the following components:

features

the selected features

error.bound

the span bound of the model

fit

the fitted SVM model

Note

The optimal number of features is found by using the span estimate. See Chapelle, O., Vapnik, V., Bousquet, O., and Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46(1), 131-159.

Author(s)

Marc Johannes JohannesMarc@gmail.com

References

Johannes M, et al. (2010). Integration Of Pathway Knowledge Into A Reweighted Recursive Feature Elimination Approach For Risk Stratification Of Cancer Patients. Bioinformatics

Examples

## Not run: 
library(Biobase)
data(sample.ExpressionSet)
# expression matrix with samples in rows and probe sets in columns
x <- t(exprs(sample.ExpressionSet))
y <- factor(pData(sample.ExpressionSet)$sex)

# create the mapping (probe set ID -> RefSeq ID, one row per pair)
library('hgu95av2.db')
mapped.probes <- mappedkeys(hgu95av2REFSEQ)
refseq <- as.list(hgu95av2REFSEQ[mapped.probes])
times <- sapply(refseq, length)
mapping <- data.frame(probesetID = rep(names(refseq), times = times),
                      graphID = unlist(refseq),
                      row.names = NULL, stringsAsFactors = FALSE)
mapping <- unique(mapping)

# evaluate RRFE in a 3-fold cross-validation
library(pathClass)
data(adjacency.matrix)
res.rrfe <- crossval(x, y, DEBUG = TRUE, theta.fit = fit.rrfe, folds = 3, repeats = 1,
                     parallel = TRUE, Cs = 10^(-3:3), mapping = mapping,
                     Gsub = adjacency.matrix, d = 1/2)

## End(Not run)

pathClass documentation built on May 29, 2017, 11:44 p.m.