mipp: MiPP-based Classification

Description Usage Arguments Value Author(s) References Examples

View source: R/MiPP.R

Description

Finds optimal sets of genes for classification

Usage

1
2
3
4
5
mipp(x, y, x.test = NULL, y.test = NULL, probe.ID = NULL, 
    rule = "lda", method.cut = "t.test", percent.cut = 0.01, 
    model.sMiPP.margin = 0.01, min.sMiPP = 0.85, n.drops = 2, 
    n.fold = 5, p.test = 1/3, n.split = 20, 
    n.split.eval = 100) 

Arguments

x

data matrix

y

class vector

x.test

test data matrix if available

y.test

test class vector if available

probe.ID

probe set IDs; if NULL, row numbers are assigned.

rule

classification rule: "lda","qda","logistic","svmlin","svmrbf"; the default is "lda".

method.cut

method for pre-selection; t-test is available.

percent.cut

proportion of pre-selected genes; the default is 0.01.

model.sMiPP.margin

smallest set of genes s.t. sMiPP <= (max sMiPP-model.sMiPP.margin); the default is 0.01.

min.sMiPP

Adding genes stops if max sMiPP is at least min.sMiPP; the default is 0.85.

n.drops

Adding genes stops if sMiPP decreases (n.drops) times, in addition to min.sMiPP criterion.; the default is 2.

n.fold

number of folds; default is 5.

p.test

partition percent of train and test samples when test samples are not available; the default is 1/3 for test set.

n.split

number of splits; the default is 20.

n.split.eval

numbr of splits for evalutation; the default is 100.

Value

model

candiadate genes (for each split if no indep set is available

model.eval

Optimal sets of genes for each split when no indep set is available

Author(s)

Soukup M, Cho H, and Lee JK

References

Soukup M, Cho H, and Lee JK (2005). Robust classification modeling on microarray data using misclassification penalized posterior, Bioinformatics, 21 (Suppl): i423-i430.

Soukup M and Lee JK (2004). Developing optimal prediction models for cancer classification using gene expression data, Journal of Bioinformatics and Computational Biology, 1(4) 681-694

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
##########
#Example 1: When an independent test set is available

data(leukemia)

#Normalize combined data
leukemia <- cbind(leuk1, leuk2)
leukemia <- mipp.preproc(leukemia, data.type="MAS4")

#Train set
x.train <- leukemia[,1:38]
y.train <- factor(c(rep("ALL",27),rep("AML",11)))

#Test set
x.test <- leukemia[,39:72]
y.test <- factor(c(rep("ALL",20),rep("AML",14)))


#Compute MiPP
out <- mipp(x=x.train, y=y.train, x.test=x.test, y.test=y.test, probe.ID = 1:nrow(x.train), n.fold=5, percent.cut=0.05, rule="lda")

#Print candidate models
out$model



##########
#Example 2: When an independent test set is not available

data(colon)

#Normalize data
x <- mipp.preproc(colon)
y <- factor(c("T", "N", "T", "N", "T", "N", "T", "N", "T", "N", 
       "T", "N", "T", "N", "T", "N", "T", "N", "T", "N",
       "T", "N", "T", "N", "T", "T", "T", "T", "T", "T", 
       "T", "T", "T", "T", "T", "T", "T", "T", "N", "T", 
       "T", "N", "N", "T", "T", "T", "T", "N", "T", "N", 
       "N", "T", "T", "N", "N", "T", "T", "T", "T", "N", 
       "T", "N"))


#Deleting comtaminated chips
x <- x[,-c(51,55,45,49,56)]
y <- y[ -c(51,55,45,49,56)]

#Compute MiPP
out <- mipp(x=x, y=y, probe.ID = 1:nrow(x), n.fold=5, p.test=1/3, n.split=5, n.split.eval=100, 
percent.cut= 0.1, rule="lda")

#Print candidate models for each split
out$model

#Print optimal models and independent evaluation for each split
out$model.eval

MiPP documentation built on Nov. 8, 2020, 8:31 p.m.