iter.subset: Performance evaluation by subsetting data sets in 100...

Description

A data set can be split into subsets of different sizes to determine whether performance improves as the sample size increases. Each subset is then split 100 times into independent training and testing sets. The sample size of the training set is set by the user (20, 50, ..., up to 2/3 of the complete set), and the remaining samples form the testing set. A gene signature is derived from the training set and assessed on the testing set.
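The repeated splitting described above can be sketched in base R as follows. This is only an illustration on simulated data, not the package's internal code: the toy matrix `expr`, the list `splits`, and the choice of 90 samples are assumptions made for the example.

```r
# Hypothetical toy data: 90 samples (rows) by 200 genes (columns)
set.seed(1)
n.samples <- 90
expr <- matrix(rnorm(n.samples * 200), nrow = n.samples)

train.nb <- 50        # user-chosen training size (at most 2/3 of the samples)
iteration.nb <- 100   # number of random training/testing splits

# For each iteration, draw train.nb samples for training;
# all remaining samples form the testing set.
splits <- lapply(seq_len(iteration.nb), function(i) {
  train.idx <- sample(n.samples, train.nb)
  test.idx <- setdiff(seq_len(n.samples), train.idx)
  list(train = train.idx, test = test.idx)
})

# Training and testing sets of the first split
train.set <- expr[splits[[1]]$train, , drop = FALSE]
test.set <- expr[splits[[1]]$test, , drop = FALSE]
dim(train.set)
dim(test.set)
```

Within each split the two index sets are disjoint, so no sample is used for both deriving and assessing the signature.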

The performance obtained from the larger subsets and, ultimately, from the complete set is likely to be higher than the performance obtained from the smaller subsets. If this is not the case, the improvement may have been held back by factors such as heterogeneity in the patient cohort or in tumor characteristics.

Usage

iter.subset(data, surv, censor, method = "none", gn.nb = 50, train.nb = 100)

Arguments

data

Matrix of gene expression data.

surv

Vector of survival times.

censor

Vector of censoring status. 1 = event occurred, 0 = censored.

method

A character string specifying the feature selection method: "none" for top-ranking, or one of the adjustment methods accepted by the p.adjust function.

gn.nb

An integer specifying the number of genes to select.

train.nb

An integer specifying the sample size of the training set.

Details

In top-ranking, genes are selected by univariate Cox regression using the coxph function in the R survival package. The genes are ranked by their likelihood ratio P-value, and the gn.nb genes with the smallest P-values are retained as the gene signature.
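A minimal sketch of top-ranking selection, assuming samples in rows and genes in columns. The simulated data and the names `expr`, `surv.time`, `lr.pval` and `signature` are hypothetical, and the package's own implementation may differ in detail.

```r
library(survival)

# Hypothetical simulated data: 80 samples (rows) by 30 genes (columns)
set.seed(42)
n <- 80
p <- 30
expr <- matrix(rnorm(n * p), nrow = n,
               dimnames = list(NULL, paste0("gene", seq_len(p))))
surv.time <- rexp(n, rate = exp(0.8 * expr[, "gene1"]))  # gene1 drives the hazard
censor <- rbinom(n, 1, 0.7)                              # 1 = event, 0 = censored

# Fit one univariate Cox model per gene and keep the
# likelihood ratio test P-value reported by summary.coxph
lr.pval <- apply(expr, 2, function(g) {
  fit <- coxph(Surv(surv.time, censor) ~ g)
  unname(summary(fit)$logtest["pvalue"])
})

# Retain the gn.nb genes with the smallest P-values
gn.nb <- 5
signature <- names(sort(lr.pval))[seq_len(gn.nb)]
signature
```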

If method != "none", the P-values are adjusted with the p.adjust function in the stats package, and all genes with an adjusted P-value not greater than 0.05 are retained.
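The adjusted P-value filter can be illustrated with a few made-up values. The gene names and raw P-values below are hypothetical, and "BH" is just one of the methods that p.adjust accepts.

```r
# Hypothetical raw P-values from univariate Cox models
p.raw <- c(geneA = 0.0001, geneB = 0.004, geneC = 0.02,
           geneD = 0.2, geneE = 0.6)

# Adjust for multiple testing; any entry of p.adjust.methods could be used
method <- "BH"
p.adj <- p.adjust(p.raw, method = method)

# Retain genes whose adjusted P-value is not greater than 0.05
selected <- names(p.adj)[p.adj <= 0.05]
selected
```

Unlike top-ranking, the number of genes retained here is not fixed in advance; it depends on how many adjusted P-values pass the 0.05 threshold.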

Value

Printed summary over the 100 iterations: mean of AUC +/- standard deviation of AUC, and geometric mean of the hazard ratio (HR) with its confidence interval (CI).
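The summary statistics can be sketched as below. The helpers `geo.mean` and `geo.ci` are hypothetical stand-ins for the package's gm and ci.gm functions, whose definitions are not shown on this page; the t-based interval on the log scale is an assumption, and the AUC and HR values are made up.

```r
# Hypothetical per-iteration results
auc <- c(0.62, 0.70, 0.66, 0.58, 0.74)
hr <- c(1.8, 2.4, 2.0, 1.5, 2.9)

# Geometric mean: exponentiated mean of the logs
geo.mean <- function(x) exp(mean(log(x), na.rm = TRUE))

# Assumed confidence interval: t-based interval on the log scale,
# back-transformed to the original scale
geo.ci <- function(x, level = 0.95) {
  lx <- log(x[!is.na(x)])
  half <- qt(1 - (1 - level) / 2, df = length(lx) - 1) * sd(lx) / sqrt(length(lx))
  exp(mean(lx) + c(-1, 1) * half)
}

hr.ci <- geo.ci(hr)
cat("Avg AUC+/-SD\tHR(CI)\n")
cat(sprintf("%.2f", mean(auc, na.rm = TRUE)), "+/-",
    sprintf("%.2f", sd(auc, na.rm = TRUE)), "\t",
    sprintf("%.2f", geo.mean(hr)), "(",
    sprintf("%.2f", hr.ci[1]), "-",
    sprintf("%.2f", hr.ci[2]), ")\n",
    sep = "")
```

A geometric rather than arithmetic mean suits hazard ratios because they are multiplicative: an HR of 2 and an HR of 0.5 average to 1 on the log scale.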

Author(s)

Haleh Yasrebi

Examples

data(gse4335)
data(gse4335pheno)
# The following call may take a long time to run
# iter.subset(gse4335, gse4335pheno[,6], gse4335pheno[,5])
## The function is currently defined as
function (data, surv, censor, method = "none", gn.nb = 50, train.nb = 100) {

        require(survival)
        require(survivalROC)

        # Drop samples (rows) with a missing survival time
        keep = !is.na(surv)
        data = data[keep, ]
        censor = censor[keep]
        surv = surv[keep]

        res = NULL
        iteration.nb = 100

        cat("Iteration\tAUC\tHR(CI)\t\tP-val\n")

        # One training/testing split per iteration; results are stacked row-wise
        for (i in 1:iteration.nb) {
                new.lst = eval.subset(data, surv, censor, i, method, gn.nb, train.nb)
                res = rbind(res, new.lst)
        }

        cat("Avg AUC+/-SD\tHR(CI)\n")

        # Column 1 of res is summarized as AUC, column 2 as HR
        hr.ci = ci.gm(res[,2])
        cat(sprintf("%.2f", mean(res[,1], na.rm = TRUE)), "+/-",
            sprintf("%.2f", sd(res[,1], na.rm = TRUE)), "\t",
            sprintf("%.2f", gm(res[,2])), "(",
            sprintf("%.2f", hr.ci[1]), "-",
            sprintf("%.2f", hr.ci[2]), ")\n",
            sep = "")
}

survJamda documentation built on May 1, 2019, 8:50 p.m.