seedsFinder: Evaluate some statistics on all genes in order to select...


View source: R/seedsFinder.R

Description

This function works on each column of geData (the expression levels of one gene) and returns the test value and p-value of the log-rank test, together with the Bayesian information criterion computed under the hypothesis that the data are drawn from a single Gaussian (bic1) and from a mixture of two Gaussians (bic2); the clustering of the samples is appended at the end.
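The per-gene log-rank part can be sketched with the recommended cluster and survival packages. This is a simplified, hypothetical illustration: the helper name geneLogRank and the simulated data are assumptions, and plain pam() is used instead of the leave-one-out pamUnbiased() procedure described under Details. It recodes the two clusters so that 0 marks the lower-mean group and derives tValue and pValue from survdiff().

```r
library(cluster)   # pam()
library(survival)  # Surv(), survdiff()

# Hypothetical helper, NOT the package's internal code: plain pam() is used
# instead of pamUnbiased(), so no leave-one-out re-classification is done.
geneLogRank <- function(x, time, status) {
  # two-cluster PAM on a single gene's expression levels (labels 1 and 2)
  cl <- pam(matrix(x, ncol = 1), k = 2)$clustering
  # recode so that 0 marks the cluster with the lower mean expression
  clusterMeans <- tapply(x, cl, mean)
  lowLabel <- as.integer(names(which.min(clusterMeans)))
  classification <- ifelse(cl == lowLabel, 0, 1)
  # log-rank test of no difference between the two survival curves
  fit <- survdiff(Surv(time, status) ~ classification)
  tValue <- fit$chisq
  pValue <- 1 - pchisq(tValue, df = 1)
  list(tValue = tValue, pValue = pValue, classification = classification)
}

# Simulated data, an assumption for illustration only
set.seed(1)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 3))
time <- c(rexp(30, rate = 0.5), rexp(30, rate = 0.1))
status <- rbinom(60, 1, 0.8)
res <- geneLogRank(x, time, status)
res$pValue
```

Swapping pam() for pamUnbiased() would bring the sketch closer to what the Details describe; the recoding and test steps would stay the same.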

Usage

seedsFinder(cutoff = 1.95, evaluateBICs = TRUE, cpuCluster = NULL)

Arguments

cutoff

argument passed to the BICs() function.

evaluateBICs

flag to force the computation of the Bayesian information criteria.

cpuCluster

If a parallel search is needed, this argument must be set to a cluster object, such as the output of the NCPUS() function; the example below passes the result of makeCluster(2).

Details

For each gene, an unbiased classification of the samples is performed, yielding two clusters coded 0 and 1: the cluster coded 0 is the one whose mean expression is lower than that of the cluster coded 1. The classification method is the partitioning around medoids algorithm combined with a leave-one-out re-classification strategy (see the pamUnbiased() function for further details).

From the two clusters, two survival curves are estimated with the stData data and tested for the null hypothesis of no difference between them (see the survdiff() function for further details), which gives the tValue. The corresponding pValue is given by 1-pchisq(tValue, df = 1).

Two more indexes are computed: the Bayesian information criteria under the hypotheses that 1) the gene levels are drawn from a univariate Gaussian (bic1) and 2) the gene levels are drawn from a mixture of two Gaussians (bic2) (see the BICs() function for further details). The mixing coefficient is estimated from the classification as the fraction of samples classified as 1, and the parameters of the Gaussians are estimated robustly.
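The idea behind bic1 and bic2 can be illustrated with a small sketch. It is hypothetical and is not the package's BICs() function: the helper name bicSketch is an assumption, the cutoff argument is not modelled, median() and mad() stand in for the unspecified robust estimators, and the convention BIC = -2 logLik + k log(n) (smaller is better) may differ from the package's own. The mixing coefficient is taken, as described above, as the fraction of samples classified as 1.

```r
# Hypothetical sketch, NOT the package's BICs(): `cutoff` is not modelled,
# median()/mad() are assumed robust estimators, and the convention
# BIC = -2 * logLik + k * log(n) (smaller is better) is an assumption.
bicSketch <- function(x, classification) {
  n <- length(x)
  # bic1: single Gaussian with robust location (median) and scale (mad)
  ll1 <- sum(dnorm(x, mean = median(x), sd = mad(x), log = TRUE))
  bic1 <- -2 * ll1 + 2 * log(n)             # 2 parameters: mean, sd
  # bic2: two-Gaussian mixture; mixing coefficient = fraction classified as 1
  p <- mean(classification == 1)
  x0 <- x[classification == 0]
  x1 <- x[classification == 1]
  dens <- (1 - p) * dnorm(x, median(x0), mad(x0)) +
          p * dnorm(x, median(x1), mad(x1))
  bic2 <- -2 * sum(log(dens)) + 5 * log(n)  # 2 means, 2 sds, 1 weight
  c(bic1 = bic1, bic2 = bic2)
}

# Simulated, clearly bimodal gene with a known 0/1 classification
set.seed(3)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 6))
classification <- rep(c(0, 1), each = 100)
b <- bicSketch(x, classification)
b
```

On clearly bimodal data like this, the mixture model should obtain the smaller value under the assumed convention.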

Value

The function returns a matrix with ncol(geData) rows (one per gene) and 4+nrow(geData) columns.

column no.1: tValue

test-value of the log-rank test statistic under the null hypothesis that the two survival curves are equal (see details)

column no.2: pValue

p-value corresponding to the test value in column no.1; it is computed as 1-pchisq(tValue, df = 1)

column no.3: bic1

value of the Bayesian information criterion computed under the hypothesis that the data are drawn from a single Gaussian

column no.4: bic2

value of the Bayesian information criterion computed under the hypothesis that the data are drawn from a mixture of two Gaussians

columns from no.5 to no.4+nrow(geData)

result of the unbiased classification (see details)
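Because the returned matrix mixes summary statistics and per-sample labels, it can help to split it by position. The helper splitSeeds below is hypothetical and relies only on the column layout documented above (columns 1 to 4 are tValue, pValue, bic1, bic2; the remaining columns are the 0/1 classification); the small matrix it is applied to is synthetic, not real seedsFinder() output.

```r
# Hypothetical helper: splits a matrix laid out as documented in Value
# (one row per gene; columns 1-4 = tValue, pValue, bic1, bic2;
# remaining columns = 0/1 sample classification) and ranks genes by pValue.
splitSeeds <- function(m) {
  stats <- m[, 1:4, drop = FALSE]
  colnames(stats) <- c("tValue", "pValue", "bic1", "bic2")
  classification <- m[, -(1:4), drop = FALSE]
  ord <- order(stats[, "pValue"])  # most significant genes first
  list(stats = stats[ord, , drop = FALSE],
       classification = classification[ord, , drop = FALSE])
}

# Synthetic stand-in for seedsFinder() output: 3 genes, 5 samples
set.seed(2)
tv <- c(0.5, 6.2, 2.1)
mock <- cbind(tv, 1 - pchisq(tv, df = 1), c(10, 12, 11), c(9, 13, 10),
              matrix(rbinom(15, 1, 0.5), nrow = 3))
res <- splitSeeds(mock)
res$stats
```

Indexing by position rather than by column name avoids assuming any particular column names on the real output.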

Author(s)

Stefano M. Pagnotta and Michele Ceccarelli

See Also

pam, survdiff, BICs.

Examples

library(geneSignatureFinder)
library(parallel)  # makeCluster(), stopCluster()

data(geNSCLC)
geData <- geNSCLC

data(stNSCLC)
stData <- stNSCLC

# Only a few genes and samples are used here to keep the example fast.
# To run on the full data, use instead
# genesToUse <- which(apply(!is.na(geData), 2, sum)/nrow(geData) > 0.75)
# geData <- geData[, genesToUse]
# and comment out the two lines that keep only the first 50 samples.
genesToUse <- which(apply(!is.na(geData), 2, sum) == nrow(geData))
geData <- geData[, genesToUse]
geData <- geData[1:50, ]
stData <- stData[1:50, ]
dim(geData)

# seedsFinder() has no data arguments; it works on the geData and stData objects defined above
aMakeCluster <- makeCluster(2)
aSeedsFinder <- seedsFinder(cpuCluster = aMakeCluster)
stopCluster(aMakeCluster)  # release the worker processes
head(aSeedsFinder)

geneSignatureFinder documentation built on May 2, 2019, 2:32 p.m.