seedsFinder: Evaluate some statistics on all genes in order to select...
In geneSignatureFinder: A Gene-signatures finder tools

Description Usage Arguments Details Value Author(s) See Also Examples

This function works on each column (gene expression level) of the geData and returns the test-value and p-value of the log-rank test, the bayesian information criterion value under the hypothesis that tha data are drawn from a single gaussian (bic1) and a mixture of two gaussians (bic2); at the end the clustering of the samples is added.

1	seedsFinder(cutoff = 1.95, evaluateBICs = TRUE, cpuCluster = NULL)

`cutoff`	argument passed to the BICs() function.
`evaluateBICs`	flag to force the computation of the bayesian information criteria.
`cpuCluster`	If a parallel search is necessary, this variable has to be set to the output of NCPUS() function.

For each gene expression levels data an unbiased classification is performed resulting into two clusters coded by the values 0 and 1. The samples classified by 0 are those for which the mean is lower than that of the samples classified with 1. The classification method is the partitioning around medoids algorithm linked to the a leave-one-out re-classification strategy (see the pamUmbiased() function for further details). From the clusters two survival curves are estimated with the stData data and then tested for the null hypothesis of no difference among them (see the survdiff() function for further details) providing the tValue. The correponding pValue is given by 1-pchisq(tValue, df = 1). Two more indexes are computed, the bayesian information criteria under the hypotheses 1) the gene levels are from a univarite gaussians (bic1) and 2) the gene levels are from a mixture of two gaussians (bic2) (see the BICs() function for further details). The mixing coefficient is estimated from the classification as the fraction of samples classified as 1. The parameters of the gassians are robustly estimated.

The result of the function is a matrix having so many rows as ncol(geData) and 4+nrow(geData) rows.

`column no.1: tValue`	test-value of the log-rank test statistic under the null hypothesis that the two survival curves are equal (see details)
`column no.2: pValue`	p-value corresponding to the test-value in column no.1; actuallly is 1-pchisq(tValue, df = 1)
`column no.3: bic1`	value of the bayesian information criterion computed under the hypothesis that the data are drawn from a single gaussian
`column no.4: bic1`	value of the bayesian information criterion computed under the hypothesis that the data are drawn from mixture of two gaussians
`columns from no.5 to no.4+nrow(geData)`	result of the unbiased classification (see details)

Stefano M. Pagnotta and Michele Ceccarelli

pam, survdiff, BICs.

data(geNSCLC)
geData <- geNSCLC

data(stNSCLC)
stData <- stNSCLC

# here few genes and samples are considered to speed up the timing of the example.
# please, try 
# genesToUse <- which(apply(!is.na(geData), 2, sum)/nrow(geData) > 0.75)
# geData <- geData[, genesToUse]
# and comment stData <- stData[1:50, ]
genesToUse <- which(apply(!is.na(geData), 2, sum) == nrow(geData))
geData <- geData[, genesToUse]
geData <- geData[1:50, ]
stData <- stData[1:50, ]
dim(geData)

aMakeCluster <- makeCluster(2)
aSeedsFinder <- seedsFinder(cpuCluster = aMakeCluster)
head(aSeedsFinder)