signatureFinder: Main function to find the signature.
In geneSignatureFinder: A Gene-signatures finder tools


This function implements the algorithm to find the signature using a searching strategy supervised by survival time data.

signatureFinder(seedGene, logFilePrefix = "", coeffMissingAllowed = 0.75, subsetToUse = 1:ncol(geData), cpuCluster = NULL, stopCpuCluster = TRUE)

`seedGene`	is the integer index pointing to the column (gene) of geData from which the searching strategy has to start. Optionally a list of genes (indexes pointing to the columns of geData) can be provided.
`logFilePrefix`	is a string used as the prefix of the log file generated by the algorithm. No longer necessary in this upgrade of the package.
`coeffMissingAllowed`	controls the number of missing values tolerated by the pam classification procedure (see details).
`subsetToUse`	If necessary, the construction of the signature can be restricted to a subset of genes. In this case a list of the columns of geData has to be provided.
`cpuCluster`	If a parallel search is necessary, this variable has to be set to the output of the NCPUS() function.
`stopCpuCluster`	flag controlling whether the channel to the CPU cluster has to be closed.

Two variables have to be set up in the global environment: geData and stData. geData is a matrix whose columns are the genes and whose rows are the samples (see geNSCLC for an example). It is recommended that the column names are set. stData is an object of class "Surv" from the package "survival" (see stNSCLC for an example).

Starting from the seed gene (a list of seeds is allowed), the next gene added is the one that maximizes the distance between the two survival curves. The list of genes grows until no further gene improves the distance between the survival curves.
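
The forward search described above can be sketched on simulated data. This is only an illustration, not the package's implementation: toyGe, toySt, logRankScore() and greedySearch() are hypothetical names, the distance between the two curves is assumed to be the log-rank statistic of a two-cluster pam() classification, and the admissibility controls and the backward selection described in the details are left out.

```r
library(survival)  # Surv(), survdiff()
library(cluster)   # pam()

set.seed(1)
# Toy data (assumption): 60 samples, 8 genes; gene g1 drives the hazard
n <- 60
toyGe <- matrix(rnorm(n * 8), nrow = n,
                dimnames = list(NULL, paste0("g", 1:8)))
toySt <- Surv(rexp(n, rate = exp(toyGe[, 1])), rep(1, n))

# Log-rank statistic of the two-cluster classification built on the given genes
logRankScore <- function(ids) {
  groups <- pam(toyGe[, ids, drop = FALSE], k = 2)$clustering
  survdiff(toySt ~ groups)$chisq
}

# Add, at each step, the gene giving the largest statistic;
# stop when no remaining gene improves it
greedySearch <- function(seed) {
  signature <- seed
  best <- logRankScore(signature)
  repeat {
    candidates <- setdiff(seq_len(ncol(toyGe)), signature)
    if (length(candidates) == 0) break
    scores <- sapply(candidates, function(g) logRankScore(c(signature, g)))
    if (max(scores) <= best) break
    signature <- c(signature, candidates[which.max(scores)])
    best <- max(scores)
  }
  list(signatureIDs = signature, tValue = best)
}

res <- greedySearch(1)
colnames(toyGe)[res$signatureIDs]
```

By construction the returned statistic is never lower than the one obtained from the seed alone.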

A gene (candidateGene) can be added to the running signature if it satisfies two controls: given the classification computed on the gene expressions of candidateGene + runningSignature, 1) no cluster can have a dimension lower than floor(0.1 * nrow(geData)), and 2) the survival curves cannot cross. When more than one candidate gene is proposed, the search stops if the number of candidates is greater than 0.01 * ncol(geData); otherwise a subset of the candidates is selected using a backward strategy.
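
The two numeric thresholds in this paragraph can be written down directly. The helpers below, clustersLargeEnough() and tooManyCandidates(), are hypothetical names used only for illustration and are not functions of the package; the check on crossing survival curves is not covered.

```r
# Control 1: every cluster must contain at least floor(0.1 * n) samples
clustersLargeEnough <- function(classification,
                                nSamples = length(classification)) {
  minSize <- floor(0.1 * nSamples)
  all(table(classification) >= minSize)
}

# Stopping rule: more candidates than 1% of the genes
tooManyCandidates <- function(nCandidates, nGenes) {
  nCandidates > 0.01 * nGenes
}

balanced <- factor(rep(c("good", "poor"), times = c(60, 40)))
skewed   <- factor(rep(c("good", "poor"), times = c(95, 5)))

clustersLargeEnough(balanced)  # TRUE: both clusters have at least 10 samples
clustersLargeEnough(skewed)    # FALSE: one cluster has only 5 samples
tooManyCandidates(3, 200)      # TRUE: 3 candidates exceed 0.01 * 200
tooManyCandidates(1, 200)      # FALSE
```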

The parameter coeffMissingAllowed controls an empirical rule intended to prevent the pam() function from crashing. The number of joint missing values allowed in a sample described by p gene expression levels is given by floor(p^coeffMissingAllowed).
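
As a quick illustration of this rule, the allowance can be computed for a few signature lengths. maxMissingAllowed() is a hypothetical helper written for this example, not a function exported by the package.

```r
# Number of joint missing values tolerated in a sample described by p genes
maxMissingAllowed <- function(p, coeffMissingAllowed = 0.75) {
  floor(p^coeffMissingAllowed)
}

maxMissingAllowed(c(5, 10, 20))                           # 3 5 9
maxMissingAllowed(c(5, 10, 20), coeffMissingAllowed = 1)  # 5 10 20
```

With the default of 0.75 the allowance grows more slowly than the number of genes, while a coefficient of 1 would tolerate samples in which every gene of the signature is missing.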

The function returns a list with the following slots:

`signatureName`	is a string identifying the signature. By default it is set to (colnames(geData)[seedGene])[1].
`startingSignature`	is a list of strings set to colnames(geData)[seedGene]
`coeffMissingAllowed`	same as input
`startingClassification`	(factor) classification of the samples computed by using the gene expression levels of the startingSignature
`startingTValue`	test-value of the log-rank test computed on the startingSignature
`startingPValue`	p-value corresponding to the startingTValue
`signatureIDs`	indexes pointing to the column of geData providing the sequence of gene expression levels that maximizes the distance between the two survival curves
`signature`	labels corresponding to signatureIDs: colnames(geData)[signatureIDs]
`tValue`	test-value of the log-rank test computed on the signature
`pValue`	p-value corresponding to the tValue
`classification`	(factor) classification of the samples computed by using the gene expression levels of the signature

Stefano M. Pagnotta and Michele Ceccarelli

geNSCLC, stNSCLC.

# find the signature starting from the gene SELP for the Non Small Cell Lung Cancer
library(geneSignatureFinder)
library(parallel)  # provides makeCluster()
#############
# set the working data
data(geNSCLC)
geData <- geNSCLC
data(stNSCLC)
stData <- stNSCLC
##############
# set the dimension of the cpu's cluster 
aMakeCluster <- makeCluster(2)
################
# set the starting gene to SELP
geneSeed <- which(colnames(geData) == "SELP")
##################
# run ...
ans <- signatureFinder(geneSeed, cpuCluster = aMakeCluster)
ans