Variable selection and cluster functions

Description

Functions for variable selection and clustering. They are mainly used by the function MCRestimate.

Usage

identity(sample.gene.matrix, classfactor, ...)

varSel.highest.t.stat(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=500, ...)

varSel.highest.var(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=2000, ...)

varSel.AUC(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=200, ...)

cluster.kmeans.mean(sample.gene.matrix, classfactor, theParameter=NULL, number.clusters=500, ...)

varSel.removeManyNA(sample.gene.matrix, classfactor, theParameter=NULL, NAthreshold=0.25, ...)

varSel.impute.NA(sample.gene.matrix, classfactor, theParameter=NULL, ...)

Arguments

sample.gene.matrix

a matrix in which the rows correspond to genes and the columns correspond to samples

classfactor

a factor containing the values that should be predicted

theParameter

A parameter whose meaning depends on the function. For 'cluster.kmeans.mean' it is either NULL or the output of the function kmeans. If it is NULL, kmeans is used to form clusters of the genes; otherwise the existing clusters are used. In both cases the metagene intensities are calculated afterwards. For the other functions it is either NULL or a logical vector that indicates for every gene whether it should be left out of further analysis.

number.clusters

parameter which specifies the number of clusters

var.numbers

the number of variables to select; only some methods need this argument

NAthreshold

numeric threshold (default 0.25); if the proportion of NA values of a variable is higher than this threshold, the variable is deleted

...

Further parameters
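
The interplay of 'NAthreshold' and 'theParameter' can be illustrated with a minimal base R sketch. The helper remove_many_na below is hypothetical and is not the MCRestimate implementation. It assumes that the threshold is a proportion between 0 and 1, that genes are in rows, and that TRUE in the logical vector means "keep this gene"; the convention used by the package may differ.

```r
# Hypothetical helper for illustration only (not part of MCRestimate).
# Assumption: TRUE in 'keep' marks a gene that is retained.
remove_many_na <- function(x, keep = NULL, na_threshold = 0.25) {
  if (is.null(keep)) {
    # Proportion of missing values for each gene (row)
    na_fraction <- rowMeans(is.na(x))
    # A gene is dropped only if its NA proportion is higher than the threshold
    keep <- na_fraction <= na_threshold
  }
  list(matrix = x[keep, , drop = FALSE], parameter = keep)
}

# 3 genes (rows) x 4 samples (columns)
x <- matrix(c(1, NA, 3, 4,
              NA, NA, 2, 1,
              5, 6, 7, 8), nrow = 3, byrow = TRUE)

res <- remove_many_na(x, na_threshold = 0.25)
res$parameter   # second gene has 50% NA values and is dropped
```

Passing a previously computed logical vector as 'keep' skips the NA count and applies the stored selection directly, which is the role the documentation gives to 'theParameter'.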

Details

cluster.kmeans.mean performs a k-means clustering of the genes with the number of clusters specified by 'number.clusters' and takes the mean of each cluster. varSel.highest.var selects the variables with the highest variance; their number is specified by 'var.numbers'. varSel.AUC chooses the most discriminating variables according to the AUC criterion (the package ROC is required).
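
The clustering step described above can be sketched in base R as follows. This is only an illustration of the idea (cluster the genes, then average each cluster per sample), not the code used by cluster.kmeans.mean; the variable names km and metagenes are chosen here for the example.

```r
set.seed(1)
# 30 genes (rows) x 2 samples (columns), three groups of similar genes
m <- matrix(c(rnorm(10, 2, 0.5), rnorm(10, 4, 0.5), rnorm(10, 7, 0.5),
              rnorm(10, 2, 0.5), rnorm(10, 4, 0.5), rnorm(10, 2, 0.5)), ncol = 2)

# Cluster the genes (rows) into 3 groups
km <- kmeans(m, centers = 3)

# Metagene intensity: per-sample mean over the genes of each cluster.
# rowsum() adds up the rows of each cluster; dividing by the cluster
# sizes turns the sums into means.
metagenes <- rowsum(m, km$cluster) / as.vector(table(km$cluster))

dim(metagenes)   # one row per cluster, one column per sample
```

Keeping the kmeans result (km in this sketch) makes it possible to compute metagenes for new samples with the same gene-to-cluster assignment instead of clustering again.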

Value

Every function returns a list consisting of two elements:

matrix

the result matrix of the variable reduction or the clustering

parameter

The parameter needed to reproduce the algorithm: for a gene reduction, a vector that indicates for every gene whether it is left out of further analysis; for the clustering algorithm, the output of the function kmeans.
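
The purpose of the returned 'parameter' element can be shown with a small base R sketch: a selection is computed once on training samples and then re-applied unchanged to new samples. The helper select_highest_var is hypothetical and only mirrors the documented return shape. It assumes genes in rows and that TRUE marks a gene that is kept, which may not match the package's convention.

```r
# Hypothetical helper for illustration only (not part of MCRestimate).
select_highest_var <- function(x, keep = NULL, var_numbers = 2) {
  if (is.null(keep)) {
    # Variance of each gene (row) across samples
    gene_var <- apply(x, 1, var)
    # TRUE for the 'var_numbers' genes with the highest variance
    keep <- rank(-gene_var, ties.method = "first") <= var_numbers
  }
  list(matrix = x[keep, , drop = FALSE], parameter = keep)
}

# 4 genes x 4 training samples, and the same 4 genes x 2 new samples
train <- rbind(g1 = c(1, 1, 1, 1),
               g2 = c(0, 10, 0, 10),
               g3 = c(2, 3, 2, 3),
               g4 = c(5, 0, 5, 9))
test  <- rbind(g1 = c(1, 1), g2 = c(0, 9), g3 = c(2, 2), g4 = c(4, 8))

fit <- select_highest_var(train, var_numbers = 2)       # selection is computed
new <- select_highest_var(test, keep = fit$parameter)   # selection is reused

rownames(fit$matrix)
rownames(new$matrix)
```

Because the second call receives the stored vector, the new samples are reduced to exactly the genes chosen on the training samples, without looking at the variance of the new data.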

Author(s)

Markus Ruschhaupt <m.ruschhaupt@dkfz.de>

See Also

MCRestimate

Examples

library(MCRestimate)
m <- matrix(c(rnorm(10,2,0.5), rnorm(10,4,0.5), rnorm(10,7,0.5),
              rnorm(10,2,0.5), rnorm(10,4,0.5), rnorm(10,2,0.5)), ncol=2)
cluster.kmeans.mean(m, number.clusters=3)