new expression density diagnostics interface

Description

This function will replace edd.unsupervised; it has more sensible parameters.

Usage

edd(eset, distList=eddDistList, tx=c(sort,flatQQNormY)[[1]],
	refDist=c("multiSim", "theoretical")[1], 
	method=c("knn", "nnet", "test")[1], nRowPerCand=100, ...)

Arguments

eset – instance of Biobase ExpressionSet class

distList – list comprised of eddDist objects

tx – transformation of data and reference prior to classification

refDist – type of reference distribution system to use

method – type of classifier to use. knn is k-nearest neighbors, nnet is neural net, test is max p-value from ks.test

nRowPerCand – number of realizations for a multiSim reference system

... – parameters to classifiers

Details

Classifies genes according to distributional shape, by comparing observed expression distributions to a collection of references, which may be simulated or evaluated theoretically.

The distList argument is important. It enumerates the catalog of distributions for classification of gene expression vectors by distributional shape. See the HOWTO-edd vignette for information on how this list is constructed and how it can be extended.

The tx argument specifies how the data are processed for comparison to the reference catalog. This is a function on a vector returning a vector, but the input and the output need not have the same length. The default value of tx is sort, which entails that the order statistics are treated as multivariate data for classification.
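A minimal sketch of what such a transformation looks like, using base R only. The data are simulated, and decile_tx is a hypothetical function introduced here for illustration; it is not part of the package. It shows that a tx may return a vector shorter than its input.

```r
set.seed(1)

# Simulated stand-in for one gene's expression values across 20 samples
x <- rnorm(20)

# Default-style transformation: the sorted values (order statistics)
# become the feature vector used for classification
os <- sort(x)

# Hypothetical alternative transformation (not from the package):
# summarize the vector by its nine deciles, so the output is shorter
# than the input
decile_tx <- function(v) {
  quantile(v, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
}

length(os)            # same length as the input
length(decile_tx(x))  # shorter than the input
```

Whatever transformation is chosen, it is applied to both the data and the reference, so the two remain comparable.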

The refDist argument selects the type of reference catalog. Options are 'multiSim', for which the reference consists of nRowPerCand realizations of each catalog entry, and 'theoretical', for which the reference consists of one vector of quantiles for each catalog entry.
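The difference between the two reference types can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the two-entry catalog, the sample size n, and all variable names below are hypothetical, and real catalogs are built from eddDist objects.

```r
set.seed(42)

n <- 15                 # assumed number of samples per gene
n_row_per_cand <- 100   # plays the role of nRowPerCand

# Hypothetical catalog: each entry pairs a simulator with a quantile function
catalog <- list(
  normal  = list(r = rnorm, q = qnorm),
  uniform = list(r = runif, q = qunif)
)

# multiSim-style reference: n_row_per_cand sorted realizations per entry,
# stacked into one matrix with a matching vector of class labels
multi_sim <- do.call(rbind, lapply(catalog, function(d) {
  t(replicate(n_row_per_cand, sort(d$r(n))))
}))
multi_labels <- rep(names(catalog), each = n_row_per_cand)

# theoretical-style reference: a single row of quantiles per entry
theoretical <- t(sapply(catalog, function(d) d$q(ppoints(n))))

dim(multi_sim)    # 200 rows (100 per entry), 15 columns
dim(theoretical)  # 2 rows (one per entry), 15 columns
```

The simulated reference gives the classifier many noisy examples per shape, while the theoretical reference gives it one idealized example per shape.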

The method argument selects the type of classifier. It would be desirable to allow this to be a function, but classifier arguments and return values are not structured uniformly enough to permit this at present; see the e1071 package for some work on handling various classifiers programmatically (e.g., tune).
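The "test" option is described above as taking the maximum p-value from ks.test. A minimal base R sketch of that idea follows. It is an assumption about the general approach, not the package's implementation; the candidate list and variable names are hypothetical, and the data are simulated.

```r
set.seed(7)

# Simulated stand-in for one gene's (already standardized) expression values
x <- rnorm(50)

# Hypothetical candidate distributions, each given as a CDF
candidates <- list(
  normal  = pnorm,
  t3      = function(q) pt(q, df = 3),
  uniform = function(q) punif(q, min = -2, max = 2)
)

# One Kolmogorov-Smirnov test per candidate; keep the p-values
pvals <- vapply(candidates,
                function(cdf) ks.test(x, cdf)$p.value,
                numeric(1))

# Assign the label of the candidate with the largest p-value
best <- names(pvals)[which.max(pvals)]
```

Here the classifier is just a comparison of goodness-of-fit p-values, so it needs no training data, unlike knn or nnet.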

Value

A character vector or a factor, depending on the classifier.

Author(s)

Vince Carey <stvjc@channing.harvard.edu>

See Also

ExpressionSet

Examples

library(Biobase)
library(edd)
data(sample.ExpressionSet)
# should filter to genes with reasonable variation
table( edd(sample.ExpressionSet, method="nnet", size=10, decay=.2) )
library(golubEsets)
data(Golub_Merge)
# keep genes with above-median MAD and minimum expression above 300
madvec <- apply(exprs(Golub_Merge),1,mad)
minvec <- apply(exprs(Golub_Merge),1,min)
keep <- (madvec > median(madvec)) & (minvec > 300)
gmfilt <- Golub_Merge[keep,]
# split the samples into ALL and AML groups
ALL <- gmfilt$ALL.AML=="ALL"
gall <- gmfilt[,ALL]
gaml <- gmfilt[,!ALL]
# classify each gene's distributional shape within each group
alldists <- edd(gall, method="nnet", size=10, decay=.2)
amldists <- edd(gaml, method="nnet", size=10, decay=.2)
table(alldists,amldists)
# repeat for AML using the theoretical reference catalog
amldists2 <- edd(gaml, method="nnet", refDist="theoretical", size=10, decay=.2)
table(amldists,amldists2)