Expression Density Diagnostics

Description

Classify, for each gene, the distribution of its expression values across the cohort of samples.

Usage

eddObsolete(eset, 
   ref=c("multiCand", "uniCand", "test", "nnet")[1], 
   k=10, l=6, nnsize=6, nniter=200)

Arguments

eset

instance of Biobase class ExpressionSet.

ref

one of 'multiCand', 'uniCand', 'test' or 'nnet'; see Details.

k

k setting for knn – number of nearest neighbors to poll.

l

l setting for knn – the minimum number of concordant votes required for a definite classification; otherwise 'doubt' (NA) is returned.

nnsize

size setting for nnet – the number of units in the hidden layer.

nniter

iteration setting for nnet – the maximum number of training iterations.

Details

Four options are available for classifying expression densities. The data on each gene are shifted and scaled to have median 0 and MAD 1. They are then compared to the shapes of reference distributions (standard Gaussian N(0,1), chisq(1), lognorm(0,1), t(3), 0.75 N(0,1) + 0.25 N(4,1), 0.25 N(0,1) + 0.75 N(4,1), Beta(2,8), Beta(8,2) and U(0,1)), each of which has also been transformed to have median 0 and MAD 1. Classification proceeds by one of four methods, selected by the 'ref' argument. Suppose there are S samples in the ExpressionSet.
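
The median/MAD standardization can be sketched in a few lines of R. This is a minimal illustration, not the package's code: the helper scale_med_mad() and the list ref_gens are hypothetical names, and the sketch assumes R's stats::mad() with its default consistency constant (1.4826); edd may compute the MAD differently.

```r
# Hypothetical helper: shift and scale a vector to median 0 and MAD 1.
# Assumption: stats::mad() with its default constant 1.4826.
scale_med_mad <- function(x) (x - median(x)) / mad(x)

# Hypothetical random generators for the reference shapes listed above.
ref_gens <- list(
  "N(0,1)"       = function(n) rnorm(n),
  "chisq(1)"     = function(n) rchisq(n, df = 1),
  "lognorm(0,1)" = function(n) rlnorm(n, 0, 1),
  "t(3)"         = function(n) rt(n, df = 3),
  # two-component normal mixtures: pick a component per draw
  ".75N(0,1)+.25N(4,1)" = function(n) ifelse(runif(n) < 0.75, rnorm(n, 0, 1), rnorm(n, 4, 1)),
  ".25N(0,1)+.75N(4,1)" = function(n) ifelse(runif(n) < 0.25, rnorm(n, 0, 1), rnorm(n, 4, 1)),
  "Beta(2,8)"    = function(n) rbeta(n, 2, 8),
  "Beta(8,2)"    = function(n) rbeta(n, 8, 2),
  "U(0,1)"       = function(n) runif(n)
)

set.seed(1)
x <- rexp(50, 4)           # one simulated "gene" measured on S = 50 samples
z <- scale_med_mad(x)      # median(z) is 0 and mad(z) is 1, up to rounding

# Each reference sample is put on the same median/MAD scale.
ref_scaled <- lapply(ref_gens, function(g) scale_med_mad(g(50)))
```

Because location and scale are removed, only the shape of each distribution (skewness, tail weight, bimodality) is left to drive the classification.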

multiCand – 100 samples of size S are drawn from each reference distribution and scaled to median 0, MAD 1. The knn procedure, with parameters k and l, classifies each gene according to its nearest neighbors in this reference set.
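
A rough sketch of the multiCand idea, using class::knn(train, test, cl, k, l). The helpers scale_med_mad(), sort_scale_rows(), make_train() and classify_multicand() are hypothetical, only four of the reference shapes are used for brevity, and each vector is sorted before comparison. That sorting is an assumption made here so that Euclidean distance compares distribution shapes rather than sample order; the package may do this differently.

```r
library(class)  # knn(train, test, cl, k, l); "doubt" is returned as NA

scale_med_mad <- function(x) (x - median(x)) / mad(x)

# Hypothetical subset of reference generators.
ref_gens <- list(
  n01  = function(n) rnorm(n),
  t3   = function(n) rt(n, df = 3),
  ln01 = function(n) rlnorm(n),
  u01  = function(n) runif(n)
)

# Scale each row to median 0 / MAD 1, then sort it (assumed step).
sort_scale_rows <- function(m) t(apply(m, 1, function(x) sort(scale_med_mad(x))))

# Training set: nrep draws of size S from every reference distribution.
make_train <- function(S, nrep = 100) {
  raw <- do.call(rbind, lapply(ref_gens, function(g) matrix(g(nrep * S), nrow = nrep)))
  list(x = sort_scale_rows(raw), cl = factor(rep(names(ref_gens), each = nrep)))
}

classify_multicand <- function(genes, k = 10, l = 6) {
  tr <- make_train(ncol(genes))
  knn(tr$x, sort_scale_rows(genes), tr$cl, k = k, l = l)
}

set.seed(1234)
genes <- rbind(matrix(rnorm(20 * 50), 20), matrix(runif(20 * 50), 20))
pred <- classify_multicand(genes, k = 10, l = 2)
table(given = rep(c("n01", "u01"), each = 20), pred = pred, useNA = "ifany")
```

Raising l toward k makes the classifier more conservative: more genes end up as NA (doubt) rather than receiving a weakly supported label.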

uniCand – one representative of size S is created from each reference distribution, using its theoretical quantiles. knn(1,0), that is k = 1 and l = 0, assigns each gene to the class of its single nearest representative.
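
A sketch of the uniCand approach under stated assumptions: the representatives are theoretical quantiles evaluated at stats::ppoints(S), and they are rescaled with the sample median and MAD of that quantile vector, which only approximates the distribution's exact median and MAD. The names ref_q, make_reps() and classify_unicand() are hypothetical, and only four reference shapes are included.

```r
library(class)  # knn()

scale_med_mad <- function(x) (x - median(x)) / mad(x)

# Hypothetical quantile functions for a subset of the references.
ref_q <- list(
  n01  = function(p) qnorm(p),
  t3   = function(p) qt(p, df = 3),
  ln01 = function(p) qlnorm(p),
  u01  = function(p) qunif(p)
)

# One row per reference: quantiles at ppoints(S), scaled to median 0 / MAD 1.
make_reps <- function(S) t(sapply(ref_q, function(q) scale_med_mad(q(ppoints(S)))))

classify_unicand <- function(genes) {
  reps <- make_reps(ncol(genes))
  # sort each scaled gene so it lines up with the ordered quantiles
  te <- t(apply(genes, 1, function(x) sort(scale_med_mad(x))))
  knn(reps, te, factor(rownames(reps)), k = 1, l = 0)
}

set.seed(1)
genes <- rbind(matrix(rnorm(10 * 40), 10), matrix(runif(10 * 40), 10))
pred <- classify_unicand(genes)
```

This avoids simulation noise in the reference set, at the cost of relying on a single template per distribution.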

test – each gene is assigned to the reference distribution giving the largest p-value among Kolmogorov-Smirnov tests against each reference. If no p-value exceeds 0.1, 'doubt' is declared.
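
The KS rule can be illustrated with stats::ks.test(). In this sketch each transformed reference is approximated by one large simulated sample, scaled to median 0 and MAD 1, and compared with a two-sample test; that approximation is an assumption of the sketch, not necessarily what the package does. Because the gene's median and MAD are estimated from the data, the p-values are only approximate. classify_ks() and ref_gens are hypothetical names.

```r
scale_med_mad <- function(x) (x - median(x)) / mad(x)

# Hypothetical subset of reference generators.
ref_gens <- list(
  n01  = function(n) rnorm(n),
  t3   = function(n) rt(n, df = 3),
  ln01 = function(n) rlnorm(n),
  u01  = function(n) runif(n)
)

# Return the best-fitting reference name, or NA ("doubt") when no
# KS p-value exceeds the cutoff.
classify_ks <- function(x, nref = 10000, cutoff = 0.1) {
  z <- scale_med_mad(x)
  p <- sapply(ref_gens, function(g) ks.test(z, scale_med_mad(g(nref)))$p.value)
  if (max(p) > cutoff) names(which.max(p)) else NA_character_
}

set.seed(2)
res <- c(classify_ks(rnorm(50)), classify_ks(runif(50)))
```

Unlike the knn variants, this rule needs no training set of simulated genes, but it evaluates one test per gene and reference.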

nnet – 100 samples of size S are drawn from each reference distribution and scaled to median 0, MAD 1. A neural net is fit to these samples and their distribution labels; the fitted net is then applied to the scaled gene expression data, and its predictions give the classifications.
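
A sketch of the nnet option using nnet::nnet() through its formula interface and predict(..., type = "class"). Mapping nnsize to size and nniter to maxit is an assumption, as is sorting each scaled vector. make_train_df(), sorted_scaled() and classify_nnet() are hypothetical helpers, and only four reference shapes are used.

```r
library(nnet)  # nnet(formula, data, size, maxit, trace); predict(type = "class")

scale_med_mad <- function(x) (x - median(x)) / mad(x)

ref_gens <- list(
  n01  = function(n) rnorm(n),
  t3   = function(n) rt(n, df = 3),
  ln01 = function(n) rlnorm(n),
  u01  = function(n) runif(n)
)

# Scale and sort each row; name the columns so train and test match.
sorted_scaled <- function(m) {
  out <- t(apply(m, 1, function(x) sort(scale_med_mad(x))))
  colnames(out) <- paste0("q", seq_len(ncol(out)))
  out
}

make_train_df <- function(S, nrep = 100) {
  raw <- do.call(rbind, lapply(ref_gens, function(g) matrix(g(nrep * S), nrow = nrep)))
  df <- as.data.frame(sorted_scaled(raw))
  df$lab <- factor(rep(names(ref_gens), each = nrep))
  df
}

classify_nnet <- function(genes, nnsize = 6, nniter = 200) {
  tr <- make_train_df(ncol(genes))
  fit <- nnet(lab ~ ., data = tr, size = nnsize, maxit = nniter, trace = FALSE)
  predict(fit, newdata = as.data.frame(sorted_scaled(genes)), type = "class")
}

set.seed(3)
genes <- rbind(matrix(rlnorm(10 * 30), 10), matrix(runif(10 * 30), 10))
pred <- classify_nnet(genes)
```

With a factor response of more than two levels, the formula interface fits a softmax output layer, so each gene receives exactly one label and there is no built-in 'doubt' category.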

Value

A vector of classifications, with NA for genes that cannot be classified.

Author(s)

VJ Carey

Examples

library(Biobase)
library(edd)   # provides eddObsolete() and edd()
data(sample.ExpressionSet)
print(summary(eddObsolete(sample.ExpressionSet,k=10,l=2)))

# test problem: 6 distributions x 20 genes each, 50 samples
set.seed(1234)
test <- matrix(NA, nrow = 120, ncol = 50)
test[1:20,] <- rnorm(1000)
test[21:40,] <- rt(1000,3)
test[41:60,] <- rexp(1000,4)
test[61:80,] <- rmixnorm(1000,.750,0,1,4,1)
test[81:100,] <- runif(1000)
test[101:120,] <- rlnorm(1000)
labs <- rep(c("n01", "t3", "exp", "mix1", "u01", "ln01"), each = 20)

# sample names in exprs and phenoData must agree
rownames(test) <- paste0("g", 1:120)
colnames(test) <- paste0("s", 1:50)
phenoData <- AnnotatedDataFrame(data.frame(Col1 = 1:50, row.names = colnames(test)))
TT <- new("ExpressionSet", exprs = test, phenoData = phenoData)

multrun <- eddObsolete(TT, k=10, l=2)
print(table(given=labs, multiCand=multrun))
netrun <- eddObsolete(TT, ref="nnet")
print(table(given=labs, netout=netrun))
newrun <- edd(TT, meth="nnet", size=10, decay=.2)
print(table(given=labs, newout=newrun))
newrun <- edd(TT, meth="test")
print(table(given=labs, newout=newrun))