Obtain p-values and q-values using results from QL.fit

Share:

Description

Obtain significance results for quasi-likelihood model fits to RNA-seq expression count data using the methods detailed in Lund, Nettleton, McCarthy, and Smyth (2012).

Usage

1
2
3
	
	QL.results(fit,Dispersion="Deviance",spline.df=NULL,Plot=TRUE)
	     

Arguments

fit

The list returned by the function QL.fit

Dispersion

Must be one of "Deviance" or "Pearson", specifying which type of estimator should be used for estimating quasi-likelihood dispersion parameter.

spline.df

Optional. User may specify the degrees of freedom to use when fitting a cubic spline to log-scale(estimated dispersion) versus the log(average count). Default uses cross-validation in sreg function to pick appropriate degrees of freedom.

Plot

logical. If TRUE, the estimated dispersion versus the average count are plotted on a log-scale with the corresponding cubic spline fit overlaid.

Value

list containing:

"P.values"

list of matrices providing p-values for the QL, QLShrink and QLSpline methods, respectively. The i^th column of each element of pvals corresponds to the hypothesis test assigned in the i^th row of test.mat.

"Q.values"

list of matrices providing q-values for the QL, QLShrink and QLSpline methods, respectively. The i^th column of each element of qvals corresponds to the hypothesis test assigned in the i^th row of test.mat. Q-values are computed using the methods of Nettleton et al. (2006) JABES 11, 337-356.

"F.stat"

list of matrices providing F-statistics for the QL, QLShrink and QLSpline methods, respectively. The i^th column of each element of F.stat corresponds to the hypothesis test assigned in the i^th row of test.mat.

"m0"

matrix providing estimated number of true null hypotheses for each test(arranged by row) under each of the three methods(arranged by column). m0 values are computed using the methods of Nettleton et al. (2006) JABES 11, 337-356.

"d0"

vector containing estimated additional denominator degrees of freedom gained from shrinking dispersion estimates in the QLShrink and QLSpline procedures, respectively.

Author(s)

Steve Lund lundsp@gmail.com

References

Lund, Nettleton, McCarthy and Smyth (2012) "Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates" SAGMB, 11(5).

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
  
set.seed(234092)

n.genes<-100
n.de<-round(.5*n.genes)
trt<-rep(1:2,each=4)
n.samp<-length(trt)
mu<-rgamma(n.genes,1.5,.01)

## specify gene specific negative binomial dispersions
size<-(log(mu+exp(1))-1)/mu  ### Var(Y)=E(Y)log(E(Y)+exp(1))

## add noise to gene specific negative binomial dispersions
size<-size*4.5/rchisq(n.genes,4.5)

sim.mn<-matrix(mu,n.genes,2)

### Simulate fold changes
B<-exp((2*rbinom(n.de,1,.5)-1)*(.25+rbeta(n.de,1,2)))

sim.mn[1:n.de,1]<-sim.mn[1:n.de,1]*B^(.5)+5
sim.mn[1:n.de,2]<-sim.mn[1:n.de,2]*B^(-.5)

### Simulate library size factors
sim.offset<-2^(rnorm(n.samp,0,.15))

### Compute final means
sim.mn2<-t(t(sim.mn[,trt])*sim.offset)

### Simulate data
simdat<-matrix(rnbinom(n.samp*n.genes,mu=sim.mn2,size=1/size),n.genes,n.samp)

### Simulate estimated dispersions to save time
################################################################
## THIS STEP SHOULD NOT BE PERFORMED WHEN ANALYZING REAL DATA ##
################################################################
est.nb.disp<-size*rchisq(n.genes,n.samp-2)/(n.samp-2)
est.nb.disp<-est.nb.disp

### Keep genes with at least 10 total counts
est.nb.disp<-est.nb.disp[rowSums(simdat)>9]
simdat<-simdat[rowSums(simdat)>9,]


### Create list of designs describing model under null and alternative hypotheses
design.list<-vector("list",2)
design.list[[1]]<-model.matrix(~as.factor(trt))  #This also could have just been ``trt''.
design.list[[2]]<-rep(1,length(trt))

log.offset<-log(apply(simdat,2,quantile,.75))

### Analyze using QL, QLShrink and QLSpline methods applied to quasi-Poisson model
fit<-QL.fit(simdat, design.list,log.offset=log.offset, Model="Poisson")

results<-QL.results(fit)

### How many significant genes at FDR=.05 from QLSpline method?
apply(results$Q.values[[3]]<.05,2,sum)

### Indexes for Top 10 most significant genes from QLSpline method
head(order(results$P.values[[3]]), 10)

### Analyze using QL, QLShrink and QLSpline methods 
### applied to quasi-negative binomial model
fit2<-QL.fit(simdat, design.list,log.offset=log.offset, nb.disp=est.nb.disp, Model="NegBin")

##########################################################
## Note: 'nb.disp' typically will not be specified when ##
## calling QL.fit while analyzing real data. Providing  ##
## numeric values for 'nb.disp' prevents neg binomial   ##
## dispersions from being estimated from the data.      ##
##########################################################

results2<-QL.results(fit2)

### How many significant genes at FDR=.05 for QLSpline method?
apply(results2$Q.values[[3]]<.05,2,sum)

### Indexes for Top 10 most significant genes from QLShrink method
head(order(results2$P.values[[2]]), 10)