BPglm: Generalized linear model for four-parameter beta-Possion...

Description Usage Arguments Value Examples

Description

Generalized linear model for four-parameter beta-Possion model

Usage

1
2
3
BPglm(data, controlIds, design, coef = 2, keepFit = FALSE, minExp = 1e-04,
  tbreak.num = 10, break.thres = 10, estIntPar = FALSE, useExt = FALSE,
  extreme.quant = NULL, useParallel = FALSE)

Arguments

data

Input data matrix for differential expression analysis

controlIds

Indicies of control group

design

Design matrix for glm fitting

coef

An integer to point out the column index corresponding to the coefficient for the GLM model testing. This should be specified by users. Default value is 2

keepFit

An option to keep fit results when the input dataset is small (keepFit=TRUE). Otherwise, keepFit=FALSE by default

minExp

A threshold for minimum expressed values. If a expression is less than minExp, it is set to be zero

tbreak.num

Number of breaks for binning

break.thres

The threshold to decide whether use breaks from logs cale or not. We decide to use breaks from log scale if 75 percent of input data less than this threshold

estIntPar

An option to allow estimating initial parameters for the model from only expressed values

useExt

A parameter setting of getTbreak function that allows to extend the last bin to infinity or not

extreme.quant

A quantile probability to remove extrem values (outliers) higher than the quantile. If extreme.quant=NULL, no elimination of outliers is done

useParallel

An option to allow using parallel computing

Value

A list of p-values, statistics, indicies, a list of fitting models (if keepFit=TRUE), etc

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
set.seed(2015)
###Generate a random data matrix from a beta-poisson model
#Set the number of genes
N=100
#Generate randomly the parameters of BP models
alp=sample(100,N,replace=TRUE)*0.1;
bet=sample(100,N,replace=TRUE)*0.1;
lam1=sample(100,N,replace=TRUE)*10;
lam2=sample(100,N,replace=TRUE)*0.01
#Generate a control group
n1=100
control.mat=NULL
for (i in 1:N) control.mat=rbind(control.mat,rBP(n1,alp=alp[i],
bet=bet[i],lam1=lam1[i],lam2=lam2[i]))
#To create biological variation, we randomly set 10% as differentially expressed genes
#by simply replacing the parameter lam1 in treated group by a fold-change fc
DE.ids= sample(N,N*0.1)
fc=2.0
lam1[DE.ids]=lam1[DE.ids] * fc
#Generate a treated group
n2=100
treated.mat=NULL
for (i in 1:N)treated.mat=rbind(treated.mat,rBP(n2,alp=alp[i],
bet=bet[i],lam1=lam1[i],lam2=lam2[i]))
#Create a data set by merging the control group and the treated group
bp.mat=cbind(control.mat,treated.mat)
rownames(bp.mat)=c(1:nrow(bp.mat));
colnames(bp.mat)=c(1:ncol(bp.mat))
group=c(rep(1,ncol(control.mat)),rep(2,ncol(treated.mat)))

#First, choose IDs of all cells of the control group for estimating parameters of BP models
controlIds=which(group==1)
#Create a design matrix including the group labels.
#All batch effects can be also added here if they are available
design=model.matrix(~group)
#Select the column in the design matrix corresponding to
#the coefficient (the group label) for the GLM model testing
coef=2
#Run BPglm for differential expression analysis
res=BPglm(data=bp.mat, controlIds=controlIds, design=design, coef=coef, estIntPar=FALSE)
#In this function, user can also set estIntPar=TRUE to have better estimated beta-Poisson
#models for the generalized linear model. However, a longer computational time is required.

#Plot the p-value distribution
hist(res$PVAL, breaks=20)
#Summarize the resutls
ss=summary(res)
#Compare the discovered DE genes and the true DE genes predefined beforeward
fdr=p.adjust(res$PVAL, method="BH")
bpglm.DE.ids=which(fdr<=0.05)
#Print the indices of the true DE genes:
cat(sort(DE.ids))
#Print the indices of the DE genes discovered by BPglm:
cat(sort(bpglm.DE.ids))

nghiavtr/BPSC documentation built on May 23, 2019, 4:42 p.m.