BPglm: Generalized linear model for four-parameter beta-Possion...
In nghiavtr/BPSC: Beta-Poisson Model for Single-Cell RNA-seq Data Analyses

Description Usage Arguments Value Examples

Generalized linear model for four-parameter beta-Possion model

1
2
3

BPglm(data, controlIds, design, coef = 2, keepFit = FALSE, minExp = 1e-04,
  tbreak.num = 10, break.thres = 10, estIntPar = FALSE, useExt = FALSE,
  extreme.quant = NULL, useParallel = FALSE)

`data`	Input data matrix for differential expression analysis
`controlIds`	Indicies of control group
`design`	Design matrix for glm fitting
`coef`	An integer to point out the column index corresponding to the coefficient for the GLM model testing. This should be specified by users. Default value is 2
`keepFit`	An option to keep fit results when the input dataset is small (keepFit=TRUE). Otherwise, keepFit=FALSE by default
`minExp`	A threshold for minimum expressed values. If a expression is less than minExp, it is set to be zero
`tbreak.num`	Number of breaks for binning
`break.thres`	The threshold to decide whether use breaks from logs cale or not. We decide to use breaks from log scale if 75 percent of input data less than this threshold
`estIntPar`	An option to allow estimating initial parameters for the model from only expressed values
`useExt`	A parameter setting of getTbreak function that allows to extend the last bin to infinity or not
`extreme.quant`	A quantile probability to remove extrem values (outliers) higher than the quantile. If extreme.quant=NULL, no elimination of outliers is done
`useParallel`	An option to allow using parallel computing

A list of p-values, statistics, indicies, a list of fitting models (if keepFit=TRUE), etc

set.seed(2015)
###Generate a random data matrix from a beta-poisson model
#Set the number of genes
N=100
#Generate randomly the parameters of BP models
alp=sample(100,N,replace=TRUE)*0.1;
bet=sample(100,N,replace=TRUE)*0.1;
lam1=sample(100,N,replace=TRUE)*10;
lam2=sample(100,N,replace=TRUE)*0.01
#Generate a control group
n1=100
control.mat=NULL
for (i in 1:N) control.mat=rbind(control.mat,rBP(n1,alp=alp[i],
bet=bet[i],lam1=lam1[i],lam2=lam2[i]))
#To create biological variation, we randomly set 10% as differentially expressed genes
#by simply replacing the parameter lam1 in treated group by a fold-change fc
DE.ids= sample(N,N*0.1)
fc=2.0
lam1[DE.ids]=lam1[DE.ids] * fc
#Generate a treated group
n2=100
treated.mat=NULL
for (i in 1:N)treated.mat=rbind(treated.mat,rBP(n2,alp=alp[i],
bet=bet[i],lam1=lam1[i],lam2=lam2[i]))
#Create a data set by merging the control group and the treated group
bp.mat=cbind(control.mat,treated.mat)
rownames(bp.mat)=c(1:nrow(bp.mat));
colnames(bp.mat)=c(1:ncol(bp.mat))
group=c(rep(1,ncol(control.mat)),rep(2,ncol(treated.mat)))

#First, choose IDs of all cells of the control group for estimating parameters of BP models
controlIds=which(group==1)
#Create a design matrix including the group labels.
#All batch effects can be also added here if they are available
design=model.matrix(~group)
#Select the column in the design matrix corresponding to
#the coefficient (the group label) for the GLM model testing
coef=2
#Run BPglm for differential expression analysis
res=BPglm(data=bp.mat, controlIds=controlIds, design=design, coef=coef, estIntPar=FALSE)
#In this function, user can also set estIntPar=TRUE to have better estimated beta-Poisson
#models for the generalized linear model. However, a longer computational time is required.

#Plot the p-value distribution
hist(res$PVAL, breaks=20)
#Summarize the resutls
ss=summary(res)
#Compare the discovered DE genes and the true DE genes predefined beforeward
fdr=p.adjust(res$PVAL, method="BH")
bpglm.DE.ids=which(fdr<=0.05)
#Print the indices of the true DE genes:
cat(sort(DE.ids))
#Print the indices of the DE genes discovered by BPglm:
cat(sort(bpglm.DE.ids))