Description
It uses two steps to estimate the conditional essentiality of each gene in the genome (Zhao et al., 2017). First, for each insertion, it constructs a confidence distribution (CD) function that contains the evidence of conditional essentiality, by comparing the read counts of that insertion between conditions. Second, it combines the insertion-level CD functions to infer the essentiality of the corresponding gene.
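The two steps can be illustrated with a minimal R sketch. It assumes that each insertion's log fold change estimate bhat is approximately normal with standard error se, so its CD is pnorm((theta - bhat) / se), and that insertion-level CDs are merged with the weighted normal-score rule H_c(theta) = pnorm(sum(w * qnorm(H_i(theta))) / sqrt(sum(w^2))). The helper names and this particular combination rule are illustrative assumptions, not necessarily what TnseqDiff does internally.

```r
# Illustrative sketch only (hypothetical helpers, not part of the Tnseq package).

# Step 1 (assumed normal CD): CD of the log fold change theta for one insertion.
insertion_cd <- function(theta, bhat, se) {
  pnorm((theta - bhat) / se)
}

# Step 2 (assumed weighted normal-score combination of insertion-level CDs).
# For normal CDs, qnorm(insertion_cd(theta, bhat, se)) equals (theta - bhat) / se,
# so the normal scores are computed directly to avoid numerical saturation.
combine_cd <- function(theta, bhat, se, w = rep(1, length(bhat))) {
  z <- (theta - bhat) / se
  pnorm(sum(w * z) / sqrt(sum(w^2)))
}

# Gene-level two-sided p-value for H0: theta = 0, read off the combined CD.
cd_pvalue <- function(bhat, se, w = rep(1, length(bhat))) {
  h0 <- combine_cd(0, bhat, se, w)
  2 * min(h0, 1 - h0)
}

# Example: three insertions in one gene with made-up estimates.
bhat <- c(1.2, 0.8, 1.5)
se   <- c(0.5, 0.6, 0.4)
cd_pvalue(bhat, se)
```

Because each insertion contributes a normal score, insertions with consistent, precisely estimated fold changes reinforce one another at the gene level.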
If no pool contains replicates for both conditions, the insertion-level p-values are calculated with DESeq (dispersions estimated using method="blind" and sharingMode="fit-only") and then combined using Stouffer's method.
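Stouffer's method converts each p-value to a standard normal score, sums the scores, and rescales the sum to unit variance. A minimal sketch for one-sided p-values follows; the helper name stouffer and the optional weights w are assumptions for illustration, and the exact variant applied to the DESeq p-values in TnseqDiff (for example, how two-sided p-values or weights are handled) is not specified here.

```r
# Hypothetical helper: weighted Stouffer combination of one-sided p-values.
stouffer <- function(p, w = rep(1, length(p))) {
  z <- qnorm(p, lower.tail = FALSE)        # p-value -> standard normal score
  z_comb <- sum(w * z) / sqrt(sum(w^2))    # weighted sum rescaled to unit variance
  pnorm(z_comb, lower.tail = FALSE)        # combined score -> one-sided p-value
}

# Example: combine insertion-level p-values for one gene (illustrative values).
stouffer(c(0.04, 0.10, 0.30))
```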
Arguments
countData: a read count matrix with n rows (one per unique insertion site) and m columns (one per sample).
geneID: a character vector of gene names for the insertion sites (rows) in countData.
location: a numeric vector specifying the insertion locations.
pool: a numeric vector specifying the pool ID of each sample (column) in countData.
condition: a character vector specifying the condition of each sample (column) in countData.
weights: a character string specifying the weights for the insertion sites. These weights are used to weight the insertion-level evidence when it is combined.
bayes: a logical indicating whether moderated estimates are used to construct the CD function.
p.nb: a logical requesting a p-value based on a negative binomial model.
norm: a logical indicating whether normalization is performed.
cut: an insertion is excluded from the analysis if its total read count over all samples is less than cut.
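The cut rule amounts to a row filter on the count matrix. A minimal sketch, assuming countData is a numeric matrix (or numeric data frame) and that geneID and location are aligned with its rows; filter_insertions is a hypothetical helper, not a Tnseq function.

```r
# Hypothetical helper: drop insertions whose total count over all samples < cut.
filter_insertions <- function(countData, geneID, location, cut) {
  keep <- rowSums(countData) >= cut
  list(countData = countData[keep, , drop = FALSE],
       geneID    = geneID[keep],
       location  = location[keep])
}

# Three insertions (rows) by two samples (columns); row totals are 2, 1 and 12.
toy_counts <- matrix(c(0, 2,
                       1, 0,
                       5, 7), nrow = 3, byrow = TRUE)
filtered <- filter_insertions(toy_counts, geneID = c("g1", "g1", "g2"),
                              location = c(10, 20, 30), cut = 2)
filtered$location  # insertions at 10 and 30 are kept
```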
Value
The function returns a list with two components, resTable and est.insertion. resTable is a data frame with the following columns:
ID: gene names.
NumInser: number of insertions in each gene in the input condition.
Unique.NumInser: number of unique insertions in each gene in the input condition. If there is only one pool, Unique.NumInser should equal NumInser.
Mean.x: average count for the samples in condition x, computed after the counts are normalized with DESeq.
FC: fold change of condition B over condition A, derived from the combined CD function (when there are no replicates, FC is the ratio of the total counts within the gene).
logFC: log2 fold change of condition B over condition A.
pvalue: p-values of the differential tests.
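Several resTable columns can be approximated by hand. The sketch below assumes DESeq-style median-of-ratios size factors for the normalization, that NumInser counts rows per gene while Unique.NumInser counts distinct locations, and that the no-replicate FC is a ratio of total counts with logFC = log2(FC). All helper names are hypothetical, and this documentation does not state whether raw or normalized totals are used for FC.

```r
# DESeq-style median-of-ratios size factors (assumed normalization).
size_factors <- function(counts) {
  counts <- as.matrix(counts)
  log_geo_means <- rowMeans(log(counts))   # -Inf for rows that contain a zero
  apply(counts, 2, function(cnts) {
    ok <- is.finite(log_geo_means) & cnts > 0
    exp(median(log(cnts[ok]) - log_geo_means[ok]))
  })
}

# Hypothetical per-gene insertion counts resembling NumInser and Unique.NumInser.
insertion_counts <- function(geneID, location) {
  n_rows   <- tapply(location, geneID, length)
  n_unique <- tapply(location, geneID, function(l) length(unique(l)))
  data.frame(ID = names(n_rows),
             NumInser = as.vector(n_rows),
             Unique.NumInser = as.vector(n_unique))
}

# Hypothetical no-replicate fold change: total counts in condition B over
# total counts in condition A, using only the rows (insertions) of one gene.
total_count_fc <- function(gene_counts, condition, A, B) {
  sum(gene_counts[, condition == B]) / sum(gene_counts[, condition == A])
}

toy_counts <- matrix(c(10, 20, 30, 20, 40, 60), nrow = 3)  # column 2 = 2 x column 1
toy_gene   <- c("a", "a", "b")
toy_loc    <- c(100, 100, 200)  # gene "a": two rows share one location (assumed: two pools)
toy_cond   <- c("Input", "Output")

sf <- size_factors(toy_counts)              # 1/sqrt(2) and sqrt(2)
norm_counts <- sweep(toy_counts, 2, sf, "/")
ic <- insertion_counts(toy_gene, toy_loc)
fc <- total_count_fc(toy_counts[toy_gene == "a", , drop = FALSE],
                     toy_cond, "Input", "Output")
log2(fc)                                    # logFC on raw totals: 1
```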
est.insertion (for advanced users) is a data frame returned when there are replicates in all pools. It contains the linear-model estimates for each insertion:
ID: gene names.
Mean1, Mean2: means of the two conditions.
df: degrees of freedom for the slope estimate.
bhat: slope estimate (logFC).
se: standard error of the slope estimate.
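A per-insertion slope, its standard error and its degrees of freedom can be obtained from an ordinary linear model. This is a minimal sketch, assuming y holds one insertion's log2-scale (normalized) counts and that the model is a plain regression on a 0/1 condition indicator; the model actually fitted by TnseqDiff (including any pool effect, or the moderation used when bayes = TRUE) may differ, and fit_insertion and insertion_cd_t are hypothetical names. The second function shows how such estimates define a t-based CD function for the insertion.

```r
# Hypothetical per-insertion fit: y regressed on an indicator of condition B.
fit_insertion <- function(y, condition, A, B) {
  x   <- as.numeric(condition == B)        # 0 for condition A, 1 for condition B
  fit <- lm(y ~ x)
  c(Mean1 = mean(y[condition == A]),
    Mean2 = mean(y[condition == B]),
    df    = df.residual(fit),              # residual degrees of freedom
    bhat  = coef(fit)[["x"]],              # slope = difference in means (logFC)
    se    = coef(summary(fit))["x", "Std. Error"])
}

# t-based CD function of the log fold change theta for this insertion.
insertion_cd_t <- function(theta, est) {
  pt((theta - est[["bhat"]]) / est[["se"]], df = est[["df"]])
}

y   <- c(1.0, 1.2, 3.0, 3.2)                # two samples per condition (made up)
est <- fit_insertion(y, c("A", "A", "B", "B"), "A", "B")
est                                         # bhat = 2, se = sqrt(0.02), df = 2
```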
Author(s)
Lili Zhao <zhaolili@umich.edu>
References
Wang, H. and Song, M. (2011). Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. The R Journal, 3(2), 29-33.
Zhao, L., Wu, W., Anderson, M. T., Li, Y., Mobley, H. L. T., and Bachman, M. A. (2017). Inseq: Identification of Conditionally Essential Genes in Transposon Sequencing Studies. Submitted to BMC Bioinformatics.
Examples
library(Tnseq)
data(serratia)
# Use the first 100 insertions
serr=serratia[1:100,]
# Obtain the count matrix by dropping the first three (non-count) columns
countData=serr[,-c(1,2,3)]
condition=c(rep("Input",4),rep("Output",8))
pool=c(1,1,2,2,1,1,1,1,2,2,2,2)
geneID=as.character(serr$GeneID)
location=serr$Loc
foo<-TnseqDiff(countData, geneID, location, pool, condition)
res=foo$resTable
# Adjust p-values using the Benjamini-Hochberg method
res$padj=p.adjust(res$pvalue, method = "BH")
# When no pool has replicates of both conditions (one Input and one Output sample per pool)
countData=serr[,c(4,6,8,12)]
condition=c("Input","Input","Output","Output")
pool=c(1,2,1,2)
geneID=as.character(serr$GeneID)
location=serr$Loc
foo<-TnseqDiff(countData, geneID, location, pool, condition)
res=foo$resTable
# when there is only one pool
countData=serr[,c(4,5,8,9,10,11)]
condition=c(rep("Input",2),rep("Output",4))
pool=c(1,1,1,1,1,1)
geneID=as.character(serr$GeneID)
location=serr$Loc
foo<-TnseqDiff(countData, geneID, location, pool, condition)
res=foo$resTable
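The p.adjust(..., method = "BH") call in the first example applies the Benjamini-Hochberg step-up procedure. The same adjustment can be written out by hand as a check; bh_adjust is a hypothetical helper and, unlike p.adjust, this sketch assumes the p-values contain no NA.

```r
# Benjamini-Hochberg adjusted p-values (assumes no NA values in p).
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)             # largest p-value first
  ranks <- m:1                                 # rank of each sorted p-value
  adj <- pmin(1, cummin(p[o] * m / ranks))     # p * m / rank, made monotone
  adj[order(o)]                                # back to the original order
}

p <- c(0.01, 0.04, 0.03, 0.20)
bh_adjust(p)                                   # 0.04, 0.0533, 0.0533, 0.20 (rounded)
```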