knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) TRAVIS <- !identical(tolower(Sys.getenv("TRAVIS")), "true") knitr::opts_chunk$set(purl = TRAVIS)
knitr::include_graphics("logo_tagMeppr.pdf")
Hi there! Welcome!
TagMap is a very useful method for transposon mapping [@Stern037762], enabling researchers to map the insertion sites with ease and generate long sequencing reads. However, there is little to none automatisation and downstream analysis software available for these reads. TagMeppr is an easy to use, memory efficient fastq-to-figure package written in R.
The basic usage of tagMapper revolves around three clear steps:
library(tagMeppr)
Remember: you only have to make a tagMepprIndex once per genome and protocol! This means that you can use it for however many samples for that protocol and organism (e.g. PiggyBac in human cells) you may have.
library("BSgenome.Hsapiens.UCSC.hg19") reference_hg19_PB = makeIndex(indexPath = '../premadeReferences/', bsgenome = BSgenome.Hsapiens.UCSC.hg19, ITR = 'PiggyBac', targetInsertionSite = 'TTAA', verbose = T)
Loading a tagMepprIndex is easy with loadIndex().
library("BSgenome.Hsapiens.UCSC.hg19",quietly = T)
reference_hg19_PB = loadIndex('BSgenome.Hsapiens.UCSC.hg19_PiggyBac_tagMeppRindex.fa.gz')
You can use print() on this tagMepprIndex, which gives you a lot of information:
print(reference_hg19_PB)
The core object of tagMeppr is the tagMepprSample
. Everything, from the paths of
the fastq's to the final results, are stored in this object. To kick off the
analysis, such an object should be made for every sample by using newTagMeppr()
.
In this function, the paths of the four respective fastq's, the name of the
sample and the transposon are asked.
C91 = newTagMeppr(F1 = 'clone91_FWD_R1.fq.gz', F2 = 'clone91_FWD_R2.fq.gz', R1 = 'clone91_REV_R1.fq.gz', R2 = 'clone91_REV_R2.fq.gz', name = "clone9 (minimised)", protocol = 'PiggyBac')
When you use the print()
method, a summary will be given about the sample and
its progression trough the tagMeppr-pipeline.
print(C91)
After this, it is a good idea to check whether your primer-design is one that
tagMeppr expects: reverse- and forward-primers on different ITRs. Also, it
checks if the reverse is found near the start of the transposon. This is to be
able to accurately denote the orientation of the transposon. Normally, we
expect the reverse-primer to be found at the start of the ITR-sequence. When
running checkPrimer()
, it will automatically update the tagMepprSample-object
with the rev5_fwd3
tag. If it appears to be to other way around, the sample-
object will have a rev5_fwd3=T
flag.
It is highly recommended to do this!
fwdPrimer = "CGTCAATTTTACGCAGACTATC" revPrimer = "GTACGTCACAATATGATTATCTTTCTAG" checkPrimer(fwdPrimer = fwdPrimer, revPrimer = revPrimer, exp = C91, ITR = 'PiggyBac')
print(C91)
To align the tagMeppr-sample to the index, use the align()
-function. This will
call bwa mem
and run it with optimal parameters set. If bwa
or samtools
are not installed, the function will can an error. Afterwards, the chimeric
reads will be filtered and placed in a GRanges
object for easy usage later
on. Running align()
will automatically update the tagMeppr-sample
to
include the parsed reads.
align(exp = C91, ref = reference_hg19_PB, cores = 30)
The object will be updated, which keeps your evironment tidy. The summary will now also show the number of informative reads (i.e. a chimeric read to both an ITR and the reference) and the temporary folder of the .BAM-files.
print(C91)
By default, align()
will remove suspected PCR-duplicates. This is not possible
with normal tools like samtools
due to the chimeric nature of the reads. If
one doesn't want this deduplication to happen, set dedup=F
in align()
.
C91_noDedup = C91 align(exp = C91_noDedup, ref = reference_hg19_PB, dedup = F, cores = 30)
The output of the print()
-command will now show that deduplication didn't
happen.
print(C91_noDedup)
The findInsertions()
function will first find all reads that overlap a TIS,
which in the case of PiggyBac will be "TTAA". Next it will calculate whether
there is a bias towards one side of the TIS using a binominal test. The bias,
denoted as $D$, $-1$ when all reads are upstream and $+1$ when all reads are
downstream of the TIS.This is done independently for the forward and reverse
reads:
$$ \begin{aligned} p_{fwd/rev} = \binom{reads_{D<0}}{reads} \end{aligned} $$
Next, we filter out TISs which have the bias on the same side of the TIS:
$$ \begin{aligned} sgn(D_{fwd}) \neq sgn(D_{rev}) \end{aligned} $$
To calculate a "TIS-specific" p-value, we use Edgington's sum-p method, which is very conservative in our usage. This ensures that, when $p_{combined} < \alpha$, both the fwd and the rev reads are indeed biased.
$$ \begin{aligned} p_{combined} = \dfrac{(\sum_{i=1}^{2} p_i)^{2}}{2!} \end{aligned} $$
Afterwards, a holm-correction is done to limit the Family-Wise Error Rate (FWER).
findInsertions(exp = C91, ref = reference_hg19_PB, padding = 2)
findInsertions(exp = C91_noDedup, ref = reference_hg19_PB, padding = 2)
print(C91)
foundInsertions = results( C91 ) head(foundInsertions) barplot(table(foundInsertions$strand), horiz = F, col = tagMepprCol(), main = 'found orientations')
plotSite(C91,site = 3)
Returning to the sample witout deduplication, one can now see what would happen:
plotSite(C91_noDedup,site = 3)
You can also set a limit on the maximum number of reads shown:
plotSite(C91_noDedup,site = 3, maxReads = 100)
plotInsertions(exp = C91)
show orientation:
plotInsertions(exp = C91, showOrientation = T)
C91_a = C91 C91_b = C91 C91_a$name = 'clone9 subset 1' C91_b$name = 'clone9 subset 2' C91_a$results = sample(C91_a$results, size = floor(length(C91_a$results)/2)) C91_b$results = sample(C91_b$results, size = floor(length(C91_b$results)/2)) tagMepprSampleList = list(C91_a, C91_b)
plotInsertions(tagMepprSampleList)
plotInsertions(tagMepprSampleList, showOrientation = T)
plotInsertions(tagMepprSampleList, sideBySide = T)
cnt1 = seq(1:20) cnt2 = seq(1:20) CNTS = data.frame(P_fwd = rep(cnt1, 20), P_rev = rep(cnt2, each = 20)) CNTS$p = apply(CNTS, 1, function(x){ p = metap::sump(c(binom.test(x[1],x[1])$p.value, binom.test(x[2],x[2])$p.value))$p }) CNTS.bk = CNTS CNTS$p = -log10(CNTS$p) CNTS = CNTS[CNTS$P_rev < CNTS$P_fwd, ] sigs = CNTS.bk[CNTS.bk$p < 0.05, 1:2] sigs$SIG = "*" sigs$p = 0 library(ggplot2) ggplot(CNTS, aes(x= P_rev, y = P_fwd, fill = p)) + geom_raster() + scale_fill_distiller(palette = 'PuRd', direction = 1)+ coord_fixed(expand = F) + theme_classic()+ labs(fill = '-log10(P)', title = 'expected significances', subtitle = 'assuming |D| == 1', x = 'fwd counts', y = 'rev counts')+ geom_text(data = sigs[sigs$P_fwd <= sigs$P_rev,], mapping = aes( label = SIG))+ annotate('segment', x = 4.5, xend = 20.5, y = 2.5, yend = 2.5)+ annotate('segment', y = 4.5, yend = 20.5, x = 2.5, xend = 2.5)+ annotate('segment', x = 2.5, xend = 3.5, y = 4.5, yend = 4.5)+ annotate('segment', y = 2.5, yend = 3.5, x = 4.5, xend = 4.5)+ annotate('segment', x = 3.5, xend = 4.5, y = 3.5, yend = 3.5) + annotate('segment', y = 3.5, yend = 4.5, x = 3.5, xend = 3.5)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.