library(knitcitations) # create bibtex entries write.bibtex(c(knitr=citation('knitr'), knitcitations=citation('knitcitations'), plyr=citation('plyr'), ggplot2=citation('ggplot2'), ape=citation('ape'), directlabels=citation('directlabels'), plyr=citation('plyr'), foreach=citation('foreach')), file='citations.bib') bib = read.bibtex('citations.bib')
James Hester and David Serre
Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute, Cleveland, OH 44195, USA
Designing PCR primers is a critical step in most genetic studies. Primers need to be sensitive and specific, as well as yield informative DNA sequences. For example, in metagenomic studies, primers need to amplify only DNA from the targeted community and amplify DNA sequences that enable differentiating all members of this community. Several informatics tools exist for designing primers based on a template DNA sequence and for identifying potential target sequences in a database. However, there are no tools to systematically analyze and visualize the specificity of PCR primers and the informativeness of the amplified sequence. PrimerTree is an R package that enables identifying potential target sequences for a set of primers and generates taxonomically annotated phylogenetic trees with the predicted amplification products.
PrimerTree is an R package released under the GPL-2 license and is available through CRAN. Source code and developmental versions are available at http://www.github.com/jimhester/primerTree.
Supplementary data are available at Bioinformatics online.
Designing primers for PCR is the initial step in many biomedical, forensic and metagenomic studies. There are a number of important factors to consider in primer design. First one must consider the chemical properties of the primers, including their length, melting temperature, GC content, secondary structure and the likelihood of primer dimer formation. The primer pair must also amplify specifically DNA from the target of interest and must not produce offtarget products. In addition, for some studies, the DNA sequence amplified must provide enough information to identify the source of the DNA. In metagenomic studies, the primers need to amplify DNA from the taxon of interest (e.g., bacteria, fungus, birds), not amplify unrelated taxon and provide enough information after sequencing to characterize which members of the communities are present in a given sample. Some clinical studies need to amplify only a specific pathogen species and enable identifying which strains are present.
While there are efficient online tools for designing primers r citep('10.1093/nar/gks596')
and testing their specificity r citep(c(primer_BLAST='10.1186/1471-2105-13-134'))
, there is no simple
tool to systematically assess the informativeness of the primer products.
PrimerTree is an R package that, given a set of primer pairs, provides visual
assessment of their specificity and informativeness by constructing
phylogenetic trees of the predicted PCR products along with their taxonomic
annotation. PrimerTree can run on a wide variety of hardware and only requires
two commands, making it accessible to a large audience.
PrimerTree successively performs primer search, retrieval of DNA sequences predicted to be amplified, taxonomic identification of these sequences, multiple sequence alignment, reconstruction of a phylogenetic tree, and visualization of the tree with taxonomic annotation. Note that multiple primer pairs can be queried simultaneously using PrimerTree.
PrimerTree utilizes the primer search implemented in Primer-BLAST
r citep(c(primer_BLAST='10.1186/1471-2105-13-134'))
by directly querying the NCBI Primer-BLAST search page. This allows using
all the options available on the NCBI site. By default PrimerTree searches the
NCBI (nt) nucleotide database but alternative NCBI databases can be chosen
through the function options. Note that when the primers are degenerated,
PrimerTree automatically tests all possible combinations of primer sequences
in Primer-BLAST and merges the results. The primer alignment results are then
processed using the NCBI Eutilities r citep('http://www.ncbi.nlm.nih.gov/books/NBK25500/')
to i) retrieve DNA sequences located between the primers (i.e., “amplified”)
and ii) obtain taxonomic information related to each DNA sequence using the
NCBI’s taxonomy database r citep('http://www.ncbi.nlm.nih.gov/books/NBK21100/')
.
PrimerTree next aligns all amplified sequences using Clustal Omega
r citep('10.1038/msb.2011.75')
and reconstructs a Neighbor-Joining tree using the ape package
r citep('10.1093/bioinformatics/btg412')
. Finally, PrimerTree displays the resulting phylogenetic
tree using the ggplot2 package, labeling each taxon in a different color and
adding the names of the main taxa using the directlabels package
r citep(c(bib[['ggplot2']], bib[['directlabels']]))
.
Figure 1 shows a subset of PrimerTree results for universal non-vascular plant (Bryophyte) primers
r citep('10.1111/j.1365-294X.2012.05537.x')
targeting the chloroplast trnL gene (Figure 1A)
and mammal mitochondrial 16S ribosomal rna gene (Figure1B)
r citep('10.1093/oxfordjournals.molbev.a025566')
. The tree display enables
rapid evaluation of the specificity of the primer pairs (e.g., offtarget
amplification of amphibians and ray-finned fishes on Fig. 1B). In addition, the
information encoded in the amplified DNA sequence can also be easily assessed
by the length of the branches leading to different sequences (scaled in number
of nucleotide differences). By default, PrimerTree displays the annotated
phylogenetic tree for all taxonomic levels (e.g., kindom, phylum, class)
enabling the user to determine the level of specificity of each primers (see
e.g., Supplemental Figure 1).
For nondegenerated primers and using a single thread, PrimerTree usually runs
in less than 240 seconds. However, the average runtime varies greatly depending
on the primer specificity (i.e. how many DNA sequences are “amplified”), the
search parameters chosen, and the current load on the NCBI servers and the
internet connection. Highly degenerated primers result in large numbers of
possible primer pairs, which can increase the runtime considerably. To limit
maximum runtime in this situation, PrimerTree randomly samples only a portion
of the total primer permutations. This provides a representative sample for
most cases. Changing the number of sampled permutations or turning off sampling
completely is possible. PrimerTree uses the plyr package extensively and has
full support for any of the parallel backends compatible with the foreach
package r citep(c(bib[['plyr']], bib[['foreach']]))
. In particular, parallel retrieval of
the primer sequences from NCBI speeds up the total runtime considerably. Note
that parallel queries to Primer-BLAST are queued by NCBI’s servers and only
processed once there is free compute time.
Designing primers can be challenging, especially for metagenomic study, and dramatically impact the study results. We have developed the PrimerTree package to help assessing large numbers of designed primers and choosing the best primer pair for a given experiment. PrimerTree is an R package available on a wide variety of platforms and is very simple to use and install. PrimerTree is released under an opensource license and can be downloaded from http://github.com/jimhester/primerTree, which also provides additional documentation.
XXX
None declared
bibliography(style='markdown')
library(doMC) registerDoMC(8) mammals_16S = search_primer_pair(name='Mammals 16S', 'CGGTTGGGGTGACCTCGGA', 'GCTGTTATCCCTAGGGTAACT', num_aligns=1000, .parallel=T) bryophytes_trnL = search_primer_pair(name='Bryophyte trnL', 'GATTCAGGGAAACTTAGGTTG', 'CCATTGAGTCTCTGCACC', num_aligns=1000, .parallel=T) save(file='data/mammals_16S.RData', mammals_16S) save(file='data/bryophytes_trnL.RData', bryophytes_trnL)
change_mapping = function(plot){ plot$layers[[1]]$geom_params$colour = 'lightgrey' plot$layers[[4]]$mapping = aes(x=x, y=y, color=class, shape=class) #add shape to points mapping plot$layers[[5]]$mapping = aes(x=x, y=y, label=class) #remove color from direct labels mapping plot } p1 = change_mapping(plot(mammals_16S, ranks='class', main='Mammal 16S', rotate=45, size=4) + scale_colour_grey(start=0, end=.8)) #scale_color_brewer(palette="Set1")) p2 = change_mapping(plot(bryophytes_trnL, ranks='class', main='Bryophyte trnL', size=4) + scale_colour_grey(start=0, end=.8)) #scale_color_brewer(palette="Set1")) library(gridExtra) p3 = arrangeGrob(p2,p1, ncol=2, clip=F) grid.draw(p3)
setEPS() grid.draw(p3)
setEPS() rm(scale_colour_discrete) plot(mammals_16S) plot(bryophytes_trnL)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.