library(knitcitations)
# create bibtex entries
write.bibtex(c(knitr=citation('knitr'),
               knitcitations=citation('knitcitations'), plyr=citation('plyr'),
               ggplot2=citation('ggplot2'), ape=citation('ape'),
               directlabels=citation('directlabels'),
               plyr=citation('plyr'), foreach=citation('foreach')),
             file='citations.bib')
bib = read.bibtex('citations.bib')

PrimerTree: Visually Assessing the Specificity and Informativeness of Primer Pairs

James Hester and David Serre

Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute, Cleveland, OH 44195, USA

Abstract

Summary

Designing PCR primers is a critical step in most genetic studies. Primers need to be sensitive and specific, as well as yield informative DNA sequences. For example, in metagenomic studies, primers need to amplify only DNA from the targeted community and amplify DNA sequences that enable differentiating all members of this community. Several informatics tools exist for designing primers based on a template DNA sequence and for identifying potential target sequences in a database. However, there are no tools to systematically analyze and visualize the specificity of PCR primers and the informativeness of the amplified sequence. PrimerTree is an R package that enables identifying potential target sequences for a set of primers and generates taxonomically annotated phylogenetic trees with the predicted amplification products.

Availability

PrimerTree is an R package released under the GPL-2 license and is available through CRAN. Source code and developmental versions are available at http://www.github.com/jimhester/primerTree.

Contact

hesterj@ccf.org

Supplemental Information

Supplementary data are available at Bioinformatics online.

Introduction

Designing primers for PCR is the initial step in many biomedical, forensic and metagenomic studies. There are a number of important factors to consider in primer design. First one must consider the chemical properties of the primers, including their length, melting temperature, GC content, secondary structure and the likelihood of primer dimer formation. The primer pair must also amplify specifically DNA from the target of interest and must not produce offtarget products. In addition, for some studies, the DNA sequence amplified must provide enough information to identify the source of the DNA. In metagenomic studies, the primers need to amplify DNA from the taxon of interest (e.g., bacteria, fungus, birds), not amplify unrelated taxon and provide enough information after sequencing to characterize which members of the communities are present in a given sample. Some clinical studies need to amplify only a specific pathogen species and enable identifying which strains are present.

While there are efficient online tools for designing primers r citep('10.1093/nar/gks596') and testing their specificity r citep(c(primer_BLAST='10.1186/1471-2105-13-134')), there is no simple tool to systematically assess the informativeness of the primer products. PrimerTree is an R package that, given a set of primer pairs, provides visual assessment of their specificity and informativeness by constructing phylogenetic trees of the predicted PCR products along with their taxonomic annotation. PrimerTree can run on a wide variety of hardware and only requires two commands, making it accessible to a large audience.

Methods

PrimerTree successively performs primer search, retrieval of DNA sequences predicted to be amplified, taxonomic identification of these sequences, multiple sequence alignment, reconstruction of a phylogenetic tree, and visualization of the tree with taxonomic annotation. Note that multiple primer pairs can be queried simultaneously using PrimerTree.

PrimerTree utilizes the primer search implemented in Primer-BLAST r citep(c(primer_BLAST='10.1186/1471-2105-13-134')) by directly querying the NCBI Primer-BLAST search page. This allows using all the options available on the NCBI site. By default PrimerTree searches the NCBI (nt) nucleotide database but alternative NCBI databases can be chosen through the function options. Note that when the primers are degenerated, PrimerTree automatically tests all possible combinations of primer sequences in Primer-BLAST and merges the results. The primer alignment results are then processed using the NCBI Eutilities r citep('http://www.ncbi.nlm.nih.gov/books/NBK25500/') to i) retrieve DNA sequences located between the primers (i.e., “amplified”) and ii) obtain taxonomic information related to each DNA sequence using the NCBI’s taxonomy database r citep('http://www.ncbi.nlm.nih.gov/books/NBK21100/').

PrimerTree next aligns all amplified sequences using Clustal Omega r citep('10.1038/msb.2011.75') and reconstructs a Neighbor-Joining tree using the ape package r citep('10.1093/bioinformatics/btg412'). Finally, PrimerTree displays the resulting phylogenetic tree using the ggplot2 package, labeling each taxon in a different color and adding the names of the main taxa using the directlabels package r citep(c(bib[['ggplot2']], bib[['directlabels']])).

Results

Figure 1 shows a subset of PrimerTree results for universal non-vascular plant (Bryophyte) primers r citep('10.1111/j.1365-294X.2012.05537.x') targeting the chloroplast trnL gene (Figure 1A) and mammal mitochondrial 16S ribosomal rna gene (Figure1B) r citep('10.1093/oxfordjournals.molbev.a025566'). The tree display enables rapid evaluation of the specificity of the primer pairs (e.g., offtarget amplification of amphibians and ray-finned fishes on Fig. 1B). In addition, the information encoded in the amplified DNA sequence can also be easily assessed by the length of the branches leading to different sequences (scaled in number of nucleotide differences). By default, PrimerTree displays the annotated phylogenetic tree for all taxonomic levels (e.g., kindom, phylum, class) enabling the user to determine the level of specificity of each primers (see e.g., Supplemental Figure 1).

For nondegenerated primers and using a single thread, PrimerTree usually runs in less than 240 seconds. However, the average runtime varies greatly depending on the primer specificity (i.e. how many DNA sequences are “amplified”), the search parameters chosen, and the current load on the NCBI servers and the internet connection. Highly degenerated primers result in large numbers of possible primer pairs, which can increase the runtime considerably. To limit maximum runtime in this situation, PrimerTree randomly samples only a portion of the total primer permutations. This provides a representative sample for most cases. Changing the number of sampled permutations or turning off sampling completely is possible. PrimerTree uses the plyr package extensively and has full support for any of the parallel backends compatible with the foreach package r citep(c(bib[['plyr']], bib[['foreach']])). In particular, parallel retrieval of the primer sequences from NCBI speeds up the total runtime considerably. Note that parallel queries to Primer-BLAST are queued by NCBI’s servers and only processed once there is free compute time.

Conclusions

Designing primers can be challenging, especially for metagenomic study, and dramatically impact the study results. We have developed the PrimerTree package to help assessing large numbers of designed primers and choosing the best primer pair for a given experiment. PrimerTree is an R package available on a wide variety of platforms and is very simple to use and install. PrimerTree is released under an opensource license and can be downloaded from http://github.com/jimhester/primerTree, which also provides additional documentation.

Funding

XXX

Conflict of Interest

None declared

References

bibliography(style='markdown')
library(doMC)
registerDoMC(8)
mammals_16S = search_primer_pair(name='Mammals 16S', 'CGGTTGGGGTGACCTCGGA', 'GCTGTTATCCCTAGGGTAACT', num_aligns=1000, .parallel=T)
bryophytes_trnL = search_primer_pair(name='Bryophyte trnL', 'GATTCAGGGAAACTTAGGTTG', 'CCATTGAGTCTCTGCACC', num_aligns=1000, .parallel=T)
save(file='data/mammals_16S.RData', mammals_16S)
save(file='data/bryophytes_trnL.RData', bryophytes_trnL)
change_mapping = function(plot){
  plot$layers[[1]]$geom_params$colour = 'lightgrey'
  plot$layers[[4]]$mapping = aes(x=x, y=y, color=class, shape=class) #add shape to points mapping
  plot$layers[[5]]$mapping = aes(x=x, y=y, label=class) #remove color from direct labels mapping
  plot
}
p1 = change_mapping(plot(mammals_16S, ranks='class', main='Mammal 16S', rotate=45, size=4) + scale_colour_grey(start=0, end=.8)) #scale_color_brewer(palette="Set1"))
p2 = change_mapping(plot(bryophytes_trnL, ranks='class', main='Bryophyte trnL', size=4) + scale_colour_grey(start=0, end=.8)) #scale_color_brewer(palette="Set1"))
library(gridExtra)
p3 = arrangeGrob(p2,p1, ncol=2, clip=F)
grid.draw(p3)
setEPS()
grid.draw(p3)
setEPS()
rm(scale_colour_discrete)
plot(mammals_16S)
plot(bryophytes_trnL)


jimhester/primerTree documentation built on Oct. 15, 2023, 11:06 p.m.