An R package for generating barcoded Massively Parallel Reporter Assay sequences
If you make use of this software, please cite the following publication:
Andrew R Ghazi, Edward S Chen, David M Henke, Namrata Madan, Leonard C Edelstein, Chad A Shaw; Design tools for MPRA experiments, Bioinformatics, Volume 34, Issue 15, 1 August 2018, Pages 2682–2683, https://doi.org/10.1093/bioinformatics/bty150
MPRA Design Tools depends on the Biostrings and BSgenome.Hsapiens.UCSC.hg38 packages from Bioconductor. First install these in R with the following commands:
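# biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Hsapiens.UCSC.hg38"))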
The package also makes use of some tidyverse packages which can be installed with the following commands:
install.packages(c('dplyr', 'magrittr', 'purrr', 'readr', 'stringr', 'tibble', 'tidyr', 'purrrlyr'))
If you don't have the devtools package installed, install it like so:
install.packages("devtools")
After that you can install and load MPRA Design Tools with these commands:
devtools::install_github('andrewGhazi/mpradesigntools')
library(mpradesigntools)
This is the companion package to the MPRA Design Tools Shiny application available here: https://andrewghazi.shinyapps.io/designmpra/
The Shiny app allows users to interact with MPRA parameters (such as the number of barcodes per allele) and see the effect of changing those parameters on the assay's power. Researchers can use this to decide which parameters best meet their experimental goals.
Currently, the main function of the MPRA Design Tools package is to design a set of barcoded sequences for MPRA experiments (without overloading our Shiny server!). This is done with the processVCF function. It takes roughly 5 seconds plus 10 ms per barcoded sequence on a relatively modern CPU, so you can estimate the expected job time in seconds as
5 + 0.01 * (number of barcodes per allele) * (number of SNPs in VCF) * 2 (for ref/alt alleles)
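As a rough sketch, the estimate above can be wrapped in a small helper. The function name estimate_job_seconds and its argument names are illustrative only and are not part of mpradesigntools; the constants are the approximate figures quoted above, so actual timings will vary by machine.

```r
# Hypothetical helper (not part of mpradesigntools): rough runtime estimate
# using the ~5 s overhead and ~10 ms per barcoded sequence quoted above.
estimate_job_seconds <- function(n_per_allele, n_snps) {
  n_sequences <- n_per_allele * n_snps * 2  # 2 = ref and alt alleles
  5 + 0.01 * n_sequences
}

# Example: 14 barcodes per allele for a VCF containing 1000 SNPs
estimate_job_seconds(n_per_allele = 14, n_snps = 1000)
```

For this example the estimate comes to about 285 seconds, i.e. a little under five minutes.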
Only the CHROM, POS, REF, and ALT columns are used. The INFO column is used only for detecting reverse strand constructs.
Current input constraints are:
VCFs generated by batch querying rsIDs on dbSNP should meet most of the formatting requirements. However, the MPRAREV tag will need to be added by the user (where appropriate) because the VCFs do not always specify which strand the relevant gene is on.
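One possible way to add the tag programmatically is sketched below. It assumes that MPRAREV is written as a bare flag in the semicolon-separated INFO field and that the variants are held in a data frame with an ID column; neither assumption is confirmed by this README, so check the processVCF documentation before relying on it. The helper add_mprarev and the toy variants (with placeholder IDs and values) are purely illustrative.

```r
# Hypothetical helper (not part of mpradesigntools).
# ASSUMPTION: MPRAREV is a bare flag in the semicolon-separated INFO field.
add_mprarev <- function(vcf_df, reverse_ids) {
  # Tag only the requested variants that do not already carry the flag
  to_tag <- vcf_df$ID %in% reverse_ids &
    !grepl("MPRAREV", vcf_df$INFO, fixed = TRUE)
  empty_info <- vcf_df$INFO %in% c("", ".")
  # Empty INFO fields become the flag itself; others get it appended
  vcf_df$INFO[to_tag & empty_info] <- "MPRAREV"
  vcf_df$INFO[to_tag & !empty_info] <-
    paste0(vcf_df$INFO[to_tag & !empty_info], ";MPRAREV")
  vcf_df
}

# Toy data with made-up placeholder variants (not real dbSNP records)
toy_vcf <- data.frame(
  CHROM = c("1", "2", "3"),
  POS   = c(1000L, 2000L, 3000L),
  ID    = c("variantA", "variantB", "variantC"),
  REF   = c("A", "C", "G"),
  ALT   = c("G", "T", "A"),
  INFO  = c(".", "EXAMPLE=1", "MPRAREV"),
  stringsAsFactors = FALSE
)

tagged <- add_mprarev(toy_vcf, reverse_ids = c("variantA", "variantB", "variantC"))
tagged$INFO
```

The helper leaves already-tagged variants untouched, so it can be run more than once on the same table without duplicating the flag.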
9/17/18 - Feature under development
Alternative barcode sets may be used by setting the barcode_set argument of processVCF to one of the following values. The first number indicates the length of the barcodes in base pairs; the second indicates the number of errors that can be corrected while still identifying the original barcode. Note that these barcodes CAN include miR seed sequences. If you want to avoid miR interference, identify the main miRs by abundance in your cell type of interest, then include their seed sequences in the filterPatterns argument. These barcodes are provided by the freebarcodes package, described in the publication below and available from the GitHub repository listed after it.
The original barcode set provided with mpradesigntools is available as the twelvemers barcode set.
|barcode_set  | n_barcodes|
|:------------|----------:|
|barcodes10-1 |       1902|
|barcodes10-2 |         30|
|barcodes11-1 |       6160|
|barcodes11-2 |         74|
|barcodes12-1 |      17213|
|barcodes12-2 |        178|
|barcodes13-1 |      56735|
|barcodes13-2 |        467|
|barcodes14-1 |     157196|
|barcodes14-2 |       1155|
|barcodes15-1 |     518508|
|barcodes15-2 |       3182|
|barcodes16-1 |    1636417|
|barcodes16-2 |       8776|
|barcodes17-2 |      23024|
|barcodes3-1  |          1|
|barcodes4-1  |          2|
|barcodes5-1  |          9|
|barcodes5-2  |          1|
|barcodes6-1  |         26|
|barcodes6-2  |          1|
|barcodes7-1  |         66|
|barcodes7-2  |          3|
|barcodes8-1  |        212|
|barcodes8-2  |          6|
|barcodes9-1  |        553|
|barcodes9-2  |         11|
|twelvemers   |    1140292|
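When picking a barcode_set, one practical check is whether the set holds enough barcodes for the design. The sketch below assumes each barcoded sequence needs its own barcode, so the requirement is barcodes per allele * number of SNPs * 2. The vector barcode_set_sizes simply copies the table above, and the helper sets_with_capacity is hypothetical (not part of mpradesigntools). Pattern filtering (for example via filterPatterns) may reduce the usable count, so treat the result as a minimum requirement rather than a guarantee.

```r
# Set sizes copied from the table above
barcode_set_sizes <- c(
  "barcodes3-1" = 1, "barcodes4-1" = 2, "barcodes5-1" = 9, "barcodes5-2" = 1,
  "barcodes6-1" = 26, "barcodes6-2" = 1, "barcodes7-1" = 66, "barcodes7-2" = 3,
  "barcodes8-1" = 212, "barcodes8-2" = 6, "barcodes9-1" = 553, "barcodes9-2" = 11,
  "barcodes10-1" = 1902, "barcodes10-2" = 30, "barcodes11-1" = 6160,
  "barcodes11-2" = 74, "barcodes12-1" = 17213, "barcodes12-2" = 178,
  "barcodes13-1" = 56735, "barcodes13-2" = 467, "barcodes14-1" = 157196,
  "barcodes14-2" = 1155, "barcodes15-1" = 518508, "barcodes15-2" = 3182,
  "barcodes16-1" = 1636417, "barcodes16-2" = 8776, "barcodes17-2" = 23024,
  "twelvemers" = 1140292
)

# Hypothetical helper: list the sets large enough for a design, smallest first.
# ASSUMPTION: one unique barcode per barcoded sequence, before any filtering.
sets_with_capacity <- function(n_per_allele, n_snps, sizes = barcode_set_sizes) {
  needed <- n_per_allele * n_snps * 2  # 2 = ref and alt alleles
  sort(sizes[sizes >= needed])
}

# Example: 14 barcodes per allele, 1000 SNPs (28000 barcodes needed)
sets_with_capacity(n_per_allele = 14, n_snps = 1000)
```

Among the sets returned, the choice between a "-1" and "-2" set trades barcode count against the number of correctable errors described above.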
Indel-correcting DNA barcodes for high-throughput sequencing, John A. Hawkins, Stephen K. Jones, Ilya J. Finkelstein, William H. Press, Proceedings of the National Academy of Sciences Jul 2018, 115 (27) E6217-E6226; DOI: 10.1073/pnas.1802640115
https://github.com/finkelsteinlab/freebarcodes
processVCF(vcf = '/path/to/the.vcf',
nper = 14,
upstreamContextRange = 55,
downstreamContextRange = 55,
outPath = '/path/to/the/output.tsv',
fwprimer = 'ACTGGCCGCTTCACTG',
revprimer = 'AGATCGGAAGAGCGTCG',
alter_aberrant = TRUE,
extra_elements = FALSE,
max_construct_size = 170,
barcode_set = 'barcodes14-1',
ensure_all_4_nuc = TRUE)
Once you've performed your MPRA and have your sequencing results, check out malacoda for QC and statistical analysis of your results!
If you are interested in a subset of these features or have other feature requests, please let us know to inform our implementation prioritization. You can do so by opening an issue on this repository or contacting the first and corresponding authors of the publication, listed above.