
mpradesigntools

An R package for generating barcoded Massively Parallel Reporter Assay sequences

Publication

If you make use of this software, please cite the following publication:

Andrew R Ghazi, Edward S Chen, David M Henke, Namrata Madan, Leonard C Edelstein, Chad A Shaw; Design tools for MPRA experiments, Bioinformatics, Volume 34, Issue 15, 1 August 2018, Pages 2682–2683, https://doi.org/10.1093/bioinformatics/bty150

Installation

Dependencies

MPRA Design Tools depends on the Biostrings and BSgenome.Hsapiens.UCSC.hg38 packages from Bioconductor. First install these in R with the following commands:

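# On R >= 3.5, Bioconductor packages are installed with BiocManager (biocLite is deprecated)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Hsapiens.UCSC.hg38"))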

The package also makes use of some tidyverse packages which can be installed with the following commands:

install.packages(c('dplyr', 'magrittr', 'purrr', 'readr', 'stringr', 'tibble', 'tidyr', 'purrrlyr'))

Package Installation

If you don't have the devtools package installed, install it like so:

install.packages("devtools")

After that you can install and load MPRA Design Tools with these commands:

devtools::install_github('andrewGhazi/mpradesigntools')
library(mpradesigntools)

Use

This is the companion package to the MPRA Design Tools Shiny application available here: https://andrewghazi.shinyapps.io/designmpra/

The Shiny app lets users adjust MPRA parameters (such as the number of barcodes per allele) and see how each change affects the assay's power. Researchers can use this to decide which parameters best meet their experimental goals.

Currently, the main function of the MPRA Design Tools package is to design a set of barcoded sequences for MPRA experiments (without overloading our Shiny server!). This is done with the processVCF function. It takes roughly 5 seconds plus 10 ms per barcoded sequence on a relatively modern CPU, so you can estimate the expected job time in seconds as

5 + .01 * Number of barcodes per allele * Number of SNPs in VCF * 2 (for ref/alt alleles)
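The rule of thumb above can be wrapped in a small helper for planning purposes. This is only a sketch: `estimate_job_seconds` and its argument names are hypothetical and not part of mpradesigntools, and the constants are the rough figures quoted above, so actual run times will vary by machine.

```r
# Estimate processVCF run time in seconds from the rule of thumb above.
# estimate_job_seconds is a hypothetical helper, not part of mpradesigntools.
estimate_job_seconds <- function(n_snps, nper,
                                 overhead_s = 5, per_sequence_s = 0.01) {
  # Each SNP yields nper barcoded sequences for each of its two alleles (ref/alt)
  n_sequences <- nper * n_snps * 2
  overhead_s + per_sequence_s * n_sequences
}

# 1000 SNPs at 14 barcodes per allele gives 28,000 barcoded sequences
estimate_job_seconds(n_snps = 1000, nper = 14)
```

For this example the estimate comes to about 285 seconds, or just under five minutes.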

VCF Input constraints

Only the CHROM, POS, REF, and ALT columns are used to build the sequences. The INFO column is used only for detecting reverse strand constructs.

Current input constraints are:

VCFs generated by batch querying rsIDs on dbSNP should meet most of the formatting requirements. However, the MPRAREV tag will need to be added by the user (where appropriate) because the VCFs do not always specify which strand the relevant gene is on.
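As a rough illustration of adding the tag, the sketch below marks selected rows of a VCF that has already been read into a data frame. It rests on assumptions not confirmed by this README: that MPRAREV is written as a bare flag in the semicolon-separated INFO field, and that the column is named `INFO`. `add_mprarev` is a hypothetical helper, not a function in mpradesigntools, and the example variants are made up.

```r
# add_mprarev is a hypothetical helper, not part of mpradesigntools.
# ASSUMPTION: MPRAREV is a bare flag in the semicolon-separated INFO field.
add_mprarev <- function(vcf, reverse_rows) {
  info <- vcf$INFO[reverse_rows]
  has_tag <- grepl("(^|;)MPRAREV(;|$)", info)        # already tagged?
  empty <- is.na(info) | info %in% c("", ".")       # no existing INFO content
  info[!has_tag] <- ifelse(empty[!has_tag],
                           "MPRAREV",
                           paste0(info[!has_tag], ";MPRAREV"))
  vcf$INFO[reverse_rows] <- info
  vcf
}

# Toy data frame with made-up variants; a real VCF has more columns
vcf <- data.frame(
  CHROM = c("1", "2"),
  POS = c(1000L, 2000L),
  ID = c("rs0000001", "rs0000002"),
  REF = c("A", "G"),
  ALT = c("C", "T"),
  INFO = c("RS=1", "."),
  stringsAsFactors = FALSE
)

# Mark the second variant as a reverse strand construct
tagged <- add_mprarev(vcf, reverse_rows = 2)
tagged$INFO
```

Calling the helper again on rows that are already tagged leaves them unchanged, so it is safe to re-run while curating the list of reverse strand variants.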

Indel-correcting barcodes

9/17/18 - Feature under development

Alternative barcode sets may be used by setting the barcode_set argument of processVCF to one of the following values. The first number indicates the length of the barcodes in base pairs; the second indicates the number of errors that can be corrected while still identifying the original barcode. Note that these barcodes CAN include miR seed sequences. If you want to avoid miR interference, identify the main miRs by abundance in your cell type of interest, then include their seed sequences in the filterPatterns argument. These barcodes are provided by the freebarcodes package, detailed in the publication below and available from the GitHub repository listed after it.

The original barcode set provided with mpradesigntools is available as the twelvemers barcode set.

|barcode_set  | n_barcodes|
|:------------|----------:|
|barcodes10-1 |       1902|
|barcodes10-2 |         30|
|barcodes11-1 |       6160|
|barcodes11-2 |         74|
|barcodes12-1 |      17213|
|barcodes12-2 |        178|
|barcodes13-1 |      56735|
|barcodes13-2 |        467|
|barcodes14-1 |     157196|
|barcodes14-2 |       1155|
|barcodes15-1 |     518508|
|barcodes15-2 |       3182|
|barcodes16-1 |    1636417|
|barcodes16-2 |       8776|
|barcodes17-2 |      23024|
|barcodes3-1  |          1|
|barcodes4-1  |          2|
|barcodes5-1  |          9|
|barcodes5-2  |          1|
|barcodes6-1  |         26|
|barcodes6-2  |          1|
|barcodes7-1  |         66|
|barcodes7-2  |          3|
|barcodes8-1  |        212|
|barcodes8-2  |          6|
|barcodes9-1  |        553|
|barcodes9-2  |         11|
|twelvemers   |    1140292|

Indel-correcting DNA barcodes for high-throughput sequencing, John A. Hawkins, Stephen K. Jones, Ilya J. Finkelstein, William H. Press, Proceedings of the National Academy of Sciences Jul 2018, 115 (27) E6217-E6226; DOI: 10.1073/pnas.1802640115

https://github.com/finkelsteinlab/freebarcodes
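When picking a barcode set, it can help to check which sets are large enough for a given design. The sketch below copies a subset of the sizes from the table above and assumes that every barcoded sequence needs its own barcode, so a design needs (barcodes per allele) × (SNPs) × 2 barcodes. `sets_large_enough` is a hypothetical helper, not part of mpradesigntools.

```r
# Sizes copied from the barcode set table above (larger sets only, for brevity)
barcode_set_sizes <- c(
  "barcodes12-1" = 17213, "barcodes13-1" = 56735, "barcodes14-1" = 157196,
  "barcodes15-1" = 518508, "barcodes16-1" = 1636417,
  "barcodes14-2" = 1155, "barcodes15-2" = 3182, "barcodes16-2" = 8776,
  "barcodes17-2" = 23024, "twelvemers" = 1140292
)

# sets_large_enough is a hypothetical helper, not part of mpradesigntools.
# ASSUMPTION: each barcoded sequence uses a distinct barcode, so the design
# needs nper * n_snps * 2 barcodes in total.
sets_large_enough <- function(n_snps, nper, sizes = barcode_set_sizes) {
  needed <- nper * n_snps * 2
  names(sizes)[sizes >= needed]
}

# 1000 SNPs at 14 barcodes per allele needs 28,000 barcodes
sets_large_enough(n_snps = 1000, nper = 14)
```

Any barcode filtering (for example, patterns passed to filterPatterns) would reduce the usable count, so the table sizes are best treated as upper bounds.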

Example

processVCF(vcf = '/path/to/the.vcf',
           nper = 14,
           upstreamContextRange = 55,
           downstreamContextRange = 55,
           outPath = '/path/to/the/output.tsv',
           fwprimer = 'ACTGGCCGCTTCACTG',
           revprimer = 'AGATCGGAAGAGCGTCG',
           alter_aberrant = TRUE,
           extra_elements = FALSE,
           max_construct_size = 170,
           barcode_set = 'barcodes14-1',
           ensure_all_4_nuc = TRUE)

Downstream analysis

Once you've performed your MPRA and have your sequencing results, check out malacoda for QC and statistical analysis of your results!

Planned Features

If you are interested in a subset of these features or have other feature requests, please let us know to inform our implementation prioritization. You can do so by opening an issue on this repository or contacting the first and corresponding authors of the publication, listed above.



andrewGhazi/mpradesigntools documentation built on Dec. 21, 2020, 3:18 p.m.