Haplotype-Assisted Read Parsing
This script performs allele-specific alignment of a set of reads to two reference genomes. It outputs sam files for each genome, which can be used in many downstream analyses.
The script in this package takes a set of single/paired-end fastq files, aligns them two reference genomes using bowtie2, and parses each read based on which genome they best align to. If a read aligns to both genomes, but not equally-well, the alignment to both genomes must be unique (have a high alignment score for both genomes) in order to avoid mismappings due to SNPs.
This was originally written to perform allele-specific alignment of repli-seq data in castaneus-musculus hybrid mouse cells. It has also been used for parsing Hi-C, RNA-seq, and ChIP-seq data. This R package contains the implementation of the read-parsing algorithm used in the manuscript:
Allele-specific control of replication timing and genome organization during development. Juan Carlos Rivera-Mulia, Andrew Dimond, Daniel Vera, Claudia Trevilla-Garcia, Takayo Sasaki, Jared Zimmerman, Catherine Dupont, Joost Gribnau, Peter Fraser and David M. Gilbert
This software was written by Daniel Vera (vera@genomics.fsu.edu)
This software has only been tested on centos7 and ubuntu trusty, but is expected to work on most modern linux-based systems with the following software installed and in your $PATH: - bowtie2 - samtools >1.3 - gawk - GNU coreutils - R >3
And the following R packages should be installed: - devtools >1.13 (R package)
# in R:
devtools::install_github("dvera/harp")
# make bowtie2 indices for each haplotype, assuming you have a fasta file for each haplotype, where each differs only by SNPs:
mkdir bowtie2index && cd bowtie2index
bowtie2-build /path/to/genome1.fa genome1
bowtie2-build /path/to/genome2.fa genome2
# navigate to a directory with your fastq files
cd /path/to/fastqFiles
# open R
R
in R:
library(harp)
# define a vector of fastq files to parse
f <- files("*.fastq")
ref1 <- "/path/to/bowtie2index/genome1"
ref2 <- "/path/to/bowtie2index/genome2"
harp( f, index1prefix=ref1, index2prefix=ref2 )
The script will generate a series of files for each input fastq file: - _unmapped.sam (did not map well to either genome) - _parsed_genome1.sam (parsed to genome1) - _parsed_genome2.sam (parsed to genome2) - _ambiguous.sam (mapped well to at least one genome, but could not be confidently parsed)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.