In this vignette Users learn how to create IUPAC fasta format
(IUPAC code) files from reference mapped bam
files which can be used with distIUPAC to calculate IUPAC distances
.
The following external tools needs to be installed to be able to obtain IUPAC fasta format files:
Genome mapper
of your choice (e.g. bwa mem or NextGenMap)Picard tools
for bam
file sorting and de-duplication (picard tools)angsd
for IUPAC fasta format
retrieval (angsd)A manual how the pre-requisites needs to be installed is given at the end of this vignette.
fastq
files to a reference
BWA
for mapping:The user needs to perform five steps:
fastq
files, preferentially QC
pre-processed (see trimmomatic for one of many trimming tools available)fasta
filebuild index
for the referencemap reads
sort reads
#FORWARD FASTQ FILE: 1.fq #REVERSE FASTQ FILE: 2.fq #REFERENCE: ref.fasta #USED THREADS: 12 #SAMPLE NAME: IND1 #SAM OUTPUT FILE: ind1.sam #build index bwa index ref.fasta #map reads bwa mem -t 12 -M -R @RG\tID:IND1\tLB:lib1\tPL:ILLUMINA\tPU:unit1\tSM:IND1 -o ind1.sam ref.fasta 1.fq 2.fq #sort sam file java -jar picard.jar SortSam I=ind1.sam O=ind1.sorted.bam SO=coordinate
NextGenMap
for mappingThe usere needs to perform four steps:
fastq
files, preferentially QC
pre-processed (see trimmomatic for one of many trimming tools available)fasta
filemap reads
sort reads
#FORWARD FASTQ FILE: 1.fq #REVERSE FASTQ FILE: 2.fq #REFERENCE: ref.fasta #USED THREADS: 12 #SAMPLE NAME: IND1 #SAM OUTPUT FILE: ind1.bam #map reads ngm -1 1.fq -2 2.fq -r ref.fasta -o ind1.bam --no-unal --sensitive -t 12 --no-progress --rg-id IND1 --rg-sm IND1 --rg-lb lib1 --rg-pl illumina --rg-pu unit1 -b #sort bam file java -jar picard.jar SortSam I=ind1.bam O=ind1.sorted.bam SO=coordinate
sorted bam
filesIn case that multiple fastq
libraries exists or have been mapped without prior merging of the fastq
files, the User can merge reference mapped bam
files as follows:
java -jar picard.jar MergeSamFiles I=ind1_1.sorted.bam I=ind1_2.sorted.bam O=ind1.sorted.bam
sorted bam
filejava -jar picard.jar RemoveDuplicates REMOVE_DUPLICATES=true I=ind1.sorted.bam O=ind1.sorted.nodup.bam M=ind1.sorted.bam.duplicate.metrics
IUPAC fasta
file for the de-duplicated referenced mapped bam
fileChromosomes should be processed separately to be able to merge different samples into chromosome alignments
, which can be processed with distIUPAC
as follows:
#chromosomes should be processed separately to be able to easily merge different samples into one alignment to be processed with 'distIUPAC' #chromosome 'chr1' will be processed here samtools index ind1.sorted.nodup.bam angsd -doFasta 4 -doCounts 1 -minQ 20 -minMapQ 30 -uniqueOnly -setMinDepth 5 -setMaxDepth 100 -iupacRatio 0.2 -i ind1.sorted.nodup.bam -out IND1.minQ20.minMapQ30.uniqueOnly.setMinDepth5.setMaxDepth100.chr1 -r chr1
Assuming all IUPAC fasta
files are located in the same folder and same chromosome files have the same ending, individuals can be merged as follows:
#chromosome 'chr1' from different individuals will be processed here #1. uncompress fasta files for file in *.chr1.fa.gz;do gunzip $file;done #2. rename fasta sequences according to file names for file in *.chr1.fa;do sed -i 's/>chr1/>'"$file"'/g' $file;done #3. merge fasta files for file in *.chr1.fa;do cat $file >> chr1.fa;done
IUPAC distances
for 'chr1' with distIUPAC
The following commands needs to be executed in R
:
library(distIUPAC) dna<-readDNAStringSet("chr1.fa") chr1.dist<-distIUPAC(as.character(subseq(dna,1,10000)))
Short description of how to compile the external tools for a unix based system are given. See the next section for MAC OS X installation.
#download latest release from NextGenMap wget https://codeload.github.com/Cibiv/NextGenMap/tar.gz/NextGenMap-0.5.5.tar.gz tar -xvf v0.5.5.tar.gz rm v0.5.5.tar.gz cd NextGenMap-0.5.5/ mkdir -p build/ cd build/ cmake .. make
#download latest release 'picard.jar' from broadinstitute wget https://github.com/broadinstitute/picard/releases/download/2.17.4/picard.jar
#download latest release ANGSD from https://github.com/ANGSD/angsd git clone https://github.com/samtools/htslib.git; git clone https://github.com/angsd/angsd.git; cd htslib;make; cd ../angsd; make HTSSRC=../htslib
For a MAC OS X system there are additional pre-requisites that needs to be installed to be able to compile all necessary software.
'Xcode' needs to be installed from App-Store
'Homebrew' needs to be installed https://brew.sh/index_de.html
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)
'Git' needs to be installed https://git-scm.com/download/mac
'Autoconf' needs to be installed http://www.gnu.org/software/autoconf/autoconf.html
brew install autoconf
'Cmake' needs to be installed https://cmake.org/download/
sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install
#download latest release from NextGenMap curl -L https://github.com/Cibiv/NextGenMap/archive/v0.5.5.tar.gz > v0.5.5.tar.gz tar -xvf v0.5.5.tar.gz rm v0.5.5.tar.gz cd NextGenMap-0.5.5/ mkdir -p build/ cd build/ cmake .. make
#download latest release 'picard.jar' from broadinstitute curl -L https://github.com/broadinstitute/picard/releases/download/2.17.4/picard.jar > picard.jar
#download latest release ANGSD from https://github.com/ANGSD/angsd git clone https://github.com/samtools/htslib.git; git clone https://github.com/angsd/angsd.git; cd htslib;/usr/local/Cellar/autoconf/2.69/bin/autoconf;/usr/local/Cellar/autoconf/2.69/bin/autoheader;./configure --disable-lzma;make; cd ../angsd; make HTSSRC=../htslib
Korneliussen TS, Albrechtsen A and Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics (2014) 15:356 https://doi.org/10.1186/s12859-014-0356-4
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (2009) 25:1754 https://doi.org/10.1093/bioinformatics/btp324
Picard tools http://broadinstitute.github.io/picard
Sedlazeck FJ, Rescheneder P, von Haeseler A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics (2013) 21:2790 https://doi.org/10.1093/bioinformatics/btt468
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.