<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<link href='https://fonts.googleapis.com/css?family=Architects+Daughter' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen" />
<link rel="stylesheet" type="text/css" href="stylesheets/pygment_trac.css" media="screen" />
<link rel="stylesheet" type="text/css" href="stylesheets/print.css" media="print" />
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<title>mapDamage by the Orlando Group</title>
</head>
<body>
<header>
<div class="inner">
<h1>mapDamage</h1>
<h2>mapDamage: tracking and quantifying damage patterns in ancient DNA sequences</h2>
<a href="https://github.com/ginolhac/mapDamage" class="button"><small>View project on</small>GitHub</a>
</div>
</header>
<div id="content-wrapper">
<div class="inner clearfix">
<section id="main-content">
<h2><a name="content">Important</a></h2>
<p>
Users with versions dating prior to June 12 2013 please update.
A nasty bug that caused the statistical part of mapDamage to use
half of the data for estimation of the damage parameters, sorry for the inconvenience.
</p>
<h2><a name="content">Contents</a></h2>
<li><a href="#a1">Introduction</a></li>
<li><a href="#a2">News in version 2.0</a></li>
<li><a href="#a3">Inputs</a></li>
<li><a href="#a4">Output files</a></li>
<li><a href="#a5">Examples and datasets</a></li>
<li><a href="#a6">Usage</a></li>
<li><a href="#a6.5">Web interface (insideDNA.me)</a></li>
<li><a href="#a7">Description of tables</a></li>
<li><a href="#a8">Graphical outputs</a></li>
<li><a href="#a9">Citation</a></li>
<li><a href="#a10">FAQ</a></li>
<li><a href="#a11">Contact</a></li>
<hr>
<h2><a name="a1">Introduction</a></h2>
<p><a href="http://geogenetics.ku.dk/publications/mapdamage/">mapDamage2</a>
is a computational framework written in Python and R, which tracks and quantifies
DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms. </p>
<p>mapDamage is developed at the <a href="http://geogenetics.ku.dk/">Centre for GeoGenetics</a>
by the <a href="http://geogenetics.ku.dk/research/research_groups/palaeomix_group/"> Orlando Group</a>.</p>
<h3>Requirements</h3>
<ul>
<li>Python (version >= 2.6)</li>
<li>
<a href="http://git-scm.com/">Git</a> </li>
<li>
<a>R</a> (version >= 2.15.1) must be present in your $PATH. Otherwise, only tables will be produced.</li>
<li>
<a href="http://code.google.com/p/pysam/">pysam</a> (version >= 0.6) python package, an interface for reading/writing SAM/BAM files</li>
<li>
<a href="http://www.cython.org">Cython</a> package, only necessary if pysam needs to be built</li>
<li>
<a href="http://www.gnu.org/software/gsl/">gsl</a> GNU scientific library (GSL) </li>
<li>R libraries:
<ul style="list-style: square">
<li><a href="https://cran.r-project.org/web/packages/inline/index.html">inline</a></li>
<li><a href="https://cran.r-project.org/web/packages/gam/index.html">gam</a></li>
<li><a href="https://cran.r-project.org/web/packages/Rcpp/index.html">Rcpp</a></li>
<li><a href="https://cran.r-project.org/web/packages/RcppGSL/index.html">RcppGSL</a></li>
<li><a href="https://cran.r-project.org/web/packages/ggplot2/index.html">ggplot2</a> (>=0.9.2)</li>
</ul>
</li>
</ul><p>Of note, the R libraries and the GSL library are only required for the statistical estimation of DNA damage
parameters. A stripped-down version of <i>seqtk</i> is included in this package for fast computation of reference base composition, it is written by Heng Li
and can be found at <a href="https://github.com/lh3/seqtk">here at github.</a></p>
<h3>Installation</h3>
<p>mapDamage2 was successfully tested on GNU/Linux and MacOSX environments. For MacOSX, you can follow the <a href="mac_install.html">following guidelines</a> written by mapDamage users.</p>
<ol>
<li>
<u>Install git:</u><br>
Follow the instructions available at <a href="http://git-scm.com/downloads">git</a>
(with installers/packages available for all major operating systems).</li><br>
</li>
<li><u>First time installation for mapDamage:</u><br>
Either clone the repository from GitHub<br>
<code>git clone https://github.com/ginolhac/mapDamage.git</code> or <br>
download a tarball/zip archives for a specific <a href="https://github.com/ginolhac/mapDamage/releases">release.</a><br>
go into the resulting folder<br>
<code>cd mapDamage</code><br>
<p>Continue the procedure by running the following command if you have administrator rights<br>
<code>sudo python setup.py install</code><br>
otherwise install it locally<br><code>python setup.py install --user</code><br>
then, <code>$HOME/.local/bin</code> must be in your PATH.</p>
</li>
<li><u>Install pysam:</u><br>
Download the tarball archive for <a href="http://code.google.com/p/pysam/downloads/list">pysam</a>.
Untar the downloaded tar file and follow instructions in the pysam-XXX/INSTALL text file. <a href="http://www.cython.org">Cython</a> is necessary if pysam needs to be built.<br>It is recommend to check if the pysam installation seems okay before proceeding with the following command<br><code>python -c "import pysam"</code><br>
If nothing appears, then proceed.</p></li>
<li><u>Install R:</u><br>
Follow the instructions available at <a href="http://www.r-project.org/">R project</a>
(with installers/packages available for all major operating systems).</li><br>
<li><u>Install the R packages:</u><br>
By running in a R console and selecting a CRAN mirror:<br><code>install.packages("inline")</code><br><code>install.packages("gam")</code><br><code>install.packages("Rcpp")</code><br><code>install.packages("ggplot2")</code></li><br>
<li><u>Install the GSL library <a href="http://www.gnu.org/software/gsl/">gsl</a>:</u><br>
For Debian-based distros use the following command.<br><code>sudo apt-get install libgsl0-dev</code><br>
There should be a equivalent package for Mac using fink or macports.</li><br>
<li><u>Install RcppGSL:</u><br>
In a R session do the following<br><code>install.packages("RcppGSL")</code><br>
If the installation of RcppGSL ended with an unsuccessfully then, download the tarball at <a href="http://cran.r-project.org/web/packages/RcppGSL/index.html">RcppGSL</a> and install it with:<br><code>install.packages("RcppGSL_0.2.2.tar.gz", repo = NULL)</code><br>
Another alternative for downloading this package is to use the <a href="http://dirk.eddelbuettel.com/code/rcpp/">author's mirror archive</a></li>
</ul>
<p>Note that if steps 4-6 fail for some reason, then it is possible to utilize mapDamage without
the statistical function.</p>
</ol>
<h3>Update mapDamage</h3>
When mapDamage is already git-installed, you can get the latest version using git pull.
<li><u>Pull from the git repository:</u><br>
go into the git folder<br><code>cd /path/to/mapDamage</code><br>
Check updates at GitHub and download them if any<br><code>git pull</code><br>
If there is no updates, the command returns: <code>Already up-to-date.</code><br>
If the last line looks like: <code>4 files changed, 49 insertions(+), 35 deletions(-)</code><br>
you can proceed with the next step.
</li>
<li><u>Install downloaded newer version:</u><br>
<p>Install the updated version if needed, by running the following command if you have administrator rights<br>
<code>sudo python setup.py install</code><br>
otherwise install it locally<br><code>python setup.py install --user</code><br>
</li>
<hr><h2><a name="a2">News in version 2.0</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<ul>
<li>The main new feature is the <b>approximate bayesian estimation of damage parameters</b>.<br>
The idea behind the summary approach in mapdamage 2.0 is depicted in this figure:<br><br>
<a href="images/jumpfig.png"><img src="images/jumpfig_small.png"></a>
</li><br>
<li>The following four key damage parameters are estimated in Bayesian manner:
<ol>
<li><b>Lambda</b>, the probability of termating an overhang.</li>
<li><b>DeltaD</b>, the cytosine deamination probability in <i>double</i> strand context.</li>
<li><b>DeltaS</b>, the cytosine deamination probability in <i>single</i> strand context.</li>
<li><b>Theta</b>, the mean difference rate between the reference and the sequenced sample not due to DNA damage.</li>
</ol>
</li>
<li>The <code>--rescale</code> parameter can be optionally used to rescale quality scores of likely damaged positions in the reads. A new BAM file is constructed by downscaling quality values for misincorporations likely due to ancient DNA damage according to their initial qualities, position in reads and damage patterns.</li>
<li>The program was rewritten in python to make it simpler and the dependency on bedtools was replaced by the use of pysam. The memory footprint is now negligible and runtime reduced by <i>ca.</i> 25%.</li>
<li>It is now possible to filter out nucleotides with qualities inferior to a threshold defined by users,
allowing users to reduce the impact of sequencing errors when counting misincorporations.</li>
</ul>
<h2><a name="a3">Inputs</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<p>Two files are needed:</p>
<ul>
<li>A valid SAM/BAM file with a correct header, as argument to the <code>-i</code> option.</li>
<li>A FASTA file that contains reference sequences used for mapping reads, as argument to the <code>-r</code> option. </li>
</ul><p>References described in the SAM/BAM header and the FASTA file must be coherent, <em>i.e</em>,
the references must have identical names and lengths. Extra sequences present in the FASTA header
raise a warning but the program will proceed since all necessary references are available.</p>
<p>As an alternative, one can run only the plotting, statistic estimations or rescaling
on an already processed dataset. Use a combination of <code>-d</code> option followed by
a valid folder and the <code>--plot-only</code>, <code>--stats-only</code> or <code>--rescale-only</code> options.</p>
<h3><a name="PE">Paired ends</a></h3>
<p>
We assume the pairs are facing inwards when counting the position specific misincorporations and rescaling quality scores.
An additional assumption in the rescaling process is the pairs are non-overlapping.
Make sure the pairing information in the BAM/SAM file is correct as we rely on the paired end information provided by the file.
We advise not using non-overlapping paired ends coming from highly degraded samples (small template lengths) as they are
likely contamination.
</p>
<h2><a name="a4">Output files</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<p>When all options are activated, 16 files are produced in the result folder.</p>
<ul>
<li><code>Runtime_log.txt</code>, log file with a summary of command lines used and timestamps.</li>
</ul><p>For the plotting:</p>
<ul>
<li><a href="images/E522_Fragmisincorporation_plot.png"><code>Fragmisincorporation_plot.pdf</code></a>, a pdf file that displays both fragmentation and misincorporation patterns.</li>
<li><a href="images/E522_Length_plot.png"><code>Length_plot.pdf</code></a>, a pdf file that displays length distribution of singleton reads per strand and cumulative
frequencies of C->T at 5'-end and G->A at 3'-end are also displayed per strand. <br><b>Important remark</b>: paired-ended reads are not considered for this analysis since the plot displays the distribution of read lengths and not insert sizes of paired-ended reads. For this last purpose, one can use the dedicated Picard tool <a href="http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSizeMetrics">CollectInsertSizeMetrics</a>.</li>
<li><code>misincorporation.txt</code>, contains a table with occurrences for each type of mutations and relative positions from the reads ends.</li>
<li><code>5pCtoT_freq.txt</code>, contains frequencies of Cytosine to Thymine mutations per position from the 5'-ends.</li>
<li><code>3pGtoA_freq.txt</code>, contains frequencies of Guanine to Adenine mutations per position from the 3'-ends.</li>
<li><code>dnacomp.txt</code>, contains a table of the reference genome base composition per position, inside reads and adjacent regions.</li>
<li><code>lgdistribution.txt</code>, contains a table with read length distributions per strand.</li>
</ul><p>For the statistical estimation:</p>
<ul>
<li><a href="images/Stats_out_MCMC_hist.png"><code>Stats_out_MCMC_hist.pdf</code></a>, MCMC histogram for the damage parameters and log likelihood.</li>
<li><code>Stats_out_MCMC_iter.csv</code>, values for the damage parameters and log likelihood in each MCMC iteration.</li>
<li><a href="images/Stats_out_MCMC_trace.png"><code>Stats_out_MCMC_trace.pdf</code></a>, a MCMC trace plot for the damage parameters and log likelihood. </li>
<li><code>Stats_out_MCMC_iter_summ_stat.csv</code>, summary statistics for the damage parameters estimated posterior distributions.</li>
<li><a href="images/Stats_out_MCMC_post_pred.png"><code>Stats_out_post_pred.pdf</code></a>, empirical misincorporation frequency and posterior predictive intervals from the fitted model.</li>
<li><code>Stats_out_MCMC_correct_prob.csv</code>, position specific probability of a C->T and G->A misincorporation is due to damage.</li>
<li><code>dnacomp_genome.txt</code>, contains the global reference genome base composition (computed by <a href="https://github.com/lh3/seqtk">seqtk</a>).</li>
<li>Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores. </li>
</ul>
<h2><a name="a5">Examples and datasets</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<p>A simple command line that would process the entire BAM file with plotting and statistic estimation is:</p>
<pre><code>mapDamage -i mymap.bam -r myreference.fasta
</code></pre>
<p>To run the plotting part with a new scale to fit lower levels of DNA damages, 0.1 as y-limit instead of 0.3 by default:</p>
<pre><code> mapDamage -d results_mydata -y 0.1 --plot-only
</code></pre>
<p>Running the rescaling of the quality scores taking into account the damage estimates:</p>
<pre><code> mapDamage -i mymap.bam -r myreference --rescale
</code></pre>
<p>To run the statistical estimation using only the 5'-ends and with verbose output:</p>
<pre><code> mapDamage -d results_mydata --forward --stats-only -v
</code></pre>
<p>For further examples and usage introduction see the supplementary material chapter in the
mapDamage2.0 manuscript.</p>
<h2><a name="a6">Usage</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<pre><code>Usage: mapDamage [options] -i BAMfile -r reference.fasta
Use option -h or --help for help
Options:
--version show program's version number and exit
-h, --help show this help message and exit
Input files:
-i FILENAME, --input=FILENAME
SAM/BAM file, must contain a valid header, use '-' for reading a BAM from stdin
-r REF, --reference=REF
Reference file in FASTA format
General options:
-n DOWNSAMPLE, --downsample=DOWNSAMPLE
Downsample to a randomly selected fraction of the reads (if 0 < DOWNSAMPLE < 1),
or a fixed number of randomly selected reads (if DOWNSAMPLE >= 1). By default,
no downsampling is performed.
--downsample-seed Seed value to use for downsampling. See documentation for py module 'random' for default behavior.
-l LENGTH, --length=LENGTH
read length, in nucleotides to consider [70]
-a AROUND, --around=AROUND
nucleotides to retrieve before/after reads [10]
-Q MINQUAL, --min-basequal=MINQUAL
minimun base quality Phred score considered, Phred-33 assumed [0]
-d FOLDER, --folder=FOLDER
folder name to store results [results_FILENAME]
-f, --fasta Write alignments in a FASTA file
--plot-only Run only plotting from a valid result folder
-q, --quiet Disable any output to stdout
-v, --verbose Display progression information during parsing
Options for graphics:
-y YMAX, --ymax=YMAX
graphical y-axis limit for nucleotide misincorporation frequencies [0.3]
-m READPLOT, --readplot=READPLOT
read length, in nucleotides, considered for plotting nucleotide
misincorporations [25]
-b REFPLOT, --refplot=REFPLOT
the number of reference nucleotides to consider for ploting base
composition in the region located upstream and downstream of
every read [10]
-t TITLE, --title=TITLE
title used for both graph and filename [plot]
Options for the statistical estimation:
--rand=RAND Number of random starting points for the likelihood optimization [30]
--burn=BURN Number of burnin iterations [10000]
--adjust=ADJUST Number of adjust proposal variance parameters iterations [10]
--iter=ITER Number of final MCMC iterations [50000]
--forward=FORWARD Using only the 5' end of the seqs [False]
--reverse=REVERSE Using only the 3' end of the seqs [False]
--fix-disp=FIX_DISP Fix dispersion in the overhangs [True]
--diff-hangs The overhangs are different for 5' and 3' [False]
--fix-nicks=FIX_NICKS
Fix the nick frequency vector nu else estimate it with GAM [False]
--jukes-cantor Use Jukes Cantor instead of HKY85
--single_stranded=SINGLE_STRANDED
Single stranded protocol [False]
--seq-length=SEQ_LENGTH
How long sequence to use from each side [12]
--stats-only Run only statistical estimation from a valid result folder
--rescale Rescale the quality scores in the BAM file using the output from
the statistical estimation
--rescale-only Run only rescaling from a valid result folder
--no-stats Disabled statistical estimation, active by default
</code></pre>
<h2><a name="a6.5"> Web interface</a></h2>
Anna Kostikova from <a href="https://insidedna.me">insideDNA</a> implemented a web interface for mapDamage.</br>
Users can adjust the cloud ressources in terms of RAM/CPU to perform their analysis.
She wrote a <a href = "https://insidedna.me/tutorials/view/Analysis-ancient-DNA-samples-using-mapDamage">tutorial</a> explaining how to use this
<a href="https://insidedna.me/app#/tools/100648/">web interface (sign up required).</a>
<h2><a name="a7">Description of tables</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<h3>misincorporation table</h3>
<p>This file looks like:</p>
<pre><code># table produced by mapDamage version 2.0
# using mapped file hits_sort_mts.bam and NC_012920.fasta as reference file
# Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads
Chr End Std Pos A C G T Total G>A C>T A>G T>C A>C A>T C>G C>A T>G T>A G>C G>T A>- T>- C>- G>- ->A ->T ->C ->G S
NC_012920.1 3p + 1 424 579 201 424 1628 48 16 3 5 1 0 2 1 0 2 1 0 0 0 0 0 4 4 1 1 0
NC_012920.1 3p + 2 466 535 213 404 1618 30 12 2 1 0 1 0 0 3 3 1 0 0 0 0 0 2 6 5 7 0
NC_012920.1 3p + 3 514 463 223 417 1617 35 15 3 1 1 0 0 0 1 0 1 1 0 0 0 2 9 4 4 4 0
NC_012920.1 3p + 4 514 483 221 400 1618 30 11 3 4 0 1 1 0 0 1 0 1 1 0 0 2 5 5 8 2 0
</code></pre>
<p>The first lines that start by a hash contain information about the options used while processing the data.
Then, the table contains occurrences of the 12 mutation type + 4 deletions + 4 insertions +
soft-clipping for each combination of reference (<code>Chr</code> column), strand (<code>Std</code> column) , end (<code>End</code> column) and position (<code>Pos</code> column).
In Z->X, Z comes from the reference, X is the nucleotide from reads. The columns A, C, G and T are the respective reference base count
and Total is the sum of the four reference nucleotides. Frequencies depicted in <a href="#Fragmisincorporation_plot">Fragmisincorporation_plot.pdf</a> file are computed per
position as follows</p>
<p>#occurrences of mutations / #occurrences of the reference nucleotide </p>
<p>in order to compensate for reference base composition bias along the positions.</p>
<p>In this example, for G>A substitutions at the first base from the 3'-ends and for the positive strand:
<code>48 / 201 = 0.238806</code></p>
<p>The <a href="#Fragmisincorporation_plot">Fragmisincorporation</a> plot sums up misincorporations for all references and strand orientations, finally
displays the frequencies per sequencing position. </p>
<p><b>5p</b> means reads analyzed from their 5'-ends and are depicted at the bottom left.</p>
<p><b>3p</b> means reads analyzed from their 3'-ends and are depicted at the bottom right.</p>
<p><b>Remark concerning soft-clipping</b>:</p>
<p>Soft-clipped bases are bases located at read extremities that are NOT aligned to the reference.
Those bases are not taken into account and alignments are computed on aligned bases.
However, the soft-clipped bases are recorded and displayed as an orange line as a diagnostic for users.
Even if, the positions for those bases are different from positions of all misincorporations, they should warn
users that reads should be trimmed if soft-clipped base frequencies is high.</p>
<h3>dnacomp table</h3>
<p>Example table below:</p>
<pre><code># table produced by mapDamage version 2.0
# using mapped file hits_sort_mts.bam and NC_012920.fasta as reference file
# Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads
Chr End Std Pos A C G T Total
NC_012920.1 3p + -70 113 77 43 181 414
NC_012920.1 3p + -69 178 140 97 233 648
NC_012920.1 3p + -68 230 145 98 219 692
NC_012920.1 3p + -67 217 157 76 272 722
NC_012920.1 3p + -66 224 188 88 239 739
NC_012920.1 3p + -65 244 165 104 262 775
NC_012920.1 3p + -64 223 185 102 288 798
</code></pre>
<p>The first lines that start by a hash contain information about the options used while processing the data.
Then, the table contains occurrences for each nucleotides, per reference (<code>Chr</code> column),
per strand (<code>Std</code> column) , per end (<code>End</code> column) and per position (<code>Pos</code> column).</p>
<p>At a given position, base composition is recorded either for the reference or for reads depending
on the position and direction. The following table details which sequences and coordinates are
used by default:</p>
<table>
<tr>
<th align="center">Positions</th>
<th align="center">5p</th>
<th align="center">3p</th>
</tr>
<tr>
<td align="center">negative values</td>
<td align="center">reference, -10 to -1</td>
<td align="center">reads, -25 to -1</td>
</tr>
<tr>
<td align="center">positive values</td>
<td align="center">reads, +1 to +25</td>
<td align="center">reference, +1 to +10</td>
</tr>
</table>
<br>
<h3>Stats_out_MCMC_iter_summ_stat table</h3>
<p>This table contains the MCMC output of the five model parameters (depending on options) and log likelihood. The mean, standard deviation and quantiles per 2.5% interval
of the MCMC posterior simulations are reported. The
acceptance ratio (ratio of acceptances in the MCMC iterations) for each model parameter is calculated and depicted in the file and should be around 0.2. The companion file <code>Stats_out_MCMC_iter.csv</code> contains all the values obtained for each MCMC iteration and is available in graphical format as a <a href="#Stats_out_MCMC_hist.png">histogram</a> or a <a href="#Stats_out_MCMC_trace.png">traceplot</a>.
Example table below (the file is actually comma separated, here depicted with tabulations for clarity purposes):</p>
<pre><code>
Theta DeltaD DeltaS Lambda Rho LogLik
Mean 0.0129715432214025 0.0276128283764782 0.506870702430984 0.236778993065734 0.0429440323801133 -290.436006736216
Std. 0.000727998242592221 0.0016991302193519 0.0186911963483404 0.0112842920128649 0.0061180619558739 1.59627579978557
Acceptance ratio 0.1659 0.28692 0.17708 0.17222 0.21354 0.6781
0% 0.010343780887111 0.0200009808935262 0.443618434348717 0.196608727957609 0.024062021508867 -303.394224826161
2.5% 0.0115779148051722 0.0242216196897948 0.471593745448078 0.215033292540976 0.0316283681609979 -294.415675536748
5% 0.0117929060844704 0.0247964028136441 0.476538261177338 0.218032838581406 0.0334894928201135 -293.50800575224
7.5% 0.0119301611221827 0.0251693761425478 0.480224681410536 0.220351157627141 0.0345977332183658 -292.980765571483
10% 0.0120425513929948 0.0254336747705529 0.48309367716994 0.222245004085287 0.0354838414161916 -292.586815783461
12.5% 0.0121442465603967 0.0256581391709088 0.485339023472775 0.223852846490496 0.0361775294966489 -292.277395952081
15%...
.....
</code></pre>
<br>
<h3>Stats_out_MCMC_correct_prob table</h3>
<p>This table contains the position-specific probability of observing a C->T or a G->A due to a <i>post-mortem</i> damage.<br>
Example table below (the file is actually comma separated, here depicted with tabulations for clarity purposes):</p>
<pre><code>
Position C.T G.A
1 1 0.980784246710695 0.65527195218869
2 2 0.97568346676245 0.71352845563181
3 3 0.969521194936301 0.743638120957285
4 4 0.962170583850572 0.766805824762406
5 5 0.953542075730188 0.787010580221663
6 6 0.943631337701685 0.803957830256353
7 7 0.932671721032529 0.81529717262592
8 8 0.921048902653476 0.82189320180069
9 9 0.909252012734666 0.825047159166101
10...
.....
</code></pre>
<br>
<h2><a name="a8">Graphical outputs</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<h3>Visualizing damage patterns</h3>
<li>The mis-incorporation plot already produced by mapdamge1.0 was slighly modified:
<a name="Fragmisincorporation_plot" href="images/E522_Fragmisincorporation_plot.png"><img src="images/E522_Fragmisincorporation_plot_small.png"></a><br>
<p style="font-size:10px">Sample E522. The four upper mini-plots show the base frequency outside and in the read (the open grey box corresponds to the read). The bottom plots are the positions' specific substitutions from the 5" (left) and the 3" end (right). See <a href=#a9><i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>
</li>
<li>An additional plot for the length distributions of singletons is now produced:
<a href="images/E522_Length_plot.png"><img src="images/E522_Length_plot_small.png"></a>
<p style="font-size:10px">Sample E522. The upper two plots are histograms of the read lengths. The lower two plots are the empirical cumulative frequency of C->T and G->A misincorporations, normalized by the first 70 positions. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>
</li>
<h3>Quantification of damage patterns</h3>
<li>Fit of the model to the observed mis-incorporation patterns:
<a href="images/Stats_out_MCMC_post_pred.png"><img src="images/Stats_out_MCMC_post_pred_small.png"></a><br>
<p style="font-size:10px">Sample E522.The empirical misincorporation frequencies and simulated posterior predictive intervals from the fitted model are depicted. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>
</li>
<li>Posterior distributions of estimated parameters:
<a name="Stats_out_MCMC_hist.png" href="images/Stats_out_MCMC_hist.png"><img src="images/Stats_out_MCMC_hist_small.png"></a><br>
<p style="font-size:10px">Sample E522. Simulated posterior distributions of the five model parameters and log likelihood. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>
</li>
<li>Trace plot of estimated parameters:
<a name="Stats_out_MCMC_trace.png" href="images/Stats_out_MCMC_trace.png"><img src="images/Stats_out_MCMC_trace_small.png"></a><br>
<p style="font-size:10px">Sample E522. MCMC trace plots of the five model parameters and log likelihood. See <i><a href=#a9>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>
</li>
<h2><a name="a9">Citation</a></h2>
<p>If you use this program, please cite the following publication:<br>
Jónsson H, Ginolhac A, Schubert M, Johnson P, Orlando L.
mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.
<em>Bioinformatics</em> 2013. <a href="http://bioinformatics.oxfordjournals.org/content/early/2013/04/23/bioinformatics.btt193.abstract">23rd April 2013. doi: 10.1093/bioinformatics/btt193</a></p>
<p>The original mapDamage1 is described in the following article:<br>
Ginolhac A, Rasmussen M, Gilbert MT, Willerslev E, Orlando L.
mapDamage: testing for damage patterns in ancient DNA sequences.
<a href="http://bioinformatics.oxfordjournals.org/content/27/15/2153"><em>Bioinformatics</em> 2011 <b>27</b> (15):2153-2155</a></p>
<p> Schuenemann VJ, Bos K, DeWitte S, Schmedes S, Jamieson J, Mittnik A, Forrest
S, Coombes BK, Wood JW, Earn DJ, White W, Krause J, Poinar HN. Targeted
enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis
from victims of the Black Death. <i>Proc Natl Acad Sci USA</i>. 2011 108:E746-52. doi: 10.1073/pnas.1105107108.</p>
<h2><a name="a10">FAQ</a></h2>
<p>Q: I got this error: could not find function "ggtitle"</p>
<p>A: update your ggplot2 package to at least version 0.9.2 by running:<br>
<code>update.packages("ggplot2")</code><br>and selecting a CRAN mirror</p>
<p>Q: Why do I do not get the Lengh_plot.pdf file? (or only a blank page)</p><br>
<p>A: This is likely you are using paired-end reads. The length distribution is performed only for single-end reads using their lengths.
In theory, paired-end reads could also be used but we think this does not represent the real insert size and could be misleading.
Best results are obtained by using merged paired-end reads (mapped as single-end), so when the full inserts are sequenced.</p>
<p>Q: How to clone the development version from github?</p>
<p>A: Use the following commands</p>
<p><code>git clone --recursive https://github.com/ginolhac/mapDamage.git</code><br>
<p>Q: Where is the original version of mapDamage?</p>
<p>A: The original web page with examples, datasets and result files is here:</p>
<p><a href="http://geogenetics.ku.dk/publications/mapdamage/">http://geogenetics.ku.dk/publications/mapdamage/</a></p>
<h2><a name="a11">Contact</a></h2>
<p align="right"><a href="#content">TOP</a></p>
<p>Please report bugs and suggest possible improvements to Aurélien Ginolhac, Mikkel Schubert or Hákon Jónsson by email:
ginolhac at gmail.com, MSchubert at snm.ku.dk or jonsson.hakon at gmail.com.</p>
</section>
<aside id="sidebar">
<p class="repo-owner"><a href="https://github.com/ginolhac/mapDamage"></a> is maintained by <a href="https://github.com/ginolhac">ginolhac</a>, <a href="https://github.com/MikkelSchubert">MikkelSchubert</a> and <a href="https://github.com/hakon-jon">hakon-jon</a>.</p>
<p>This page was generated by <a href="pages.github.com">GitHub Pages</a> using the Architect theme by <a href="http://twitter.com/jasonlong">Jason Long</a>.</p>
</aside>
</div>
</div>
</body>
</html>
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.