docs/index2.R

<!DOCTYPE html>

  <html>

  <head>

  <meta charset='utf-8'>

    <meta http-equiv="X-UA-Compatible" content="chrome=1">

      <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">

        <link href='https://fonts.googleapis.com/css?family=Architects+Daughter' rel='stylesheet' type='text/css'>

          <link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen" />

            <link rel="stylesheet" type="text/css" href="stylesheets/pygment_trac.css" media="screen" />

              <link rel="stylesheet" type="text/css" href="stylesheets/print.css" media="print" />



                <!--[if lt IE 9]>

                <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>

                  <![endif]-->



                  <title>mapDamage by the Orlando Group</title>

                  </head>



                  <body>

                  <header>

                  <div class="inner">

                    <h1>mapDamage</h1>

                    <h2>mapDamage: tracking and quantifying damage patterns in ancient DNA sequences</h2>

                    <a href="https://github.com/ginolhac/mapDamage" class="button"><small>View project on</small>GitHub</a>

                      </div>

                      </header>



                      <div id="content-wrapper">

                        <div class="inner clearfix">

                          <section id="main-content">



                            <h2><a name="content">Important</a></h2>

                              <p>

                              Users with versions dating prior to June 12 2013 please update.

                            A nasty bug that caused the statistical part of mapDamage to use

                            half of the data for estimation of the damage parameters, sorry for the inconvenience.

                            </p>

                              <h2><a name="content">Contents</a></h2>

                              <li><a href="#a1">Introduction</a></li>

                              <li><a href="#a2">News in version 2.0</a></li>

                              <li><a href="#a3">Inputs</a></li>

                              <li><a href="#a4">Output files</a></li>

                              <li><a href="#a5">Examples and datasets</a></li>

                              <li><a href="#a6">Usage</a></li>

                              <li><a href="#a6.5">Web interface (insideDNA.me)</a></li>

                              <li><a href="#a7">Description of tables</a></li>

                              <li><a href="#a8">Graphical outputs</a></li>

                              <li><a href="#a9">Citation</a></li>

                              <li><a href="#a10">FAQ</a></li>

                              <li><a href="#a11">Contact</a></li>



                              <hr>

                              <h2><a name="a1">Introduction</a></h2>



                              <p><a href="http://geogenetics.ku.dk/publications/mapdamage/">mapDamage2</a>

                              is a computational framework written in Python and R, which tracks and quantifies

                            DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms. </p>

                              <p>mapDamage is developed at the <a href="http://geogenetics.ku.dk/">Centre for GeoGenetics</a>

                              by the <a href="http://geogenetics.ku.dk/research/research_groups/palaeomix_group/"> Orlando Group</a>.</p>



                              <h3>Requirements</h3>



                              <ul>

                              <li>Python (version &gt;= 2.6)</li>

                              <li>

                              <a href="http://git-scm.com/">Git</a> </li>

                              <li>

                              <a>R</a> (version &gt;= 2.15.1) must be present in your $PATH. Otherwise, only tables will be produced.</li>

                              <li>

                              <a href="http://code.google.com/p/pysam/">pysam</a> (version &gt;= 0.6) python package, an interface for reading/writing SAM/BAM files</li>

                              <li>

                              <a href="http://www.cython.org">Cython</a> package, only necessary if pysam needs to be built</li>

                              <li>

                              <a href="http://www.gnu.org/software/gsl/">gsl</a> GNU scientific library (GSL) </li>

                              <li>R libraries:



                              <ul style="list-style: square">

                              <li><a href="https://cran.r-project.org/web/packages/inline/index.html">inline</a></li>

                              <li><a href="https://cran.r-project.org/web/packages/gam/index.html">gam</a></li>

                              <li><a href="https://cran.r-project.org/web/packages/Rcpp/index.html">Rcpp</a></li>

                              <li><a href="https://cran.r-project.org/web/packages/RcppGSL/index.html">RcppGSL</a></li>

                              <li><a href="https://cran.r-project.org/web/packages/ggplot2/index.html">ggplot2</a> (&gt;=0.9.2)</li>

                              </ul>

                              </li>

                              </ul><p>Of note, the R libraries and the GSL library are only required for the statistical estimation of DNA damage

                            parameters. A stripped-down version of <i>seqtk</i> is included in this package for fast computation of reference base composition, it is written by Heng Li

                            and can be found at <a href="https://github.com/lh3/seqtk">here at github.</a></p>



                              <h3>Installation</h3>



                              <p>mapDamage2 was successfully tested on GNU/Linux and MacOSX environments. For MacOSX, you can follow the <a href="mac_install.html">following guidelines</a> written by mapDamage users.</p>



                              <ol>

                              <li>

                              <u>Install git:</u><br>

                              Follow the instructions available at <a href="http://git-scm.com/downloads">git</a>

                              (with installers/packages available for all major operating systems).</li><br>

                              </li>

                              <li><u>First time installation for mapDamage:</u><br>

                              Either clone the repository from GitHub<br>

                              <code>git clone https://github.com/ginolhac/mapDamage.git</code> or <br>
                              download a tarball/zip archives for a specific <a href="https://github.com/ginolhac/mapDamage/releases">release.</a><br>

                              go into the resulting folder<br>

                              <code>cd mapDamage</code><br>

                              <p>Continue the procedure by running the following command if you have administrator rights<br>

                              <code>sudo python setup.py install</code><br>

                              otherwise install it locally<br><code>python setup.py install --user</code><br>

                              then, <code>$HOME/.local/bin</code> must be in your PATH.</p>

                              </li>

                              <li><u>Install pysam:</u><br>

                              Download the tarball archive for <a href="http://code.google.com/p/pysam/downloads/list">pysam</a>.

                            Untar the downloaded tar file and follow instructions in the pysam-XXX/INSTALL text file. <a href="http://www.cython.org">Cython</a> is necessary if pysam needs to be built.<br>It is recommend to check if the pysam installation seems okay before proceeding with the following command<br><code>python -c "import pysam"</code><br>

                              If nothing appears, then proceed.</p></li>

                              <li><u>Install R:</u><br>

                              Follow the instructions available at <a href="http://www.r-project.org/">R project</a>

                              (with installers/packages available for all major operating systems).</li><br>

                              <li><u>Install the R packages:</u><br>

                              By running in a R console and selecting a CRAN mirror:<br><code>install.packages("inline")</code><br><code>install.packages("gam")</code><br><code>install.packages("Rcpp")</code><br><code>install.packages("ggplot2")</code></li><br>

                              <li><u>Install the  GSL library <a href="http://www.gnu.org/software/gsl/">gsl</a>:</u><br>

                              For Debian-based distros use the following command.<br><code>sudo apt-get install libgsl0-dev</code><br>

                              There should be a equivalent package for Mac using fink or macports.</li><br>

                              <li><u>Install RcppGSL:</u><br>

                              In a R session do the following<br><code>install.packages("RcppGSL")</code><br>

                              If the installation of RcppGSL ended with an unsuccessfully then, download the tarball at <a href="http://cran.r-project.org/web/packages/RcppGSL/index.html">RcppGSL</a> and install it with:<br><code>install.packages("RcppGSL_0.2.2.tar.gz", repo = NULL)</code><br>
                              Another alternative for downloading this package is to use the <a href="http://dirk.eddelbuettel.com/code/rcpp/">author's mirror archive</a></li>

                            </ul>

                            <p>Note that if steps 4-6 fail for some reason, then it is possible to utilize mapDamage without

                            the statistical function.</p>

                            </ol>



                            <h3>Update mapDamage</h3>

                            When mapDamage is already git-installed, you can get the latest version using git pull.

                            <li><u>Pull from the git repository:</u><br>

                            go into the git folder<br><code>cd /path/to/mapDamage</code><br>

                            Check updates at GitHub and download them if any<br><code>git pull</code><br>

                            If there is no updates, the command returns: <code>Already up-to-date.</code><br>

                            If the last line looks like: <code>4 files changed, 49 insertions(+), 35 deletions(-)</code><br>

                            you can proceed with the next step.

                            </li>

                            <li><u>Install downloaded newer version:</u><br>

                            <p>Install the updated version if needed, by running the following command if you have administrator rights<br>

                            <code>sudo python setup.py install</code><br>

                            otherwise install it locally<br><code>python setup.py install --user</code><br>

                            </li>







                            <hr><h2><a name="a2">News in version 2.0</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>

                            <ul>

                            <li>The main new feature is the <b>approximate bayesian estimation of damage parameters</b>.<br>

                            The idea behind the summary approach in mapdamage 2.0 is depicted in this figure:<br><br>

                            <a href="images/jumpfig.png"><img src="images/jumpfig_small.png"></a>



                            </li><br>

                            <li>The following four key damage parameters are estimated in Bayesian manner:

                            <ol>

                            <li><b>Lambda</b>, the probability of termating an overhang.</li>

                            <li><b>DeltaD</b>, the cytosine deamination probability in <i>double</i> strand context.</li>

                            <li><b>DeltaS</b>, the cytosine deamination probability in <i>single</i> strand context.</li>

                            <li><b>Theta</b>, the mean difference rate between the reference and the sequenced sample not due to DNA damage.</li>

                            </ol>

                            </li>

                            <li>The <code>--rescale</code> parameter can be optionally used to rescale quality scores of likely damaged positions in the reads. A new BAM file is constructed by downscaling quality values for misincorporations likely due to ancient DNA damage according to their initial qualities, position in reads and damage patterns.</li>

                            <li>The program was rewritten in python to make it simpler and the dependency on bedtools was replaced by the use of pysam. The memory footprint is now negligible and runtime reduced by <i>ca.</i> 25%.</li>

                            <li>It is now possible to filter out nucleotides with qualities inferior to a threshold defined by users,

                            allowing users to reduce the impact of sequencing errors when counting misincorporations.</li>

                            </ul>





                            <h2><a name="a3">Inputs</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>

                            <p>Two files are needed:</p>



                            <ul>

                            <li>A valid SAM/BAM file with a correct header, as argument to the <code>-i</code> option.</li>

                            <li>A FASTA file that contains reference sequences used for mapping reads, as argument to the <code>-r</code> option. </li>

                            </ul><p>References described in the SAM/BAM header and the FASTA file must be coherent, <em>i.e</em>,

                            the references must have identical names and lengths. Extra sequences present in the FASTA header

                            raise a warning but the program will proceed since all necessary references are available.</p>



                            <p>As an alternative, one can run only the plotting, statistic estimations or rescaling

                            on an already processed dataset. Use a combination of <code>-d</code> option followed by

                            a valid folder and the <code>--plot-only</code>, <code>--stats-only</code> or  <code>--rescale-only</code> options.</p>



                            <h3><a name="PE">Paired ends</a></h3>

                            <p>

                            We assume the pairs are facing inwards when counting the position specific misincorporations and rescaling quality scores.

                            An additional assumption in the rescaling process is the pairs are non-overlapping.

                            Make sure the pairing information in the BAM/SAM file is correct as we rely on the paired end information provided by the file.

                            We advise not using non-overlapping paired ends coming from highly degraded samples (small template lengths) as they are

                            likely contamination.

                            </p>

                            <h2><a name="a4">Output files</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>

                            <p>When all options are activated, 16 files are produced in the result folder.</p>



                            <ul>

                            <li><code>Runtime_log.txt</code>, log file with a summary of command lines used and timestamps.</li>

                            </ul><p>For the plotting:</p>



                            <ul>

                            <li><a href="images/E522_Fragmisincorporation_plot.png"><code>Fragmisincorporation_plot.pdf</code></a>, a pdf file that displays both fragmentation and misincorporation patterns.</li>

                            <li><a href="images/E522_Length_plot.png"><code>Length_plot.pdf</code></a>, a pdf file that displays length distribution of singleton reads per strand and cumulative

                            frequencies of C-&gt;T at 5'-end and G-&gt;A at 3'-end are also displayed per strand. <br><b>Important remark</b>: paired-ended reads are not considered for this analysis since the plot displays the distribution of read lengths and not insert sizes of paired-ended reads. For this last purpose, one can use the dedicated Picard tool <a href="http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSizeMetrics">CollectInsertSizeMetrics</a>.</li>

                            <li><code>misincorporation.txt</code>, contains a table with occurrences for each type of mutations and relative positions from the reads ends.</li>

                            <li><code>5pCtoT_freq.txt</code>, contains frequencies of Cytosine to Thymine mutations per position from the 5'-ends.</li>

                              <li><code>3pGtoA_freq.txt</code>, contains frequencies of Guanine to Adenine mutations per position from the 3'-ends.</li>

                            <li><code>dnacomp.txt</code>, contains a table of the reference genome base composition per position, inside reads and adjacent regions.</li>

                            <li><code>lgdistribution.txt</code>, contains a table with read length distributions per strand.</li>

                            </ul><p>For the statistical estimation:</p>



                            <ul>

                            <li><a href="images/Stats_out_MCMC_hist.png"><code>Stats_out_MCMC_hist.pdf</code></a>, MCMC histogram for the damage parameters and log likelihood.</li>

                            <li><code>Stats_out_MCMC_iter.csv</code>, values for the damage parameters and log likelihood in each MCMC iteration.</li>

                            <li><a href="images/Stats_out_MCMC_trace.png"><code>Stats_out_MCMC_trace.pdf</code></a>, a MCMC trace plot for the damage parameters and log likelihood. </li>

                            <li><code>Stats_out_MCMC_iter_summ_stat.csv</code>, summary statistics for the damage parameters estimated posterior distributions.</li>

                            <li><a href="images/Stats_out_MCMC_post_pred.png"><code>Stats_out_post_pred.pdf</code></a>, empirical misincorporation frequency and posterior predictive intervals from the fitted model.</li>

                            <li><code>Stats_out_MCMC_correct_prob.csv</code>, position specific probability of a C-&gt;T and G-&gt;A misincorporation is due to damage.</li>

                            <li><code>dnacomp_genome.txt</code>, contains the global reference genome base composition (computed by <a href="https://github.com/lh3/seqtk">seqtk</a>).</li>

                            <li>Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores. </li>

                            </ul>

                            <h2><a name="a5">Examples and datasets</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>

                            <p>A simple command line that would process the entire BAM file with plotting and statistic estimation is:</p>



                            <pre><code>mapDamage -i mymap.bam -r myreference.fasta

                            </code></pre>



                            <p>To run the plotting part with a new scale to fit lower levels of DNA damages, 0.1 as y-limit instead of 0.3 by default:</p>



                            <pre><code> mapDamage -d results_mydata -y 0.1 --plot-only

                            </code></pre>



                            <p>Running the rescaling of the quality scores taking into account the damage estimates:</p>

                            <pre><code> mapDamage -i mymap.bam -r myreference --rescale

                            </code></pre>



                            <p>To run the statistical estimation using only the 5'-ends and with verbose output:</p>



                              <pre><code> mapDamage -d results_mydata --forward --stats-only -v

                            </code></pre>

                              <p>For further examples and usage introduction see the supplementary material chapter in the

                            mapDamage2.0 manuscript.</p>







                              <h2><a name="a6">Usage</a></h2>

                              <p align="right"><a href="#content">TOP</a></p>

                              <pre><code>Usage: mapDamage [options] -i BAMfile -r reference.fasta



                            Use option -h or --help for help



                            Options:



                              --version             show program's version number and exit

                            -h, --help            show this help message and exit



                            Input files:



                            -i FILENAME, --input=FILENAME

                            SAM/BAM file, must contain a valid header, use '-' for reading a BAM from stdin

                            -r REF, --reference=REF

                            Reference file in FASTA format

                            General options:

                            -n DOWNSAMPLE, --downsample=DOWNSAMPLE

                            Downsample to a randomly selected fraction of the reads (if 0 &lt; DOWNSAMPLE &lt; 1),

                            or a fixed number of randomly selected reads (if DOWNSAMPLE &gt;= 1). By default,

                            no downsampling is performed.

                            --downsample-seed   Seed value to use for downsampling. See documentation for py module 'random' for default behavior.

                            -l LENGTH, --length=LENGTH

                            read length, in nucleotides to consider [70]

                            -a AROUND, --around=AROUND

                            nucleotides to retrieve before/after reads [10]

                            -Q MINQUAL, --min-basequal=MINQUAL

                            minimun base quality Phred score considered, Phred-33 assumed [0]

                            -d FOLDER, --folder=FOLDER

                            folder name to store results [results_FILENAME]

                            -f, --fasta         Write alignments in a FASTA file

                            --plot-only         Run only plotting from a valid result folder

                            -q, --quiet         Disable any output to stdout

                            -v, --verbose       Display progression information during parsing



                            Options for graphics:



                            -y YMAX, --ymax=YMAX

                            graphical y-axis limit for nucleotide misincorporation frequencies [0.3]

                            -m READPLOT, --readplot=READPLOT

                            read length, in nucleotides, considered for plotting nucleotide

                            misincorporations [25]

                            -b REFPLOT, --refplot=REFPLOT

                            the number of reference nucleotides to consider for ploting base

                            composition in the region located upstream and downstream of

                            every read [10]

                            -t TITLE, --title=TITLE

                            title used for both graph and filename [plot]



                            Options for the statistical estimation:



                            --rand=RAND         Number of random starting points for the likelihood optimization [30]

                            --burn=BURN         Number of burnin iterations [10000]

                            --adjust=ADJUST     Number of adjust proposal variance parameters iterations [10]

                            --iter=ITER         Number of final MCMC iterations [50000]

                            --forward=FORWARD   Using only the 5' end of the seqs [False]

                            --reverse=REVERSE   Using only the 3' end of the seqs [False]

                            --fix-disp=FIX_DISP Fix dispersion in the overhangs [True]

                            --diff-hangs        The overhangs are different for 5' and 3'  [False]

                            --fix-nicks=FIX_NICKS

                            Fix the nick frequency vector nu else estimate it with GAM [False]

                            --jukes-cantor      Use Jukes Cantor instead of HKY85

                            --single_stranded=SINGLE_STRANDED

                            Single stranded protocol [False]

                            --seq-length=SEQ_LENGTH

                            How long sequence to use from each side [12]

                            --stats-only        Run only statistical estimation from a valid result folder

                            --rescale           Rescale the quality scores in the BAM file using the output from

                            the statistical estimation

                            --rescale-only      Run only rescaling from a valid result folder

                            --no-stats          Disabled statistical estimation, active by default

                            </code></pre>


                            <h2><a name="a6.5"> Web interface</a></h2>
                            Anna Kostikova from <a href="https://insidedna.me">insideDNA</a> implemented a web interface for mapDamage.</br>
                            Users can adjust the cloud ressources in terms of RAM/CPU to perform their analysis.
                            She wrote a <a href = "https://insidedna.me/tutorials/view/Analysis-ancient-DNA-samples-using-mapDamage">tutorial</a> explaining how to use this
                            <a href="https://insidedna.me/app#/tools/100648/">web interface (sign up required).</a>

                            <h2><a name="a7">Description of tables</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>

                            <h3>misincorporation table</h3>



                            <p>This file looks like:</p>



                            <pre><code># table produced by mapDamage version 2.0

                            # using mapped file hits_sort_mts.bam and NC_012920.fasta as reference file

                            # Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads

                            Chr End Std Pos A   C   G   T   Total   G&gt;A C&gt;T A&gt;G T&gt;C A&gt;C A&gt;T C&gt;G C&gt;A T&gt;G T&gt;A G&gt;C G&gt;T A&gt;- T&gt;- C&gt;- G&gt;- -&gt;A -&gt;T -&gt;C -&gt;G S

                            NC_012920.1 3p  +   1   424 579 201 424 1628    48  16  3   5   1   0   2   1   0   2   1   0   0   0   0   0   4   4   1   1   0

                            NC_012920.1 3p  +   2   466 535 213 404 1618    30  12  2   1   0   1   0   0   3   3   1   0   0   0   0   0   2   6   5   7   0

                            NC_012920.1 3p  +   3   514 463 223 417 1617    35  15  3   1   1   0   0   0   1   0   1   1   0   0   0   2   9   4   4   4   0

                            NC_012920.1 3p  +   4   514 483 221 400 1618    30  11  3   4   0   1   1   0   0   1   0   1   1   0   0   2   5   5   8   2   0

                            </code></pre>



                            <p>The first lines that start by a hash contain information about the options used while processing the data.

                            Then, the table contains occurrences of the 12 mutation type + 4 deletions + 4 insertions +

                            soft-clipping for each combination of reference (<code>Chr</code> column),  strand (<code>Std</code> column) ,  end (<code>End</code> column) and  position (<code>Pos</code> column).

                            In Z-&gt;X, Z comes from the reference, X is the nucleotide from reads. The columns A, C, G and T are the respective reference base count

                            and Total is the sum of the four reference nucleotides. Frequencies depicted in <a href="#Fragmisincorporation_plot">Fragmisincorporation_plot.pdf</a> file are computed per

                            position as follows</p>



                            <p>#occurrences of mutations / #occurrences of the reference nucleotide </p>



                            <p>in order to compensate for reference base composition bias along the positions.</p>



                            <p>In this example, for G&gt;A substitutions at the first base from the 3'-ends and for the positive strand:

                              <code>48 / 201 = 0.238806</code></p>



                              <p>The <a href="#Fragmisincorporation_plot">Fragmisincorporation</a> plot sums up misincorporations  for all references and strand orientations, finally

                            displays the frequencies per sequencing position.  </p>



                              <p><b>5p</b> means reads analyzed from their 5'-ends and are depicted at the bottom left.</p>



                            <p><b>3p</b> means reads analyzed from their 3'-ends and are depicted at the bottom right.</p>



                              <p><b>Remark concerning soft-clipping</b>:</p>



                              <p>Soft-clipped bases are bases located at read extremities that are NOT aligned to the reference.

                            Those bases are not taken into account and alignments are computed on aligned bases.

                            However, the soft-clipped bases are recorded and displayed as an orange line as a diagnostic for users.

                            Even if, the positions for those bases are different from positions of all misincorporations, they should warn

                            users that reads should be trimmed if soft-clipped base frequencies is high.</p>



                              <h3>dnacomp table</h3>



                              <p>Example table below:</p>



                              <pre><code># table produced by mapDamage version 2.0

                              # using mapped file hits_sort_mts.bam and NC_012920.fasta as reference file

                              # Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads

                              Chr End Std Pos A   C   G   T   Total

                            NC_012920.1 3p  +   -70 113 77  43  181 414

                            NC_012920.1 3p  +   -69 178 140 97  233 648

                            NC_012920.1 3p  +   -68 230 145 98  219 692

                            NC_012920.1 3p  +   -67 217 157 76  272 722

                            NC_012920.1 3p  +   -66 224 188 88  239 739

                            NC_012920.1 3p  +   -65 244 165 104 262 775

                            NC_012920.1 3p  +   -64 223 185 102 288 798

                            </code></pre>



                              <p>The first lines that start by a hash contain information about the options used while processing the data.

                            Then, the table contains occurrences for each nucleotides, per reference (<code>Chr</code> column),

                            per strand (<code>Std</code> column) , per end (<code>End</code> column) and per position (<code>Pos</code> column).</p>



                              <p>At a given position, base composition is recorded either for the reference or for reads depending

                            on the position and direction. The following table details which sequences and coordinates are

                            used by default:</p>



                              <table>

                              <tr>

                              <th align="center">Positions</th>

                              <th align="center">5p</th>

                              <th align="center">3p</th>

                              </tr>

                              <tr>

                              <td align="center">negative values</td>

                              <td align="center">reference, -10 to -1</td>

                              <td align="center">reads, -25 to -1</td>

                              </tr>

                              <tr>

                              <td align="center">positive values</td>

                              <td align="center">reads, +1 to +25</td>

                              <td align="center">reference, +1 to +10</td>

                              </tr>

                              </table>

                              <br>



                              <h3>Stats_out_MCMC_iter_summ_stat table</h3>



                              <p>This table contains the MCMC output of the five model parameters (depending on options) and log likelihood. The mean, standard deviation and quantiles per 2.5% interval

                            of the MCMC posterior simulations are reported. The

                            acceptance ratio (ratio of acceptances in the MCMC iterations) for each model parameter is calculated and depicted in the file and should be around 0.2. The companion file <code>Stats_out_MCMC_iter.csv</code> contains all the values obtained for each MCMC iteration and is available in graphical format as a <a href="#Stats_out_MCMC_hist.png">histogram</a> or a <a href="#Stats_out_MCMC_trace.png">traceplot</a>.

                            Example table below (the file is actually comma separated, here depicted with tabulations for clarity purposes):</p>



                              <pre><code>

                              Theta			DeltaD			DeltaS			Lambda			Rho			LogLik

                            Mean	0.0129715432214025	0.0276128283764782	0.506870702430984	0.236778993065734	0.0429440323801133	-290.436006736216

                            Std.	0.000727998242592221	0.0016991302193519	0.0186911963483404	0.0112842920128649	0.0061180619558739	1.59627579978557

                            Acceptance ratio	0.1659	0.28692			0.17708			0.17222			0.21354			0.6781

                            0%	0.010343780887111	0.0200009808935262	0.443618434348717	0.196608727957609	0.024062021508867	-303.394224826161

                            2.5%	0.0115779148051722	0.0242216196897948	0.471593745448078	0.215033292540976	0.0316283681609979	-294.415675536748

                            5%	0.0117929060844704	0.0247964028136441	0.476538261177338	0.218032838581406	0.0334894928201135	-293.50800575224

                            7.5%	0.0119301611221827	0.0251693761425478	0.480224681410536	0.220351157627141	0.0345977332183658	-292.980765571483

                            10%	0.0120425513929948	0.0254336747705529	0.48309367716994	0.222245004085287	0.0354838414161916	-292.586815783461

                            12.5%	0.0121442465603967	0.0256581391709088	0.485339023472775	0.223852846490496	0.0361775294966489	-292.277395952081

                            15%...

                            .....

                            </code></pre>

                              <br>

                              <h3>Stats_out_MCMC_correct_prob table</h3>



                              <p>This table contains the position-specific probability of observing a C-&gt;T or a G-&gt;A due to a <i>post-mortem</i> damage.<br>

                              Example table below (the file is actually comma separated, here depicted with tabulations for clarity purposes):</p>



                              <pre><code>

                              Position	C.T			G.A

                            1	1	0.980784246710695	0.65527195218869

                            2	2	0.97568346676245	0.71352845563181

                            3	3	0.969521194936301	0.743638120957285

                            4	4	0.962170583850572	0.766805824762406

                            5	5	0.953542075730188	0.787010580221663

                            6	6	0.943631337701685	0.803957830256353

                            7	7	0.932671721032529	0.81529717262592

                            8	8	0.921048902653476	0.82189320180069

                            9	9	0.909252012734666	0.825047159166101

                            10...

                            .....

                            </code></pre>

                              <br>



                              <h2><a name="a8">Graphical outputs</a></h2>

                              <p align="right"><a href="#content">TOP</a></p>

                              <h3>Visualizing damage patterns</h3>

                              <li>The mis-incorporation plot already produced by mapdamge1.0 was slighly modified:

                              <a name="Fragmisincorporation_plot" href="images/E522_Fragmisincorporation_plot.png"><img src="images/E522_Fragmisincorporation_plot_small.png"></a><br>

                              <p style="font-size:10px">Sample E522. The four upper mini-plots show the base frequency outside and in the read (the open grey box corresponds to the read). The bottom plots are the positions' specific substitutions from the 5" (left) and the 3" end (right). See <a href=#a9><i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>

                            </li>

                            <li>An additional plot for the length distributions of singletons is now produced:

                            <a href="images/E522_Length_plot.png"><img src="images/E522_Length_plot_small.png"></a>

                            <p style="font-size:10px">Sample E522. The upper two plots are histograms of the read lengths. The lower two plots are the empirical cumulative frequency of C-&gt;T and G-&gt;A misincorporations, normalized by the first 70 positions. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>

                            </li>

                            <h3>Quantification of damage patterns</h3>

                            <li>Fit of the model to the observed mis-incorporation patterns:

                            <a href="images/Stats_out_MCMC_post_pred.png"><img src="images/Stats_out_MCMC_post_pred_small.png"></a><br>

                            <p style="font-size:10px">Sample E522.The empirical misincorporation frequencies and simulated posterior predictive intervals from the fitted model are depicted. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>

                            </li>

                            <li>Posterior distributions of estimated parameters:

                            <a name="Stats_out_MCMC_hist.png" href="images/Stats_out_MCMC_hist.png"><img src="images/Stats_out_MCMC_hist_small.png"></a><br>

                            <p style="font-size:10px">Sample E522. Simulated posterior distributions of the five model parameters and log likelihood. <a href=#a9>See <i>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>

                            </li>

                            <li>Trace plot of estimated parameters:

                            <a name="Stats_out_MCMC_trace.png" href="images/Stats_out_MCMC_trace.png"><img src="images/Stats_out_MCMC_trace_small.png"></a><br>

                            <p style="font-size:10px">Sample E522. MCMC trace plots of the five model parameters and log likelihood. See <i><a href=#a9>Schuenemann et al. (2011)</i> for more detailed information regarding the original data.</a></p>

                            </li>









                            <h2><a name="a9">Citation</a></h2>



                            <p>If you use this program, please cite the following publication:<br>

                            Jónsson H, Ginolhac A, Schubert M, Johnson P, Orlando L.

                            mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.

                            <em>Bioinformatics</em> 2013. <a href="http://bioinformatics.oxfordjournals.org/content/early/2013/04/23/bioinformatics.btt193.abstract">23rd April 2013. doi: 10.1093/bioinformatics/btt193</a></p>



                            <p>The original mapDamage1 is described in the following article:<br>

                            Ginolhac A, Rasmussen M, Gilbert MT, Willerslev E, Orlando L.

                            mapDamage: testing for damage patterns in ancient DNA sequences.

                            <a href="http://bioinformatics.oxfordjournals.org/content/27/15/2153"><em>Bioinformatics</em> 2011 <b>27</b> (15):2153-2155</a></p>



                            <p> Schuenemann VJ, Bos K, DeWitte S, Schmedes S, Jamieson J, Mittnik A, Forrest

                            S, Coombes BK, Wood JW, Earn DJ, White W, Krause J, Poinar HN. Targeted

                            enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis

                            from victims of the Black Death. <i>Proc Natl Acad Sci USA</i>. 2011 108:E746-52. doi: 10.1073/pnas.1105107108.</p>



                            <h2><a name="a10">FAQ</a></h2>



                            <p>Q: I got this error: could not find function "ggtitle"</p>



                            <p>A: update your ggplot2 package to at least version 0.9.2 by running:<br>

                            <code>update.packages("ggplot2")</code><br>and selecting a CRAN mirror</p>



                            <p>Q: Why do I do not get the Lengh_plot.pdf file? (or only a blank page)</p><br>

                            <p>A: This is likely you are using paired-end reads. The length distribution is performed only for single-end reads using their lengths.

                            In theory, paired-end reads could also be used but we think this does not represent the real insert size and could be misleading.

                            Best results are obtained by using merged paired-end reads (mapped as single-end), so when the full inserts are sequenced.</p>



                            <p>Q: How to clone the development version from github?</p>



                            <p>A: Use the following commands</p>

                            <p><code>git clone --recursive https://github.com/ginolhac/mapDamage.git</code><br>



                            <p>Q: Where is the original version of mapDamage?</p>



                            <p>A: The original web page with examples, datasets and result files is here:</p>



                            <p><a href="http://geogenetics.ku.dk/publications/mapdamage/">http://geogenetics.ku.dk/publications/mapdamage/</a></p>





                            <h2><a name="a11">Contact</a></h2>

                            <p align="right"><a href="#content">TOP</a></p>



                            <p>Please report bugs and suggest possible improvements to Aurélien Ginolhac, Mikkel Schubert or Hákon Jónsson by email:

                            ginolhac at gmail.com, MSchubert at snm.ku.dk or jonsson.hakon at gmail.com.</p>

                            </section>



                            <aside id="sidebar">



                            <p class="repo-owner"><a href="https://github.com/ginolhac/mapDamage"></a> is maintained by <a href="https://github.com/ginolhac">ginolhac</a>, <a href="https://github.com/MikkelSchubert">MikkelSchubert</a> and <a href="https://github.com/hakon-jon">hakon-jon</a>.</p>



                            <p>This page was generated by <a href="pages.github.com">GitHub Pages</a> using the Architect theme by <a href="http://twitter.com/jasonlong">Jason Long</a>.</p>

                            </aside>

                            </div>

                            </div>





                            </body>

                            </html>
kkdey/aRchaic.site documentation built on May 20, 2019, 10:31 a.m.