run_tsv2bam | R Documentation |
Runs the STACKS tsv2bam module. Additionally, this function generates a summary of the tsv2bam output and merges, in parallel, the sample BAM files into a single catalog BAM file using SAMtools or Sambamba. tsv2bam converts the data (single-end or paired-end) from being organized by sample to being organized by locus, which allows downstream improvements (e.g. Bayesian SNP calling).
run_tsv2bam(
P = "06_ustacks_2_gstacks",
M = "02_project_info/population.map.tsv2bam.tsv",
R = NULL,
parallel.core = parallel::detectCores() - 1,
cmd.path = "/usr/local/bin/samtools",
h = FALSE
)
P |
(path, character) Path to the directory containing STACKS files.
Default: P = "06_ustacks_2_gstacks" |
M |
(character, path) Path to a population map file.
Note that the |
R |
(path, character) Directory containing the paired-end read files (in fastq/fasta/bam (gz) format). |
parallel.core |
(integer) Enable parallel execution with the number of threads.
Default: parallel.core = parallel::detectCores() - 1 |
cmd.path |
(character, path) Full path to the SAMtools program. See details on how to install SAMtools.
Default: cmd.path = "/usr/local/bin/samtools" |
h |
(logical) Display this help message.
Default: h = FALSE |
Install SAMtools: link to detailed instructions on how to install SAMtools.
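Because cmd.path expects a full path, it can help to check where SAMtools lives before calling the function. The following is a minimal sketch using base R only; find_program is a hypothetical helper written for this illustration and is not part of stackr, and it assumes the program is reachable through the PATH.

```r
# Hypothetical helper (not part of stackr): resolve the full path of a
# program found on the PATH, or return NA when it cannot be found.
find_program <- function(program = "samtools") {
  path <- unname(Sys.which(program))
  if (!nzchar(path)) {
    return(NA_character_)
  }
  normalizePath(path)
}

samtools.path <- find_program("samtools")

if (is.na(samtools.path)) {
  message("samtools was not found on the PATH; supply its full path to cmd.path manually.")
} else {
  message("cmd.path = ", samtools.path)
}
```

If a path is found, it can be passed as the cmd.path argument instead of the default.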
tsv2bam returns a set of .matches.bam files.
The function run_tsv2bam returns a list with the number of individuals, the batch ID number, a summary data frame and a plot containing:
INDIVIDUALS: the sample ID
ALL_LOCUS: the total number of loci for the individual (shown in subplot A)
LOCUS: the number of loci with a one-to-one relationship with the catalog (shown in subplot B)
MATCH_PERCENT: the percentage of loci with a one-to-one relationship with the catalog (shown in subplot C)
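To make the relationship between the summary columns concrete, here is a small sketch on made-up numbers. The sample names and counts are invented for illustration, and it assumes MATCH_PERCENT is computed as LOCUS divided by ALL_LOCUS, which the documentation implies but does not state explicitly.

```r
# Made-up counts for three hypothetical samples (not real output).
tsv2bam.summary <- data.frame(
  INDIVIDUALS = c("sample_01", "sample_02", "sample_03"),
  ALL_LOCUS = c(1000L, 800L, 1200L),  # all loci found in the sample
  LOCUS = c(900L, 600L, 1200L),       # loci matching the catalog one-to-one
  stringsAsFactors = FALSE
)

# Assumption: the percentage uses ALL_LOCUS as the denominator.
tsv2bam.summary$MATCH_PERCENT <- round(
  100 * tsv2bam.summary$LOCUS / tsv2bam.summary$ALL_LOCUS,
  2
)

print(tsv2bam.summary)
```

A sample with a low MATCH_PERCENT relative to the others may be worth inspecting before going further in the pipeline.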
Additionally, the function returns a catalog.bam file, generated by merging all the individual BAM files in parallel.
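For readers curious about what such a merge looks like, the sketch below builds a samtools merge command from R. It is an illustration only: build_merge_args is a hypothetical helper, the output file name and thread count are assumptions, and the way run_tsv2bam actually invokes SAMtools or Sambamba is not shown in this page.

```r
# Hypothetical helper (not part of stackr): assemble the arguments for
# `samtools merge`. -f overwrites an existing output file and -@ sets the
# number of additional threads.
build_merge_args <- function(bam.files, output = "catalog.bam", threads = 1L) {
  stopifnot(length(bam.files) >= 1L, threads >= 1L)
  c("merge", "-f", "-@", as.character(threads), output, bam.files)
}

# Look for the per-sample BAM files in the default STACKS folder.
bam.files <- list.files(
  path = "06_ustacks_2_gstacks",
  pattern = "\\.matches\\.bam$",
  full.names = TRUE
)

samtools <- unname(Sys.which("samtools"))

# Only run the merge when there is something to merge and samtools exists.
if (length(bam.files) > 0L && nzchar(samtools)) {
  merge.args <- build_merge_args(bam.files, threads = 2L)
  system2(samtools, args = shQuote(merge.args))
}
```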
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-2079.
Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987-2993.
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P (2015) Sambamba: fast processing of NGS alignment formats. Bioinformatics.
## Not run:
# The simplest form of the function:
bam.sum <- stackr::run_tsv2bam() # that's it !
## End(Not run)
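A more explicit call might look like the sketch below. The argument values mirror the defaults in the usage section, except R, whose directory name is hypothetical and chosen only for illustration. The core count is guarded because parallel::detectCores() can return NA or 1 on some systems, in which case subtracting one would not leave a usable value. The call itself only runs when stackr is installed and the input folder exists.

```r
# Keep at least one core: detectCores() may return NA or 1.
n.cores <- parallel::detectCores()
if (is.na(n.cores)) n.cores <- 1L
n.cores <- max(1L, n.cores - 1L)

# Arguments mirror the documented defaults; the R directory is hypothetical.
tsv2bam.args <- list(
  P = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.tsv2bam.tsv",
  R = "04_process_radtags",  # hypothetical folder of paired-end reads
  parallel.core = n.cores,
  cmd.path = "/usr/local/bin/samtools",
  h = FALSE
)

# Only run when stackr is available and the STACKS folder is present.
if (requireNamespace("stackr", quietly = TRUE) && dir.exists(tsv2bam.args$P)) {
  bam.sum <- do.call(stackr::run_tsv2bam, tsv2bam.args)
  str(bam.sum, max.level = 1)
}
```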