run_tsv2bam: Run STACKS tsv2bam and merge BAM files


View source: R/run_tsv2bam.R

Description

Runs the STACKS tsv2bam module. Additionally, this function generates a summary of the tsv2bam results and merges, in parallel, the per-sample BAM files into a single catalog BAM file using SAMtools or Sambamba. tsv2bam converts the data (single-end or paired-end) from being organized by sample to being organized by locus, which enables downstream improvements (e.g. Bayesian SNP calling).
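The merge step can be pictured as one `samtools merge` call over all per-sample `.matches.bam` files. The sketch below only assembles that call and does not run it. The helper name `build_merge_cmd()`, the `threads` argument and the choice to write `catalog.bam` inside the P folder are assumptions for illustration; run_tsv2bam's own internals may differ (for instance, it may merge in several parallel batches or use Sambamba instead).

```r
# Hypothetical helper (not part of stackr): assemble a `samtools merge` call
# for all per-sample .matches.bam files in the STACKS folder.
# Nothing is executed here; the command is only returned.
build_merge_cmd <- function(P = "06_ustacks_2_gstacks",
                            cmd.path = "/usr/local/bin/samtools",
                            threads = 1L) {
  # Only per-sample files; an existing catalog.bam does not match this pattern
  bam.files <- list.files(path = P, pattern = "\\.matches\\.bam$",
                          full.names = TRUE)
  if (length(bam.files) == 0L) stop("No .matches.bam files found in: ", P)
  list(
    command = cmd.path,
    # -f: overwrite an existing output; -@: extra compression threads
    args = c("merge", "-f", "-@", as.character(threads),
             file.path(P, "catalog.bam"), bam.files)
  )
}

# Usage (assumes SAMtools is installed at cmd.path):
# cmd <- build_merge_cmd(threads = 4L)
# system2(cmd$command, cmd$args)
```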

Usage

run_tsv2bam(
  P = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.tsv2bam.tsv",
  R = NULL,
  parallel.core = parallel::detectCores() - 1,
  cmd.path = "/usr/local/bin/samtools",
  h = FALSE
)

Arguments

P

(path, character) Path to the directory containing STACKS files. Default: P = "06_ustacks_2_gstacks". Inside the folder, you should have:

  • the catalog files: starting with batch_ and ending with .alleles.tsv.gz, .snps.tsv.gz, .tags.tsv.gz;

  • 3 files for each sample: the sample name is the prefix of the files ending with .alleles.tsv.gz, .snps.tsv.gz and .tags.tsv.gz. These files are created by the ustacks, sstacks and cxstacks modules.
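The expected content of the P folder can be checked before running the function. The sketch below is a hypothetical helper, `check_stacks_dir()` (not part of stackr), that looks for the batch_ catalog files and reports per-sample files that are missing, treating every `*.tags.tsv.gz` prefix as a sample name. It only follows the naming rules listed above.

```r
# Hypothetical helper (not part of stackr): report which expected STACKS
# files are missing from the folder P.
check_stacks_dir <- function(P = "06_ustacks_2_gstacks") {
  suffixes <- c(".alleles.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")
  files <- list.files(P)
  # Catalog: one batch_* file per suffix
  catalog.ok <- vapply(suffixes, function(s) {
    any(startsWith(files, "batch_") & endsWith(files, s))
  }, logical(1))
  # Sample names: prefixes of the non-catalog .tags.tsv.gz files
  tags <- files[endsWith(files, ".tags.tsv.gz") & !startsWith(files, "batch_")]
  samples <- sub("\\.tags\\.tsv\\.gz$", "", tags)
  # For each sample, list the expected files that are absent
  missing <- as.character(unlist(lapply(samples, function(s) {
    expected <- paste0(s, suffixes)
    expected[!expected %in% files]
  })))
  list(catalog.complete = all(catalog.ok), samples = samples,
       missing.sample.files = missing)
}

# Usage: check_stacks_dir("06_ustacks_2_gstacks")
```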

M

(character, path) Path to a population map file. Note that the -s option is not used inside stackr. Default: M = "02_project_info/population.map.tsv2bam.tsv".
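In STACKS, a population map is a tab-separated text file without a header: one sample per line, the sample name (the file prefix) in the first column and its population in the second. The sketch below writes and reads back a small map; the sample and population names, and the column names used in the data frame, are made up for illustration.

```r
# Made-up samples and populations, for illustration only
popmap <- data.frame(
  INDIVIDUALS = c("ind1", "ind2", "ind3"),
  POP_ID = c("pop1", "pop1", "pop2")
)
path <- file.path(tempdir(), "population.map.tsv2bam.tsv")
# Tab-separated, no header, no quotes, no row names
write.table(popmap, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(path)
```

The resulting file can then be passed as `M = path`.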

R

(path, character) Directory containing the paired-end read files (in fastq, fasta or bam format, optionally gzipped). Default: R = NULL.

parallel.core

(integer) Number of threads used for parallel execution. Default: parallel.core = parallel::detectCores() - 1
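On a single-core machine the default `parallel::detectCores() - 1` evaluates to 0, and `detectCores()` can also return `NA`. Whether run_tsv2bam guards against this is not stated here; the hypothetical helper `safe_cores()` below is a sketch of one way to compute a value that is always at least 1 before passing it as `parallel.core`.

```r
# Hypothetical helper (not part of stackr): clamp the thread count to >= 1.
# detectCores() may return NA, and detectCores() - 1 is 0 on one core.
safe_cores <- function(n = parallel::detectCores() - 1) {
  if (is.na(n) || n < 1) 1L else as.integer(n)
}

# Usage: run_tsv2bam(parallel.core = safe_cores())
safe_cores()
```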

cmd.path

(character, path) Provide the FULL path to SAMtools program. See details on how to install SAMtools. Default: cmd.path = "/usr/local/bin/samtools".
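Because cmd.path must be the full path to an executable, it can help to check it first. The sketch below is a hypothetical helper, `resolve_samtools()` (not part of stackr), that returns cmd.path when it points to an executable file and otherwise falls back to whatever `samtools` is found on the system PATH (an empty string if none).

```r
# Hypothetical helper (not part of stackr): validate cmd.path, else fall
# back to samtools on the PATH ("" when SAMtools is not found).
resolve_samtools <- function(cmd.path = "/usr/local/bin/samtools") {
  # file.access(mode = 1) returns 0 when the file is executable
  if (file.exists(cmd.path) && file.access(cmd.path, mode = 1) == 0) {
    return(normalizePath(cmd.path))
  }
  unname(Sys.which("samtools"))
}

# Usage: run_tsv2bam(cmd.path = resolve_samtools())
```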

h

Display this help message. Default: h = FALSE

Details

Install SAMtools: follow the link to the detailed instructions on how to install SAMtools.

Value

tsv2bam returns a set of .matches.bam files.

The function run_tsv2bam returns a list with the number of individuals, the batch ID number, a summary data frame and a plot. The summary data frame contains:

  1. INDIVIDUALS: the sample id

  2. ALL_LOCUS: the total number of loci for the individual (shown in subplot A)

  3. LOCUS: the number of loci with a one-to-one relationship with the catalog (shown in subplot B)

  4. MATCH_PERCENT: the percentage of loci with a one-to-one relationship with the catalog (shown in subplot C)

Additionally, the function returns a catalog.bam file, generated by merging all the individual BAM files in parallel.
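The relationship between the summary columns can be written out directly: MATCH_PERCENT is LOCUS expressed as a percentage of ALL_LOCUS. The sketch below uses made-up counts for two hypothetical individuals; the rounding to two decimals is an assumption, not necessarily what run_tsv2bam does.

```r
# Made-up counts for two hypothetical individuals, for illustration only
tsv2bam.summary <- data.frame(
  INDIVIDUALS = c("ind1", "ind2"),
  ALL_LOCUS = c(20000L, 15000L),
  LOCUS = c(18000L, 12000L)
)
# Percentage of loci with a one-to-one relationship with the catalog
tsv2bam.summary$MATCH_PERCENT <- round(
  100 * tsv2bam.summary$LOCUS / tsv2bam.summary$ALL_LOCUS, 2
)
tsv2bam.summary
```

Here ind1 matches 90% of its loci and ind2 matches 80%.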

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078-2079.

Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987-2993.

Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P (2015) Sambamba: fast processing of NGS alignment formats. Bioinformatics.

See Also

STACKS

stacks Version 2.0Beta6

SAMtools

Sambamba

Examples

## Not run: 
# The simplest form of the function:
bam.sum <- stackr::run_tsv2bam() # that's it !

## End(Not run)

thierrygosselin/stackr documentation built on Nov. 11, 2020, 11 a.m.