run_gstacks | R Documentation |
Run STACKS gstacks module inside R!
run_gstacks(
denovo = TRUE,
P = "06_ustacks_2_gstacks",
M = "02_project_info/population.map.tsv2bam.tsv",
I = NULL,
B = NULL,
O = NULL,
parallel.core = parallel::detectCores() - 1,
model = "marukilow",
var.alpha = 0.01,
gt.alpha = 0.05,
rm.pcr.duplicates = FALSE,
rm.unpaired.reads = FALSE,
ignore.pe.reads = TRUE,
unpaired = TRUE,
kmer.length = 31,
max.debruijn.reads = 1000,
min.kmer.cov = 2,
write.alignments = FALSE,
min.mapq = 10,
max.clipped = 0.2,
max.insert.len = 1000,
details = FALSE,
phasing.cooccurrences.thr.range = c(1, 2),
phasing.dont.prune.hets = FALSE,
h = FALSE
)
denovo |
(logical) de novo vs reference-based mode.
Default: |
P |
(path, character) De novo mode.
Path to the directory containing STACKS files.
Default:
. |
M |
(path, character) Path to a population map giving the list of samples. |
I |
(character, path) Reference-based mode.
Input directory containing BAM files.
Default: |
B |
(character, path) Reference-based mode. Path to input BAM files.
The input BAM file(s) must be sorted by coordinate.
With |
O |
(character, path) Path to output directory. |
parallel.core |
(integer) Enable parallel execution with the number of threads.
Default: |
model |
(character) The model to use to call variants and genotypes;
one of |
var.alpha |
(double) Alpha threshold for discovering SNPs.
Default: |
gt.alpha |
(double) Alpha threshold for calling genotypes.
Default: |
rm.pcr.duplicates |
(logical) remove all but one set of read pairs of
the same sample that have the same insert length
(implies |
rm.unpaired.reads |
(logical) Discard unpaired reads.
Default: |
ignore.pe.reads |
(logical) With default the function will
ignore paired-end reads even if present in the input.
Default: |
unpaired |
(logical) Ignore read pairing (only for paired-end GBS;
treat READ2's as if they were READ1's).
Default: |
kmer.length |
(integer) De novo mode.
kmer length for the de Bruijn graph. For expert.
Default: |
max.debruijn.reads |
(integer) Maximum number of reads to use in the
de Bruijn graph. For expert.
Default: |
min.kmer.cov |
(integer) De novo mode.
Minimum coverage to consider a kmer. For expert.
Default: |
write.alignments |
(logical) Save read alignments (heavy BAM files).
Default: |
min.mapq |
(double) Reference-based mode.
Minimum PHRED-scaled mapping quality to consider a read.
Default: |
max.clipped |
(double) Reference-based mode.
Maximum soft-clipping level, in fraction of read length.
Default: |
max.insert.len |
(integer) Reference-based mode.
Maximum allowed sequencing insert length
Default: |
details |
(logical) With default the function will write a more detailed
output.
Default: |
phasing.cooccurrences.thr.range |
(integer) range of edge coverage thresholds to
iterate over when building the graph of allele cooccurrences for
SNP phasing.
Default: |
phasing.dont.prune.hets |
(logical) Don't try to ignore dubious heterozygote
genotypes during phasing. By default, during phasing, dubious het are ignored.
Default: |
h |
Display this help messsage.
Default: |
tsv2bam
returns a set of .matches.bam
files.
The function run_gstacks
returns a list with the number of individuals, the batch ID number,
a summary data frame and a plot containing:
INDIVIDUALS: the sample id
ALL_LOCUS: the total number of locus for the individual (shown in subplot A)
LOCUS: the number of locus with a one-to-one relationship (shown in subplot B) with the catalog
MATCH_PERCENT: the percentage of locus with a one-to-one relationship with the catalog (shown in subplot C)
Addtionally, the function returns a batch_X.catalog.bam file that was generated by merging all the individual BAM files in parallel.
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
Maruki T, Lynch M (2017) Genotype Calling from Population-Genomic Sequencing Data. G3, 7, 1393-1404.
## Not run:
# The simplest form of the function with De novo data:
bam.sum <- stackr::run_gstacks() # that's it !
# This will use, by default, the same population map used in run_tsv2bam
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.