stackr: Run stacks pipeline for RADseq analysis inside R

run_populations

R Documentation

Run STACKS Version >= 2.0 populations module

Description

Usage

run_populations(
  P = "06_ustacks_2_gstacks",
  V = NULL,
  O = "07_populations",
  M,
  parallel.core = parallel::detectCores() - 1,
  batch.size = 10000,
  p = 1,
  r = 0.3,
  R = 0.3,
  H = TRUE,
  min.maf = NULL,
  min.mac = NULL,
  max.obs.het = NULL,
  write.single.snp = FALSE,
  write.random.snp = FALSE,
  B = NULL,
  W = NULL,
  e = NULL,
  merge.sites = FALSE,
  merge.prune.lim = NULL,
  hwe = FALSE,
  fstats = FALSE,
  fst.correction = NULL,
  p.value.cutoff = 0.05,
  k = FALSE,
  smooth.fstats = FALSE,
  smooth.popstats = FALSE,
  sigma = 150000,
  bootstrap = FALSE,
  N = 100,
  bootstrap.pifis = FALSE,
  bootstrap.fst = FALSE,
  bootstrap.div = FALSE,
  bootstrap.phist = FALSE,
  bootstrap.wl = NULL,
  ordered.export = FALSE,
  fasta.samples = TRUE,
  fasta.samples_raw = FALSE,
  fasta.loci = TRUE,
  vcf = TRUE,
  genepop = FALSE,
  structure = FALSE,
  radpainter = FALSE,
  phase = FALSE,
  fastphase = FALSE,
  plink = FALSE,
  hzar = FALSE,
  phylip = FALSE,
  phylip.var = FALSE,
  phylip.var.all = FALSE,
  treemix = FALSE,
  no.hap.exports = FALSE,
  h = FALSE,
  verbose = FALSE,
  v = FALSE,
  log.fst.comp = FALSE
)

Arguments

`P`	(path, character) Path to the directory containing all the STACKS files. Default: `P = "06_ustacks_2_gstacks"`.
`V`	(character) Path to an input VCF file. When this module is used to filter an existing vcf file. Default: `V = NULL`.
`O`	(character) Path to a directory where to write the output files. With default: `O = "07_populations"`, the function creates a folder inside `07_populations`, with date and time appended to `populations_`.
`M`	path to a population map file. The format is a tab-separated file, with first column containing sample name and second column population id. No heather (column name). e.g. `M = "02_project_info/population.map.catalog.tsv"`
`parallel.core`	(integer) enable parallel execution with num_threads threads. Default: `parallel.core = parallel::detectCores() - 1`
`batch.size`	(integer) The number of loci (de novo mode) or chromosome (reference mode), to process in a batch. Increase to speed analysis, uses more memory, decrease to save memory). Default in de novo mode (loci/batch): `batch.size = 10000`. Default in reference mode (chromosome/batch): `batch.size = 1`.
`p`	(integer) Minimum number of populations a locus must be present in to process a locus. Default: `p = 1`.
`r`	(double) Minimum percentage of individuals in a population required to process a locus for that population. Default: `r = 0.3`.
`R`	(double) Minimum percentage of individuals across populations required to process a locus. Default: `r = 0.3`.
`H`	(logical) Apply the above filters haplotype wise (unshared SNPs will be pruned to reduce haplotype-wise missing data). Default: `H = TRUE`.
`min.maf`	(double) Specify a minimum minor allele frequency required to process a nucleotide site at a locus (0 < min.maf < 0.5). Default: `min.maf = NULL`. Recommendation: use my other R package RADIATOR to filter data based on MAF.
`min.mac`	(integer) Specify a minimum minor allele count required to process a nucleotide site at a locus. Default: `min.mac = NULL`.
`max.obs.het`	(double) Specify a maximum observed heterozygosity required to process a nucleotide site at a locus. Default: `max.obs.het = NULL`. Recommendation: use my other R package RADIATOR to filter data based on heterozygosity.
`write.single.snp`	(logical) Restrict data analysis to only the first SNP per locus (implies –no-haps). Default: `write.single.snp = FALSE`.
`write.random.snp`	(logical) Restrict data analysis to one random SNP per locus (implies –no-haps). Default: `write.random.snp = FALSE`.
`B`	(character) Path to a file containing Blacklisted markers to be excluded from the export. Default: `B = NULL`.
`W`	(character) Path to a file containing Whitelisted markers to include in the export. Default: `W = NULL`.
`e`	(character) Restriction enzyme name. Default: `e = NULL`.
`merge.sites`	(logical) Merge loci that were produced from the same restriction enzyme cutsite (requires reference-aligned data). Default: `merge.sites = FALSE`.
`merge.prune.lim`	(integer) When merging adjacent loci, if at least X Default: `merge.prune.lim = NULL`.
`hwe`	(logical) Calculate divergence from Hardy-Weinberg equilibrium using the exact test at the SNP level and Guo and Thompson MCMC algorithm at the haplotype level. Default: `hwe = FALSE`.
`fstats`	(logical) Enable SNP and haplotype-based F statistics. Default: `fstats = FALSE`.
`fst.correction`	(character) Specify a correction to be applied to Fst values: 'p_value', 'bonferroni_win', or 'bonferroni_gen'. Default: `fst.correction = NULL`.
`p.value.cutoff`	(double) maximum p-value to keep an Fst measurement. Also used as base for Bonferroni correction. Default: `p.value.cutoff = 0.05`.
`k`	(logical) Enable kernel-smoothed Pi, Fis, Fst, Fst', and Phi_st calculations. Default: `k = FALSE`.
`smooth.fstats`	(logical) Enable kernel-smoothed Fst, Fst', and Phi_st calculations. Default: `smooth.fstats = FALSE`.
`smooth.popstats`	(logical) Enable kernel-smoothed Pi and Fis calculations. Default: `smooth.popstats = FALSE`.
`sigma`	(integer) Standard deviation of the kernel smoothing weight distribution. Default: `sigma = 150000` (150kb).
`bootstrap`	(logical) Turn on boostrap resampling for all smoothed statistics. Default: `bootstrap = FALSE`.
`N`	(integer) Number of bootstrap resamplings to calculate. Default: `N = 100`.
`bootstrap.pifis`	(logical) Turn on boostrap resampling for smoothed SNP-based Pi and Fis calculations. Default: `bootstrap.pifis = FALSE`.
`bootstrap.fst`	(logical) Turn on boostrap resampling for smoothed Fst calculations based on pairwise population comparison of SNPs. Default: `bootstrap.fst = FALSE`.
`bootstrap.div`	(logical) Turn on boostrap resampling for smoothed haplotype diveristy and gene diversity calculations based on haplotypes. Default: `bootstrap.div = FALSE`.
`bootstrap.phist`	(logical) Turn on boostrap resampling for smoothed Phi_st calculations based on haplotypes. Default: `bootstrap.phist = FALSE`.
`bootstrap.wl`	(character) Path to a whitelist file. Only use bootstrap loci contained in this whitelist. Default: `bootstrap.wl = NULL`.
`ordered.export`	(logical) If data is reference aligned, exports will be ordered; only a single representative of each overlapping site. Default: `ordered.export = FALSE`.
`fasta.samples`	(logical) Output the sequences of the two haplotypes of each (diploid) sample, for each locus, in FASTA format. stackr default is different than stacks default because I think having this info helps to identify problems. Default: `fasta.samples = TRUE`.
`fasta.samples_raw`	(logical) Output all haplotypes observed in each sample, for each locus, in FASTA format. Default: `fasta.samples_raw = FALSE`.
`fasta.loci`	(logical) Output consensus sequences of all loci, in FASTA format. stackr default is different than stacks default because I think having this info helps to identify problems. Default: `fasta.loci = TRUE`.
`vcf`	(logical) Output SNPs in Variant Call Format (VCF). Default: `vcf = TRUE`.
`genepop`	(logical) Output results in GenePop format. Default: `genepop = FALSE`.
`structure`	(logical) Output results in Structure format. Default: `structure = FALSE`.
`radpainter`	(logical) Output results results in fineRADstructure/RADpainter format. Default: `radpainter = FALSE`.
`phase`	(logical) Output genotypes in PHASE format. Default: `phase = FALSE`.
`fastphase`	(logical) Output genotypes in fastPHASE format. Default: `fastphase = FALSE`.
`plink`	(logical) Output genotypes in PLINK format. Default: `plink = FALSE`.
`hzar`	(logical) Output genotypes in Hybrid Zone Analysis using R (HZAR) format. Default: `hzar = FALSE`.
`phylip`	(logical) Output nucleotides that are fixed-within, and variant among populations in Phylip format for phylogenetic tree construction. Default: `phylip = FALSE`.
`phylip.var`	(logical) Include variable sites in the phylip output encoded using IUPAC notation. Default: `phylip.var = FALSE`.
`phylip.var.all`	(logical) Include all sequence as well as variable sites in the phylip output encoded using IUPAC notation. Default: `phylip.var.all = FALSE`.
`treemix`	(logical) Output SNPs in a format useable for the TreeMix program (Pickrell and Pritchard). Default: `treemix = FALSE`.
`no.hap.exports`	(logical) Omit haplotype outputs. Default: `no.hap.exports = FALSE`.
`h`	Display this help messsage. Default: `h = FALSE`
`verbose`	turn on additional logging. Default: `verbose = FALSE`
`v`	print program version. Default: `v = FALSE`
`log.fst.comp`	(logical) Log components of Fst/Phi_st calculations to a file. Default: `log.fst.comp = FALSE`

Value

populations returns different output depending on arguments selected.

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Guo SW, Thompson EA (1992) Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics, 48, 361-372.

Examples

## Not run: 
pop <- stackr::run_populations(M = "population.map.tsv")

## End(Not run)

thierrygosselin/stackr documentation built on April 13, 2025, 10:28 a.m.

thierrygosselin/stackr index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

thierrygosselin/stackr
Run stacks pipeline for RADseq analysis inside R

run_populations: Run STACKS Version >= 2.0 populations module
In thierrygosselin/stackr: Run stacks pipeline for RADseq analysis inside R

Run STACKS Version >= 2.0 populations module

Description

Usage

Arguments

Value

References

See Also

Examples

Related to run_populations in thierrygosselin/stackr...

R Package Documentation

Browse R Packages

We want your feedback!

thierrygosselin/stackr Run stacks pipeline for RADseq analysis inside R

run_populations: Run STACKS Version >= 2.0 populations module In thierrygosselin/stackr: Run stacks pipeline for RADseq analysis inside R

Run STACKS Version >= 2.0 populations module

Description

Usage

Arguments

Value

References

See Also

Examples

Related to run_populations in thierrygosselin/stackr...

R Package Documentation

Browse R Packages

We want your feedback!

thierrygosselin/stackr
Run stacks pipeline for RADseq analysis inside R

run_populations: Run STACKS Version >= 2.0 populations module
In thierrygosselin/stackr: Run stacks pipeline for RADseq analysis inside R