run_cstacks: Run STACKS cstacks module
In thierrygosselin/stackr: Run stacks pipeline for RADseq analysis inside R

run_cstacks

R Documentation

Run STACKS cstacks module

Description

Run STACKS cstacks module inside R! The function automatically runs a summary of the log file at the end (summary_cstacks). After a power outage or a computer or cluster crash, just re-run the function: it resumes from the last catalog generated.

Usage

run_cstacks(
  P = "06_ustacks_2_gstacks",
  o = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.catalog.tsv",
  catalog.path = NULL,
  n = 1,
  parallel.core = parallel::detectCores() - 1,
  max.gaps = 2,
  min.aln.len = 0.8,
  disable.gapped = FALSE,
  k.len = NULL,
  report.mmatches = FALSE,
  split.catalog = 20
)

Arguments

`P`	path to the directory containing STACKS files. Default: `P = "06_ustacks_2_gstacks"`. Inside the folder `06_ustacks_2_gstacks`, you should have 4 files for each sample. The sample name is the prefix of the files ending with: `.alleles.tsv.gz, .models.tsv.gz, .snps.tsv.gz, .tags.tsv.gz`. Those files are created by the ustacks module.
`o`	Output path to write catalog. Default: `o = "06_ustacks_2_gstacks"`
`M`	path to a population map file (Required when P is used). Default: `M = "02_project_info/population.map.catalog.tsv"`.
`catalog.path`	This is for the "Catalog editing" part in cstacks, where you can provide the path to an existing catalog; cstacks will add data to this existing catalog. With the default `catalog.path = NULL`, or with a supplied path, the function automatically scans for the presence of a catalog inside the input folder. If none is found, a new catalog is created. If your catalog is not in the input folder, supply a path here, e.g. `catalog.path = "~/catalog_folder"`. Otherwise, the catalog files are inside the `P` folder alongside the sample files and are detected automatically. If a catalog is detected in the input folder, the new samples will be added to this catalog. The catalog is made of 3 files: `catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz`
`n`	number of mismatches allowed between sample loci when building the catalog. Default: `n = 1`
`parallel.core`	Number of threads to use for parallel execution. Default: `parallel.core = parallel::detectCores() - 1`
`max.gaps`	The number of gaps allowed between stacks before merging. Default: `max.gaps = 2`
`min.aln.len`	The minimum length of aligned sequence in a gapped alignment. Default: `min.aln.len = 0.8`
`disable.gapped`	Disable gapped alignments between stacks. Default: `disable.gapped = FALSE` (use gapped alignments).
`k.len`	Specify k-mer size for matching between catalog loci (automatically calculated by default). Advice: don't modify. Default: `k.len = NULL`
`report.mmatches`	Report query loci that match more than one catalog locus. Advice: don't modify. Default: `report.mmatches = FALSE`
`split.catalog`	(integer) Number of samples per chunk when splitting the catalog population map. This gives you a backup catalog every `split.catalog` samples. There is a trade-off between the integer used here, the time needed to initialize an existing catalog, and restarting from zero if everything crashes. Very useful on a personal computer or university computer cluster. Default: `split.catalog = 20`.
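
Before starting a long run, it can help to confirm that every sample has the 4 ustacks files described for the `P` argument. The following is a minimal sketch in base R, not part of stackr: the helper name `missing_ustacks_files` and its `samples` argument are hypothetical, and only the file suffixes come from this page.

```r
# Hypothetical helper (not a stackr function): list the ustacks files that
# are missing from the `P` folder for a vector of sample names.
missing_ustacks_files <- function(samples, P = "06_ustacks_2_gstacks") {
  suffixes <- c(".alleles.tsv.gz", ".models.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")
  # One row per sample/suffix combination
  expected <- expand.grid(sample = samples, suffix = suffixes,
                          stringsAsFactors = FALSE)
  expected$path <- file.path(P, paste0(expected$sample, expected$suffix))
  expected$found <- file.exists(expected$path)
  # Keep only the files that are not on disk
  expected[!expected$found, c("sample", "suffix")]
}
```

An empty result (zero rows) would mean that all expected files were found.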

Details

Computer or server problem during the cstacks run? Look in the log file to see which individuals remain to be included. Create a new list of individuals to include and use the `catalog.path` argument to point to the catalog created before the problem.
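
The manual recovery described above could look roughly like this. It is only a sketch under assumptions: the helper `remaining_individuals`, the sample names, and the file name `population.map.remaining.tsv` are hypothetical, the column names follow the Examples section, and the list of individuals already in the catalog has to be read from your own log file.

```r
# Hypothetical helper (not a stackr function): keep only the individuals
# that are not yet included in the catalog.
remaining_individuals <- function(pop.map, done) {
  pop.map[!(pop.map$INDIVIDUALS_REP %in% done), , drop = FALSE]
}

# Illustrative data only
pop.map <- data.frame(
  INDIVIDUALS_REP = c("ind_01", "ind_02", "ind_03", "ind_04"),
  POP_ID = c("pop_A", "pop_A", "pop_B", "pop_B"),
  stringsAsFactors = FALSE
)
done <- c("ind_01", "ind_02")  # individuals the log shows as already included
todo <- remaining_individuals(pop.map, done)

# Not run: write the new population map, then point to the existing catalog
# write.table(todo, "02_project_info/population.map.remaining.tsv",
#             sep = "\t", quote = FALSE, row.names = FALSE)
# run_cstacks(
#   M = "02_project_info/population.map.remaining.tsv",
#   catalog.path = "06_ustacks_2_gstacks"
# )
```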

Value

cstacks writes the catalog to the output folder `o`. The catalog is made of 3 files: `catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz`
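
To confirm that a run left a complete catalog behind, you can check for the 3 catalog files listed under `catalog.path`. This is a minimal sketch: `check_catalog` is a hypothetical helper, not a stackr function, and it only tests that the files exist, not that they are valid.

```r
# Hypothetical helper (not a stackr function): report which of the
# 3 catalog files are present in the output folder.
check_catalog <- function(o = "06_ustacks_2_gstacks") {
  catalog.files <- c("catalog.alleles.tsv.gz", "catalog.snps.tsv.gz",
                     "catalog.tags.tsv.gz")
  found <- file.exists(file.path(o, catalog.files))
  names(found) <- catalog.files
  found
}

# Usage: all(check_catalog("06_ustacks_2_gstacks")) is TRUE only when the
# 3 files are all present.
```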

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Examples

## Not run: 
# The simplest form of the function:
run_cstacks()
# that's it ! Now if you have your own workflow folders, etc. See below.
# Next example: let's say you only want to include 10 individuals/pop and
# include in the catalog samples with more than 2000000 reads. With the project
# info file in the global environment:
library(tidyverse)
individuals.catalog <- project.info.file %>%
  filter(RETAINED > 2000000) %>%
  group_by(POP_ID) %>%
  sample_n(size = 10, replace = FALSE) %>%
  ungroup() %>%
  arrange(desc(RETAINED)) %>%
  distinct(INDIVIDUALS_REP, POP_ID)
# Write file to disk
readr::write_tsv(x = individuals.catalog,
file = "02_project_info/population.map.catalog.tsv")
# The next line will give you the list of individuals to include
individuals.catalog <- individuals.catalog$INDIVIDUALS_REP

# To keep your info file updated with this information:
project.info.file <- project.info.file %>%
mutate(CATALOG = if_else(INDIVIDUALS_REP %in% individuals.catalog,
true = "catalog", false = "not_catalog")
)
write_tsv(project.info.file, "project.info.catalog.tsv")

# Then run the command this way:
run_cstacks(
  P = "06_ustacks_2_gstacks",
  catalog.path = NULL,
  n = 1,
  parallel.core = 32,
  max.gaps = 2, min.aln.len = 0.8,
  k.len = NULL, report.mmatches = FALSE
)

## End(Not run)

thierrygosselin/stackr documentation built on April 13, 2025, 10:28 a.m.