run_cstacks: Run STACKS cstacks module


View source: R/run_cstacks.R

Description

Run the STACKS cstacks module inside R! At the end, the function automatically summarizes the log file (with summary_cstacks).

Usage

run_cstacks(
  P = "06_ustacks_2_gstacks",
  o = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.catalog.tsv",
  catalog.path = NULL,
  n = 1,
  parallel.core = parallel::detectCores() - 1,
  max.gaps = 2,
  min.aln.len = 0.8,
  disable.gapped = FALSE,
  k.len = NULL,
  report.mmatches = FALSE,
  split.catalog = 20
)

Arguments

P

Path to the directory containing the STACKS files. Default: P = "06_ustacks_2_gstacks". Inside the folder 06_ustacks_2_gstacks, you should have:

  • 4 files for each sample: the sample name is the prefix of the files ending with .alleles.tsv.gz, .models.tsv.gz, .snps.tsv.gz and .tags.tsv.gz. These files are created by the ustacks module.
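Before running cstacks, it can help to confirm that every sample in P has its four ustacks files. The sketch below uses base R only; check_ustacks_files() is a hypothetical helper written for illustration, not a stackr function, and it assumes the sample name is the file name minus one of the four suffixes listed above.

```r
# Hypothetical helper (not part of stackr): for every sample found in a
# ustacks output folder, report whether all four expected files are present.
check_ustacks_files <- function(path) {
  suffixes <- c(".alleles.tsv.gz", ".models.tsv.gz",
                ".snps.tsv.gz", ".tags.tsv.gz")
  files <- list.files(path)
  # Catalog files use the same suffixes, so leave them out
  files <- files[!startsWith(files, "catalog.")]
  # For each suffix, the sample names (file name minus the suffix)
  per.suffix <- lapply(suffixes, function(s) {
    hits <- files[endsWith(files, s)]
    substr(hits, 1, nchar(hits) - nchar(s))
  })
  samples <- sort(unique(unlist(per.suffix)))
  # TRUE when the sample name appears for all four suffixes
  complete <- vapply(samples, function(x) {
    all(vapply(per.suffix, function(v) x %in% v, logical(1)))
  }, logical(1))
  data.frame(SAMPLE = samples, COMPLETE = unname(complete),
             stringsAsFactors = FALSE)
}

# Usage with dummy (empty) files in a temporary folder:
demo.dir <- tempfile("ustacks_demo")
dir.create(demo.dir)
invisible(file.create(file.path(demo.dir, c(
  paste0("ind1", c(".alleles.tsv.gz", ".models.tsv.gz",
                   ".snps.tsv.gz", ".tags.tsv.gz")),
  "ind2.tags.tsv.gz"
))))
check_ustacks_files(demo.dir)
# ind1 is complete, ind2 only has its .tags.tsv.gz file
```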

o

Output path where the catalog is written. Default: o = "06_ustacks_2_gstacks"

M

Path to a population map file (required when P is used). Default: M = "02_project_info/population.map.catalog.tsv".

catalog.path

This is for the "Catalog editing" part of cstacks, where you can provide the path to an existing catalog; cstacks will then add data to this existing catalog. With the default (catalog.path = NULL) or with a supplied path, the function automatically scans the input folder for the presence of a catalog: catalog files sitting inside the P folder alongside the sample files are detected automatically. If a catalog is detected, the samples in the population map (M) are added to it; if none is found, a new catalog is created. If your catalog is not in the input folder, supply its path here, e.g. catalog.path = "~/catalog_folder". The catalog is made of 3 files: catalog.alleles.tsv.gz, catalog.snps.tsv.gz and catalog.tags.tsv.gz.
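The detection step described for catalog.path amounts to checking whether the three catalog files exist in a folder. The following is a minimal base R sketch of that check, assuming detection is based only on those three file names; has_catalog() is hypothetical and is not how stackr implements it internally.

```r
# Hypothetical helper (not the stackr implementation): TRUE when the three
# catalog files are all present in a folder.
has_catalog <- function(path) {
  catalog.files <- c("catalog.alleles.tsv.gz",
                     "catalog.snps.tsv.gz",
                     "catalog.tags.tsv.gz")
  all(file.exists(file.path(path, catalog.files)))
}

# Usage with a temporary folder and dummy (empty) catalog files:
demo.dir <- tempfile("catalog_demo")
dir.create(demo.dir)
before <- has_catalog(demo.dir)  # FALSE: no catalog yet
invisible(file.create(file.path(demo.dir, c("catalog.alleles.tsv.gz",
                                            "catalog.snps.tsv.gz",
                                            "catalog.tags.tsv.gz"))))
after <- has_catalog(demo.dir)   # TRUE: all three files found
```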

n

Number of mismatches allowed between sample loci when building the catalog. Default: n = 1
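As a conceptual illustration of n, the sketch counts mismatches (the Hamming distance) between two aligned, equal-length sequences and compares the count with n. This is a simplification for intuition only: cstacks's actual matching (k-mers, gapped alignments) is more involved, and count_mismatches() is a hypothetical helper.

```r
# Conceptual illustration only, not the cstacks algorithm: number of
# positions that differ between two equal-length sequences.
count_mismatches <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

n <- 1
locus.1 <- "ACGTACGTAC"
locus.2 <- "ACGTTCGTAC"  # one substitution relative to locus.1
locus.3 <- "ACGTTCGAAC"  # two substitutions relative to locus.1
count_mismatches(locus.1, locus.2) <= n  # within n
count_mismatches(locus.1, locus.3) <= n  # exceeds n
```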

parallel.core

Number of threads used for parallel execution. Default: parallel.core = parallel::detectCores() - 1
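The default leaves one core free. Note that parallel::detectCores() can return NA, and on a single-core machine the default evaluates to 0. A guarded value such as the one below may be safer to pass as parallel.core; this is a suggestion, not something run_cstacks does itself.

```r
# Sketch: a guarded thread count that is always at least 1.
# detectCores() may return NA when the number of cores cannot be determined.
n.cores <- parallel::detectCores()
safe.cores <- if (is.na(n.cores)) 1L else max(1L, n.cores - 1L)
# e.g. run_cstacks(parallel.core = safe.cores)
```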

max.gaps

The number of gaps allowed between stacks before merging. Default: max.gaps = 2

min.aln.len

The minimum length of aligned sequence in a gapped alignment. Default: min.aln.len = 0.8

disable.gapped

Disable gapped alignments between stacks. Default: disable.gapped = FALSE (use gapped alignments).

k.len

Specify the k-mer size for matching between catalog loci (automatically calculated by default). Advice: don't modify. Default: k.len = NULL

report.mmatches

Report query loci that match more than one catalog locus. Advice: don't modify. Default: report.mmatches = FALSE

split.catalog

(integer) Number of samples in each chunk when splitting the catalog population map. This gives a backup catalog every split.catalog samples. There is obviously a trade-off in the value used here: a small value means spending time initializing the existing catalog too often, while a large value means more work to redo if everything crashes. Default: split.catalog = 20.
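The splitting idea behind split.catalog can be sketched as cutting the list of catalog samples into consecutive chunks of split.catalog samples, with a backup catalog after each chunk. The sketch below is an illustration under that assumption, not the stackr code; split_samples() and the sample names are hypothetical.

```r
# Hypothetical helper: consecutive chunks of `split.catalog` samples,
# the last chunk holding the remainder.
split_samples <- function(samples, split.catalog = 20) {
  chunk.id <- ceiling(seq_along(samples) / split.catalog)
  unname(split(samples, chunk.id))
}

samples <- sprintf("ind%02d", 1:45)
chunks <- split_samples(samples, split.catalog = 20)
lengths(chunks)  # 20 20 5
```

With 45 samples and the default of 20, the catalog would be backed up twice before the last, smaller chunk.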

Details

Computer or server problem during cstacks? Look in the log file to see which individuals remain to be included. Create a new list (population map) of the individuals still to include, and use the catalog.path argument to point to the catalog created before the problem.
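A minimal sketch of that recovery step in base R, assuming you have already read the log file and typed in the individuals that made it into the catalog (already.in.catalog is a hypothetical, hand-built vector; the code does not parse the log). The new population map keeps the INDIVIDUALS_REP and POP_ID columns and the tab-separated layout with a header used in the Examples below.

```r
# Full catalog population map (toy data for illustration)
popmap <- data.frame(
  INDIVIDUALS_REP = c("ind1", "ind2", "ind3", "ind4"),
  POP_ID = c("pop1", "pop1", "pop2", "pop2"),
  stringsAsFactors = FALSE
)
# Hypothetical: individuals already in the catalog, read off the log file
already.in.catalog <- c("ind1", "ind2")

# Individuals that still need to be added to the catalog
remaining <- popmap[!popmap$INDIVIDUALS_REP %in% already.in.catalog, ]

# Write the new population map (tab-separated, with a header)
new.map <- file.path(tempdir(), "population.map.catalog.remaining.tsv")
write.table(remaining, new.map, sep = "\t", quote = FALSE, row.names = FALSE)

# Then point run_cstacks to this map and to the previous catalog, e.g.
# run_cstacks(M = new.map, catalog.path = "<folder with the previous catalog>")
```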

Value

cstacks writes the catalog (catalog.alleles.tsv.gz, catalog.snps.tsv.gz and catalog.tags.tsv.gz) to the output folder (o).

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

See Also

sstacks

Examples

## Not run: 
# The simplest form of the function:
run_cstacks()
# That's it! If you have your own workflow folders, etc., see below.

# Next example: say you only want to include 10 individuals per population
# in the catalog, using only samples with more than 2,000,000 reads.
# With the project info file (project.info.file) in the global environment:
library(tidyverse)
individuals.catalog <- project.info.file %>%
  filter(RETAINED > 2000000) %>%
  group_by(POP_ID) %>%
  sample_n(size = 10, replace = FALSE) %>%
  ungroup() %>%
  arrange(desc(RETAINED)) %>%
  distinct(INDIVIDUALS_REP, POP_ID)

# Write the catalog population map to disk
readr::write_tsv(individuals.catalog,
                 "02_project_info/population.map.catalog.tsv")

# The list of individuals to include in the catalog:
individuals.catalog <- individuals.catalog$INDIVIDUALS_REP

# To keep your info file updated with this information:
project.info.file <- project.info.file %>%
  mutate(CATALOG = if_else(INDIVIDUALS_REP %in% individuals.catalog,
                           true = "catalog", false = "not_catalog"))
readr::write_tsv(project.info.file, "project.info.catalog.tsv")

# Then run the command this way:
run_cstacks(
  P = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.catalog.tsv",
  catalog.path = NULL,
  n = 1,
  parallel.core = 32,
  max.gaps = 2, min.aln.len = 0.8,
  k.len = NULL, report.mmatches = FALSE
)

## End(Not run)

thierrygosselin/stackr documentation built on Nov. 11, 2020, 11 a.m.