run_cstacks | R Documentation |
Run the STACKS cstacks module inside R! The function automatically runs a summary of the log file at the end (summary_cstacks). In the event of a power outage or a computer or cluster crash, just re-run the function. The function will resume from the last catalog generated.
run_cstacks(
P = "06_ustacks_2_gstacks",
o = "06_ustacks_2_gstacks",
M = "02_project_info/population.map.catalog.tsv",
catalog.path = NULL,
n = 1,
parallel.core = parallel::detectCores() - 1,
max.gaps = 2,
min.aln.len = 0.8,
disable.gapped = FALSE,
k.len = NULL,
report.mmatches = FALSE,
split.catalog = 20
)
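The `parallel.core` default in the usage above leaves one core free. On a single-core machine, or where `parallel::detectCores()` returns `NA`, that expression gives 0 or `NA`. The following is a minimal sketch of a guarded value; the helper name `safe_cores` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): pick a usable thread count.
safe_cores <- function(reserve = 1L) {
  detected <- parallel::detectCores()  # may return NA on some platforms
  if (is.na(detected)) {
    return(1L)
  }
  # Keep `reserve` cores free, but never go below one thread.
  max(1L, as.integer(detected) - as.integer(reserve))
}

n.cores <- safe_cores()
```

The result can then be passed as `parallel.core = n.cores`.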
P | Path to the directory containing STACKS files. Default: P = "06_ustacks_2_gstacks". |
o | Output path to write the catalog. Default: o = "06_ustacks_2_gstacks". |
M | Path to a population map file (required when P is used). Default: M = "02_project_info/population.map.catalog.tsv". |
catalog.path | This is for the "Catalog editing" part in cstacks, where you can provide the path to an existing catalog. cstacks will add data to this existing catalog. Default: catalog.path = NULL. |
n | Number of mismatches allowed between sample loci when building the catalog. Default: n = 1. |
parallel.core | Enable parallel execution with num_threads threads. Default: parallel.core = parallel::detectCores() - 1. |
max.gaps | The number of gaps allowed between stacks before merging. Default: max.gaps = 2. |
min.aln.len | The minimum length of aligned sequence in a gapped alignment. Default: min.aln.len = 0.8. |
disable.gapped | Disable gapped alignments between stacks. Default: disable.gapped = FALSE. |
k.len | Specify the k-mer size for matching between catalog loci (automatically calculated by default). Advice: don't modify. Default: k.len = NULL. |
report.mmatches | Report query loci that match more than one catalog locus. Advice: don't modify. Default: report.mmatches = FALSE. |
split.catalog | (integer) The number of samples into which the catalog population map is split. This provides a backup catalog at regular intervals. Default: split.catalog = 20. |
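To illustrate the idea behind `split.catalog`, the sketch below splits a vector of sample names into consecutive chunks, so that a catalog could be saved after each chunk. It assumes that `split.catalog` acts as a chunk size. It is not the package's internal code, and the helper `chunk_samples` and the sample names are made up for the example.

```r
# Hypothetical helper: split sample names into consecutive chunks.
chunk_samples <- function(samples, chunk.size = 20L) {
  # Assign each sample a chunk index: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(samples) / chunk.size)
  unname(split(samples, groups))
}

# Made-up sample names for illustration only.
samples <- sprintf("IND_%03d", 1:45)
chunks <- chunk_samples(samples, chunk.size = 20L)

# Number of samples in each chunk.
chunk.sizes <- lengths(chunks)
```

With 45 samples and a chunk size of 20, this gives three chunks of 20, 20 and 5 samples.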
Did a computer or server problem occur during the cstacks run? Look in the log file to see which individuals remain to be included. Create a new list of individuals to include and use the catalog.path argument to point to the catalog created before the problem.
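The recovery steps above can be sketched as follows. The sample names, the `done` vector (individuals the log file reports as already included) and the output file name are hypothetical, and the value given to `catalog.path` is an assumed location; use the path of the catalog written before the interruption.

```r
# Full catalog population map (made-up data for illustration).
pop.map <- data.frame(
  INDIVIDUALS = c("IND_001", "IND_002", "IND_003", "IND_004"),
  POP_ID = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Individuals already included according to the log file (assumed).
done <- c("IND_001", "IND_002")

# Keep only the individuals that still need to be added.
remaining <- pop.map[!pop.map$INDIVIDUALS %in% done, , drop = FALSE]

# Write the new population map to disk.
new.map <- file.path(tempdir(), "population.map.catalog.remaining.tsv")
write.table(remaining, new.map, sep = "\t", quote = FALSE, row.names = FALSE)

# Not run: requires the package and a STACKS installation.
# run_cstacks(
#   P = "06_ustacks_2_gstacks",
#   M = new.map,
#   catalog.path = "06_ustacks_2_gstacks"  # assumed location of the existing catalog
# )
```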
sstacks returns a .matches.tsv.gz file for each sample.
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
## Not run:
# The simplest form of the function:
run_cstacks()
# That's it! If you have your own workflow folders, etc., see below.
# Next example: say you only want to include 10 individuals per population
# and only samples with more than 2000000 reads. With the project
# info file in the global environment:
library(tidyverse)
individuals.catalog <- project.info.file %>%
  filter(RETAINED > 2000000) %>%
  group_by(POP_ID) %>%
  sample_n(size = 10, replace = FALSE) %>%
  ungroup() %>%
  arrange(desc(RETAINED)) %>%
  distinct(INDIVIDUALS_REP, POP_ID)
# Write file to disk
readr::write_tsv(x = individuals.catalog,
file = "02_project_info/population.map.catalog.tsv")
# The next line will give you the list of individuals to include
individuals.catalog <- individuals.catalog$INDIVIDUALS_REP
# To keep your info file updated with this information:
project.info.file <- project.info.file %>%
mutate(CATALOG = if_else(INDIVIDUALS_REP %in% individuals.catalog,
true = "catalog", false = "not_catalog")
)
write_tsv(project.info.file, "project.info.catalog.tsv")
# Then run the command this way:
run_cstacks(
  P = "06_ustacks_2_gstacks",
  catalog.path = NULL,
  n = 1,
  parallel.core = 32,
  max.gaps = 2, min.aln.len = 0.8,
  k.len = NULL, report.mmatches = FALSE
)
## End(Not run)