run_TS: Run Taxon Sampling

View source: R/run_TS.R

Description

Run the TaxonSampling method to return a sample of taxonomic IDs according to the desired balance / diversity.

Usage

run_TS(
  taxlist,
  taxon,
  m,
  seq_file = NULL,
  out_file = NULL,
  method = "diversity",
  randomize = "no",
  replacement = FALSE,
  ignoreIDs = NULL,
  requireIDs = NULL,
  sampling = "agnostic",
  verbose = TRUE
)
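
A call might look like the sketch below. The taxon ID, sample size, and option choices are placeholder values picked for illustration, and `taxlist` is assumed to already exist in the session as the object returned by get_species_count(). The call is skipped when the package or that object is unavailable.

```r
# Placeholder argument values; adjust the taxon ID and sample size to your data.
ts_args <- list(
  taxon       = 2,                    # hypothetical starting taxon ID
  m           = 100,                  # desired sample size
  method      = "balanced",
  randomize   = "after_first_round",
  replacement = FALSE,
  sampling    = "agnostic",
  verbose     = FALSE
)

# `taxlist` is assumed to come from get_species_count(); it is not built here.
if (requireNamespace("TaxonSampling", quietly = TRUE) && exists("taxlist")) {
  result <- do.call(TaxonSampling::run_TS, c(list(taxlist = taxlist), ts_args))
  head(result$outputIDs)              # sampled IDs, as described under Value
}
```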

Arguments

taxlist

list object of class _taxonsampling_, returned by [get_species_count()].

taxon

Taxon ID from which to start sampling children taxa (single character or integer value)

m

desired sample size

seq_file

character string with the path to the multifasta file containing the input sequences.

out_file

character string naming a file to save the output (a multifasta file). Ignored if 'seq_file' is 'NULL'.

method

sampling method to use. Accepts "balanced" (favors balanced taxa representation) or "diversity" (favors maximized taxa representation)

randomize

randomization strategy: should the algorithm choose IDs randomly ("yes"), maintaining a balanced allocation ("no"), or with a balanced allocation at the top taxonomic level and randomized afterwards ("after_first_round")?

replacement

logical flag: should the algorithm allow repeated IDs in the output, if needed to reach m IDs with maximized taxonomic diversity?

ignoreIDs

vector (character or integer) of IDs that must not appear in the output.

requireIDs

vector (character or integer) of IDs that must appear in the output. Notice that 'ignoreIDs' has precedence over 'requireIDs', i.e., IDs that occur in both will be ignored. 'requireIDs' that are children of any 'ignoreIDs' will also be ignored.
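
The precedence rule for IDs that occur in both vectors can be illustrated in base R. This is only a sketch of the documented behavior, not the package's internal code, and the IDs are made-up example values. The rule about children of 'ignoreIDs' is not covered here, since it depends on the taxonomy stored in 'taxlist'.

```r
# Example values only: one ID appears in both vectors.
ignoreIDs  <- c(9606, 10090)
requireIDs <- c("9606", "7227", "10116")

# Compare as character so that integer and character IDs match each other.
effective_required <- setdiff(as.character(requireIDs), as.character(ignoreIDs))

effective_required
# [1] "7227"  "10116"
```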

sampling

sampling mode. Accepts "agnostic" (sample species in a diversity-agnostic manner) or "known_species" (sample based on known species diversity).

verbose

logical flag: should the function print progress messages to the console?

Value

Input object 'taxlist', updated with a vector '$outputIDs' of sampled IDs and, if 'seq_file' is not 'NULL', a list '$outputSeqs' containing information about the sampled sequences.


fcampelo/TaxonSampling documentation built on Jan. 29, 2022, 7:11 a.m.