Recursively sample observations with a hierarchical classification

Description

Recursively sample a set of observations with a hierarchical classification. This function takes other functions as arguments and is intended as a building block for more user-friendly functions.

Usage

recursive_sample(root_id, get_obs, get_subtaxa, get_rank = NULL,
  cat_obs = unlist, max_counts = c(), min_counts = c(),
  max_children = c(), min_children = c(), obs_filters = list(),
  subtaxa_filters = list(), stop_conditions = list(), ...)

Arguments

root_id

(character of length 1) The taxon to sample. By default, the root of the taxonomy used.

get_obs

(function(character)) A function that returns the observations assigned to a given taxon. The function's first argument should be the taxon id, and it should return a data structure that can represent multiple observations.

get_subtaxa

(function(character)) A function that returns the subtaxa of a given taxon. The function's first argument should be the taxon id, and it should return a vector of taxon ids.

get_rank

(function(character)) A function that returns the rank of a given taxon id. The function's first argument should be the taxon id and it should return the rank of that taxon.

cat_obs

(function(list)) A function that takes a list of whatever is returned by get_obs and concatenates them into a single data structure of the type returned by get_obs.
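As a minimal sketch of the four function arguments described above, the following defines them over a small in-memory taxonomy. The objects toy_subtaxa, toy_obs, and toy_ranks and all taxon ids are hypothetical and exist only for illustration. Whether get_obs should also include observations of subtaxa is not stated here, so each toy taxon simply lists its own observations. The callbacks accept ... on the assumption that extra arguments may be forwarded to them.

```r
# Hypothetical toy taxonomy: each taxon id maps to its direct subtaxa.
toy_subtaxa <- list(
  root = c("A", "B"),
  A    = c("A1", "A2"),
  B    = character(0),
  A1   = character(0),
  A2   = character(0)
)

# Hypothetical observations stored for each taxon id.
toy_obs <- list(
  root = character(0),
  A    = character(0),
  B    = c("obs1", "obs2", "obs3"),
  A1   = c("obs4", "obs5"),
  A2   = c("obs6")
)

# Hypothetical rank of each taxon id.
toy_ranks <- c(root = "family", A = "genus", B = "genus",
               A1 = "species", A2 = "species")

# Each callback takes the taxon id as its first argument.
get_obs     <- function(id, ...) toy_obs[[id]]
get_subtaxa <- function(id, ...) toy_subtaxa[[id]]
get_rank    <- function(id, ...) unname(toy_ranks[id])

# cat_obs receives a list of get_obs results and merges them into one
# structure of the same type; unlist does this for character vectors.
cat_obs  <- unlist
combined <- cat_obs(lapply(get_subtaxa("A"), get_obs))
print(combined)
```

For observations stored in another structure, such as data frames, cat_obs would need to be a function that merges a list of that structure instead.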

max_counts

(numeric) A named vector that defines the maximum number of observations for each level specified. The names of the vector specify the level each number applies to. If a given taxon has more than the maximum number of observations, they are randomly subsampled to this number.

min_counts

(numeric) A named vector that defines the minimum number of observations for each level specified. The names of the vector specify the level each number applies to.

max_children

(numeric) A named vector that defines the maximum number of subtaxa per taxon for each level specified. The names of the vector specify the level each number applies to. If a given taxon has more than the maximum number of subtaxa, they are randomly subsampled to this number.

min_children

(numeric) A named vector that defines the minimum number of subtaxa per taxon for each level specified. The names of the vector specify the level each number applies to.
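The four limit arguments above are all named numeric vectors. The sketch below shows what such vectors might look like, assuming that levels are identified by rank names such as those returned by get_rank; the rank names and numbers are hypothetical. It also illustrates random subsampling down to a maximum, which is only an approximation of the behavior described, not the function's actual code.

```r
# Hypothetical limits, keyed by rank name (an assumption about level names).
max_counts   <- c(species = 2, genus = 5)
min_counts   <- c(species = 1)
max_children <- c(genus = 3)
min_children <- c(genus = 1)

# A level without an entry has no limit: name lookup yields NA.
no_limit <- is.na(max_counts["family"])

# Illustration of subsampling observations down to the maximum.
obs   <- c("obs1", "obs2", "obs3")
level <- "species"
limit <- max_counts[level]

set.seed(1)  # only to make the random subsample repeatable
if (!is.na(limit) && length(obs) > limit) {
  obs <- sample(obs, limit)
}
print(length(obs))
```

Because subsampling is random, calling set.seed beforehand is one way to make a sampling run repeatable.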

obs_filters

(list of function(observations, id)) A list of functions that take a data structure containing the information of multiple observations and a taxon id. Each function returns an object of the same type with some of the observations potentially removed.

subtaxa_filters

(list of function(observations, id)) A list of functions that take a data structure containing multiple subtaxa ids and the current taxon id. Each function returns an object of the same type with some of the subtaxa potentially removed. If a function returns NULL, then no observations for the current taxon are returned.

stop_conditions

(list of function(id)) A list of functions that take the current taxon id. If any of the functions returns TRUE, the recursion stops: the observations for the current taxon are returned rather than looking for observations of its subtaxa.
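The filter and stop-condition lists described above might be built as in the following sketch. The functions keep_even, drop_unclassified, stop_at_species, and apply_filters, and the "sp_" id prefix, are all hypothetical. Applying the filters one after another in list order is an assumption made for illustration, not a documented guarantee.

```r
# Hypothetical observation filter: keep observations ending in an even digit.
keep_even <- function(observations, id, ...) {
  observations[grepl("[02468]$", observations)]
}

# Hypothetical subtaxa filter: drop subtaxa whose id contains "unclassified".
drop_unclassified <- function(subtaxa, id, ...) {
  subtaxa[!grepl("unclassified", subtaxa, fixed = TRUE)]
}

# Hypothetical stop condition: stop at taxon ids that start with "sp_".
stop_at_species <- function(id, ...) grepl("^sp_", id)

obs_filters     <- list(keep_even)
subtaxa_filters <- list(drop_unclassified)
stop_conditions <- list(stop_at_species)

# Assumed behavior: apply each filter in turn to the result of the last.
apply_filters <- function(x, id, filters) {
  for (f in filters) x <- f(x, id)
  x
}

filtered_obs <- apply_filters(c("obs1", "obs2", "obs3", "obs4"), "A",
                              obs_filters)
filtered_sub <- apply_filters(c("A1", "unclassified_A"), "A",
                              subtaxa_filters)

# Recursion would stop if any stop condition returns TRUE for the taxon.
should_stop <- any(vapply(stop_conditions,
                          function(f) isTRUE(f("sp_1")), logical(1)))
print(filtered_obs)
```

Keeping each filter small and single-purpose makes it easy to combine several of them in one list.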

...

Additional parameters are passed on to all of the functions supplied as arguments.

See Also

taxonomic_sample
