map_to_state_space: Map states of a discrete trait to integers.

View source: R/map_to_state_space.R

map_to_state_spaceR Documentation

Map states of a discrete trait to integers.

Description

Given a list of states (e.g., for each tip in a tree), map the unique states to integers 1,..,Nstates, where Nstates is the number of possible states. This function can be used to translate states that are originally represented by characters or factors, into integer states as required by ancestral state reconstruction and hidden state prediction functions in this package.

Usage

map_to_state_space(raw_states, fill_gaps=FALSE, 
                   sort_order="natural")

Arguments

raw_states

A vector of values (states), each of which can be converted to a character. This vector can include the same value multiple times, for example if values represent the trait's states for tips in a tree. The vector may also include NA, for example if they represent unknown states for some tree tips. NAs are omitted from the state space.

fill_gaps

Logical. If TRUE, then states are converted to integers using as.integer(as.character()), and then all missing intermediate integer values are included as additional possible states. For example, if raw_states contained the values 2,4,6, then 3 and 5 are assumed to also be possible states.

sort_order

Character, specifying the order in which raw_states should be mapped to ascending integers. Either "natural" or "alphabetical". If "natural", numerical parts of characters are sorted numerically, e.g. as in "3"<"a2"<"a12"<"b1".

Details

Several ancestral state reconstruction and hidden state prediction algorithms in the castor package (e.g., asr_max_parsimony) require that the focal trait's states are represented by integer indices within 1,..,Nstates. These indices are then associated, afor example, with column and row indices in the transition cost matrix (in the case of maximum parsimony reconstruction) or with column indices in the returned matrix containing marginal ancestral state probabilities (e.g., in asr_mk_model). The function map_to_state_space can be used to conveniently convert a set of discrete states into integers, for use with the aforementioned algorithms.

Value

A list with the following elements:

Nstates

Integer. Number of possible states for the trait, based on the unique values encountered in raw_states (after conversion to characters). This may be larger than the number of unique values in raw_states, if fill_gaps was set to TRUE.

state_names

Character vector of size Nstates, storing the original name (character version) of each unique state. For example, if raw_states was c("b1","3","a12","a2","b1","a2", NA) and sort_order=="natural", then Nstates will be 4 and state_names will be c("3","a2","a12","b1").

state_values

A numeric vector of size Nstates, providing the numerical value for each unique state. For example, the states "3","a2","4.5" will be mapped to the numeric values 3, NA, 4.5. Note that this may not always be meaningful, depending on the biological interpretation of the states.

mapped_states

Integer vector of size equal to length(raw_states), listing the integer representation of each value in raw_states. May also include NA, at those locations where raw_states was NA.

name2index

An integer vector of size Nstates, with names(name2index) set to state_names. This vector can be used to map any new list of states (in character format) to their integer representation. In particular, name2index[as.character(raw_states)] is equal to mapped_states.

Author(s)

Stilianos Louca

Examples

# generate a sequence of random states
unique_states = c("b","c","a")
raw_states = unique_states[sample.int(3,size=10,replace=TRUE)]

# map to integer state space
mapping = map_to_state_space(raw_states)

cat(sprintf("Checking that original unique states is the same as the one inferred:\n"))
print(unique_states)
print(mapping$state_names)

cat(sprintf("Checking reversibility of mapping:\n"))
print(raw_states)
print(mapping$state_names[mapping$mapped_states])

castor documentation built on June 29, 2024, 9:08 a.m.