calculate_g: Calculate the maximum tree length, g, under parsimony

View source: R/calculate_g.r

calculate_g R Documentation

Calculate the maximum tree length, g, under parsimony

Description

Given a costmatrix and a set of tip states, returns the longest possible tree length under maximum parsimony.

Usage

calculate_g(
  costmatrix,
  tip_states,
  polymorphism_behaviour = "polymorphism",
  uncertainty_behaviour = "uncertainty"
)

Arguments

costmatrix

An object of class costMatrix.

tip_states

A character vector of tip states, with polymorphic states separated by &, uncertainties by /, missing values as NA, and inapplicables as empty strings "".

polymorphism_behaviour

One of "missing", "uncertainty", "polymorphism", or "random". See Details.

uncertainty_behaviour

One of "missing", "uncertainty", "polymorphism", or "random". See Details.
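The tip_states encoding described above can be illustrated with a small base R sketch. The helper classify_tip_state is hypothetical (it is not a Claddis function) and only labels each entry according to the separators and special values listed for the tip_states argument.

```r
# Hypothetical helper, not part of Claddis: label a single tip state
# using the conventions described for the tip_states argument.
classify_tip_state <- function(tip_state) {
  if (is.na(tip_state)) return("missing")                           # NA
  if (tip_state == "") return("inapplicable")                       # empty string
  if (grepl("&", tip_state, fixed = TRUE)) return("polymorphism")   # e.g. "0&1"
  if (grepl("/", tip_state, fixed = TRUE)) return("uncertainty")    # e.g. "0/2"
  "single"
}

# One example of each kind of entry
tip_states <- c("0", "1", "0&1", "0/2", NA, "")
tip_kinds <- vapply(tip_states, classify_tip_state, character(1), USE.NAMES = FALSE)
print(tip_kinds)
```

The check for NA comes first because comparing NA with "" would itself return NA rather than TRUE or FALSE.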

Details

The maximum cost a character could have on any tree under maximum parsimony, termed g, depends on both the individual state-to-state transition costs (captured by a costmatrix) and the sampled states (i.e., the tip_states input). In practice this is the maximum parsimony length on the star tree. This length cannot be exceeded by any other tree (Hoyal Cuthill and Lloyd, in review). Note: this is standard practice in phylogenetics software; both PAUP* (Swofford 2003) and TNT (Goloboff et al. 2008; Goloboff and Catalano 2016) calculate maximum cost this way.
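As a rough illustration of the star-tree calculation described above, the following base R sketch computes the cheapest length of a star tree directly from a cost matrix. The helper star_tree_length is hypothetical and is not part of Claddis. It assumes that rows are "from" states and columns are "to" states, that the single internal node may take any state in the matrix, and that every tip is a single state (no polymorphisms, uncertainties, or missing values).

```r
# Hypothetical helper, not part of Claddis: parsimony length of a star tree.
# Assumes rows of `cost` are "from" states and columns are "to" states.
star_tree_length <- function(cost, tip_states) {
  # Total cost of reaching every tip from each candidate state at the
  # single internal node of the star tree
  per_root_cost <- rowSums(cost[, tip_states, drop = FALSE])
  # Parsimony keeps the cheapest choice of internal node state
  min(per_root_cost)
}

# Unordered three-state cost matrix: every change costs 1
unordered <- matrix(1, nrow = 3, ncol = 3, dimnames = list(0:2, 0:2))
diag(unordered) <- 0

# Ordered three-state cost matrix: cost is the difference between states
ordered <- abs(outer(0:2, 0:2, "-"))
dimnames(ordered) <- list(0:2, 0:2)

# Two state 0s, three state 1s and two state 2s
tips <- c("0", "0", "1", "1", "1", "2", "2")

g_unordered <- star_tree_length(unordered, tips)  # 4 (internal node set to state 1)
g_ordered <- star_tree_length(ordered, tips)      # 4 (internal node set to state 1)
```

For asymmetric cost matrices the direction of the row and column indexing matters, so this assumption should be checked against the costmatrix actually in use.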

Special cases

A number of special cases apply to calculating g and are discussed further below.

Polymorphisms

Polymorphisms remain a complex problem in phylogenetics and here multiple options are provided to deal with them. These include: 1. "missing" - where they are simply replaced by a missing value (see below), 2. "uncertainty" - where they are treated as uncertainties instead (see below), 3. "polymorphism" - where they are treated as genuinely polymorphic, and 4. "random" - where one of the tip states is selected at random.

Options 1, 2, and 4 can be seen as undercounting the true amount of evolution that has occurred. However, how to correctly count this amount is unclear. If option 3 is chosen then polymorphic states must be present in costmatrix and users should refer to the add_polymorphisms_to_costmatrix function for details on available options.
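The four options above can be pictured as different ways of recoding a polymorphic tip before any length is counted. The base R sketch below uses a hypothetical helper, recode_polymorphism, that is not part of Claddis and may differ from what the package does internally. It only shows the intent of each option.

```r
# Hypothetical helper, not part of Claddis: recode one tip state according
# to a chosen polymorphism behaviour.
recode_polymorphism <- function(
  tip_state,
  behaviour = c("polymorphism", "missing", "uncertainty", "random")
) {
  behaviour <- match.arg(behaviour)
  # Tips that are missing or not polymorphic are returned unchanged
  if (is.na(tip_state) || !grepl("&", tip_state, fixed = TRUE)) {
    return(tip_state)
  }
  switch(
    behaviour,
    missing = NA_character_,                               # option 1
    uncertainty = gsub("&", "/", tip_state, fixed = TRUE), # option 2
    polymorphism = tip_state,                              # option 3
    random = sample(                                       # option 4
      strsplit(tip_state, "&", fixed = TRUE)[[1]],
      size = 1
    )
  )
}

tips <- c("0", "0&1", "1", NA)
as_uncertainty <- vapply(
  tips, recode_polymorphism, character(1),
  behaviour = "uncertainty", USE.NAMES = FALSE
)
print(as_uncertainty)
```

Because the "random" option draws a state at random, repeated calls can give different results unless a seed is set with set.seed().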

Uncertainties

Uncertainties are much simpler to deal with than polymorphisms, and a means to incorporate them into length counts was laid out in Swofford and Maddison (1992). Indeed, popular software such as PAUP* (Swofford 2003) and TNT (Goloboff et al. 2008; Goloboff and Catalano 2016) simply treat polymorphisms as uncertainties, perhaps because of this. There is still a concern of undercounting evolutionary change for uncertainties in the maximum parsimony context, as the cheapest possible state will in effect be used every time, whereas future study that removes the uncertainty may reveal a higher-cost state to be the true value. As such, the same options are offered for uncertainties as for polymorphisms, including treating them as polymorphisms, although this should probably only be done where they were miscoded in the first place.

Again, if using uncertainties as uncertainties, these must be included in the costmatrix and this can be done by using the add_uncertainties_to_costmatrix function.

Missing values

Missing values (NA) may exist amongst tip_states. These are permitted, and are mathematically equivalent to a statement that a tip could be any state present in the costmatrix (i.e., a special case of an uncertainty where no state can be ruled out). However, when interpreting results it should be borne in mind that g will typically become smaller as the number of missing values increases.

Inapplicable values

Inapplicable values ("") may also exist amongst tip_states. These are conceptually different to missing values as there is no possibility that they can ever be (re)coded. Currently these are treated exactly the same as missing values, so the user should again apply caution in interpreting g: it will be smaller than for otherwise identical characters with fewer inapplicable values.
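To see why uncertain, missing, and inapplicable tips tend to lower g, the base R sketch below extends the star-tree idea so that each such tip takes whichever permitted state is cheapest. The helper star_tree_length_with_gaps is hypothetical and is not part of Claddis. It assumes rows are "from" states and columns are "to" states, and it treats NA and "" as "any state", following the description above.

```r
# Hypothetical helper, not part of Claddis: star-tree length where uncertain,
# missing, and inapplicable tips use their cheapest permitted state.
star_tree_length_with_gaps <- function(cost, tip_states) {
  states <- colnames(cost)
  per_root_cost <- vapply(states, function(root) {
    sum(vapply(tip_states, function(tip) {
      # Missing (NA) and inapplicable ("") tips may be any state;
      # an uncertainty such as "0/1" may be any of its listed states
      allowed <- if (is.na(tip) || tip == "") {
        states
      } else {
        strsplit(tip, "/", fixed = TRUE)[[1]]
      }
      min(cost[root, allowed])
    }, numeric(1)))
  }, numeric(1))
  # Parsimony keeps the cheapest choice of internal node state
  min(per_root_cost)
}

# Binary symmetric cost matrix
binary <- matrix(c(0, 1, 1, 0), nrow = 2, dimnames = list(0:1, 0:1))

g_complete <- star_tree_length_with_gaps(binary, c("0", "0", "1", "1", "1"))     # 2
g_missing <- star_tree_length_with_gaps(binary, c("0", NA, "1", "1", "1"))       # 1
g_inapplicable <- star_tree_length_with_gaps(binary, c("0", "", "1", "1", "1"))  # 1
g_uncertain <- star_tree_length_with_gaps(binary, c("0", "0/1", "1", "1", "1"))  # 1
```

In this sketch, replacing a single state 0 with a missing, inapplicable, or fully uncertain value drops the length from 2 to 1, because that tip can always match the state chosen for the internal node.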

Value

A single value indicating the maximum length, g. Note: this is not modified by costmatrix$weight.

Author(s)

Graeme T. Lloyd graemetlloyd@gmail.com and Jen Hoyal Cuthill j.hoyal-cuthill@essex.ac.uk

References

Goloboff, P. A. and Catalano, S. A., 2016. TNT version 1.5, including a full implementation of phylogenetic morphometrics. Cladistics, 32, 221-238.

Goloboff, P., Farris, J. and Nixon, K., 2008. TNT, a free program for phylogenetic analysis. Cladistics, 24, 774-786.

Hoyal Cuthill, J. F. and Lloyd, G. T., in review.

Swofford, D. L., 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Swofford, D. L. and Maddison, W. P., 1992. Parsimony, character-state reconstructions, and evolutionary inferences. In R. L. Mayden (ed.), Systematics, Historical Ecology, and North American Freshwater Fishes. Stanford University Press, Stanford. pp. 187-223.

See Also

calculate_gmax

Examples


# Create a Type I character costmatrix:
constant_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 0,
  character_type = "unordered"
)

# Calculate g for the case of five state 0s:
calculate_g(
  costmatrix = constant_costmatrix,
  tip_states = c("0", "0", "0", "0", "0")
)

# Create a Type II character costmatrix:
binary_symmetric_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 1,
  character_type = "unordered"
)

# Calculate g for the case of two state 0s and three state 1s:
calculate_g(
  costmatrix = binary_symmetric_costmatrix,
  tip_states = c("0", "0", "1", "1", "1")
)

# Create a Type III character costmatrix:
unordered_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 2,
  character_type = "unordered"
)

# Calculate g for the case of two state 0s, three state 1s and two state 2s:
calculate_g(
  costmatrix = unordered_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2")
)

# Create a Type IV character costmatrix:
linear_ordered_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 2,
  character_type = "ordered"
)

# Calculate g for the case of two state 0s, three state 1s and two state 2s:
calculate_g(
  costmatrix = linear_ordered_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2")
)

# Create a Type V character costmatrix:
nonlinear_ordered_costmatrix <- convert_adjacency_matrix_to_costmatrix(
  adjacency_matrix = matrix(
    data = c(
      0, 1, 0, 0,
      1, 0, 1, 1,
      0, 1, 0, 0,
      0, 1, 0, 0
    ),
    nrow = 4,
    dimnames = list(0:3, 0:3)
  )
)

# Calculate g for the case of two state 0s, three state 1s, two state 2s and one state 3:
calculate_g(
  costmatrix = nonlinear_ordered_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2", "3")
)

# Create a Type VI character costmatrix:
binary_irreversible_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 1,
  character_type = "irreversible"
)

# Calculate g for the case of two state 0s and three state 1s:
calculate_g(
  costmatrix = binary_irreversible_costmatrix,
  tip_states = c("0", "0", "1", "1", "1")
)

# Create a Type VII character costmatrix:
multistate_irreversible_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 2,
  character_type = "irreversible"
)

# Calculate g for the case of two state 0s, three state 1s and two state 2s:
calculate_g(
  costmatrix = multistate_irreversible_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2")
)

# Create a Type VIII character costmatrix:
binary_dollo_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 1,
  character_type = "dollo"
)

# Calculate g for the case of two state 0s and three state 1s:
calculate_g(
  costmatrix = binary_dollo_costmatrix,
  tip_states = c("0", "0", "1", "1", "1")
)

# Create a Type IX character costmatrix:
multistate_dollo_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 2,
  character_type = "dollo"
)

# Calculate g for the case of two state 0s, three state 1s and two state 2s:
calculate_g(
  costmatrix = multistate_dollo_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2")
)

# Create a Type X character costmatrix:
multistate_symmetric_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 5,
  character_type = "ordered"
)
multistate_symmetric_costmatrix$type <- "custom"
multistate_symmetric_costmatrix$costmatrix <- matrix(
  data = c(
    0, 1, 2, 3, 2, 3,
    1, 0, 3, 2, 1, 2,
    2, 3, 0, 3, 2, 1,
    3, 2, 3, 0, 1, 2,
    2, 1, 2, 1, 0, 1,
    3, 2, 1, 2, 1, 0
  ),
  nrow = multistate_symmetric_costmatrix$size,
  ncol = multistate_symmetric_costmatrix$size,
  byrow = TRUE,
  dimnames = list(
    multistate_symmetric_costmatrix$single_states,
    multistate_symmetric_costmatrix$single_states
  )
)

# Calculate g for the case of two state 0s, three state 1s, two state 2s, one state 3, three state 4s and two state 5s:
calculate_g(
  costmatrix = multistate_symmetric_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2", "3", "4", "4", "4", "5", "5")
)

# Create a Type XI character costmatrix:
binary_asymmetric_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 1,
  character_type = "ordered"
)
binary_asymmetric_costmatrix$type <- "custom"
binary_asymmetric_costmatrix$costmatrix <- matrix(
  data = c(
    0, 1,
    10, 0
  ),
  nrow = binary_asymmetric_costmatrix$size,
  ncol = binary_asymmetric_costmatrix$size,
  byrow = TRUE,
  dimnames = list(
    binary_asymmetric_costmatrix$single_states,
    binary_asymmetric_costmatrix$single_states
  )
)
binary_asymmetric_costmatrix$symmetry <- "Asymmetric"

# Calculate g for the case of two state 0s and three state 1s:
calculate_g(
  costmatrix = binary_asymmetric_costmatrix,
  tip_states = c("0", "0", "1", "1", "1")
)

# Create a Type XII character costmatrix:
multistate_asymmetric_costmatrix <- make_costmatrix(
  min_state = 0,
  max_state = 2,
  character_type = "ordered"
)
multistate_asymmetric_costmatrix$type <- "custom"
multistate_asymmetric_costmatrix$costmatrix <- matrix(
  data = c(
    0, 1, 1,
    1, 0, 1,
    10, 10, 0
  ),
  nrow = multistate_asymmetric_costmatrix$size,
  ncol = multistate_asymmetric_costmatrix$size,
  byrow = TRUE,
  dimnames = list(
    multistate_asymmetric_costmatrix$single_states,
    multistate_asymmetric_costmatrix$single_states
  )
)
multistate_asymmetric_costmatrix$symmetry <- "Asymmetric"

# Calculate g for the case of two state 0s, three state 1s and two state 2s:
calculate_g(
  costmatrix = multistate_asymmetric_costmatrix,
  tip_states = c("0", "0", "1", "1", "1", "2", "2")
)


graemetlloyd/Claddis documentation built on May 9, 2024, 8:07 a.m.