Estimating the Minimum Number of Character Transitions Using Maximum Parsimony

Description

minCharChange is a function which takes a cladogram and a discrete trait and finds the solutions of inferred character states for ancestral nodes that minimizes the number of character state transitions (either gains or losses/reversals) for a given topology and a set of discrete character data. minCharChange relies on ancPropStateMat, which is a wrapper for phangorn's function ancestral.pars.

Usage

1
2
3
4
5
6
7
8
minCharChange(trait, tree, randomMax = 10000, maxParsimony = TRUE,
  orderedChar = FALSE, type = "MPR", cost = NULL, printMinResult = TRUE,
  ambiguity = c(NA, "?"), dropAmbiguity = FALSE, polySymbol = "&",
  contrast = NULL)

ancPropStateMat(trait, tree, orderedChar = FALSE, type = "MPR",
  cost = NULL, ambiguity = c(NA, "?"), dropAmbiguity = FALSE,
  polySymbol = "&", contrast = NULL, returnContrast = FALSE)

Arguments

trait

A vector of trait values for a discrete character, preferably named with taxon names identical to the tip labels on the input tree.

tree

A cladogram of type 'phylo'. Any branch lengths are ignored.

randomMax

The maximum number of cladograms examined when searching a large number of solutions consistent with the reconstructed ancestral states from ancestral.pars with the minimum number of character state transitions. If the number of potential solutions is less than randomMax, then solutions are exhaustively searched.

maxParsimony

If maxParsimony is TRUE (the default) then only solutions with the smallest number of total transitions examined will be returned. Note that since solutions are stochastically 'guessed' at, and the number of possible solutions may not be exhaustively searched, there may have been solutions not examined with a lower number of transitions even if maxParsimony = TRUE. Regardless, one may want to do maxParsimony = FALSE if one is interested in whether there are solutions with a smaller number of gains or losses and thus wants to return all solutions.

orderedChar

If TRUE (not the default), then the character will be reconstructed with a cost (step) matrix of a linear, ordered character. This is not applicable if type = "ACCTRAN", as cost matrices cannot be used with ACCTRAN in ancestral.pars, and an error will be returned if orderedChar = TRUE but a cost matrix is given, as the only reason to use orderedChar is to produce a cost matrix automatically.

type

The parsimony algorithm applied by ancestral.pars, which can apply one of two: "MPR" (the default) is a relatively fast algorithm developed by Hamazawa et al. (1995) and Narushima and Hanazawa (1997), which relies on reconstructing the states at each internal node by re-rooting at that node. "ACCTRAN", the 'accelerated transitions' algorithm (Swofford and Maddison, 1987), favors character reversal over independent gains when there is ambiguity. The "ACCTRAN" option in ancestral.pars avoids repeated rerooting of the tree to search for a smaller set of maximum-parsimony solutions that satisfy the ACCTRAN algorithm, but does so by assigning edge weights. As of phangorn v1.99-12, both of these algorithms apply the Sankoff parsimony algorithm, which allows multifurcations (polytomies).

cost

A matrix of the cost (i.e. number of steps) necessary to change between states of the input character trait. If NULL (the default), the character is assumed to be unordered with equal cost to change from any state to another. Cost matrices only impact the "MPR" algorithm; if a cost matrix is given but type = "ACCTRAN", an error is issued.

printMinResult

If TRUE (the default), a summary of the results is printed to the terminal. The information in this summary may be more detailed if the results of the analysis are simpler (i.e. fewer unique solutions).

ambiguity

A vector of values which indicate ambiguous (i.e. missing or unknown) character state codings in supplied trait data. Taxa coded ambiguously as treated as being equally likely to be any state coding. By default, NA values and "?" symbols are treated as ambiguous character codings, in agreement with behavior of functions in packages phangorn and Claddis. This argument is designed to mirror an hidden argument with an identical name in function phyDat in package phangorn.

dropAmbiguity

A logical. If TRUE (which is not the default), all taxa with ambiguous codings as defined by argument ambiguity will be dropped prior to ancestral nodes being inferred. This may result in too few taxa.

polySymbol

A single symbol which separates alternative states for polymorphic codings; the default symbol is "&", following the output by Claddis's ReadMorphNexus function, where polymorphic taxa are indicated by default with a string with state labels separated by an "&" symbol. For example, a taxon coded as polymorphic for states 1 or 2, would be indicated by the string "1&2". polySymbol is used to break up these strings and automatically construct a fitting contrast table for use with this data, including for ambiguous character state codings.

contrast

A matrix of type integer with cells of 0 and 1, where each row is labelled with a string value used for indicating character states in trait, and each column is labelled with the formal state label to be used for assign taxa to particular character states. A value of 1 indicates that the respective coding string for that row should be interpreted as reflecting the character state listed for that column. A coding could reflect multiple states (such as might occur when taxa are polymorphic for some morphological character), so the sums of rows and columns can sum to more than 1. If contrast is not NULL (the default), the arguments will nullify This argument is designed to mirror an hidden argument with an identical name in function phyDat in package phangorn. This structure is based on the contrasts tables used for statistical evaluation of factors. See the phangorn vignette "Special features of phangorn" for more details on its implementation within phangorn including an example. See examples below for the construction of an example contrast matrix for character data with polymorphisms, coded as character data output by Claddis's ReadMorphNexus function, where polymorphic taxa are indicated with a string with state labels separated by an "&" symbol.

returnContrast

If TRUE, the contrast table used by ancestral.pars will be output instead for user evaluation that polymorphic symbols and ambiguous states are being parsed correctly.

Details

The wrapper function ancPropStateMat simply automates the application of functions ancestral.pars and phyDat from phangorn, along with several additional checks and code to present the result as a matrix, rather than a specialized list.

Note that although the default orderedChar argument assumes that multistate characters are unordered, the results of character change will always be reported as gains and losses relative to the numbering of the states in the output transitionSumChanges, exactly as if they had been ordered. In the case where the character is actually ordered, this may be considered a conservative approach, as using a parsimony algorithm for unordered character states allows fewer gains or losses to be counted on branches where multiple gains and losses are reported. If the character is presumably unordered and multistate, however, then the gains and losses division is arbitrary nonsense and should be combined to to obtain the total number of character changes.

Value

By default, ancPropStateMat returns a matrix, with rows corresponding to the ID numbers of tips and nodes in $edge, and columns corresponding to character states, with the value representing the proportional weight of that node being that state under the algorithm used (known tip values are always 1). If argument returnContrast is TRUE then ancPropStateMat will instead return the final contrast table used by phyDat for interpreting character state strings.

minCharChange invisibly returns a list containing the following elements, several of which are printed by default to the console, as controlled by argument printMinResult:

message

Describes the performance of minCharChange at searching for a minimum solution.

sumTransitions

A vector recording the total number of necessary transitions (sum total of gains and losses/reversal) for each solution; effectively the parsimony cost of each solution.

minTransitions

A symmetrical matrix with number of rows and columns equal to the number of character states, with values in each cell indicating the minimum number of transitions from one ancestral state (i.e. the rows) to a descendant state (i.e. the columns), taken across the set of kept solutions (dependent on which are kept as decided by argument maxParsimony). Generally guaranteed not to add up to the number of edges contained within the input tree, and thus may not represent any realistic evolutionary scenario but does represent a conservative approach for asking 'what is the smallest possible number of transitions from 0 to 1' or 'smallest possible number of transitions from 1 to 0', independently of each other.

solutionArray

A three-dimensional array, where for each solution, we have a matrix with edges for rows and two columns indicating the ancestral and child nodes of that edge, with values indicating the states inferred for those nodes in a particular solution.

transitionArray

A labelled three-dimensional array where for each solution we have a symmetrical matrix with number of rows and columns equal to the number of character states, with values in each cell indicating the total number of transitions from one ancestral state (i.e. the rows) to a descendant state (i.e. the columns).

transitionSumChanges

Which is a three column matrix with a row for every solution, with the values in the three columns measuring the number of edges (branches) inferred to respectively have gains, no change or losses (i.e. reversals), as calculated relative to the order of character states.

Author(s)

David W. Bapst

References

Hanazawa, M., H. Narushima, and N. Minaka. 1995. Generating most parsimonious reconstructions on a tree: A generalization of the Farris-Swofford-Maddison method. Discrete Applied Mathematics 56(2-3):245-265.

Narushima, H., and M. Hanazawa. 1997. A more efficient algorithm for MPR problems in phylogeny. Discrete Applied Mathematics 80(2-3):231-238.

Schliep, K. P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics 27(4):592-593.

Swofford, D. L., and W. P. Maddison. 1987. Reconstructing ancestral character states under Wagner parsimony. Mathematical Biosciences 87(2):199-229.

See Also

The functions described here are effectively wrapers of phangorn's function ancestral.pars.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
# let's write a quick & dirty ancestral trait plotting function

quickAncPlotter<-function(tree,ancData,cex){
ancCol<-(1:ncol(ancData))+1
	plot(tree,show.tip.label=FALSE,no.margin=TRUE,direction="upwards")
	tiplabels(pch=16,pie=ancData[(1:Ntip(tree)),],cex=cex,piecol=ancCol,
	col=0)
	nodelabels(pie=ancData[-(1:Ntip(tree)),],cex=cex,piecol=ancCol)	
	}

# example with retiolitid graptolite data

data(retiolitinae)

#unordered, MPR
ancMPR<-ancPropStateMat(retioTree, trait=retioChar[,2], type="MPR")
#unordered, ACCTRAN
ancACCTRAN<-ancPropStateMat(retioTree, trait=retioChar[,2], type="ACCTRAN")

#let's compare MPR versus ACCTRAN results
layout(1:2)
quickAncPlotter(retioTree,ancMPR,cex=0.5)
text(x=4,y=5,"type='MPR'",cex=1.5)
quickAncPlotter(retioTree,ancACCTRAN,cex=0.5)
text(x=5,y=5,"type='ACCTRAN'",cex=1.5)

minCharChange(retioTree,trait=retioChar[,2],type="MPR")
minCharChange(retioTree,trait=retioChar[,2],type="ACCTRAN")

# with simulated data

set.seed(444)
tree<-rtree(50)
#simulate under a likelihood model
char<-rTraitDisc(tree,k=3,rate=0.7)
tree$edge.length<-NULL
tree<-ladderize(tree)

#unordered, MPR
ancMPR<-ancPropStateMat(tree, trait=char, type="MPR")
#unordered, ACCTRAN
ancACCTRAN<-ancPropStateMat(tree, trait=char, type="ACCTRAN")
#ordered, MPR
ancMPRord<-ancPropStateMat(tree, trait=char, orderedChar=TRUE, type="MPR")

#let's compare MPR versus ACCTRAN results
layout(1:2)
quickAncPlotter(tree,ancMPR,cex=0.3)
text(x=8,y=15,"type='MPR'",cex=1.5)
quickAncPlotter(tree,ancACCTRAN,cex=0.3)
text(x=9,y=15,"type='ACCTRAN'",cex=1.5)
#MPR has much more uncertainty in node estimates
	#but that doesn't mean ACCTRAN is preferable

#let's compare unordered versus ordered under MPR
layout(1:2)
quickAncPlotter(tree,ancMPR,cex=0.3)
text(x=8,y=15,"unordered char\nMPR",cex=1.5)
quickAncPlotter(tree,ancMPRord,cex=0.3)
text(x=9,y=15,"ordered char\nMPR",cex=1.5)
layout(1)

## Not run: 
# what ancPropStateMat automates (with lots of checks):

require(phangorn)
char1<-matrix(char,,1)
rownames(char1)<-names(char)
#translate into something for phangorn to read
char1<-phyDat(char1,type="USER",levels=sort(unique(char1)))
x<-ancestral.pars(tree,char1,type="MPR")
y<-ancestral.pars(tree,char1,type="ACCTRAN")

## End(Not run)

#estimating minimum number of transitions with MPR 
minCharChange(tree,trait=char,type="MPR")

#and now with ACCTRAN
minCharChange(tree,trait=char,type="ACCTRAN")

#POLYMORPHISM IN CHARACTER DATA


# example trait data with a polymorphic taxon
     # separated with '&' symbol
# similar to polymorphic data output by ReadMorphNexus from package Claddis
charPoly<-as.character(c(1,2,NA,0,0,1,"1&2",2,0,NA,0,2,1,1,"1&2"))
#simulate a tree with 16 taxa
set.seed(444)
tree<-rtree(15)
tree$edge.length<-NULL
tree<-ladderize(tree)
names(charPoly)<-tree$tip.label
charPoly

# need a contrast matrix that takes this into account
    #can build row by row, by hand

#first, build contrast matrix for basic states
contrast012<-rbind(c(1,0,0),c(0,1,0),c(0,0,1))
colnames(contrast012)<-rownames(contrast012)<-0:2
contrast012

#add polymorphic state and NA ambiguity as new rows
contrastPoly<-c(0,1,1)
contrastNA<-c(1,1,1)
contrastNew<-rbind(contrast012,'1&2'=contrastPoly,contrastNA)
rownames(contrastNew)[5]<-NA

#let's look at contrast
contrastNew

# now try this contrast table we've assembled
    # default: unordered, MPR
ancPoly<-ancPropStateMat(tree, trait=charPoly, contrast=contrastNew)

# but...!
# we can also do it automatically, 
    # by default, states with '&' are automatically treated
    # as polymorphic character codings by ancPropStateMat
ancPolyAuto<-ancPropStateMat(tree, trait=charPoly, polySymbol="&")

# but does this match what the table we constructed?
ancPropStateMat(tree, trait=charPoly,
		polySymbol="&", returnContrast=TRUE)

# compare to contrastNew above!
# only difference should be the default ambiguous
	# character '?' is added to the table

#compare reconstructions
layout(1:2)
quickAncPlotter(tree,ancPoly,cex=0.5)
text(x=3.5,y=1.2,"manually-constructed\ncontrast",cex=1.3)
quickAncPlotter(tree,ancPolyAuto,cex=0.5)
text(x=3.5,y=1.2,"auto-constructed\ncontrast",cex=1.3)
layout(1)

#look pretty similar!

#i.e. the default polySymbol="&", but could be a different symbol
     #such as "," or "\"... it can only be *one* symbol, though

# all of this machinery should function just fine in minCharChange
	# again, by default polySymbol="&" (included anyway here for kicks)
minCharChange(tree, trait=charPoly, polySymbol="&")

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.