aggr.trait.map: Plot the mapping of a reconstructed trait averaging several...

View source: R/aggr_trait_map.R

aggr.trait.map    R Documentation

Plot the mapping of a reconstructed trait averaging several methods and add "thermometers" of methods variation

Description

This function is a wrapper around the plot.mapping and thermo.var functions. It takes the arithmetic mean of the values of a trait reconstructed for taxa with different methods and/or method parameters. This arithmetic mean is treated as the "aggregated" method and is displayed by coloring the tree branches. Additional information about the variation between methods can be displayed with "thermometers" (see also tiplabels and nodelabels) for each taxon, with a color gradient for the values reconstructed by the array of methods.
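As a minimal base-R sketch of the aggregation described above (not the package's internal code), the "aggregated" value of each taxon is the row mean across methods. The matrix vals and its row and column names are made up for illustration.

```r
# Hypothetical values: 3 taxa (rows) reconstructed by 3 methods (columns)
vals <- matrix(c( 1.0, 1.2, 0.8,
                  2.0, 2.5, 1.5,
                 -0.5, 0.0, 0.5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("t1", "t2", "t3"), c("m1", "m2", "m3")))

# Arithmetic mean across methods for each taxon: the "aggregated" value
aggr_vals <- rowMeans(vals)

# Minimum and maximum across methods for each taxon: the between-method
# variation that the "thermometers" are meant to convey
method_range <- t(apply(vals, 1, range))
```

The aggregated values are what the branch colors would follow, while the per-taxon spread is the kind of information the "thermometers" add.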

Usage

aggr.trait.map(tree,values,type=c("taxa","branch"),plot=c("methods","groups","aggr"),aggr=TRUE,groups=NULL,
               order=c("phylo","names","edge"),cols.args,lims=c("local","groups","global","asym","sym0","symx"),
               disag=FALSE,disag.type=c("sign","steps"),return.disag=FALSE,thermo=TRUE,
               plot.mapping.args=NULL,thermo.var.args=NULL)

Arguments

tree

The phylogenetic tree to put "thermometers" on.

values

The data to "map" onto the phylogenetic tree and that the colors have to follow; can be a data frame or a matrix with taxa as rows and methods as columns.

type

Optional character. Whether the values represent values for branches (hence coloring each edge with a single color; type="branch") or for taxa (hence coloring the edges with a gradient from one taxon to another; type="taxa").

plot

Optional character. Whether to plot all methods (plot="methods", in which case "thermometers" are automatically not plotted for these plots), groups of methods (plot="groups"), "aggregated" values (plot="aggr", the default), or any combination of them given as a character vector.

aggr

Optional. The reference values column for the "aggregated" variable (i.e., the variable that takes all methods into account as the arithmetic mean of all values for each taxon), which represents the center of the color palette (but not necessarily of the "thermometers"!). The values data can already contain it as the last column (aggr=TRUE, the default), lack it (aggr=FALSE, in which case it is computed by the function), or contain it as one of the columns but not the last one (aggr being the column name or number referring to it in values).
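The three layouts of values that aggr distinguishes can be sketched in base R as follows. This only prepares the data; the matrix vals and the column name "aggr" are illustrative choices, and the comments indicate which setting of the aggr argument each layout would correspond to.

```r
# Hypothetical values matrix: 4 taxa (rows), 3 methods (columns)
set.seed(1)
vals <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("t", 1:4), paste0("m", 1:3)))

# Layout for aggr=FALSE: no aggregated column in 'vals'.
# The equivalent computation done by hand is the row mean:
aggr_col <- rowMeans(vals)

# Layout for aggr=TRUE: the aggregated column is the last one
vals_last <- cbind(vals, aggr = aggr_col)

# Layout for aggr given as a name or a number: the aggregated column
# sits somewhere else among the columns
vals_mixed <- vals_last[, c("m1", "aggr", "m2", "m3")]
aggr_pos <- which(colnames(vals_mixed) == "aggr")  # position to pass as 'aggr'
```

With vals_mixed, either aggr="aggr" or aggr=aggr_pos would designate the same column.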

groups

Optional. If plot="groups", a list or a vector must specify which variables belong to which "group" of variables. It can be either a list of vectors giving the number or the name of each variable, or a numeric vector with numbers from 1 to the number of groups, sorted like the variables (e.g., for three variables, if the first two are grouped and the last one is alone, then groups=c(1,1,2)).
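The two accepted forms of groups can be related to each other with a short base-R sketch. The method names and values are hypothetical, and summarizing each group by its arithmetic mean is an assumption made here by analogy with the overall "aggregated" value, not a documented detail of the function.

```r
# Three hypothetical methods; the first two form one group, the third is alone
groups_vec   <- c(1, 1, 2)
method_names <- c("m1", "m2", "m3")

# Equivalent list forms: variables given by number...
groups_by_number <- unname(split(seq_along(groups_vec), groups_vec))
# ...or by name
groups_by_name <- unname(split(method_names, groups_vec))

# Hypothetical values: 2 taxa (rows), 3 methods (columns)
vals <- matrix(c(1, 3, 10,
                 2, 4, 20),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("t1", "t2"), method_names))

# Assumed per-group summary: arithmetic mean of the group's columns
group_means <- sapply(groups_by_number,
                      function(idx) rowMeans(vals[, idx, drop = FALSE]))
```

Here groups_by_number is list(1:2, 3), which describes the same grouping as groups=c(1,1,2).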

order

Optional character. Specifies the order of the values to take into account for the "mapping" and the "thermometers" (if thermo=TRUE). By default, values are assumed to be sorted in the tips/nodes order (order="phylo"; rows 1 to Ntip of values being for tips 1 to Ntip, and so on for the nodes). Values can also be sorted by their names (order="names"; if the tree AND the values have names for tips AND nodes), according to the construction of the tree branches (order="edge"; the branches construction is available in tree$edge, the numbers referring to tips and nodes), or in a custom order (order being a vector of the names or of the numbers of all tips/nodes, of the same length as values).
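To illustrate what a names-based ordering amounts to, the base-R sketch below realigns shuffled rows to the tips/nodes order by matching row names. The taxa names are hypothetical and this is only a manual equivalent, not the function's own code.

```r
# Hypothetical names in tips/nodes order (tips first, then nodes)
phylo_names <- c("t1", "t2", "t3", "n4", "n5")
vals <- matrix(1:10, nrow = 5,
               dimnames = list(phylo_names, c("m1", "m2")))

# Rows arrive in an arbitrary order
set.seed(42)
shuffled <- vals[sample(nrow(vals)), ]

# Names-based realignment: put each row back at the position of its name
realigned <- shuffled[match(phylo_names, rownames(shuffled)), ]
```

After realignment the rows are back in the order that order="phylo" assumes, which is why row names are required for both tips and nodes when relying on names.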

cols.args

Optional list. A list of arguments for the reference color palette, passed to the scale.palette function. These arguments are the palette resolution ncol, the colors to consider col, the central color middle.col (if there is one, otherwise set to NA), a central value in the values range middle (if there is one, otherwise set to NA), and the value steps to follow steps (if there are any, otherwise set to NA). Of these, the parameters ncol and cols are the most important; the parameters middle.col and middle can be left empty, and the parameter span is estimated as the range of values if left empty. By default, a "red-yellow-blue" palette of resolution 100 is computed. Please note that if many colors are provided, if type="branch", and if branch.col.freqs and optionally branch.col.freqs.type are provided in plot.mapping.args, the function may encounter a bug: some points may not be assigned a color because the value steps between colors are too narrow.
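For intuition about what a palette "resolution" means, here is a sketch using base grDevices rather than scale.palette (whose internals are not shown on this page). It builds a red-yellow-blue ramp of 100 colors and assigns each value to one of 100 equal-width bins; the values in x are made up.

```r
# Red-yellow-blue ramp with 100 colors (base grDevices, not scale.palette)
pal <- grDevices::colorRampPalette(c("red", "yellow", "blue"))(100)

# Hypothetical values to color
x <- c(-2, -0.5, 0, 0.5, 2)

# One equal-width bin per color between the value limits
breaks <- seq(min(x), max(x), length.out = length(pal) + 1)

# Bin index of each value; all.inside=TRUE keeps the maximum in the last bin
bin <- findInterval(x, breaks, all.inside = TRUE)
cols <- pal[bin]
```

With many colors and few values, most bins stay empty, which gives a feel for how narrow the value step per color becomes at high resolution.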

lims

Optional. The type or values of limits to consider for the "mapping" and/or the "thermometers". If a character, limits can encompass the range of the plotted values only (lims="local"; the average of all methods) or of all values (lims="global"), and they can be asymmetric (taking the natural range of values, "asym"), symmetric around zero (taking the value furthest from zero and its opposite, "sym0"), or symmetric around another value ("symx", the arithmetic mean by default). It can hence be a vector of length 1 (choosing limits across or within variables), 2 (adding the asymmetric or symmetric choice), or 3 (adding the central value if symmetric around a given value) that applies to all desired "mappings" and to the "thermometers". If a numeric, in the case of a continuous trait, the value limits to consider, i.e., a numeric vector of length two (the lower and upper bounds). In both cases, it can also be a list of two such vectors (character or numeric) if thermo=TRUE and different limits are desired for "mappings" and "thermometers", or a list of the same length as the number of desired plots (depending on what is passed to the plot argument, hence with different limits for each plot) whose elements are either a single vector (specifying limits for "mappings" and also for "thermometers" if applicable) or a list of two vectors (specifying limits for both "mappings" and "thermometers").
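The asymmetric and symmetric options can be made concrete with a small base-R sketch. The helper sketch_lims is hypothetical (it is not part of the package) and only reproduces the definitions given above for a single numeric vector.

```r
# Hypothetical helper, not part of the package: limits of a numeric vector
sketch_lims <- function(x, type = c("asym", "sym0", "symx"), center = mean(x)) {
  type <- match.arg(type)
  switch(type,
         # natural range of the values
         asym = range(x),
         # furthest value from zero and its opposite
         sym0 = c(-1, 1) * max(abs(x)),
         # symmetric around 'center' (arithmetic mean by default)
         symx = center + c(-1, 1) * max(abs(x - center)))
}

x <- c(-1, 0, 2, 5)
lims_asym <- sketch_lims(x, "asym")  # c(-1, 5)
lims_sym0 <- sketch_lims(x, "sym0")  # c(-5, 5)
lims_symx <- sketch_lims(x, "symx")  # c(-2, 5), centered on mean(x) = 1.5
```

Applying such limits to the plotted variable only would correspond to "local" limits, and applying them to all values at once to "global" limits.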

disag

Optional. If type="branch", whether to consider "disagreements" between branch values; this is especially useful when considering signs or discrete traits. Set to FALSE by default; it can be set to TRUE (coloring disagreeing branches in black) or to any color (coloring disagreeing branches in that color).

disag.type

Optional. If type="branch" and disag=TRUE, the type of disagreement to consider: it can concern the signs of values (-1/0/1) (disag.type="sign") or discrete steps between integer values (disag.type="steps").
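As an illustration of sign disagreement, the base-R sketch below flags branches for which the methods do not all return the same sign. The exact rule used by the function is not spelled out here, so treating any difference in sign (including 0 versus 1) as a disagreement is an assumption, and the changes matrix is made up.

```r
# Hypothetical branch changes: 4 branches (rows), 3 methods (columns)
changes <- matrix(c( 0.2,  0.4,  0.1,
                    -0.3,  0.5, -0.2,
                    -0.6, -0.1, -0.9,
                     0.0,  0.3,  0.2),
                  nrow = 4, byrow = TRUE)

# Signs of the changes: -1, 0 or 1
signs <- sign(changes)

# Assumed rule: a branch disagrees when its methods give more than one sign
disagree <- apply(signs, 1, function(s) length(unique(s)) > 1)
```

Under this rule the second and fourth branches disagree, and they are the ones that would be drawn in the disagreement color.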

return.disag

If disag=TRUE, whether to return the list of disagreeing taxa.

thermo

Optional logical. Whether to plot "thermometers" to account for the variation of values. Set to TRUE by default, and automatically turned to FALSE if values is a single-column matrix or data frame.

plot.mapping.args

Optional list. Arguments to be passed to plot.mapping that are not set elsewhere. These are the plot title (title) and other "mapping" arguments (mapping.args). Can be a list of lists if plot="methods" or plot=c("methods","aggr"), in which case it must be of the same length as the desired number of plots.

thermo.var.args

Optional list. Arguments to be passed to thermo.var that are not set elsewhere. These are the resolution for "thermometers" (resolution), the choice to plot a bar indicating the location of the aggregated value (aggr.bar) and its color type (aggr.bar.col), and various graphical aspects of "thermometers" (their border with border, their size with cex, their width with width, their height with height, and their position relative to taxa with adj).

Examples

require(ape)
require(phytools)
# Get a random tree
set.seed(10)
tree<-rtree(50)
# Get a random distribution of values for tips
tipvalues<-matrix(ncol=5,nrow=Ntip(tree))
set.seed(20)
tipvalues[,1]<-fastBM(tree)
# Slightly modify these values (by up to half of their range) to simulate different methods
for(i in 2:5){
  set.seed(i+11)
  tipvalues[,i]<-sapply(tipvalues[,1],function(x){x+runif(1,-0.5,0.5)*diff(range(tipvalues[,1]))})
}
rownames(tipvalues)<-tree$tip.label
# Get the ancestral reconstructions for each "method"
ancvalues<-matrix(ncol=5,nrow=Nnode(tree))
for(i in 1:5){
  set.seed(i*2)
  ancvalues[,i]<-fastAnc(tree,tipvalues[,i])
}
# Collate tips and nodes values
values<-rbind(tipvalues,ancvalues)
# Map all "methods" + average values (all "methods" for each tip) and get an idea of the variation across "methods"
layout(get.grid(ncol(values)+1))
aggr.trait.map(tree,values,type="taxa",aggr=FALSE)
# All plots have their own color range according to their values, but there are obviously some discrepancies, let's use the total data range instead
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims="global")
# The range of values is not especially controlled but spans both sides of zero, so let's go back to the first case ("local" color ranges), now with a color scale symmetric around zero
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims="sym0")
# Let's now have a symmetrical color range indexed on all values
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims=c("global","sym0"))
# The data are not necessarily distributed around zero, let's have a symmetric color range but around the "real" middle of the values range, to have closer fidelity of the colors to the values
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims=c("global","symx",mean(range(values))))
# Let's repeat this but this time using "local" color ranges for "mappings" (to have maximal palette range for each), still keeping a symmetric color palette (using thus "symx") centered on "local" data arithmetic mean (not providing the "x" that will be automatically calculated)
# and "global" color ranges for "thermometers" (to check the "aggregated" values position, with necessarily the colors of the "thermometers" not matching that of the "aggregated mapping")
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims=list(c("local","symx"),c("global","symx",mean(range(values)))))
# Now let's consider there are two groups of methods: one with the first three, one with the last two. Let's plot them
layout(get.grid(2))
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims="global",plot="groups",groups=c(1,1,1,2,2))
# Let's now see them with all methods and with global aggregate
layout(get.grid(ncol(values)+3))
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,lims="global",plot=c("methods","groups","aggr"),groups=c(1,1,1,2,2))

# Get trait values changes between taxa
changes<-apply(values,2,function(x){x[tree$edge[,2]]-x[tree$edge[,1]]})
# Do as previously for values changes
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge")
# This time, we know that changes are negative or positive, so with a "center" around zero; let's have a symmetric color range around zero
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims="sym0")
# Due to a very high extreme we lost a lot of finer resolution, so let's do the same but with "global" limits
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"))
# As it stands, change values are converted into three colors, which splits the distribution (governed by the limits) into three ranges of equal width.
# However, an uneven split may be preferable since most branches fall in the second class (yellow), i.e., close to zero.
# Let's do the same as previously but with colors distributed so as to get three classes holding equal shares of the values.
# To do so, we need to indicate that the colors do not represent equal thirds between the data limits but a custom width (here, one third of the values distribution each)
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"),plot.mapping.args = list("branch.col.freqs"="equal","branch.col.freqs.type"="proportion"))
# Do the same with a finer color resolution
cols.args<-list("fun"="scale.palette","ncols"=1000,"cols"=c("blue","yellow","red"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",cols.args=cols.args,lims=c("local","sym0"),plot.mapping.args = list("branch.col.freqs"="equal","branch.col.freqs.type"="proportion"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",cols.args=cols.args,lims=c("global","sym0"),plot.mapping.args = list("branch.col.freqs"="equal","branch.col.freqs.type"="proportion"))
# Now do the same with the values in the 5% range around zero being in yellow
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("local","sym0"),plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="width"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"),plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="width"))
# Now do the same with the values in the 5% range around zero being in yellow before doing the color gradient (hence with more colors, the values in the 5% range around zero being the most yellow)
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("local","sym0"),cols.args=cols.args,plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="width"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"),cols.args=cols.args,plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="width"))
# Now do the same with the 5% central values being in yellow
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("local","sym0"),plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="proportion"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"),plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="proportion"))
# Now do the same with the 5% central values being in yellow before doing the color gradient (hence with more colors, the 5% central values being the most yellow)
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("local","sym0"),cols.args=cols.args,plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="proportion"))
aggr.trait.map(tree,changes,type="branch",aggr=FALSE,order="edge",lims=c("global","sym0"),cols.args=cols.args,plot.mapping.args = list("branch.col.freqs"=c(0.475,0.05,0.475),"branch.col.freqs.type"="proportion"))

# Get signs of changes
signs<-apply(changes,c(1,2),sign)
aggr.trait.map(tree,signs,type="branch",aggr=FALSE,order="edge",disag=TRUE)
# Nonzero signs are now too prevalent; let's add some tolerance and set the values close to zero to zero
range(changes) # Let's take -0.5 and 0.5 as bounds
signs<-apply(changes,c(1,2),function(x){if(x<0.5&&x>(-0.5)){0}else{sign(x)}})
aggr.trait.map(tree,signs,type="branch",aggr=FALSE,order="edge",disag=TRUE)
# Let's do the same with -1 and 1 as bounds, and with disagreeing branches in light gray, to see agreeing branches first
signs<-apply(changes,c(1,2),function(x){if(x<1&&x>(-1)){0}else{sign(x)}})
aggr.trait.map(tree,signs,type="branch",aggr=FALSE,order="edge",disag="lightgray")
# Now let's get the disagreeing branches
disag<-aggr.trait.map(tree,signs,type="branch",aggr=FALSE,order="edge",disag="lightgray",return.disag=TRUE)
disag[which(disag)]

# Sort values with custom order
order<-sample(1:nrow(values),nrow(values))
# Get aggregated values relying on their names
tree$node.label<-as.character((1:Nnode(tree))+Ntip(tree)) # Adding names for nodes
rownames(values)<-c(tree$tip.label,tree$node.label) #Adding node labels as rownames to values
par(mfrow=c(1,1))
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,plot="aggr",plot.mapping.args=list("title"="reference"))
aggr.trait.map(tree,values[order,],type="taxa",aggr=FALSE,plot="aggr",order="names",plot.mapping.args=list("title"="relying on names"))
# Rely on the given order, still having names for nodes
aggr.trait.map(tree,values[order,],type="taxa",aggr=FALSE,plot="aggr",order=order,plot.mapping.args=list("title"="relying on taxa names order"))
aggr.trait.map(tree,values[order,],type="taxa",aggr=FALSE,plot="aggr",order=order,plot.mapping.args=list("title"="relying on numeric order with node labels"))
# Rely on the given order but without node names
tree$node.label<-NULL
rownames(values)<-c(tree$tip.label,rep("",Nnode(tree)))
aggr.trait.map(tree,values[order,],type="taxa",aggr=FALSE,plot="aggr",order=order,plot.mapping.args=list("title"="relying on numeric order without node labels"))
# Manually add aggregated values a priori and specify it
aggr.trait.map(tree,values,type="taxa",aggr=FALSE,plot="aggr",plot.mapping.args=list("title"="reference"))
aggr<-apply(values,1,mean)
# First with adding aggregated values as the last column
values2<-cbind(values,"aggr"=aggr)
aggr.trait.map(tree,values2,type="taxa",aggr=TRUE,plot="aggr",plot.mapping.args=list("title"="aggr as default (last column)"))
# Second, shuffling the columns of values and specifying which one is the aggregated one by its name or its number
values3<-values2[,sample(1:ncol(values2),ncol(values2))]
aggr.trait.map(tree,values3,type="taxa",aggr="aggr",plot="aggr",plot.mapping.args=list("title"="aggr somewhere, specifying it by its name"))
aggr.trait.map(tree,values3,type="taxa",aggr=which(colnames(values3)=="aggr"),plot="aggr",plot.mapping.args=list("title"="aggr somewhere, specifying it by its position"))


jacobmaugoust/ULT documentation built on May 16, 2023, 1:29 p.m.