pwc: Do pairwise comparison among parameter blocks
In bstaggmartin/evorates: Evolving Rate Models of Continuous Trait Evolution

View source: R/PAR_math.R

pwc	R Documentation

Do pairwise comparison among parameter blocks

Description

Wrap this function around a mathematical expression involving param_block arrays to switch to "pairwise comparison mode". This changes the default behavior from recycling columns of param_block arrays to ensure each array has the same number of parameters to instead performing mathematical operations between each pair of parameters in the arrays.

Usage

pwc(expr, fit = NULL)

Arguments

expr

A mathematical expression generally involving two or more param_block arrays. All arrays must have compatible iterations/quantiles/diagnostics (typically the first dimension/rows) and chains (typically the third dimension/slices).

fit

An optional object of class "evorates_fit" or "param_block". If not NULL (the default), any unrecognized object names in expr will be interpreted as parameters to extract from fit using the %chains% operator only (the delayed evaluation makes it difficult to automatically select different operators based on expr). Note that this function will only pass referenced object names, not strings, to the %chains% operator. As a result, only unquoted names can be used to specify selections from fit; just as with data.frame objects, backticks (``) can be used to enclose object names including non-standard characters (see examples below). For more details on how parameters are extracted, see grapes-chains-grapes.

Details

This function can be a bit finicky at times (particularly when taking advantage of the fit argument), but I find it too helpful to get rid of at this point. Basically, any time you do math with param_block arrays, R checks whether a variable in the parent environment, make.Ops.param_blocks.pw, is TRUE or FALSE. If TRUE, R does math with param_block arrays in pairwise mode. This function is thus nothing more than a wrapper that evaluates expr with make.Ops.param_blocks.pw set to TRUE in the parent environment. Any time you start messing with scoping and going across environments in R, weird side effects are possible.

Practically speaking, just be sure to look out for ambiguous situations where you try to use an object name that already exists in your current R environment to refer to something to extract from fit! When in doubt about this, it is safest to keep fit as NULL and manually extract the necessary param_block arrays within expr or as separate objects later referred to by expr.

Value

Typically another array of class "param_block" with a param_type set to that of the inputted arrays. The dimension of these arrays will generally go in the order of iterations/quantiles/diagnostics, then parameters, then chains. The array will have the same iterations/quantiles/diagnostics and chains as the inputted arrays but will have a new set of parameters corresponding to every pairwise combination of parameters among the inputted arrays. As is typical with doing math on param_block arrays, the result is never simplified (i.e., dimension of length 1 are never collapsed), such that complicated expressions are evaluated more efficiently.

Resulting parameters are arranged in "column-major" order, meaning that each parameter on the right hand side of any binary mathematical operation (e.g., +, >, etc.) will be recycled to match up with the total number of parameters on the left hand side. For example, cet_fit %chains% 3:4 - cet_fit %chains% 1:2 would result in parameters R_3%-%R_1, R_4%-%R_1, R_3%-%R_2, and R_4%-%R_2, in that order.

Note that, since expr could technically be anything, you could get a completely different output from this function if expr doesn't include any param_block objects (e.g., pwc(1) would just return 1).

Examples

#get whale/dolphin evorates fit
data("cet_fit")

#here we compare branchwise rates for edges 3 and 4 to those for 1 and 2
pwc.result <- pwc(fit %chains% 3:4 - fit%chains% 1:2)
#compare to
norm.result <- fit %chains% 3:4 - fit%chains% 1:2

#notice that we cannot use numeric selections in conjunction with the fit argument
#such selections are just interpreted as numbers
pwc.result <- pwc(3:4 - 1:2, fit = cet_fit) #returns c(2, 2)
#note that character selections are similarly just interpreted as strings
## Not run: pwc.result <- pwc(c("^R_3$", "^R_4$") - c("^R_1$", "^R_2$"), fit = cet_fit) #returns error
#you have to use backticks instead to make the selections object names rather than strings
pwc.result <- pwc(c(`^R_3$`, `^R_4$`) - c(`^R_1$`, `^R_2$`), fit = cet_fit)
#object names work without backticks if they don't include special symbols like ^ or $
pwc.result <- pwc(c(R_sig2, R_mu) - c(`^R_1$`, `^R_2$`), fit = cet_fit)
#also, note that specifying fit always results in selecting posterior samples/"chains", so this won't work
## Not run: pwc.result <- pwc(c(`^R_3$`, `^R_4$`) - cet_fit %quantiles% 1:2, fit = cet_fit) #returns error
#in this case, it would be better to use more manual selection to get the desired result
pwc.result <- pwc(cet_fit %quantiles% 3:4 - cet_fit %quantiles% 1:2, fit = cet_fit)
#so yeah, the fit argument can be a bit clunky...

#as a more practical example...
#maybe we want to compare branchwise rates within the Globicephalinae subfamily?
globs <- fit %chains% get.clade.edges(cet_fit$call$tree, c("Pseudorca", "Feresa"))
pwc.result <- pwc(globs - globs)
#notice how the results here can "blow up" and be hard to interpret...
#there are 10 edges in Globicephalinae clade so pairwise comparison results in 10 * 10 = 100 parameters!
#a helpful technique here is to instead summarize the results in a matrix; here's a function for this:
foo <- function(x, par1, par2 = NULL){
   only.par1.flag <- FALSE
   if(is.null(par2)){
      par2 <- par1
      only.par1.flag <- TRUE
   }
   nms1 <- names(par1)
   nms2 <- names(par2)
   out <- matrix(x, length(nms1), length(nms2),
                 dimnames = list(nms1, nms2))
   if(only.par1.flag) diag(out) <- NA
   out
}
#now let's look at the median differences (averaged across each chain) in rates:
pwc.result <- foo(rowMeans(pwc.result %quantiles% list(".", 0.5)), globs)
#or even the posterior probability associated with the difference:
pwc.result <- foo(rowMeans(pwc(globs > globs) %means% "."), globs)
#note that parameters on the left hand side of ">" correspond to rows and those on the right to columns
#so pwc.result here shows the posterior probability that the row parameter is greater than the column one
#we could of course compare branchwise rates among different groups too using this framework
meso <- cet_fit %chains% get.clade.edges(cet_fit$call$tree, "Mesoplodon")
pwc.result <- foo(rowMeans(pwc(globs > meso) %means% "."), globs, meso)
#of course, this is still confusing when we don't know the numbers of each edge in the phylogeny

#here's perhaps a better example where we compare all "tip rates" with better parameter names
tip.rates <- setNames(cet_fit %chains% tip.edges(cet_fit), cet_fit$call$tree$tip.label)
pwc.result <- foo(rowMeans(pwc(tip.rates > tip.rates) %means% "."), tip.rates)
#the result is a large matrix, but it does allow you to compare rates among all tips in the phylogeny
#it may be helpful to plot such results, though I haven't spent too much time experimenting with this
image(x = 1:nrow(pwc.result), y = 1:ncol(pwc.result), z = t(pwc.result)[,nrow(pwc.result):1],
      col = colorRampPalette(c('blue', 'steelblue', 'gray', 'pink2', 'red'))(12),
      xlab = '', xaxt = 'n', ylab = '', yaxt = 'n')
row.genera <- rev(sapply(strsplit(rownames(pwc.result), '_'), '[', 1))
axis(2, at = 1:nrow(pwc.result), labels = row.genera, las = 2)
col.genera <- sapply(strsplit(colnames(pwc.result), '_'), '[', 1)
axis(1, at = 1:ncol(pwc.result), labels = col.genera, las = 2)
#redder colors indicate a row tip likely exhibiting a faster rate than a column tip
#and vice versa for bluer colors

bstaggmartin/evorates documentation built on May 31, 2024, 5:56 a.m.