constrainParPaleo: Constrain Parameters for a Model Function from paleotree

View source: R/constrainParPaleo.R

constrainParPaleoR Documentation

Constrain Parameters for a Model Function from paleotree

Description

This function constrains a model to make submodels with fewer parameters, using a structure and syntax taken from the function constrain in Rich Fitzjohn's package diversitree.

Usage

constrainParPaleo(f, ..., formulae = NULL, names = parnames(f), extra = NULL)

Arguments

f

A function to constrain. This function must be of the S3 class paleotreeFunc and have all necessary attributes expected of that class, which include parameter names and upper and lower bounds. As I have deliberately not exported the function which creates this class, it should be impossible for non-advanced users to obtain such objects easily without using one of the make functions, which automatically output a function of the appropriate class and attributes.

...

Formulae indicating how the function should be constrained. See details and examples for lengthy discussion.

formulae

Optional list of constraints, possibly in addition to those in ...

names

Optional Character vector of names, the same length as the number of parameters in x. Use this only if parnames does not return a vector for your function. Generally this should not be used. DWB: This argument is kept for purposes of keeping the function as close to the parent as possible but, in general, should not be used because the input function must have all attributes expected of class paleotreeFunc, including parameter names.

extra

Optional vector of additional names that might appear on the RHS of constraints but do not represent names in the function's argnames. This can be used to set up dummy variables (example coming later).

Details

This function is based on (but does not depend on) the function constrain from the package diversitree. Users should refer to this parent function for more detailed discussion of model constraint usage and detailed examples.

The parent function was forked to add functionality necessary for dealing with the high parameter count models typical to some paleontological analyses, particularly the inverse survivorship method. This necessitated that the new function be entirely separate from its parent. Names of functions involved (both exported and not) have been altered to avoid overlap in the package namespaces. Going forward, the paleotree package maintainer (Bapst) will try to be vigilant with respect to changes in constrain in the original package, diversitree.

Useful information from the diversitree manual (11/01/13):

"If f is a function that takes a vector x as its first argument, this function returns a new function that takes a shorter vector x with some elements constrained in some way; parameters can be fixed to particular values, constrained to be the same as other parameters, or arbitrary expressions of free parameters."

In general, formulae should be of the structure:

LHS ~ RHS

...where the LHS is the 'Parameter We Want to Constrain' and the RHS is whatever we are constraining the LHS to, usually another parameter. LHS and RHS are the 'left-hand side' and 'right-hand side' respectively (which I personally find obscure).

Like the original constrain function this function is based on, this function cannot remove constraints previously placed on a model object and there may be cases in which the constrained function may not make sense, leading to an error. The original function will sometimes issue nonsensical functions with an incorrect number/names of parameters if the parameters to be constrained are given in the wrong order in formulae.

Differences from the original constrain function from diversitree

This forked paleotree version of constrain has two additional features, both introduced to aid in constraining models with a high number of repetitive parameters. (I did not invent these models, so don't shoot the messenger.)

First, it allows nuanced control over the constraining of many parameters simultaneously, using the all and match descriptors. This system depends on parameters being named as such: name.group1.group2.group3 and so on. Each 'group' is reference to a system of groups, perhaps referring to a time interval, region, morphology, taxonomic group or some other discrete characterization among the data (almost all functions envisioned for paleotree are for per-taxon diversification models). The number of group systems is arbitrary, and may be from zero to a very large number; it depends on the 'make' function used and the arguments selected by the user. For example, the parameter x.1 would be for the parameter x in the first group of the first group system (generally a time interval for most paleotree functions). For a more complicated exampled, with the parameter x.1.3.1, the third group for the second group system (perhaps this taxonomic data point has a morphological feature not seen in some other taxa) and group 1 of the third group system (maybe biogeographic region 1? The possibilities are endless depending on user choices!).

The all option work like so: if x.all ~ x.1 is given as a formulae, then all x parameters will be constrained to equal x.1. For example, if there is x.1, x.2, x.3 and x.4 parameters for a model, x.all ~ x.1 would be equivalent to individually giving the formulae x.2~x.1, x.3~x.1 and x.4~x.1. This means that if there are many parameters of a particular type (say, 50 x parameters) it is easy to constrain all with a short expression. It is not necessary that the The all term can be used anywhere in the name of the parameter in a formulae, including to make all parameters for a given group the same. Furthermore, the LHS and RHS don't need to be same parameter group, and both can contain all statements, even multiple all statements. Consider these examples, each of which are legal, acceptable uses:

x.all ~ y.1

Constrains all values of the parameter x for every group to be equal to the single value for the parameter y for group 1 (note that there's only a single set of groups).

all.1 ~ x.1

Constrains all parameters for the first group to equal each other, here arbitrary named x.1. For example, if there is parameters named x.1, y.1 and z.1, all will be constrained to be equal to a single parameter value.

x.all.all ~ y.2.3

Constrains all values for x in any and all groups to equal the single value for y for group 2 (of system 1) and group 3 (of system 2).

x.all ~ y.all

Constrains all values of x for every group and y for every group to be equal to a single value, which by default will be reported as y.1

The match term is similar, allowing parameter values from the same group to be quickly matched and made equivalent. These match terms must have a matching (cue laughter) term both in the corresponding LHS and RHS of the formula. For example, consider x.match ~ y.match where there are six parameters: x.1, x.2, x.3, y.1, y.2 and y.3. This will effectively constrain x.1 ~ y.1, x.2 ~ y.2 and x.3 ~ y.3. This is efficient for cases where we have some parameters that we often treat as equal. For example, in paleontology, we sometimes make a simplifying assumption that birth and death rates are equal in multiple time intervals. Some additional legal examples are:

x.match.1 ~ y.match.1

This will constrain only parameters of x and y to to equal each other if they both belong to the same group for the first group system AND belong to group 1 of the first group.

all.match. ~ x.match

This will constrain all named parameters in each group to equal each other; for example, if there are parameters x.1, y.1, z.1, x.2, y.2 and z.2, this will constrain them such that y.1 ~ x.1, z.1 ~ x.1, y.2 ~ x.2 and z.2 ~ x.2, leaving x.1 and x.2 as the only parameters effectively.

There are two less fortunate qualities to the introduction of the above terminology.

Unfortunately, this package author apologizes that his programming skills are not good enough to allow more complex sets of constraints, as would be typical with the parent function, when all or match terms are included. For example, it would not be legal to attempt to constraint y.all ~ x.1 / 2, where the user presumably is attempting to constrain all y values to equal the x parameter to equal half of the x parameter for group 1. This will not be parsed as such and should return an error. However, there are workarounds, but they require using constrainParPaleo more than once. For the above example, a user could first use y.all ~ y.1 constraining all y values to be equal. Then a user could constrain with the formula y.1 ~ x.1 / 2 which would then constrain y.1 (and all the y values constrained to equal it) to be equal to the desired fraction.

Furthermore, this function expects that parameter names don't already have period-separated terms that are identical to all or match. No function in paleotree should produce such natively. If such were to occur, perhaps by specially replacing parameter names, constrainParPaleo would confuse these terms for the specialty terms described here.

Secondly, this altered version of constrain handles the parameter bounds included as attributes in functions output by the various 'make' functions. This means that if x.1 ~ y.1 is given, constrainParPaleo will test if the bounds on x.1 and y.1 are the same. If the bounds are not the same, constrainParPaleo will return an error. This is important, as some models in paleotree may make a parameter a rate (bounded zero to some value greater than one) or a probability (bounded zero to one), depending on user arguments. Users may not realize these differences and, in many cases, constraining a rate to equal a probability is nonsense (absolute poppycock!). If a user really wishes to constrain two parameters with different bounds to be equal (I have no idea why anyone would want to do this), they can use the parameter bound replacement functions described in modelMethods to set the parameter bounds as equal. Finally, once parameters with the same bounds are constrained, the output has updated bounds that reflect the new set of parameters for the new constrained function.

Value

Modified from the diversitree manual: This function returns a constrained function that can be passed through to the optimization functions of a user's choice, such as optim, find.mle in diversitree or mcmc. It will behave like any other function. However, it has a modified class attribute so that some methods will dispatch differently: parnames, for example, will return the names of the parameters of the constrained function and parInit will return the initial values for those same constrained set of parameters. All arguments in addition to x will be passed through to the original function f.

Additional useful information from the diversitree manual (11/01/13):

For help in designing constrained models, the returned function has an additional argument pars.only, when this is TRUE the function will return a named vector of arguments rather than evaluate the function (see Examples).

Author(s)

This function (and even this help file!) was originally written by Rich Fitzjohn for his library diversitree, and subsequently rewritten and modified by David Bapst.

References

FitzJohn, R. G. 2012. diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution 3(6):1084-1092.

See Also

As noted above, this function is strongly based on (but does not depend on) the function constrain from the library diversitree.

Examples

# simulation example with make_durationFreqCont, with three random groups
set.seed(444)
record <- simFossilRecord(
    p = 0.1, 
    q = 0.1, 
    nruns = 1,
    nTotalTaxa = c(30,40), 
    nExtant = 0
    )
taxa <- fossilRecord2fossilTaxa(record)
rangesCont <- sampleRanges(taxa,r = 0.5)
# create a groupings matrix
grp1 <- matrix(
    sample(1:3,nrow(taxa),replace = TRUE), , 1)   
likFun <- make_durationFreqCont(rangesCont, groups = grp1)

# can constrain both extinction rates to be equal
constrainFun <- constrainParPaleo(likFun, q.2 ~ q.1)

#see the change in parameter names and bounds
parnames(likFun)
parnames(constrainFun)
parbounds(likFun)
parbounds(constrainFun)

# some more ways to constrain stuff!

#constrain all extinction rates to be equal
constrainFun <- constrainParPaleo(likFun, q.all ~ q.1)
parnames(constrainFun)

#constrain all rates for everything to be a single parameter
constrainFun <- constrainParPaleo(likFun, r.all ~ q.all)
parnames(constrainFun)

#constrain all extinction rates to be equal & all sampling to be equal
constrainFun <- constrainParPaleo(likFun, q.all ~ q.1, r.all ~ r.1)
parnames(constrainFun)

#similarly, can use match.all to make all matching parameters equal each other
constrainFun <- constrainParPaleo(likFun, match.all ~ match.all)
parnames(constrainFun)

#Constrain rates in same group to be equal
constrainFun <- constrainParPaleo(likFun, r.match ~ q.match)
parnames(constrainFun)


dwbapst/paleotree documentation built on Aug. 30, 2022, 6:44 a.m.