mulTree: Run MCMCglmm on multiple trees

View source: R/mulTree.R

mulTreeR Documentation

Run MCMCglmm on multiple trees

Description

Running a MCMCglmm model on a multiple phylogenies and a data.frame combined using as.mulTree. The results are written out of R environment as individual models.

Usage

mulTree(
  mulTree.data,
  formula,
  parameters,
  chains = 2,
  priors,
  ...,
  convergence = 1.1,
  ESS = 1000,
  verbose = TRUE,
  output = "mulTree_models",
  warn = FALSE,
  parallel,
  ask = TRUE
)

Arguments

mulTree.data

A list of class mulTree generated using as.mulTree.

formula

An object of class formula (excluding the random terms).

parameters

A list of three numerical values to be used respectively as: (1) the number of generations, (2) the sampling value, (3) the burnin.

chains

The number of independent chains to run per model.

priors

A series of priors to use for the MCMC. If missing, the priors will be the default parameters from the MCMCglmm function.

...

Any additional arguments to be passed to the MCMCglmm function.

convergence

A numerical value for assessing chains convergence (default = 1.1).

ESS

A numerical value for assessing the effective sample size (default = 1000).

verbose

A logical value stating whether to be verbose or not (default = TRUE).

output

A string of characters that will be used as chain name for the models output (default = mulTree_models).

warn

Whether to print the warning messages from the MCMCglmm function (default = FALSE).

parallel

An optional vector containing the virtual connection process type for running the chains in parallel (requires snow package).

ask

logical, whether to ask to overwrite models (TRUE - default) or not (FALSE)).

Value

Generates MCMCglmm models and saves them sequentially out of R environment to minimise users RAM usage. Use read.mulTree to reload the models back in the R environment. Because of the calculation of the vcv matrix for each model and each tree in the MCMCglmm models, this function is really RAM demanding. For big datasets we heavily recommend to have at least 4GB RAM DDR3 available.

Author(s)

Thomas Guillerme

See Also

MCMCglmm, as.mulTree, read.mulTree

Examples

## Quick example:
## Before the analysis
data <- data.frame("sp.col" = LETTERS[1:5], var1 = rnorm(5), var2 = rnorm(5))
tree <- replicate(3, rcoal(5, tip.label = LETTERS[1:5]), simplify = FALSE)
class(tree) <- "multiPhylo"
mulTree.data <- as.mulTree(data, tree, taxa = "sp.col")
priors <- list(R = list(V = 1/2, nu = 0.002),
     G = list(G1 = list(V = 1/2, nu = 0.002)))
## quick example
mulTree(mulTree.data, formula = var1 ~ var2, parameters = c(10000, 10, 1000),
     chains = 2, prior = priors, output = "quick_example", convergence = 1.1,
     ESS = 100)
## Clean folder
file.remove(list.files(pattern = "quick_example"))
## alternative example with parallel argument (and double the chains!)
mulTree(mulTree.data, formula = var1 ~ var2, parameters = c(10000, 10, 1000),
     chains = 4, prior = priors, output = "quick_example", convergence = 1.1,
     ESS = 100, parallel = "SOCK")
## Clean folder
file.remove(list.files(pattern = "quick_example"))

## Not run: 
## Before the analysis:
## read in the data
data(lifespan)
## combine aves and mammalia trees
combined_trees <- tree.bind(x = trees_mammalia, y = trees_aves, sample = 2,
     root.age = 250)

## Preparing the variables for the mulTree function
## creates the "mulTree" object
mulTree_data <- as.mulTree(data = lifespan_volant, tree = combined_trees,
     taxa = "species")
## formula
test_formula <- longevity ~ mass + volant
## parameters (number of generations, thin/sampling, burnin)
mcmc_parameters <- c(101000, 10, 1000)
# For higher ESS run longer by increasing the number of generations
## priors
mcmc_priors <- list(R = list(V = 1/2, nu = 0.002),
     G = list(G1 = list(V = 1/2, nu = 0.002)))

## Running MCMCglmm on multiple trees
## WARNING: This example takes between 1 and 2 minutes to run
## and generates files in your current directory.
mulTree(mulTree_data, formula = test_formula, parameters = mcmc_parameters,
     priors = mcmc_priors, output = "longevity.example", ESS = 50)

## The models are saved out of R environment under the "longevity.example"
## chains names.
## Use read.mulTree() to read the generated models.

## Remove the generated files from the current directory
file.remove(list.files(pattern = "longevity.example"))

## Parallel example
## Loading the snow package
library(snow)
## Running the same MCMCglmm on multiple trees
mulTree(mulTree_data, formula = test_formula, parameters = mcmc_parameters,
     priors = mcmc_priors, output = "longevity.example", ESS = 50,
     parallel = "SOCK")
## Remove the generated files from the current directory
file.remove(list.files(pattern = "longevity.example"))
 
## Same example but including specimens
## Subset of the data
data <- lifespan_volant[sample(nrow(lifespan_volant), 30),]
##Create a dataset with two specimen per species
data <- rbind(cbind(data, specimen = rep("spec1", 30)), cbind(data,
     specimen = rep("spec2", 30)))
##Cleaning the trees
trees <- clean.data(data, combined_trees, data.col = "species")$tree

##Creates the mulTree object
mulTree_data <- as.mulTree(data, trees, taxa = "species",
     rand.terms = ~species+specimen)

## Updating the priors
mcmc_priors <- list(R = list(V = 1/2, nu = 0.002),
                    G = list(G1 = list(V = 1/2, nu = 0.002),
                    G2 = list(V = 1/2, nu = 0.002)))

##Running MCMCglmm on multiple trees
mulTree(mulTree_data, formula = test_formula, parameters = mcmc_parameters,
     priors = mcmc_priors, output = "longevity.example", ESS = 50)
##Remove the generated files from the current directory
file.remove(list.files(pattern = "longevity.example"))

## End(Not run)


TGuillerme/mulTree documentation built on Feb. 21, 2024, 9:18 a.m.