#' @title Ancestral states estimations with multiple trees
#'
#' @description Fast ancestral states estimations run on multiple trees using the Mk model from castor::asr_mk_model.
#'
#' @param data A \code{matrix}, \code{data.frame} or \code{list} with the characters for each taxa.
#' @param tree A \code{phylo} or \code{mutiPhylo} object (if the \code{tree} argument contains node labels, they will be used to name the output).
#' @param models A \code{character} vector, unambiguous named \code{list} or \code{matrix} to be passed as model arguments to \code{castor::asr_mk_model} or \code{ape::ace} (see details).
#' @param threshold either \code{logical} for applying a relative threshold (\code{TRUE} - default) or no threshold (\code{FALSE}) or a \code{numeric} value of the threshold (e.g. 0.95). See details.
#' @param special.tokens optional, a named \code{vector} of special tokens to be passed to \code{\link[base]{grep}} (make sure to protect the character with \code{"\\\\"}). By default \code{special.tokens <- c(missing = "\\\\?", inapplicable = "\\\\-", polymorphism = "\\\\&", uncertainty = "\\\\/")}. Note that \code{NA} values are not compared and that the symbol "@" is reserved and cannot be used.
#' @param special.behaviours optional, a \code{list} of one or more functions for a special behaviour for \code{special.tokens}. See details.
#' @param brlen.multiplier optional, a vector of branch length modifiers (e.g. to convert time branch length in changes branch length) or a list of vectors (the same length as \code{tree}).
#' @param verbose \code{logical}, whether to be verbose (\code{TRUE}) or not (\code{FALSE} - default).
#' @param parallel Either a \code{logical}, whether to use parallel algorithm (\code{TRUE}) or not (\code{FALSE} - default); or directly an \code{integer} indicating the number of cores to use (note that if \code{parallel = 1}, one core will be used but the parallel integration will still be called).
#' @param output optional, see \code{Value} section below.
#' @param options.args optional, a named list of options to be passed to function called by \code{castor::asr_mk_model}.
#' @param estimation.details optional, whether to also return the details for each estimation as returned by \code{castor::asr_mk_model} or \code{ape::ace}. This argument can be left \code{NULL} (default) or be any combination of the elements returned by \code{castor::asr_mk_model} or \code{ape::ace} (e.g. \code{c("loglikelihood", "transition_matrix", "CI95")}).
#'
#' @details
#'
#' Depending on the type of characters \code{models} argument can be either:
#' \itemize{
#' \item the name of a single model to apply to all characters (if all characters are discrete or all are continuous); see below for the list of available names. For example \code{models = "ER"} applies the Equal Rates model to all characters (assuming they are all discrete characters).
#' \item a vector of model names to apply to different type of characters (see below for the list). For example \code{models = c("ER", "ER", "BM")} applies the Equal Rates model to the two first characters (discrete) and the \code{"BM"} model to the third character (continuous).
#' \item a transition \code{"matrix"} to be applied to all characters (if discrete). For example \code{models = matrix(0.2, 2, 2)}.
#' \item an single named list of arguments to be applied to all characters by passing it to \code{ape::ace} (if continuous). For example \code{models = list(method = "GLS", corStruct = corBrownian(1, my_tree))}.
#' \item an un-ambiguous list of arguments to be passed to either \code{castor::asr_mk_model} (discrete characters) or \code{ape::ace} (continuous characters). For example \code{models = list("char1" = list(transition_matrix = matrix(0.2, 2, 2)), "char2" = list(method = "GLS", corStruct = corBrownian(1, my_tree)))} to be specifically passed to the characters named "char1" and "char2"
#'}
#'
#' The available built-in models for discrete characters in \code{castor::asr_mk_model} are:
#' \itemize{
#' \item \code{"ER"} for all equal rates
#' \item \code{"SYM"} for symmetric rates
#' \item \code{"ARD"} all rates are different
#' \item \code{"SUEDE"} equal stepwise transitions (e.g. for meristic/counting characters)
#' \item \code{"SRD"} different stepwise transitions
#' }
#' See directly \code{castor::asr_mk_model} for more models.
#'
#' The available built-in models and methods for continuous characters in \code{ape::ace} are:
#' \itemize{
#' \item \code{"BM"} model: for a default Brownian Motion with the "REML" method
#' \item \code{"REML"} method: for a default Brownian Motion with the "REML" method (same as above)
#' \item \code{"ML"} method: for a default Brownian Motion with the "ML" method
#' \item \code{"pic"} method: for a default Brownian Motion with the "pic" (least squared) method
#'}
#'
#' The \code{threshold} option allows to convert ancestral states likelihoods into discrete states. When \code{threshold = FALSE}, the ancestral state estimated is the one with the highest likelihood (or at random if likelihoods are equal). When \code{threshold = TRUE}, the ancestral state estimated are all the ones that are have a scaled likelihood greater than the maximum observed scaled likelihood minus the inverse number of possible states (i.e. \code{select_state >= (max(likelihood) - 1/n_states)}). This option makes the threshold selection depend on the number of states (i.e. if there are more possible states, a lower scaled likelihood for the best state is expected). Finally using a numerical value for the threshold option (e.g. \code{threshold = 0.95}) will simply select only the ancestral states estimates with a scaled likelihood equal or greater than the designated value. This option makes the threshold selection absolute. Regardless, if more than one value is select, the uncertainty token (\code{special.tokens["uncertainty"]}) will be used to separate the states. If no value is selected, the uncertainty token will be use between all observed characters (\code{special.tokens["uncertainty"]}).
#'
#' \code{special.behaviours} allows to generate a special rule for the \code{special.tokens}. The functions should can take the arguments \code{character, all_states} with \code{character} being the character that contains the special token and \code{all_states} for the character (which is automatically detected by the function). By default, missing data returns and inapplicable returns all states, and polymorphisms and uncertainties return all present states.
#' \itemize{
#' \item{\code{missing = function(x,y) y}}
#' \item{\code{inapplicable = function(x,y) y}}
#' \item{\code{polymorphism = function(x,y) strsplit(x, split = "\\\\&")[[1]]}}
#' \item{\code{uncertainty = function(x,y) strsplit(x, split = "\\\\/")[[1]]}}
#' }
#'
#' Functions in the list must be named following the special token of concern (e.g. \code{missing}), have only \code{x, y} as inputs and a single output a single value (that gets coerced to \code{integer} automatically). For example, the special behaviour for the special token \code{"?"} can be coded as: \code{special.behaviours = list(missing = function(x, y) return(NA)} to make ignore the character for taxa containing \code{"?"}.
#'
#' When using the parallel option (either through using \code{parallel = TRUE} by using the number of available cores minus on or manually setting the number of cores - e.g. \code{parallel = 5}), the \code{castor::asr_mk_model} function will use the designated number of cores (using the option \code{Nthreads = <requested_number_of_cores>}). Additionally, if the input \code{tree} is a \code{"multiPhylo"} object, the trees will be run in parallel for each number of cores, thus decreasing computation time accordingly (e.g. if 3 cores are requested and \code{tree} contains 12 \code{"phylo"} objects, 4 different \code{"phylo"} objects will be run in parallel on the 3 cores making the calculation around 3 times faster).
#'
#' @return
#' Returns a \code{"matrix"} or \code{"list"} of ancestral states. By default, the function returns the ancestral states in the same format as the input \code{matrix}. This can be changed using the option \code{output = "matrix"} or \code{"list"} to force the class of the output.
#' To output the combined ancestral states and input, you can use \code{"combined"} (using the input format) or \code{"combined.matrix"} or \code{"combined.list"}.
#' If using continuous characters only, you can use the output option \code{"dispRity"} to directly output a usable \code{dispRity} object with all trees and all the data (estimated and input).
#' \emph{NOTE} that if the input data had multiple character types (continuous and discrete) and that \code{"matrix"} or \code{"combined.matrix"} output is requested, the function returns a \code{"data.frame"}.
#'
#' @examples
#' set.seed(42)
#' ## A simple example:
#' ## A random tree with 10 tips
#' tree <- rcoal(10)
#' ## Setting up the parameters
#' my_rates = c(rgamma, rate = 10, shape = 5)
#'
#' ## A random Mk matrix (10*50)
#' matrix_simple <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates,
#' invariant = FALSE)
#'
#' ## Run a basic ancestral states estimations
#' ancestral_states <- multi.ace(matrix_simple, tree)
#' ancestral_states[1:5, 1:5]
#'
#' ## A more complex example
#' ## Create a multiple list of 5 trees
#' multiple_trees <- rmtree(5, 10)
#'
#' ## Modify the matrix to contain missing and special data
#' matrix_complex <- matrix_simple
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "-"
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "0%2"
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "?"
#' matrix_complex[1:5,1:5]
#'
#' ## Set a list of extra special tokens
#' my_spec_tokens <- c("weirdtoken" = "%")
#'
#' ## Set some special behaviours for the "weirdtoken" and for "-" and "?"
#' my_spec_behaviours <- list()
#' ## Inapplicable tokens "-" are ignored
#' my_spec_behaviours$inapplicable <- function(x,y) return(NA)
#' ## Missing tokens "?" are considered as all states
#' my_spec_behaviours$missing <- function(x,y) return(y)
#' ## Weird tokens are considered as state 0 and 3
#' my_spec_behaviours$weirdtoken <- function(x,y) return(c(1,2))
#'
#' ## Create a random branch length modifier to apply to each tree
#' branch_lengths <- rnorm(18)^2
#'
#' ## Setting a list of model ("ER" for the 25 first characters and then "SYM")
#' my_models <- c(rep("ER", 25), rep("SYM", 25))
#'
#' ## Run the ancestral states on all the tree with multiple options
#' ancestral_states <- multi.ace(matrix_complex, multiple_trees,
#' verbose = TRUE,
#' models = my_models,
#' threshold = 0.95,
#' special.tokens = my_spec_tokens,
#' special.behaviours = my_spec_behaviours,
#' brlen.multiplier = branch_lengths,
#' output = "combined.matrix")
#'
#' ## The results for the the two first characters for the first tree
#' ancestral_states[[1]][, 1:2]
#'
#' \dontrun{
#' ## The same example but running in parallel
#' ancestral_states <- multi.ace(matrix_complex, multiple_trees,
#' verbose = TRUE,
#' models = my_models,
#' threshold = 0.95,
#' special.tokens = my_spec_tokens,
#' special.behaviours = my_spec_behaviours,
#' brlen.multiplier = branch_lengths,
#' output = "combined.matrix",
#' parallel = TRUE)
#' }
#' @seealso
#' \code{castor::asr_mk_model}, \code{char.diff}
# \code{fit.ace.model},
#'
#' @author Thomas Guillerme
#' @export
multi.ace <- function(data, tree, models, threshold = TRUE, special.tokens, special.behaviours, brlen.multiplier, verbose = FALSE, parallel = FALSE, output, options.args, estimation.details = NULL) {
match_call <- match.call()
## SANITIZING
## matrix
matrix <- data
input_class <- check.class(matrix, c("matrix", "list", "data.frame"))
## Convert the matrix if not a list
class_matrix <- class(matrix)
if(class_matrix[[1]] == "list") {
matrix <- do.call(rbind, matrix)
}
## Get the characters
n_characters <- ncol(matrix)
## tree
check.class(tree, c("phylo", "multiPhylo"))
if(is(tree, "phylo")) {
tree <- list(tree)
class(tree) <- "multiPhylo"
}
## Check the tree and data
cleaned_data <- clean.data(matrix, tree)
if(!is.na(cleaned_data$dropped_tips) || !is.na(cleaned_data$dropped_rows)) {
stop(paste0("Some names in the data or the tree(s) are not matching.\nYou can use dispRity::clean.data(", as.expression(match_call$data), ", ", as.expression(match_call$tree), ") to find out more."), call. = FALSE)
}
## Find the node labels (and eventually add them to the trees)
node_labels <- lapply(tree, get.node.labels)
## Split the trees and the labels
tree <- lapply(node_labels, `[[`, 1)
class(tree) <- "multiPhylo"
node_labels <- lapply(node_labels, `[[`, 2)
#########
##
## Handle the other options (threshold, brlen, verbose, parallel, output, estimation.details)
##
#########
## threshold
check.class(threshold, c("logical", "numeric"))
if(is(threshold, "logical")) {
if(threshold) {
## Use the relative threshold function
threshold.type <- "relative"
} else {
## Use no threshold (just max)
threshold.type <- "max"
}
} else {
## Use an absolute threshold
threshold.type <- "absolute"
}
## brlen multiplier
if(!missing(brlen.multiplier)) {
## Check class
brlen.multiplier_class <- check.class(brlen.multiplier, c("numeric", "list", "integer"))
if(brlen.multiplier_class == "list") {
## Check the class and length of each list element
for(one_tree in 1:length(tree)) {
check.class(brlen.multiplier[[one_tree]], c("numeric", "integer"), msg = paste0(": brlen.multiplier[[", one_tree, "]] must contain ", Nedge(tree[[one_tree]]), " numeric values (number of edges)."))
check.length(brlen.multiplier[[one_tree]], Nedge(tree[[one_tree]]), msg = paste0(": brlen.multiplier[[", one_tree, "]] must contain ", Nedge(tree[[one_tree]]), " numeric values (number of edges)."))
}
} else {
## check the length
check.length(brlen.multiplier, Nedge(tree[[1]]), msg = paste0(" must contain ", Nedge(tree[[1]]), " values (number of edges)."))
## Replicate it for each tree
brlen.multiplier <- replicate(length(tree), brlen.multiplier, simplify = FALSE)
}
## Multiply the branch lengths
multiply.brlen <- function(tree, multiplier) {
tree$edge.length <- tree$edge.length * multiplier
return(tree)
}
tree <- mapply(multiply.brlen, tree, brlen.multiplier, SIMPLIFY = FALSE)
class(tree) <- "multiPhylo"
}
## verbose
check.class(verbose, "logical")
## Set up parallel arguments
check.class(parallel, c("logical", "numeric", "integer"))
if(is.logical(parallel)) {
do_parallel <- ifelse(parallel, TRUE, FALSE)
## Get the number of cores
if(do_parallel) {
cores <- parallel::detectCores() - 1
} else {
cores <- 1
}
} else {
check.class(parallel, "numeric")
check.length(parallel, 1, " must be logical or the number of cores to use.")
do_parallel <- TRUE
## Get the number of cores
cores <- parallel
}
#########
## Handle the characters
#########
## Preparing the data
if(verbose) cat("Preparing the data:.")
## Detecting the continuous or discrete characters
character_is_continuous <- logical()
## Looping to allow dropping the levels from matrix
for(col in 1:ncol(matrix)) {
character_is_continuous <- c(character_is_continuous, is.numeric(matrix[, col, drop = TRUE]))
}
do_discrete <- do_continuous <- FALSE
continuous_char_ID <- discrete_char_ID <- numeric()
## Split the matrices by character types
if(any(character_is_continuous)) {
## Split the matrix for continuous characters
matrix_continuous <- matrix[, character_is_continuous]
n_characters_continuous <- sum(character_is_continuous)
do_continuous <- TRUE
continuous_char_ID <- which(character_is_continuous)
}
if(any(!character_is_continuous)) {
## Split the matrix for discrete characters
matrix_discrete <- matrix[, !character_is_continuous]
## Convert into characters
matrix_discrete <- apply(matrix_discrete, 2, as.character)
rownames(matrix_discrete) <- rownames(matrix)
n_characters_discrete <- sum(!character_is_continuous)
do_discrete <- TRUE
discrete_char_ID <- which(!character_is_continuous)
}
## Correct input class if all continuous
if(do_continuous && !do_discrete && input_class == "data.frame") {
matrix <- as.matrix(matrix)
input_class <- "matrix"
}
## output
if(missing(output)) {
output <- class(matrix)[1]
} else {
check.class(output, "character")
available_methods <- c("matrix", "list", "combined", "combined.list", "combined.matrix", "dispRity")
check.method(output, available_methods, "output option")
## Combined
if(output == "combined") {
output <- paste(output, class(matrix)[1], sep = ".")
}
## Check dispRity
if(output == "dispRity" && do_discrete) {
stop("Only ancestral state estimations for continuous characters can be converted into a dispRity object.\nSelect an other output method.", call. = FALSE)
}
}
## Set data.frame output to matrix
if(output == "data.frame") {
output <- "matrix"
}
## Handle the tokens
# special.tokens <- character(); special.behaviours <- list() ; warning("DEBUG: multi.ace")
if(do_discrete) {
## Special tokens
if(missing(special.tokens)) {
special.tokens <- character()
}
check.class(special.tokens, "character")
not.exist <- function(special.tokens, token) {
name_token <- names(special.tokens[token])
return(is.null(name_token) || is.na(name_token))
}
if(not.exist(special.tokens, "missing")) {
special.tokens["missing"] <- "\\?"
}
if(not.exist(special.tokens, "inapplicable")) {
special.tokens["inapplicable"] <- "\\-"
}
if(not.exist(special.tokens, "polymorphism")) {
special.tokens["polymorphism"] <- "\\&"
}
if(not.exist(special.tokens, "uncertainty")) {
special.tokens["uncertainty"] <- "\\/"
}
## Checking for the reserved character
reserved <- c("\\@", "@") %in% special.tokens
if(any(reserved)) {
stop("special.tokens cannot contain the character '@' since it is reserved for the dispRity::char.diff function.", call. = FALSE)
}
## Checking whether the special.tokens are unique
if(length(unique(special.tokens)) != length(special.tokens)) {
stop("special.tokens cannot contain duplicated tokens.", call. = FALSE)
}
## If any special token is NA, convert them as "N.A" temporarily
if(any(is.na(special.tokens))) {
matrix_discrete <- ifelse(is.na(matrix_discrete), "N.A", matrix_discrete)
special.tokens[is.na(special.tokens)] <- "N.A"
}
## Special behaviours
if(missing(special.behaviours)) {
special.behaviours <- list()
}
check.class(special.behaviours, "list")
if(is.null(special.behaviours$missing)) {
special.behaviours$missing <- function(x,y) return(y)
}
if(is.null(special.behaviours$inapplicable)) {
special.behaviours$inapplicable <- function(x,y) return(y)
}
if(is.null(special.behaviours$polymorphism)) {
special.behaviours$polymorphism <- function(x,y) return(strsplit(x, split = "\\&")[[1]])
}
if(is.null(special.behaviours$uncertainty)) {
special.behaviours$uncertainty <- function(x,y) return(strsplit(x, split = "\\/")[[1]])
}
## Match the behaviours and tokens in the same order
special.behaviours <- special.behaviours[sort(names(special.behaviours))]
special.tokens <- special.tokens[sort(names(special.tokens))]
## Translate the characters using the special behaviours
characters_discrete <- unlist(apply(do.call(cbind, apply(matrix_discrete, 2, convert.bitwise, special.tokens, special.behaviours, bitwise = FALSE)), 2, list), recursive = FALSE)
if(verbose) cat(".")
## Get a list of character states
characters_states <- lapply(characters_discrete, function(char) sort(unique(na.omit(unlist(char)))))
if(verbose) cat(".")
## Find invariant characters
invariants <- which(lengths(characters_states) < 2)
## Handle invariant characters
if(length(invariants) > 0) {
invariants_ID <- discrete_char_ID[invariants]
has_invariants <- TRUE
## Stop if they are only invariant characters
if(do_continuous) {
if(length(invariants) == n_characters_discrete) {
warning(match_call$data, " contains only invariant discrete characters.")
}
} else {
if(length(invariants) == n_characters_discrete) {
stop.call(call = match_call$data, " contains only invariant characters.")
}
}
## Remove the characters
invariant_characters <- characters_discrete[invariants]
invariant_characters_states <- characters_states[invariants]
characters_discrete <- characters_discrete[-invariants]
characters_states <- characters_states[-invariants]
## Tell the user
invar_IDs <- paste0(invariants_ID, collapse = ", ")
warning(paste0("The character", ifelse(length(invariants) > 1, "s", "") , " ", invar_IDs, ifelse(length(invariants) > 1, " are", " is"), " invariant (using the current special behaviours for special characters) and", ifelse(length(invariants) > 1, " are", " is"), " simply duplicated for each node."), call. = FALSE)
} else {
invariants_ID <- integer()
has_invariants <- FALSE
invariant_characters_states <- NULL
}
if(verbose) cat(".")
## Get the character tables
characters_tables <- mapply(convert.char.table, characters_discrete, characters_states, SIMPLIFY = FALSE)
if(verbose) cat(".")
}
## Handle the continuous characters
if(do_continuous) {
## Make the continuous characters as lists
characters_continuous <- apply(matrix_continuous, 2, list)
if(verbose) cat(".")
}
if(verbose) cat("Done.\n")
#########
## Handle the models for each character
#########
## Default (missing models)
if(missing(models)) {
if(do_discrete) {
models_discrete <- replicate(n_characters_discrete, "ER", simplify = FALSE)
}
if(do_continuous) {
models_continuous <- replicate(n_characters_continuous, set.continuous.args.ace(), simplify = FALSE)
}
} else {
## Input models
models_class <- check.class(models, c("character", "list", "matrix"))
## Models is a vector of models
if(models_class == "character") {
## Check the different models
available_models_discrete <- c("ER", "SYM", "ARD", "SUEDE", "SRD")
available_models_continuous <- c("BM", "REML", "ML", "pic")
## Unique model
if(length(models) == 1) {
if(do_discrete && !do_continuous) {
check.method(models, available_models_discrete, msg = "model applied to all discrete characters")
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
if(!do_discrete && do_continuous) {
check.method(models, available_models_continuous, msg = "model applied to all continuous characters")
models_continuous <- set.continuous.args.ace.models(models, n = n_characters_continuous)
}
if(do_discrete && do_continuous) {
stop("Only one model is specified but both discrete and continuous characters are detected.", call. = FALSE)
}
} else {
## Vector of models
if(length(models) != n_characters) {
stop(paste0("Incorrect number of models specified: ", length(models), " models for ", n_characters, " characters."), call. = FALSE)
} else {
check.method(models, c(available_models_discrete, available_models_continuous), msg = "models applied to characters")
## Check models per character types
## Discrete
if(do_discrete) {
if(sum(models %in% available_models_discrete) != n_characters_discrete) {
stop(paste0("Incorrect number of models specified: ", sum(models %in% available_models_discrete), " models for ", n_characters, " discrete characters."), call. = FALSE)
} else {
## Discrete models (valid)
models_discrete <- as.list(models[models %in% available_models_discrete])
}
}
## Continuous
if(do_continuous) {
if(sum(models %in% available_models_continuous) != n_characters_continuous) {
stop(paste0("Incorrect number of models specified: ", sum(models %in% available_models_continuous), " models for ", n_characters, " continuous characters."), call. = FALSE)
} else {
## Continuous models (valid)
models_continuous <- sapply(models[models %in% available_models_continuous], set.continuous.args.ace.models, n = 1)
}
}
}
}
}
## Models is a transition matrix (discrete only)
if(models_class == "matrix") {
if(do_continuous) {
stop("Transition matrices can be used as models only for discrete characters.", call. = FALSE)
} else {
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
}
## Models is a complicated list
if(models_class == "list") {
if(length(models) == 1) {
if(do_discrete && do_continuous) {
stop("Only one model is specified but both discrete and continuous characters are detected.", call. = FALSE)
}
## Set the models for discrete
if(do_discrete) {
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
## Set the models for continuous
if(do_continuous) {
models_continuous <- replicate(n_characters_continuous, do.call(set.continuous.args.ace, models), simplify = FALSE)
}
}
## Models is a list of models
check.length(models, n_characters, msg = paste0(" list must be the same length as the number of characters (", n_characters, ")."))
## Separate the models per type
if(do_discrete) {
models_discrete <- models[discrete_char_ID]
}
if(do_continuous) {
models_continuous <- models[continuous_char_ID]
## Format correctly
models_continuous <- lapply(models_continuous, function(x) do.call(set.continuous.args.ace, x))
}
}
## Remove invariant characters
if(do_discrete && has_invariants) {
## Remove the models
models_discrete <- models_discrete[-invariants]
}
}
if(do_discrete && has_invariants) {
models_discrete <- models_discrete[-invariants]
}
#########
##
## Handle the options
##
#########
if(missing(options.args)) {
## No options
options.ace <- options.castor <- options.args <- NULL
} else {
## must be list with names
check.class(options.args, "list")
options_error <- "options.args must be an unambiguous named list of options for castor::asr_mk_model() or ape::ace()."
## Check the available names
options_avail <- c(names(formals(castor::asr_mk_model)), names(formals(ape::ace)))
if(is.null(names(options.args)) || !all(names(options.args) %in% options_avail)) {
stop(options_error, call. = FALSE)
}
## Sort the options
options.ace <- options.castor <- NULL
if(do_continuous) {
options.ace <- options.args[names(options.args) %in% names(formals(ape::ace))]
if(length(options.ace) == 0) {
options.ace <- NULL
}
}
if(do_discrete) {
options.castor <- options.args[names(options.args) %in% names(formals(castor::asr_mk_model))]
if(length(options.castor) == 0) {
options.castor <- NULL
}
}
}
## Check the estimation details
if(!is.null(estimation.details)) {
## The return args from castor::asr_mk_model (1.6.6)
return_args_discrete <- c("success", "Nstates", "transition_matrix", "loglikelihood", "ancestral_likelihoods")
return_args_continuous <- c("CI95", "sigma2", "loglik")
if(do_discrete && do_continuous) {
return_args <- c(return_args_discrete, return_args_continuous)
} else {
if(do_discrete) {
return_args <- return_args_discrete
}
if(do_continuous) {
return_args <- return_args_continuous
}
}
## Check the requested details
check.method(estimation.details, return_args, msg = "estimation.details")
} else {
return_args_discrete <- return_args_continuous <- NULL
}
#########
##
## set the arguments for calls
##
#########
## Setting the continuous characters call
if(do_continuous) {
## Create the character arguments
character_continuous_args <- mapply(function(character, ace.args, options = NULL) return(c(x = character, ace.args, options)), characters_continuous, models_continuous, MoreArgs = list(options = options.ace), SIMPLIFY = FALSE)
## Create the character and tree arguments
tree_character_continuous_args <- list()
for(one_tree in 1:length(tree)) {
tree_character_continuous_args[[one_tree]] <- lapply(character_continuous_args, function(character, tree) {character$phy <- tree; return(character)}, tree[[one_tree]])
}
## Set verbose fun
if(verbose) {
fun_continuous <- function(...) {
cat(".")
return(ape::ace(...))
}
} else {
fun_continuous <- ape::ace
}
}
## Setting the discrete characters call
if(do_discrete) {
## Set the details to return (if any)
if(any(return_args_discrete %in% estimation.details)) {
details_out <- return_args_discrete[return_args_discrete %in% estimation.details]
} else {
details_out <- NULL
}
## Set up the arguments for one tree
character_discrete_args <- mapply(make.args, characters_tables, characters_states, models_discrete, MoreArgs = list(estimation.details = details_out, castor.options = options.castor), SIMPLIFY = FALSE)
## Create the character and tree arguments
tree_character_discrete_args <- list()
for(one_tree in 1:length(tree)) {
tree_character_discrete_args[[one_tree]] <- lapply(character_discrete_args, function(character, tree) {character$tree <- tree; return(character)}, tree[[one_tree]])
}
}
#########
##
## run the calls
##
#########
if(do_parallel) {
## Remove verbose
if(verbose) {
cat(paste0("Running the estimation for ", length(tree), " tree", ifelse(length(tree) > 1, "s ", " "), "using ", cores, " core", ifelse(cores == 1, "...", "s...")))
was_verbose <- TRUE
verbose <- FALSE
} else {
was_verbose <- FALSE
}
## Set up the cluster
cluster <- parallel::makeCluster(cores)
## Get the current environment
current_env <- environment()
export_arguments_list <- export_functions_list <- character()
if(do_discrete) {
## Get the export lists
export_arguments_list <- c("tree_character_discrete_args",
"special.tokens",
"invariants",
"threshold.type",
"threshold",
"verbose",
"characters_states",
"invariant_characters_states")
export_functions_list <- c("one.tree.ace",
"castor.ace",
"tree.data.update",
"add.state.names",
"translate.likelihood")
}
if(do_continuous) {
export_arguments_list <- c(export_arguments_list, "tree_character_continuous_args")
export_functions_list <- c(export_functions_list, "fun_continuous")
}
## Export from this environment
parallel::clusterExport(cluster, c(export_arguments_list, export_functions_list), envir = current_env)
## Call the cluster
if(do_discrete) {
discrete_estimates <- parLapply(cl = cluster, tree_character_discrete_args, one.tree.ace, special.tokens, invariants, characters_states, threshold.type, threshold, invariant_characters_states, verbose)
}
if(do_continuous) {
continuous_estimates <- parLapply(cl = cluster, tree_character_continuous_args, lapply, function(x) do.call(fun_continuous, x))
## Remove the ugly call
continuous_estimates <- lapply(continuous_estimates, lapply, function(x) {x$call <- "ape::ace"; return(x)})
## Add node labels
# TODO: to be removed if ape update
add.nodes <- function(obj, phy) {
## Adding node labels to $ace and $CI95 (if available)
options(warn = -1)
names <- as.integer(names(obj$ace))
options(warn = 0)
if(!is.null(phy$node.label) && !is.na(names[1])) {
ordered_node_labels <- phy$node.label[c(as.integer(names(obj$ace))- Ntip(phy))]
names(obj$ace) <- ordered_node_labels
if(!is.null(obj$CI95)) {
rownames(obj$CI95) <- ordered_node_labels
}
}
return(obj)
}
for(one_tree in 1:length(tree)) {
continuous_estimates[[one_tree]] <- lapply(continuous_estimates[[one_tree]], add.nodes, phy = tree[[one_tree]])
}
}
## Stop the cluster
parallel::stopCluster(cluster)
## Reactivate the verbose
if(was_verbose) {
cat("Done.")
verbose <- TRUE
}
} else {
## Make the functions verbose
if(verbose) cat("Running ancestral states estimations:")
## Run the continuous characters
if(do_continuous) {
## Run all the ace
continuous_estimates <- lapply(tree_character_continuous_args, lapply, function(x) do.call(fun_continuous, x))
## Remove the ugly call
continuous_estimates <- lapply(continuous_estimates, lapply, function(x) {x$call <- "ape::ace"; return(x)})
## Add node labels
# TODO: to be removed if ape update
add.nodes <- function(obj, phy) {
## Adding node labels to $ace and $CI95 (if available)
options(warn = -1)
names <- as.integer(names(obj$ace))
options(warn = 0)
if(!is.null(phy$node.label) && !is.na(names[1])) {
ordered_node_labels <- phy$node.label[c(as.integer(names(obj$ace))- Ntip(phy))]
names(obj$ace) <- ordered_node_labels
if(!is.null(obj$CI95)) {
rownames(obj$CI95) <- ordered_node_labels
}
}
return(obj)
}
for(one_tree in 1:length(tree)) {
continuous_estimates[[one_tree]] <- lapply(continuous_estimates[[one_tree]], add.nodes, phy = tree[[one_tree]])
}
}
## Run the discrete characters
if(do_discrete) {
## Run all the ace for discrete
discrete_estimates <- lapply(tree_character_discrete_args, one.tree.ace, special.tokens, invariants, characters_states, threshold.type, threshold, invariant_characters_states, verbose)
}
if(verbose) cat("Done.\n")
}
#########
##
## handle the outputs
##
#########
## Handle the continuous characters
if(do_continuous) {
## Get the results in a matrix format
results_continuous <- lapply(lapply(continuous_estimates, lapply, `[[`, "ace"), function(x) do.call(cbind, x))
## Get the details for continuous
if(any(return_args_continuous %in% estimation.details)) {
## Get which details to grep
details_out <- return_args_continuous[return_args_continuous %in% estimation.details]
## Get the details
details_continuous <- lapply(continuous_estimates, lapply, function(x, details_out) return(x[details_out]), details_out)
} else {
details_continuous <- NULL
}
}
if(do_discrete) {
## Get the results in a matrix format
results_discrete <- lapply(lapply(discrete_estimates, `[[`, 1), function(x) do.call(cbind, x))
## Get the details
details_discrete <- lapply(discrete_estimates, `[[`, 2)
}
## Handle output
## Combine the traits
if(do_discrete && do_continuous) {
## Combine the traits
results_out <- mapply(bind.characters, results_continuous, results_discrete,
MoreArgs = list(order = list("continuous" = continuous_char_ID, "discrete" = unique(c(discrete_char_ID, invariants_ID)))),
SIMPLIFY = FALSE)
## Return the details per characters
if(is.null(details_continuous)) {
## Make a list of nulls
details_continuous <- replicate(length(tree), lapply(as.list(1:n_characters_continuous), function(x) return(NULL)), simplify = FALSE)
}
if(is.null(details_discrete[[1]])) {
## Make a list of nulls
details_discrete <- replicate(length(tree), lapply(as.list(1:n_characters_discrete), function(x) return(NULL)), simplify = FALSE)
}
details_out <- mapply(bind.details, details_continuous, details_discrete,
MoreArgs = list(order = list("continuous" = continuous_char_ID, "discrete" = unique(c(discrete_char_ID, invariants_ID)))),
SIMPLIFY = FALSE)
}
if(do_discrete && !do_continuous) {
results_out <- results_discrete
details_out <- details_discrete
}
if(do_continuous && !do_discrete) {
results_out <- results_continuous
details_out <- details_continuous
}
## Handle output
output_return <- switch(output,
matrix = results_out,
list = lapply(results_out, make.list),
combined.matrix = lapply(results_out, add.tips, matrix = matrix),
combined.list = lapply(lapply(results_out, add.tips, matrix = matrix), make.list),
dispRity = make.dispRity(data = lapply(results_out, add.tips, matrix = matrix), tree = tree)
)
## Results out
if(is.null(estimation.details)) {
if(length(tree) == 1) {
return(output_return[[1]])
} else {
return(output_return)
}
} else {
if(length(tree) == 1) {
return(list(estimations = output_return[[1]], details = details_out[[1]]))
} else {
return(list(estimations = output_return, details = details_out))
}
}
}
# matrix <- matrix(sample(c("0", "1", "0&1", NA),
# 500, replace = TRUE), nrow = 10, dimnames =
# list(paste0("t", 1:10), c()))
# trees <- rmtree(10, 10)
# claddis.wrapper <- function(tree, matrix) {
# return(estimate_ancestral_states(matrix, tree,
# estimate_all_nodes = TRUE))
# }
# # Reformat for use elsewhere in Claddis:
# Claddis_matrix <- build_cladistic_matrix(matrix)
# serial_start <- Sys.time()
# ancestral_states <- multi.ace(data = matrix, tree = trees, special.tokens = c("missing" = NA))
# serial_end <- Sys.time()
# ## The same example but running in parallel
# parallel_start <- Sys.time()
# ancestral_states <- multi.ace(matrix, trees, special.tokens = c("missing" = NA))
# parallel_end <- Sys.time()
# claddis_start <- Sys.time()
# test <- lapply(trees, claddis.wrapper, Claddis_matrix)
# claddis_end <- Sys.time()
# serial_time <- serial_end-serial_start
# parallel_time <- parallel_end-parallel_start
# claddis_time <- claddis_end-claddis_start
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.