Nothing
#' @title Ancestral states estimations with multiple trees
#'
#' @description Fast ancestral states estimations run on multiple trees using the Mk model from castor::asr_mk_model.
#'
#' @param data A \code{matrix}, \code{data.frame} or \code{list} with the characters for each taxa.
#' @param tree A \code{phylo} or \code{mutiPhylo} object (if the \code{tree} argument contains node labels, they will be used to name the output).
#' @param models A \code{character} vector, unambiguous named \code{list} or \code{matrix} to be passed as model arguments to \code{castor::asr_mk_model} or \code{ape::ace} (see details).
#' @param threshold either \code{logical} for applying a relative threshold (\code{TRUE} - default) or no threshold (\code{FALSE}) or a \code{numeric} value of the threshold (e.g. 0.95). See details.
#' @param special.tokens optional, a named \code{vector} of special tokens to be passed to \code{\link[base]{grep}} (make sure to protect the character with \code{"\\\\"}). By default \code{special.tokens <- c(missing = "\\\\?", inapplicable = "\\\\-", polymorphism = "\\\\&", uncertainty = "\\\\/")}. Note that \code{NA} values are not compared and that the symbol "@" is reserved and cannot be used.
#' @param special.behaviours optional, a \code{list} of one or more functions for a special behaviour for \code{special.tokens}. See details.
#' @param brlen.multiplier optional, a vector of branch length modifiers (e.g. to convert time branch length in changes branch length) or a list of vectors (the same length as \code{tree}).
#' @param verbose \code{logical}, whether to be verbose (\code{TRUE}) or not (\code{FALSE} - default).
#' @param parallel Either a \code{logical}, whether to use parallel algorithm (\code{TRUE}) or not (\code{FALSE} - default); or directly an \code{integer} indicating the number of cores to use (note that if \code{parallel = 1}, one core will be used but the parallel integration will still be called).
#' @param output optional, see \code{Value} section below.
#' @param options.args optional, a named list of options to be passed to function called by \code{castor::asr_mk_model}.
#' @param estimation.details optional, whether to also return the details for each estimation as returned by \code{castor::asr_mk_model} or \code{ape::ace}. This argument can be left \code{NULL} (default) or be any combination of the elements returned by \code{castor::asr_mk_model} or \code{ape::ace} (e.g. \code{c("loglikelihood", "transition_matrix", "CI95")}).
#'
#' @details
#'
#' Depending on the type of characters \code{models} argument can be either:
#' \itemize{
#' \item the name of a single model to apply to all characters (if all characters are discrete or all are continuous); see below for the list of available names. For example \code{models = "ER"} applies the Equal Rates model to all characters (assuming they are all discrete characters).
#' \item a vector of model names to apply to different type of characters (see below for the list). For example \code{models = c("ER", "ER", "BM")} applies the Equal Rates model to the two first characters (discrete) and the \code{"BM"} model to the third character (continuous).
#' \item a transition \code{"matrix"} to be applied to all characters (if discrete). For example \code{models = matrix(0.2, 2, 2)}.
#' \item an single named list of arguments to be applied to all characters by passing it to \code{ape::ace} (if continuous). For example \code{models = list(method = "GLS", corStruct = corBrownian(1, my_tree))}.
#' \item an un-ambiguous list of arguments to be passed to either \code{castor::asr_mk_model} (discrete characters) or \code{ape::ace} (continuous characters). For example \code{models = list("char1" = list(transition_matrix = matrix(0.2, 2, 2)), "char2" = list(method = "GLS", corStruct = corBrownian(1, my_tree)))} to be specifically passed to the characters named "char1" and "char2"
#'}
#'
#' The available built-in models for discrete characters in \code{castor::asr_mk_model} are:
#' \itemize{
#' \item \code{"ER"} for all equal rates
#' \item \code{"SYM"} for symmetric rates
#' \item \code{"ARD"} all rates are different
#' \item \code{"SUEDE"} equal stepwise transitions (e.g. for meristic/counting characters)
#' \item \code{"SRD"} different stepwise transitions
#' }
#' See directly \code{castor::asr_mk_model} for more models.
#'
#' The available built-in models and methods for continuous characters in \code{ape::ace} are:
#' \itemize{
#' \item \code{"BM"} model: for a default Brownian Motion with the "REML" method
#' \item \code{"REML"} method: for a default Brownian Motion with the "REML" method (same as above)
#' \item \code{"ML"} method: for a default Brownian Motion with the "ML" method
#' \item \code{"pic"} method: for a default Brownian Motion with the "pic" (least squared) method
#'}
#'
#' The \code{threshold} option allows to convert ancestral states likelihoods into discrete states. When \code{threshold = FALSE}, the ancestral state estimated is the one with the highest likelihood (or at random if likelihoods are equal). When \code{threshold = TRUE}, the ancestral state estimated are all the ones that are have a scaled likelihood greater than the maximum observed scaled likelihood minus the inverse number of possible states (i.e. \code{select_state >= (max(likelihood) - 1/n_states)}). This option makes the threshold selection depend on the number of states (i.e. if there are more possible states, a lower scaled likelihood for the best state is expected). Finally using a numerical value for the threshold option (e.g. \code{threshold = 0.95}) will simply select only the ancestral states estimates with a scaled likelihood equal or greater than the designated value. This option makes the threshold selection absolute. Regardless, if more than one value is select, the uncertainty token (\code{special.tokens["uncertainty"]}) will be used to separate the states. If no value is selected, the uncertainty token will be use between all observed characters (\code{special.tokens["uncertainty"]}).
#'
#' \code{special.behaviours} allows to generate a special rule for the \code{special.tokens}. The functions should can take the arguments \code{character, all_states} with \code{character} being the character that contains the special token and \code{all_states} for the character (which is automatically detected by the function). By default, missing data returns and inapplicable returns all states, and polymorphisms and uncertainties return all present states.
#' \itemize{
#' \item{\code{missing = function(x,y) y}}
#' \item{\code{inapplicable = function(x,y) y}}
#' \item{\code{polymorphism = function(x,y) strsplit(x, split = "\\\\&")[[1]]}}
#' \item{\code{uncertainty = function(x,y) strsplit(x, split = "\\\\/")[[1]]}}
#' }
#'
#' Functions in the list must be named following the special token of concern (e.g. \code{missing}), have only \code{x, y} as inputs and a single output a single value (that gets coerced to \code{integer} automatically). For example, the special behaviour for the special token \code{"?"} can be coded as: \code{special.behaviours = list(missing = function(x, y) return(NA)} to make ignore the character for taxa containing \code{"?"}.
#'
#' When using the parallel option (either through using \code{parallel = TRUE} by using the number of available cores minus on or manually setting the number of cores - e.g. \code{parallel = 5}), the \code{castor::asr_mk_model} function will use the designated number of cores (using the option \code{Nthreads = <requested_number_of_cores>}). Additionally, if the input \code{tree} is a \code{"multiPhylo"} object, the trees will be run in parallel for each number of cores, thus decreasing computation time accordingly (e.g. if 3 cores are requested and \code{tree} contains 12 \code{"phylo"} objects, 4 different \code{"phylo"} objects will be run in parallel on the 3 cores making the calculation around 3 times faster).
#'
#' @return
#' Returns a \code{"matrix"} or \code{"list"} of ancestral states. By default, the function returns the ancestral states in the same format as the input \code{matrix}. This can be changed using the option \code{output = "matrix"} or \code{"list"} to force the class of the output.
#' To output the combined ancestral states and input, you can use \code{"combined"} (using the input format) or \code{"combined.matrix"} or \code{"combined.list"}.
#' If using continuous characters only, you can use the output option \code{"dispRity"} to directly output a usable \code{dispRity} object with all trees and all the data (estimated and input).
#' \emph{NOTE} that if the input data had multiple character types (continuous and discrete) and that \code{"matrix"} or \code{"combined.matrix"} output is requested, the function returns a \code{"data.frame"}.
#'
#' @examples
#' set.seed(42)
#' ## A simple example:
#' ## A random tree with 10 tips
#' tree <- rcoal(10)
#' ## Setting up the parameters
#' my_rates = c(rgamma, rate = 10, shape = 5)
#'
#' ## A random Mk matrix (10*50)
#' matrix_simple <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates,
#' invariant = FALSE)
#'
#' ## Run a basic ancestral states estimations
#' ancestral_states <- multi.ace(matrix_simple, tree)
#' ancestral_states[1:5, 1:5]
#'
#' ## A more complex example
#' ## Create a multiple list of 5 trees
#' multiple_trees <- rmtree(5, 10)
#'
#' ## Modify the matrix to contain missing and special data
#' matrix_complex <- matrix_simple
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "-"
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "0%2"
#' matrix_complex[sample(1:length(matrix_complex), 50)] <- "?"
#' matrix_complex[1:5,1:5]
#'
#' ## Set a list of extra special tokens
#' my_spec_tokens <- c("weirdtoken" = "%")
#'
#' ## Set some special behaviours for the "weirdtoken" and for "-" and "?"
#' my_spec_behaviours <- list()
#' ## Inapplicable tokens "-" are ignored
#' my_spec_behaviours$inapplicable <- function(x,y) return(NA)
#' ## Missing tokens "?" are considered as all states
#' my_spec_behaviours$missing <- function(x,y) return(y)
#' ## Weird tokens are considered as state 0 and 3
#' my_spec_behaviours$weirdtoken <- function(x,y) return(c(1,2))
#'
#' ## Create a random branch length modifier to apply to each tree
#' branch_lengths <- rnorm(18)^2
#'
#' ## Setting a list of model ("ER" for the 25 first characters and then "SYM")
#' my_models <- c(rep("ER", 25), rep("SYM", 25))
#'
#' ## Run the ancestral states on all the tree with multiple options
#' ancestral_states <- multi.ace(matrix_complex, multiple_trees,
#' verbose = TRUE,
#' models = my_models,
#' threshold = 0.95,
#' special.tokens = my_spec_tokens,
#' special.behaviours = my_spec_behaviours,
#' brlen.multiplier = branch_lengths,
#' output = "combined.matrix")
#'
#' ## The results for the the two first characters for the first tree
#' ancestral_states[[1]][, 1:2]
#'
#' \dontrun{
#' ## The same example but running in parallel
#' ancestral_states <- multi.ace(matrix_complex, multiple_trees,
#' verbose = TRUE,
#' models = my_models,
#' threshold = 0.95,
#' special.tokens = my_spec_tokens,
#' special.behaviours = my_spec_behaviours,
#' brlen.multiplier = branch_lengths,
#' output = "combined.matrix",
#' parallel = TRUE)
#' }
#' @seealso
#' \code{castor::asr_mk_model}, \code{char.diff}
# \code{fit.ace.model},
#'
#' @author Thomas Guillerme
#' @export
multi.ace <- function(data, tree, models, threshold = TRUE, special.tokens, special.behaviours, brlen.multiplier, verbose = FALSE, parallel = FALSE, output, options.args, estimation.details = NULL) {
match_call <- match.call()
## SANITIZING
## matrix
matrix <- data
input_class <- check.class(matrix, c("matrix", "list", "data.frame"))
## Convert the matrix if not a list
class_matrix <- class(matrix)
if(class_matrix[[1]] == "list") {
matrix <- do.call(rbind, matrix)
}
## Get the characters
n_characters <- ncol(matrix)
## tree
check.class(tree, c("phylo", "multiPhylo"))
if(is(tree, "phylo")) {
tree <- list(tree)
class(tree) <- "multiPhylo"
}
## Check the tree and data
cleaned_data <- clean.data(matrix, tree)
if(!is.na(cleaned_data$dropped_tips) || !is.na(cleaned_data$dropped_rows)) {
stop(paste0("Some names in the data or the tree(s) are not matching.\nYou can use dispRity::clean.data(", as.expression(match_call$data), ", ", as.expression(match_call$tree), ") to find out more."), call. = FALSE)
}
## Find the node labels (and eventually add them to the trees)
node_labels <- lapply(tree, get.node.labels)
## Split the trees and the labels
tree <- lapply(node_labels, `[[`, 1)
class(tree) <- "multiPhylo"
node_labels <- lapply(node_labels, `[[`, 2)
#########
##
## Handle the other options (threshold, brlen, verbose, parallel, output, estimation.details)
##
#########
## threshold
check.class(threshold, c("logical", "numeric"))
if(is(threshold, "logical")) {
if(threshold) {
## Use the relative threshold function
threshold.type <- "relative"
} else {
## Use no threshold (just max)
threshold.type <- "max"
}
} else {
## Use an absolute threshold
threshold.type <- "absolute"
}
## brlen multiplier
if(!missing(brlen.multiplier)) {
## Check class
brlen.multiplier_class <- check.class(brlen.multiplier, c("numeric", "list", "integer"))
if(brlen.multiplier_class == "list") {
## Check the class and length of each list element
for(one_tree in 1:length(tree)) {
check.class(brlen.multiplier[[one_tree]], c("numeric", "integer"), msg = paste0(": brlen.multiplier[[", one_tree, "]] must contain ", Nedge(tree[[one_tree]]), " numeric values (number of edges)."))
check.length(brlen.multiplier[[one_tree]], Nedge(tree[[one_tree]]), msg = paste0(": brlen.multiplier[[", one_tree, "]] must contain ", Nedge(tree[[one_tree]]), " numeric values (number of edges)."))
}
} else {
## check the length
check.length(brlen.multiplier, Nedge(tree[[1]]), msg = paste0(" must contain ", Nedge(tree[[1]]), " values (number of edges)."))
## Replicate it for each tree
brlen.multiplier <- replicate(length(tree), brlen.multiplier, simplify = FALSE)
}
## Multiply the branch lengths
multiply.brlen <- function(tree, multiplier) {
tree$edge.length <- tree$edge.length * multiplier
return(tree)
}
tree <- mapply(multiply.brlen, tree, brlen.multiplier, SIMPLIFY = FALSE)
class(tree) <- "multiPhylo"
}
## verbose
check.class(verbose, "logical")
## Set up parallel arguments
check.class(parallel, c("logical", "numeric", "integer"))
if(is.logical(parallel)) {
do_parallel <- ifelse(parallel, TRUE, FALSE)
## Get the number of cores
if(do_parallel) {
cores <- parallel::detectCores() - 1
} else {
cores <- 1
}
} else {
check.class(parallel, "numeric")
check.length(parallel, 1, " must be logical or the number of cores to use.")
do_parallel <- TRUE
## Get the number of cores
cores <- parallel
}
#########
## Handle the characters
#########
## Preparing the data
if(verbose) cat("Preparing the data:.")
## Detecting the continuous or discrete characters
character_is_continuous <- logical()
## Looping to allow dropping the levels from matrix
for(col in 1:ncol(matrix)) {
character_is_continuous <- c(character_is_continuous, is.numeric(matrix[, col, drop = TRUE]))
}
do_discrete <- do_continuous <- FALSE
continuous_char_ID <- discrete_char_ID <- numeric()
## Split the matrices by character types
if(any(character_is_continuous)) {
## Split the matrix for continuous characters
matrix_continuous <- matrix[, character_is_continuous]
n_characters_continuous <- sum(character_is_continuous)
do_continuous <- TRUE
continuous_char_ID <- which(character_is_continuous)
}
if(any(!character_is_continuous)) {
## Split the matrix for discrete characters
matrix_discrete <- matrix[, !character_is_continuous]
## Convert into characters
matrix_discrete <- apply(matrix_discrete, 2, as.character)
rownames(matrix_discrete) <- rownames(matrix)
n_characters_discrete <- sum(!character_is_continuous)
do_discrete <- TRUE
discrete_char_ID <- which(!character_is_continuous)
}
## Correct input class if all continuous
if(do_continuous && !do_discrete && input_class == "data.frame") {
matrix <- as.matrix(matrix)
input_class <- "matrix"
}
## output
if(missing(output)) {
output <- class(matrix)[1]
} else {
check.class(output, "character")
available_methods <- c("matrix", "list", "combined", "combined.list", "combined.matrix", "dispRity")
check.method(output, available_methods, "output option")
## Combined
if(output == "combined") {
output <- paste(output, class(matrix)[1], sep = ".")
}
## Check dispRity
if(output == "dispRity" && do_discrete) {
stop("Only ancestral state estimations for continuous characters can be converted into a dispRity object.\nSelect an other output method.", call. = FALSE)
}
}
## Set data.frame output to matrix
if(output == "data.frame") {
output <- "matrix"
}
## Handle the tokens
# special.tokens <- character(); special.behaviours <- list() ; warning("DEBUG: multi.ace")
if(do_discrete) {
## Special tokens
if(missing(special.tokens)) {
special.tokens <- character()
}
check.class(special.tokens, "character")
not.exist <- function(special.tokens, token) {
name_token <- names(special.tokens[token])
return(is.null(name_token) || is.na(name_token))
}
if(not.exist(special.tokens, "missing")) {
special.tokens["missing"] <- "\\?"
}
if(not.exist(special.tokens, "inapplicable")) {
special.tokens["inapplicable"] <- "\\-"
}
if(not.exist(special.tokens, "polymorphism")) {
special.tokens["polymorphism"] <- "\\&"
}
if(not.exist(special.tokens, "uncertainty")) {
special.tokens["uncertainty"] <- "\\/"
}
## Checking for the reserved character
reserved <- c("\\@", "@") %in% special.tokens
if(any(reserved)) {
stop("special.tokens cannot contain the character '@' since it is reserved for the dispRity::char.diff function.", call. = FALSE)
}
## Checking whether the special.tokens are unique
if(length(unique(special.tokens)) != length(special.tokens)) {
stop("special.tokens cannot contain duplicated tokens.", call. = FALSE)
}
## If any special token is NA, convert them as "N.A" temporarily
if(any(is.na(special.tokens))) {
matrix_discrete <- ifelse(is.na(matrix_discrete), "N.A", matrix_discrete)
special.tokens[is.na(special.tokens)] <- "N.A"
}
## Special behaviours
if(missing(special.behaviours)) {
special.behaviours <- list()
}
check.class(special.behaviours, "list")
if(is.null(special.behaviours$missing)) {
special.behaviours$missing <- function(x,y) return(y)
}
if(is.null(special.behaviours$inapplicable)) {
special.behaviours$inapplicable <- function(x,y) return(y)
}
if(is.null(special.behaviours$polymorphism)) {
special.behaviours$polymorphism <- function(x,y) return(strsplit(x, split = "\\&")[[1]])
}
if(is.null(special.behaviours$uncertainty)) {
special.behaviours$uncertainty <- function(x,y) return(strsplit(x, split = "\\/")[[1]])
}
## Match the behaviours and tokens in the same order
special.behaviours <- special.behaviours[sort(names(special.behaviours))]
special.tokens <- special.tokens[sort(names(special.tokens))]
## Translate the characters using the special behaviours
characters_discrete <- unlist(apply(do.call(cbind, apply(matrix_discrete, 2, convert.bitwise, special.tokens, special.behaviours, bitwise = FALSE)), 2, list), recursive = FALSE)
if(verbose) cat(".")
## Get a list of character states
characters_states <- lapply(characters_discrete, function(char) sort(unique(na.omit(unlist(char)))))
if(verbose) cat(".")
## Find invariant characters
invariants <- which(lengths(characters_states) < 2)
## Handle invariant characters
if(length(invariants) > 0) {
invariants_ID <- discrete_char_ID[invariants]
has_invariants <- TRUE
## Stop if they are only invariant characters
if(do_continuous) {
if(length(invariants) == n_characters_discrete) {
warning(match_call$data, " contains only invariant discrete characters.")
}
} else {
if(length(invariants) == n_characters_discrete) {
stop.call(call = match_call$data, " contains only invariant characters.")
}
}
## Remove the characters
invariant_characters <- characters_discrete[invariants]
invariant_characters_states <- characters_states[invariants]
characters_discrete <- characters_discrete[-invariants]
characters_states <- characters_states[-invariants]
## Tell the user
invar_IDs <- paste0(invariants_ID, collapse = ", ")
warning(paste0("The character", ifelse(length(invariants) > 1, "s", "") , " ", invar_IDs, ifelse(length(invariants) > 1, " are", " is"), " invariant (using the current special behaviours for special characters) and", ifelse(length(invariants) > 1, " are", " is"), " simply duplicated for each node."), call. = FALSE)
} else {
invariants_ID <- integer()
has_invariants <- FALSE
invariant_characters_states <- NULL
}
if(verbose) cat(".")
## Get the character tables
characters_tables <- mapply(convert.char.table, characters_discrete, characters_states, SIMPLIFY = FALSE)
if(verbose) cat(".")
}
## Handle the continuous characters
if(do_continuous) {
## Make the continuous characters as lists
characters_continuous <- apply(matrix_continuous, 2, list)
if(verbose) cat(".")
}
if(verbose) cat("Done.\n")
#########
## Handle the models for each character
#########
## Default (missing models)
if(missing(models)) {
if(do_discrete) {
models_discrete <- replicate(n_characters_discrete, "ER", simplify = FALSE)
}
if(do_continuous) {
models_continuous <- replicate(n_characters_continuous, set.continuous.args.ace(), simplify = FALSE)
}
} else {
## Input models
models_class <- check.class(models, c("character", "list", "matrix"))
## Models is a vector of models
if(models_class == "character") {
## Check the different models
available_models_discrete <- c("ER", "SYM", "ARD", "SUEDE", "SRD")
available_models_continuous <- c("BM", "REML", "ML", "pic")
## Unique model
if(length(models) == 1) {
if(do_discrete && !do_continuous) {
check.method(models, available_models_discrete, msg = "model applied to all discrete characters")
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
if(!do_discrete && do_continuous) {
check.method(models, available_models_continuous, msg = "model applied to all continuous characters")
models_continuous <- set.continuous.args.ace.models(models, n = n_characters_continuous)
}
if(do_discrete && do_continuous) {
stop("Only one model is specified but both discrete and continuous characters are detected.", call. = FALSE)
}
} else {
## Vector of models
if(length(models) != n_characters) {
stop(paste0("Incorrect number of models specified: ", length(models), " models for ", n_characters, " characters."), call. = FALSE)
} else {
check.method(models, c(available_models_discrete, available_models_continuous), msg = "models applied to characters")
## Check models per character types
## Discrete
if(do_discrete) {
if(sum(models %in% available_models_discrete) != n_characters_discrete) {
stop(paste0("Incorrect number of models specified: ", sum(models %in% available_models_discrete), " models for ", n_characters, " discrete characters."), call. = FALSE)
} else {
## Discrete models (valid)
models_discrete <- as.list(models[models %in% available_models_discrete])
}
}
## Continuous
if(do_continuous) {
if(sum(models %in% available_models_continuous) != n_characters_continuous) {
stop(paste0("Incorrect number of models specified: ", sum(models %in% available_models_continuous), " models for ", n_characters, " continuous characters."), call. = FALSE)
} else {
## Continuous models (valid)
models_continuous <- sapply(models[models %in% available_models_continuous], set.continuous.args.ace.models, n = 1)
}
}
}
}
}
## Models is a transition matrix (discrete only)
if(models_class == "matrix") {
if(do_continuous) {
stop("Transition matrices can be used as models only for discrete characters.", call. = FALSE)
} else {
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
}
## Models is a complicated list
if(models_class == "list") {
if(length(models) == 1) {
if(do_discrete && do_continuous) {
stop("Only one model is specified but both discrete and continuous characters are detected.", call. = FALSE)
}
## Set the models for discrete
if(do_discrete) {
models_discrete <- replicate(n_characters_discrete, models, simplify = FALSE)
}
## Set the models for continuous
if(do_continuous) {
models_continuous <- replicate(n_characters_continuous, do.call(set.continuous.args.ace, models), simplify = FALSE)
}
}
## Models is a list of models
check.length(models, n_characters, msg = paste0(" list must be the same length as the number of characters (", n_characters, ")."))
## Separate the models per type
if(do_discrete) {
models_discrete <- models[discrete_char_ID]
}
if(do_continuous) {
models_continuous <- models[continuous_char_ID]
## Format correctly
models_continuous <- lapply(models_continuous, function(x) do.call(set.continuous.args.ace, x))
}
}
## Remove invariant characters
if(do_discrete && has_invariants) {
## Remove the models
models_discrete <- models_discrete[-invariants]
}
}
if(do_discrete && has_invariants) {
models_discrete <- models_discrete[-invariants]
}
#########
##
## Handle the options
##
#########
if(missing(options.args)) {
## No options
options.ace <- options.castor <- options.args <- NULL
} else {
## must be list with names
check.class(options.args, "list")
options_error <- "options.args must be an unambiguous named list of options for castor::asr_mk_model() or ape::ace()."
## Check the available names
options_avail <- c(names(formals(castor::asr_mk_model)), names(formals(ape::ace)))
if(is.null(names(options.args)) || !all(names(options.args) %in% options_avail)) {
stop(options_error, call. = FALSE)
}
## Sort the options
options.ace <- options.castor <- NULL
if(do_continuous) {
options.ace <- options.args[names(options.args) %in% names(formals(ape::ace))]
if(length(options.ace) == 0) {
options.ace <- NULL
}
}
if(do_discrete) {
options.castor <- options.args[names(options.args) %in% names(formals(castor::asr_mk_model))]
if(length(options.castor) == 0) {
options.castor <- NULL
}
}
}
## Check the estimation details
if(!is.null(estimation.details)) {
## The return args from castor::asr_mk_model (1.6.6)
return_args_discrete <- c("success", "Nstates", "transition_matrix", "loglikelihood", "ancestral_likelihoods")
return_args_continuous <- c("CI95", "sigma2", "loglik")
if(do_discrete && do_continuous) {
return_args <- c(return_args_discrete, return_args_continuous)
} else {
if(do_discrete) {
return_args <- return_args_discrete
}
if(do_continuous) {
return_args <- return_args_continuous
}
}
## Check the requested details
check.method(estimation.details, return_args, msg = "estimation.details")
} else {
return_args_discrete <- return_args_continuous <- NULL
}
#########
##
## set the arguments for calls
##
#########
## Setting the continuous characters call
if(do_continuous) {
## Create the character arguments
character_continuous_args <- mapply(function(character, ace.args, options = NULL) return(c(x = character, ace.args, options)), characters_continuous, models_continuous, MoreArgs = list(options = options.ace), SIMPLIFY = FALSE)
## Create the character and tree arguments
tree_character_continuous_args <- list()
for(one_tree in 1:length(tree)) {
tree_character_continuous_args[[one_tree]] <- lapply(character_continuous_args, function(character, tree) {character$phy <- tree; return(character)}, tree[[one_tree]])
}
## Set verbose fun
if(verbose) {
fun_continuous <- function(...) {
cat(".")
return(ape::ace(...))
}
} else {
fun_continuous <- ape::ace
}
}
## Setting the discrete characters call
if(do_discrete) {
## Set the details to return (if any)
if(any(return_args_discrete %in% estimation.details)) {
details_out <- return_args_discrete[return_args_discrete %in% estimation.details]
} else {
details_out <- NULL
}
## Set up the arguments for one tree
character_discrete_args <- mapply(make.args, characters_tables, characters_states, models_discrete, MoreArgs = list(estimation.details = details_out, castor.options = options.castor), SIMPLIFY = FALSE)
## Create the character and tree arguments
tree_character_discrete_args <- list()
for(one_tree in 1:length(tree)) {
tree_character_discrete_args[[one_tree]] <- lapply(character_discrete_args, function(character, tree) {character$tree <- tree; return(character)}, tree[[one_tree]])
}
}
#########
##
## run the calls
##
#########
if(do_parallel) {
## Remove verbose
if(verbose) {
cat(paste0("Running the estimation for ", length(tree), " tree", ifelse(length(tree) > 1, "s ", " "), "using ", cores, " core", ifelse(cores == 1, "...", "s...")))
was_verbose <- TRUE
verbose <- FALSE
} else {
was_verbose <- FALSE
}
## Set up the cluster
cluster <- parallel::makeCluster(cores)
## Get the current environment
current_env <- environment()
export_arguments_list <- export_functions_list <- character()
if(do_discrete) {
## Get the export lists
export_arguments_list <- c("tree_character_discrete_args",
"special.tokens",
"invariants",
"threshold.type",
"threshold",
"verbose",
"characters_states",
"invariant_characters_states")
export_functions_list <- c("one.tree.ace",
"castor.ace",
"tree.data.update",
"add.state.names",
"translate.likelihood")
}
if(do_continuous) {
export_arguments_list <- c(export_arguments_list, "tree_character_continuous_args")
export_functions_list <- c(export_functions_list, "fun_continuous")
}
## Export from this environment
parallel::clusterExport(cluster, c(export_arguments_list, export_functions_list), envir = current_env)
## Call the cluster
if(do_discrete) {
discrete_estimates <- parLapply(cl = cluster, tree_character_discrete_args, one.tree.ace, special.tokens, invariants, characters_states, threshold.type, threshold, invariant_characters_states, verbose)
}
if(do_continuous) {
continuous_estimates <- parLapply(cl = cluster, tree_character_continuous_args, lapply, function(x) do.call(fun_continuous, x))
## Remove the ugly call
continuous_estimates <- lapply(continuous_estimates, lapply, function(x) {x$call <- "ape::ace"; return(x)})
## Add node labels
# TODO: to be removed if ape update
add.nodes <- function(obj, phy) {
## Adding node labels to $ace and $CI95 (if available)
options(warn = -1)
names <- as.integer(names(obj$ace))
options(warn = 0)
if(!is.null(phy$node.label) && !is.na(names[1])) {
ordered_node_labels <- phy$node.label[c(as.integer(names(obj$ace))- Ntip(phy))]
names(obj$ace) <- ordered_node_labels
if(!is.null(obj$CI95)) {
rownames(obj$CI95) <- ordered_node_labels
}
}
return(obj)
}
for(one_tree in 1:length(tree)) {
continuous_estimates[[one_tree]] <- lapply(continuous_estimates[[one_tree]], add.nodes, phy = tree[[one_tree]])
}
}
## Stop the cluster
parallel::stopCluster(cluster)
## Reactivate the verbose
if(was_verbose) {
cat("Done.")
verbose <- TRUE
}
} else {
## Make the functions verbose
if(verbose) cat("Running ancestral states estimations:")
## Run the continuous characters
if(do_continuous) {
## Run all the ace
continuous_estimates <- lapply(tree_character_continuous_args, lapply, function(x) do.call(fun_continuous, x))
## Remove the ugly call
continuous_estimates <- lapply(continuous_estimates, lapply, function(x) {x$call <- "ape::ace"; return(x)})
## Add node labels
# TODO: to be removed if ape update
add.nodes <- function(obj, phy) {
## Adding node labels to $ace and $CI95 (if available)
options(warn = -1)
names <- as.integer(names(obj$ace))
options(warn = 0)
if(!is.null(phy$node.label) && !is.na(names[1])) {
ordered_node_labels <- phy$node.label[c(as.integer(names(obj$ace))- Ntip(phy))]
names(obj$ace) <- ordered_node_labels
if(!is.null(obj$CI95)) {
rownames(obj$CI95) <- ordered_node_labels
}
}
return(obj)
}
for(one_tree in 1:length(tree)) {
continuous_estimates[[one_tree]] <- lapply(continuous_estimates[[one_tree]], add.nodes, phy = tree[[one_tree]])
}
}
## Run the discrete characters
if(do_discrete) {
## Run all the ace for discrete
discrete_estimates <- lapply(tree_character_discrete_args, one.tree.ace, special.tokens, invariants, characters_states, threshold.type, threshold, invariant_characters_states, verbose)
}
if(verbose) cat("Done.\n")
}
#########
##
## handle the outputs
##
#########
## Handle the continuous characters
if(do_continuous) {
## Get the results in a matrix format
results_continuous <- lapply(lapply(continuous_estimates, lapply, `[[`, "ace"), function(x) do.call(cbind, x))
## Get the details for continuous
if(any(return_args_continuous %in% estimation.details)) {
## Get which details to grep
details_out <- return_args_continuous[return_args_continuous %in% estimation.details]
## Get the details
details_continuous <- lapply(continuous_estimates, lapply, function(x, details_out) return(x[details_out]), details_out)
} else {
details_continuous <- NULL
}
}
if(do_discrete) {
## Get the results in a matrix format
results_discrete <- lapply(lapply(discrete_estimates, `[[`, 1), function(x) do.call(cbind, x))
## Get the details
details_discrete <- lapply(discrete_estimates, `[[`, 2)
}
## Handle output
## Combine the traits
if(do_discrete && do_continuous) {
## Combine the traits
results_out <- mapply(bind.characters, results_continuous, results_discrete,
MoreArgs = list(order = list("continuous" = continuous_char_ID, "discrete" = unique(c(discrete_char_ID, invariants_ID)))),
SIMPLIFY = FALSE)
## Return the details per characters
if(is.null(details_continuous)) {
## Make a list of nulls
details_continuous <- replicate(length(tree), lapply(as.list(1:n_characters_continuous), function(x) return(NULL)), simplify = FALSE)
}
if(is.null(details_discrete[[1]])) {
## Make a list of nulls
details_discrete <- replicate(length(tree), lapply(as.list(1:n_characters_discrete), function(x) return(NULL)), simplify = FALSE)
}
details_out <- mapply(bind.details, details_continuous, details_discrete,
MoreArgs = list(order = list("continuous" = continuous_char_ID, "discrete" = unique(c(discrete_char_ID, invariants_ID)))),
SIMPLIFY = FALSE)
}
if(do_discrete && !do_continuous) {
results_out <- results_discrete
details_out <- details_discrete
}
if(do_continuous && !do_discrete) {
results_out <- results_continuous
details_out <- details_continuous
}
## Handle output
output_return <- switch(output,
matrix = results_out,
list = lapply(results_out, make.list),
combined.matrix = lapply(results_out, add.tips, matrix = matrix),
combined.list = lapply(lapply(results_out, add.tips, matrix = matrix), make.list),
dispRity = make.dispRity(data = lapply(results_out, add.tips, matrix = matrix), tree = tree)
)
## Results out
if(is.null(estimation.details)) {
if(length(tree) == 1) {
return(output_return[[1]])
} else {
return(output_return)
}
} else {
if(length(tree) == 1) {
return(list(estimations = output_return[[1]], details = details_out[[1]]))
} else {
return(list(estimations = output_return, details = details_out))
}
}
}
# matrix <- matrix(sample(c("0", "1", "0&1", NA),
# 500, replace = TRUE), nrow = 10, dimnames =
# list(paste0("t", 1:10), c()))
# trees <- rmtree(10, 10)
# claddis.wrapper <- function(tree, matrix) {
# return(estimate_ancestral_states(matrix, tree,
# estimate_all_nodes = TRUE))
# }
# # Reformat for use elsewhere in Claddis:
# Claddis_matrix <- build_cladistic_matrix(matrix)
# serial_start <- Sys.time()
# ancestral_states <- multi.ace(data = matrix, tree = trees, special.tokens = c("missing" = NA))
# serial_end <- Sys.time()
# ## The same example but running in parallel
# parallel_start <- Sys.time()
# ancestral_states <- multi.ace(matrix, trees, special.tokens = c("missing" = NA))
# parallel_end <- Sys.time()
# claddis_start <- Sys.time()
# test <- lapply(trees, claddis.wrapper, Claddis_matrix)
# claddis_end <- Sys.time()
# serial_time <- serial_end-serial_start
# parallel_time <- parallel_end-parallel_start
# claddis_time <- claddis_end-claddis_start
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.