R/extractMemoryFeatures.R

Defines functions extractMemoryFeatures

Documented in extractMemoryFeatures

#' Extracts ecological memory features on the output of \code{\link{computeMemory}}.
#'
#' @description It computes the following features of the ecological memory patterns returned by \code{\link{computeMemory}}:
#' \itemize{
#'    \item \code{memory strength} maximum difference in relative importance between each component (endogenous, exogenous, and concurrent) and the median of the random component. This is computed for exogenous, endogenous, and concurrent effect.
#'     \item \code{memory length} proportion of lags over which the importance of a memory component is above the median of the random component. This is only computed for endogenous and exogenous memory.
#' \item \code{dominance} proportion of the lags above the median of the random term over which a memory component has a higher importance than the other component. This is only computed for endogenous and exogenous memory.
#' }
#'
#'
#'@usage extractMemoryFeatures(
#'  memory.pattern = NULL,
#'  exogenous.component = NULL,
#'  endogenous.component = NULL,
#'  sampling.subset = NULL,
#'  scale.strength = TRUE
#'  )
#'
#' @param memory.pattern either a list resulting from \code{\link{computeMemory}}, or a dataframe with memory patterns of several taxa generated by \code{\link{experimentToTable}}.
#' @param exogenous.component character string or vector of character strings, name of the variable or variables defining the exogenous component.
#' @param endogenous.component character string, string, name of the variable defining the endogenous component. If the data was generated by \code{\link{prepareLaggedData}}, \code{endogenous.component} would usually be \code{"Response"}.
#' @param sampling.subset only relevant when \code{analysis.output} is the result of \code{runExperiment}. Character string with the name of the column of the list with the simulation outcomes.
#' @param scale.strength boolean. If \code{TRUE}, the strength of the ecological memory components, which has the same units as the importance scores yielded by Random Forest (percentage of increment in mean squared error when a variable is permuted), is scaled between 0 and 1.
#'
#' @details \strong{Warning:} this function only works when only one exogenous component (driver) is used to define the model in \code{\link{computeMemory}}. If more than one driver is provided throught the argument \code{exogenous.component}, the maximum importance scores of all exogenous variables is considered. In other words, the importance of exogenous variables is not additive.
#'
#' @author Blas M. Benito  <blasbenito@gmail.com>
#'
#' @return A dataframe with 8 columns and 1 row if \code{memory.pattern} is the output of \code{\link{computeMemory}} and 13 columns and as many rows as taxa are in the input if it is the output of \code{\link{experimentToTable}}. The columns are:
#'
#' \itemize{
#'   \item \emph{label} character string to identify the taxon. It either inherits its values from \code{\link{experimentToTable}}, or sets the default ID as "1".
#'   \item \emph{strength.endogenous} numeric in the range [0, 100], in importance units (percentage of increment in the mean squared error of the random forest model if the variable is permuted) difference between the maximum importance of the endogenous component at any lag and the median of the random component (see details in \code{\link{computeMemory}})
#'   \item \emph{strength.exogenous} numeric in the range [0, 100], same as above, but for the exogenous component.
#'   \item \emph{strenght.concurrent} numeric in the range [0, 100], same as above, but for the concurrent component (driver at lag 0).
#'   \item \emph{length.endogenous} numeric in the range [0, 100], proportion of lags over which the importance of the endogenous memory component is above the median of the random component.
#'   \item \emph{length.exogenous} numeric in the range [0, 1], same as above but for the exogenous memory component.
#'   \item \emph{dominance.endogenous} numeric in the range [0, 1], proportion of the lags above the median of the random term over which a the endogenous memory component has a higher importance than the exogenous component.
#'   \item \emph{dominance.exogenous}, opposite as above.
#'   \item \emph{maximum.age}, numeric. As every column after this one, only provided if \code{memory.pattern} is the output of \code{\link{experimentToTable}}. Trait of the given taxon.
#'   \item \emph{fecundity} numeric, trait of the given taxon.
#'   \item \emph{niche.A.mean} numeric, trait of the given taxon.
#'   \item \emph{niche.A.sd} numeric, trait of the given taxon.
#'   \item \emph{sampling} numeric, trait of the given taxon.
#' }
#'
#' @seealso \code{\link{computeMemory}}
#'
#' @examples
#'
#' #loading example data
#' data(palaeodataMemory)
#'
#' #computing ecological memory features
#' memory.features <- extractMemoryFeatures(
#'   memory.pattern = palaeodataMemory,
#'   exogenous.component = c(
#'   "climate.temperatureAverage",
#'   "climate.rainfallAverage"
#'   ),
#'   endogenous.component = "Response",
#'   sampling.subset = NULL,
#'   scale.strength = TRUE
#'   )
#'
#'
#' @export
extractMemoryFeatures <- function(memory.pattern = NULL,
                                  exogenous.component = NULL,
                                  endogenous.component = NULL,
                                  sampling.subset = NULL,
                                  scale.strength = TRUE){

  #entry point
  x <- memory.pattern

  #checking if it is output of computeMemory or experimentToTable
  #if x$memory does not exist, it is an output of experimentToTable
  if(is.null(x$memory) == TRUE){

    #factors to character
    x$label <- as.character(x$label)
    x$Variable <- as.character(x$Variable)

    #subsetting by sampling.subset if available
    if(is.null(sampling.subset) == FALSE){
      x <- x[x$sampling  ==  sampling.subset, ]
    }

    #identifying available groups
    taxa <- unique(x$label)
    sampling <- unique(x$sampling)

    #memory object switch to false
    is.computeMemory.object <- FALSE

  } else {

    #it is a list containing results of a single taxa.
    is.computeMemory.object <- TRUE
    taxa <- 1
    sampling <- "this"


    # #if there is no sampling column, we add it
    # if(is.null(x$sampling)){
    #
    #   x$sampling <- "this" #just a fake useless name
    #   sampling.subset <- "this"
    #
    #   #identifying available groups (repeated code because I am sick of this shit)
    #   taxa <- unique(x$label)
    #   sampling <- unique(x$sampling)
    # }
  }

  #dataframe to store results
  nas <- rep(NA, (length(taxa) * length(sampling)))
  output.df <- data.frame(label = nas,
                       strength.endogenous = nas,
                       strength.exogenous = nas,
                       strength.concurrent = nas,
                       length.endogenous = nas,
                       length.exogenous = nas,
                       dominance.endogenous = nas,
                       dominance.exogenous = nas,
                       maximum.age = nas,
                       fecundity = nas,
                       niche.mean = nas,
                       niche.sd = nas,
                       sampling = nas,
                       stringsAsFactors = FALSE)

  #row counter
  row.counter = 0

  #iterating through taxa and sampling
  for(taxon in taxa){
    for(samp in sampling){

      #+1 to the row counter
      row.counter = row.counter + 1

      #subsetting the taxon
      if(is.computeMemory.object == FALSE){

        x.temp = x[x$label == taxon, ]
        x.temp = x.temp[x.temp$sampling == samp, ]

      } else {
        x.temp = x$memory
      }

      #random median
      random.median <- round(x.temp[x.temp$Variable == "Random", "median"][1], 2)

      #number of lags
      lags <- unique(x.temp$Lag)
      lags <- lags[lags != 0]

      #computing memory strength (difference betweenn component and median of the random term)
      # strength.concurrent <- x.temp[x.temp$Variable == exogenous.component & x.temp$Lag == 0, "median"] - random.median
      # x.temp <- x.temp[x.temp$Lag!=0,] #removing lag 0
      # strength.endogenous  <-  max(x.temp[x.temp$Variable == endogenous.component, "median"]) - random.median
      # strength.exogenous <- max(x.temp[x.temp$Variable == exogenous.component, "median"]) - random.median

      strength.concurrent <- max(x.temp[x.temp$Variable %in% exogenous.component & x.temp$Lag == 0, "median"]) - random.median
      x.temp <- x.temp[x.temp$Lag!=0,] #removing lag 0
      strength.endogenous  <-  max(x.temp[x.temp$Variable %in% endogenous.component, "median"]) - random.median
      strength.exogenous <- max(x.temp[x.temp$Variable %in% exogenous.component, "median"]) - random.median

      #to 0 if negative
      if(strength.endogenous < 0){strength.endogenous = 0}
      if(strength.exogenous < 0){strength.exogenous = 0}


      #computing memory length: number of lags above the median of the random component
      length.endogenous <- sum(x.temp[x.temp$Variable == endogenous.component, "median"] > random.median) / length(lags)

      #getting medians of exogenous components
      max.exogenous <- x.temp[x.temp$Variable == exogenous.component[1], "median"]
      if(length(exogenous.component) > 1){
        for(j in exogenous.component[2:length(exogenous.component)]){
          max.exogenous <- pmax(max.exogenous, x.temp[x.temp$Variable == j, "median"])
        }
      }
      length.exogenous <- sum(max.exogenous > random.median) / length(lags)


      #computing memory dominance lags of one component above other component
      endogenous <- x.temp[x.temp$Variable == endogenous.component, "median"]
      exogenous <- max.exogenous

      #values below random.median to zero
      endogenous[endogenous < random.median] <- 0
      exogenous[exogenous < random.median] <- 0

      #values
      dominance.endogenous <- sum(endogenous > exogenous) / length(lags)
      dominance.exogenous <- sum(exogenous > endogenous) / length(lags)

      #params
      if(is.computeMemory.object == FALSE){

        maximum.age <- x.temp$maximum.age[1]
        fecundity <- x.temp$fecundity[1]
        niche.mean <- x.temp$niche.A.mean[1]
        niche.sd <- x.temp$niche.A.sd[1]

      } else {
        maximum.age <- fecundity <- niche.mean <- niche.sd <- NA
      }

      #filling dataframe
      output.df[row.counter, ] <- c(taxon,
                                   strength.endogenous,
                                   strength.exogenous,
                                   strength.concurrent,
                                   length.endogenous,
                                   length.exogenous,
                                   dominance.endogenous,
                                   dominance.exogenous,
                                   maximum.age,
                                   fecundity,
                                   niche.mean,
                                   niche.sd,
                                   samp)

    } #end of iteration through sampling
  } #end of iteration through taxa

  #to numeric
  output.df[, 2:(ncol(output.df)-1)] <- sapply(output.df[, 2:(ncol(output.df)-1)], as.numeric)

  #removing trait columns if input is not a memory object
  if(is.computeMemory.object == TRUE){
    output.df <- output.df[, 1:8]
  }

  #rescaling strength components
  if(length(taxa) > 1 | !("this" %in% sampling)){
    if(scale.strength == TRUE){
      output.df$strength.concurrent <- output.df$strength.concurrent / max(output.df$strength.concurrent)
      output.df$strength.exogenous <- output.df$strength.exogenous / max(output.df$strength.exogenous)
      output.df$strength.endogenous <- output.df$strength.endogenous / max(output.df$strength.endogenous)
    }
  }

  return(output.df)

}
BlasBenito/memoria documentation built on Feb. 20, 2022, 1:45 a.m.