R/disparity_resample.R
In GeometricMorphometricsMix: Heterogeneous Methods for Shape and Other Multidimensional Data

Documented in disparity_resample plot.disparity_resample print.disparity_resample

#' Resampling-based estimates (bootstrap or rarefaction) of disparity
#' / morphospace occupation
#'
#' Provides a unified interface to obtain resampled (bootstrap or
#' rarefied) estimates of several disparity / morphospace occupation
#' statistics.
#'
#' The function allows choosing among the following multivariate
#' statistics:
#' \itemize{
#'  \item Multivariate variance (trace of the covariance matrix; sum
#'  of variances)
#'  \item Mean pairwise Euclidean distance
#'  \item Convex hull volume (n-dimensional)
#'  \item Claramunt proper variance (Claramunt 2010) based on a linear
#'  shrinkage covariance matrix
#' }
#'
#' If the input `Data` is univariate (i.e., a vector), the analysis
#' defaults to computing the (univariate) variance within each group
#' (the attribute `statistic` is ignored).
#'
#' If `bootstrap_rarefaction=="bootstrap"`, the function performs
#' resampling with replacement (i.e., classical bootstrap) by sampling
#' rows of the data matrix / data frame.
#' Optionally, the user can specify a custom sample size via the
#' `sample_size` argument.
#' This allows comparison of bootstrap confidence intervals at the
#' same sample size (essentially, this is rarefaction sampling with
#' replacement), which can be useful to compare bootstrapped
#' confidence intervals across different groups for statistics which
#' are sensitive to sample size (at the expense of broader than
#' necessary confidence intervals for groups that are larger).
#'
#' If `bootstrap_rarefaction=="rarefaction"`, the function performs
#' resampling without replacement at the sample size indicated in
#' `sample_size` (numeric) or, if `sample_size=="smallest"`, at the
#' size of the smallest group (all groups are resampled to that size).
#' Rarefaction requires specifying a valute to the attribute
#' `sample_size`; an error is returned otherwise.
#'
#' @section Parallelization:
#' This function automatically uses parallel processing via the future
#' framework, when the packages future and future.apply are installed.
#' This is particularly useful for large datasets, large number of
#' resamples or computationally intensive statistics (e.g., convex
#' hull volume).
#' The parallelization strategy is determined by the user's choice of
#' future plan, providing flexibility across different computing
#' environments (local multicore, cluster, etc.). 
#' The function performs parallelization at the level of individual
#' bootstrap/rarefaction replicates within each group. The future plan
#' should be set up by the user before calling this function using
#' \code{future::plan()} (see examples). If no plan is set or the
#' future packages are not available, the function will use sequential
#' processing.
#'
#' @section Average estimate:
#' For bootstrap resampling, the average estimate reported is the mean
#' of the bootstrap resampled values (for consistency across all
#' bootstrap scenarios). For rarefaction, because the purpose is to
#' make groups comparable at a common (smaller) sample size, the
#' average estimate reported is the mean of the rarefied resampled
#' values for that group (i.e., the mean across all rarefaction
#' replicates).
#'
#' @section Input data types:
#' `Data` can be a data frame, a matrix, a vector, or a 3D array of
#' landmark coordinates (e.g., p landmarks x k dimensions x n
#' specimens). In the latter case, the array is converted internally
#' to a 2D matrix with specimens in rows and (landmark * dimension)
#' variables in columns.
#'
#' @note Because of how the computation works, convex hull volume
#' computation requires the number of observations (specimens) to be
#' (substantially) greater than the number of variables (dimensions).
#' In case of shape or similar, consider using the scores along the
#' first (few/several) principal components.
#' Sometimes errors are thrown due to near-zero components, in this
#' case try reducing the number of principal components used.
#' Examples of use of this statistic with geometric morphometric data
#' include Drake & Klingenberg 2010 (American Naturalist), Fruciano
#' et al. 2012 (Environmental Biology of Fishes) and Fruciano et al.
#' 2014 (Biological Journal of the Linnean Society).
#' Because of the sensitivity of this statistic to outliers, usually
#' rarefaction is preferred to bootstrapping.
#'
#' @note "Multivariate variance" is also called "total variance",
#' "Procrustes variance" (in geometric morphometrics) and "sum of
#' univariate variances".
#' Note how the computation here does not divide variance by sample
#' size (other than the normal division performed in the computation
#' of variances).
#'
#' @details 
#' This function uses the future framework for parallel processing,
#' requiring packages future and future.apply.
#' Users should set up their preferred parallelization strategy using
#' \code{future::plan()} before calling this function.
#' For example:
#' \itemize{
#'   \item \code{future::plan(future::sequential)} for sequential
#'   processing
#'   \item \code{future::plan(future::multisession, workers = 4)} for
#'   parallel processing with 4 workers (works on all platforms
#'   including Windows)
#'   \item \code{future::plan(future::multicore, workers = 4)} for
#'   forked processes (Unix-like systems)
#'   \item \code{future::plan(future::cluster, workers =
#'   c("host1", "host2"))} for cluster computing
#' }
#' If no plan is set or the future packages are not available,
#' the function will use sequential processing.
#'
#' @param Data A data frame, matrix, vector, or 3D array.
#'   Observations (specimens) must be in rows (if a 3D array is
#'   supplied, the third dimension is assumed to index specimens).
#' @param group A factor or a vector indicating group membership (will
#'   be coerced to factor). If `NULL` (default) a single analysis is
#'   performed on the full dataset.
#' @param n_resamples Number of resampling replicates (default 1000).
#' @param statistic Character string identifying the statistic to
#'   compute. One of `"multivariate_variance"`,
#'   `"mean_pairwise_euclidean_distance"`, `"convex_hull_volume"`,
#'   `"claramunt_proper_variance"`. Default is
#'   `"multivariate_variance"`. Ignored for univariate data (vector
#'   input).
#' @param CI Desired two-sided confidence interval level (default 0.95)
#'   used for percentile confidence intervals. Use CI=1 to display the
#'   full range (minimum to maximum) of resampled values.
#' @param bootstrap_rarefaction Either `"bootstrap"` (default) for
#'   resampling with replacement or `"rarefaction"` for resampling
#'   without replacement.
#' @param sample_size Either `NULL` (default), a positive integer
#'   indicating the number of rows to sample, or the character
#'   `"smallest"` to use the size of the smallest group (all groups 
#'   resampled to that size). For `bootstrap_rarefaction=="rarefaction"`,
#'   sampling is without replacement and this parameter is required
#'   (not `NULL`). For `bootstrap_rarefaction=="bootstrap"`, sampling
#'   is with replacement; if `NULL`, uses original group sizes,
#'   otherwise uses the specified sample size. If `"smallest"` is
#'   supplied but no groups are defined, an error is returned.
#'
#' @return A list containing:
#'  \describe{
#'    \item{chosen_statistic}{Character vector of length 1 with the
#'    human-readable name of the statistic used.}
#'    \item{results}{A data frame with columns `group`, `average`,
#'    `CI_min`, `CI_max`. One row per group.
#'      When CI=1, `CI_min` and `CI_max` represent the minimum and
#'      maximum of resampled values rather than confidence intervals.}
#'    \item{resampled_values}{If a single group: numeric vector of
#'    length `n_resamples` with the resampled values.
#'      If multiple groups: a named list with one numeric vector
#'      (length `n_resamples`) per group.}
#'    \item{CI_level}{The CI level used (between 0 and 1). When CI=1,
#'    ranges are computed instead of confidence intervals.}
#' }
#' 
#' The returned object has class "disparity_resample" and comes with
#' associated S3 methods for convenient display and visualization:
#' \itemize{
#'   \item \code{\link{print.disparity_resample}}: Prints a formatted
#'   summary of results including confidence interval overlap
#'   assessment for multiple groups
#'   \item \code{\link{plot.disparity_resample}}: Creates a confidence
#'   interval plot using ggplot2
#' }
#'
#' @references Drake AG, Klingenberg CP. 2010. Large-scale
#' diversification of skull shape in domestic dogs: disparity and
#' modularity. American Naturalist 175:289-301.
#' @references Claramunt S. 2010. Discovering exceptional
#' diversifications at continental scales: the case of the endemic
#' families of Neotropical Suboscine passerines. Evolution
#' 64:2004-2019.
#' @references Fruciano C, Tigano C, Ferrito V. 2012. Body shape
#' variation and colour change during growth in a protogynous fish.
#' Environmental Biology of Fishes 94:615-622.
#' @references Fruciano C, Pappalardo AM, Tigano C, Ferrito V. 2014.
#' Phylogeographical relationships of Sicilian brown trout and the
#' effects of genetic introgression on morphospace occupation.
#' Biological Journal of the Linnean Society 112:387-398.
#' @references Fruciano C, Franchini P, Raffini F, Fan S, Meyer A.
#' 2016. Are sympatrically speciating Midas cichlid fish special?
#' Patterns of morphological and genetic variation in the closely
#' related species Archocentrus centrarchus. Ecology and Evolution
#' 6:4102-4114.
#' @seealso \code{\link{disparity_test}},
#' \code{\link{print.disparity_resample}},
#' \code{\link{plot.disparity_resample}}
#'
#' @examples
#' set.seed(123)
#' # Simulate two groups with different means but same covariance
#' if (requireNamespace("MASS", quietly = TRUE)) {
#'   X1 = MASS::mvrnorm(20, mu=rep(0, 10), Sigma=diag(10))
#'   X2 = MASS::mvrnorm(35, mu=rep(2, 10), Sigma=diag(10))
#'   Data = rbind(X1, X2)
#'   grp = factor(c(rep("A", nrow(X1)), rep("B", nrow(X2))))
#'
#'   # Sequential processing
#'   # future::plan(future::sequential)  # Default sequential processing
#'   
#'   # Parallel processing (uncomment to use)
#'   # future::plan(future::multisession, workers = 2)  # Use 2 workers
#'
#'   # Bootstrap multivariate variance
#'   boot_res = disparity_resample(
#'     Data, group=grp, n_resamples=200,
#'     statistic="multivariate_variance",
#'     bootstrap_rarefaction="bootstrap"
#'   )
#'   # Direct access to results table
#'   boot_res$results
#'   
#'   # Using the print method for formatted output
#'   print(boot_res)
#'   
#'   # Using the plot method to visualize results
#'   # plot(boot_res)  # Uncomment to create confidence interval plot
#'
#'   # Rarefaction (to the smallest group size) of mean pairwise
#'   # Euclidean distance
#'   rar_res = disparity_resample(
#'     Data, group=grp, n_resamples=200,
#'     statistic="mean_pairwise_euclidean_distance",
#'     bootstrap_rarefaction="rarefaction", sample_size="smallest"
#'   )
#'# Now simulate a third group with larger variance
#'  X3 = MASS::mvrnorm(15, mu=rep(0, 10), Sigma=diag(10)*1.5)
#'  grp2 = factor(
#'    c(rep("A", nrow(X1)), rep("B", nrow(X2)), rep("C", nrow(X3)))
#'  )
#'  boot_res2 = disparity_resample(
#'    Data=rbind(X1, X2, X3), group=grp2, n_resamples=1000,
#'    statistic="multivariate_variance",
#'    bootstrap_rarefaction="bootstrap"
#'  )
#'  print(boot_res2)
#'  # plot(boot_res2)
#'  # Plot of the obtained (95%) confidence intervals (uncomment to plot)
#'  
#'  # Reset to sequential processing when done (optional)
#'  # future::plan(future::sequential)
#' }
#'
#' @import stats
#' @export
disparity_resample=function(Data, group=NULL, n_resamples=1000,
                            statistic="multivariate_variance", CI=0.95,
                            bootstrap_rarefaction="bootstrap",
                            sample_size=NULL) {

  # Check if future framework is available for parallel processing
  use_future = requireNamespace("future", quietly = TRUE) &&
    requireNamespace("future.apply", quietly = TRUE)

  # Input validation
  validate_disparity_resample_inputs(
    Data, group, n_resamples, statistic, CI, 
    bootstrap_rarefaction, sample_size
  )
  
  # Data preparation and processing
  prepared_data = prepare_data_disparity_resample(Data, group)
  
  # Extract variables for compatibility with existing code
  Data = prepared_data$data
  group_factor = prepared_data$group_factor
  group_levels = prepared_data$group_levels
  group_sizes = prepared_data$group_sizes
  original_input_is_vector = prepared_data$original_input_is_vector
  single_group_analysis = prepared_data$single_group_analysis
  n_obs = prepared_data$n_obs

  # Group validation
  validate_groups_disparity_resample(group_sizes)

  # Statistic validation
  validate_stat_reqs_disparity_resample(
    statistic, prepared_data, sample_size, bootstrap_rarefaction
  )

  # Statistic setup and labeling
  stat_label_map=list(
    multivariate_variance="Multivariate variance",
    mean_pairwise_euclidean_distance=
      "Mean pairwise Euclidean distance",
    convex_hull_volume="Convex hull volume",
    claramunt_proper_variance="Claramunt proper variance",
    variance_univariate="Variance"
  )

  chosen_statistic = if (original_input_is_vector) {
    stat_label_map$variance_univariate
  } else { stat_label_map[[statistic]] }

  # Helper to compute one statistic ---------------------------
  compute_stat=function(X) {
    if (original_input_is_vector) { return(var(X)) }
    if (statistic=="multivariate_variance") { return(muvar(X)) }
    if (statistic=="mean_pairwise_euclidean_distance") {
      return(meanpairwiseEuclideanD(X))
    }
    if (statistic=="convex_hull_volume") {
      if (!requireNamespace("geometry", quietly = TRUE)) {
        stop("Package 'geometry' is required for convex hull volume")
      }
      return(geometry::convhulln(X, options="FA")$vol)
    }
  if (statistic=="claramunt_proper_variance") {
      if (!requireNamespace("nlshrink", quietly = TRUE)) {
        stop("Package 'nlshrink' is required for Claramunt proper variance") }
      return(claramunt_proper_variance(X))
    }
  }

  # Rarefaction settings using modular function --------------------------
  if (bootstrap_rarefaction=="rarefaction") {
    sample_size_num = resolve_rarefaction_sample_size(sample_size, group_sizes)
  }
  
  # Bootstrap with custom sample size settings -----------------------------
  if (bootstrap_rarefaction=="bootstrap" && !is.null(sample_size)) {
    sample_size_num = resolve_bootstrap_sample_size(sample_size, group_sizes)
  }

  # Preallocation of "storage"
  resampled_values_list=list()
  results_rows=list()

  alpha=(1-CI)/2

  for (g in group_levels) {
    idx=which(group_factor==g)
    Xg = if (original_input_is_vector) {
      Data[idx]
    } else {
      Data[idx,,drop=FALSE]
    }
    n_g = length(idx)

    if (bootstrap_rarefaction=="rarefaction") {
      # sample_size validation already done above
      if (use_future) {
        res_vals=unlist(future.apply::future_lapply(
          seq_len(n_resamples), function(i) {
          sampled_idx=sample(
            seq_len(n_g), sample_size_num, replace=FALSE
          )
          if (original_input_is_vector) {
            compute_stat(Xg[sampled_idx])
          } else {
            compute_stat(Xg[sampled_idx,,drop=FALSE])
          }
        }, future.seed = TRUE,
           future.packages = c("geometry", "nlshrink")))
      } else {
        res_vals=unlist(lapply(seq_len(n_resamples), function(i) {
          sampled_idx=sample(
            seq_len(n_g), sample_size_num, replace=FALSE
          )
          if (original_input_is_vector) {
            compute_stat(Xg[sampled_idx])
          } else {
            compute_stat(Xg[sampled_idx,,drop=FALSE])
          }
        }))
      }
      average_stat=mean(res_vals)
    } else if (bootstrap_rarefaction=="bootstrap") {
      if (!is.null(sample_size)) {
        # Bootstrap with custom sample size
        if (use_future) {
          res_vals=unlist(future.apply::future_lapply(
            seq_len(n_resamples), function(i) {
            sampled_idx=sample(
              seq_len(n_g), sample_size_num, replace=TRUE
            )
            if (original_input_is_vector) {
              compute_stat(Xg[sampled_idx])
            } else {
              compute_stat(Xg[sampled_idx,,drop=FALSE])
            }
          }, future.seed = TRUE,
             future.packages = c("geometry", "nlshrink")))
        } else {
          res_vals=unlist(lapply(seq_len(n_resamples), function(i) {
            sampled_idx=sample(
              seq_len(n_g), sample_size_num, replace=TRUE
            )
            if (original_input_is_vector) {
              compute_stat(Xg[sampled_idx])
            } else {
              compute_stat(Xg[sampled_idx,,drop=FALSE])
            }
          }))
        }
        # Average of resampled estimates for consistency
        average_stat=mean(res_vals)
      } else {
        # Standard bootstrap (original sample size)
        if (use_future) {
          res_vals=unlist(future.apply::future_lapply(
            seq_len(n_resamples), function(i) {
            sampled_idx=sample(seq_len(n_g), n_g, replace=TRUE)
            if (original_input_is_vector) {
              compute_stat(Xg[sampled_idx])
            } else {
              compute_stat(Xg[sampled_idx,,drop=FALSE])
            }
          }, future.seed = TRUE,
             future.packages = c("geometry", "nlshrink")))
        } else {
          res_vals=unlist(lapply(seq_len(n_resamples), function(i) {
            sampled_idx=sample(seq_len(n_g), n_g, replace=TRUE)
            if (original_input_is_vector) {
              compute_stat(Xg[sampled_idx])
            } else {
              compute_stat(Xg[sampled_idx,,drop=FALSE])
            }
          }))
        }
        average_stat=mean(res_vals)  # Average of resampled estimates
      }
    } else {
      stop(
        "bootstrap_rarefaction must be either 'bootstrap' or ",
        "'rarefaction'"
      )
    }

    # Handle CI=1 as a special case for full range
    if (CI == 1) {
      CI_min = min(res_vals)
      CI_max = max(res_vals)
    } else {
      CI_min = quantile(res_vals, probs=alpha, names=FALSE, type=7)
      CI_max = quantile(
        res_vals, probs=1-alpha, names=FALSE, type=7
      )
    }

    resampled_values_list[[g]]=res_vals
    results_rows[[g]]=data.frame(group=g, average=average_stat,
                                 CI_min=CI_min, CI_max=CI_max,
                                 row.names=NULL)
  }

  results_df=do.call(rbind, results_rows)
  
  # Fix group handling inconsistency: keep "All" for single group analysis
  if (single_group_analysis) { 
    # Keep "All" as the group label for consistency
    results_df$group = "All"
  }

  # resampled_values output formatting --------------------------
  resampled_values_out = if (length(group_levels)==1) {
    unname(resampled_values_list[[1]])
  } else { resampled_values_list }

  results_list=list(chosen_statistic=chosen_statistic,
              results=results_df,
              resampled_values=resampled_values_out,
              CI_level=CI)
  class(results_list)=c("disparity_resample", "list")
  # Set class for S3 methods
return(results_list)
}

##########################
## Validation Functions ##
##########################


#' Validate inputs for disparity_resample function
#'
#' @param Data Input data for validation
#' @param group Group factor for validation  
#' @param n_resamples Number of resamples for validation
#' @param statistic Statistic choice for validation
#' @param CI Confidence interval level for validation
#'   (0 < CI <= 1, where CI=1 means full range)
#' @param bootstrap_rarefaction Resampling method for validation
#' @param sample_size Sample size for rarefaction validation
#'
#' @return NULL (throws errors if validation fails)
#' @noRd
validate_disparity_resample_inputs = function(
  Data, group, n_resamples, statistic, CI, 
  bootstrap_rarefaction, sample_size
) {
  
  # Validate CI parameter
  if (!is.numeric(CI) || length(CI) != 1 || CI <= 0 || CI > 1) {
    stop(
      "CI must be a single numeric value between 0 and 1 ",
      "(inclusive). Use CI=1 to display the full range of ",
      "resampled values."
    )
  }
  
  # Validate n_resamples
  if (!is.numeric(n_resamples) || length(n_resamples) != 1 || n_resamples <= 0 || n_resamples != as.integer(n_resamples)) {
    stop("n_resamples must be a positive integer")
  }
  
  # Validate bootstrap_rarefaction
  if (!bootstrap_rarefaction %in% c("bootstrap", "rarefaction")) {
    stop("bootstrap_rarefaction must be either 'bootstrap' or 'rarefaction'")
  }
  
  # Validate sample_size for rarefaction
  if (bootstrap_rarefaction == "rarefaction") {
    if (is.null(sample_size)) { 
      stop("sample_size must be provided for rarefaction") 
    }
    if (!identical(sample_size, "smallest")) {
      sample_size_num = as.numeric(sample_size)
      if (is.na(sample_size_num) || sample_size_num <= 0 || sample_size_num != as.integer(sample_size_num)) { 
        stop("sample_size must be a positive integer or 'smallest'") 
      }
    }
  }
  
  # Validate sample_size for bootstrap (when provided)
  if (bootstrap_rarefaction == "bootstrap" && !is.null(sample_size)) {
    if (!identical(sample_size, "smallest")) {
      sample_size_num = as.numeric(sample_size)
      if (is.na(sample_size_num) || sample_size_num <= 0 || sample_size_num != as.integer(sample_size_num)) { 
        stop("sample_size must be a positive integer or 'smallest'") 
      }
    }
  }
  
  invisible(NULL)
}


#' Prepare and validate data for disparity analysis
#'
#' @param Data Input data (matrix, data frame, vector, or 3D array)
#' @param group Optional group factor
#'
#' @return List with processed data and metadata
#' @noRd
prepare_data_disparity_resample = function(Data, group) {
  
  # Determine if input is originally a vector
  original_input_is_vector = FALSE
  if (is.array(Data) && length(dim(Data)) == 3) {
    # Assume dimensions: landmarks x dimensions x specimens (p x k x n)
    if (!requireNamespace("Morpho", quietly = TRUE)) {
      stop("Package 'Morpho' is required for 3D array conversion")
    }
    Data = Morpho::vecx(Data)
  }
  if (is.vector(Data) && !is.list(Data)) {
    Data = as.numeric(Data)
    original_input_is_vector = TRUE
  }
  if (is.data.frame(Data)) { 
    Data = as.matrix(Data) 
  }
  if (!original_input_is_vector && !is.matrix(Data)) {
    stop("Data should be a matrix, data frame, vector, or a 3D array") 
  }

  # Handle missing data
  if (original_input_is_vector) {
    missing_indices = which(is.na(Data))
    if (length(missing_indices) > 0) {
      warning(sprintf("Removing %d observations with missing data", length(missing_indices)))
      Data = Data[-missing_indices]
      if (!is.null(group)) {
        group = group[-missing_indices]
      }
    }
  } else {
    missing_rows = which(apply(Data, 1, function(x) any(is.na(x))))
    if (length(missing_rows) > 0) {
      warning(sprintf("Removing %d observations with missing data", length(missing_rows)))
      Data = Data[-missing_rows, , drop = FALSE]
      if (!is.null(group)) {
        group = group[-missing_rows]
      }
    }
  }
  
  # Check for empty data after missing data removal
  n_obs = ifelse(original_input_is_vector, length(Data), nrow(Data))
  if (n_obs == 0) {
    stop("No observations remain after removing missing data")
  }

  # Handle grouping
  if (is.null(group)) {
    group_factor = factor(rep("All", n_obs))
    single_group_analysis = TRUE
  } else {
    if (length(group) != n_obs) {
      stop("Length of group does not match number of observations after removing missing data") 
    }
    group_factor = as.factor(group)
    single_group_analysis = FALSE
  }

  group_levels = levels(group_factor)
  group_sizes = table(group_factor)
  
  return(list(
    data = Data,
    group_factor = group_factor,
    group_levels = group_levels,
    group_sizes = group_sizes,
    original_input_is_vector = original_input_is_vector,
    single_group_analysis = single_group_analysis,
    n_obs = n_obs
  ))
}


#' Validate group requirements for disparity analysis
#'
#' @param group_sizes Table of group sizes
#' @param min_group_size Minimum required group size
#'
#' @return NULL (throws errors if validation fails)
#' @noRd
validate_groups_disparity_resample = function(group_sizes, min_group_size = 2) {
  
  min_size = min(group_sizes)
  
  if (min_size < min_group_size) {
    small_groups = names(group_sizes)[group_sizes < min_group_size]
    stop(sprintf("All groups must have at least %d observations for variance calculation. Groups with < %d observations: %s", 
                 min_group_size, min_group_size, paste(small_groups, collapse = ", ")))
  }
  
  invisible(NULL)
}


#' Validate statistic-specific requirements
#'
#' @param statistic Name of the statistic to compute
#' @param prepared_data List returned by prepare_data_disparity_resample
#' @param sample_size Sample size for rarefaction (if applicable)
#' @param bootstrap_rarefaction Type of resampling
#'
#' @return NULL (throws errors if validation fails)
#' @noRd
validate_stat_reqs_disparity_resample = function(statistic, prepared_data, sample_size = NULL, bootstrap_rarefaction = "bootstrap") {
  
  # Skip validation for univariate data
  if (prepared_data$original_input_is_vector) {
    return(invisible(NULL))
  }
  
  # Validate statistic choice
  statistic_choices = c("multivariate_variance", "mean_pairwise_euclidean_distance",
                        "convex_hull_volume", "claramunt_proper_variance")
  
  if (!(statistic %in% statistic_choices)) {
    stop(paste("statistic should be one of:", paste(statistic_choices, collapse = ", "))) 
  }
  
  # Special validation for convex hull volume
  if (statistic == "convex_hull_volume") {
    n_vars = ncol(prepared_data$data)
    min_group_size = min(prepared_data$group_sizes)
    
    if (bootstrap_rarefaction == "bootstrap") {
      if (!is.null(sample_size)) {
        # Bootstrap with custom sample size
        if (identical(sample_size, "smallest")) {
          sample_size_num = min(prepared_data$group_sizes)
        } else {
          sample_size_num = as.numeric(sample_size)
        }
        
        if (sample_size_num <= n_vars) {
          stop(sprintf("Convex hull volume with bootstrap at custom sample size requires sample_size > number of variables. sample_size: %d, Number of variables: %d", 
                       sample_size_num, n_vars))
        }
      } else {
        # Standard bootstrap (original group sizes)
        if (min_group_size <= n_vars) {
          stop(sprintf("Convex hull volume requires more observations than variables in each group. Minimum group size: %d, Number of variables: %d", 
                       min_group_size, n_vars))
        }
      }
    } else if (bootstrap_rarefaction == "rarefaction" && !is.null(sample_size)) {
      # Resolve sample_size if it's "smallest"
      if (identical(sample_size, "smallest")) {
        sample_size_num = min(prepared_data$group_sizes)
      } else {
        sample_size_num = as.numeric(sample_size)
      }
      
      if (sample_size_num <= n_vars) {
        stop(sprintf("Convex hull volume with rarefaction requires sample_size > number of variables. sample_size: %d, Number of variables: %d", 
                     sample_size_num, n_vars))
      }
    }
  }
  
  invisible(NULL)
}


#' Resolve rarefaction sample size parameter
#'
#' @param sample_size User-provided sample size (numeric or "smallest")
#' @param group_sizes Table of group sizes
#'
#' @return Numeric sample size value
#' @noRd
resolve_rarefaction_sample_size = function(sample_size, group_sizes) {
  
  if (identical(sample_size, "smallest")) {
    if (length(levels(as.factor(names(group_sizes)))) == 1) { 
      stop("sample_size='smallest' requires multiple groups") 
    }
    sample_size_num = min(group_sizes)
  } else {
    sample_size_num = as.numeric(sample_size)
    if (is.na(sample_size_num) || sample_size_num <= 0 || sample_size_num != as.integer(sample_size_num)) { 
      stop("sample_size must be a positive integer or 'smallest'") 
    }
  }
  
  # Additional validations
  if (sample_size_num < 2) {
    stop("sample_size for rarefaction must be at least 2 for variance calculation")
  }
  
  min_group_size = min(group_sizes)
  if (sample_size_num > min_group_size) {
    stop(sprintf("sample_size (%d) is larger than the smallest group size (%d)", 
                 sample_size_num, min_group_size))
  }
  
  return(sample_size_num)
}


#' Resolve bootstrap sample size parameter (when sample_size is specified for bootstrap)
#'
#' @param sample_size User-provided sample size (numeric or "smallest")
#' @param group_sizes Table of group sizes
#'
#' @return Numeric sample size value
#' @noRd
resolve_bootstrap_sample_size = function(sample_size, group_sizes) {
  
  if (identical(sample_size, "smallest")) {
    if (length(levels(as.factor(names(group_sizes)))) == 1) { 
      stop("sample_size='smallest' requires multiple groups") 
    }
    sample_size_num = min(group_sizes)
  } else {
    sample_size_num = as.numeric(sample_size)
    if (is.na(sample_size_num) || sample_size_num <= 0 || sample_size_num != as.integer(sample_size_num)) { 
      stop("sample_size must be a positive integer or 'smallest'") 
    }
  }
  
  # Additional validations
  if (sample_size_num < 2) {
    stop("sample_size for bootstrap must be at least 2 for variance calculation")
  }
  
  # Note: For bootstrap with replacement, sample_size can be larger than the original group size
  # This is a key difference from rarefaction
  
  return(sample_size_num)
}






####################
#### S3 methods ####
####################


#' Plot method for disparity_resample objects
#'
#' Creates a confidence interval plot for disparity resample results
#'
#' @param x An object of class "disparity_resample"
#' @param point_color A single color or a vector of colors for point estimates.
#'   If length 1, the same color is used for all points. If length equals the
#'   number of groups, colors are assigned per group. (default "darkblue")
#' @param errorbar_color A single color or a vector of colors for error bars.
#'   Follows the same recycling rules as `point_color`. (default "darkred")
#' @param title Optional character string for plot title (default NULL)
#' @param ... Additional arguments (currently unused)
#'
#' @return A ggplot object
#' @export
plot.disparity_resample=function(x, point_color = "darkblue", errorbar_color = "darkred", title = NULL, ...) {
  # Check if results contain groups or single analysis
  if (any(x$results$group == "All") && nrow(x$results) == 1) {
    # Single group case - already labeled as "All"
    plot_data=x$results
    x_lab="Analysis"
  } else {
    # Multiple groups case
    plot_data=x$results
    x_lab="Group"
  }
  
  # Create plot using internal CI_plot function
  p=CI_plot(data=plot_data, x_var="group", y_var="average",
            ymin_var="CI_min", ymax_var="CI_max",
            x_lab=x_lab, y_lab=x$chosen_statistic, 
            point_color = point_color, errorbar_color = errorbar_color,
            title = title, ...)
  
return(p)
}


#' Print method for disparity_resample objects
#'
#' Prints results table and checks for CI overlap among groups
#'
#' @param x An object of class "disparity_resample"
#' @param ... Additional arguments (not used)
#'
#' @return Invisibly returns the input object
#' @export
print.disparity_resample=function(x, ...) {
  
  cat("Disparity resampling results\n")
  cat("===========================\n\n")
  cat("Statistic:", x$chosen_statistic, "\n")
  
  # Determine if CI=1 (range) or confidence interval
  is_range = !is.null(x$CI_level) && x$CI_level == 1
  
  if (is_range) {
    cat("Range: Full range (minimum to maximum) of resampled values\n\n")
  } else {
    ci_percent = if (!is.null(x$CI_level)) {
      paste0(round(x$CI_level * 100), "%")
    } else {
      "95%"  # Default assumption if CI_level not stored
    }
    cat("Confidence level:", ci_percent, "\n\n")
  }
  
  # Print results table
  print(x$results, row.names=FALSE)
  
  # Check for CI overlap if multiple groups
  if (nrow(x$results) > 1 && !any(x$results$group == "All")) {
    if (is_range) {
      cat("\nRange overlap assessment:\n")
    } else {
      cat("\nConfidence interval overlap assessment:\n")
    }
    
    # Check all pairwise overlaps
    n_groups=nrow(x$results)
    overlaps=matrix(TRUE, nrow=n_groups, ncol=n_groups)
    # Initialize overlap matrix
    
    for (i in seq_len(n_groups)) {
      for (j in seq_len(n_groups)) {
        if (i != j) {
          # Check if CIs overlap: max(min1, min2) <= min(max1, max2)
          overlap_check=max(x$results$CI_min[i], x$results$CI_min[j]) <= 
                        min(x$results$CI_max[i], x$results$CI_max[j])
          overlaps[i, j]=overlap_check
        }
      }
    }
    
    # Summary of overlaps
    all_overlap=all(overlaps[upper.tri(overlaps)])
    # Check upper triangle only (avoid diagonal and duplicates)
    
    if (all_overlap) {
      if (is_range) {
        cat("All ranges overlap.\n")
      } else {
        cat("All confidence intervals overlap.\n")
      }
    } else {
      if (is_range) {
        cat("At least one pair of ranges does not overlap.\n")
      } else {
        cat("At least one pair of confidence intervals does not overlap.\n")
      }
    }
  }
  
  cat("\n")
  invisible(x)
}
Any scripts or data that you put into this service are public.
GeometricMorphometricsMix documentation built on April 18, 2026, 1:06 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
GeometricMorphometricsMix
Heterogeneous Methods for Shape and Other Multidimensional Data

R/disparity_resample.R
In GeometricMorphometricsMix: Heterogeneous Methods for Shape and Other Multidimensional Data

Defines functions print.disparity_resample plot.disparity_resample resolve_bootstrap_sample_size resolve_rarefaction_sample_size validate_stat_reqs_disparity_resample validate_groups_disparity_resample prepare_data_disparity_resample validate_disparity_resample_inputs disparity_resample

Documented in disparity_resample plot.disparity_resample print.disparity_resample

Try the GeometricMorphometricsMix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

GeometricMorphometricsMix Heterogeneous Methods for Shape and Other Multidimensional Data

R/disparity_resample.R In GeometricMorphometricsMix: Heterogeneous Methods for Shape and Other Multidimensional Data

Defines functions print.disparity_resample plot.disparity_resample resolve_bootstrap_sample_size resolve_rarefaction_sample_size validate_stat_reqs_disparity_resample validate_groups_disparity_resample prepare_data_disparity_resample validate_disparity_resample_inputs disparity_resample

Documented in disparity_resample plot.disparity_resample print.disparity_resample

Try the GeometricMorphometricsMix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

GeometricMorphometricsMix
Heterogeneous Methods for Shape and Other Multidimensional Data

R/disparity_resample.R
In GeometricMorphometricsMix: Heterogeneous Methods for Shape and Other Multidimensional Data