R/dm.growth.evaluate.R

Defines functions dmgv_extract_evaluation_table dmgv_safe_cor dmgv_safe_r2 dmgv_effective_parameters dmgv_run_method dm.growth.evaluate

Documented in dm.growth.evaluate

#' Compare dendrometer growth-fitting methods using fit statistics
#'
#' @description
#' Fits one or more growth models to dendrometer series and returns a compact
#' table of evaluation statistics only.
#'
#' This function is intended for method comparison rather than data extraction.
#' It runs the selected fitting methods, aligns observed and fitted values on
#' all non-missing observation days, and calculates goodness-of-fit measures for
#' each fitted series and vegetation-season fit.
#'
#' Supported single-curve methods are `"gam"`, `"gompertz"`, `"logistic"`,
#' `"richards"`, `"loess"`, and `"spline"`, which are fitted using
#' [dm.growth.fit()]. Supported bimodal methods are `"double_gompertz"` and
#' `"double_richards"`, which are fitted using [dm.growth.fit.double()].
#'
#' The function returns only an evaluation table and does not return fitted
#' curves, processed data, or parameter objects. It is therefore useful for
#' comparing alternative fitting methods across series, years, or sites before
#' selecting a final modeling approach.
#'
#' @details
#' Evaluation statistics are calculated by comparing observed daily values
#' against fitted daily values on all days where observed values are available.
#'
#' The returned metrics have the following interpretation:
#' \itemize{
#'   \item `rmse`: root mean squared error. Smaller values indicate closer fit.
#'   \item `mae`: mean absolute error. Smaller values indicate closer fit and are
#'     less sensitive to large deviations than RMSE.
#'   \item `bias`: mean fitted minus observed value. Positive values indicate
#'     overestimation and negative values indicate underestimation.
#'   \item `r2`: coefficient of determination, computed from residual and total
#'     sums of squares. Larger values indicate better fit.
#'   \item `correlation`: Pearson correlation between observed and fitted values.
#'     Larger values indicate stronger agreement in shape.
#'   \item `nrmse`: normalized RMSE, calculated as RMSE divided by the observed
#'     value range. This allows comparison across series with different
#'     magnitudes.
#'   \item `rss`: residual sum of squares.
#'   \item `aic_approx` and `bic_approx`: approximate information criteria based
#'     on RSS and an effective parameter count rather than the exact likelihood
#'     returned by each modeling function.
#' }
#'
#' The columns `aic_approx` and `bic_approx` are labeled as approximate because
#' they are not extracted from the original model objects through a unified
#' likelihood interface. Instead, they are computed from the residual sum of
#' squares and an effective number of fitted parameters. They are therefore most
#' useful for relative comparison among methods fitted to the same dataset.
#'
#' Negative values of `aic_approx` or `bic_approx` are not an error. They occur
#' naturally when the average residual variance is smaller than 1, because the
#' logarithmic term in the information-criterion formula becomes negative. In
#' practice, the absolute sign is not important; only differences among methods
#' fitted to the same data should be interpreted.
#'
#' For methods with flexible smoothness, such as GAM or spline, the effective
#' parameter count is approximate. Consequently, `aic_approx` and `bic_approx`
#' should be treated as heuristic ranking measures rather than exact
#' likelihood-based criteria.
#'
#' @param df A data frame or tibble. The first column must be a time stamp and
#'   the remaining selected columns must be numeric dendrometer series.
#' @param TreeNum Either `"all"`, a numeric vector selecting dendrometer series
#'   by position, or a character vector of series names.
#' @param methods Character vector of fitting methods to evaluate.
#' @param years Either `"all"` or a character vector of vegetation-season labels.
#' @param year_mode Either `"yearly"` or `"pooled"`.
#' @param fit_GRO Logical. If `TRUE`, convert processed daily series to
#'   cumulative growth before fitting.
#' @param site_mode Vegetation season definition. One of `"NH"`, `"SH"`, or
#'   `"CS"`.
#' @param custom_veg_season Numeric vector of length 2 defining a custom season
#'   in day-of-year coordinates when `site_mode = "CS"`.
#' @param growth_fraction Numeric vector of length 2 used by the fitting
#'   functions to estimate cumulative-growth timing.
#' @param min_unique_growth Minimum number of unique cumulative-growth values
#'   required before a fit is attempted.
#' @param rate_threshold_fraction Numeric scalar between 0 and 1. Passed to the
#'   fitting functions where supported.
#' @param peak_separation_min Minimum peak separation in days for
#'   [dm.growth.fit.double()].
#' @param valley_ratio_max Maximum allowed valley-to-peak ratio for
#'   [dm.growth.fit.double()].
#' @param min_relative_peak Minimum relative peak height for
#'   [dm.growth.fit.double()].
#' @param start_value_gompertz_parameters Optional starting values for Gompertz
#'   fits.
#' @param start_value_richards_parameters Optional starting values for logistic
#'   and Richards fits.
#' @param start_value_double_gompertz_parameters Optional starting values for
#'   double-Gompertz fits.
#' @param start_value_double_richards_parameters Optional starting values for
#'   double-Richards fits.
#' @param loess_span Span used when `method = "loess"`.
#' @param spline_df Degrees of freedom used when `method = "spline"`.
#' @param verbose Logical. If `TRUE`, prints progress messages.
#'
#' @return
#' A tibble with one row per fitted series and fit, containing:
#' \describe{
#'   \item{method}{Fitting method used for the row.}
#'   \item{series}{Series name.}
#'   \item{fit_id}{Vegetation-season label or `"pooled"`.}
#'   \item{n_compare}{Number of observed days used for comparison.}
#'   \item{rmse}{Root mean squared error.}
#'   \item{mae}{Mean absolute error.}
#'   \item{bias}{Mean fitted minus observed value.}
#'   \item{r2}{Coefficient of determination.}
#'   \item{correlation}{Pearson correlation between observed and fitted values.}
#'   \item{nrmse}{RMSE divided by the observed range.}
#'   \item{rss}{Residual sum of squares.}
#'   \item{aic_approx}{Approximate AIC based on RSS and effective parameter count.}
#'   \item{bic_approx}{Approximate BIC based on RSS and effective parameter count.}
#'   \item{k_effective}{Effective parameter count used in the approximate
#'     information criteria.}
#'   \item{converged}{Logical convergence flag returned by the fitting function.}
#'   \item{model_note}{Model note, warning, or error message returned by the
#'     fitting function.}
#' }
#'
#' @seealso [dm.growth.fit()], [dm.growth.fit.double()]
#'
#' @export
dm.growth.evaluate <- function(
    df,
    TreeNum = "all",
    methods = c(
      "gam", "gompertz", "logistic", "richards", "loess", "spline",
      "double_gompertz", "double_richards"
    ),
    years = "all",
    year_mode = c("yearly", "pooled"),
    fit_GRO = TRUE,
    site_mode = c("NH", "SH", "CS"),
    custom_veg_season = c(275, 274),
    growth_fraction = c(0.1, 0.9),
    min_unique_growth = 10,
    rate_threshold_fraction = 0.1,
    peak_separation_min = 10,
    valley_ratio_max = 0.4,
    min_relative_peak = 0.05,
    start_value_gompertz_parameters = list(a = NA_real_, b = NA_real_, k = NA_real_),
    start_value_richards_parameters = list(a = NA_real_, k = NA_real_, t0 = NA_real_, v = 1),
    start_value_double_gompertz_parameters = list(
      a = NA_real_, k = NA_real_, t0 = NA_real_,
      a2 = NA_real_, k2 = NA_real_, t02 = NA_real_
    ),
    start_value_double_richards_parameters = list(
      a = NA_real_, k = NA_real_, t0 = NA_real_, v = 1,
      a2 = NA_real_, k2 = NA_real_, t02 = NA_real_, v2 = 1
    ),
    loess_span = 0.2,
    spline_df = 10,
    verbose = TRUE
) {
  year_mode <- match.arg(year_mode)
  site_mode <- match.arg(site_mode)


  supported_methods <- c(
    "gam", "gompertz", "logistic", "richards", "loess", "spline",
    "double_gompertz", "double_richards"
  )

  methods <- unique(as.character(methods))
  bad_methods <- setdiff(methods, supported_methods)
  if (length(bad_methods) > 0) {
    stop(
      "Unsupported methods: ", paste(bad_methods, collapse = ", "),
      ". Supported methods are: ", paste(supported_methods, collapse = ", "), "."
    )
  }

  out_list <- vector("list", length(methods))
  names(out_list) <- methods

  for (i in seq_along(methods)) {
    mm <- methods[i]

    if (isTRUE(verbose)) {
      message("Evaluating method: ", mm)
    }

    fit_obj <- dmgv_run_method(
      method = mm,
      df = df,
      TreeNum = TreeNum,
      years = years,
      year_mode = year_mode,
      fit_GRO = fit_GRO,
      site_mode = site_mode,
      custom_veg_season = custom_veg_season,
      growth_fraction = growth_fraction,
      min_unique_growth = min_unique_growth,
      rate_threshold_fraction = rate_threshold_fraction,
      peak_separation_min = peak_separation_min,
      valley_ratio_max = valley_ratio_max,
      min_relative_peak = min_relative_peak,
      start_value_gompertz_parameters = start_value_gompertz_parameters,
      start_value_richards_parameters = start_value_richards_parameters,
      start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
      start_value_double_richards_parameters = start_value_double_richards_parameters,
      loess_span = loess_span,
      spline_df = spline_df
    )

    out_list[[i]] <- dmgv_extract_evaluation_table(fit_obj, method_name = mm)
  }

  dplyr::bind_rows(out_list)
}


# Internal helpers -------------------------------------------------------------

#' @keywords internal
dmgv_run_method <- function(
    method,
    df,
    TreeNum,
    years,
    year_mode,
    fit_GRO,
    site_mode,
    custom_veg_season,
    growth_fraction,
    min_unique_growth,
    rate_threshold_fraction,
    peak_separation_min,
    valley_ratio_max,
    min_relative_peak,
    start_value_gompertz_parameters,
    start_value_richards_parameters,
    start_value_double_gompertz_parameters,
    start_value_double_richards_parameters,
    loess_span,
    spline_df
) {
  if (method %in% c("gam", "gompertz", "logistic", "richards", "loess", "spline")) {
    return(
      dm.growth.fit(
        df = df,
        TreeNum = TreeNum,
        method = method,
        years = years,
        year_mode = year_mode,
        fit_GRO = fit_GRO,
        site_mode = site_mode,
        custom_veg_season = custom_veg_season,
        growth_fraction = growth_fraction,
        min_unique_growth = min_unique_growth,
        rate_threshold_fraction = rate_threshold_fraction,
        start_value_gompertz_parameters = start_value_gompertz_parameters,
        start_value_richards_parameters = start_value_richards_parameters,
        loess_span = loess_span,
        spline_df = spline_df,
        verbose = FALSE
      )
    )
  }

  if (method == "double_gompertz") {
    return(
      dm.growth.fit.double(
        df = df,
        TreeNum = TreeNum,
        method = "gompertz",
        years = years,
        year_mode = year_mode,
        fit_GRO = fit_GRO,
        site_mode = site_mode,
        custom_veg_season = custom_veg_season,
        growth_fraction = growth_fraction,
        min_unique_growth = min_unique_growth,
        rate_threshold_fraction = rate_threshold_fraction,
        peak_separation_min = peak_separation_min,
        valley_ratio_max = valley_ratio_max,
        min_relative_peak = min_relative_peak,
        start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
        start_value_double_richards_parameters = start_value_double_richards_parameters,
        verbose = FALSE
      )
    )
  }

  if (method == "double_richards") {
    return(
      dm.growth.fit.double(
        df = df,
        TreeNum = TreeNum,
        method = "richards",
        years = years,
        year_mode = year_mode,
        fit_GRO = fit_GRO,
        site_mode = site_mode,
        custom_veg_season = custom_veg_season,
        growth_fraction = growth_fraction,
        min_unique_growth = min_unique_growth,
        rate_threshold_fraction = rate_threshold_fraction,
        peak_separation_min = peak_separation_min,
        valley_ratio_max = valley_ratio_max,
        min_relative_peak = min_relative_peak,
        start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
        start_value_double_richards_parameters = start_value_double_richards_parameters,
        verbose = FALSE
      )
    )
  }

  stop("Unknown method: ", method)
}

#' @keywords internal
dmgv_effective_parameters <- function(method, edf = NA_real_, spline_df = NA_real_) {
  if (method == "gam") {
    if (is.finite(edf)) return(as.numeric(edf) + 1)
    return(NA_real_)
  }
  if (method == "gompertz") return(3)
  if (method == "logistic") return(3)
  if (method == "richards") return(4)
  if (method == "loess") {
    if (is.finite(edf)) return(as.numeric(edf))
    return(NA_real_)
  }
  if (method == "spline") {
    if (is.finite(spline_df)) return(as.numeric(spline_df))
    return(NA_real_)
  }
  if (method == "double_gompertz") return(6)
  if (method == "double_richards") return(8)
  NA_real_
}

#' @keywords internal
dmgv_safe_r2 <- function(obs, fit) {
  ok <- is.finite(obs) & is.finite(fit)
  if (sum(ok) < 2) return(NA_real_)

  obs <- obs[ok]
  fit <- fit[ok]

  sse <- sum((obs - fit)^2)
  sst <- sum((obs - mean(obs))^2)

  if (!is.finite(sst) || sst <= 0) return(NA_real_)
  1 - sse / sst
}

#' @keywords internal
dmgv_safe_cor <- function(obs, fit) {
  ok <- is.finite(obs) & is.finite(fit)
  if (sum(ok) < 2) return(NA_real_)
  suppressWarnings(stats::cor(obs[ok], fit[ok]))
}

#' @keywords internal
dmgv_extract_evaluation_table <- function(fit_obj, method_name) {
  TIME  <- season_label  <- season_day  <- observed  <- series  <- NULL
  fit_id  <- fitted  <- obs_range  <- rmse  <- method_requested  <- edf  <- spline_df <- NULL
  converged <- model_note <- k_effective <- rss <- n_compare <- method <- mae <- NULL
  bias <- r2 <- correlation <- nrmse <- aic_approx <- bic_approx <- NULL
  meta_cols <- c(
    "TIME", "season_label", "season_start", "season_end",
    "season_day", "season_length", "any_observed"
  )

  proc <- fit_obj$processed_data
  fitd <- fit_obj$fitted_data
  fit_params <- fit_obj$fit_parameters

  series_cols <- intersect(
    setdiff(names(proc), meta_cols),
    setdiff(names(fitd), meta_cols)
  )

  obs_long <- proc %>%
    dplyr::select(TIME, season_label, season_day, dplyr::all_of(series_cols)) %>%
    tidyr::pivot_longer(
      cols = dplyr::all_of(series_cols),
      names_to = "series",
      values_to = "observed"
    )

  fit_long <- fitd %>%
    dplyr::select(TIME, season_label, season_day, dplyr::all_of(series_cols)) %>%
    tidyr::pivot_longer(
      cols = dplyr::all_of(series_cols),
      names_to = "series",
      values_to = "fitted"
    )

  comp_long <- obs_long %>%
    dplyr::left_join(fit_long, by = c("TIME", "season_label", "season_day", "series")) %>%
    dplyr::filter(is.finite(observed))

  if (all(fit_params$fit_id == "pooled")) {
    comp_long$fit_id <- "pooled"
  } else {
    comp_long$fit_id <- as.character(comp_long$season_label)
  }

  metric_tbl <- comp_long %>%
    dplyr::group_by(series, fit_id) %>%
    dplyr::summarise(
      n_compare = sum(is.finite(observed) & is.finite(fitted)),
      rss = sum((observed - fitted)^2, na.rm = TRUE),
      rmse = sqrt(mean((observed - fitted)^2, na.rm = TRUE)),
      mae = mean(abs(observed - fitted), na.rm = TRUE),
      bias = mean(fitted - observed, na.rm = TRUE),
      r2 = dmgv_safe_r2(observed, fitted),
      correlation = dmgv_safe_cor(observed, fitted),
      obs_range = diff(range(observed, na.rm = TRUE)),
      .groups = "drop"
    ) %>%
    dplyr::mutate(
      nrmse = dplyr::if_else(is.finite(obs_range) & obs_range > 0, rmse / obs_range, NA_real_)
    ) %>%
    dplyr::select(-obs_range)

  eval_tbl <- fit_params %>%
    dplyr::mutate(
      method_requested = method_name,
      k_effective = mapply(
        FUN = dmgv_effective_parameters,
        method = method_requested,
        edf = dplyr::coalesce(edf, NA_real_),
        spline_df = dplyr::coalesce(spline_df, NA_real_)
      )
    ) %>%
    dplyr::select(series, fit_id, converged, model_note, k_effective, method_requested) %>%
    dplyr::left_join(metric_tbl, by = c("series", "fit_id")) %>%
    dplyr::mutate(
      aic_approx = dplyr::if_else(
        is.finite(rss) & rss > 0 & is.finite(k_effective) & n_compare > 0,
        n_compare * log(rss / n_compare) + 2 * k_effective,
        NA_real_
      ),
      bic_approx = dplyr::if_else(
        is.finite(rss) & rss > 0 & is.finite(k_effective) & n_compare > 0,
        n_compare * log(rss / n_compare) + log(n_compare) * k_effective,
        NA_real_
      )
    ) %>%
    dplyr::rename(method = method_requested) %>%
    dplyr::select(
      method, series, fit_id,
      n_compare,
      rmse, mae, bias, r2, correlation, nrmse, rss,
      aic_approx, bic_approx, k_effective,
      converged, model_note
    )

  class(eval_tbl) <- c("dm_growth_evaluation", class(eval_tbl))
  attr(eval_tbl, "call") <- match.call()
  eval_tbl
}

Try the dendRoAnalyst package in your browser

Any scripts or data that you put into this service are public.

dendRoAnalyst documentation built on May 20, 2026, 5:07 p.m.