Nothing
#' Compare dendrometer growth-fitting methods using fit statistics
#'
#' @description
#' Fits one or more growth models to dendrometer series and returns a compact
#' table of evaluation statistics only.
#'
#' This function is intended for method comparison rather than data extraction.
#' It runs the selected fitting methods, aligns observed and fitted values on
#' all non-missing observation days, and calculates goodness-of-fit measures for
#' each fitted series and vegetation-season fit.
#'
#' Supported single-curve methods are `"gam"`, `"gompertz"`, `"logistic"`,
#' `"richards"`, `"loess"`, and `"spline"`, which are fitted using
#' [dm.growth.fit()]. Supported bimodal methods are `"double_gompertz"` and
#' `"double_richards"`, which are fitted using [dm.growth.fit.double()].
#'
#' The function returns only an evaluation table and does not return fitted
#' curves, processed data, or parameter objects. It is therefore useful for
#' comparing alternative fitting methods across series, years, or sites before
#' selecting a final modeling approach.
#'
#' @details
#' Evaluation statistics are calculated by comparing observed daily values
#' against fitted daily values on all days where observed values are available.
#'
#' The returned metrics have the following interpretation:
#' \itemize{
#' \item `rmse`: root mean squared error. Smaller values indicate closer fit.
#' \item `mae`: mean absolute error. Smaller values indicate closer fit and are
#' less sensitive to large deviations than RMSE.
#' \item `bias`: mean fitted minus observed value. Positive values indicate
#' overestimation and negative values indicate underestimation.
#' \item `r2`: coefficient of determination, computed from residual and total
#' sums of squares. Larger values indicate better fit.
#' \item `correlation`: Pearson correlation between observed and fitted values.
#' Larger values indicate stronger agreement in shape.
#' \item `nrmse`: normalized RMSE, calculated as RMSE divided by the observed
#' value range. This allows comparison across series with different
#' magnitudes.
#' \item `rss`: residual sum of squares.
#' \item `aic_approx` and `bic_approx`: approximate information criteria based
#' on RSS and an effective parameter count rather than the exact likelihood
#' returned by each modeling function.
#' }
#'
#' The columns `aic_approx` and `bic_approx` are labeled as approximate because
#' they are not extracted from the original model objects through a unified
#' likelihood interface. Instead, they are computed from the residual sum of
#' squares and an effective number of fitted parameters. They are therefore most
#' useful for relative comparison among methods fitted to the same dataset.
#'
#' Negative values of `aic_approx` or `bic_approx` are not an error. They occur
#' naturally when the average residual variance is smaller than 1, because the
#' logarithmic term in the information-criterion formula becomes negative. In
#' practice, the absolute sign is not important; only differences among methods
#' fitted to the same data should be interpreted.
#'
#' For methods with flexible smoothness, such as GAM or spline, the effective
#' parameter count is approximate. Consequently, `aic_approx` and `bic_approx`
#' should be treated as heuristic ranking measures rather than exact
#' likelihood-based criteria.
#'
#' @param df A data frame or tibble. The first column must be a time stamp and
#' the remaining selected columns must be numeric dendrometer series.
#' @param TreeNum Either `"all"`, a numeric vector selecting dendrometer series
#' by position, or a character vector of series names.
#' @param methods Character vector of fitting methods to evaluate.
#' @param years Either `"all"` or a character vector of vegetation-season labels.
#' @param year_mode Either `"yearly"` or `"pooled"`.
#' @param fit_GRO Logical. If `TRUE`, convert processed daily series to
#' cumulative growth before fitting.
#' @param site_mode Vegetation season definition. One of `"NH"`, `"SH"`, or
#' `"CS"`.
#' @param custom_veg_season Numeric vector of length 2 defining a custom season
#' in day-of-year coordinates when `site_mode = "CS"`.
#' @param growth_fraction Numeric vector of length 2 used by the fitting
#' functions to estimate cumulative-growth timing.
#' @param min_unique_growth Minimum number of unique cumulative-growth values
#' required before a fit is attempted.
#' @param rate_threshold_fraction Numeric scalar between 0 and 1. Passed to the
#' fitting functions where supported.
#' @param peak_separation_min Minimum peak separation in days for
#' [dm.growth.fit.double()].
#' @param valley_ratio_max Maximum allowed valley-to-peak ratio for
#' [dm.growth.fit.double()].
#' @param min_relative_peak Minimum relative peak height for
#' [dm.growth.fit.double()].
#' @param start_value_gompertz_parameters Optional starting values for Gompertz
#' fits.
#' @param start_value_richards_parameters Optional starting values for logistic
#' and Richards fits.
#' @param start_value_double_gompertz_parameters Optional starting values for
#' double-Gompertz fits.
#' @param start_value_double_richards_parameters Optional starting values for
#' double-Richards fits.
#' @param loess_span Span used when `method = "loess"`.
#' @param spline_df Degrees of freedom used when `method = "spline"`.
#' @param verbose Logical. If `TRUE`, prints progress messages.
#'
#' @return
#' A tibble with one row per fitted series and fit, containing:
#' \describe{
#' \item{method}{Fitting method used for the row.}
#' \item{series}{Series name.}
#' \item{fit_id}{Vegetation-season label or `"pooled"`.}
#' \item{n_compare}{Number of observed days used for comparison.}
#' \item{rmse}{Root mean squared error.}
#' \item{mae}{Mean absolute error.}
#' \item{bias}{Mean fitted minus observed value.}
#' \item{r2}{Coefficient of determination.}
#' \item{correlation}{Pearson correlation between observed and fitted values.}
#' \item{nrmse}{RMSE divided by the observed range.}
#' \item{rss}{Residual sum of squares.}
#' \item{aic_approx}{Approximate AIC based on RSS and effective parameter count.}
#' \item{bic_approx}{Approximate BIC based on RSS and effective parameter count.}
#' \item{k_effective}{Effective parameter count used in the approximate
#' information criteria.}
#' \item{converged}{Logical convergence flag returned by the fitting function.}
#' \item{model_note}{Model note, warning, or error message returned by the
#' fitting function.}
#' }
#'
#' @seealso [dm.growth.fit()], [dm.growth.fit.double()]
#'
#' @export
dm.growth.evaluate <- function(
df,
TreeNum = "all",
methods = c(
"gam", "gompertz", "logistic", "richards", "loess", "spline",
"double_gompertz", "double_richards"
),
years = "all",
year_mode = c("yearly", "pooled"),
fit_GRO = TRUE,
site_mode = c("NH", "SH", "CS"),
custom_veg_season = c(275, 274),
growth_fraction = c(0.1, 0.9),
min_unique_growth = 10,
rate_threshold_fraction = 0.1,
peak_separation_min = 10,
valley_ratio_max = 0.4,
min_relative_peak = 0.05,
start_value_gompertz_parameters = list(a = NA_real_, b = NA_real_, k = NA_real_),
start_value_richards_parameters = list(a = NA_real_, k = NA_real_, t0 = NA_real_, v = 1),
start_value_double_gompertz_parameters = list(
a = NA_real_, k = NA_real_, t0 = NA_real_,
a2 = NA_real_, k2 = NA_real_, t02 = NA_real_
),
start_value_double_richards_parameters = list(
a = NA_real_, k = NA_real_, t0 = NA_real_, v = 1,
a2 = NA_real_, k2 = NA_real_, t02 = NA_real_, v2 = 1
),
loess_span = 0.2,
spline_df = 10,
verbose = TRUE
) {
year_mode <- match.arg(year_mode)
site_mode <- match.arg(site_mode)
supported_methods <- c(
"gam", "gompertz", "logistic", "richards", "loess", "spline",
"double_gompertz", "double_richards"
)
methods <- unique(as.character(methods))
bad_methods <- setdiff(methods, supported_methods)
if (length(bad_methods) > 0) {
stop(
"Unsupported methods: ", paste(bad_methods, collapse = ", "),
". Supported methods are: ", paste(supported_methods, collapse = ", "), "."
)
}
out_list <- vector("list", length(methods))
names(out_list) <- methods
for (i in seq_along(methods)) {
mm <- methods[i]
if (isTRUE(verbose)) {
message("Evaluating method: ", mm)
}
fit_obj <- dmgv_run_method(
method = mm,
df = df,
TreeNum = TreeNum,
years = years,
year_mode = year_mode,
fit_GRO = fit_GRO,
site_mode = site_mode,
custom_veg_season = custom_veg_season,
growth_fraction = growth_fraction,
min_unique_growth = min_unique_growth,
rate_threshold_fraction = rate_threshold_fraction,
peak_separation_min = peak_separation_min,
valley_ratio_max = valley_ratio_max,
min_relative_peak = min_relative_peak,
start_value_gompertz_parameters = start_value_gompertz_parameters,
start_value_richards_parameters = start_value_richards_parameters,
start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
start_value_double_richards_parameters = start_value_double_richards_parameters,
loess_span = loess_span,
spline_df = spline_df
)
out_list[[i]] <- dmgv_extract_evaluation_table(fit_obj, method_name = mm)
}
dplyr::bind_rows(out_list)
}
# Internal helpers -------------------------------------------------------------
#' @keywords internal
dmgv_run_method <- function(
method,
df,
TreeNum,
years,
year_mode,
fit_GRO,
site_mode,
custom_veg_season,
growth_fraction,
min_unique_growth,
rate_threshold_fraction,
peak_separation_min,
valley_ratio_max,
min_relative_peak,
start_value_gompertz_parameters,
start_value_richards_parameters,
start_value_double_gompertz_parameters,
start_value_double_richards_parameters,
loess_span,
spline_df
) {
if (method %in% c("gam", "gompertz", "logistic", "richards", "loess", "spline")) {
return(
dm.growth.fit(
df = df,
TreeNum = TreeNum,
method = method,
years = years,
year_mode = year_mode,
fit_GRO = fit_GRO,
site_mode = site_mode,
custom_veg_season = custom_veg_season,
growth_fraction = growth_fraction,
min_unique_growth = min_unique_growth,
rate_threshold_fraction = rate_threshold_fraction,
start_value_gompertz_parameters = start_value_gompertz_parameters,
start_value_richards_parameters = start_value_richards_parameters,
loess_span = loess_span,
spline_df = spline_df,
verbose = FALSE
)
)
}
if (method == "double_gompertz") {
return(
dm.growth.fit.double(
df = df,
TreeNum = TreeNum,
method = "gompertz",
years = years,
year_mode = year_mode,
fit_GRO = fit_GRO,
site_mode = site_mode,
custom_veg_season = custom_veg_season,
growth_fraction = growth_fraction,
min_unique_growth = min_unique_growth,
rate_threshold_fraction = rate_threshold_fraction,
peak_separation_min = peak_separation_min,
valley_ratio_max = valley_ratio_max,
min_relative_peak = min_relative_peak,
start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
start_value_double_richards_parameters = start_value_double_richards_parameters,
verbose = FALSE
)
)
}
if (method == "double_richards") {
return(
dm.growth.fit.double(
df = df,
TreeNum = TreeNum,
method = "richards",
years = years,
year_mode = year_mode,
fit_GRO = fit_GRO,
site_mode = site_mode,
custom_veg_season = custom_veg_season,
growth_fraction = growth_fraction,
min_unique_growth = min_unique_growth,
rate_threshold_fraction = rate_threshold_fraction,
peak_separation_min = peak_separation_min,
valley_ratio_max = valley_ratio_max,
min_relative_peak = min_relative_peak,
start_value_double_gompertz_parameters = start_value_double_gompertz_parameters,
start_value_double_richards_parameters = start_value_double_richards_parameters,
verbose = FALSE
)
)
}
stop("Unknown method: ", method)
}
#' @keywords internal
dmgv_effective_parameters <- function(method, edf = NA_real_, spline_df = NA_real_) {
if (method == "gam") {
if (is.finite(edf)) return(as.numeric(edf) + 1)
return(NA_real_)
}
if (method == "gompertz") return(3)
if (method == "logistic") return(3)
if (method == "richards") return(4)
if (method == "loess") {
if (is.finite(edf)) return(as.numeric(edf))
return(NA_real_)
}
if (method == "spline") {
if (is.finite(spline_df)) return(as.numeric(spline_df))
return(NA_real_)
}
if (method == "double_gompertz") return(6)
if (method == "double_richards") return(8)
NA_real_
}
#' @keywords internal
dmgv_safe_r2 <- function(obs, fit) {
ok <- is.finite(obs) & is.finite(fit)
if (sum(ok) < 2) return(NA_real_)
obs <- obs[ok]
fit <- fit[ok]
sse <- sum((obs - fit)^2)
sst <- sum((obs - mean(obs))^2)
if (!is.finite(sst) || sst <= 0) return(NA_real_)
1 - sse / sst
}
#' @keywords internal
dmgv_safe_cor <- function(obs, fit) {
ok <- is.finite(obs) & is.finite(fit)
if (sum(ok) < 2) return(NA_real_)
suppressWarnings(stats::cor(obs[ok], fit[ok]))
}
#' @keywords internal
dmgv_extract_evaluation_table <- function(fit_obj, method_name) {
TIME <- season_label <- season_day <- observed <- series <- NULL
fit_id <- fitted <- obs_range <- rmse <- method_requested <- edf <- spline_df <- NULL
converged <- model_note <- k_effective <- rss <- n_compare <- method <- mae <- NULL
bias <- r2 <- correlation <- nrmse <- aic_approx <- bic_approx <- NULL
meta_cols <- c(
"TIME", "season_label", "season_start", "season_end",
"season_day", "season_length", "any_observed"
)
proc <- fit_obj$processed_data
fitd <- fit_obj$fitted_data
fit_params <- fit_obj$fit_parameters
series_cols <- intersect(
setdiff(names(proc), meta_cols),
setdiff(names(fitd), meta_cols)
)
obs_long <- proc %>%
dplyr::select(TIME, season_label, season_day, dplyr::all_of(series_cols)) %>%
tidyr::pivot_longer(
cols = dplyr::all_of(series_cols),
names_to = "series",
values_to = "observed"
)
fit_long <- fitd %>%
dplyr::select(TIME, season_label, season_day, dplyr::all_of(series_cols)) %>%
tidyr::pivot_longer(
cols = dplyr::all_of(series_cols),
names_to = "series",
values_to = "fitted"
)
comp_long <- obs_long %>%
dplyr::left_join(fit_long, by = c("TIME", "season_label", "season_day", "series")) %>%
dplyr::filter(is.finite(observed))
if (all(fit_params$fit_id == "pooled")) {
comp_long$fit_id <- "pooled"
} else {
comp_long$fit_id <- as.character(comp_long$season_label)
}
metric_tbl <- comp_long %>%
dplyr::group_by(series, fit_id) %>%
dplyr::summarise(
n_compare = sum(is.finite(observed) & is.finite(fitted)),
rss = sum((observed - fitted)^2, na.rm = TRUE),
rmse = sqrt(mean((observed - fitted)^2, na.rm = TRUE)),
mae = mean(abs(observed - fitted), na.rm = TRUE),
bias = mean(fitted - observed, na.rm = TRUE),
r2 = dmgv_safe_r2(observed, fitted),
correlation = dmgv_safe_cor(observed, fitted),
obs_range = diff(range(observed, na.rm = TRUE)),
.groups = "drop"
) %>%
dplyr::mutate(
nrmse = dplyr::if_else(is.finite(obs_range) & obs_range > 0, rmse / obs_range, NA_real_)
) %>%
dplyr::select(-obs_range)
eval_tbl <- fit_params %>%
dplyr::mutate(
method_requested = method_name,
k_effective = mapply(
FUN = dmgv_effective_parameters,
method = method_requested,
edf = dplyr::coalesce(edf, NA_real_),
spline_df = dplyr::coalesce(spline_df, NA_real_)
)
) %>%
dplyr::select(series, fit_id, converged, model_note, k_effective, method_requested) %>%
dplyr::left_join(metric_tbl, by = c("series", "fit_id")) %>%
dplyr::mutate(
aic_approx = dplyr::if_else(
is.finite(rss) & rss > 0 & is.finite(k_effective) & n_compare > 0,
n_compare * log(rss / n_compare) + 2 * k_effective,
NA_real_
),
bic_approx = dplyr::if_else(
is.finite(rss) & rss > 0 & is.finite(k_effective) & n_compare > 0,
n_compare * log(rss / n_compare) + log(n_compare) * k_effective,
NA_real_
)
) %>%
dplyr::rename(method = method_requested) %>%
dplyr::select(
method, series, fit_id,
n_compare,
rmse, mae, bias, r2, correlation, nrmse, rss,
aic_approx, bic_approx, k_effective,
converged, model_note
)
class(eval_tbl) <- c("dm_growth_evaluation", class(eval_tbl))
attr(eval_tbl, "call") <- match.call()
eval_tbl
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.