char_thresh_local: Calculate a per-sample local threshold for charcoal peak...

View source: R/char_thresh_local.R

char_thresh_localR Documentation

Calculate a per-sample local threshold for charcoal peak identification

Description

Determines a sliding-window threshold value for each sample, based on the distribution of C_peak within a window centred on that sample. The noise component within each window is modelled as a zero-mean Gaussian (residuals), a one-mean Gaussian (ratios), or via a Gaussian mixture model (GMM). Mirrors CharThreshLocal.m from the MATLAB v2.0 codebase.

Usage

char_thresh_local(
  charcoal,
  smoothing,
  peak_analysis,
  site = NULL,
  results = NULL,
  plot_data = 0L
)

Arguments

charcoal

Named list with peak (C_peak series), ybpI (resampled ages), and accI.

smoothing

Named list with yr (window width in years).

peak_analysis

Named list with cPeak, threshMethod, threshValues (length 4).

site

Character string (site name; unused in R, kept for API symmetry).

results

Named list (unused in R, kept for API symmetry).

plot_data

0/1 flag; ignored in R (no diagnostic plots).

Details

## Window selection The half-window is hw = round(0.5 * smoothing$yr / r) samples, where r = mean(diff(charcoal$ybpI)) is the record resolution. The window is shifted (not shrunk) at record boundaries, matching MATLAB's loop logic.

## NaN bridging within windows NaN entries in charcoal$peak (from record gaps) are replaced with the neutral value (0 for residuals, 1 for ratios) before distribution fitting. This preserves window size while preventing gap samples from biasing the fit, matching the v2.0 bug fix documented in CharThreshLocal.m.

## Small-sample fallback If fewer than 4 non-neutral samples exist in the window, a simple Gaussian is fitted (same as method 2) and the KS test is skipped.

## GMM (threshMethod = 3) MATLAB's GaussianMixture(X, 2, 2, false) is replicated by gaussian_mixture_em(X) from utils_gaussian_mixture.R. This uses the same first/last-point initialisation and loose convergence criterion (\epsilon = 0.03 \log N) as the original Bowman CLUSTER EM, substantially improving agreement with MATLAB threshold values compared to mclust. The noise component is identified as the Gaussian with the smaller mean. Poor-fit fallback (mu1 == mu2) re-fits with K = 3 and takes the two components with the smallest means, mirroring the MATLAB v2.0 bug fix (the fallback passes the local window X, not the full record).

## KS goodness-of-fit MATLAB evaluates the fitted normal CDF at 101 equally-spaced bin centres and calls kstest() with a custom CDF table. R uses stats::ks.test(noise, "pnorm", mu, sigma), which evaluates the continuous CDF at each observation. P-values may differ by a small amount for small samples; the statistic converges as sample size grows.

## Post-loop smoothing After the per-sample loop, pos, neg, and SNI are smoothed with char_lowess(span = smoothing$yr / r, iter = 0). SNI is then clamped to \geq 0.

Value

Named list char_thresh with elements:

pos

Numeric matrix [N x 4]: per-sample positive threshold for each of the four threshValues, after Lowess smoothing.

neg

Numeric matrix [N x 4]: per-sample negative threshold, after Lowess smoothing.

SNI

Numeric vector [N]: signal-to-noise index time series, Lowess- smoothed and clamped to \geq 0.

GOF

Numeric vector [N]: KS goodness-of-fit p-values per sample (NA where fewer than 4 noise samples exist).

See Also

[char_thresh_global()], [char_smooth()], [CharAnalysis()]


CharAnalysis documentation built on May 3, 2026, 5:06 p.m.