View source: R/char_thresh_local.R
| char_thresh_local | R Documentation |
Determines a sliding-window threshold value for each sample, based on the
distribution of C_peak within a window centred on that sample. The noise
component within each window is modelled as a zero-mean Gaussian (residuals),
a one-mean Gaussian (ratios), or via a Gaussian mixture model (GMM).
Mirrors CharThreshLocal.m from the MATLAB v2.0 codebase.
char_thresh_local(
charcoal,
smoothing,
peak_analysis,
site = NULL,
results = NULL,
plot_data = 0L
)
charcoal |
Named list with |
smoothing |
Named list with |
peak_analysis |
Named list with |
site |
Character string (site name; unused in R, kept for API symmetry). |
results |
Named list (unused in R, kept for API symmetry). |
plot_data |
0/1 flag; ignored in R (no diagnostic plots). |
## Window selection
The half-window is hw = round(0.5 * smoothing$yr / r) samples, where
r = mean(diff(charcoal$ybpI)) is the record resolution. The window
is shifted (not shrunk) at record boundaries, matching MATLAB's loop logic.
## NaN bridging within windows
NaN entries in charcoal$peak (from record gaps) are replaced with the
neutral value (0 for residuals, 1 for ratios) before distribution fitting.
This preserves window size while preventing gap samples from biasing the fit,
matching the v2.0 bug fix documented in CharThreshLocal.m.
## Small-sample fallback If fewer than 4 non-neutral samples exist in the window, a simple Gaussian is fitted (same as method 2) and the KS test is skipped.
## GMM (threshMethod = 3)
MATLAB's GaussianMixture(X, 2, 2, false) is replicated by
gaussian_mixture_em(X) from utils_gaussian_mixture.R. This
uses the same first/last-point initialisation and loose convergence
criterion (\epsilon = 0.03 \log N) as the original Bowman CLUSTER
EM, substantially improving agreement with MATLAB threshold values compared
to mclust. The noise component is identified as the Gaussian with
the smaller mean. Poor-fit fallback (mu1 == mu2) re-fits with K = 3 and
takes the two components with the smallest means, mirroring the MATLAB v2.0
bug fix (the fallback passes the local window X, not the full record).
## KS goodness-of-fit
MATLAB evaluates the fitted normal CDF at 101 equally-spaced bin centres
and calls kstest() with a custom CDF table. R uses
stats::ks.test(noise, "pnorm", mu, sigma), which evaluates the
continuous CDF at each observation. P-values may differ by a small amount
for small samples; the statistic converges as sample size grows.
## Post-loop smoothing
After the per-sample loop, pos, neg, and SNI are
smoothed with char_lowess(span = smoothing$yr / r, iter = 0).
SNI is then clamped to \geq 0.
Named list char_thresh with elements:
Numeric matrix [N x 4]: per-sample positive threshold for each
of the four threshValues, after Lowess smoothing.
Numeric matrix [N x 4]: per-sample negative threshold, after Lowess smoothing.
Numeric vector [N]: signal-to-noise index time series, Lowess-
smoothed and clamped to \geq 0.
Numeric vector [N]: KS goodness-of-fit p-values per sample (NA where fewer than 4 noise samples exist).
[char_thresh_global()], [char_smooth()], [CharAnalysis()]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.