#' Identifying changes in mean and variance of accumulation-rate records
#'
#' @description
#' Determines zone boundaries for single proxy accumulation-rate records based
#' on change-point analysis and checks for the influence of
#' sediment-accumulation rates on the detected change points.
#'
#' @param serie A list returned from the \code{pretreatment_data()}
#' function. NB: the \code{pretreatment_data()} function must be run
#' with the argument \code{out = "accI"}.
#' @param series_name A character string giving the name of the data set,
#' or typically the site name (optional).
#' @param proxy A character string giving the name (typically the column name)
#' of the variable that should be analysed.
#' @param n_rand Positive integer giving the number of iterations used to check
#' for change-points in random data sets (by default
#' \code{n_rand = 1000}).
#' @param bootstrap Logical. Determines how the random data sets are generated.
#' If TRUE, then each of the \code{n_rand} random data sets are
#' generated by random sampling with replacement from the input data.
#' If FALSE, then the code generates in each of the \code{n_rand}
#' iterations a vector of random concentration values having a
#' user-specified distribution (based on \code{rand_ds_distr}).
#' @param rand_ds_distr A character string giving the distribution used
#' to generate the random data sets when \code{bootstrap = FALSE}.
#' Currently only "uniform" is supported (default). The random values
#' range between the minimum and the maximum concentration values of
#' the re-sampled series.
#' @param max_cpts Positive integer giving the maximum number of change points
#' to search for when \code{meth_cpt = "BinSeg"}. By default
#' \code{max_cpts = 50}).
#' @param n_screen Positive number giving the minimum frequency of occurrence of
#' change-points in the random data sets to be validated. By default,
#' \code{n_screen = 0.025}. Thus, with \code{n_rand = 1000} this equals
#' a 2.5% chance of occurrence.
#' @param tau Positive integer giving the tolerance search window (in years)
#' for change points in the 'true proxy-accumulation rate' record. By
#' default \code{tau} = the number of years the raw record was re-sampled
#' to (as of variable \code{yr.interp} that is written in the list
#' generated with the \code{pretreatment_data()} function).
#' @param meth_cpt As of \code{changepoint::cpt.meanvar()}: choice of
#' "AMOC", "PELT", "BinSeg", or "SegNeigh". But note that "SegNeigh" is
#' computationally slow, use other methods instead.
#' @param pen_method Refer to \code{changepoint::cpt.meanvar()} for
#' argument \code{penalty}. By default, \code{pen_method = "Manual"}.
#' @param pen_val Refer to \code{changepoint::cpt.meanvar()} for the
#' argument \code{pen.value}. By default, \code{pen_val = "4*log(n)"},
#' where \code{n} is equal to the number of observations.
#' @param t_stat Refer to \code{changepoint::cpt.meanvar()} for the
#' argument \code{test.stat}. Currently only "Normal" is supported
#' (default).
#' @param minseglen Positive integer giving the minimum number of observations
#' within zones. By default \code{minseglen = 2}, see also
#' \code{changepoint::cpt.meanvar()}.
#'
#' @details This function is used to determine if any of the change points of
#' a proxy-accumulation record is strongly influenced by the modeled
#' sediment-accumulation rates. The function performs a change-point
#' analysis with the 'true' proxy-accumulation record (based on
#' the effective concentration values), and in addition it performs a
#' change-point analysis on a set (by default \code{n_rand = 1000}) of
#' non-sense proxy-accumulation records (based on randomly drawn
#' concentration values). The change points of the latter records
#' (hereafter, non-sense change points) are then first
#' screened based on their frequency of occurrence (\code{n_screen}, as
#' some change points may occur by chance). Thereafter, the change points
#' are screened further to retain
#' those that separate zones where the mean proxy-accumulation rate
#' changes in the same direction as the mean sediment-accumulation rate.
#' The rationale being that if a change point of the 'true
#' proxy-accumulation record' matches a non-sense change point
#' (with a tolerance search window equal to \code{tau}), and
#' if both records change in the same direction (say, both the
#' proxy-accumulation rate and the sediment-accumulation rate increase
#' across a change point), then the change point of the
#' proxy-accumulation rate may be strongly influenced by
#' sediment-accumulation rates (proxy_ar = proxy_conc * sed_ar).
#'
#' @returns
#' a list with the depths and ages of the change points, the mean values
#' for the zones, as well as the settings of the change-point analysis.
#'
#' In addition, a diagnostic plot consisting of three panels is sent to the
#' device.
#' The plot includes (from top to bottom):
#' - the re-sampled concentration values,
#' - the sediment-accumulation rate, and
#' - the proxy accumulation-rate record.
#'
#' The latter panel may show:
#' - change points of the proxy accumulation-rate record (blue vertical lines),
#' - mean proxy accumulation-rate values for the zones (red horizontal lines),
#' - non-sense change points that match 'true change points' (red circles),
#' - non-sense change points that do not match 'true change points' (green
#' circles),
#'
#' @seealso \code{\link{cpt.meanvar}}, \code{\link{pretreatment_data}}
#'
#' @references
#' Finsinger W, Magyari EK, Fevre J, Orban I, Pal I, Vincze I,
#' Hubay K, Birks HH, Braun M, Toth M (2016) Holocene fire regimes near the
#' treeline in the Retezat Mts. (Southern Carpathians). Quaternary
#' International, *477*, 94-105. 10.1016/j.quaint.2016.04.029.
#'
#' Killick R, Eckley IA (2014). changepoint: An R Package for Changepoint
#' Analysis. Journal of Statistical Software, *58*(3), 1-19.
#'
#' @examples
#' \dontrun{
#' co <- tapas::co_char_data
#' co_i <- tapas::pretreatment_data(co, out = "accI")
#'
#' ## change-point analysis with default options
#' co_i_cpts <- cpts_ar(co_i, proxy = "char")
#'
#' ## change-point analysis with higher penalty value
#' co_i_cpts_8logn <- cpts_ar(co_i, proxy = "char", pen_val = "8*log(n)")
#'
#' ## Change-point analysis with the 'red noise' record (no change point)
#' rdn <- tapas::red_noise
#' rdn_i <- tapas::pretreatment_data(rdn)
#' rdn_i_cpts <- tapas::cpts_ar(rdn_i, proxy = "char")
#'
#'
#' ## As in the example above, but introducing a step-wise increase in the
#' ## sediment-accumulation rate [cm/yr]
#' ## NB: sar = 1/sediment-deposition time [yr/cm]
#' sdt <- c(rep_len(10, length.out = 100), rep_len(25, length.out = 360))
#' a_bot <- cumsum(sdt)
#' rdn2 <- rdn[1:length(a_bot), ]
#' rdn2$age_bot <- a_bot
#' rdn2$age_top <- a_bot - sdt
#' rdn2_i <- tapas::pretreatment_data(rdn2, yrInterp = 25)
#' rdn2_i_cpts <- tapas::cpts_ar(rdn2_i, proxy = "char")
#' }
#'
#' @author Walter Finsinger
#'
#' @importFrom changepoint cpt.meanvar cpts
#' @importFrom graphics segments
#' @importFrom stats approx runif
#'
#' @export cpts_ar
cpts_ar <- function(serie, series_name = NULL, proxy = NULL, n_rand = 1000,
bootstrap = FALSE, rand_ds_distr = "uniform",
max_cpts = 50, n_screen = 0.025, tau = NULL,
meth_cpt = "BinSeg",
pen_method = "Manual", pen_val = "4*log(n)",
t_stat = "Normal", minseglen = 2) {
# Initial check up ----------------------------------------------------------
## Put the user’s layout settings back in place when the function is done
opar <- par("mfrow", "mar", "oma", "cex")
on.exit(par(opar))
## Check if input list contains proxy accumulation-rate values
if (serie$out != "accI") {
stop("Fatal error: set argument out = 'accI' when running the",
"tapas::pretreatment_data() function.")
}
## Check if the variable name (proxy) is coherent
proxies_options <- colnames(serie$raw$series[ 2:dim(serie$raw$series)[2] ])
if (is.null(proxy) & length(proxies_options) > 1) {
stop(paste0("Fatal error: Unrecognized variable name (proxy = ). ",
"Accepted names: ", paste(proxies_options, collapse = ", "), "."
))
}
if (is.null(proxy) & length(proxies_options) == 1) {
proxy <- colnames(serie$raw$series) [2]
}
if (!(proxy %in% proxies_options)) {
stop("Unrecognized variable name: '", proxy, "'. ",
"Accepted names: ", paste(proxies_options, collapse = ", "), ".")
}
# Check if other arguments are coherent
if (bootstrap == FALSE & !rand_ds_distr == "uniform") {
stop("If bootstrap == FALSE, set rand_ds_distr = 'uniform'")
}
if (is.null(tau) == TRUE) {
tau <- serie$int$yr.interp
}
# Extract variables from input list -----------------------------------------
cm_i <- serie$int$cmI
age_i <- serie$int$series.int$age
conc_i <- serie$int$series.conI[[proxy]]
ar_i <- serie$int$series.int[[paste0(proxy, "AR")]]
serie_length <- dim(serie$int$series.int)[1]
conc_i_range <- c(min(conc_i), max(conc_i))
## Get sediment-accumulation rates of the pre-treated series
sed_ar_i <- c(diff(cm_i) / diff(age_i), NA)
# Fill in some missing values by approximation
sed_ar_i <- stats::approx(cm_i, sed_ar_i, xout = cm_i, rule = 2)$y
# If there are still NA values left
if (any(is.na(sed_ar_i))) {
sed_ar_i_na <- which(is.na(sed_ar_i))
sed_ar_i[sed_ar_i_na] <- sed_ar_i[min(sed_ar_i_na) - 1]
}
# Start the Change-point analysis -------------------------------------------
## Prepare space to store data generated in each iteration of the loop
cpts_rand <- matrix()
#### CPT analysis of random data sets ---------------------------------------
for (i in 1:n_rand) { # Loop cpt analyses through rand data sets
## With bootstrapped Concentration values
if (bootstrap == TRUE) {
conc_rand <- sample(conc_i, size = serie_length, replace = T)
}
## With random concentration values generated from a
## theoretical distribution
if (bootstrap == FALSE) {
## Generate random record from a uniform distribution
if (rand_ds_distr == "uniform") {
conc_rand <- stats::runif(n = serie_length,
min = conc_i_range[1], max = conc_i_range[2])
}
}
## Calculate proxy-accumulation rates based on the sediment-accumulation
## rate from the depth-age model
rand_ar_i <- conc_rand * sed_ar_i
### Run change-point analysis with the random proxy-accumulation rate
cpt_ar_rand <- changepoint::cpt.meanvar(rand_ar_i,
method = meth_cpt,
Q = max_cpts,
test.stat = t_stat,
penalty = pen_method,
pen.value = pen_val,
minseglen = minseglen)
cpts_rand <- c(cpts_rand, changepoint::cpts(cpt_ar_rand))
}
## Screen cpts from rand series ---------------------------------------------
# Keep only those that occur more often than (n_rand * n_screen)
cpts_rand_smr <- tabulate(cpts_rand, nbins = max(1, cpts_rand, na.rm = T))
cpts_rand <- which(cpts_rand_smr > (n_rand * n_screen))
## Run cpt analysis with true CHAR record -----------------------------------
cpt_ar <- changepoint::cpt.meanvar(ar_i,
method = meth_cpt,
Q = max_cpts,
test.stat = t_stat,
penalty = pen_method,
pen.value = pen_val,
minseglen = minseglen)
# Gather data from the cpt_ar list
cpts_positions <- changepoint::cpts(cpt_ar)
cpts_breaks <- c(1, changepoint::cpts(cpt_ar), length(ar_i))
cpts_depths <- cm_i[changepoint::cpts(cpt_ar)]
cpts_ages <- age_i[changepoint::cpts(cpt_ar)]
cpts_means <- cpt_ar@param.est$mean
# Prepare a data.frame with the means and ages of the change points in the
# 'true' proxy-accumulation record for each zone
cpts_means_df <- data.frame(c(min(age_i), cpts_ages),
c(cpts_ages, max(age_i)),
cpts_means)
colnames(cpts_means_df) <- c("top_age", "bot_age", "mean")
# Further screen the cpts_rand considering the following conditions:
# - they should occur around the true change points;
# - the sediment-accumulation rate and the proxy-accumulation rate should
# change in the same direction:
cpts_rand_val <- NULL # create an empty vector
# Check every cpts_rand if it occurs within a search window of +/- tau years
# from the 'true' cpts:
if (length(cpts_rand) > 0) {
for (z in 1:length(cpts_rand)) {
# Check if the cpts_rand[z] occurs in the true record
cpts_rand_z <- (age_i[cpts_rand[z]] < (age_i[cpts_positions] + tau) &
age_i[cpts_rand[z]] > (age_i[cpts_positions] - tau))
if (any(cpts_rand_z) == TRUE) {
cpts_sel <- which(cpts_rand_z == TRUE)
# if (length(cpts_sel) > 1) {
# stop('')
#
# }
# get the mean proxy-accumulation rates for the zones preceding and
# following the cpt
v_pre_change <- cpts_means[cpts_sel + 1]
v_post_change <- cpts_means[cpts_sel]
# get the mean sed_ar_i for the two zones
sed_ar_pre_change <- mean(
sed_ar_i[cpts_breaks[cpts_sel + 1]:cpts_breaks[cpts_sel + 2]])
sed_ar_post_change <- mean(
sed_ar_i[cpts_breaks[cpts_sel]:cpts_breaks[cpts_sel + 1]])
# Check if v and sed_ar change in the same direction across the cpt
v_sign <- sign(v_pre_change - v_post_change)
sed_ar_sign <- sign(sed_ar_pre_change - sed_ar_post_change)
if (v_sign * sed_ar_sign > 0) {
cpts_rand_val[z] <- cpts_rand[z]
}
}
}
}
# Make diagnostic plot -----------------------------------------------------
par(mfrow = c(3,1), mar = c(0.5,5,0.5,1), oma = c(4,1,3,1), cex = 0.9)
x_lim <- rev(range(c(min(age_i), max(age_i))))
y_lim <- c(0, 1.1*max(ar_i))
plot(age_i, conc_i, type = "s",
ylab = paste0(proxy, " concentration\nresampled"),
xlim = x_lim, axes = FALSE)
axis(1, labels = FALSE, at = seq(0, max(age_i), 1000))
axis(2)
plot(age_i, sed_ar_i, type = "l",
ylab = "Sediment-accumulation\nrate (cm/yr)", xaxt = "n",
xlim = x_lim, ylim = c(0, 1.1*max(sed_ar_i)), lwd = 1.5,
axes = FALSE)
axis(1, labels = FALSE, at = seq(0, max(age_i), 1000))
axis(2)
plot(age_i, ar_i, type = "s", xlim = x_lim, ylim = y_lim,
ylab = paste(proxy, "AR"), axes = FALSE)
axis(1, labels = TRUE, at = seq(0, max(age_i), 1000))
axis(2)
graphics::segments(x0 = cpts_means_df$top_age, y0 = cpts_means_df$mean,
x1 = cpts_means_df$bot_age, y1 = cpts_means_df$mean,
col = "red", lwd = 2)
abline(v = cpts_ages, lwd = 2, col = "blue")
points(age_i[cpts_rand],
rep_len(max(ar_i), length.out = length(cpts_rand)),
col = "green")
points(age_i[cpts_rand_val],
rep_len(max(ar_i), length.out = length(cpts_rand_val)),
col = "red", pch = 19)
mtext(("Age (cal yrs BP)"), side = 1, line = 2.5, cex = 0.9)
# Return output -------------------------------------------------------------
output <- structure(list(depths_cpts = cpts_depths,
ages_cpts = cpts_ages,
cpts_means = cpts_means_df,
series_name = series_name, proxy = proxy,
n_screen = n_screen, tau = tau,
pen_val = pen_val,
meth_cpt = meth_cpt,
pen_method = pen_method,
t_stat = t_stat,
minseglen = minseglen))
class(output) <- "tapas_cpts"
return(output)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.