thin_ts_iter: Determine the amount of thinning required to reduce...
In edwardlavender/Tools4ETS: Tools for Ecological Time Series

thin_ts_iter

R Documentation

Determine the amount of thinning required to reduce autocorrelation to a given level

Description

Thinning (i.e. retaining every nth observation in a dataset) is a commonly used method to reduce serial autocorrelation and simplify models. The effectiveness of thinning as a strategy to deal with autocorrelation depends on the volume of available data and the level of thinning required to remove autocorrelation. This function implements an iterative thinning approach to determine how the estimated autocorrelation parameter and the volume of data change as a dataset is thinned by progressively greater amounts. To do this, the user must supply a dataframe to be thinned along with parameters that are passed to thin_ts, which implements thinning. The user supplies a starting value for the thinning index (i.e., specifying that every nth observation should be retained on the first iteration of the algorithm), and an increment which specifies the increase in the thinning index with each iteration of the algorithm. (For models with high serial autocorrelation, the user is advised to begin with a reasonably large increase in the thinning index with each iteration (increment), which will result in faster convergence but less smooth profiles (i.e. estimated changes in the autocorrelation parameter with the thinning index.) On each iteration, user-defined functions are used to (a) evaluate a model using the thinned dataset and (b) compute the autocorrelation parameter. These functions provide the flexibility to implement the approach for a wide variety of models in which the fitting procedure may or may not estimate the value of the autocorrelation parameter (e.g. bam versus gamm). This process increases until the autocorrelation parameter falls below a user-defined threshold. Following algorithm convergence (i.e. the reduction in the autocorrelation parameter below a user-defined threshold), a list of outputs is returned which can be saved. A customisable plot can also be produced to demonstrate the change in (a) the estimated autocorrelation parameter and (b) the volume of data with (c) the level of thinning. This indicates the extent of thinning necessary to reduce autocorrelation consistently below the level expected from white noise (a 95 percent confidence interval is added to the plot), and the volume of data which remains for model fitting after this level of thinning. To facilitate plot customisation, a list of outputs from previous function iterations can be passed to the function, which bypasses the algorithm implementation and simply produces the plot, enabling quick adjustments to each plot component.

Usage

thin_ts_iter(
  dat,
  ind = NULL,
  flag1,
  first = 1,
  AR1_req = 0.01,
  nth = 0,
  increment = 1,
  eval_mod,
  resid_method = function(mod) {
     stats::resid(mod)
 },
  est_AR1 = function(mod) {
     stats::acf(stats::resid(mod), plot = FALSE)$acf[2]
 },
  thin_ts_iter_ls = NULL,
  plot = TRUE,
  p1_pretty_axis_args = list(side = 1:2, pretty = list(n = 5)),
  p2_pretty_axis_args = list(side = 4, pretty = list(n = 5)),
  p1_args = list(type = "b", pch = 21, bg = "black", col = "black", cex = 0.5),
  p2_args = list(type = "b", pch = 21, bg = "dimgrey", col = "dimgrey", cex = 0.5),
  add_error_envelope_args = list(),
  add_legend = TRUE,
  legend_args = list(),
  mtext_args = list(list(side = 1, text = "Thinning Index", line = 2.5), list(side = 2,
    text = expression(paste(hat(AR1)["lag =" ~ 1])), line = 2.5), list(side = 4, text =
    "log[n(obs)]", line = 2.5)),
  verbose = TRUE
)

Arguments

`dat`	A dataframe which is thinned and used to fit the model on the first iteration.
`ind`	A character which defines the column name in `dat` which uniquely distinguishes each independent time series (e.g. `flag3` returned by `flag_ts`). If `NULL`, `dat` is assumed to comprise a single time series. This is passed to `thin_ts`.
`flag1`	A character which defines the column name in `dat` which defines the start of each independent time series with `TRUE`. This is passed to `thin_ts`.
`first`	A numeric value which defines the starting position in each independent time series from which every `nth` observation will be selected. This is passed to `thin_ts`. Unlike `thin_ts`, this should be a single number. However, `thin_ts_iter` can be implemented using different values for `first` under the same degree of thinning to compare results among different thinned subsets of data.
`AR1_req`	A numeric input which defines the desired number to which the AR1 parameter should be reduced by thinning.
`nth`	A numeric input that defines the starting thinning index value. For a given `nth` value, every `nth` value is selected. This is passed to `thin_ts`. The default starting value is 0; i.e., the algorithm initially implements a model without thinning, which forms the baseline. The `nth` value is then progressively increased by a user-defined increment (see `increment`, below). On each iteration, the model is re-run and the estimated value of the autocorrelation parameter is saved until this falls below a user-defined threshold (`AR1_req`, see below).
`increment`	A numeric value that defines the value by which the thinning index, `nth`, should increase on each iteration of the algorithm. Larger values result in faster convergence but less smooth profiles. Therefore, it may be advisable to start with a reasonably large value to assess algorithm duration. Based on algorithm speed, the starting value of the thinning index (`nth`) can be adjusted and the increment (`increment`) decreased to create a smoother profile in the region of interest. The minimum increment is 1.
`eval_mod`	A function which acts as a wrapper for implementing a model. The only input to this function should be the data used to evaluate the model.
`resid_method`	A function which extracts the residuals from a model. `resid` is the default option, which is usually appropriate.
`est_AR1`	A function which computes an autocorrelation parameter for a model, such as an AR1 parameter. The only input to this function should be the model. For models without a correlation structure, the default option is usually appropriate: this uses the autocorrelation function of residuals to estimate the approximate AR1 parameter. However, some adjustments may be required to this function (e.g. to extract model residuals appropriately or to extract a different parameter). In other cases, a model may estimate the autocorrelation parameter and the user can define a function here to extract the model estimate from the model object.
`thin_ts_iter_ls`	A named list of outputs from a previous implementation of `thin_ts_iter` which can be used to produce a plot. This bypasses the need to re-run the algorithm to produce plots.
`plot`	A logical input which defines whether or not to produce a plot demonstrating the change in (a) the autocorrelation parameter and (b) the volume of data with (c) the thinning index.
`p1_pretty_axis_args`	A named list of arguments passed to `pretty_axis` to make the x and y axes. These pertain to the thinning index and the autocorrelation parameter. The default options usually result in pretty axes.
`p2_pretty_axis_args`	A named list of arguments passed to `pretty_axis` to make the second y axis. This pertains to the volume of data left after thinning. The default options usually result in a pretty axis.
`p1_args`	A named list of arguments to customise the first plot, which demonstrates the decline in the autocorrelation parameter with the thinning index.
`p2_args`	A named list of arguments to customise the second plot, which is added ontop of the first plot to demonstrate the simultaneous decline in the volume of data with the thinning index.
`add_error_envelope_args`	A list of arguments passed to `add_error_envelope` to customise the 95 percent confidence envelope for an AR1 parameter in white noise.
`add_legend`	A logical input which defines whether or not to add a legend.
`legend_args`	A named list of arguments passed to `legend` to customise the legend. Most parameters should be successfully extracted internally from `p1_args`, `p2_args` and `pretty_axis_args`. The placement of the legend is implemented automatically but can be overridden by the user here. Legend placement is implemented in relation to the x axis and the second y axis.
`mtext_args`	A named list of arguments passed to `mtext` to add axes labels. A nested list is used to control each axis independently.
`verbose`	A logical input which defines whether or not to print messages; namely, the estimated autocorrelation parameter on each algorithm iteration which can be used to monitor algorithm process/speed of convergence.

Value

The function returns a list and/or a plot. On the first implementation of the algorithm, a list is contained with five elements: (1) start_time, the start time of the algorithm; (2) end_time, the end time of the algorithm; (3) iteration_duration, the duration (in minutes) of the algorithm; (4) iteration_record, a dataframe providing a record of algorithm outputs including (a) nth (the value of the thinning index on each iteration), (b) AR1_est (the value of the autocorrelation parameter on each iteration), (c) nrw_log (the logarithm of the number of observations remaining in dat on each iteration), (d) lowerCI (the lower 95 percent confidence interval for the AR1 parameter in white noise) and (e) upperCI (the upper 95 percent confidence interval for the AR1 parameter in white noise) and (4) CI, a list with three elements (x, the values of nth, as above; lowerCI, as above; upperCI, as above) used to add confidence intervals to the plot via add_error_envelope.

Author(s)

Edward Lavender

Examples

#### Define model parameters and simulate observations
# Imagine the depth of an animal changes in a concave down pattern throughout the year.
# Define x values, the number of days since January 1st
set.seed(1)
x <- 1:365
# Define expected y values based on a concave-down effect of x:
quadratic <- function(a, b, x, h, k){
  step1 <- (b* x - h)^2 + k
  step2 <- a * step1
  return(step2)
}
mu <- quadratic(a = -0.001, b = 1, x = x, h = median(x), k = 100)
# Define observed y values with autocorrelation
y <- mu + arima.sim(list(order = c(1, 0, 0), ar = 0.6), n = length(mu), sd = 5)
# Define dataframe
d <- data.frame(x = x, y = y)
# Visualise simulated observations
graphics::plot(d$x, d$y, type = "l")

#### Flag independent sections of time series
d <- cbind(d, flag_ts(x = d$x, duration_threshold = 2880, flag = 1:3))
head(d)

#### Example (1): Implement thin_ts_iter with default options
thin_ts_iter_ls1 <-
  thin_ts_iter(dat = d,
               ind = "flag3",
               flag1 = "flag1",
               first = 1,
               AR1_req = 0.01,
               nth = 0,
               increment = 1,
               eval_mod = function(data){ mgcv::bam(y ~ s(x), data = data) },
               resid_method = function(mod) { stats::resid(mod) },
               est_AR1 = function(mod){ stats::acf(stats::resid(mod), plot = FALSE)$acf[2] },
               thin_ts_iter_ls = NULL,
               plot = TRUE,
               p1_pretty_axis_args = list(side = 1:2, pretty = list(n = 5)),
               p2_pretty_axis_args = list(side = 4, pretty = list(n = 5)),
               p1_args = list(type = "b",
                              pch = 21,
                              bg = "black",
                              col = "black",
                              cex = 0.5
               ),
               p2_args = list(type = "p",
                              pch = 24,
                              bg = "dimgrey",
                              col = "dimgrey",
                              cex = 0.5
               ),
               add_error_envelope_args = list(),
               add_legend = TRUE,
               legend_args = list(),
               mtext_args = list(list(side = 1, text = "Thinning Index", line = 2.5),
                                 list(side = 2,
                                      text = expression(paste(hat(AR1)["lag =" ~ 1])),
                                      line = 2.5),
                                 list(side = 4, text = "log[n(obs)]", line = 1)
               ),
               verbose = TRUE
  )

edwardlavender/Tools4ETS documentation built on Nov. 29, 2022, 7:41 a.m.