monthly_response: monthly_response

View source: R/monthly_response.R

monthly_response    R Documentation

monthly_response

Description

Calculates all possible values of a selected statistical metric between one or more response variables and monthly sequences of environmental data. Calculations are based on a moving window that slides through the monthly environmental data. All calculated metrics are stored in a matrix. The position of each metric in the matrix indicates the window width (row names) and the location in the matrix of monthly sequences of environmental data (column names).
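The moving-window idea described above can be illustrated with a minimal base R sketch. It does not call monthly_response() or reproduce its internals. The simulated data and the names env, response and calc are assumptions made for illustration only, and the sketch fixes the aggregation to the mean and the metric to the Pearson correlation.

```r
set.seed(1)  # simulated data only; not the package's example data

n_years <- 40
# Hypothetical inputs: 40 years x 12 months of climate and one response series
env <- matrix(rnorm(n_years * 12), nrow = n_years,
              dimnames = list(1981:2020, month.abb))
response <- rowMeans(env[, 6:8]) + rnorm(n_years, sd = 0.5)

widths <- 1:12
# Rows = window widths, columns = starting month of the window
calc <- matrix(NA_real_, nrow = length(widths), ncol = 12,
               dimnames = list(widths, month.abb))

for (w in widths) {
  for (start in seq_len(12 - w + 1)) {
    # Aggregate w consecutive months, then correlate with the response
    window_mean <- rowMeans(env[, start:(start + w - 1), drop = FALSE])
    calc[as.character(w), start] <- cor(response, window_mean)
  }
}

round(calc[1:3, 5:9], 2)
```

Cells where a window would run past December stay NA, so the lower right part of such a matrix is empty.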

Usage

monthly_response(
  response,
  env_data,
  method = "cor",
  metric = "r.squared",
  cor_method = "pearson",
  previous_year = FALSE,
  number_previous_years = NULL,
  neurons = 1,
  lower_limit = 1,
  upper_limit = 12,
  fixed_width = 0,
  brnn_smooth = TRUE,
  remove_insignificant = TRUE,
  alpha = 0.05,
  row_names_subset = FALSE,
  reference_window = "start",
  aggregate_function = "mean",
  quantile_prob = 0.5,
  temporal_stability_check = "sequential",
  k = 2,
  k_running_window = 30,
  cross_validation_type = "blocked",
  subset_years = NULL,
  ylimits = NULL,
  seed = NULL,
  tidy_env_data = FALSE,
  boot = FALSE,
  boot_n = 1000,
  boot_ci_type = "norm",
  boot_conf_int = 0.95,
  month_interval = NULL,
  dc_method = NULL,
  cor_na_use = "everything"
)

Arguments

response

a data frame with tree-ring proxy variables as columns and (optionally) years as row names. Row names should match those of the env_data data frame. If they do not, set row_names_subset = TRUE.

env_data

a data frame of monthly sequences of environmental data as columns and years as row names. Each row represents a year and each column represents a month. Row names should match those of the response data frame. If they do not, set row_names_subset = TRUE. Alternatively, env_data can be tidy data with three columns: Year, Month and a third column holding the values (e.g. mean temperature or precipitation sum). If tidy data is passed to the function, set the argument tidy_env_data to TRUE.

method

a character string specifying which method to use. Current possibilities are "cor" (default), "lm" and "brnn".

metric

a character string specifying which metric to use. Current possibilities are "r.squared" and "adj.r.squared". If method = "cor", metric is not relevant.

cor_method

a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".

previous_year

logical. If TRUE, previous-year climate data are included in the analysis. If FALSE, no previous-year climate data are included and number_previous_years is ignored.

number_previous_years

integer between 1 and 5 specifying how many previous years should be included in the environmental matrix when previous_year = TRUE. For example, number_previous_years = 2 uses monthly climate data from years t - 2, t - 1 and t for response year t. If NULL and previous_year = TRUE, one previous year is included.

neurons

positive integer indicating the number of neurons used for the brnn method

lower_limit

lower limit of window width (i.e. number of consecutive months to be used for calculations)

upper_limit

upper limit of window width (i.e. number of consecutive months to be used for calculations)

fixed_width

fixed width used for calculations (i.e. number of consecutive months to be used for calculations)

brnn_smooth

if set to TRUE, a smoothing algorithm is applied that removes unrealistic calculations which are a result of neural net failure.

remove_insignificant

if set to TRUE, removes all correlations below the significance threshold, based on the selected alpha. For the "lm" and "brnn" methods, the squared threshold is used, which corresponds to the R squared statistic.

alpha

significance level used to remove insignificant calculations.

row_names_subset

if set to TRUE, row.names are used to subset the env_data and response data frames. Only years present in both data frames are kept.

reference_window

character string describing how each calculation is referenced. There are two options: 'start' (default) and 'end'. If reference_window is set to 'start', each calculation is related to the starting month of the window. If it is set to 'end', each calculation is related to the ending month of the window.

aggregate_function

character string specifying how the monthly data should be aggregated. The default is 'mean', the other options are 'median', 'sum' and 'quantile'.

quantile_prob

numeric value between 0 and 1 specifying the quantile probability used when aggregate_function = 'quantile'. For example, quantile_prob = 0.95 calculates the 95th percentile. The default is 0.5.

temporal_stability_check

character string specifying how the temporal stability between the optimal selection and the response variable(s) is analysed. Current possibilities are "sequential", "progressive" and "running_window". The sequential check splits the data into k splits and calculates the selected metric for each split. The progressive check splits the data into k splits, calculates the metric for the first split, and then adds one split at a time and recalculates the selected metric. For the running window check, set the length of the running window with the k_running_window argument.

k

integer, number of breaks (splits) for temporal stability and cross validation analysis.

k_running_window

the length of the running window for the temporal stability check. Applicable only if temporal_stability_check = "running_window".

cross_validation_type

character string specifying how cross validation between the optimal selection and the response variable(s) is performed. If set to "blocked", years are not shuffled. If set to "randomized", years are shuffled.

subset_years

a subset of years to be analyzed. Should be given in the form of subset_years = c(1980, 2005)

ylimits

limits of the y axis for plot_extreme. Should be given in the form of ylimits = c(0, 1)

seed

optional seed argument for reproducible results

tidy_env_data

if set to TRUE, env_data should be inserted as a data frame with three columns: "Year", "Month", "Precipitation/Temperature/etc."

boot

logical. If TRUE, a bootstrap procedure is used to estimate correlation coefficients, R squared or adjusted R squared metrics

boot_n

The number of bootstrap replicates

boot_ci_type

A character string representing the type of bootstrap intervals required. The value should be any subset of the values c("norm","basic", "stud", "perc", "bca").

boot_conf_int

A scalar or vector containing the confidence level(s) of the required interval(s)

month_interval

a vector of two values defining the interval of months used for calculations. Positive values indicate months in the current year. Negative values indicate months in the previous-year block and are only used when previous_year = TRUE. If previous_year = FALSE and negative values are supplied, month_interval is ignored and the analysis is performed for the current year only using month_interval = c(1, 12). If number_previous_years > 1, the previous-year block starts with the earliest included previous year. For example, previous_year = TRUE, number_previous_years = 2 and month_interval = c(-1, 12) analyses the full sequence from January of year t - 2 to December of year t.

dc_method

a character string to determine the method to detrend climate data. Possible values are "none" (default) and "SLD" which refers to Simple Linear Detrending

cor_na_use

an optional character string giving a method for computing covariances in the presence of missing values for correlation coefficients. This must be (an abbreviation of) one of the strings "everything" (default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". See also the documentation for the base cor() function.
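The effect of remove_insignificant and alpha, described above, can be sketched in base R. The critical value used here is the standard two-sided threshold for a Pearson correlation with n paired observations. Whether monthly_response() computes its threshold in exactly this way is an assumption, and critical_r() and the toy matrix are hypothetical names introduced for illustration.

```r
# Critical |r| for a two-sided test of zero correlation (assumed formula,
# not taken from the package source)
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

r_crit  <- critical_r(n = 50, alpha = 0.05)  # about 0.279
r2_crit <- r_crit^2                          # squared threshold for R squared

# Toy matrix of correlations: rows = window widths, columns = months
calc <- matrix(c(0.10, -0.35, 0.42, 0.05), nrow = 2,
               dimnames = list(c("1", "2"), c("Jan", "Feb")))

# Blank out every value whose absolute size is below the threshold
calc[abs(calc) < r_crit] <- NA
calc
```

For metrics such as R squared, the same masking step would compare the values against r2_crit instead of r_crit.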

Value

a list with the following elements:

  1. $calculations - a matrix with calculated metrics

  2. $method - the character string of a method

  3. $metric - the character string indicating the metric used for calculations

  4. $analysed_period - the character string specifying the analysed period based on the information from row names. If there are no row names, this argument is given as NA

  5. $optimized_return - a data frame with two columns: the response variable and the aggregated (averaged) monthly data that returned the optimal result. This data frame can be used directly to calibrate a model for climate reconstruction

  6. $optimized_return_all - a data frame with aggregated monthly data, that returned the optimal result for the entire env_data (and not only subset of analysed years)

  7. $transfer_function - a ggplot object: scatter plot of optimized return and a transfer line of the selected method

  8. $temporal_stability - a data frame with calculations of selected metric for different temporal subsets

  9. $cross_validation - a data frame with cross validation results

  10. $plot_heatmap - ggplot2 object: a heatmap of calculated metrics

  11. $plot_extreme - ggplot2 object: line or bar plot of a row with the highest value in a matrix of calculated metrics

  12. $type - the character string describing type of analysis: daily or monthly

  13. $reference_window - character string, which reference window was used for calculations

  14. $boot_lower - matrix with lower limit of confidence intervals of bootstrap calculations

  15. $boot_upper - matrix with upper limit of confidence intervals of bootstrap calculations

  16. $aggregated_climate - matrix with all aggregated climate series

  17. $previous_year - logical indicating whether previous-year climate data were used

  18. $number_previous_years - integer indicating how many previous years were used
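As a rough illustration of how a matrix such as $calculations can be read, the base R sketch below finds the strongest value and recovers the window width and reference month from the dimnames. The toy matrix and its labels are hypothetical stand-ins, not actual function output, and selecting by absolute value is an assumption made so that strong negative correlations are also found.

```r
# Hypothetical stand-in for a matrix of calculated metrics:
# rows are window widths, columns are reference months
calc <- matrix(
  c(0.21, 0.35, 0.18,
    0.30, 0.52, NA,
    0.41, NA,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("Jun", "Jul", "Aug"))
)

# Position of the strongest value; NA cells are ignored by which()
best <- which(abs(calc) == max(abs(calc), na.rm = TRUE), arr.ind = TRUE)

best_width <- as.integer(rownames(calc)[best[1, "row"]])
best_month <- colnames(calc)[best[1, "col"]]

best_width
best_month
```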

Examples



# The examples below are enclosed within donttest{} to minimize the execution
# time during R package checks.

# Load the dendroTools R package
library(dendroTools)

# Load data used for examples
data(data_MVA)
data(data_TRW)
data(data_TRW_1)
data(example_proxies_individual)
data(example_proxies_1)
data(LJ_monthly_temperatures)
data(LJ_monthly_precipitation)

# 1 Example with tidy precipitation data
example_tidy_data <- monthly_response(response = data_MVA,
    lower_limit = 1, upper_limit = 24, dc_method = "SLD",
    env_data = LJ_monthly_precipitation, fixed_width = 0,
    method = "cor", row_names_subset = TRUE,
    remove_insignificant = FALSE, previous_year = FALSE,
    reference_window = "end",
    alpha = 0.05, aggregate_function = 'sum', boot = FALSE,
    tidy_env_data = TRUE, boot_n = 100, month_interval = c(-5, 10))

# summary(example_tidy_data)
# plot(example_tidy_data, type = 1)
# plot(example_tidy_data, type = 2)

# 2 Example with split data for early and late
example_MVA_early <- monthly_response(response = data_MVA,
    env_data = LJ_monthly_temperatures,
    method = "cor", row_names_subset = TRUE, previous_year = TRUE,
    remove_insignificant = TRUE, alpha = 0.05,
    subset_years = c(1940, 1980), aggregate_function = 'mean')

example_MVA_late <- monthly_response(response = data_MVA,
    env_data = LJ_monthly_temperatures,
    method = "cor", row_names_subset = TRUE, alpha = 0.05,
    previous_year = TRUE, remove_insignificant = TRUE,
    subset_years = c(1981, 2010), aggregate_function = 'mean')

# summary(example_MVA_late)
# plot(example_MVA_early, type = 1)
# plot(example_MVA_late, type = 1)
# plot(example_MVA_early, type = 2)
# plot(example_MVA_late, type = 2)

# 3 Example negative correlations
example_neg_cor <- monthly_response(response = data_TRW_1, alpha = 0.05,
   env_data = LJ_monthly_temperatures,
   method = "cor", row_names_subset = TRUE,
   remove_insignificant = TRUE, boot = FALSE)

# summary(example_neg_cor)
# plot(example_neg_cor, type = 1)
# plot(example_neg_cor, type = 2)
# example_neg_cor$temporal_stability

# 4 Example of multiproxy analysis
# summary(example_proxies_1)
# cor(example_proxies_1)

example_multiproxy <- monthly_response(response = example_proxies_1,
   env_data = LJ_monthly_temperatures,
   method = "lm", metric = "adj.r.squared",
   row_names_subset = TRUE, previous_year = FALSE,
   remove_insignificant = TRUE, alpha = 0.05)

# summary(example_multiproxy)
# plot(example_multiproxy, type = 1)

# 5 Example to test the temporal stability
example_MVA_ts <- monthly_response(response = data_MVA,
   env_data = LJ_monthly_temperatures,
   method = "lm", metric = "adj.r.squared", row_names_subset = TRUE,
   remove_insignificant = TRUE, alpha = 0.05,
   temporal_stability_check = "running_window", k_running_window = 10)

# summary(example_MVA_ts)
# example_MVA_ts$temporal_stability

# 6 Example using quantiles for the aggregation
example_q95 <- monthly_response(
  response = data_MVA,
  env_data = LJ_monthly_temperatures,
  method = "cor",
  fixed_width = 3,
  row_names_subset = TRUE,
  previous_year = TRUE,
  aggregate_function = "quantile",
  quantile_prob = 0.95
)

# summary(example_q95)
# example_q95$temporal_stability

# 7 Example using two previous years plus the current year
example_two_previous_years <- monthly_response(
  response = data_MVA,
  env_data = LJ_monthly_temperatures,
  method = "cor",
  fixed_width = 3,
  row_names_subset = TRUE,
  previous_year = TRUE,
  number_previous_years = 2,
  month_interval = c(-1, 10),
  aggregate_function = "mean"
)



dendroTools documentation built on May 21, 2026, 1:06 a.m.