daily_response_seascorr: daily_response_seascorr

Description

The function calculates all possible partial correlation coefficients between a tree-ring chronology and daily environmental (usually climate) data. Calculations are based on a moving window whose width is bounded by two arguments: lower_limit and upper_limit. All calculated partial correlation coefficients are stored in a matrix. The position of each coefficient in the matrix indicates the window width (row names) and the location in the matrix of daily sequences of environmental data (column names).
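
The moving-window layout described above can be sketched in base R. This is an illustration only, not the dendroTools implementation: the data are synthetic, the window is referenced to its starting day, and a plain correlation stands in for the partial correlation to keep the sketch short.

```r
# Illustrative sketch only; synthetic data, plain (not partial) correlation
set.seed(1)
n_years <- 30
n_days <- 60

# Rows are years, columns are days of the year
env <- matrix(rnorm(n_years * n_days), nrow = n_years,
              dimnames = list(1981:2010, 1:n_days))
response <- rnorm(n_years)

lower_limit <- 5
upper_limit <- 8
widths <- lower_limit:upper_limit

# One row per window width, one column per starting day
result <- matrix(NA_real_, nrow = length(widths), ncol = n_days,
                 dimnames = list(widths, 1:n_days))

for (w in widths) {
  for (start in 1:(n_days - w + 1)) {
    # Aggregate the daily values inside the window for every year
    window_mean <- rowMeans(env[, start:(start + w - 1), drop = FALSE])
    result[as.character(w), start] <- cor(response, window_mean)
  }
}

dim(result)
```

Cells for which the window would run past the last day stay NA, so wider windows have fewer filled columns.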

Usage

daily_response_seascorr(response, env_data_primary, env_data_control,
  lower_limit = 30, upper_limit = 90, fixed_width = 0,
  previous_year = FALSE, pcor_method = "pearson",
  remove_insignificant = TRUE, alpha = 0.05,
  row_names_subset = FALSE, PCA_transformation = FALSE,
  log_preprocess = TRUE, components_selection = "automatic",
  eigenvalues_threshold = 1, N_components = 2,
  aggregate_function_env_data_primary = "mean",
  aggregate_function_env_data_control = "mean",
  temporal_stability_check = "sequential", k = 2,
  k_running_window = 30, cross_validation_type = "blocked",
  subset_years = NULL, plot_specific_window = NULL, ylimits = NULL,
  seed = NULL, tidy_env_data_primary = FALSE,
  tidy_env_data_control = FALSE, reference_window = "start")

Arguments

response

a data frame with tree-ring proxy variable and (optional) years as row names. Row.names should be matched with those from env_data_primary and env_data_control data frame. If not, set the row_names_subset argument to TRUE.

env_data_primary

primary data frame of daily sequences of environmental data as columns and years as row names. Each row represents a year and each column represents a day of the year. Row names should match those of the response data frame. If they do not, set the row_names_subset argument to TRUE. Alternatively, env_data_primary can be a tidy data frame with three columns: Year, DOY and a third column holding the values (mean temperature, sum of precipitation, etc.). If tidy data is passed to the function, set the tidy_env_data_primary argument to TRUE.
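
To make the two input layouts concrete, here is a minimal base R sketch that reshapes a tidy table into the wide year-by-day layout. The data frame and the column name Temperature are made up for illustration, and this is not necessarily how the function performs the conversion internally.

```r
# Hypothetical tidy input: one row per year and day of year (DOY)
tidy <- data.frame(
  Year = rep(c(2001, 2002), each = 3),
  DOY = rep(1:3, times = 2),
  Temperature = c(1.5, 2.0, 0.5, -1.0, 0.0, 3.5)
)

# Reshape to one row per year and one column per day
wide <- tapply(tidy$Temperature, list(tidy$Year, tidy$DOY), mean)
wide <- as.data.frame(wide)
wide
```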

env_data_control

a data frame of daily sequences of environmental data as columns and years as row names. This data is used as the control variable when calculating partial correlation coefficients. Each row represents a year and each column represents a day of the year. Row names should match those of the response data frame. If they do not, set the row_names_subset argument to TRUE. Alternatively, env_data_control can be a tidy data frame with three columns: Year, DOY and a third column holding the values (mean temperature, sum of precipitation, etc.). If tidy data is passed to the function, set the tidy_env_data_control argument to TRUE.

lower_limit

lower limit of window width

upper_limit

upper limit of window width

fixed_width

fixed window width used for calculations. If fixed_width is assigned a value, upper_limit and lower_limit are ignored.
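
The interplay of the three width arguments might be summarised as follows. The helper window_widths is hypothetical, and treating the default fixed_width = 0 as "not set" is an assumption based on the usage signature.

```r
# Hypothetical helper; assumes fixed_width = 0 means "no fixed width"
window_widths <- function(lower_limit = 30, upper_limit = 90, fixed_width = 0) {
  if (fixed_width != 0) {
    fixed_width                 # a single width; the limits are ignored
  } else {
    lower_limit:upper_limit     # every width between the two limits
  }
}

length(window_widths())             # 61 candidate widths with the defaults
window_widths(fixed_width = 45)     # only the 45-day window
```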

previous_year

if set to TRUE, the environmental data and the response variables are rearranged so that the previous year is also included in the calculations.
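
One plausible way to picture this rearrangement is to place each year's daily values next to those of the year before it. The sketch below is an assumption about the idea, not the package's actual code; the object names and the "prev_" prefix are made up.

```r
# Toy daily matrix: 4 years, 3 "days"
env <- matrix(1:12, nrow = 4,
              dimnames = list(2001:2004, paste0("d", 1:3)))

prev <- env[-nrow(env), , drop = FALSE]   # years 2001 to 2003
curr <- env[-1, , drop = FALSE]           # years 2002 to 2004
colnames(prev) <- paste0("prev_", colnames(prev))

# Each row now holds the previous year's days followed by the current year's
combined <- cbind(prev, curr)
rownames(combined) <- rownames(curr)
combined
```

The first year is lost because it has no preceding year to pair with.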

pcor_method

a character string indicating which partial correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman", can be abbreviated.
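
For orientation, a first-order Pearson partial correlation can be computed in base R by correlating the residuals left after regressing both variables on the control. The sketch uses synthetic data and a hypothetical helper partial_cor; it covers only the Pearson case and does not claim to reproduce the routine the package calls.

```r
set.seed(42)
n <- 50
control <- rnorm(n)
primary <- 0.6 * control + rnorm(n)
response <- 0.5 * primary + 0.3 * control + rnorm(n)

# Hypothetical helper: Pearson partial correlation of x and y given z
partial_cor <- function(x, y, z) {
  # Remove the linear effect of z from both x and y, then correlate residuals
  rx <- residuals(lm(x ~ z))
  ry <- residuals(lm(y ~ z))
  cor(rx, ry)
}

# The same quantity from the textbook formula on pairwise correlations
r_xy <- cor(response, primary)
r_xz <- cor(response, control)
r_yz <- cor(primary, control)
by_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

all.equal(partial_cor(response, primary, control), by_formula)
```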

remove_insignificant

if set to TRUE, removes all correlations below the significance threshold, based on the selected alpha.

alpha

significance level used to remove insignificant correlations.
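
A common way to turn alpha into a correlation threshold is through the t distribution. The sketch below assumes a two-sided test and reduces the degrees of freedom by one per control variable; whether the package computes its threshold in exactly this way is not stated here, and critical_r is a hypothetical helper.

```r
# Hypothetical helper: smallest |r| that is significant at level alpha
critical_r <- function(n, alpha = 0.05, n_controls = 0) {
  df <- n - 2 - n_controls
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

threshold <- critical_r(n = 30, alpha = 0.05)

# Mask coefficients whose absolute value falls below the threshold
r <- c(0.20, -0.50, 0.40)
r[abs(r) < threshold] <- NA
r
```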

row_names_subset

if set to TRUE, row names are used to subset the env_data_primary, env_data_control and response data frames. Only years present in all three data frames are kept.
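
The effect of this subsetting can be reproduced by intersecting the row names of the three data frames. This is a sketch with toy data, not the package's internal code.

```r
response <- data.frame(TRW = c(1.1, 0.9, 1.3),
                       row.names = as.character(2000:2002))
primary <- data.frame(d1 = 1:4, d2 = 5:8,
                      row.names = as.character(1999:2002))
control <- data.frame(d1 = 4:1, d2 = 8:5,
                      row.names = as.character(2001:2004))

# Years that occur in all three data frames
common_years <- Reduce(intersect, list(rownames(response),
                                       rownames(primary),
                                       rownames(control)))

response_sub <- response[common_years, , drop = FALSE]
primary_sub <- primary[common_years, , drop = FALSE]
control_sub <- control[common_years, , drop = FALSE]
common_years
```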

PCA_transformation

if set to TRUE, all variables in the response data frame will be transformed using PCA transformation.

log_preprocess

if set to TRUE, variables are log-transformed before being used in PCA.

components_selection

character string specifying how to select the principal components used as predictors. There are three options: "automatic", "manual" and "plot_selection". If set to "automatic", all components with eigenvalues above 1 are selected. This threshold can be changed with the eigenvalues_threshold argument. If set to "manual", the user sets the number of components with the N_components argument. If set to "plot_selection", a scree plot is shown and the user must manually enter the number of components to be used as predictors.
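
The "automatic" rule can be illustrated with base R's princomp. The proxy data are synthetic, and running the PCA on the correlation matrix (cor = TRUE) is an assumption made for this sketch rather than a documented detail of the function.

```r
set.seed(7)
proxies <- data.frame(a = rnorm(40), b = rnorm(40), c = rnorm(40))
proxies$d <- proxies$a + rnorm(40, sd = 0.1)   # strongly related to a

# PCA on the correlation matrix (assumed for this sketch)
pca <- princomp(proxies, cor = TRUE)
eigenvalues <- pca$sdev^2

# Keep components whose eigenvalue exceeds the threshold
eigenvalues_threshold <- 1
selected <- which(eigenvalues > eigenvalues_threshold)
scores <- pca$scores[, selected, drop = FALSE]

eigenvalues
ncol(scores)
```

With "manual" selection, the equivalent step would simply take the first N_components columns of the scores.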

eigenvalues_threshold

threshold for automatic selection of Principal Components

N_components

number of Principal Components used as predictors

aggregate_function_env_data_primary

character string specifying how the daily data from env_data_primary should be aggregated. The default is 'mean', the two other options are 'median' and 'sum'

aggregate_function_env_data_control

character string specifying how the daily data from env_data_control should be aggregated. The default is 'mean', the two other options are 'median' and 'sum'

temporal_stability_check

character string specifying how the temporal stability between the optimal selection and the response variable(s) is analysed. Current options are "sequential", "progressive" and "running_window". The sequential check splits the data into k splits and calculates the selected metric for each split. The progressive check splits the data into k splits, calculates the metric for the first split, then progressively adds one split at a time and recalculates the metric. For the running window check, set the length of the running window with the k_running_window argument.
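
The difference between the sequential and progressive checks comes down to which years enter each calculation. The sketch below only builds the year subsets; the equal-width splitting via cut() is an assumption and may differ from how the package places its breaks.

```r
years <- 1981:2010
k <- 3

# Assign each year to one of k consecutive blocks
split_id <- cut(seq_along(years), breaks = k, labels = FALSE)

# Sequential: each block is analysed on its own
sequential <- split(years, split_id)

# Progressive: block 1, then blocks 1-2, then blocks 1-3
progressive <- lapply(seq_len(k), function(i) years[split_id <= i])

lengths(sequential)
lengths(progressive)
```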

k

integer, number of breaks (splits) for temporal stability and cross validation analysis.

k_running_window

the length of the running window for the temporal stability check. Applicable only if the temporal_stability_check argument is set to "running_window".

cross_validation_type

character string specifying how cross validation between the optimal selection and the response variables is performed. If set to "blocked", years are not shuffled. If set to "randomized", years are shuffled.
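
The two cross validation types differ only in how years are assigned to folds. A minimal sketch of that assignment, under the assumption of k roughly equal folds (the variable names are illustrative):

```r
years <- 1981:2010
k <- 2
n <- length(years)

# Blocked: consecutive years share a fold, no shuffling
blocked_fold <- rep(seq_len(k), each = ceiling(n / k))[seq_len(n)]

# Randomized: same fold sizes, but years are shuffled across folds
set.seed(123)
randomized_fold <- sample(blocked_fold)

table(blocked_fold)
table(randomized_fold)
```

Setting a seed here mirrors the role of the seed argument: it makes the shuffled assignment reproducible.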

subset_years

a subset of years to be analyzed. Should be given in the form of subset_years = c(1980, 2005)

plot_specific_window

integer representing window width to be displayed for plot_specific

ylimits

limits of the y axis for plot_extreme and plot_specific. Should be given in the form of ylimits = c(0, 1)

seed

optional seed argument for reproducible results

tidy_env_data_primary

if set to TRUE, env_data_primary should be inserted as a data frame with three columns: "Year", "DOY", "Precipitation/Temperature/etc."

tidy_env_data_control

if set to TRUE, env_data_control should be inserted as a data frame with three columns: "Year", "DOY", "Precipitation/Temperature/etc."

reference_window

character string describing which day of the window each calculation is referred to. There are three options: 'start' (default), 'end' and 'middle'. If set to 'start', each calculation is related to the starting day of the window. If set to 'middle', each calculation is related to the middle day of the window. If set to 'end', each calculation is related to the ending day of the window. For example, consider a correlation calculated for the window from DOY 15 to DOY 35. With 'start', this calculation is related to DOY 15; with 'end', to DOY 35; and with 'middle', to DOY 25. The optimal selection, which describes the consecutive days that return the highest calculated metric and is shown in the $plot_extreme output, is the same for all three reference windows.
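
The DOY 15 to DOY 35 example can be written out as a small function. reference_day is a hypothetical helper for illustration; how the package handles the middle day of an even-width window is not specified here, so the sketch leaves the midpoint unrounded.

```r
# Hypothetical helper: day of year a window is referred to
reference_day <- function(start_doy, width, reference_window = "start") {
  end_doy <- start_doy + width - 1
  switch(reference_window,
         start = start_doy,
         end = end_doy,
         middle = (start_doy + end_doy) / 2,  # unrounded midpoint
         stop("unknown reference_window"))
}

# Window from DOY 15 to DOY 35 has a width of 21 days
reference_day(15, 21, "start")
reference_day(15, 21, "middle")
reference_day(15, 21, "end")
```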

Value

a list with 15 elements:

1 $calculations a matrix with calculated metrics
2 $method the character string of a method
3 $metric the character string indicating the metric used for calculations
4 $analysed_period the character string specifying the analysed period based on the information from row names. If there are no row names, this argument is given as NA
5 $optimized_return data frame with two columns, response variable and aggregated (averaged) daily data that return the optimal results. This data.frame could be directly used to calibrate a model for climate reconstruction
6 $optimized_return_all a data frame with aggregated daily data, that returned the optimal result for the entire env_data_primary (and not only subset of analysed years)
7 $transfer_function a ggplot object: scatter plot of optimized return and a transfer line of the selected method
8 $temporal_stability a data frame with calculations of selected metric for different temporal subsets
9 $cross_validation a data frame with cross validation results
10 $plot_heatmap ggplot2 object: a heatmap of calculated metrics
11 $plot_extreme ggplot2 object: line plot of a row with the highest value in a matrix of calculated metrics
12 $plot_specific ggplot2 object: line plot of a row with a selected window width in a matrix of calculated metrics
13 $PCA_output princomp object: the result output of the PCA analysis
14 $type the character string describing type of analysis: daily or monthly
15 $reference_window character string indicating which reference window was used for calculations

Examples

## Not run: 
# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_MVA)
data(data_TRW)
data(data_TRW_1)
data(example_proxies_individual)
data(example_proxies_1)
data(LJ_daily_temperatures)
data(LJ_daily_precipitation)

# 1 Basic example
example_basic <- daily_response_seascorr(response = data_MVA,
                          env_data_primary = LJ_daily_temperatures,
                          env_data_control = LJ_daily_precipitation,
                          row_names_subset = TRUE, lower_limit = 1,
                          remove_insignificant = TRUE,
                          aggregate_function_env_data_primary = 'median',
                          aggregate_function_env_data_control = 'median',
                          alpha = 0.05, pcor_method = "spearman",
                          tidy_env_data_primary = FALSE,
                          previous_year = TRUE,
                          tidy_env_data_control = TRUE,
                          reference_window = "middle")
summary(example_basic)
example_basic$plot_extreme
example_basic$plot_heatmap
example_basic$plot_specific
example_basic$optimized_return
example_basic$optimized_return_all


# 2 Example with fixed temporal time window
example_fixed_width <- daily_response_seascorr(response = data_MVA,
                          env_data_primary = LJ_daily_temperatures,
                          env_data_control = LJ_daily_precipitation,
                          row_names_subset = TRUE,
                          remove_insignificant = TRUE,
                          aggregate_function_env_data_primary = 'mean',
                          aggregate_function_env_data_control = 'mean',
                          alpha = 0.05,
                          fixed_width = 45,
                          tidy_env_data_primary = FALSE,
                          tidy_env_data_control = TRUE,
                          reference_window = "end")

example_fixed_width$plot_extreme
example_fixed_width$plot_heatmap
example_fixed_width$optimized_return
example_fixed_width$optimized_return_all


## End(Not run)

jernejjevsenak/dendroTools documentation built on June 5, 2019, 4:06 a.m.