```r
knitr::opts_chunk$set(
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)
```
This vignette demonstrates advanced techniques for examining causal relationships between time series with the `patterncausality` package, focusing on cross-validation of pattern causality results. Through cross-validation, we aim to understand how the causality estimates behave as the sample size changes and how different sampling schemes (random, systematic, and bootstrap) affect their stability.
To demonstrate the application of cross-validation, we will begin by importing a climate dataset from the `patterncausality` package.
```r
library(patterncausality)
data(climate_indices)
```
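Before running the analysis, it can help to glance at the dataset. The following is a minimal sketch that simply prints its structure and first rows; beyond the PNA and NAO columns used below, the exact columns may vary between package versions.

```r
# Quick inspection of the imported dataset: structure and first rows.
str(climate_indices)
head(climate_indices)
```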
Now, let's apply cross-validation to evaluate the robustness of pattern causality. We will use the Pacific North American (PNA) and North Atlantic Oscillation (NAO) climate indices as our example time series.
```r
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 10),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE
)
print(result$results)
```
To better visualize the results, we will use the `plot` function to generate a line chart.
```r
plot(result)
```
As the plot shows, the causality estimates tend to stabilize as the sample size increases. This indicates that the method consistently captures the underlying patterns and causal connections within the time series once enough observations are included.
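One way to check this stabilization numerically is to compare the spread of the estimates at small and large sample sizes. The sketch below assumes `result$results` can be coerced to a data frame with one row per entry of `numberset`; no particular column names are assumed, and the indexing may need adjusting for other package versions.

```r
# Rough stability check: spread of the causality estimates in the first
# and last five rows of the cross-validation table (one row per sample size).
res <- as.data.frame(result$results)
num_cols <- vapply(res, is.numeric, logical(1))
first_spread <- apply(res[1:5, num_cols, drop = FALSE], 2, sd)
last_spread  <- apply(res[(nrow(res) - 4):nrow(res), num_cols, drop = FALSE], 2, sd)
# A smaller spread in the last rows suggests the estimates settle down
# as the sample size grows.
rbind(first_five = first_spread, last_five = last_spread)
```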
So far, you have seen how to use cross-validation to assess the reliability of time series causality and how to use visualization tools to better understand the results.
Now, let's examine the cross-validation process when the `random` parameter is set to `FALSE`. This approach uses a systematic sampling method rather than random sampling.
```r
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_non_random <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 100),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = FALSE
)
print(result_non_random$results)
```
We can also visualize the results of the non-random cross-validation:
```r
plot(result_non_random)
```
By comparing the results of the random and non-random cross-validation, you can gain a deeper understanding of how different sampling methods affect the stability and reliability of the causality analysis.
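As a sketch of such a comparison, assuming both `$results` tables are numeric with one row per entry of the corresponding `numberset` (adjust the indexing if the returned structure differs), we can line up the estimates from the two runs at the sample sizes they share:

```r
# Align the random-sampling run (steps of 10) with the systematic run
# (steps of 100) at the sample sizes evaluated by both.
shared_sizes <- seq(100, 500, by = 100)
rand_res <- as.data.frame(result$results)
sys_res  <- as.data.frame(result_non_random$results)
rand_at_shared <- rand_res[match(shared_sizes, seq(100, 500, by = 10)), , drop = FALSE]
list(random = rand_at_shared, systematic = sys_res)
```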
To obtain more robust results and understand the uncertainty in our causality measures, we can use bootstrap sampling in our cross-validation analysis. This approach repeatedly samples the data with replacement and provides statistical summaries of the causality measures.
```r
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_boot <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = seq(100, 500, by = 100),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = TRUE,
  bootstrap = 10 # Perform 10 bootstrap iterations
)
```
The bootstrap analysis provides several statistical measures for each sample size (a small numeric illustration follows the list):

- Mean: average causality measure across bootstrap samples
- 5% and 95% quantiles: confidence intervals for the causality measure
- Median: central tendency measure robust to outliers
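To make these summaries concrete, here is a toy illustration on a made-up vector of bootstrap causality estimates (not output from `pcCrossValidation`):

```r
# Toy vector standing in for one sample size's bootstrap estimates.
boot_estimates <- c(0.42, 0.38, 0.45, 0.40, 0.44, 0.39, 0.43, 0.41, 0.46, 0.37)
c(
  mean   = mean(boot_estimates),
  q05    = quantile(boot_estimates, 0.05, names = FALSE),
  q95    = quantile(boot_estimates, 0.95, names = FALSE),
  median = median(boot_estimates)
)
```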
Let's examine the results:
```r
print(result_boot$results)
```
We can visualize the bootstrap results using the plot function, which now shows confidence intervals:
```r
plot(result_boot, separate = TRUE)
```
The shaded area in the plot represents the range between the 5th and 95th percentiles of the bootstrap samples, providing a measure of uncertainty in our causality estimates. The solid line shows the median value, which is more robust to outliers than the mean.
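A quick toy example (made-up numbers, unrelated to the climate data) shows why the median is reported alongside the mean: a single outlying bootstrap estimate pulls the mean noticeably but barely moves the median.

```r
# One outlier among otherwise similar bootstrap estimates.
estimates <- c(0.40, 0.41, 0.42, 0.43, 0.95)
c(mean = mean(estimates), median = median(estimates))
```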