knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=16/2.54, fig.height=10/2.54 ) library(chocR) data(res_choc) data(res_confid)
This packages aims at carrying out a change of occurence analysis on multivariate time-series. If we have a time-series X (for example, daily observations of temperature) and a time-series Y (for example, daily observations of river discharge) measured at the same time step, it can be interesting to detect whether some associations of X and Y become less frequent or more frequent through time (for example, an increase of the occurence of low discharge associated with high temperature means that drought becomes more frequent). To carry out the analysis, choc splits the dataset in sub-periods (for example years) with serveral observations. We recommend that the sup-periods have similar duration and similar number of observations. Then, a kernel estimator is fitted on each sub-period, so that a density of probability can be estimated for any point
Here is an example on simulated data set:
library(MASS) library(ks) tvar <- rep(1:40,times=100) #times steps meansX <-tvar/40 #trend on 1st variable meansY <- -0.5*tvar/40 #trend on 2nd variable sigma <- matrix(c(1,.1,.1,1),2,2) #covariance matrix values <- t(apply(cbind(meansX,meansY),1,function(mu) mvrnorm(1,mu,sigma))) #generate the values H <- Hpi #choose the default bandwith res_choc <- choc(values,H,tvar)
head(res_choc$grid)
The first lines aims at generating an artificial dataset, then a method is chosen to select the bandwith of the kernel estimator. The choc analysis is then carried out. res$grid provides for a set of
We also provide a method to assess whether these trends are significant or not. We provide to different methods. The method "kern" is based on simulations of surrogate data. If we assumed that there is not trend in the data, we can fit a kernel on the whole data-set and use this kernel to simulate data having the same structure than the original data set (same number of observations per period), and we can compute corresponding Kendall coefficient on the surrogate data sets. By doing this, we generate distribution of Kendall coefficient under H0 and therefore, we can assess the significance of our the coefficient on the observed data set. We should be aware that, by doing this, we totally ignore a potential autocorrelation (such as seasonality) within period. To overcome this limitation, we provide a second method "perm". The method is based on permutation: we permutate periods leading to vectors such as {D20,D12,Dp,...D1,D40} and estimate corresponding Kendall coefficient. Once again, this provides a distribution of Kendall coefficient under H0 (no trend so that periods can be permuted).
#res_confid <- estimate_confidence(res_choc,"perm",0.95,1000) #be aware that this make take several hours
g<-plot_choc(res_confid) print(g)
In this example, we used a permutation method with 1000 replicates to estimate the significance of Kendall coefficients. A function is then provided to plot results. Here we illustrate a choc analysis in two dimensions, but it can be extended to more dimensions.
We saw on the plot that the increasing trend in X and the declining trend in Y leads to a rarefaction of events with simultanous high Y and low X and, conversely, a combination of high X and low Y tend become more frequent (regarding our example on river discharge and temperature, this would correspond to an increase of drought period).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.