The goal of CausalArima is to estimate the causal effect of an intervention on a univariate time series using ARIMA models.
You can install the development version of CausalArima from GitHub with:
# install.packages("devtools")
devtools::install_github("FMenchetti/CausalArima")
This is a basic example which shows you how to use the package:
library(CausalArima)
#> Loading required package: forecast
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> Loading required package: ggplot2
#> Loading required package: gridExtra
# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))
# Adding a fictional intervention
int.date <- as.Date("2014-03-16")
# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), dates = dates, int.date = int.date,
xreg =x1, nboot = 1000)
How to obtain the plot of the forecast:
forecasted<-plot(ce, type="forecast")
forecasted
How to obtain the plot of the estimated effects and cumulative effects:
impact_p<-plot(ce, type="impact")
grid.arrange(impact_p$plot, impact_p$cumulative_plot)
How to obtain a quick summary of the estimated effect:
summary(ce)
#>
#> Point causal effect 12.257
#> Standard error 1.211
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
#>
#> Cumulative causal effect 310.709
#> Standard error 6.634
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
#>
#> Temporal average causal effect 10.357
#> Standard error 0.221
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:
summary_model<-impact(ce, format="html")
summary_model$arima
$arima_order
p d q arima_order 0 0 0 $param coef se t value xreg 1.199333 0.0016581 723.3279 $accuracy ME RMSE MAE MPE MAPE MASE ACF1 Training set 0.0043228 1.202503 0.9464393 -0.0051705 0.9072633 0.5734012 0.1407503 $log_stats loglik aic bic aicc metrics -112.234 228.4681 232.9651 228.6472Causal impact:
summary_model$impact_norm
$average
estimate sd p_value_left p_value_bidirectional p_value_right 10.35698 0.2211311 1 0 0 $sum estimate sd p_value_left p_value_bidirectional p_value_right 310.7095 6.633933 1 0 0 $point_effect estimate sd p_value_left p_value_bidirectional p_value_right 12.25715 1.211185 1 0 0Causal impact based on boostrap:
summary_model$impact_boot
$average
estimates inf sup sd observed 117.0485168 NA NA NA forecasted 106.6915345 106.2768920 107.1550453 0.2184921 absolute_effect 10.3569824 9.8934716 10.7716249 0.2184921 relative_effect 0.0970741 0.0927297 0.1009604 0.0020479 $effect_cum estimates inf sup sd observed 3511.4555050 NA NA NA forecasted 3200.7460337 3188.3067592 3214.6513578 6.5547623 absolute_effect 310.7094713 296.8041472 323.1487458 6.5547623 relative_effect 0.0970741 0.0927297 0.1009604 0.0020479 $p_values x alpha 0.05 p 0.00How to inspect the residuals, with the plots of autocorrelation (ACF) and partial autocorrelation (PACF) functions and QQ-plots:
residuals<-plot(ce, type="residuals")
grid.arrange(residuals$ACF, residuals$PACF, residuals$QQ_plot)
# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))
# Adding a fictional intervention
int.date <- as.Date("2014-03-16")
horizon<-as.Date(c("2014-03-25", "2014-04-05")) # add horizons
# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), ic = "aicc", dates = dates, int.date = int.date,
xreg =x1, nboot = 1000)
How to obtain the plot of the estimated effects and cumulative effects:
impact_p<-plot(ce, type="impact", horizon = horizon)
grid.arrange(impact_p$plot, impact_p$cumulative_plot)
How to obtain a quick summary of the estimated effect:
summary(ce, horizon = horizon)
#> 2014-03-25 2014-04-05
#> Point causal effect 9.962 9.673
#> Standard error 1.211 1.211
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
#>
#> Cumulative causal effect 104.356 216.327
#> Standard error 3.83 5.55
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
#>
#> Temporal average causal effect 10.436 10.301
#> Standard error 0.383 0.264
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:
summary_model<-impact(ce, format="html", horizon = horizon)
summary_model$arima
$arima_order
p d q arima_order 0 0 0 $param coef se t value xreg 1.199333 0.0016581 723.3279 $accuracy ME RMSE MAE MPE MAPE MASE ACF1 Training set 0.0043228 1.202503 0.9464393 -0.0051705 0.9072633 0.5734012 0.1407503 $log_stats loglik aic bic aicc metrics -112.234 228.4681 232.9651 228.6472Causal impact:
summary_model$impact_norm
$average
horizon estimate sd p_value_left p_value_bidirectional p_value_right 2014-03-25 10.43560 0.3830103 1 0 0 2014-04-05 10.30127 0.2643022 1 0 0 $sum horizon estimate sd p_value_left p_value_bidirectional p_value_right 2014-03-25 104.3560 3.830103 1 0 0 2014-04-05 216.3267 5.550346 1 0 0 $point_effect horizon estimate sd p_value_left p_value_bidirectional p_value_right 2014-03-25 9.961521 1.211185 1 0 0 2014-04-05 9.673077 1.211185 1 0 0Causal impact based on boostrap:
summary_model$impact_boot
$2014-03-25
2014-04-05
estimates
inf
sup
sd
observed
117.8232981
NA
NA
NA
forecasted
107.4906190
106.9643496
108.0108449
0.2687184
absolute_effect
10.3326792
9.8124532
10.8589485
0.2687184
relative_effect
0.0961263
0.0912866
0.1010223
0.0024999
estimates
inf
sup
sd
observed
2356.4659629
NA
NA
NA
forecasted
2149.8123792
2139.2869920
2160.2168984
5.3743690
absolute_effect
206.6535837
196.2490645
217.1789708
5.3743690
relative_effect
0.0961263
0.0912866
0.1010223
0.0024999
x
alpha
0.05
p
0.00
The plotting functions have some graphical parameters that make easier to personalize the plots:
forecasted_2<-plot(ce, type="forecast", fill_colour="orange",
colours=c("red", "blue"))
forecasted_2
All plotting functions return a ggplot object or a list of ggplot objects, which makes easy to modify any ggplot parameters of the theme. The ggthemes package can be useful to employ directly some pre-customized themes, for example we can use the Wall Street Journal theme simply typing:
library(ggthemes)
forecasted+theme_wsj()
You can read more on Estimating the causal effect of an intervention in a time series setting: the C-ARIMA approach (Fiammetta Menchetti, Fabrizio Cipollini, Fabrizia Mealli, 2021).
It is also available on youtube a video of a webinar on the topic: Fiammetta Menchetti: Estimating the causal effect of an intervention in a time series setting.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.