CausalImpact: Inferring causal impact using structural time-series models

View source: R/impact_analysis.R

Description

CausalImpact() performs causal inference through counterfactual predictions using a Bayesian structural time-series model.

See the package documentation (http://google.github.io/CausalImpact/) to understand the underlying assumptions. In particular, the model assumes that the time series of the treated unit can be explained in terms of a set of covariates which were themselves not affected by the intervention whose causal effect we are interested in.

The easiest way of running a causal analysis is to call CausalImpact() with data, pre.period, post.period, model.args (optional), and alpha (optional). In this case, a time-series model is automatically constructed and estimated. The argument model.args offers some control over the model. See Example 1 below.

An alternative is to supply a custom model. In this case, the function is called with bsts.model, post.period.response, and alpha (optional). See Example 3 below.

Usage

  CausalImpact(data = NULL, pre.period = NULL,
    post.period = NULL, model.args = NULL,
    bsts.model = NULL, post.period.response = NULL,
    alpha = 0.05)

Arguments

data

Time series of response variable and any covariates. This can be a zoo object, a vector, a matrix, or a data.frame. In any of these cases, the response variable must be in the first column, and any covariates in subsequent columns. A zoo object is recommended, as its time indices will be used to format the x-axis in plot().
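The column-order requirement can be sketched as follows. This is a minimal illustration in base R; the column names sales, clicks, and visits are hypothetical and are not part of the package.

```r
# Hypothetical raw data in which the response ('sales') is NOT the first column.
set.seed(1)
raw <- data.frame(clicks = rnorm(20, mean = 50),
                  sales  = rnorm(20, mean = 100),
                  visits = rnorm(20, mean = 200))

# Move the response to the first column; covariates follow in their original order.
response <- "sales"
data <- raw[, c(response, setdiff(names(raw), response))]

names(data)  # "sales" "clicks" "visits"
```

A data.frame arranged this way can then be passed as the data argument.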

pre.period

A vector specifying the first and the last time point of the pre-intervention period in the response vector y. This period can be thought of as a training period, used to determine the relationship between the response variable and the covariates. If data is a zoo object with a time attribute, pre.period must be indicated using the same time scale (i.e. using the same class as time(data), see Example 2 below). If data doesn't have a time attribute, pre.period is indicated with indices.

post.period

A vector specifying the first and the last time point of the post-intervention period we wish to study. This is the period after the intervention has begun whose effect we are interested in. The relationship between response variable and covariates, as determined during the pre-period, will be used to predict how the response variable should have evolved during the post-period had no intervention taken place. If data is a zoo object with a time attribute, post.period must be indicated using the same time scale. If data doesn't have a time attribute, post.period is indicated with indices.

model.args

Further arguments to adjust the default construction of the state-space model used for inference. One particularly important parameter is prior.level.sd, which specifies our a priori knowledge about the volatility of the data. For even more control over the model, you can construct your own model using the bsts package and feed the fitted model into CausalImpact(), as shown in Example 3.
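As a rough sketch of how model.args might be supplied, the following reuses the artificial data from Example 1 and passes prior.level.sd in a list. The value 0.1 is purely illustrative, not a recommendation, and the call is guarded so that the snippet also runs where the package is not installed.

```r
# Artificial data, constructed as in Example 1.
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 52)
y <- 1.2 * x1 + rnorm(52)
y[41:52] <- y[41:52] + 10
data <- cbind(y, x1)

# Optional model arguments are passed as a named list.
# The value 0.1 is an illustrative assumption.
model.args <- list(prior.level.sd = 0.1)

# Only run the analysis if the CausalImpact package is available.
if (requireNamespace("CausalImpact", quietly = TRUE)) {
  impact <- CausalImpact::CausalImpact(data,
                                       pre.period = c(1, 40),
                                       post.period = c(41, 52),
                                       model.args = model.args)
  summary(impact)
}
```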

bsts.model

Instead of passing in data and having CausalImpact() construct a model, it is possible to create a custom model using the bsts package. In this case, omit data, pre.period, and post.period. Instead only pass in bsts.model, post.period.response, and alpha (optional). The model must have been fitted on data where the response variable was set to NA during the post-treatment period. The actual observed data during this period must then be passed to the function in post.period.response.

post.period.response

Actual observed data during the post-intervention period. This is required if and only if a fitted bsts.model is provided instead of data.

alpha

Desired tail-area probability for posterior intervals. Defaults to 0.05, which will produce central 95% intervals.
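To illustrate what a tail-area probability of alpha means for a central interval, the sketch below computes a central (1 - alpha) interval from simulated draws in base R. The draws are hypothetical stand-ins for posterior samples; this is not the package's internal code.

```r
set.seed(42)
alpha <- 0.05

# Hypothetical posterior draws of some quantity of interest.
draws <- rnorm(10000, mean = 10, sd = 1)

# A central interval leaves alpha / 2 of the probability mass in each tail.
interval <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2))

# Fraction of draws inside the interval; close to 1 - alpha by construction.
coverage <- mean(draws >= interval[1] & draws <= interval[2])
```

With alpha = 0.1 the same computation would yield a central 90% interval.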

Value

CausalImpact() returns a CausalImpact object containing the original observed response, its counterfactual predictions, as well as pointwise and cumulative impact estimates along with posterior credible intervals. Results can be summarised using summary() and visualized using plot(). The object is a list with the following fields:

The field series is a zoo time-series object with the following columns:

response Observed response as supplied to CausalImpact().
cum.response Cumulative response during the modeling period.
point.pred Posterior mean of counterfactual predictions.
point.pred.lower Lower limit of a (1 - alpha) posterior interval.
point.pred.upper Upper limit of a (1 - alpha) posterior interval.
cum.pred Posterior cumulative counterfactual predictions.
cum.pred.lower Lower limit of a (1 - alpha) posterior interval.
cum.pred.upper Upper limit of a (1 - alpha) posterior interval.
point.effect Point-wise posterior causal effect.
point.effect.lower Lower limit of the posterior interval (as above).
point.effect.upper Upper limit of the posterior interval (as above).
cum.effect Posterior cumulative effect.
cum.effect.lower Lower limit of the posterior interval (as above).
cum.effect.upper Upper limit of the posterior interval (as above).
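The relationship between some of these columns can be sketched with toy numbers in base R. The values below are hypothetical, and the sketch assumes that the point-wise effect is the observed response minus the counterfactual prediction and that the cumulative effect is accumulated over the post-period only. It illustrates the idea rather than reproducing the package's implementation.

```r
# Toy series of five time points; the post-period covers points 4 and 5.
response   <- c(10, 11, 12, 20, 21)  # observed response
point.pred <- c(10, 11, 12, 13, 14)  # hypothetical counterfactual predictions
post <- 4:5

# Point-wise effect: observed minus predicted.
point.effect <- response - point.pred

# Cumulative effect: running sum of point-wise effects within the post-period
# (assumed to be zero before the intervention).
cum.effect <- rep(0, length(response))
cum.effect[post] <- cumsum(point.effect[post])

point.effect  # 0 0 0 7 7
cum.effect    # 0 0 0 7 14
```

In a fitted object, the corresponding columns would be read from the series field, for instance impact$series$point.effect.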

Note

Optional arguments can be passed as a list in model.args, providing additional control over model construction. One such argument is prior.level.sd, described under Arguments above.

Author(s)

Kay H. Brodersen kbrodersen@google.com

Examples

# Example 1
#
# Example analysis on a simple artificial dataset
# consisting of a response variable y and a
# single covariate x1.
library(CausalImpact)
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 52)
y <- 1.2 * x1 + rnorm(52)
y[41:52] <- y[41:52] + 10
data <- cbind(y, x1)
pre.period <- c(1, 40)
post.period <- c(41, 52)
impact <- CausalImpact(data, pre.period, post.period)

# Print and plot results
summary(impact)
summary(impact, "report")
plot(impact)
plot(impact, "original")
plot(impact$model$bsts.model, "coefficients")

# For further output, type:
names(impact)

## Not run: 
# Example 2
#
# Weekly time series: same data as in example 1, annotated
# with dates.
times <- seq.Date(as.Date("2016-01-03"), by = 7, length.out = 52)
data <- zoo(cbind(y, x1), times)

impact <- CausalImpact(data, times[pre.period], times[post.period])

summary(impact)  # Same as in example 1.
plot(impact)  # The plot now shows dates on the x-axis.

# Example 3
#
# For full flexibility, specify a custom model and pass the
# fitted model to CausalImpact(). To run this example, run
# the code for Example 1 first.
post.period.response <- y[post.period[1] : post.period[2]]
y[post.period[1] : post.period[2]] <- NA
ss <- AddLocalLevel(list(), y)
bsts.model <- bsts(y ~ x1, ss, niter = 1000)
impact <- CausalImpact(bsts.model = bsts.model,
                       post.period.response = post.period.response)
plot(impact)

## End(Not run)

Example output

Loading required package: bsts
Loading required package: BoomSpikeSlab
Loading required package: Boom
Loading required package: MASS

Attaching package: 'Boom'

The following object is masked from 'package:stats':

    rWishart

Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: xts
Posterior inference {CausalImpact}

                         Average        Cumulative  
Actual                   112            1340        
Prediction (s.d.)        102 (0.39)     1223 (4.72) 
95% CI                   [101, 103]     [1214, 1232]
                                                    
Absolute effect (s.d.)   9.8 (0.39)     117.3 (4.72)
95% CI                   [9.1, 11]      [108.8, 127]
                                                    
Relative effect (s.d.)   9.6% (0.39%)   9.6% (0.39%)
95% CI                   [8.9%, 10%]    [8.9%, 10%] 

Posterior tail-area probability p:   0.00132
Posterior prob. of a causal effect:  99.8679%

For more details, type: summary(impact, "report")

Analysis report {CausalImpact}


During the post-intervention period, the response variable had an average value of approx. 111.70. By contrast, in the absence of an intervention, we would have expected an average response of 101.93. The 95% interval of this counterfactual prediction is [101.14, 102.63]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 9.77 with a 95% interval of [9.07, 10.56]. For a discussion of the significance of this effect, see below.

Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 1.34K. By contrast, had the intervention not taken place, we would have expected a sum of 1.22K. The 95% interval of this prediction is [1.21K, 1.23K].

The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +10%. The 95% interval of this percentage is [+9%, +10%].

This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (9.77) to the original goal of the underlying intervention.

The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.001). This means the causal effect can be considered statistically significant. 
Warning messages:
1: Removed 52 rows containing missing values (geom_path). 
2: Removed 104 rows containing missing values (geom_path). 
[1] "series"  "summary" "report"  "model"  
Posterior inference {CausalImpact}

                         Average        Cumulative  
Actual                   112            1340        
Prediction (s.d.)        102 (0.37)     1223 (4.49) 
95% CI                   [101, 103]     [1214, 1231]
                                                    
Absolute effect (s.d.)   9.8 (0.37)     117.3 (4.49)
95% CI                   [9.1, 11]      [108.9, 126]
                                                    
Relative effect (s.d.)   9.6% (0.37%)   9.6% (0.37%)
95% CI                   [8.9%, 10%]    [8.9%, 10%] 

Posterior tail-area probability p:   0.00132
Posterior prob. of a causal effect:  99.8679%

For more details, type: summary(impact, "report")

Warning messages:
1: Removed 52 rows containing missing values (geom_path). 
2: Removed 104 rows containing missing values (geom_path). 
=-=-=-=-= Iteration 0 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 100 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 200 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 300 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 400 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 500 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 600 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 700 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 800 Mon May  7 07:42:43 2018 =-=-=-=-=
=-=-=-=-= Iteration 900 Mon May  7 07:42:43 2018 =-=-=-=-=
Warning messages:
1: Removed 52 rows containing missing values (geom_path). 
2: Removed 104 rows containing missing values (geom_path). 

CausalImpact documentation built on Jan. 9, 2020, 1:10 a.m.