Description Usage Arguments Value Note Author(s) Examples
View source: R/impact_analysis.R
CausalImpact()
performs causal inference through
counterfactual predictions using a Bayesian structural
timeseries model.
See the package documentation (http://google.github.io/CausalImpact/) to understand the underlying assumptions. In particular, the model assumes that the time series of the treated unit can be explained in terms of a set of covariates which were themselves not affected by the intervention whose causal effect we are interested in.
The easiest way of running a causal analysis is to call
CausalImpact()
with data
, pre.period
,
post.period
, model.args
(optional), and
alpha
(optional). In this case, a timeseries model
is automatically constructed and estimated. The argument
model.args
offers some control over the model. See
Example 1 below.
An alternative is to supply a custom model. In this case,
the function is called with bsts.model
,
post.period.response
, and alpha
(optional).
See Example 3 below.
1 2 3 4 
data 
Time series of response variable and any
covariates. This can be a 
pre.period 
A vector specifying the first and the last time point of the
preintervention period in the response vector 
post.period 
A vector specifying the first and the last day of the
postintervention period we wish to study. This is the period
after the intervention has begun whose effect we are
interested in. The relationship between response variable and
covariates, as determined during the preperiod, will be used
to predict how the response variable should have evolved
during the postperiod had no intervention taken place. If

model.args 
Further arguments to adjust the default
construction of the statespace model used for inference.
One particularly important parameter is

bsts.model 
Instead of passing in 
post.period.response 
Actual observed data during
the postintervention period. This is required if and
only if a fitted 
alpha 
Desired tailarea probability for posterior intervals. Defaults to 0.05, which will produce central 95% intervals. 
CausalImpact()
returns a CausalImpact
object containing the original observed response, its
counterfactual predictions, as well as pointwise and
cumulative impact estimates along with posterior credible
intervals. Results can summarised using summary()
and visualized using plot()
. The object is a list
with the following fields:
series
. Timeseries object (zoo
)
containing the original response response
as
well as the computed inferences. The added columns are
listed in the table below.
summary
. Summary statistics for the
postintervention period. This includes the posterior
expectation of the overall effect, the corresponding
posterior credible interval, and the posterior
probability that the intervention had any effect,
expressed in terms of a onesided pvalue. Note that
checking whether the posterior interval includes zero
corresponds to a twosided hypothesis test. In contrast,
checking whether the pvalue is below alpha
corresponds to a onesided hypothesis test.
report
. A suggested verbal
interpretation of the results.
model
. A list with four elements pre.period
,
post.period
, bsts.model
and alpha
. pre.period
and post.period
indicate the first and last time point of
the time series in the respective period, bsts.model
is
the fitted model returned by bsts()
, and alpha
is the userspecified tailarea probability.
The field series
is a
zoo
timeseries object with the following columns:
response 
Observed response as supplied to CausalImpact() . 
cum.response  Cumulative response during the modeling period. 
point.pred  Posterior mean of counterfactual predictions. 
point.pred.lower 
Lower limit of a (1  alpha ) posterior interval. 
point.pred.upper 
Upper limit of a (1  alpha ) posterior interval. 
cum.pred  Posterior cumulative counterfactual predictions. 
cum.pred.lower 
Lower limit of a (1  alpha ) posterior interval.

cum.pred.upper 
Upper limit of a (1  alpha ) posterior interval.

point.effect  Pointwise posterior causal effect. 
point.effect.lower  Lower limit of the posterior interval (as above). 
point.effect.lower  Upper limit of the posterior interval (as above). 
cum.effect  Posterior cumulative effect. 
cum.effect.lower  Lower limit of the posterior interval (as above). 
cum.effect.upper  Upper limit of the posterior interval (as above). 
Optional arguments can be passed as a list in model.args
,
providing additional control over model construction:
niter
. Number of MCMC samples to draw. Higher numbers
yield more accurate inferences. Defaults to 1000.
standardize.data
. Whether to standardize all columns of the
data using moments estimated from the preintervention period before fitting
the model. This is equivalent to an empirical Bayes approach to setting the
priors. It ensures that results are invariant to linear transformations of
the data. Defaults to TRUE
.
prior.level.sd
. Prior standard deviation of the Gaussian random
walk of the local level, expressed in terms of data standard deviations.
Defaults to 0.01, a typical choice for wellbehaved and stable datasets
with low residual volatility after regressing out known predictors (e.g.,
web searches or sales in high quantities). When in doubt, a safer option is
to use 0.1, as validated on synthetic data, although this may sometimes give
rise to unrealistically wide prediction intervals.
nseasons
. Period of the seasonal components. In order to
include a seasonal component, set this to a whole number greater than 1. For
example, if the data represent daily observations, use 7 for a dayofweek
component. This interface currently only supports up to one seasonal
component. To specify multiple seasonal components, use bsts
to
specify the model directly, then pass the fitted model in as
bsts.model
. Defaults to 1, which means no seasonal component is used.
season.duration
. Duration of each season, i.e., number of data
points each season spans. For example, to add a dayofweek component to
data with daily granularity, supply the arguments
model.args = list(nseasons = 7, season.duration = 1)
.
Alternatively, use
model.args = list(nseasons = 7, season.duration = 24)
to add a dayofweek component to data with hourly granularity.
Defaults to 1.
dynamic.regression
. Whether to include timevarying regression
coefficients. In combination with a timevarying local trend or even a
timevarying local level, this often leads to overspecification, in which
case a static regression is safer. Defaults to FALSE
.
max.flips
. By default, each iteration tries to flip each
predictor in or out of the model. Setting max.flips
tells the
algorithm to randomly choose that many variables to explore flipping. Useful
to set (e.g. 100) when you have a large number of controls (e.g. 10000); in
such cases, a lower max.flips
can speed up computation. Defaults to
1.
Kay H. Brodersen kbrodersen@google.com
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51  # Example 1
#
# Example analysis on a simple artificial dataset
# consisting of a response variable y and a
# single covariate x1.
set.seed(1)
x1 < 100 + arima.sim(model = list(ar = 0.999), n = 52)
y < 1.2 * x1 + rnorm(52)
y[41:52] < y[41:52] + 10
data < cbind(y, x1)
pre.period < c(1, 40)
post.period < c(41, 52)
impact < CausalImpact(data, pre.period, post.period)
# Print and plot results
summary(impact)
summary(impact, "report")
plot(impact)
plot(impact, "original")
plot(impact$model$bsts.model, "coefficients")
# For further output, type:
names(impact)
## Not run:
# Example 2
#
# Weekly time series: same data as in example 1, annotated
# with dates.
times < seq.Date(as.Date("20160103"), by = 7, length.out = 52)
data < zoo(cbind(y, x1), times)
impact < CausalImpact(data, times[pre.period], times[post.period])
summary(impact) # Same as in example 1.
plot(impact) # The plot now shows dates on the xaxis.
# Example 3
#
# For full flexibility, specify a custom model and pass the
# fitted model to CausalImpact(). To run this example, run
# the code for Example 1 first.
post.period.response < y[post.period[1] : post.period[2]]
y[post.period[1] : post.period[2]] < NA
ss < AddLocalLevel(list(), y)
bsts.model < bsts(y ~ x1, ss, niter = 1000)
impact < CausalImpact(bsts.model = bsts.model,
post.period.response = post.period.response)
plot(impact)
## End(Not run)

Loading required package: bsts
Loading required package: BoomSpikeSlab
Loading required package: Boom
Loading required package: MASS
Attaching package: 'Boom'
The following object is masked from 'package:stats':
rWishart
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: xts
Posterior inference {CausalImpact}
Average Cumulative
Actual 112 1340
Prediction (s.d.) 102 (0.39) 1223 (4.72)
95% CI [101, 103] [1214, 1232]
Absolute effect (s.d.) 9.8 (0.39) 117.3 (4.72)
95% CI [9.1, 11] [108.8, 127]
Relative effect (s.d.) 9.6% (0.39%) 9.6% (0.39%)
95% CI [8.9%, 10%] [8.9%, 10%]
Posterior tailarea probability p: 0.00132
Posterior prob. of a causal effect: 99.8679%
For more details, type: summary(impact, "report")
Analysis report {CausalImpact}
During the postintervention period, the response variable had an average value of approx. 111.70. By contrast, in the absence of an intervention, we would have expected an average response of 101.93. The 95% interval of this counterfactual prediction is [101.14, 102.63]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 9.77 with a 95% interval of [9.07, 10.56]. For a discussion of the significance of this effect, see below.
Summing up the individual data points during the postintervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 1.34K. By contrast, had the intervention not taken place, we would have expected a sum of 1.22K. The 95% interval of this prediction is [1.21K, 1.23K].
The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +10%. The 95% interval of this percentage is [+9%, +10%].
This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (9.77) to the original goal of the underlying intervention.
The probability of obtaining this effect by chance is very small (Bayesian onesided tailarea probability p = 0.001). This means the causal effect can be considered statistically significant.
Warning messages:
1: Removed 52 rows containing missing values (geom_path).
2: Removed 104 rows containing missing values (geom_path).
[1] "series" "summary" "report" "model"
Posterior inference {CausalImpact}
Average Cumulative
Actual 112 1340
Prediction (s.d.) 102 (0.37) 1223 (4.49)
95% CI [101, 103] [1214, 1231]
Absolute effect (s.d.) 9.8 (0.37) 117.3 (4.49)
95% CI [9.1, 11] [108.9, 126]
Relative effect (s.d.) 9.6% (0.37%) 9.6% (0.37%)
95% CI [8.9%, 10%] [8.9%, 10%]
Posterior tailarea probability p: 0.00132
Posterior prob. of a causal effect: 99.8679%
For more details, type: summary(impact, "report")
Warning messages:
1: Removed 52 rows containing missing values (geom_path).
2: Removed 104 rows containing missing values (geom_path).
===== Iteration 0 Mon May 7 07:42:43 2018 =====
===== Iteration 100 Mon May 7 07:42:43 2018 =====
===== Iteration 200 Mon May 7 07:42:43 2018 =====
===== Iteration 300 Mon May 7 07:42:43 2018 =====
===== Iteration 400 Mon May 7 07:42:43 2018 =====
===== Iteration 500 Mon May 7 07:42:43 2018 =====
===== Iteration 600 Mon May 7 07:42:43 2018 =====
===== Iteration 700 Mon May 7 07:42:43 2018 =====
===== Iteration 800 Mon May 7 07:42:43 2018 =====
===== Iteration 900 Mon May 7 07:42:43 2018 =====
Warning messages:
1: Removed 52 rows containing missing values (geom_path).
2: Removed 104 rows containing missing values (geom_path).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.