CausalArima: Causal effect of an intervention

View source: R/CausalARIMA.R

CausalArimaR Documentation

Causal effect of an intervention

Description

The function estimates the causal effect of an intervention on a time series by using the best-fitting ARIMA model on pre-intervention data to forecast what would have happened afterwards in the absence of the intervention. The prediction step can be improved by adding covariates that are linked to the outcome variable. Then, the causal effect is estimated by difference between the observed outcome post-intervention ad the prediction.

Usage

CausalArima(
  y,
  auto = TRUE,
  order = c(0, 0, 0),
  seasonal = c(0, 0, 0),
  ic = "aic",
  xreg = NULL,
  dates,
  int.date,
  arima.args = list(),
  auto.args = list(),
  nboot = NULL,
  alpha = 0.05
)

Arguments

y

Vector of observations, univariate time series of class ts or numeric.

auto

If set to TRUE, auto.arima returns the best fitting model in the pre-intervention period according to the selected ic. Defaults to TRUE.

order

Order of the non-seasonal part of the ARIMA model, (p, d, q). Required when auto is set to FALSE.

seasonal

Order of the seasonal part of the ARIMA model, (P, D, Q). Defaults to c(0, 0, 0).

ic

Information criterion to use for model selection. Possible values in c("aicc", "aic", "bic"). Defaults to "aic".

xreg

Optional vector, matrix or data.frame of regressors to be included in the model. It must have the same number of rows as y.

dates

Vector of dates of length t (with elements of class Date) corresponding to the observations in y.

int.date

Date of the intervention (must be of class Date).

arima.args

Additional arguments to be passed to Arima.

auto.args

Additional arguments to be passed to auto.arima.

nboot

Number of bootstrap iteration. Optional, if nboot > 0 the function samples nboot times from the empirical distribution of model residuals and computes empirical critical values for the point, cumulative and temporal average effect statistics.

alpha

Confidence level for the prediction intervals. Defaults to 0.05.

Details

The function accepts both ts and numeric objects. In the latter case, y is internally transformed in a ts object whose frequency is set by findfrequency from the forecast package.

NA values are allowed in y but not in xreg. Users are then left to choose their preferred technique to deal with missing data in the outcome (e.g., imputation, deletion, analysis etc.)

Value

A list with the following components:

norm

The inferred point, cumulative and temporal average effects at each point in time after the intervention with their standard deviations and p-values (left-sided, right-sided, bidirectional) estimated under the assumption of Normally distributed residuals.

boot

The inferred point, cumulative and temporal average effects at each point in time after the intervention with their standard deviations and p-values (left-sided, right-sided, bidirectional) estimated by bootstrap.

causal.effect

Estimated causal effect at each point in time after the intervention date (point effect).

model

The model estimated in the pre-intervention period, result of a call to Arima.

dates

The vector of dates considered in the analysis.

int.date

The intervention date.

y

The vector of observations considered in the analysis.

xreg

The covariates, if any, used in the analysis.

forecast

Forecasted time series in the absence of intervention.

forecast_lower

Lower confidence bound for the forecasted series.

forecast_upper

Upper confidence bound for the forecasted series.

alpha

Confidence level for the prediction intervals.

References

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581-592.

Little, R. J. A., and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley & Sons.

Imbens, G. W., and and Rubin, D.B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge, U.K.: Cambridge University Press.

Examples

## Example 1 (daily data, no predictors)
# Generating a time series of length 1000 and a vector of dates
y <- 0.5*seq(0.5, 250, by = 0.5) + rnorm(500, sd = 6)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = 500)

# Adding a fictional intervention
int.date <- as.Date("2015-04-01")
horizon <- as.Date(c("2015-04-10", "2015-04-20"))
y.new <- y ; y.new[dates >= int.date] <- y.new[dates >= int.date]*1.25

# Plot
plot(y = y.new, x = dates, type = "l", col = "cadetblue", xlim = c(as.Date("2014-10-01"), tail(dates, 1)))
lines(y = y, x = dates, col = "orange")

# Causal effect estimation
ce <- CausalArima(y = y.new, dates = dates, int.date = int.date)

## Example 2 (daily data, with predictors)
# Loading data and setting dates
data(sales)
y <- sales[, "Sales"]
dates <- as.Date(sales[, "Dates"])
int.date <- as.Date("2018-10-04")
xreg <- sales[, "Price"]

# Plot
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1,2))
plot(y = y, x = dates, type = "l", main = "Time series of daily sales")
abline(v = int.date, col = "red"); Acf(y, main = "Acf sales")
par(oldpar)

# Causal effect estimation
# The autocorrelation function indicates a weekly sesonal pattern
ce <- CausalArima(y = ts(y, frequency = 7), xreg = xreg, int.date = int.date,
                  dates = dates, nboot = 100)

FMenchetti/CausalArima documentation built on May 14, 2024, 10:14 p.m.