Estimates the causal parents S of the target variable Y using invariant causal prediction and fits a linear model of the form
Y = a X^S + N.
X |
matrix of predictor variables. Each column corresponds to one predictor variable. |
Y |
vector of target variable, with length(Y)=nrow(X). |
test |
string specifying the hypothesis test used to test for invariance of a parent set S (i.e. the null hypothesis H0_S). The following tests are available: "decoupled", "combined", "trend", "variance", "block.mean", "block.variance", "block.decoupled", "smooth.mean", "smooth.variance", "smooth.decoupled" and "hsic". |
par.test |
parameters specifying hypothesis test. The
following parameters are available: |
model |
string specifying the underlying model class. Either "iid" if Y consists of independent observations or "ar" if Y has a linear time dependence structure. |
par.model |
parameters specifying model. The following
parameters are available: |
max.parents |
integer specifying the maximum size for admissible parents. Reducing this below the number of predictor variables saves computational time but means that the confidence intervals lose their coverage property. |
stopIfEmpty |
if ‘TRUE’, the procedure will stop computing confidence intervals once the empty set has been accepted (and hence no variable can have a significant causal effect). Setting this to ‘TRUE’ saves computational time in these cases, but means that the confidence intervals lose their coverage properties for values different from 0. |
silent |
if 'FALSE', the procedure will output progress notifications consisting of the currently computed set S together with the p-value resulting from the null hypothesis H0_S.
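The cost argument behind max.parents can be made concrete with a small base R sketch. The helpers candidate_sets and n_sets below are hypothetical and not part of seqICP. They assume that candidate sets are all subsets of the predictors of size at most max.parents, enumerated by increasing size, which is what the progress output in the Examples suggests.

```r
# Hypothetical helper (not part of seqICP): list every candidate parent set
# of size 0..max.parents among d predictors, smallest sets first.
candidate_sets <- function(d, max.parents = d) {
  k_max <- min(d, max.parents)
  sets <- list(integer(0))  # the empty set is always a candidate
  for (k in seq_len(k_max)) {
    sets <- c(sets, combn(d, k, simplify = FALSE))
  }
  sets
}

sets_full <- candidate_sets(3)                      # all subsets of {1, 2, 3}
sets_capped <- candidate_sets(3, max.parents = 1)   # empty set and singletons
length(sets_full)
length(sets_capped)

# Hypothetical helper: number of candidate sets without enumerating them
n_sets <- function(d, max.parents = d) {
  sum(choose(d, 0:min(d, max.parents)))
}
n_sets(10)                    # no cap on the set size
n_sets(10, max.parents = 2)   # sets of size at most 2
```

Without a cap, the number of sets doubles with every additional predictor, which is why a small max.parents can save considerable time at the price of the coverage guarantee described above.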
The function can be applied to two types of models:
(1) a linear model (model="iid")
Y_i = a X_i^S + N_i
with iid noise N_i, and
(2) a linear autoregressive model (model="ar")
Y_t = a_0 X_t^S + ... + a_p (Y_(t-p),X_(t-p)) + N_t
with iid noise N_t.
For both models, the invariant prediction procedure is applied using the hypothesis test specified by the test parameter to determine whether a candidate model is invariant. For further details, see the references.
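To illustrate the regression form of the autoregressive model (2), the following base R sketch simulates a series with one lag (p = 1) and one predictor, builds the lagged design with embed(), and fits the regression with lm(). The coefficients, sample size and column names are illustrative assumptions, and the sketch does not reproduce how seqICP constructs or tests its regressions internally.

```r
set.seed(1)
n <- 200   # series length (illustrative)
p <- 1     # number of lags (illustrative)

# Simulate one predictor and a target with a lag-1 dependence
X <- rnorm(n)
Y <- numeric(n)
Y[1] <- rnorm(1, sd = 0.1)
for (t in 2:n) {
  Y[t] <- 0.8 * X[t] + 0.5 * Y[t - 1] - 0.3 * X[t - 1] + rnorm(1, sd = 0.1)
}

# embed() places the values at time t in the first block of columns
# and the values at time t-1, ..., t-p in the following blocks
E <- embed(cbind(Y, X), p + 1)
colnames(E) <- c("Y_t", "X_t", "Y_lag1", "X_lag1")

# Regress Y_t on X_t and the lagged pair (Y_(t-1), X_(t-1))
fit <- lm(Y_t ~ X_t + Y_lag1 + X_lag1, data = as.data.frame(E))
round(coef(fit), 2)
```

The first p observations are lost to the lagging, so the design matrix has n - p rows.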
object of class 'seqICP' consisting of the following elements
parent.set |
vector of the estimated causal parents. |
test.results |
matrix containing the result from each individual test as rows. |
S |
list of all the sets that were tested. The position within the list corresponds to the index in the first column of the test.results matrix. |
p.values |
p-value for each variable not being included in the set of true causal parents. (If a p-value is smaller than alpha, the corresponding variable is a member of parent.set.) |
coefficients |
vector of coefficients resulting from a regression based on the estimated parent set. |
stopIfEmpty |
a boolean value indicating whether computations stop as soon as the intersection of accepted sets is empty. |
modelReject |
a boolean value indicating if the whole model was rejected (the p-value of the best fitting model is too low). |
pknown |
a boolean value indicating whether the number of lags in the model was known. Only relevant if model was set to "ar". |
alpha |
significance level at which the hypothesis tests were performed. |
n.var |
number of predictor variables. |
model |
either "iid" or "ar" depending on which model was selected. |
Niklas Pfister and Jonas Peters
Pfister, N., P. Bühlmann and J. Peters (2017). Invariant Causal Prediction for Sequential Data. ArXiv e-prints (1706.08058).
Peters, J., P. Bühlmann, and N. Meinshausen (2016). Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B (with discussion) 78 (5), 947–1012.
The function seqICP.s performs the hypothesis test for individual sets S. For non-linear models, the functions seqICPnl and seqICPnl.s can be used.
set.seed(1)
# environment 1
na <- 140
X1a <- 0.3*rnorm(na)
X3a <- X1a + 0.2*rnorm(na)
Ya <- -.7*X1a + .6*X3a + 0.1*rnorm(na)
X2a <- -0.5*Ya + 0.5*X3a + 0.1*rnorm(na)
# environment 2
nb <- 80
X1b <- 0.3*rnorm(nb)
X3b <- 0.5*rnorm(nb)
Yb <- -.7*X1b + .6*X3b + 0.1*rnorm(nb)
X2b <- -0.5*Yb + 0.5*X3b + 0.1*rnorm(nb)
# combine environments
X1 <- c(X1a,X1b)
X2 <- c(X2a,X2b)
X3 <- c(X3a,X3b)
Y <- c(Ya,Yb)
Xmatrix <- cbind(X1, X2, X3)
# Y follows the same structural assignment in both environments
# a and b (cf. the lines Ya <- ... and Yb <- ...).
# The direct causes of Y are X1 and X3.
# A linear model considers X1, X2 and X3 as significant.
# All these variables are helpful for the prediction of Y.
summary(lm(Y~Xmatrix))
# apply seqICP to the same setting
seqICP.result <- seqICP(X = Xmatrix, Y,
par.test = list(grid = seq(0, na + nb, (na + nb)/10), complements = FALSE, link = sum,
alpha = 0.05, B =100), max.parents = 4, stopIfEmpty=FALSE, silent=FALSE)
summary(seqICP.result)
# seqICP is able to infer that X1 and X3 are causes of Y
Call:
lm(formula = Y ~ Xmatrix)
Residuals:
      Min        1Q    Median        3Q       Max
-0.205831 -0.061317 -0.001113  0.057515  0.266640
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.001799   0.005980   0.301    0.764
XmatrixX1   -0.583158   0.027397 -21.285  < 2e-16 ***
XmatrixX2   -0.379482   0.047765  -7.945 1.06e-13 ***
XmatrixX3    0.687121   0.018082  38.000  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08813 on 216 degrees of freedom
Multiple R-squared: 0.904, Adjusted R-squared: 0.9027
F-statistic: 678.1 on 3 and 216 DF, p-value: < 2.2e-16
Currently fitting set S = {}
p-value: 0.02
Currently fitting set S = {1}
p-value: 0.02
Currently fitting set S = {2}
p-value: 0.02
Currently fitting set S = {3}
p-value: 0.02
Currently fitting set S = {1, 2}
p-value: 0.02
Currently fitting set S = {1, 3}
p-value: 0.32
Currently fitting set S = {2, 3}
p-value: 0.02
Currently fitting set S = {1, 2, 3}
p-value: 0.2
Invariant Linear Causal Regression at level 0.05
Variables X1, X3 show a significant causal effect
          coefficient lower bound upper bound p-value
intercept         0.0    -0.05900      0.0179      NA
X1               -0.7    -0.75200     -0.5292    0.02 *
X2                0.0     0.00000      0.0000    0.32
X3                0.6     0.57000      0.7228    0.02 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
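The summary can be related to the individual set tests with a short base R sketch. It copies the sets and p-values from the progress output above and aggregates them in the way described for invariant causal prediction in Peters et al. (2016): the estimated parents are the variables contained in every accepted set, and the p-value of a variable is the largest p-value among the sets that exclude it. That seqICP computes these quantities in exactly this way internally is an assumption; the sketch only shows that the rule is consistent with the printed results.

```r
# Tested sets and p-values copied from the progress output above
sets <- list(integer(0), 1, 2, 3, c(1, 2), c(1, 3), c(2, 3), c(1, 2, 3))
pvals <- c(0.02, 0.02, 0.02, 0.02, 0.02, 0.32, 0.02, 0.2)
alpha <- 0.05

# Accepted sets: invariance is not rejected at level alpha
accepted <- sets[pvals > alpha]

# Estimated parents: variables contained in every accepted set
parent.set <- Reduce(intersect, accepted)

# Variable-wise p-value: largest p-value among the sets excluding the variable
var.pvals <- sapply(1:3, function(j) {
  excludes_j <- !sapply(sets, function(s) j %in% s)
  max(pvals[excludes_j])
})
names(var.pvals) <- c("X1", "X2", "X3")

parent.set
var.pvals
```

Only {1, 3} and {1, 2, 3} are accepted, so their intersection gives X1 and X3, in agreement with the summary table.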