hiddenICP: Invariant Causal Prediction with hidden variables


View source: R/hiddenICP.R

Description

Confidence intervals for causal effects in a regression setting with possible confounders.

Usage

hiddenICP(X, Y, ExpInd, alpha = 0.1, mode = "asymptotic", intercept=FALSE)

Arguments

X

A matrix (or data frame) with the predictor variables for all experimental settings

Y

The response or target variable of interest. Can be numeric for regression or a factor with two levels for binary classification.

ExpInd

Indicator of the experiment or the intervention type an observation belongs to. Can be a numeric vector of the same length as Y with K unique entries if there are K different experiments (for example entry 1 for all observational data and entry 2 for intervention data). Can also be a list, where each element of the list contains the indices of all observations that belong to the corresponding grouping in the data (for example two elements: first element is a vector that contains indices of observations that are observational data and second element is a vector that contains indices of all observations that are of interventional type).

alpha

The level of the test procedure. Use the default alpha=0.1 to obtain 90% confidence intervals.

mode

Currently only mode "asymptotic" is implemented, so this argument has no effect in the current version.

intercept

Boolean variable; if TRUE, an intercept is added to the design matrix (but coefficients are returned without intercept term).
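
The two accepted forms of ExpInd described above carry the same information. The following base R sketch (a hypothetical toy setting with six observations; the names exp_vec and exp_list are illustrative and not part of the package) shows one way to convert between them:

```r
## Hypothetical toy setting: observations 1-3 are observational,
## observations 4-6 are interventional.

## vector form: one entry per observation, K = 2 unique values
exp_vec <- c(1, 1, 1, 2, 2, 2)

## list form: one element per experimental setting, holding row indices
exp_list <- list(1:3, 4:6)

## vector form -> list form: group the row indices by setting
vec_to_list <- unname(split(seq_along(exp_vec), exp_vec))

## list form -> vector form: label each row with the position of its group
list_to_vec <- integer(length(exp_vec))
for (k in seq_along(exp_list)) {
  list_to_vec[exp_list[[k]]] <- k
}

print(vec_to_list)
print(list_to_vec)
```

Either exp_vec or exp_list could then be passed as the ExpInd argument.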

Value

A list with elements

betahat

The point estimator for the causal effects

maximinCoefficients

For each variable, the value in the confidence interval of its effect that is closest to 0. It is hence non-zero only for variables with significant effects.

ConfInt

The matrix with confidence intervals for the causal coefficient of all variables. First row is the upper bound and second row the lower bound.

pvalues

The p-values of all variables.

colnames

The column-names of the predictor variables.

alpha

The chosen level.
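
To illustrate how maximinCoefficients relates to ConfInt, the sketch below builds a stand-in matrix by hand (it is not produced by hiddenICP; the bounds are the rounded values printed for the second example on this page, and the row names are illustrative labels). It follows the description above: the first row holds upper bounds, the second row lower bounds, and the maximin value is the point of each interval closest to 0.

```r
## Hand-built stand-in for the ConfInt element (an assumption for
## illustration only; row names "upper"/"lower" are not from the package).
ConfInt <- rbind(upper = c(X = -0.50, Z = 0.03),
                 lower = c(X = -0.64, Z = -0.15))

upper <- ConfInt[1, ]   # first row: upper bounds
lower <- ConfInt[2, ]   # second row: lower bounds

## point in [lower, upper] closest to 0:
##   lower if the interval lies above 0, upper if it lies below 0, else 0
maximin <- pmax(lower, 0) + pmin(upper, 0)

## variables whose interval excludes 0
significant <- names(maximin)[maximin != 0]

print(maximin)
print(significant)
```

This reproduces the rule stated for maximinCoefficients, not the internal computation of the package, which is not shown on this page.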

Author(s)

Nicolai Meinshausen <meinshausen@stat.math.ethz.ch>

References

none yet.

See Also

ICP for reconstructing the parents of a variable under arbitrary interventions on all other variables (but no hidden variables). See package "backShift" for constructing point estimates of causal cyclic models in the presence of hidden variables (again under shift interventions).

Examples

##########################################
####### 1st example:
####### Simulate data with interventions
library(InvariantCausalPrediction)
set.seed(1)
## sample size n
n <- 2000
## 4 predictor variables
p <- 4
## simulate as independent Gaussian variables
X <- matrix(rnorm(n*p), nrow=n)
## divide data into observational (ExpInd=1) and interventional (ExpInd=2)
ExpInd <- c(rep(1,n/2), rep(2,n/2))
## for interventional data (ExpInd==2): change distribution
nI <- sum(ExpInd==2)
X[ExpInd==2,] <- X[ExpInd==2,] + matrix(5*rt(nI*p, df=3), ncol=p)
## add a hidden variable W that shifts all predictors
W <- rnorm(n) * 5
X <- X + outer(W, rep(1,p))

## first two variables are the causal predictors of Y
beta <- c(1,1,0,0)
## response variable Y (also depends on the hidden variable W)
Y <- as.numeric(X%*%beta - 2*W + rnorm(n))

####### Compute "hidden Invariant Causal Prediction" Confidence Intervals
icp <- hiddenICP(X, Y, ExpInd, alpha=0.01)
print(icp)

###### Print point estimates and points in the confidence interval closest to 0
print(icp$betahat)
print(icp$maximinCoefficients)
cat("true coefficients are:", beta)

#### compare with coefficients from a linear model
cat("coefficients from linear model:")
print(summary(lm(Y ~ X-1)))



      
##########################################
####### 2nd example:
####### Simulate model X -> Y -> Z with hidden variables, trying to
####### estimate causal effects from (X,Z) on Y
set.seed(1)
## sample size n
n <- 10000
## simulate hidden variable W and noise terms as independent Gaussian variables
W <- rnorm(n)
noiseX <- rnorm(n)
noiseY <- rnorm(n)
noiseZ <- rnorm(n)
## divide data into observational (ExpInd=1) and interventional (ExpInd=2)
ExpInd <- c(rep(1,n/2), rep(2,n/2))
## for interventional data (ExpInd==2): inflate the noise of X and Z
noiseX[ExpInd==2] <- noiseX[ExpInd==2] * 5
noiseZ[ExpInd==2] <- noiseZ[ExpInd==2] * 3

## simulate equilibrium data
beta <- -0.5
alpha <- 0.9
X <- noiseX + 3*W
Y <- beta*X + noiseY + 3*W
Z <- alpha*Y + noiseZ

####### Compute "hidden Invariant Causal Prediction" Confidence Intervals
icp <- hiddenICP(cbind(X,Z), Y, ExpInd, alpha=0.1)
print(icp)

###### Print/plot/show summary of output (truth here is (beta,0))
print(signif(icp$betahat,3))
print(signif(icp$maximinCoefficients,3))
cat("true coefficients are:", beta, 0)

#### compare with coefficients from a linear model
cat("coefficients from linear model:")
print(summary(lm(Y ~ X + Z - 1)))
  

Example output

Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

Loading required package: mboost
Loading required package: parallel
Loading required package: stabs
This is mboost 2.9-1. See 'package?mboost' and 'news(package  = "mboost")'
for a complete list of changes.


 Invariant Linear Causal Regression (with hidden variables) at level 0.01
 Variables: Variable_1, Variable_2 show a significant causal effect
 
             LOWER BOUND  UPPER BOUND  MAXIMIN EFFECT  P-VALUE    
Variable_1         0.80         1.11            0.80   <1e-09 ***
Variable_2         0.79         1.01            0.79   <1e-09 ***
Variable_3        -0.19         0.15            0.00     1.00    
Variable_4        -0.20         0.12            0.00     0.94    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


[1]  0.95521468  0.90298167 -0.01826879 -0.04006316
[1] 0.8044002 0.7918343 0.0000000 0.0000000
true coefficients are: 1 1 0 0coefficients from linear model:
Call:
lm(formula = Y ~ X - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-22.720  -2.648  -0.085   2.547  42.038 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
X1  0.63423    0.01740   36.46   <2e-16 ***
X2  0.69459    0.01579   43.99   <2e-16 ***
X3 -0.40159    0.01840  -21.82   <2e-16 ***
X4 -0.41061    0.01819  -22.57   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.284 on 1996 degrees of freedom
Multiple R-squared:  0.6721,	Adjusted R-squared:  0.6715 
F-statistic:  1023 on 4 and 1996 DF,  p-value: < 2.2e-16


 Invariant Linear Causal Regression (with hidden variables) at level 0.1
 Variable X shows a significant causal effect
 
    LOWER BOUND  UPPER BOUND  MAXIMIN EFFECT  P-VALUE    
X        -0.64        -0.50           -0.50   <1e-09 ***
Z        -0.15         0.03            0.00     0.37    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


[1] -0.5670 -0.0599
[1] -0.497  0.000
true coefficients are: -0.5 0coefficients from linear model:
Call:
lm(formula = Y ~ X + Z - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.6010 -1.0426  0.0168  1.0640  7.0944 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
X -0.040632   0.003788  -10.73   <2e-16 ***
Z  0.572050   0.005536  103.34   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.768 on 9998 degrees of freedom
Multiple R-squared:  0.5287,	Adjusted R-squared:  0.5286 
F-statistic:  5607 on 2 and 9998 DF,  p-value: < 2.2e-16
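
As a rough cross-check of the output printed above for the first example, the sketch below tests whether the true coefficients (1, 1, 0, 0) fall inside the reported hiddenICP bounds and inside approximate 99% intervals around the linear-model estimates. All numbers are copied by hand from the printed output (rounded as shown there); the normal-approximation intervals for lm are an assumption made for illustration and are not computed by the package.

```r
## true coefficients used in the first simulation
truth <- c(1, 1, 0, 0)

## hiddenICP bounds at level 0.01, copied from the printed table
icp_lower <- c(0.80, 0.79, -0.19, -0.20)
icp_upper <- c(1.11, 1.01, 0.15, 0.12)

## lm estimates and standard errors, copied from the printed summary
lm_est <- c(0.63423, 0.69459, -0.40159, -0.41061)
lm_se  <- c(0.01740, 0.01579, 0.01840, 0.01819)

## approximate 99% intervals for lm (normal approximation, an assumption)
z <- qnorm(1 - 0.01 / 2)
lm_lower <- lm_est - z * lm_se
lm_upper <- lm_est + z * lm_se

## does each interval contain the true coefficient?
icp_covers <- icp_lower <= truth & truth <= icp_upper
lm_covers  <- lm_lower  <= truth & truth <= lm_upper

print(icp_covers)
print(lm_covers)
```

In this simulated run every hiddenICP interval contains the truth, while none of the approximate linear-model intervals does, which is consistent with the hidden variable W confounding the ordinary regression.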

InvariantCausalPrediction documentation built on Nov. 10, 2019, 5:06 p.m.