A method for finding causal predictors of a target variable described by either a linear, generalized linear or hazard model. The methodology uses heterogeneous data to make causal inference.
Y |
The response or target variable of interest. Either a numeric vector
or |
X |
A matrix (or data frame) with the predictor variables. |
E |
Indicator of the experiment or the intervention type an observation
belongs to. Can be a vector of the same length as |
model |
A character indicating how to model the distribution of the target variable given covariates. Possible choices are
|
method |
A character indicating which method to use. Possible values are
See Details for more guidance on methods. |
level |
Numerical value between 0 and 1 denoting the significance level used when testing. If not specified, the algorithm will only calculate the p-values of the null hypotheses (H0,S) and draw no conclusions based on these values. |
gof |
If no set of variables (including the empty set) leads to a
p-value larger than the goodness-of-fit cutoff |
maxNoVariables |
The maximal subset size (choosing smaller values saves computational resources but increases approximation error). |
fullAnalysis |
If |
progress |
If |
... |
Additional arguments carried to the lower level functions. |
The ICP function implements different concrete methods within the methodology of Invariant Causal Prediction, which was first described in Peters et al. (2016) (see references below). This implementation of Invariant Causal Prediction is well suited when the distribution of the target variable may be described by a linear model, generalized linear model or hazard model. Three different methods for testing invariance are implemented in ICP (EnvirRel, CR and TimeVar), and each is described below under "The Invariance Test Methods".
As input the ICP function takes a target variable Y, which is either a numeric vector or a Survival object, a matrix or data.frame of covariates X, and possibly, depending on the method, a vector of environments E. The ICP function computes a p-value for each member of the following family of null hypotheses:
(H0,S): S is an invariant set with respect to (X, Y),
where S is a subset of {1, ..., p} (X encodes p covariates). The results of these hypothesis tests may be found in model.analysis.
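To make the family of hypotheses concrete, the following sketch enumerates every subset S of the covariates and attaches a p-value to each one. The helpers enumerate_subsets and invariance_pvalue are hypothetical names introduced only for this illustration, and the test used (an F-test of whether the environment still matters once the covariates in S are in a linear model) is a stand-in assumption, not necessarily the test that ICP performs internally.

```r
set.seed(1)
n <- 500
E <- sample(5L, n, replace = TRUE)
X <- data.frame(X1 = rnorm(n, E, 1), X2 = rnorm(n, 3 * (E %in% c(1, 5)), 1))
Y <- rnorm(n, X$X1, 1)  # X1 is the true parent

# Hypothetical helper: all subsets of 1..p with at most max_size elements,
# including the empty set.
enumerate_subsets <- function(p, max_size = p) {
  subsets <- list(integer(0))
  for (k in seq_len(min(max_size, p))) {
    subsets <- c(subsets, combn(p, k, simplify = FALSE))
  }
  subsets
}

# Stand-in invariance test (an assumption, not the package's own test):
# F-test comparing a pooled linear model of Y on the covariates in S with a
# model that lets intercept and slopes differ between environments.
invariance_pvalue <- function(Y, X, E, S) {
  dat <- cbind(data.frame(Y = Y, E = factor(E)), X)
  rhs <- if (length(S) == 0) "1" else paste(names(X)[S], collapse = " + ")
  full_rhs <- if (length(S) == 0) "E" else paste0("(", rhs, ") * E")
  pooled <- lm(as.formula(paste("Y ~", rhs)), data = dat)
  by_env <- lm(as.formula(paste("Y ~", full_rhs)), data = dat)
  anova(pooled, by_env)[["Pr(>F)"]][2]
}

subsets <- enumerate_subsets(ncol(X))
results <- data.frame(
  model = vapply(subsets, function(S) {
    if (length(S) == 0) "Empty" else paste(names(X)[S], collapse = "+")
  }, character(1)),
  p.value = vapply(subsets, function(S) invariance_pvalue(Y, X, E, S),
                   numeric(1))
)
results
```

The resulting table has one row per candidate set, which mirrors the layout described for model.analysis. Passing a smaller max_size to enumerate_subsets corresponds to the trade-off described for maxNoVariables.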
If level is specified, (a subset of) the causal predictors is estimated as the intersection of all sets S for which (H0,S) is not rejected at that level (see Peters et al. (2016) for details). The estimate is returned under the name accepted.model. This computation is done by the function model_analysis, which is also a function in its own right.
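The estimator in Peters et al. (2016) intersects all sets whose invariance hypothesis is not rejected. A minimal sketch of that step is shown here. The helper intersect_accepted and the p-values are made up for illustration, and this is not the source of model_analysis.

```r
# Hypothetical helper: intersect all sets whose p-value exceeds `level`.
intersect_accepted <- function(subsets, p_values, level) {
  accepted <- subsets[p_values > level]
  if (length(accepted) == 0) return(NULL)  # every hypothesis was rejected
  Reduce(intersect, accepted)
}

# Illustrative (made-up) p-values for the sets {}, {1}, {2} and {1, 2}
subsets  <- list(integer(0), 1L, 2L, c(1L, 2L))
p_values <- c(0.001, 0.40, 0.002, 0.55)

accepted_model <- intersect_accepted(subsets, p_values, level = 0.05)
accepted_model  # variable 1 is contained in every accepted set
```

Because the estimate is an intersection, it can only shrink as more sets are accepted. If the empty set is accepted, the estimate is empty.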
Moreover, if level is specified and fullAnalysis = TRUE, the function variable_analysis calculates the significance of each individual variable in X. This significance table is returned under the name variable.analysis.
The gof parameter protects against making statements when the model is obviously not suitable for the data. If no model reaches the gof threshold, i.e. the p-values for (H0,S) are all smaller than gof, we report that there is no evidence for individual variables, as there is no evidence for an invariant set.
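The goodness-of-fit rule can be read as a simple guard on the p-values of the tested sets. The sketch below uses a hypothetical helper, check_gof, with made-up p-values. It only illustrates the rule as stated and is not the package's code.

```r
# Hypothetical helper: report whether any tested set reaches the gof cutoff.
check_gof <- function(p_values, gof) {
  if (all(p_values < gof)) {
    "No set reaches the gof cutoff: no evidence for individual variables."
  } else {
    "At least one set reaches the gof cutoff."
  }
}

check_gof(c(0.001, 0.004, 0.0002), gof = 0.01)  # all p-values below gof
check_gof(c(0.001, 0.400, 0.0002), gof = 0.01)  # one set passes
```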
The Invariance Test Methods
Three different invariance test methods have been implemented:
method = "EnvirRel": The invariance test method of Environment Relevance is the standard method and can be applied to data from all model types (lm, glm & hazard). This method requires environments E as input.
method = "CR": The invariance test method of Intersecting Confidence Regions can be applied to data from all model types (lm, glm & hazard). This method requires environments E as input. Moreover, a solution within the CR method framework may be found in three different ways: the standard is solver = "QC", which is usually also the slowest solver. If computational time is an issue, the user may need to use the approximate solvers solver = "pairwise" or solver = "marginal".
method = "TimeVar": The invariance test method of Time Variability can only be applied to data from "ph" or "ah" type models. This method does not require environment information, as it uses time as environment. The "TimeVar" method has three different concrete options for nonparamtest: a Kolmogorov–Smirnov type test denoted "sup", a Cramér–von Mises criterion type test denoted "int", or simply both tests denoted "test".
The ICP function returns an object of class ICP. Such an object will contain the following
model.analysis |
A data frame listing the different models tested in the first column and the found p-values in the second column. |
call |
The matched call. |
level |
The significance level. If not specified this is |
method |
The method object used for the model fitting and hypothesis testing. |
accepted.model |
The estimated causal predictors. Only returned if
|
empty.message |
If the empty set is returned as |
variable.analysis |
A data.frame with each predictor variable's
significance as a causal predictor. Will only be returned if
|
Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78.5 (2016): 947-1012.
model_analysis calculates the accepted model.
variable_analysis calculates the significance of the individual variables.
# ===========================================================================
# An example with normal distributions
n <- 500
E <- sample(5L, n, replace = TRUE)
X <- data.frame(X1 = rnorm(n, E, 1), X2 = rnorm(n, 3 * (E %in% c(1,5)), 1))
Y <- rnorm(n, X$X1, 1) # X1 is the true parent
# Environment Relevance Test:
ICP(Y, X, E)
# Intersecting Confidence Region Test, Quadratically Constrained Solver:
ICP(Y, X, E, method = "CR")
# Intersecting Confidence Region Test, Pairwise Solver:
ICP(Y, X, E, method = "CR", solver = "pairwise")
# Intersecting Confidence Region Test, Marginal Solver:
ICP(Y, X, E, method = "CR", solver = "marginal")
# ===========================================================================
# An example with a poisson distribution
Y <- rpois(n, exp(X$X1)) # X1 is the true causal predictor
# Environment Relevance Test
ICP(Y, X, E, model = "glm", family = "poisson")
# Intersecting Confidence Region Test, Quadratically Constrained Solver:
ICP(Y, X, E, model = "glm", family = "poisson", method = "CR")
# Intersecting Confidence Region Test, Pairwise Solver:
ICP(Y, X, E, model = "glm", family = "poisson",
    method = "CR", solver = "pairwise")
# Intersecting Confidence Region Test, Marginal Solver:
ICP(Y, X, E, model = "glm", family = "poisson",
    method = "CR", solver = "marginal")
# ===========================================================================
# An example with right censored survival times
Y <- rexp(n, exp(- 0.5 * X$X1))
C <- rexp(n, exp(- 1.5))
time <- pmin(Y, C) # X1 is the true causal predictor
status <- time == Y
# Environment Relevance Test
ICP(survival::Surv(time, status), X, E, model = "ph")
# The user may also define their own link functions, see
# ?survival::survreg.distributions
my_dist <- survival::survreg.distributions$exponential
my_dist$trans <- function(y) log(y / 365)
my_dist$dtrans <- function(y) 1 / y
my_dist$itrans <- function(y) 365 * exp(y)
ICP(survival::Surv(time, status), X, E, model = "hazard", dist = my_dist)
# this example is simply a reparametrization and therefore
# gives the same results as above.
# Intersecting Confidence Regions Test, Quadratically Constrained Solver:
ICP(survival::Surv(time, status), X, E, model = "ph", method = "CR")
# Intersecting Confidence Regions Test, Pairwise Solver:
ICP(survival::Surv(time, status), X, E, model = "ph",
    method = "CR", solver = "pairwise")
# Intersecting Confidence Regions Test, Marginal Solver:
ICP(survival::Surv(time, status), X, E, model = "ph",
    method = "CR", solver = "marginal")
# Non-parametric Tests of Time Varying Effect
ICP(survival::Surv(time, status), X, E, model = "ph", method = "TimeVar")
# Non-parametric Tests of Time Varying Effect with n.sim = 1000
ICP(survival::Surv(time, status), X, E, model = "ph",
    method = "TimeVar", n.sim = 1000)