Creates a CausalFX Problem Instance

Description

Sets up an object describing a causal inference problem: estimating the average causal effect of a treatment on an outcome. Currently, only binary data is supported. A synthetic model can also be specified, for use in simulation studies.

Usage

cfx(x, y, latent_idx = NULL, dat = NULL, g = NULL, model = NULL,
  num_v_max = 20)

Arguments

x

the index of the treatment variable.

y

the index of the outcome variable.

latent_idx

an array with the indices of the variables that should be treated as latent.

dat

a matrix of binary data. It can be omitted if a model is provided.

g

a binary matrix encoding a causal graph, where g[i, j] == 1 if there is a directed edge from vertex j to vertex i, and 0 otherwise. This is only required if a ground truth model exists.

model

if g is specified, this must be specified too. This argument should be a list of conditional probability tables, one per vertex in g, each encoding the conditional probability of that vertex given its parents. Entry model[[i]] is an array of non-negative numbers describing the probability of random variable/vertex i being equal to 1. In particular, model[[i]][j] is the conditional probability of this event given that the parents of i are in state j. States are indexed as follows: if S is the binary string formed by the binary values of the parents of i in g, sorted by their index, then j = 1 + bin2dec(S), where bin2dec converts a binary string into a decimal number.

num_v_max

the maximum number of variables for which the joint distribution implied by a model is pre-computed. Having this pre-computed can speed up some computations for methods that use the provided ground truth model. Because the space required to store a joint distribution grows exponentially with the number of variables, this value should be kept small.
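
The encoding of g and model described above can be illustrated with a small base R sketch. The three-vertex graph, the probability values, and the helper state_index are hypothetical and not part of CausalFX. The helper reads the parent values in increasing index order and treats the first one as the most significant bit, which is one reading of the bin2dec convention and should be checked against the package before relying on it.

```r
# Hypothetical 3-vertex graph: 1 -> 2, 1 -> 3, 2 -> 3.
# g[i, j] == 1 encodes a directed edge from vertex j to vertex i.
g <- matrix(0, nrow = 3, ncol = 3)
g[2, 1] <- 1
g[3, 1] <- 1
g[3, 2] <- 1

# Illustrative helper (not part of CausalFX): map the binary values of the
# parents, sorted by index, to the state j = 1 + bin2dec(S).
# Assumption: the lowest-index parent is the most significant bit.
state_index <- function(parent_values) {
  k <- length(parent_values)
  if (k == 0) return(1)
  1 + sum(parent_values * 2^((k - 1):0))
}

# model[[i]][j] = P(vertex i == 1 | parents of i in state j).
# Made-up numbers, for illustration only.
model <- list(
  c(0.3),                  # vertex 1 has no parents: assumed single state
  c(0.2, 0.7),             # vertex 2: parent 1 equal to 0, then 1
  c(0.1, 0.4, 0.5, 0.9)    # vertex 3: parents (1, 2) = 00, 01, 10, 11
)

# P(vertex 3 == 1 | vertex 1 == 1, vertex 2 == 0) under this convention
j <- state_index(c(1, 0))
p <- model[[3]][j]

# A full joint distribution over n binary variables needs 2^n cells,
# which is why num_v_max bounds the size of the pre-computed table.
cells_at_default <- 2^20
```

With these inputs, a call such as cfx(x = 2, y = 3, g = g, model = model) would follow the signature shown under Usage; the choice of treatment and outcome indices here is arbitrary.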

Value

A cfx object, which contains the following fields:

X_idx

the index of the treatment variable in the data/graph.

Y_idx

the index of the outcome variable in the data/graph.

latent_idx

the array of latent variable indices given as input.

data

the data given as input.

graph

the graph given as input.

varnames

an array of strings with the names of the variables, as given by the column names of data. If data has no column names or is not provided, default names are used, where variable i is assigned the name "X" followed by its index i.

model

the model given as input.

ancestrals

a list of arrays (if g is provided), where ancestrals[[i]] is the array of the indices of the ancestors of i in g, excluding i itself.

probs

a multidimensional array (if g is provided) with one dimension per vertex of g, where each entry is the probability of the corresponding joint assignment of variable values. This is NULL if the number of vertices in g is greater than num_v_max.
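
To make the ancestrals and probs fields more concrete, the following base R sketch derives comparable quantities from a small graph and model. It is an illustration under assumptions, not the package's implementation: the graph, the probabilities, the helper names ancestors_of and joint_probs, the bit order of the state index, and the array layout (index 1 for value 0, index 2 for value 1) are all hypothetical.

```r
# Hypothetical graph 1 -> 2, 1 -> 3, 2 -> 3; g[i, j] == 1 means j -> i.
g <- matrix(0, nrow = 3, ncol = 3)
g[2, 1] <- 1; g[3, 1] <- 1; g[3, 2] <- 1
# Made-up conditional probabilities of each vertex being equal to 1.
model <- list(c(0.3), c(0.2, 0.7), c(0.1, 0.4, 0.5, 0.9))

# Ancestors of vertex i, excluding i: follow parent links until no new
# vertex is found. Assumes g is acyclic.
ancestors_of <- function(g, i) {
  found <- integer(0)
  frontier <- which(g[i, ] == 1)
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    parents <- which(colSums(g[frontier, , drop = FALSE]) > 0)
    frontier <- setdiff(parents, found)
  }
  sort(found)
}

# Joint distribution over all 2^n assignments, as an n-dimensional array.
joint_probs <- function(g, model) {
  n <- nrow(g)
  # All assignments; the first variable varies fastest, matching R's
  # column-major array layout.
  states <- as.matrix(expand.grid(rep(list(0:1), n)))
  p <- rep(1, nrow(states))
  for (i in seq_len(n)) {
    pa <- which(g[i, ] == 1)
    k <- length(pa)
    # State index j = 1 + bin2dec(S); assumed most significant bit first.
    j <- if (k == 0) rep(1, nrow(states)) else
      1 + as.vector(states[, pa, drop = FALSE] %*% 2^((k - 1):0))
    p1 <- model[[i]][j]
    # Multiply in P(vertex i takes its assigned value | parents).
    p <- p * ifelse(states[, i] == 1, p1, 1 - p1)
  }
  array(p, dim = rep(2, n))
}

ancestrals <- lapply(seq_len(nrow(g)), function(i) ancestors_of(g, i))
probs <- joint_probs(g, model)
```

Under this layout, probs[a, b, c] holds the probability that the three variables take the values a - 1, b - 1, and c - 1 respectively, and the entries of probs sum to one.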