causal_direction: Determine the causal direction between 2 variables
In IyarLin/orientDAG: Orient DAG edges



causal_direction determines the causal direction between 2 variables from their measurements, assuming that a causal relationship exists and that there are no hidden confounders.

causal_direction(vec_1, vec_2, continuous_thresh, discrete_thresh)

`vec_1`	Measurements of the first variable
`vec_2`	Measurements of the second variable. Determining the causal direction between 2 numeric variables can be costly in time; for this reason the number of measurements used for it can be capped (in orient_dag, via its max_continuous_pairs_sample argument).
`continuous_thresh`	Minimum absolute magnitude of the criteria sum required to re-orient an edge between 2 continuous variables
`discrete_thresh`	Minimum absolute distance correlation magnitude required to re-orient an edge between a discrete variable and a continuous or discrete variable

Depending on how the 2 variables are encoded (each is either numeric or discrete), a specific method is dispatched to determine the causal direction between them. When both variables are continuous, we can use the generalized correlation measure and several related criteria by calling some0Pairs (see also Vinod 2017).
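The idea behind the continuous-continuous case can be sketched in base R. This is a simplified, single-criterion illustration, not the some0Pairs implementation: it fits a Gaussian-kernel (Nadaraya-Watson) regression in each direction, computes a generalized correlation r*(y|x) = sign(cor(x, y)) * sqrt(R^2) from each fit, and compares the two magnitudes. It assumes the convention that the variable which better predicts the other is the cause; the helper names `nw_fit`, `gen_cor` and `kernel_direction`, and the rule-of-thumb bandwidth, are hypothetical choices made for illustration.

```r
# Hypothetical helper: Nadaraya-Watson regression of y on x with a Gaussian kernel,
# evaluated at the observed x values (builds an n x n weight matrix, so keep n small).
nw_fit <- function(x, y, bw = 1.06 * sd(x) * length(x)^(-1/5)) {
  w <- dnorm(outer(x, x, "-") / bw)  # w[i, j] = K((x_i - x_j) / bw)
  as.vector(w %*% y) / rowSums(w)    # weighted local average of y around each x_i
}

# Hypothetical helper: generalized correlation r*(y | x), the signed square root of the
# R^2 of the kernel regression of y on x (R^2 taken here as cor(y, fitted)^2).
gen_cor <- function(y, x) {
  fitted <- nw_fit(x, y)
  sign(cor(x, y)) * abs(cor(y, fitted))
}

# Assumed convention for this sketch: the variable that predicts the other better
# (larger |r*|) is taken to be the cause.
kernel_direction <- function(vec_1, vec_2) {
  r_2_given_1 <- abs(gen_cor(vec_2, vec_1))  # how well vec_1 predicts vec_2
  r_1_given_2 <- abs(gen_cor(vec_1, vec_2))  # how well vec_2 predicts vec_1
  if (r_2_given_1 >= r_1_given_2) "vec_1 -> vec_2" else "vec_2 -> vec_1"
}

set.seed(1)
x <- runif(500, -2, 2)
y <- x^2 + rnorm(500, sd = 0.3)  # x -> y through a non-invertible function
kernel_direction(x, y)
```

Because y = x^2 cannot be inverted, x predicts y well while y carries almost no information about the sign of x, so the two generalized correlations differ sharply in this toy case.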

When the 2 variables are discrete, we can use the distance correlation measure by calling dcor (see also Liu and Chan 2016).
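A rough base-R sketch of the distance-correlation idea for 2 discrete variables follows. It assumes the formulation of Liu and Chan (2016), in which, if X causes Y, the marginal P(X) and the conditional P(Y|X) should be (close to) independent across the values of X. It therefore computes the distance correlation between P(X = x) and the vector P(Y | X = x) over the levels of X, does the same in the reverse direction, and picks the direction with the smaller value. The V-statistic sample distance correlation is re-implemented here in place of calling dcor so the snippet needs no extra packages; `dist_cor`, `dcor_score` and `dcor_direction` are hypothetical names. With only 2 levels the distance correlation is always 1 (when defined), so both variables need at least 3 levels for the comparison to be informative.

```r
# Sample distance correlation (V-statistic form) between the rows of two numeric
# matrices, or two vectors, observed on the same units.
dist_cor <- function(a, b) {
  center <- function(d) {              # double-centre a distance matrix
    d <- as.matrix(d)
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- center(dist(a))
  B <- center(dist(b))
  dcov2 <- mean(A * B)                 # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))
  if (denom == 0) return(0)            # a constant input carries no signal
  sqrt(max(dcov2, 0) / denom)
}

# Hypothetical helper: score for "cause -> effect" = dCor between P(cause = c) and
# the conditional distribution P(effect | cause = c), across the levels c of cause.
dcor_score <- function(cause, effect) {
  tab <- table(factor(cause), factor(effect))  # factor() drops unused levels
  marg <- rowSums(tab) / sum(tab)              # P(cause = c)
  cond <- unclass(prop.table(tab, margin = 1)) # row c: P(effect | cause = c)
  dist_cor(marg, cond)
}

# Smaller dependence between "mechanism" and "input" suggests the causal direction.
dcor_direction <- function(vec_1, vec_2) {
  if (dcor_score(vec_1, vec_2) <= dcor_score(vec_2, vec_1)) "vec_1 -> vec_2" else "vec_2 -> vec_1"
}

set.seed(42)
x <- sample(1:4, 5000, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4))
# y mostly follows x %% 3, with 20% of values replaced by uniform noise
y <- ifelse(runif(5000) < 0.8, x %% 3, sample(0:2, 5000, replace = TRUE))
dcor_direction(x, y)
```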

When one variable is discrete and the other is continuous, we can discretize the continuous variable by calling discretize and then apply the method for 2 discrete variables.
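The discretization step can be illustrated with base R's `cut` and quantile breaks. This is only an assumed stand-in: the discretize function referred to above may use a different binning method and number of bins, and `discretize_quantile` with its `n_bins` default is a hypothetical helper.

```r
# Hypothetical helper: bin a numeric vector into (at most) n_bins roughly
# equal-frequency intervals, returning a factor a discrete-discrete method can use.
discretize_quantile <- function(x, n_bins = 4) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE))
  if (length(breaks) < 2) return(factor(rep(1, length(x))))  # constant input: one bin
  cut(x, breaks = breaks, include.lowest = TRUE)             # keep the minimum in bin 1
}

set.seed(7)
income <- rexp(1000)
table(discretize_quantile(income, n_bins = 4))
```

Equal-frequency bins keep every level populated, which matters because the discrete method works with per-level conditional distributions; `unique()` guards against repeated breaks when the data contain many ties.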

A string denoting whether vec_1 causes vec_2 or vice versa

library(orientDAG)
library(dagitty)
library(simMixedDAG)
library(carData)
library(bnlearn)
library(dplyr) # provides %>%, filter() and mutate() used below

# load dataset and define underlying DAG
data("GSSvocab")
GSSvocab <- GSSvocab %>%
  filter(complete.cases(.)) %>%
  mutate(year = as.numeric(as.character(year)))

true_dag_dagitty <- dagitty("dag{
                            age -> educGroup;
                            age -> nativeBorn;
                            nativeBorn -> ageGroup;
                            nativeBorn -> vocab;
                            educ -> age;
                            educ -> gender;
                            educ -> year;
                            vocab -> gender;
                            vocab -> year
                            }")

# DAG adjacency matrix representation for distance calculations
true_dag <- dagitty_to_adjmatrix(true_dag_dagitty)

# Fit a non-parametric DAG model 
non_param_dag_model <- non_parametric_dag_model(true_dag_dagitty, GSSvocab)

# Generate a dataset from the above model
sim_data <- sim_mixed_dag(non_param_dag_model, N = 20000)

# First pass - estimate DAG using bnlearn::tabu function
est_dag <- tabu(sim_data)
est_dag <- bn_to_adjmatrix(est_dag)
est_dag <- est_dag[
  match(rownames(true_dag), rownames(est_dag)),
  match(colnames(true_dag), colnames(est_dag))
  ]
tabu_dist <- dag_dist(true_dag, est_dag, distance_measure = "sid")
tabu_dist

# Improve on our first pass by re-orienting edges using the orient_dag function

est_dag_orient_dag <- orient_dag(
  adjmatrix = est_dag,
  x = sim_data, 
  max_continuous_pairs_sample = 5000) # continuous pairs re-orientation takes time, so the sample size is kept small
orient_dag_dist <- orientDAG::dag_dist(true_dag, est_dag_orient_dag, distance_measure = "sid")
orient_dag_dist