Getting Started

library(CausalQueries)
library(dplyr)
library(knitr)

Make a model

Generating: To make a model you need to provide a DAG statement to make_model. For instance

# examples of models
xy_model <- make_model("X -> Y")
iv_model <- make_model("Z -> X -> Y <-> X")

Graphing: Once you have made a model you can inspect the DAG:

plot(xy_model)

Simple model

Simple summaries: You can access a simple summary using summary()

summary(xy_model)
#> 
#> Causal statement: 
#> X -> Y
#> 
#> Nodal types: 
#> 
#> Nodal types for X:
#> 0  1
#> 
#> Nodal types for Y:
#> 00  10  01  11
#> 
#> Guide to interpreting nodal types for Y:
#> 
#>   index  interpretation
#> 1    *-  Y = * if X = 0
#> 2    -*  Y = * if X = 1
#> 
#> Number of nodal types by node:
#> X Y 
#> 2 4 
#> 
#> Number of causal types:  8
#> 
#> Note: Model does not contain: posterior_distribution, stan_objects;
#> to include these objects use update_model()
#> 
#> Note: To pose causal queries of this model use query_model()

or you can examine model details using inspect().

Inspecting: The model has a set of parameters and a default distribution over these.

xy_model |> inspect("parameters_df")
#> 
#> parameters_df
#> Mapping of model parameters to nodal types: 
#> 
#>   param_names: name of parameter
#>   node:        name of endogeneous node associated
#>                with the parameter
#>   gen:         partial causal ordering of the
#>                parameter's node
#>   param_set:   parameter groupings forming a simplex
#>   given:       if model has confounding gives
#>                conditioning nodal type
#>   param_value: parameter values
#>   priors:      hyperparameters of the prior
#>                Dirichlet distribution 
#> 
#>   param_names node gen param_set nodal_type given param_value priors
#> 1         X.0    X   1         X          0              0.50      1
#> 2         X.1    X   1         X          1              0.50      1
#> 3        Y.00    Y   2         Y         00              0.25      1
#> 4        Y.10    Y   2         Y         10              0.25      1
#> 5        Y.01    Y   2         Y         01              0.25      1
#> 6        Y.11    Y   2         Y         11              0.25      1

Tailoring: These features can be edited using set_restrictions, set_priors and set_parameters.

Here is an example of setting a monotonicity restriction (see ?set_restrictions for more):

iv_model <-
  iv_model |> set_restrictions(decreasing('Z', 'X'))

Here is an example of setting priors (see ?set_priors for more):

iv_model <-
  iv_model |> set_priors(distribution = "jeffreys")
#> Altering all parameters.

Simulation: Data can be drawn from a model like this:

data <- make_data(iv_model, n = 4)

data |> kable()

| Z| X| Y| |--:|--:|--:| | 0| 1| 1| | 1| 0| 0| | 1| 0| 1| | 1| 1| 1|

Update the model

Updating: Update using update_model. You can pass all rstan arguments to update_model.

df <-
  data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X*.5))

xy_model <-
  xy_model |>
  update_model(df, refresh = 0)

Inspecting: You can access the posterior distribution on model parameters directly thus:

xy_model |> grab("posterior_distribution") |>
  head() |> kable()

| X.0| X.1| Y.00| Y.10| Y.01| Y.11| |---------:|---------:|---------:|---------:|---------:|---------:| | 0.5237547| 0.4762453| 0.1948964| 0.0523208| 0.5664182| 0.1863645| | 0.4261285| 0.5738715| 0.0597984| 0.1743038| 0.6544728| 0.1114249| | 0.5796467| 0.4203533| 0.1538045| 0.1640282| 0.4844825| 0.1976849| | 0.5133653| 0.4866347| 0.0667460| 0.1497083| 0.5849814| 0.1985644| | 0.5559260| 0.4440740| 0.1106234| 0.1599240| 0.6523280| 0.0771246| | 0.5738242| 0.4261758| 0.0211059| 0.3386846| 0.5650897| 0.0751198|

where each row is a draw of parameters.

Query the model

Arbitrary queries

Querying: You ask arbitrary causal queries of the model.

Examples of unconditional queries:

xy_model |>
  query_model("Y[X=1] > Y[X=0]",
              using = c("priors", "posteriors"))
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label           |using      |  mean|    sd| cred.low| cred.high|
#> |:---------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |priors     | 0.249| 0.195|    0.009|     0.705|
#> |Y[X=1] > Y[X=0] |posteriors | 0.529| 0.102|    0.313|     0.705|

This query asks the probability that $Y(1)> Y(0)$.

Examples of conditional queries:

xy_model |>
  query_model("Y[X=1] > Y[X=0] :|: X == 1 & Y == 1", using = c("priors", "posteriors"))
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label                                 |using      |  mean|    sd| cred.low| cred.high|
#> |:-------------------------------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] given X == 1 & Y == 1 |priors     | 0.499| 0.283|    0.023|     0.970|
#> |Y[X=1] > Y[X=0] given X == 1 & Y == 1 |posteriors | 0.751| 0.134|    0.479|     0.978|

This query asks the probability that $Y(1) > Y(0)$ given $X=1$ and $Y=1$; it is a type of "causes of effects" query. Note that ":|:" is used to separate the main query element from the conditional statement to avoid ambiguity, since "|" is reserved for the "or" operator.

Queries can even be conditional on counterfactual quantities. Here the probability of a positive effect given some effect:

xy_model |>
  query_model("Y[X=1] > Y[X=0] :|: Y[X=1] != Y[X=0]",
              using = c("priors", "posteriors"))
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |label                                  |using      |  mean|   sd| cred.low| cred.high|
#> |:--------------------------------------|:----------|-----:|----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] given Y[X=1] != Y[X=0] |priors     | 0.492| 0.29|    0.027|     0.974|
#> |Y[X=1] > Y[X=0] given Y[X=1] != Y[X=0] |posteriors | 0.809| 0.09|    0.648|     0.980|

Note that we use ":" to separate the base query from the condition rather than "|" to avoid confusion with logical operators.

Output

Query output is ready for printing as tables, but can also be plotted, which is especially useful with batch requests:

batch_queries <- xy_model |>
  query_model(queries = list(ATE = "Y[X=1] - Y[X=0]",
                             `Positive effect given any effect` = "Y[X=1] > Y[X=0] :|: Y[X=1] != Y[X=0]"),
              using = c("priors", "posteriors"),
              expand_grid = TRUE)

batch_queries |> kable(digits = 2, caption = "tabular output")

Table: tabular output

|label |query |given |using |case_level | mean| sd| cred.low| cred.high| |:--------------------------------|:---------------|:----------------|:----------|:----------|----:|----:|--------:|---------:| |ATE |Y[X=1] - Y[X=0] |- |priors |FALSE | 0.00| 0.31| -0.64| 0.63| |ATE |Y[X=1] - Y[X=0] |- |posteriors |FALSE | 0.39| 0.09| 0.21| 0.56| |Positive effect given any effect |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |priors |FALSE | 0.50| 0.29| 0.02| 0.97| |Positive effect given any effect |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |posteriors |FALSE | 0.81| 0.09| 0.65| 0.98|

batch_queries |> plot()

Simple query



Try the CausalQueries package in your browser

Any scripts or data that you put into this service are public.

CausalQueries documentation built on Jan. 29, 2026, 5:07 p.m.