| generate_dag_data | R Documentation |
Generates synthetic data from a directed acyclic graph (DAG) specified as a
caugi graph object. Each node is modeled as a linear combination of its
parents plus additive Gaussian noise. Coefficients are randomly signed with
a minimum absolute value, and noise standard deviations are sampled
log-uniformly from a specified range. Custom node equations can override
automatic linear generation.
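The generative model described above can be sketched by hand in base R for a three-node DAG with edges A to B, B to C, and A to C. This is only an illustration of the idea, not the package's internal code: the coefficient and noise values are made up, the variable names are hypothetical, and standardization is left out.

```r
# Hand-rolled linear Gaussian simulation for A -> B, B -> C, A -> C.
# All numeric values below are invented for illustration.
set.seed(1405)
n <- 1000

# Edge coefficients (child_parent) and node-specific noise SDs
b_BA <- 0.6
b_CA <- -0.4
b_CB <- 0.7
sd_A <- 1.0
sd_B <- 0.5
sd_C <- 0.8

# Generate nodes in topological order so parents exist before children.
# A has no parents, so it consists of noise only.
A <- rnorm(n, sd = sd_A)
B <- b_BA * A + rnorm(n, sd = sd_B)
C <- b_CA * A + b_CB * B + rnorm(n, sd = sd_C)

manual_data <- data.frame(A = A, B = B, C = C)
head(manual_data)
```

Each child is its parents' weighted sum plus independent Gaussian noise, which is the structure the function automates for an arbitrary DAG.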
generate_dag_data(
  cg,
  n,
  ...,
  standardize = TRUE,
  coef_range = c(0.1, 0.9),
  error_sd = c(0.3, 2),
  seed = NULL
)
cg |
A |
n |
Integer. Number of observations to simulate. |
... |
Optional named node equations to override automatic linear generation. Each should be an expression referencing all parent nodes. |
standardize |
Logical. If |
coef_range |
Numeric vector of length 2 specifying the minimum and maximum
absolute value of edge coefficients. For each edge, an absolute value is sampled
uniformly from this range and then assigned a positive or negative sign with equal
probability. Must satisfy |
error_sd |
Numeric vector of length 2 specifying the minimum and maximum
standard deviation of the additive Gaussian noise at each node. For each node,
a standard deviation is sampled from a log-uniform distribution over this range.
Must satisfy |
seed |
Optional integer. Sets the random seed for reproducibility. |
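The sampling rules described for coef_range and error_sd can be sketched in base R as follows. The helper names sample_coef and sample_error_sd are hypothetical and not part of the package; the sketch assumes that "log-uniform" means uniform on the log scale, and the package's actual implementation may differ in detail.

```r
# Hypothetical helpers illustrating the documented sampling rules.
# They are not exported by, or taken from, the package.

# Absolute value drawn uniformly from coef_range, then given a
# positive or negative sign with equal probability.
sample_coef <- function(k, coef_range = c(0.1, 0.9)) {
  magnitude <- runif(k, min = coef_range[1], max = coef_range[2])
  sgn <- sample(c(-1, 1), size = k, replace = TRUE)
  sgn * magnitude
}

# Log-uniform draw: uniform on the log scale, then exponentiated.
sample_error_sd <- function(k, error_sd = c(0.3, 2)) {
  exp(runif(k, min = log(error_sd[1]), max = log(error_sd[2])))
}

set.seed(1)
coefs <- sample_coef(5)
sds <- sample_error_sd(3)
coefs
sds
```

Sampling magnitudes away from zero keeps every edge detectable, and the log-uniform draw spreads noise levels evenly across orders of magnitude rather than concentrating them near the upper bound.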
A tibble of simulated data with one column per node in the DAG,
ordered according to the graph's node order. Standardization is applied
if standardize = TRUE.
The returned tibble has an attribute generating_model, which is a list containing:
sd: Named numeric vector of node-specific noise standard deviations.
coef: Named list of numeric vectors, one element per child node. Each
vector stores the coefficients of that child's parent nodes in its linear
structural equation, so generating_model$coef[[child]][parent] gives the
coefficient of parent in the equation for child.
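To make the indexing concrete, the following sketch builds a stand-in list with the documented shape and reads values from it. The numbers are invented, and how nodes without parents are represented in coef is not stated here, so this mock simply omits them (an assumption).

```r
# Hand-built stand-in with the documented structure; values are made up.
generating_model <- list(
  sd = c(A = 1.0, B = 0.5, C = 0.8),
  coef = list(
    B = c(A = 0.6),
    C = c(A = -0.4, B = 0.7)
  )
)

# Coefficient of parent A in the equation for child C
coef_C_A <- generating_model$coef[["C"]]["A"]
coef_C_A

# Noise standard deviation of node B
sd_B <- generating_model$sd["B"]
sd_B
```

With a real result, the same expressions apply after retrieving the list with attr(sim_data, "generating_model").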
cg <- caugi::caugi(A %-->% B, B %-->% C, A %-->% C, class = "DAG")
# Simulate 1000 observations
sim_data <- generate_dag_data(
  cg,
  n = 1000,
  coef_range = c(0.2, 0.8),
  error_sd = c(0.5, 1.5)
)
head(sim_data)
attr(sim_data, "generating_model")
# Simulate with custom equation for node C
sim_data_custom <- generate_dag_data(
  cg,
  n = 1000,
  C = A^2 + B + rnorm(n, sd = 0.7),
  seed = 1405
)
head(sim_data_custom)
attr(sim_data_custom, "generating_model")