node_mixture | R Documentation |
This node type allows users to apply different nodes to different subsets of the already generated data, making it possible to generate data for arbitrary mixture distributions. It is similar to node_conditional_distr and node_conditional_prob, with the main difference being that those two only allow univariate distributions conditional on categorical variables, while this function allows any kind of node definition and condition. This makes it possible, for example, to generate data for a variable from different regression models for different subsets of simulated individuals.
node_mixture(data, parents, name, distr, default=NA)
data |
A |
parents |
A character vector specifying the names of the parents that this particular child node has. This vector should include all nodes that are used in the conditions and the |
name |
A single character string specifying the name of the node. |
distr |
An unnamed list that specifies both the conditions and the |
default |
A single value, used as a default value for those individuals not covered by any of the conditions defined in |
Internally, the data is generated by extracting only the relevant part of the already generated data as defined by the condition and using the node function to generate the new response part. Generation proceeds in the order in which distr was specified, so the first condition is evaluated first, and so on. There are no safeguards to guarantee that the conditions do not overlap. For example, users are free to set the first condition to something like A > 10 and the next one to A > 11, in which case the value for every individual with A > 11 is generated twice (first with the first specification, then with the second). In this case, only the last generated value is retained.
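The overwrite behavior described above can be illustrated with a minimal base R sketch. It is not simDAG's internal code: the vectors conditions and values are hypothetical stand-ins for the entries of distr, and constants replace the values a real node definition would generate.

```r
# Base R sketch only; this is NOT simDAG's internal code.
data <- data.frame(A = c(5, 10.5, 12, 20))

# start with the default value for everyone (here NA)
Y <- rep(NA_real_, nrow(data))

# hypothetical overlapping conditions, with a constant standing in
# for the values each node definition would generate
conditions <- c("A > 10", "A > 11")
values <- c(1, 2)

for (i in seq_along(conditions)) {
  # evaluate the condition string inside the data
  rows <- eval(parse(text = conditions[i]), envir = data)
  # later conditions overwrite earlier ones where they overlap
  Y[rows] <- values[i]
}

print(Y)
```

In this sketch the rows with A > 11 end up with the value from the second condition, the row with A = 10.5 keeps the value from the first, and the row matched by no condition keeps the default.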
Note that it is also possible to use the mixture node itself inside the conditions or node calls in distr, because it is directly added to the data before the first condition is applied (by setting everyone to the default value). See examples.
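Why a condition can refer to the mixture node itself may be easier to see in a small base R sketch. This is an illustration under assumptions, not the package's implementation: the column is created with the default first, and a deterministic expression stands in for a regression node.

```r
# Base R sketch only; this is NOT simDAG's internal code.
data <- data.frame(A = c(-1, 0, 3, 5))

# the new column exists before any condition is evaluated,
# filled with the default value
data$Y <- NA_real_

# first condition "TRUE": applies to every row
rows <- rep(eval(parse(text = "TRUE"), envir = data), length.out = nrow(data))
# hypothetical deterministic stand-in for a regression node
data$Y[rows] <- 1 + data$A[rows]

# second condition refers to Y itself, which now holds the values
# generated under the first condition
rows <- eval(parse(text = "Y > 2"), envir = data)
data$Y[rows] <- 10000

print(data)
```

Because Y already holds values when the second condition is evaluated, "Y > 2" selects only the rows whose first-pass value exceeded 2 and replaces them.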
Additionally, because the output of each of the parts of the mixture distributions is forced into one vector, they might be coerced from one class to another, depending on the input to distr and the order used. This also needs to be taken care of by the user.
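The coercion mentioned above follows from how R handles assignment into an atomic vector. The following base R sketch shows the general mechanism with made-up values; it does not call simDAG.

```r
# Base R sketch of class coercion when parts of different classes
# are written into one vector (values are made up for illustration).
Y <- rep(NA, 4)                 # logical vector of default values
class_start <- class(Y)         # "logical"

Y[c(1, 2)] <- c(1.5, 2)         # numeric part: Y becomes numeric
class_after_numeric <- class(Y)

Y[3] <- "high"                  # character part: the whole vector becomes character
class_after_character <- class(Y)

print(Y)
```

After the last assignment the earlier numeric values are stored as strings, so any later numeric use of the result would require an explicit conversion.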
Returns a vector of length nrow(data). The class of the vector is determined by what is specified in distr.
Robin Denz
empty_dag, node, node_td, sim_from_dag, sim_discrete_time
library(simDAG)
set.seed(1234)
## different linear regression models per level of a different covariate
# here, A is the group that is used for the conditioning, B is a predictor
# and Y is the mixture distributed outcome
dag <- empty_dag() +
  node("A", type="rbernoulli") +
  node("B", type="rnorm") +
  node("Y", type="mixture", parents="A",
       distr=list(
         "A==0", node(".", type="gaussian", formula= ~ -2 + B*2, error=1),
         "A==1", node(".", type="gaussian", formula= ~ 3 + B*5, error=1)
       ))
data <- sim_from_dag(dag, n_sim=100)
head(data)
# also works with multiple conditions
dag <- empty_dag() +
  node(c("A", "C"), type="rbernoulli") +
  node("B", type="rnorm") +
  node("Y", type="mixture", parents=c("A", "C"),
       distr=list(
         "A==0 & C==1", node(".", type="gaussian", formula= ~ -2 + B*2, error=1),
         "A==1", node(".", type="gaussian", formula= ~ 3 + B*5, error=1)
       ))
data <- sim_from_dag(dag, n_sim=100)
head(data)
# using the mixture node itself in the condition
# see cookbook vignette, section on outliers for more info
dag <- empty_dag() +
  node(c("A", "B", "C"), type="rnorm") +
  node("Y", type="mixture", parents=c("A", "B", "C"),
       distr=list(
         "TRUE", node(".", type="gaussian", formula= ~ -2 + A*0.1 + B*1 + C*-2,
                      error=1),
         "Y > 2", node(".", type="rnorm", mean=10000, sd=500)
       ))
data <- sim_from_dag(dag, n_sim=100)