View source: R/node_binomial.r
| node_binomial | R Documentation |
Data from the parents is used to generate the node using logistic regression, by predicting the covariate-specific probability of a 1 and sampling from a Bernoulli distribution accordingly. Allows the inclusion of arbitrary random effects and slopes.
node_binomial(data, parents, formula=NULL, betas, intercept,
return_prob=FALSE, output="logical", labels=NULL,
var_corr=NULL)
| Argument | Description |
| data | A |
| parents | A character vector specifying the names of the parents that this particular child node has. If non-linear combinations or interaction effects should be included, the user may specify the |
| formula | An optional |
| betas | A numeric vector with length equal to |
| intercept | A single number specifying the intercept that should be used when generating the node. |
| return_prob | Either |
| output | A single character string, must be either |
| labels | A character vector of length 2 or |
| var_corr | Variances and covariances for random effects. Only used when |
Using the standard form of a logistic regression model, the observation-specific event probability is generated for every observation in the dataset. Using the rbernoulli function, this probability is then used to draw one Bernoulli sample for each observation in the dataset. If only the probability should be returned, return_prob should be set to TRUE.
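The mechanism described above can be sketched in base R as follows. This is a minimal illustration, not simDAG's implementation: the helper sim_binomial_sketch() is hypothetical, it assumes data is a plain data.frame with numeric parent columns, and it uses rbinom() in place of simDAG's rbernoulli(). In logistic regression the event probability is the inverse logit of the linear predictor, which R provides as plogis().

```r
# Hypothetical helper, NOT part of simDAG: a base-R sketch of the documented
# mechanism (linear predictor -> inverse logit -> one Bernoulli draw per row).
# Assumes `data` is a data.frame whose parent columns are numeric.
sim_binomial_sketch <- function(data, parents, betas, intercept,
                                return_prob = FALSE) {
  # linear predictor: intercept + sum of the parents weighted by betas
  X <- as.matrix(data[, parents, drop = FALSE])
  eta <- intercept + as.vector(X %*% betas)
  # observation-specific event probability via the inverse logit
  prob <- plogis(eta)
  if (return_prob) {
    return(prob)
  }
  # one Bernoulli trial per observation (rbinom with size = 1),
  # converted to TRUE (event) / FALSE (no event)
  rbinom(n = length(prob), size = 1, prob = prob) == 1
}

set.seed(5425)
dat <- data.frame(age = rnorm(100, mean = 50, sd = 4),
                  sex = rbinom(100, size = 1, prob = 0.5))
# probabilities only (analogous to return_prob=TRUE)
p <- sim_binomial_sketch(dat, parents = c("age", "sex"),
                         betas = c(0.05, 0.4), intercept = -3,
                         return_prob = TRUE)
# logical event indicator (analogous to the default output)
y <- sim_binomial_sketch(dat, parents = c("age", "sex"),
                         betas = c(0.05, 0.4), intercept = -3)
```

The coefficients used here (0.05, 0.4, -3) are arbitrary illustration values, chosen so that the probabilities stay away from 0 and 1.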
Formal Description:
Formally, the data generation can be described as:
Y \sim Bernoulli(logit^{-1}(\texttt{intercept} + \texttt{parents}_1 \cdot \texttt{betas}_1 + ... + \texttt{parents}_n \cdot \texttt{betas}_n)),
where Bernoulli(p) denotes one Bernoulli trial with success probability p, n is the number of parents (length(parents)) and logit^{-1}(x) is the inverse of the logit function logit(p) = ln(\frac{p}{1-p}), given by:
logit^{-1}(x) = \frac{1}{1 + e^{-x}}.
For example, given intercept=-15, parents=c("A", "B") and betas=c(0.2, 1.3) the data generation process is defined as:
Y \sim Bernoulli(logit^{-1}(-15 + A \cdot 0.2 + B \cdot 1.3)).
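To make the example with intercept=-15, parents A and B and betas=c(0.2, 1.3) concrete, the short sketch below evaluates the success probability for a single hypothetical observation with A = 50 and B = 5 (values chosen purely for illustration). In logistic regression the success probability is the inverse logit of the linear predictor; in R this is plogis(). Here the linear predictor is -15 + 50 * 0.2 + 5 * 1.3 = 1.5, giving a probability of roughly 0.8176.

```r
# Hypothetical parent values for a single observation (illustration only)
A <- 50
B <- 5
# linear predictor: intercept + A * beta_1 + B * beta_2
eta <- -15 + A * 0.2 + B * 1.3
# inverse logit turns the linear predictor into a probability
p <- plogis(eta)
# the same value written out explicitly as 1 / (1 + exp(-eta))
p_manual <- 1 / (1 + exp(-eta))
print(c(eta = eta, p = p))
```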
Output Format:
By default this function returns a logical vector containing only TRUE and FALSE entries, where TRUE corresponds to an event and FALSE to no event. This may be changed by using the output and labels arguments. The last three arguments of this function are ignored if return_prob is set to TRUE.
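If a different representation is needed after simulation, the default logical output can also be recoded by hand. The base-R sketch below only illustrates such recodings on a hypothetical logical vector y; it does not reproduce how simDAG's output and labels arguments work internally, and the label names "no_event" and "event" are made up for this example.

```r
# Hypothetical logical output: TRUE = event, FALSE = no event
y <- c(TRUE, FALSE, FALSE, TRUE)
# numeric 0/1 coding (1 = event)
y_num <- as.integer(y)
# two-level factor with custom labels; FALSE maps to the first label
y_fac <- factor(y, levels = c(FALSE, TRUE), labels = c("no_event", "event"))
# plain character vector with the same labels
y_chr <- as.character(y_fac)
```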
Random Effects and Random Slopes:
This function also allows users to include arbitrary numbers of random slopes and random effects using the formula argument. If this is done, the formula and data arguments are passed to the arguments of the same name in the makeGlmer function of the simr package. The fixef argument of that function receives the numeric vector c(intercept, betas), and the VarCorr argument receives the var_corr argument as input. If used as a node type in a DAG, all of this is handled behind the scenes. Users can simply use the regular enhanced formula interface of the node function to define these formula terms, as shown in detail in the formula vignette (vignette(topic="v_using_formulas", package="simDAG")). Please consult that vignette for examples. Also note that including random effects or random slopes usually increases computation time substantially.
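As a conceptual illustration of what a random intercept such as (1|School) with var_corr=0.3 describes, the base-R sketch below draws one intercept deviation per school from a normal distribution with variance 0.3 and adds it to the fixed-effects linear predictor (intercept -10, slope 1.2 for Age, as in the mixed-model example). This is not the simr::makeGlmer code path that node_binomial actually uses; it assumes that a single var_corr value is the variance of the random intercept, and uses sample() and rbinom() as stand-ins for the DAG's own node types.

```r
set.seed(5425)
n <- 100
# ten schools with equal probability, as in the mixed-model example
school <- sample(LETTERS[1:10], size = n, replace = TRUE)
age <- rnorm(n, mean = 12, sd = 2)
# one random intercept deviation per school; var_corr = 0.3 is assumed to
# be a variance, so the standard deviation passed to rnorm() is sqrt(0.3)
b_school <- setNames(rnorm(10, mean = 0, sd = sqrt(0.3)), LETTERS[1:10])
# fixed effects plus the deviation belonging to each student's school
eta <- -10 + 1.2 * age + b_school[school]
prob <- plogis(eta)
# one Bernoulli draw per student, as TRUE/FALSE
grade <- rbinom(n, size = 1, prob = prob) == 1
```

Students in the same school share the same deviation, which is what induces the within-school correlation that a random intercept models.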
By default, returns a logical vector of length nrow(data); if return_prob=TRUE, a numeric vector of event probabilities of the same length is returned instead.
Author(s): Robin Denz
See Also: empty_dag, node, node_td, sim_from_dag, sim_discrete_time
library(simDAG)
set.seed(5425)

# define needed DAG
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("smoking", type="binomial", parents=c("age", "sex"),
       betas=c(1.1, 0.4), intercept=-2)

# define the same DAG, but using a pretty formula
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("smoking", type="binomial",
       formula= ~ -2 + age*1.1 + sexTRUE*0.4)

# simulate data from it
sim_dat <- sim_from_dag(dag=dag, n_sim=100)

# return only the event probability instead of a Bernoulli draw
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("smoking", type="binomial", parents=c("age", "sex"),
       betas=c(1.1, 0.4), intercept=-2, return_prob=TRUE)

sim_dat <- sim_from_dag(dag=dag, n_sim=100)

## an example using a random effect
if (requireNamespace("simr")) {
  library(simr)

  dag_mixed <- empty_dag() +
    node("School", type="rcategorical", probs=rep(0.1, 10),
         labels=LETTERS[1:10]) +
    node("Age", type="rnorm", mean=12, sd=2) +
    node("Grade", type="binomial", formula= ~ -10 + Age*1.2 + (1|School),
         var_corr=0.3)

  sim_dat <- sim_from_dag(dag=dag_mixed, n_sim=100)
}