net | R Documentation |
DAG
This function may be used in the formula of nodes in which the value of one individual's observation depends on its neighbors in a defined static network or dynamic network_td. Given the network and a previously generated variable, net() aggregates the data of the neighbors under the hood, according to an arbitrary function. The resulting variable can then be used directly in a formula.
net(expr, net=NULL, mode="all", order=1,
mindist=0, na=NA)
expr |
Any R expression, usually containing one or more previously generated variables, that returns one numeric value given a vector, such as |
net |
A single character string specifying the name of the network that should be used to define the neighbors of an observation. If only one network is present in the |
mode |
A single character, specifying how to use the direction of the edges if a directed network is supplied (ignored otherwise). If |
order |
A single integer giving the order of the neighborhood. If |
mindist |
A single integer >= 0, specifying the minimum distance the neighbors need to have to an observation to be considered neighbors. Only makes sense with |
na |
A single value assigned to the variable if |
How it works:
Internally, the following procedure is used whenever a net() function call is included in the formula of a node (regardless of whether it is time-fixed or time-dependent). First, the associated network (defined using the net argument) is used to identify the neighbors of each observation. Every vertex that is directly connected to an observation is considered its neighbor. The parent variable(s) specified in the net() call are then aggregated over these neighbors using the given expr. A simple example: consider observation 1 with four neighbors named 2, 5, 8 and 10. The formula contains the following net() call: net(sum(infected)). The value of the infected variable is 0, 0, 1, 1 for persons 2, 5, 8 and 10, respectively. These values are summed, resulting in a value of 2 for person 1. The same is done for every person in the simulated data. The resulting variable is then used as-is in the simulation.
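The aggregation step described above can be imitated with plain data.table code. This is only a minimal sketch of the idea, not the package's internal implementation: the objects data and edges and the column names neighbor and net_infected are hypothetical and chosen for illustration.

```r
library(data.table)

# hypothetical toy data: person 1 plus the four neighbors from the example
data <- data.table(id = c(1, 2, 5, 8, 10),
                   infected = c(0, 0, 1, 1, 0))
data[id == 10, infected := 1]  # persons 2, 5, 8, 10 have 0, 0, 1, 1

# hypothetical edge list: one row per (observation, neighbor) pair
edges <- data.table(id = 1, neighbor = c(2, 5, 8, 10))

# look up the value of `infected` for every neighbor
edges[data, infected := i.infected, on = .(neighbor = id)]

# aggregate the neighbors' values per observation, here with sum()
agg <- edges[, .(net_infected = sum(infected)), by = id]
agg$net_infected  # 2 for person 1
```

In a real simulation the same kind of aggregation is carried out for every observation, using the network supplied to the DAG.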
Supported inputs:
Any function that returns a single (usually numeric) value given the neighbors' values can be used. It is therefore also possible to make the simulation dependent on specific neighbors only. For example, using infected[1] instead of sum(infected) would return a value of 0 for observation 1 in the above example, because person 2 is the first neighbor and has a value of 0. Note that the internally used variable named ..neighbor.. contains the ids of the neighbors. The entire expr is evaluated in a data.table call of the form data[, .(variable = eval(expr)), by=id], which also makes it possible to use any data.table syntax such as .N (which would return the number of neighbors a person has).
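To illustrate how different expressions behave in a grouped data.table call of this form, the following sketch evaluates infected[1] and .N on a hand-made table of neighbors. The table nb and its column names are assumptions made for this illustration; they do not reproduce the internal data structure used by the package.

```r
library(data.table)

# hypothetical table: one row per neighbor of observation 1,
# holding the neighbors' values of `infected`
nb <- data.table(id = 1,
                 neighbor = c(2, 5, 8, 10),
                 infected = c(0, 0, 1, 1))

# value of the first neighbor only (person 2)
first_nb <- nb[, .(variable = infected[1]), by = id]

# number of neighbors, using the data.table symbol .N
n_nb <- nb[, .(variable = .N), by = id]

first_nb$variable  # 0
n_nb$variable      # 4
```

Any other expression that reduces the neighbors' values to a single value per group could be substituted in the same way.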
Specifying parents:
Whenever a net() call is used in a formula, we recommend specifying the parents argument of the node as well. The reason for this recommendation is that it is sometimes difficult to identify which variables are used in net() calls, depending on the expr. This may cause issues if a DAG is not specified in a topologically sorted manner and users rely on the sort_dag argument of sim_from_dag to re-order the variables. Specifying the parents ensures that this issue cannot occur.
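A node definition following this recommendation might look like the sketch below. It reuses the network and node definitions from the examples on this page and only adds the parents argument; the choice of "A" as the sole parent is an assumption that matches the variable used inside the net() call.

```r
library(igraph)
library(simDAG)

# random network with 10 vertices, as in the examples on this page
set.seed(234)
g <- igraph::sample_smallworld(1, 10, 2, 0.5)

dag <- empty_dag() +
  node("A", type="rnorm", mean=0, sd=1) +
  network("Net1", net=g) +
  # parents names the variable used inside net(), so that the
  # dependency of Y on A is stated explicitly
  node("Y", type="gaussian", parents="A",
       formula= ~ -2 + net(mean(A))*4, error=1)
```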
A small warning:
Note that it never really makes sense to use this function outside of a formula argument: if you look at its source code, you will see that it does nothing except return its input. It is only a piece of syntax for the formula interface. Please consult the network documentation page or the associated vignette for more information.
"Returns" a vector of length n_sim
when used properly in a sim_from_dag or sim_discrete_time call. Returns a list of its input when used outside of a formula.
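As a rough illustration of the first case, the sketch below simulates data from a DAG containing a net() call and checks the size of the result. It assumes that n_sim should equal the number of vertices in the supplied network (10 here) and that sim_from_dag returns one row per simulated individual.

```r
library(igraph)
library(simDAG)

# random network with 10 vertices, as in the examples on this page
set.seed(234)
g <- igraph::sample_smallworld(1, 10, 2, 0.5)

dag <- empty_dag() +
  node("A", type="rnorm", mean=0, sd=1) +
  network("Net1", net=g) +
  node("Y", type="gaussian", formula= ~ -2 + net(mean(A))*4, error=1)

# assumption: n_sim matches the number of vertices in the network
sim_dat <- sim_from_dag(dag, n_sim=10)

# one row per individual; Y is based on the neighbors' mean of A
nrow(sim_dat)
```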
Robin Denz
library(igraph)
library(data.table)
library(simDAG)
# define a random network for illustration, with 10 vertices
set.seed(234)
g <- igraph::sample_smallworld(1, 10, 2, 0.5)
# a simple dag containing only two time-constant variables and the network
dag <- empty_dag() +
  node("A", type="rnorm", mean=0, sd=1) +
  node("B", type="rbernoulli", p=0.5) +
  network("Net1", net=g)

# using the mean of A over each observation's neighbors in a linear model
dag2 <- dag +
  node("Y", type="gaussian", formula= ~ -2 + net(mean(A))*4, error=1)

# using an indicator of whether any of an observation's neighbors has
# a 1 in B in a linear model
dag3 <- dag +
  node("Y", type="gaussian", formula= ~ 1.5 + net(as.numeric(any(B==1)))*3,
       error=1.2)