create_riskset: Process and Create Risk Sets for a One- and Two-Mode...
In dream: Dynamic Relational Event Analysis and Modeling

create_riskset

R Documentation

Process and Create Risk Sets for One- and Two-Mode Relational Event Sequences

Description

This function creates one- and two-mode post-sampling event sets with options for case-control sampling (Vu et al. 2015) and sampling from the observed event sequence (Lerner and Lomi 2020). Case-control sampling draws an arbitrary number m of controls from the risk set for each event (Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence, where each observed event is sampled with probability p. Importantly, this function assumes that the risk set is fixed across all time points: every actor active at any time point in the event sequence contributes to the set of potential events at every time point. Users interested in generating time-varying risk sets should consult the create_riskset_dynamic function for one- and two-mode event sequences.

Usage

create_riskset(
  type = c("two-mode", "one-mode"),
  time,
  eventID,
  sender,
  receiver,
  p_samplingobserved = 1,
  n_controls,
  combine = TRUE,
  seed = 9999
)

Arguments

`type`	"two-mode" indicates that this is a two-mode event sequence. "one-mode" indicates that the event sequence is one-mode.
`time`	The vector of event time values from the observed event sequence.
`eventID`	The vector of event IDs from the observed event sequence (typically a numerical event sequence that goes from 1 to n).
`sender`	The vector of event senders from the observed event sequence.
`receiver`	The vector of event receivers from the observed event sequence.
`p_samplingobserved`	The numerical value for the probability of selection for sampling from the observed event sequence. Set to 1 by default indicating that all observed events from the event sequence will be included in the post-processing event sequence.
`n_controls`	The numerical value for the number of null event controls for each (sampled) observed event.
`combine`	TRUE/FALSE. TRUE indicates that the post-sampling (processing) event sequence should be merged with the pre-processing dataset. FALSE only returns the post-processing event sequence (that is, only the sampled events).
`seed`	The random number seed for user replication.
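
As a rough guide to how p_samplingobserved and n_controls drive the size of the output, the sketch below computes the expected number of rows among the sampled events: one observed row plus n_controls control rows per sampled event. The helper expected_sampled_rows is hypothetical and not part of dream, and the event count of 100,000 is an assumption chosen only for illustration.

```r
# Hypothetical helper (not part of dream): expected number of rows among the
# sampled events, i.e. one observed row plus n_controls control rows each.
expected_sampled_rows <- function(n_events, p_samplingobserved, n_controls) {
  stopifnot(p_samplingobserved > 0, p_samplingobserved <= 1, n_controls >= 0)
  n_events * p_samplingobserved * (1 + n_controls)
}

# Assumed sequence length of 100,000 events (illustrative only)
rows_a <- expected_sampled_rows(100000, p_samplingobserved = 0.01, n_controls = 10)
rows_b <- expected_sampled_rows(100000, p_samplingobserved = 0.02, n_controls = 2)
```

Under these assumptions, doubling p while cutting the number of controls from 10 to 2 yields a smaller expected event set overall, even though more observed events are retained.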

Details

This function processes observed events from the set E, where each event e_i is defined as:

e_{i} \in E = (s_i, r_i, t_i, G[E;t])

where:

s_i is the sender of the event.
r_i is the receiver of the event.
t_i represents the time of the event.
G[E;t] = \{e_1, e_2, \ldots, e_{t'} \mid t' < t\} is the network of past events, that is, all events that occurred prior to the current event, e_i.
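
The network of past events can be illustrated with a minimal base-R sketch. The toy data frame and the helper past_events are hypothetical and only mirror the definition above: all events with a time strictly earlier than t.

```r
# Toy event sequence (illustrative values only)
events <- data.frame(
  time = c(1, 2, 2, 3, 5),
  sender = c("A", "B", "C", "A", "D"),
  receiver = c("B", "C", "A", "D", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the rows of `events` that occurred strictly before time t
past_events <- function(events, t) {
  events[events$time < t, , drop = FALSE]
}

# Past events for an event occurring at time 3
G_3 <- past_events(events, t = 3)
```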

Following Butts (2008) and Butts and Marcum (2017), for one-mode event sequences, the risk (support) set A_t, that is, the set of all events that could have occurred at time t, is defined as the full Cartesian product of the prior senders and receivers in G[E;t]. Formally:

A_t = \{ (s, r) \mid s \in S, r \in R \}

where S is the set of potential event senders and R is the set of potential event receivers. In this function, the full risk set is considered fixed across all time points.
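
A fixed one-mode risk set of this kind can be sketched in base R with expand.grid. This is an illustration, not the package's implementation. It assumes that every actor can act as both sender and receiver, and it shows separately what dropping self-directed events would look like; whether create_riskset excludes such events is not stated here.

```r
senders <- c("A", "B", "C", "A")
receivers <- c("B", "C", "D", "D")

# Assumption: in a one-mode sequence every actor can both send and receive
actors <- sort(unique(c(senders, receivers)))

# Full Cartesian product of potential senders and receivers
riskset_one_mode <- expand.grid(
  sender = actors,
  receiver = actors,
  stringsAsFactors = FALSE
)

# Assumption: self-directed events (sender == receiver) are not possible
is_loop <- riskset_one_mode$sender == riskset_one_mode$receiver
riskset_no_loops <- riskset_one_mode[!is_loop, ]
```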

For two-mode event sequences, the risk (support) set A_t, that is, the set of all events that could have occurred at time t, is defined as the cross product of two disjoint sets, namely the prior senders and the prior receivers in G[E;t]. Formally:

A_t = \{ (s, r) \mid s \in S, r \in R \}

where S is the set of potential event senders and R is the set of potential event receivers. In this function, the full risk set is considered fixed across all time points.
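
For the two-mode case, the sender and receiver sets are disjoint, so the fixed risk set has |S| x |R| dyads. The sketch below uses made-up user and article labels and is only meant to illustrate the definition.

```r
# Made-up two-mode events: users (senders) editing articles (receivers)
users <- c("u1", "u2", "u1", "u3")
articles <- c("a1", "a1", "a2", "a3")

S <- unique(users)     # potential senders
R <- unique(articles)  # potential receivers

# The two modes must not share any members
stopifnot(length(intersect(S, R)) == 0)

# Cross product of the two disjoint sets
riskset_two_mode <- expand.grid(sender = S, receiver = R, stringsAsFactors = FALSE)

# Every observed dyad is contained in the risk set
observed_dyads <- paste(users, articles)
riskset_dyads <- paste(riskset_two_mode$sender, riskset_two_mode$receiver)
all_observed_at_risk <- all(observed_dyads %in% riskset_dyads)
```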

Case-control sampling maintains the full set of observed events, that is, all events in E, and samples an arbitrary number m of non-events from the support set A_t (Vu et al. 2015; Lerner and Lomi 2020). This process generates a new support set, SA_t, for any relational event e_i contained in E given a network of past events G[E;t]. SA_t is formally defined as:

SA_t \subseteq \{ (s, r) \mid s \in S, r \in R \}

When sampling from the observed events, n observed events are sampled from the set E, each with known probability 0 < p \le 1. More formally, sampling from the observed set generates a new set SE \subseteq E.
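
The two sampling steps can be combined in a short base-R sketch. It is a simplified illustration under stated assumptions, not the code used by create_riskset: observed events are kept independently with probability p, controls are drawn without replacement from a fixed one-mode risk set without self-directed events, and the observed dyad itself is excluded from its own controls. All object names are hypothetical.

```r
set.seed(9999)

events <- data.frame(
  time = 1:6,
  eventID = 1:6,
  sender = c("A", "B", "C", "A", "D", "B"),
  receiver = c("B", "C", "D", "D", "A", "A"),
  stringsAsFactors = FALSE
)
p <- 0.5  # probability of keeping an observed event
m <- 2    # number of controls per sampled observed event

# Fixed risk set: all ordered pairs of distinct actors (assumption: no self-loops)
actors <- sort(unique(c(events$sender, events$receiver)))
riskset <- expand.grid(sender = actors, receiver = actors, stringsAsFactors = FALSE)
riskset <- riskset[riskset$sender != riskset$receiver, ]

# Step 1: keep each observed event independently with probability p (the set SE)
keep <- runif(nrow(events)) < p
sampled_events <- events[keep, , drop = FALSE]

# Step 2: for each kept event, draw m non-events from the risk set (the set SA_t)
sample_controls <- function(i) {
  ev <- sampled_events[i, ]
  is_observed <- riskset$sender == ev$sender & riskset$receiver == ev$receiver
  candidates <- riskset[!is_observed, ]
  picked <- candidates[sample.int(nrow(candidates), m), ]
  data.frame(time = ev$time, eventID = ev$eventID,
             sender = picked$sender, receiver = picked$receiver,
             observed = 0, sampled = 1, stringsAsFactors = FALSE)
}
controls <- do.call(rbind, lapply(seq_len(nrow(sampled_events)), sample_controls))

# Stack observed events and their controls, observed row first within each event
cases <- sampled_events
cases$observed <- rep(1, nrow(cases))
cases$sampled <- rep(1, nrow(cases))
eventset <- rbind(cases, controls)
eventset <- eventset[order(eventset$eventID, -eventset$observed), ]
```

Each sampled event contributes 1 + m rows to the resulting table, which is why small values of p and m keep very long event sequences manageable.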

Value

A post-processing data.table object with the following columns:

time - The event time for the sampled and observed events.
eventID - The numerical event sequence ID for the sampled and observed events.
sender - The event senders of the sampled and observed events.
receiver - The event targets (receivers) of the sampled and observed events.
observed - Boolean indicating if the event is an observed or control event. (1 = observed; 0 = control)
sampled - Boolean indicating if the event is sampled or not sampled. (1 = sampled; 0 = not sampled)
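
To show how the observed and sampled indicators might be used downstream, the sketch below builds a small hand-made table with the documented columns and keeps only the sampled rows. The values are invented, and the way control rows and unsampled events are coded here is an assumption rather than confirmed output of create_riskset.

```r
# Hand-made table in the documented column layout (invented values)
eventset <- data.frame(
  time = c(1, 1, 1, 2, 3, 3, 3),
  eventID = c(1, 1, 1, 2, 3, 3, 3),
  sender = c("A", "A", "C", "B", "C", "B", "D"),
  receiver = c("B", "D", "B", "C", "D", "A", "A"),
  observed = c(1, 0, 0, 1, 1, 0, 0),
  sampled = c(1, 1, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the rows that belong to sampled events
analysis_rows <- eventset[eventset$sampled == 1, ]

# Each sampled event should have one observed row plus its controls
rows_per_event <- table(analysis_rows$eventID)
observed_per_event <- tapply(analysis_rows$observed, analysis_rows$eventID, sum)
```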

Author(s)

Kevin A. Carson kacarson@arizona.edu, Diego F. Leal dflc@arizona.edu

References

Butts, Carter T. 2008. "A Relational Event Framework for Social Action." Sociological Methodology 38(1): 155-200.

Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A. Pilny & M. S. Poole (Eds.), Group processes: Data-driven computational approaches. Springer International Publishing.

Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to fit a relational event model to 360 million dyadic events." Network Science 8(1): 97-135.

Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." Social Networks 43: 121-135.

Examples


library(dream)
data("WikiEvent2018.first100k")
WikiEvent2018.first100k$time <- as.numeric(WikiEvent2018.first100k$time)
### Creating the EventSet By Employing Case-Control Sampling With M = 10 and
### Sampling from the Observed Event Sequence with P = 0.01
EventSet <- create_riskset(
  type = "two-mode",
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.01, # The Probability of Selection
  n_controls = 10, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication


### Creating a New EventSet with More Observed Events and Fewer Control Events
### Sampling from the Observed Event Sequence with P = 0.02
### Employing Case-Control Sampling With M = 2
EventSet1 <- create_riskset(
  type = "two-mode",
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.02, # The Probability of Selection
  n_controls = 2, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication

set.seed(9999)
events <- data.frame(time = sort(rexp(18)),
                     eventID = 1:18,
                     sender = c("A", "B", "C",
                                "A", "D", "E",
                                "F", "B", "A",
                                "F", "D", "B",
                                "G", "B", "D",
                                "H", "A", "D"),
                     target = c("B", "C", "D",
                                "E", "A", "F",
                                "D", "A", "C",
                                "G", "B", "C",
                                "H", "J", "A",
                                "F", "C", "B"))

# Creating a one-mode relational risk set with p = 1.00 (all true events)
# and 5 controls
eventSet <- create_riskset(
  type = "one-mode", # Senders and Targets Come from the Same Set of Actors
  time = events$time,
  eventID = events$eventID,
  sender = events$sender,
  receiver = events$target,
  p_samplingobserved = 1.00,
  n_controls = 5,
  seed = 9999)

dream documentation built on Feb. 7, 2026, 5:06 p.m.