View source: R/createriskset.R
| create_riskset | R Documentation |
This function creates one- and two-mode post-sampling event sets with options for case-control
sampling (Vu et al. 2015) and sampling from the observed event sequence (Lerner and Lomi 2020). Case-control
sampling draws an arbitrary number m of controls (non-events) from the risk set for each event
(Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence,
in which observed events are sampled with probability p. Importantly, this function generates risk sets
that are fixed across all time points: every actor active at any point in the event sequence
is included in the set of potential events. Users interested in
generating time- or event-varying risk sets should consult the processOMEventSeq function
for one-mode event sequences and the processTMEventSeq function for two-mode event
sequences. Future versions of the dream package will incorporate this option into this function in a
principled manner.
create_riskset(
type = c("two-mode", "one-mode"),
time,
eventID,
sender,
receiver,
p_samplingobserved = 1,
n_controls,
combine = TRUE,
seed = 9999
)
| type | "two-mode" indicates that the event sequence is two-mode; "one-mode" indicates that it is one-mode. |
| time | The vector of event time values from the observed event sequence. |
| eventID | The vector of event IDs from the observed event sequence (typically a numerical sequence running from 1 to n). |
| sender | The vector of event senders from the observed event sequence. |
| receiver | The vector of event receivers from the observed event sequence. |
| p_samplingobserved | The probability of selection when sampling from the observed event sequence. Defaults to 1, meaning that every observed event is included in the post-processing event sequence. |
| n_controls | The number of null-event controls to sample for each (sampled) observed event. |
| combine | TRUE/FALSE. TRUE merges the post-sampling (processing) event sequence with the pre-processing dataset; FALSE returns only the post-processing event sequence (that is, only the sampled events). |
| seed | The random-number seed, for replication. |
This function processes observed events from the set E, where each event e_i \in E is
defined as:
e_{i} = (s_i, r_i, t_i, G[E;t])
where:
s_i is the sender of the event.
r_i is the receiver of the event.
t_i represents the time of the event.
G[E;t] = \{e_1, e_2, \ldots, e_{t'} \mid t' < t\} is the network of past events, that is, all events that occurred prior to the current event, e_i.
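As an illustration, G[E;t] can be read off an event table by keeping every event whose time strictly precedes t. The sketch below is base R on a hypothetical toy sequence; the helper past_events() is an illustrative name and is not part of the dream package.

```r
# Hypothetical toy event sequence (not package data)
events <- data.frame(
  eventID  = 1:5,
  time     = c(1, 2, 2, 4, 7),
  sender   = c("a", "b", "a", "c", "b"),
  receiver = c("x", "x", "y", "y", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: G[E; t] = all events that occurred strictly before time t
past_events <- function(events, t) {
  events[events$time < t, , drop = FALSE]
}

# Events 1, 2 and 3 precede t = 4
past_events(events, 4)$eventID
```

Because the comparison is strict, events tied at the current time are not part of the past network.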
Following Butts (2008) and Butts and Marcum (2017), for one-mode event sequences the risk (support)
set at time t, A_t, is defined as the set of all possible events that could have occurred at time t,
namely the full Cartesian product of the prior senders and receivers in G[E;t]. Formally:
A_t = \{ (s, r) \mid s \in S, r \in R \} = S \times R
where S is the set of potential event senders and R is the set of potential event receivers. In this function,
the full risk set is considered fixed across all time points.
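A fixed one-mode risk set can be sketched in base R with expand.grid(), which forms exactly the Cartesian product S x R. This is an illustrative sketch, not the package's code; it assumes that in the one-mode case S and R are both the pooled set of every actor seen as a sender or receiver, and it keeps self-ties such as (a, a) because the formula above does not exclude them.

```r
# Hypothetical one-mode sequence: the same actors both send and receive
events <- data.frame(
  sender   = c("a", "b", "c", "a"),
  receiver = c("b", "c", "a", "c"),
  stringsAsFactors = FALSE
)

# Assumption: S = R = every actor appearing anywhere in the sequence
actors <- sort(unique(c(events$sender, events$receiver)))

# A = S x R, fixed across all time points
risk_set <- expand.grid(sender = actors, receiver = actors,
                        stringsAsFactors = FALSE)
nrow(risk_set)  # 3 actors -> 9 dyads
```

If self-ties should be ruled out, one could drop rows with risk_set$sender == risk_set$receiver; whether create_riskset does so is not stated here.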
For two-mode event sequences, the risk (support) set at time t, A_t, is likewise the set of all
possible events that could have occurred at time t, here the Cartesian product of two disjoint sets,
namely the prior senders and the prior receivers in G[E;t]. Formally:
A_t = \{ (s, r) \mid s \in S, r \in R \} = S \times R
where S is the set of potential event senders and R is the set of potential event receivers. In this function,
the full risk set is considered fixed across all time points.
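In the two-mode case the sender and receiver sets are disjoint (for example, users and articles), so the fixed risk set pairs every distinct sender with every distinct receiver. The base-R sketch below illustrates this on hypothetical toy data; it is not the package's implementation.

```r
# Hypothetical two-mode sequence: users (senders) edit articles (receivers)
events <- data.frame(
  sender   = c("u1", "u2", "u1", "u3"),
  receiver = c("art1", "art1", "art2", "art3"),
  stringsAsFactors = FALSE
)

senders   <- unique(events$sender)    # S
receivers <- unique(events$receiver)  # R, disjoint from S

# A = S x R, fixed across all time points
risk_set <- expand.grid(sender = senders, receiver = receivers,
                        stringsAsFactors = FALSE)
nrow(risk_set)  # 3 senders x 3 receivers = 9 potential events
```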
Case-control sampling keeps the full set of observed events, that is, all events in E, and
samples an arbitrary number m of non-events from the support set A_t (Vu et al. 2015; Lerner
and Lomi 2020). This process generates a new, sampled support set, SA_t, for each relational event
e_i in E given the network of past events G[E;t]. Formally:
SA_t \subseteq A_t = \{ (s, r) \mid s \in S, r \in R \}
When sampling from the observed events, n observed events are
sampled from the set E, each with known probability 0 < p \le 1. Formally, sampling from
the observed set generates a new set SE \subseteq E.
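The two sampling steps can be combined in a short base-R sketch. It is a hypothetical illustration, not the dream package's implementation, and it rests on stated assumptions: each observed event is kept independently with probability p (a Bernoulli draw), and the m controls for a sampled event are drawn without replacement from the fixed two-mode risk set, excluding the observed dyad itself. The helper name sample_riskset() is invented, and it returns a plain data.frame without the sampled column or the combine behaviour of create_riskset.

```r
# Hypothetical helper; not the dream package's implementation
sample_riskset <- function(events, p, m, seed = 9999) {
  set.seed(seed)  # reproducible draws (resets the global RNG state)
  # Fixed two-mode risk set A = S x R
  risk_set <- expand.grid(sender = unique(events$sender),
                          receiver = unique(events$receiver),
                          stringsAsFactors = FALSE)
  # Sampling from the observed sequence: keep each event with probability p
  sampled <- events[runif(nrow(events)) < p, , drop = FALSE]
  strata <- lapply(seq_len(nrow(sampled)), function(i) {
    ev <- sampled[i, ]
    # Assumption: controls exclude the observed dyad
    is_obs <- risk_set$sender == ev$sender & risk_set$receiver == ev$receiver
    candidates <- risk_set[!is_obs, , drop = FALSE]
    # Case-control sampling: m non-events drawn without replacement
    controls <- candidates[sample.int(nrow(candidates), m), , drop = FALSE]
    rbind(
      data.frame(time = ev$time, eventID = ev$eventID, sender = ev$sender,
                 receiver = ev$receiver, observed = 1, stringsAsFactors = FALSE),
      data.frame(time = ev$time, eventID = ev$eventID, sender = controls$sender,
                 receiver = controls$receiver, observed = 0, stringsAsFactors = FALSE)
    )
  })
  if (length(strata) == 0) return(NULL)  # no observed event was sampled
  out <- do.call(rbind, strata)
  rownames(out) <- NULL
  out
}

# Toy usage: p = 1 keeps every observed event, m = 2 controls each
events <- data.frame(eventID = 1:4, time = c(1, 2, 3, 4),
                     sender = c("u1", "u2", "u1", "u3"),
                     receiver = c("a1", "a1", "a2", "a3"),
                     stringsAsFactors = FALSE)
out <- sample_riskset(events, p = 1, m = 2)
table(out$observed)
```

With p = 1 all four toy events are kept, and each contributes one observed row plus m = 2 controls, for 12 rows. In general each sampled event contributes 1 + m rows, so under these assumptions the sampled set has about p|E|(1 + m) rows in expectation.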
A post-processing data.table object with the following columns:
time - The event time for the sampled and observed events.
eventID - The numerical event sequence ID for the sampled and observed events.
sender - The event senders of the sampled and observed events.
receiver - The event targets (receivers) of the sampled and observed events.
observed - Binary indicator of whether the event is observed or a control (1 = observed; 0 = control).
sampled - Binary indicator of whether the event was sampled (1 = sampled; 0 = not sampled).
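When combine = TRUE, the returned table mixes the sampled strata with observed events that were not sampled. The sketch below assumes, based on the column descriptions above, that control rows and sampled observed rows carry sampled = 1 while unsampled observed events carry sampled = 0; under that assumption, filtering on sampled isolates the case-control data. It uses a small hypothetical mock table with the documented column names (a base data.frame rather than a data.table); the values are invented and are not real create_riskset output.

```r
# Hypothetical mock of the documented output columns (invented values)
riskset <- data.frame(
  time     = c(1, 1, 1, 2, 3, 3, 3),
  eventID  = c(1, 1, 1, 2, 3, 3, 3),
  sender   = c("u1", "u2", "u3", "u2", "u1", "u3", "u2"),
  receiver = c("a1", "a2", "a1", "a1", "a2", "a3", "a1"),
  observed = c(1, 0, 0, 1, 1, 0, 0),
  sampled  = c(1, 1, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: sampled == 1 marks the case-control strata
analysis <- riskset[riskset$sampled == 1, ]

# Count controls (observed == 0) per sampled observed event
n_controls <- tapply(1 - analysis$observed, analysis$eventID, sum)
n_controls
```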
Kevin A. Carson kacarson@arizona.edu, Diego F. Leal dflc@arizona.edu
Butts, Carter T. 2008. "A Relational Event Framework for Social Action." Sociological Methodology 38(1): 155-200.
Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A. Pilny & M. S. Poole (Eds.), Group processes: Data-driven computational approaches. Springer International Publishing.
Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to fit a relational event model to 360 million dyadic events." Network Science 8(1): 97-135.
Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." Social Networks 43: 121-135.
library(dream)
data("WikiEvent2018.first100k")
WikiEvent2018.first100k$time <- as.numeric(WikiEvent2018.first100k$time)
### Creating the EventSet By Employing Case-Control Sampling With M = 10 and
### Sampling from the Observed Event Sequence with P = 0.01
EventSet <- create_riskset(
type = "two-mode",
time = WikiEvent2018.first100k$time, # The Time Variable
eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
sender = WikiEvent2018.first100k$user, # The Sender Variable
receiver = WikiEvent2018.first100k$article, # The Receiver Variable
p_samplingobserved = 0.01, # The Probability of Selection
n_controls = 10, # The Number of Controls to Sample from the Full Risk Set
seed = 9999) # The Seed for Replication
### Creating a New EventSet with More Observed Events and Fewer Control Events
### Sampling from the Observed Event Sequence with P = 0.02
### Employing Case-Control Sampling With M = 2
EventSet1 <- create_riskset(
type = "two-mode",
time = WikiEvent2018.first100k$time, # The Time Variable
eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
sender = WikiEvent2018.first100k$user, # The Sender Variable
receiver = WikiEvent2018.first100k$article, # The Receiver Variable
p_samplingobserved = 0.02, # The Probability of Selection
n_controls = 2, # The Number of Controls to Sample from the Full Risk Set
seed = 9999) # The Seed for Replication