View source: R/rem_onemoderiskset.R
processOMEventSeq | R Documentation |
This function creates a one-mode post-sampling eventset with options for case-control
sampling (Vu et al. 2015), sampling from the observed event sequence (Lerner and Lomi 2020), and time- or event-dependent
risk sets. Case-control sampling draws an arbitrary number m of controls from the risk set for any event
(Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence
where observed events are sampled with probability p. The time- and event-dependent risk sets generate risk sets where the
potential null events are based upon a specified past relational time window, such as events that have occurred in the past year.
Importantly, this function creates risk sets based upon the assumption that only actors active in past events are
relevant for the creation of the risk set. Users interested in generating risk sets that assume all actors
active at any time point within the event sequence are in the risk set at every time point should consult the
createRemDataset and remify functions. Future versions of this package will incorporate this option into the function.
processOMEventSeq(
data,
time,
eventID,
sender,
receiver,
p_samplingobserved = 1,
n_controls,
time_dependent = FALSE,
timeDV = NULL,
timeDif = NULL,
seed = 9999
)
data |
The full relational event sequence dataset. |
time |
The vector of event time values from the observed event sequence. |
eventID |
The vector of event IDs from the observed event sequence (typically a numerical event sequence that goes from 1 to n). |
sender |
The vector of event senders from the observed event sequence. |
receiver |
The vector of event receivers from the observed event sequence. |
p_samplingobserved |
The numerical value for the probability of selection for sampling from the observed event sequence. Set to 1 by default indicating that all observed events from the event sequence will be included in the post-processing event sequence. |
n_controls |
The numerical value for the number of null event controls for each (sampled) observed event. |
time_dependent |
TRUE/FALSE. TRUE indicates that a time- or event-dependent dynamic risk set will be created in which only actors involved in a user-specified relationally relevant (time or event) span (i.e., the ‘stretch’ of relational relevancy, such as one month for a time-dependent risk set or 100 events for an event-dependent risk set) are included in the potential risk set. FALSE indicates the complete set of actors involved in past events will be included in the risk set (see the details section). Set to FALSE by default. |
timeDV |
If time_dependent = TRUE, the vector of event time values that corresponds to the creation of the time- or event-dependent dynamic risk set (see the details section). This may or may not be the same vector provided to the time argument. The timeDV vector can be the same vector provided to the time argument, in which the relational time span will be based on the event timing within the dataset. In contrast, the timeDV vector can also be the vector of numerical event IDs which correspond to the number sequence of events. Moreover, the timeDV can also be another measurement that is not the time argument or a numerical event ID sequence, such as the number of days, months, years, etc. since the first event. |
timeDif |
If time_dependent = TRUE, the numerical value that represents the time or event span for the creation of the risk set (see the details section). This argument must be in the same measurement unit as the timeDV argument. |
seed |
The random number seed for user replication. |
This function processes observed events from the set E, where each event e_i is defined as:
e_{i} \in E = (s_i, r_i, t_i, G[E;t])
where:
s_i is the sender of the event.
r_i is the receiver of the event.
t_i represents the time of the event.
G[E;t] = \{e_1, e_2, \ldots, e_{t'} \mid t' < t\} is the network of past events, that is, all events that occurred prior to the current event, e_i.
Following Butts (2008) and Butts and Marcum (2017), we define the risk (support) set of all possible events at time t, A_t, as the full Cartesian product of prior senders and receivers in the set G[E;t] that could have occurred at time t. Formally:
A_t = \{ (s, r) \mid s \in G[E;t] \text{ X } r \in G[E;t] \}
where G[E;t] is the set of events up to time t.
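As a rough illustration of the definition of A_t, the following base R sketch builds the Cartesian product of prior senders and prior receivers for a single event. The toy data and the helper `risk_set_at()` (including its `loops` argument) are hypothetical and are not part of the package; whether the package itself excludes self-loops is not stated here, so the sketch keeps them by default.

```r
# Toy event sequence (hypothetical data, for illustration only)
events <- data.frame(
  time     = c(1, 2, 3, 4),
  sender   = c("A", "B", "C", "A"),
  receiver = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: risk set for the event in row i, built only from
# events that occurred strictly before the time of event i
risk_set_at <- function(events, i, loops = TRUE) {
  past <- events[events$time < events$time[i], , drop = FALSE]
  rs <- expand.grid(sender   = unique(past$sender),
                    receiver = unique(past$receiver),
                    stringsAsFactors = FALSE)
  # Assumption: optionally drop dyads where sender and receiver coincide
  if (!loops) rs <- rs[rs$sender != rs$receiver, , drop = FALSE]
  rs
}

# Risk set for the fourth event: prior senders {A, B, C} x prior receivers {B, C, D}
A_4 <- risk_set_at(events, 4)
```

Note that the receiver "E" of the fourth event does not appear in `A_4`, because under this definition only actors seen in past events enter the risk set.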
Case-control sampling maintains the full set of observed events, that is, all events in E, and samples an arbitrary number m of non-events from the support set A_t (Vu et al. 2015; Lerner and Lomi 2020). This process generates a new support set, SA_t, for any relational event e_i contained in E given a network of past events G[E;t]. SA_t is formally defined as:
SA_t \subseteq \{ (s, r) \mid s \in G[E;t] \text{ X } r \in G[E;t] \}
When sampling from the observed events, n observed events are sampled from the set E with known probability 0 < p \le 1. More formally, sampling from the observed set generates a new set SE \subseteq E.
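The two sampling steps can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the risk set and the observed event are made-up values, controls are drawn uniformly without replacement, and observed events are kept by independent draws with probability p.

```r
set.seed(9999)

# Hypothetical risk set and one observed event (the "case")
risk_set <- expand.grid(sender   = c("A", "B", "C"),
                        receiver = c("B", "C", "D", "E"),
                        stringsAsFactors = FALSE)
case <- data.frame(sender = "A", receiver = "E", stringsAsFactors = FALSE)

# Candidate controls: every dyad in the risk set except the observed one
is_case    <- risk_set$sender == case$sender & risk_set$receiver == case$receiver
candidates <- risk_set[!is_case, , drop = FALSE]

# Case-control sampling: draw m controls without replacement
# (capped at the number of available candidates)
m        <- 5
idx      <- sample.int(nrow(candidates), size = min(m, nrow(candidates)))
controls <- candidates[idx, , drop = FALSE]

# Stack the case (observed = 1) on top of its controls (observed = 0)
case_control <- rbind(cbind(case, observed = 1L),
                      cbind(controls, observed = 0L))
rownames(case_control) <- NULL

# Sampling from the observed sequence: keep each of 18 events with
# probability p (assumption: independent draws per event)
p    <- 0.5
keep <- runif(18) < p
```

Here `case_control` plays the role of SA_t for a single event, and the events flagged in `keep` would form SE.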
A time- or event-dependent dynamic risk set can be created where the set of potential events, that is, all events in the risk set, A_t, is based only on the set of actors active in a specified event or time span from the current event (e.g., within the past month or within the past 100 events). In other words, the specified event or time span can be based on either: a) a specified time span based upon the actual timing of the past events (e.g., years, months, days or even milliseconds as in the case of Lerner and Lomi 2020), or b) a specified number of events based on the ordering of the past events (e.g., all actors involved in the past 100 events). Thus, if time- or event-dependent dynamic risk sets are desired, the user should set time_dependent to TRUE, and then specify the accompanying time vector, timeDV, defined as the number of time units (e.g., days) or the number of events since the first event. Moreover, the user should also specify the cutoff threshold with the timeDif value, which must be in the same measurement unit as timeDV (e.g., days). For example, to create a time-dependent dynamic risk set that only includes actors active within the past month, create a vector of values timeDV, which for each event represents the number of days since the first event, and then set timeDif to 30. Similarly, to create an event-dependent dynamic risk set that only includes actors involved in the past 100 events, create a vector of values timeDV, that is, the counts of events since the first event (e.g., 1:n), and then set timeDif to 100.
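To make the roles of timeDV and timeDif concrete, the sketch below builds a days-since-first-event vector from calendar dates and then lists the actors who fall inside the window for a given event. The data, the `days` column, and the helper `active_actors()` are hypothetical; in particular, treating the window as the half-open interval [timeDV_i - timeDif, timeDV_i) is an assumption, since the exact boundary handling is not described above.

```r
# Toy event sequence with calendar dates (hypothetical data)
events <- data.frame(
  date     = as.Date(c("2024-01-01", "2024-01-10", "2024-02-05", "2024-03-01")),
  sender   = c("A", "B", "C", "A"),
  receiver = c("B", "C", "D", "C"),
  stringsAsFactors = FALSE
)

# A timeDV candidate: number of days since the first event
events$days <- as.numeric(events$date - min(events$date))

# Hypothetical helper: actors involved in events that fall inside the window
# [timeDV[i] - timeDif, timeDV[i]) preceding event i (boundary rule assumed)
active_actors <- function(events, i, timeDV, timeDif) {
  in_window <- timeDV < timeDV[i] & timeDV >= timeDV[i] - timeDif
  unique(c(events$sender[in_window], events$receiver[in_window]))
}

# Time-dependent: actors active in the 30 days before the fourth event
recent_by_time <- active_actors(events, 4, timeDV = events$days, timeDif = 30)

# Event-dependent: actors involved in the 2 events before the fourth event
recent_by_event <- active_actors(events, 4,
                                 timeDV = seq_len(nrow(events)), timeDif = 2)
```

The same helper serves both cases because only the unit of timeDV changes: days in the first call, event counts in the second.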
A post-processing data table with the following columns:
sender - The event senders of the sampled and observed events.
receiver - The event targets (receivers) of the sampled and observed events.
time - The event time for the sampled and observed events.
sequenceID - The numerical event sequence ID for the sampled and observed events.
observed - Boolean indicating if the event is a sampled event or observed event. (1 = observed; 0 = sampled)
Kevin A. Carson kacarson@arizona.edu, Diego F. Leal dflc@arizona.edu
Butts, Carter T. 2008. "A Relational Event Framework for Social Action." Sociological Methodology 38(1): 155-200.
Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A. Pilny & M. S. Poole (Eds.), Group processes: Data-driven computational approaches. Springer International Publishing.
Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to fit a relational event model to 360 million dyadic events." Network Science 8(1): 97–135.
Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." Social Networks 43: 121-135.
# A random one-mode relational event sequence
set.seed(9999)
# rexp(18) draws 18 exponential waiting times; sorting yields ordered event times
events <- data.frame(time = sort(rexp(18)),
                     eventID = 1:18,
                     sender = c("A", "B", "C",
                                "A", "D", "E",
                                "F", "B", "A",
                                "F", "D", "B",
                                "G", "B", "D",
                                "H", "A", "D"),
                     target = c("B", "C", "D",
                                "E", "A", "F",
                                "D", "A", "C",
                                "G", "B", "C",
                                "H", "J", "A",
                                "F", "C", "B"))
# Creating a one-mode relational risk set with p = 1.00 (all true events)
# and 5 controls
eventSet <- processOMEventSeq(data = events,
                              time = events$time,
                              eventID = events$eventID,
                              sender = events$sender,
                              receiver = events$target,
                              p_samplingobserved = 1.00,
                              n_controls = 5,
                              seed = 9999)
# Creating an event-dependent one-mode relational risk set with p = 1.00 (all
# true events) and 3 controls based upon the past 5 events prior to the current event.
events$timeseq <- 1:nrow(events)
eventSetT <- processOMEventSeq(data = events,
                               time = events$time,
                               eventID = events$eventID,
                               sender = events$sender,
                               receiver = events$target,
                               p_samplingobserved = 1.00,
                               time_dependent = TRUE,
                               timeDV = events$timeseq,
                               timeDif = 5,
                               n_controls = 3,
                               seed = 9999)
# Creating a time-dependent one-mode relational risk set with p = 1.00 (all
# true events) and 3 controls based upon the past 0.40 time units.
eventSetT <- processOMEventSeq(data = events,
                               time = events$time,
                               eventID = events$eventID,
                               sender = events$sender,
                               receiver = events$target,
                               p_samplingobserved = 1.00,
                               time_dependent = TRUE,
                               timeDV = events$time, # the original time variable
                               timeDif = 0.40,       # time difference of 0.40 units
                               n_controls = 3,
                               seed = 9999)