processTMEventSeq: Process and Create Risk Sets for a Two-Mode Relational Event...

View source: R/rem_twomoderiskset.R

processTMEventSeq    R Documentation

Process and Create Risk Sets for a Two-Mode Relational Event Sequence

Description

This function creates a two-mode post-sampling eventset with options for case-control sampling (Vu et al. 2015), sampling from the observed event sequence (Lerner and Lomi 2020), and time- or event-dependent risk sets. Case-control sampling draws an arbitrary number m of controls from the risk set for each event (Vu et al. 2015). Lerner and Lomi (2020) proposed sampling from the observed event sequence, where each observed event is sampled with probability p. The time- and event-dependent risk sets restrict the potential null events to a specified past relational window, such as events that occurred in the past month. Users who want risk sets in which every actor active at any point in the event sequence is at risk at every time point should consult the createRemDataset and remify functions. Future versions of this package will incorporate this option into the function.

Usage

processTMEventSeq(
  data,
  time,
  eventID,
  sender,
  receiver,
  p_samplingobserved = 1,
  n_controls,
  time_dependent = FALSE,
  timeDV = NULL,
  timeDif = NULL,
  seed = 9999
)

Arguments

data

The full relational event sequence dataset.

time

The vector of event time values from the observed event sequence.

eventID

The vector of event IDs from the observed event sequence (typically a numerical event sequence that goes from 1 to n).

sender

The vector of event senders from the observed event sequence.

receiver

The vector of event receivers from the observed event sequence.

p_samplingobserved

The numerical value for the probability of selection for sampling from the observed event sequence. Set to 1 by default indicating that all observed events from the event sequence will be included in the post-processing event sequence.

n_controls

The numerical value for the number of null event controls for each (sampled) observed event.

time_dependent

TRUE/FALSE. TRUE indicates that a time- or event-dependent dynamic risk set will be created in which only actors involved in a user-specified relationally relevant (time or event) span (i.e., the ‘stretch’ of relational relevancy, such as one month for a time-dependent risk set or 100 events for an event-dependent risk set) are included in the potential risk set. FALSE indicates the complete set of actors involved in past events will be included in the risk set (see the details section). Set to FALSE by default.

timeDV

If time_dependent = TRUE, the vector of event time values used to create the time- or event-dependent dynamic risk set (see the details section). This may or may not be the same vector provided to the time argument. If timeDV is the same vector provided to the time argument, the relational time span is based on the event timing within the dataset. Alternatively, timeDV can be the vector of numerical event IDs, which gives the position of each event in the sequence. It can also be another measurement that is neither the time argument nor a numerical event ID sequence, such as the number of days, months, or years since the first event.

timeDif

If time_dependent = TRUE, the numerical value that represents the time or event span for the creation of the risk set (see the details section). This argument must be in the same measurement unit as the timeDV argument. For instance, in an event-dependent dynamic risk set, if timeDV is the number of events since the first event (i.e., a numerical event ID sequence) and only those actors involved in the past 100 events are considered relationally relevant for the creation of the null events for the current observed event, then timeDif should be set to 100. In the time-dependent case, suppose that only those actors involved in events that occurred in the past month are considered relationally relevant for the risk set, and that the timeDV vector is measured in days since the first event. Then timeDif should be set to 30.

seed

The random number seed for user replication.
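The unit-matching requirement between timeDV and timeDif can be illustrated with a minimal sketch. The timestamps below are hypothetical toy values (not taken from the package data) and are assumed to be recorded in milliseconds; the variable names are illustrative only.

```r
# Hypothetical toy timestamps in milliseconds (an assumption for illustration)
ms_per_day <- 24 * 60 * 60 * 1000
time_ms <- c(0, 5, 12, 40, 41) * ms_per_day

# Time-dependent span: timeDV in days since the first event,
# so timeDif must also be expressed in days
timeDV_days <- (time_ms - time_ms[1]) / ms_per_day
timeDif_days <- 30      # "the past month"

# Event-dependent span: timeDV is the position of each event in the sequence,
# so timeDif must be expressed as a number of events
timeDV_events <- seq_along(time_ms)
timeDif_events <- 100   # "the past 100 events"
```

Either pair (timeDV_days with timeDif_days, or timeDV_events with timeDif_events) could then be passed to the timeDV and timeDif arguments together with time_dependent = TRUE.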

Details

This function processes observed events from the set E, where each event e_i is defined as:

e_i = (s_i, r_i, t_i, G[E;t]) \in E

where:

  • s_i is the sender of the event.

  • r_i is the receiver of the event.

  • t_i represents the time of the event.

  • G[E;t] = \{e_1, e_2, \ldots, e_{t'} \mid t' < t\} is the network of past events, that is, all events that occurred prior to the current event, e_i.

Following Butts (2008) and Butts and Marcum (2017), we define the risk (support) set of all possible events at time t, A_t, as the cross product of two disjoint sets, namely, prior senders and receivers, in the set G[E;t] that could have occurred at time t. Formally:

A_t = \{ (s, r) \mid s \in G[E;t] \times r \in G[E;t] \}

where G[E;t] is the set of events up to time t.

Case-control sampling maintains the full set of observed events, that is, all events in E, and samples an arbitrary number m of non-events from the support set A_t (Vu et al. 2015; Lerner and Lomi 2020). This process generates a new support set, SA_t, for any relational event e_i contained in E given a network of past events G[E;t]. SA_t is formally defined as:

SA_t \subseteq \{ (s, r) \mid s \in G[E;t] \times r \in G[E;t] \}

When sampling from the observed events, n observed events are sampled from the set E, each with known probability 0 < p \le 1. More formally, sampling from the observed set generates a new set SE \subseteq E.
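The two sampling steps can be sketched on a toy two-mode sequence. This is an illustration under stated assumptions, not the package's implementation: the data, the helper sample_controls, the choice to include the current event's actors in the risk set, and the choice to exclude the observed dyad from the controls are all hypothetical.

```r
set.seed(9999)

# Hypothetical toy sequence: users (senders) edit articles (receivers)
events <- data.frame(
  eventID  = 1:6,
  time     = c(1, 2, 3, 4, 5, 6),
  sender   = c("u1", "u2", "u1", "u3", "u2", "u3"),
  receiver = c("a1", "a1", "a2", "a2", "a3", "a1"),
  stringsAsFactors = FALSE
)

# Illustrative helper: return observed event i plus up to m sampled controls
sample_controls <- function(events, i, m) {
  # Assumption: actors seen at or before the current event form the risk set
  past <- events[events$time <= events$time[i], ]
  risk <- expand.grid(
    sender   = unique(past$sender),
    receiver = unique(past$receiver),
    stringsAsFactors = FALSE
  )
  # Assumption: the observed dyad itself cannot be drawn as a control
  is_obs <- risk$sender == events$sender[i] & risk$receiver == events$receiver[i]
  risk <- risk[!is_obs, , drop = FALSE]

  observed_row <- data.frame(
    sender = events$sender[i], receiver = events$receiver[i],
    time = events$time[i], sequenceID = events$eventID[i],
    observed = 1L, stringsAsFactors = FALSE
  )
  n_draw <- min(m, nrow(risk))
  if (n_draw == 0) return(observed_row)

  idx <- sample.int(nrow(risk), n_draw)
  control_rows <- data.frame(
    sender = risk$sender[idx], receiver = risk$receiver[idx],
    time = events$time[i], sequenceID = events$eventID[i],
    observed = 0L, stringsAsFactors = FALSE
  )
  rbind(observed_row, control_rows)
}

p <- 0.5  # probability of keeping each observed event (SE is a subset of E)
m <- 2    # number of controls per sampled observed event

# Step 1: keep each observed event with probability p
keep <- which(runif(nrow(events)) < p)
# Step 2: attach m controls to every kept event (NULL if nothing was kept)
eventset <- do.call(rbind, lapply(keep, function(i) sample_controls(events, i, m)))
```

The resulting toy eventset mirrors the column layout described in the Value section, with observed = 1 for observed events and observed = 0 for sampled controls.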

A time- or event-dependent dynamic risk set can be created in which the set of potential events, that is, all events in the risk set A_t, is based only on the actors active within a specified event or time span before the current event (e.g., within the past month or the past 100 events). The span can be based on either: a) the actual timing of the past events (e.g., years, months, days, or even milliseconds, as in Lerner and Lomi 2020), or b) the ordering of the past events (e.g., all actors involved in the past 100 events). If a time- or event-dependent dynamic risk set is desired, the user should set time_dependent to TRUE and supply the accompanying vector timeDV, defined as the number of time units (e.g., days) or the number of events since the first event. The user should also specify the cutoff threshold timeDif in the same measurement unit as timeDV (e.g., days). For example, to create a time-dependent dynamic risk set that only includes actors active within the past month, create a vector timeDV that records, for each event, the number of days since the first event, and set timeDif to 30. Similarly, to create an event-dependent dynamic risk set that only includes actors involved in the past 100 events, set timeDV to the count of events since the first event (e.g., 1:n) and set timeDif to 100.
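As a rough sketch of how a span restricts the pool of actors, the hypothetical helper below collects the senders and receivers whose events fall within timeDif units before the current event. The toy data, the helper name window_actors, and the inclusive window boundaries are assumptions made for illustration; the package may treat boundaries differently.

```r
# Hypothetical toy sequence (not from the package data)
sender   <- c("u1", "u2", "u1", "u3", "u2")
receiver <- c("a1", "a1", "a2", "a2", "a3")
days     <- c(0, 5, 12, 40, 41)   # timeDV: days since the first event

# Illustrative helper: actors active within the span ending at event i
window_actors <- function(i, timeDV, timeDif, sender, receiver) {
  # Assumption: the window is closed on both ends
  in_span <- timeDV <= timeDV[i] & timeDV >= timeDV[i] - timeDif
  list(
    senders   = unique(sender[in_span]),
    receivers = unique(receiver[in_span])
  )
}

# Time-dependent: actors active in the 30 days up to event 5
w_time <- window_actors(5, days, 30, sender, receiver)

# Event-dependent: timeDV is the event position, span is 1 event back
w_event <- window_actors(5, seq_along(days), 1, sender, receiver)
```

In this toy case the 30-day window ending at event 5 covers events 3 to 5, so the sender u1 remains eligible while the receiver a1 does not; the one-event window covers only events 4 and 5.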

Value

A post-processing data table with the following columns:

  • sender - The event senders of the sampled and observed events.

  • receiver - The event targets (receivers) of the sampled and observed events.

  • time - The event time for the sampled and observed events.

  • sequenceID - The numerical event sequence ID for the sampled and observed events.

  • observed - Boolean indicating if the event is a sampled event or observed event. (1 = observed; 0 = sampled)

Author(s)

Kevin A. Carson kacarson@arizona.edu, Diego F. Leal dflc@arizona.edu

References

Butts, Carter T. 2008. "A Relational Event Framework for Social Action." Sociological Methodology 38(1): 155-200.

Butts, Carter T. and Christopher Steven Marcum. 2017. "A Relational Event Approach to Modeling Behavioral Dynamics." In A. Pilny & M. S. Poole (Eds.), Group processes: Data-driven computational approaches. Springer International Publishing.

Lerner, Jürgen and Alessandro Lomi. 2020. "Reliability of relational event model estimates under sampling: How to fit a relational event model to 360 million dyadic events." Network Science 8(1): 97–135.

Vu, Duy, Philippa Pattison, and Garry Robins. 2015. "Relational event models for social learning in MOOCs." Social Networks 43: 121-135.

Examples


library(dream)
data("WikiEvent2018.first100k")
WikiEvent2018.first100k$time <- as.numeric(WikiEvent2018.first100k$time)
### Creating the EventSet By Employing Case-Control Sampling With M = 10 and
### Sampling from the Observed Event Sequence with P = 0.01
EventSet <- processTMEventSeq(
  data = WikiEvent2018.first100k, # The Event Dataset
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.01, # The Probability of Selection
  n_controls = 10, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication


### Creating A New EventSet with more observed events and fewer control events
### Sampling from the Observed Event Sequence with P = 0.02
### Employing Case-Control Sampling With M = 2
EventSet1 <- processTMEventSeq(
  data = WikiEvent2018.first100k, # The Event Dataset
  time = WikiEvent2018.first100k$time, # The Time Variable
  eventID = WikiEvent2018.first100k$eventID, # The Event Sequence Variable
  sender = WikiEvent2018.first100k$user, # The Sender Variable
  receiver = WikiEvent2018.first100k$article, # The Receiver Variable
  p_samplingobserved = 0.02, # The Probability of Selection
  n_controls = 2, # The Number of Controls to Sample from the Full Risk Set
  seed = 9999) # The Seed for Replication

### Creating An Event-Dependent EventSet with P = 0.001 and m = 5,
### where only actors involved in the past 20 events are involved in the
### creation of the risk set.
event_dependent <- processTMEventSeq(
 data = WikiEvent2018.first100k,
 time = WikiEvent2018.first100k$time,
 sender = WikiEvent2018.first100k$user,
 receiver = WikiEvent2018.first100k$article,
 eventID = WikiEvent2018.first100k$eventID,
 p_samplingobserved = 0.001,
 n_controls = 5,
 time_dependent = TRUE,
 timeDV = 1:nrow(WikiEvent2018.first100k),
 timeDif = 20, #20 past events
 seed = 9999)
### Creating A Time-Dependent EventSet with P = 0.001 and m = 5,
### where only actors involved in the past 30 days are involved in the
### creation of the risk set.
timeSinceStart <- WikiEvent2018.first100k$time - WikiEvent2018.first100k$time[1]
timeDifMonth <- 30*24*60*60*1000 # 30 days expressed in milliseconds
timedependent <- processTMEventSeq(
 data = WikiEvent2018.first100k,
 time = WikiEvent2018.first100k$time,
 sender = WikiEvent2018.first100k$user,
 receiver = WikiEvent2018.first100k$article,
 eventID = WikiEvent2018.first100k$eventID,
 p_samplingobserved = 0.001,
 n_controls = 5,
 time_dependent = TRUE,
 timeDV = timeSinceStart,
 timeDif = timeDifMonth,
 seed = 9999)

dream documentation built on Aug. 8, 2025, 6:36 p.m.