simcdnet: Simulating Count Data Models with Social Interactions Under...
In CDatanet: Econometrics of Network Data

simcdnet

R Documentation

Simulating Count Data Models with Social Interactions Under Rational Expectations

Description

simcdnet simulates the count data model with social interactions under rational expectations developed by Houndetoungan (2024).

Usage

simcdnet(
  formula,
  group,
  Glist,
  parms,
  lambda,
  Gamma,
  delta,
  Rmax,
  Rbar,
  tol = 1e-10,
  maxit = 500,
  data
)

Arguments

`formula`	An object of class `formula`: a symbolic description of the model. `formula` should be specified, for example, as `y ~ x1 + x2 + gx1 + gx2`, where `y` is the endogenous vector and `x1`, `x2`, `gx1`, and `gx2` are control variables. These control variables can include contextual variables, such as averages among the peers. Peer averages can be computed using the function `peer.avg`.
`group`	A vector indicating the individual groups. By default, this assumes a common group. If there are 2 groups (i.e., `length(unique(group)) = 2`, such as `A` and `B`), four types of peer effects are defined: peer effects of `A` on `A`, `A` on `B`, `B` on `A`, and `B` on `B`.
`Glist`	An adjacency matrix or list of adjacency matrices. For networks consisting of multiple subnets (e.g., schools), `Glist` can be a list of subnet matrices, where the `m`-th element is an `n_m \times n_m` adjacency matrix, with `n_m` representing the number of nodes in the `m`-th subnet. For heterogeneous peer effects (`length(unique(group)) = h > 1`), the `m`-th element should be a list of `h^2` `n_m \times n_m` adjacency matrices corresponding to different network specifications (see Houndetoungan, 2024). For heterogeneous peer effects in a single large network, `Glist` should be a one-item list, where the item is a list of `h^2` network specifications. The order of these networks is important and must match `sort(unique(group))` (see examples).
`parms`	A vector defining the true values of `\theta = (\lambda', \Gamma', \delta')'` (see model specification in the details section). Each parameter `\lambda`, `\Gamma`, or `\delta` can also be provided separately to the arguments `lambda`, `Gamma`, or `delta`.
`lambda`	The true value of the vector `\lambda`.
`Gamma`	The true value of the vector `\Gamma`.
`delta`	The true value of the vector `\delta`.
`Rmax`	An integer indicating the theoretical upper bound of `y` (see the model specification in the Details section).
`Rbar`	An `L`-vector, where `L` is the number of groups. For large `Rmax`, the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to `\bar{R}` and quadratic beyond `\bar{R}`). The `l`-th element of `Rbar` indicates `\bar{R}` for the `l`-th value of `sort(unique(group))` (see the model specification in the Details section).
`tol`	The tolerance value used in the Fixed Point Iteration Method to compute the expectation of `y`. The process stops if the `\ell_1`-distance between two consecutive values of `E(y)` is less than `tol`.
`maxit`	The maximum number of iterations in the Fixed Point Iteration Method.
`data`	An optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `simcdnet` is called.
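As a minimal sketch of the `Glist` layout for heterogeneous peer effects, the base-R code below builds the `h^2 = 4` masked networks for a single toy subnet with two groups. The toy matrix `A`, the group labels `"a"` and `"b"`, and the helper `row_norm` are hypothetical and used only for illustration. The ordering of the four networks follows the one used in the package example (own group varying slowest, peer group fastest, both in the order of `sort(unique(group))`).

```r
# Toy adjacency matrix for one subnet with 4 nodes (rows: individuals, columns: peers)
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              0, 1, 0, 1,
              1, 1, 0, 0), nrow = 4, byrow = TRUE)
group <- c("a", "b", "a", "b")   # hypothetical group labels
lev   <- sort(unique(group))     # ordering that the networks follow

# One masked network per (own group, peer group) pair; peer group varies fastest
pairs <- expand.grid(peer = lev, own = lev, stringsAsFactors = FALSE)
Gsub  <- lapply(seq_len(nrow(pairs)), function(k) {
  own  <- as.numeric(group == pairs$own[k])
  peer <- as.numeric(group == pairs$peer[k])
  A * (own %*% t(peer))          # keep links from `own` rows to `peer` columns
})
names(Gsub) <- paste0(pairs$peer, "_on_", pairs$own)

# Hand-rolled row-normalization (illustrative); rows without links stay at zero
row_norm <- function(G) {
  rs <- rowSums(G)
  G / ifelse(rs > 0, rs, 1)
}

# Single large network: a one-item list whose item is the list of h^2 networks
Glist <- list(lapply(Gsub, row_norm))
names(Glist[[1]])
```

The four masks partition the links of `A`, so adding the unnormalized matrices in `Gsub` recovers `A`. In practice, the package function `norm.network` can be used for the normalization step, as in the Examples section.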

Details

The count variable y_i takes the value r with probability

P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).

In this equation, \mathbf{z}_i is a vector of control variables; F is the distribution function of the standard normal distribution; \bar{y}_i^{e,s} is the average of E(y) among peers using the s-th network definition; a_{h(i),r} is the r-th cut-point in the cost group h(i).
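To make the probability formula concrete, here is a minimal base-R sketch for a single individual. The index value `mu` (standing for \sum_s \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma) and the cut-points in `a` are hypothetical numbers chosen for illustration, with a_0 = -\infty, a_1 = 0, and a_{R_max + 1} = \infty for R_max = 3.

```r
# Hypothetical inputs for one individual
mu <- 1.2                        # sum_s lambda_s * ybar_i^{e,s} + z_i' Gamma
a  <- c(-Inf, 0, 1, 2.2, Inf)    # cut-points a_0, ..., a_{Rmax + 1}, with Rmax = 3

Fa <- pnorm(mu - a)              # F(mu - a_r) for r = 0, ..., Rmax + 1
P  <- Fa[-length(Fa)] - Fa[-1]   # P_r = F(mu - a_r) - F(mu - a_{r + 1})
names(P) <- 0:(length(P) - 1)

# Expected count implied by these probabilities
Ey <- sum((0:(length(P) - 1)) * P)

P
Ey
```

Because the first and last cut-points are infinite, the probabilities telescope to one, and the expected count reduces to \sum_{r = 1}^{R_max} F(mu - a_r).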

The following identification conditions have been introduced: \sum_{s = 1}^S \lambda_s > 0, a_{h(i),0} = -\infty, a_{h(i),1} = 0, and a_{h(i),r} = \infty for any r \geq R_{\text{max}} + 1. The last condition implies that P_{ir} = 0 for any r \geq R_{\text{max}} + 1. For any r \geq 1, the distance between two cut-points is a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s. As the number of cut-points can be large, a quadratic cost function is considered for r \geq \bar{R}_{h(i)}, where \bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L}). With the semi-parametric cost function, a_{h(i),r + 1} - a_{h(i),r} = \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s.
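The fixed point defining E(y) can be illustrated with a short base-R sketch. This is not the package's internal code: it assumes a single network (S = 1), a single cost group, and \bar{R} = 1, so that a_1 = 0 and consecutive cut-points differ by \bar{\delta} + \lambda. All numerical values, the random network, and the helper `Ey_map` are hypothetical. The stopping rule mirrors the description of `tol` and `maxit`.

```r
set.seed(1)
n  <- 50
A  <- matrix(rbinom(n * n, 1, 0.1), n, n)   # random directed links
diag(A) <- 0
rs <- rowSums(A)
G  <- A / ifelse(rs > 0, rs, 1)             # row-normalized network

lambda   <- 0.3                             # peer effect (assumed value)
Gamma    <- c(1, 0.5)                       # intercept and slope (assumed values)
bardelta <- 1                               # \bar{delta} (assumed value)
Rmax     <- 20
Z        <- cbind(1, rnorm(n))              # intercept and one control variable

# Cut-points a_1, ..., a_Rmax with a_1 = 0 and spacing bardelta + lambda
a <- (0:(Rmax - 1)) * (bardelta + lambda)

# Maps a candidate E(y) to the expectation implied by the model:
# E(y_i) = sum_{r = 1}^{Rmax} F(mu_i - a_r)
Ey_map <- function(Ey) {
  mu <- lambda * as.vector(G %*% Ey) + as.vector(Z %*% Gamma)
  rowSums(pnorm(outer(mu, a, "-")))
}

tol   <- 1e-10
maxit <- 500
Ey    <- rep(0, n)
for (it in seq_len(maxit)) {
  Ey_new <- Ey_map(Ey)
  dist   <- sum(abs(Ey_new - Ey))           # l1-distance between consecutive E(y)
  Ey     <- Ey_new
  if (dist < tol) break
}
c(iterations = it, distance = dist)
```

With a small `lambda` relative to the cut-point spacing, the map is a contraction and the loop stops well before `maxit`. Larger peer effects would need more iterations.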

The model parameters are: \lambda = (\lambda_1, ..., \lambda_S)', \Gamma, and \delta = (\delta_1', ..., \delta_L')', where \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)' for l = 1, ..., L. The number of parameters in \delta_l depends on R_{\text{max}} and \bar{R}_l. The components \delta_{l,2}, ..., \delta_{l,\bar{R}_l} and/or \bar{\delta}_l must be removed in the following cases:
If R_{\text{max}} = \bar{R}_l \geq 2, then \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'.
If R_{\text{max}} = \bar{R}_l = 1 (binary models), then \delta_l must be empty.
If R_{\text{max}} > \bar{R}_l = 1, then \delta_l = \bar{\delta}_l.
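The cases above determine how many values `delta` must contain for each group. The helper below, `delta_length`, is hypothetical (it is not part of CDatanet) and simply encodes those rules, assuming 1 \leq \bar{R}_l \leq R_max: there are \bar{R}_l - 1 nonparametric components, plus \bar{\delta}_l whenever R_max exceeds \bar{R}_l.

```r
# Hypothetical helper: number of elements expected in delta_l
delta_length <- function(Rmax, Rbar) {
  stopifnot(Rbar >= 1, Rmax >= Rbar)       # assumption: 1 <= Rbar <= Rmax
  (Rbar - 1) + as.integer(Rmax > Rbar)     # delta_{l,2..Rbar}, then bar(delta)_l
}

delta_length(Rmax = 3,  Rbar = 3)   # 2: (delta_2, delta_3)
delta_length(Rmax = 1,  Rbar = 1)   # 0: binary model, delta_l is empty
delta_length(Rmax = 10, Rbar = 1)   # 1: delta_l = bar(delta)_l
delta_length(Rmax = 10, Rbar = 5)   # 5: (delta_2, ..., delta_5, bar(delta)_l)
```

The last case matches the Examples section, where `Rbar = rep(5, 2)` is paired with five `delta` values per group.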

Value

A list consisting of:

`yst`	`y^{\ast}`, the latent variable.
`y`	the observed count variable.
`Ey`	`E(y)`, the expectation of y.
`GEy`	the average of `E(y)` among peers.
`meff`	a list including average and individual marginal effects.
`Rmax`	infinite sums in the marginal effects are approximated by sums up to Rmax.
`iteration`	number of iterations performed for each sub-network in the Fixed Point Iteration Method.
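The relationship between `yst` and `y` can be sketched as follows. The sketch assumes that the latent variable is the index plus a standard normal error and that y = r whenever a_r \leq y^{\ast} < a_{r+1}. This assumption is consistent with the probabilities P_{ir} in the Details section, but it is not taken from the package internals. The values of `mu` and the cut-points are hypothetical.

```r
set.seed(42)
mu    <- 1.2                      # hypothetical index value, common to all draws
a_fin <- c(0, 1, 2.2)             # finite cut-points a_1, ..., a_Rmax (Rmax = 3)
n     <- 1e5

yst <- mu + rnorm(n)              # latent variable (assumed form)
y   <- findInterval(yst, a_fin)   # y = r when a_r <= yst < a_{r + 1}

# Compare simulated frequencies with the model probabilities
freq <- tabulate(y + 1, nbins = length(a_fin) + 1) / n
P    <- -diff(pnorm(mu - c(-Inf, a_fin, Inf)))
round(cbind(value = 0:3, simulated = freq, model = P), 3)
```

`findInterval` returns 0 for draws below a_1 = 0, so its output can be used directly as the count.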

References

Houndetoungan, A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.

Examples


library(CDatanet)
set.seed(123)
M      <- 5 # Number of sub-networks
nvec   <- round(runif(M, 100, 200)) # Random sub-network sizes
n      <- sum(nvec) # Total number of individuals

# Adjacency matrix for each group
A      <- list()
for (m in 1:M) {
  nm           <- nvec[m] # Size of group m
  Am           <- matrix(0, nm, nm) # Empty adjacency matrix
  max_d        <- 30 # Maximum number of friends
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1)) # Sample friends
    Am[i, tmp] <- 1 # Set friendship links
  }
  A[[m]]       <- Am # Add to the list
}
Anorm  <- norm.network(A) # Row-normalization of the adjacency matrices

# Covariates (X)
X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4)) # Random covariates

# Two groups based on first covariate
group  <- 1 * (X[,1] > 0.95) # Assign to groups based on x1

# Networks: Define peer effects based on group membership
# The networks should capture:
# - Peer effects of `0` on `0`
# - Peer effects of `1` on `0`
# - Peer effects of `0` on `1`
# - Peer effects of `1` on `1`
G        <- list()
cums     <- c(0, cumsum(nvec)) # Cumulative indices for sub-networks
for (m in 1:M) {
  tp     <- group[(cums[m] + 1):(cums[m + 1])] # Group membership within sub-network m
  Am     <- A[[m]] # Adjacency matrix of sub-network m
  # One network per (own group, peer group) pair, in the order listed above
  G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))
}

# Parameters for the model
lambda <- c(0.2, 0.3, -0.15, 0.25) 
Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) # Repeated values for delta

# Prepare data for the model
data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 = X[,2])))
colnames(data) <- c("x1", "x2", "gx1", "gx2") # Set column names

# Simulate outcomes using the `simcdnet` function
ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                   lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                   data = data)
y      <- ytmp$y

# Plot histogram of the simulated outcomes
hist(y, breaks = max(y) + 1)

# Display frequency table of the simulated outcomes
table(y)

CDatanet documentation built on April 3, 2025, 11:07 p.m.