cdnet: Estimating Count Data Models with Social Interactions under...

View source: R/cdnet.R

cdnetR Documentation

Estimating Count Data Models with Social Interactions under Rational Expectations Using the NPL Method

Description

cdnet estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024).

Usage

cdnet(
  formula,
  Glist,
  group,
  Rmax,
  Rbar,
  starting = list(lambda = NULL, Gamma = NULL, delta = NULL),
  Ey0 = NULL,
  ubslambda = 1L,
  optimizer = "fastlbfgs",
  npl.ctr = list(),
  opt.ctr = list(),
  cov = TRUE,
  data
)

Arguments

formula

a class object formula: a symbolic description of the model. The formula must be, for example, y ~ x1 + x2 + gx1 + gx2, where y is the endogenous vector, and x1, x2, gx1, and gx2 are control variables, which may include contextual variables (i.e., averages among the peers). Peer averages can be computed using the function peer.avg.

Glist

adjacency matrix. For networks consisting of multiple subnets (e.g., schools), Glist can be a list of subnets, with the m-th element being an n_m \times n_m adjacency matrix, where n_m is the number of nodes in the m-th subnet. For heterogeneous peer effects (i.e., when length(unique(group)) = h > 1), the m-th element must be a list of h^2 n_m \times n_m adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024, Section 2.1). For heterogeneous peer effects in the case of a single large network (a single school), Glist must be a one-item list (since there is one school). This item must be a list of h^2 network specifications. The order in which the networks are specified is important and must match the order of the groups in sort(unique(group)) (see argument group and examples).

group

a vector indicating the individual groups. The default assumes a common group. For two groups, i.e., length(unique(group)) = 2 (e.g., A and B), four types of peer effects are defined: peer effects of A on A, of A on B, of B on A, and of B on B. In this case, in the argument Glist, the networks must be defined in this order: AA, AB, BA, BB.

Rmax

an integer indicating the theoretical upper bound of y (see model specification in detail).

Rbar

an L-vector, where L is the number of groups. For large Rmax, the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to \bar{R} and quadratic beyond \bar{R}).

starting

(optional) a starting value for \theta = (\lambda, \Gamma', \delta'), where \lambda, \Gamma, and \delta are the parameters to be estimated (see details).

Ey0

(optional) a starting value for E(y).

ubslambda

a positive value indicating the upper bound of \sum_{s = 1}^S \lambda_s > 0.

optimizer

specifies the optimization method, which can be one of: fastlbfgs (L-BFGS optimization method from the RcppNumerical package), nlm (from the function nlm), or optim (from the function optim). Arguments for these functions, such as control and method, can be set via the argument opt.ctr.

npl.ctr

a list of controls for the NPL method (see details).

opt.ctr

a list of arguments to be passed to optim_lbfgs from the RcppNumerical package, or to nlm or optim (the solver specified in optimizer), such as maxit, eps_f, eps_g, control, method, etc.

cov

a Boolean indicating whether the covariance should be computed.

data

an optional data frame, list, or environment (or an object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which cdnet is called.

Details

Model

The count variable y_i takes the value r with probability.

P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).

In this equation, \mathbf{z}_i is a vector of control variables; F is the distribution function of the standard normal distribution; \bar{y}_i^{e,s} is the average of E(y) among peers using the s-th network definition; a_{h(i),r} is the r-th cut-point in the cost group h(i).

The following identification conditions have been introduced: \sum_{s = 1}^S \lambda_s > 0, a_{h(i),0} = -\infty, a_{h(i),1} = 0, and a_{h(i),r} = \infty for any r \geq R_{\text{max}} + 1. The last condition implies that P_{ir} = 0 for any r \geq R_{\text{max}} + 1. For any r \geq 1, the distance between two cut-points is a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s. As the number of cut-points can be large, a quadratic cost function is considered for r \geq \bar{R}_{h(i)}, where \bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L}). With the semi-parametric cost function, a_{h(i),r + 1} - a_{h(i),r} = \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s.

The model parameters are: \lambda = (\lambda_1, ..., \lambda_S)', \Gamma, and \delta = (\delta_1', ..., \delta_L')', where \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)' for l = 1, ..., L. The number of single parameters in \delta_l depends on R_{\text{max}} and \bar{R}_l. The components \delta_{l,2}, ..., \delta_{l,\bar{R}_l} or/and \bar{\delta}_l must be removed in certain cases.
If R_{\text{max}} = \bar{R}_l \geq 2, then \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'.
If R_{\text{max}} = \bar{R}_l = 1 (binary models), then \delta_l must be empty.
If R_{\text{max}} > \bar{R}_l = 1, then \delta_l = \bar{\delta}_l.

npl.ctr

The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach begins with an initial guess for \theta and E(y) and iteratively refines them. The solution converges when the \ell_1-distance between two consecutive estimates of \theta and E(y) is smaller than a specified tolerance.

The argument npl.ctr must include the following parameters:

tol

the tolerance level for the NPL algorithm (default is 1e-4).

maxit

the maximum number of iterations allowed (default is 500).

print

a boolean value indicating whether the estimates should be printed at each step.

S

the number of simulations performed to compute the integral in the covariance using importance sampling.

Value

A list consisting of:

info

a list containing general information about the model.

estimate

the NPL estimator.

Ey

E(y), the expectation of y.

GEy

the average of E(y) across peers.

cov

a list that includes (if cov == TRUE): parms, the covariance matrix, and another list, var.comp, which contains Sigma (\Sigma) and Omega (\Omega), the matrices used to compute the covariance matrix.

details

step-by-step output returned by the optimizer.

References

Houndetoungan, A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.2139/ssrn.3721250")}.

See Also

sart, sar, simcdnet.

Examples


set.seed(123)
M      <- 5 # Number of sub-groups
nvec   <- round(runif(M, 100, 200))
n      <- sum(nvec)

# Adjacency matrix
A      <- list()
for (m in 1:M) {
  nm           <- nvec[m]
  Am           <- matrix(0, nm, nm)
  max_d        <- 30 #maximum number of friends
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
    Am[i, tmp] <- 1
  }
  A[[m]]       <- Am
}
Anorm  <- norm.network(A) #Row-normalization

# X
X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))

# Two group:
group  <- 1*(X[,1] > 0.95)

# Networks
# length(group) = 2 and unique(sort(group)) = c(0, 1)
# The networks must be defined as to capture:
# peer effects of `0` on `0`, peer effects of `1` on `0`
# peer effects of `0` on `1`, and peer effects of `1` on `1`
G        <- list()
cums     <- c(0, cumsum(nvec))
for (m in 1:M) {
  tp     <- group[(cums[m] + 1):(cums[m + 1])]
  Am     <- A[[m]]
  G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))
}

# Parameters
lambda <- c(0.2, 0.3, -0.15, 0.25) 
Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) 

# Data
data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 =  X[,2])))
colnames(data) = c("x1", "x2", "gx1", "gx2")

ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                   lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                   data = data)
y      <- ytmp$y
hist(y, breaks = max(y) + 1)
table(y)

# Estimation
est    <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group,
                optimizer = "fastlbfgs", data = data,
                opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11))
summary(est)


CDatanet documentation built on April 3, 2025, 11:07 p.m.