cdnet | R Documentation |
cdnet
estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024).
cdnet(
formula,
Glist,
group,
Rmax,
Rbar,
starting = list(lambda = NULL, Gamma = NULL, delta = NULL),
Ey0 = NULL,
ubslambda = 1L,
optimizer = "fastlbfgs",
npl.ctr = list(),
opt.ctr = list(),
cov = TRUE,
data
)
formula |
a class object formula: a symbolic description of the model. The |
Glist |
adjacency matrix. For networks consisting of multiple subnets (e.g., schools), |
group |
a vector indicating the individual groups. The default assumes a common group. For two groups, i.e., |
Rmax |
an integer indicating the theoretical upper bound of |
Rbar |
an |
starting |
(optional) a starting value for |
Ey0 |
(optional) a starting value for |
ubslambda |
a positive value indicating the upper bound of |
optimizer |
specifies the optimization method, which can be one of: |
npl.ctr |
a list of controls for the NPL method (see details). |
opt.ctr |
a list of arguments to be passed to |
cov |
a Boolean indicating whether the covariance should be computed. |
data |
an optional data frame, list, or environment (or an object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in |
The count variable y_i takes the value r with probability
P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).
In this equation, \mathbf{z}_i is a vector of control variables; F is the distribution function of the standard normal distribution; \bar{y}_i^{e,s} is the average of E(y) among peers using the s-th network definition; a_{h(i),r} is the r-th cut-point in the cost group h(i).
The following identification conditions have been introduced: \sum_{s = 1}^S \lambda_s > 0, a_{h(i),0} = -\infty, a_{h(i),1} = 0, and a_{h(i),r} = \infty for any r \geq R_{\text{max}} + 1. The last condition implies that P_{ir} = 0 for any r \geq R_{\text{max}} + 1.
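As an illustration of the probability formula and the identification conditions above, the following base R sketch computes P_{ir} for one individual. The function name `count_probs`, the scalar `index` (standing for \sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma), and the numeric cut-points are assumptions made for this example; they are not part of the package.

```r
# Hypothetical helper: probabilities P_{i0}, ..., P_{iRmax} for one individual.
# `index` stands for the linear index inside F(.) and `cutpoints` for the finite
# cut-points (a_1, ..., a_Rmax), with a_1 = 0 as required for identification.
count_probs <- function(index, cutpoints) {
  stopifnot(cutpoints[1] == 0)
  a <- c(-Inf, cutpoints, Inf)             # a_0 = -Inf and a_{Rmax + 1} = Inf
  upper <- pnorm(index - a[-length(a)])    # F(index - a_r),       r = 0, ..., Rmax
  lower <- pnorm(index - a[-1])            # F(index - a_{r + 1}), r = 0, ..., Rmax
  p <- upper - lower
  names(p) <- 0:length(cutpoints)
  p
}

# Illustrative values only
p <- count_probs(index = 1.2, cutpoints = c(0, 1.5, 4))
print(p)
print(sum(p))  # the differences telescope, so the probabilities sum to one
```

Because a_{h(i),R_{\text{max}} + 1} = \infty, no probability mass is left for counts above R_{\text{max}} in this sketch.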
For any r \geq 1, the distance between two cut-points is a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s.
As the number of cut-points can be large, a quadratic cost function is considered for r \geq \bar{R}_{h(i)}, where \bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L}).
With the semi-parametric cost function, a_{h(i),r + 1} - a_{h(i),r} = \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s.
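The cut-point recursion above can be sketched in base R for a single cost group. This is an illustration under stated assumptions, not the package's internal code: the helper `make_cutpoints` and its argument names are hypothetical, and the sketch assumes that the k-th free element of \delta_l drives the k-th gap between consecutive cut-points, with \bar{\delta}_l used for every remaining gap.

```r
# Hypothetical helper: build (a_1, ..., a_Rmax) for one cost group.
# delta_free : free gap parameters (assumed to cover the first gaps)
# delta_bar  : common gap parameter used once the free ones are exhausted
# lambda     : peer-effect parameters; their sum enters every gap
make_cutpoints <- function(delta_free, delta_bar, lambda, Rmax) {
  n_gaps <- Rmax - 1                                   # gaps between a_1, ..., a_Rmax
  n_bar  <- max(n_gaps - length(delta_free), 0)        # gaps that use delta_bar
  gaps   <- c(delta_free, rep(delta_bar, n_bar))[seq_len(n_gaps)] + sum(lambda)
  c(0, cumsum(gaps))                                   # a_1 = 0 by identification
}

# Illustrative values only
a <- make_cutpoints(delta_free = c(1, 2), delta_bar = 0.5, lambda = 0.5, Rmax = 5)
print(a)
```

With these values the gaps are 1.5, 2.5, 1, and 1, so the cut-points increase strictly as long as every gap is positive.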
The model parameters are: \lambda = (\lambda_1, ..., \lambda_S)', \Gamma, and \delta = (\delta_1', ..., \delta_L')', where \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)' for l = 1, ..., L.
The number of parameters in \delta_l depends on R_{\text{max}} and \bar{R}_l. The components \delta_{l,2}, ..., \delta_{l,\bar{R}_l} and/or \bar{\delta}_l must be removed in certain cases.
If R_{\text{max}} = \bar{R}_l \geq 2, then \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'.
If R_{\text{max}} = \bar{R}_l = 1 (binary models), then \delta_l must be empty.
If R_{\text{max}} > \bar{R}_l = 1, then \delta_l = \bar{\delta}_l.
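The cases above can be summarized by counting the elements of \delta_l. The helper `n_delta` below is a hypothetical name used for illustration only; it assumes R_{\text{max}} \geq \bar{R}_l \geq 1 and simply counts the \bar{R}_l - 1 components \delta_{l,2}, ..., \delta_{l,\bar{R}_l}, plus one for \bar{\delta}_l when R_{\text{max}} > \bar{R}_l.

```r
# Hypothetical helper: expected length of delta_l for one cost group
n_delta <- function(Rmax, Rbar) {
  stopifnot(Rbar >= 1, Rmax >= Rbar)
  (Rbar - 1) + as.integer(Rmax > Rbar)
}

n_delta(Rmax = 5, Rbar = 5)  # 4: only delta_{l,2}, ..., delta_{l,5}
n_delta(Rmax = 1, Rbar = 1)  # 0: binary model, delta_l is empty
n_delta(Rmax = 7, Rbar = 1)  # 1: only delta_bar_l
n_delta(Rmax = 7, Rbar = 5)  # 5: four free components plus delta_bar_l
```

Checking the length of a starting value for \delta against such a count may help catch dimension mismatches before estimation.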
npl.ctr
The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach begins with an initial guess for \theta and E(y) and iteratively refines them. The solution converges when the \ell_1-distance between two consecutive estimates of \theta and E(y) is smaller than a specified tolerance.
The argument npl.ctr must include the following parameters:
the tolerance level for the NPL algorithm (default is 1e-4).
the maximum number of iterations allowed (default is 500).
a Boolean value indicating whether the estimates should be printed at each step.
the number of simulations performed to compute the integral in the covariance using importance sampling.
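The stopping rule described above can be sketched with a generic fixed-point loop in base R. This is not the package's NPL implementation: the function `iterate_until_stable`, the toy `toy_update` map, and the choice to add the two \ell_1-distances into a single criterion are assumptions made for illustration. The defaults mirror the tolerance (1e-4) and iteration cap (500) stated above.

```r
# Generic loop: refine (theta, Ey) until consecutive estimates are close in l1-distance
iterate_until_stable <- function(update, theta, Ey, tol = 1e-4, maxit = 500) {
  for (iter in seq_len(maxit)) {
    new  <- update(theta, Ey)
    dist <- sum(abs(new$theta - theta)) + sum(abs(new$Ey - Ey))  # l1-distance
    theta <- new$theta
    Ey    <- new$Ey
    if (dist < tol) {
      return(list(theta = theta, Ey = Ey, iterations = iter, converged = TRUE))
    }
  }
  list(theta = theta, Ey = Ey, iterations = maxit, converged = FALSE)
}

# Toy contraction standing in for one NPL step; its fixed point is theta = 2, Ey = 1
toy_update <- function(theta, Ey) {
  list(theta = 0.5 * theta + 1, Ey = 0.5 * Ey + 0.25 * theta)
}

res <- iterate_until_stable(toy_update, theta = 0, Ey = 0)
print(res)
```

In the actual estimator, each step would re-estimate \theta given the current E(y) and then update E(y); the toy map only mimics that alternation.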
A list consisting of:
info |
a list containing general information about the model. |
estimate |
the NPL estimator. |
Ey |
|
GEy |
the average of |
cov |
a list that includes (if |
details |
step-by-step output returned by the optimizer. |
Houndetoungan, A. (2024). Count Data Models with Heterogeneous Peer Effects. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.
sart, sar, simcdnet.
set.seed(123)
M <- 5 # Number of sub-groups
nvec <- round(runif(M, 100, 200))
n <- sum(nvec)
# Adjacency matrix
A <- list()
for (m in 1:M) {
  nm <- nvec[m]
  Am <- matrix(0, nm, nm)
  max_d <- 30 # Maximum number of friends
  for (i in 1:nm) {
    tmp <- sample((1:nm)[-i], sample(0:max_d, 1))
    Am[i, tmp] <- 1
  }
  A[[m]] <- Am
}
Anorm <- norm.network(A) # Row-normalization
# X
X <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))
# Two groups:
group <- 1 * (X[, 1] > 0.95)
# Networks
# length(unique(group)) = 2 and unique(sort(group)) = c(0, 1)
# The networks must be defined so as to capture:
# peer effects of `0` on `0`, peer effects of `1` on `0`,
# peer effects of `0` on `1`, and peer effects of `1` on `1`
G <- list()
cums <- c(0, cumsum(nvec))
for (m in 1:M) {
  tp <- group[(cums[m] + 1):(cums[m + 1])]
  Am <- A[[m]]
  G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))
}
# Parameters
lambda <- c(0.2, 0.3, -0.15, 0.25)
Gamma <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2)
# Data
data <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[, 1], x2 = X[, 2])))
colnames(data) <- c("x1", "x2", "gx1", "gx2")
ytmp <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                 lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                 data = data)
y <- ytmp$y
hist(y, breaks = max(y) + 1)
table(y)
# Estimation
est <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
             group = group, optimizer = "fastlbfgs", data = data,
             opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11))
summary(est)
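
The group-specific networks built in the example can be easier to follow on a toy case. The sketch below uses base R only; the 4-node matrix, the membership vector, and the `row_norm` helper are hypothetical, and `row_norm` is a stand-in that assumes each masked matrix is row-normalized on its own, which may differ from what norm.network does internally.

```r
# Toy 4-node network; tp marks membership in group `1` (illustrative values)
Am <- matrix(c(0, 1, 1, 0,
               1, 0, 0, 1,
               0, 1, 0, 1,
               1, 1, 0, 0), nrow = 4, byrow = TRUE)
tp <- c(0, 0, 1, 1)

# Entry [i, j] of each mask is 1 when i and j fall in the stated groups
masks <- list(
  "0 on 0" = (1 - tp) %*% t(1 - tp),  # i in `0`, peer j in `0`
  "1 on 0" = (1 - tp) %*% t(tp),      # i in `0`, peer j in `1`
  "0 on 1" = tp %*% t(1 - tp),        # i in `1`, peer j in `0`
  "1 on 1" = tp %*% t(tp)             # i in `1`, peer j in `1`
)
G_raw <- lapply(masks, function(mask) Am * mask)

# Stand-in row-normalization; rows without links are left as zeros
row_norm <- function(W) {
  rs <- rowSums(W)
  rs[rs == 0] <- 1
  W / rs
}
G_toy <- lapply(G_raw, row_norm)
print(G_toy[["1 on 0"]])
```

The four masked matrices partition the links of the original network, so each link contributes to exactly one of the four peer-effect parameters.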