R/cdnet.R

Defines functions print.simcdEy simcdEy cdnet simcdnet

Documented in cdnet print.simcdEy simcdEy simcdnet

#' @title Simulating count data models with social interactions under rational expectations
#' @param formula a class object \link[stats]{formula}: a symbolic description of the model. `formula` must be as, for example, \code{y ~ x1 + x2 + gx1 + gx2}
#' where `y` is the endogenous vector and `x1`, `x2`, `gx1` and `gx2` are control variables, which can include contextual variables, i.e. averages among the peers.
#' Peer averages can be computed using the function \code{\link{peer.avg}}.
#' @param Glist adjacency matrix. For networks consisting of multiple subnets, `Glist` can be a list of subnets with the `m`-th element being an \eqn{n_s\times n_s}-adjacency matrix, where \eqn{n_s} is the number of nodes in the `m`-th subnet.
#' For heterogeneous peer effects (`length(unique(group)) = h > 1`), the `m`-th element must be a list of \eqn{h^2} \eqn{n_s\times n_s}-adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024).
#' For heterogeneous peer effects in the case of a single large network, `Glist` must be a one-item list. This item must be a list of \eqn{h^2} network specifications. 
#' The order in which the networks in are specified are important and must match `sort(unique(group))` (see examples).
#' @param group the vector indicating the individual groups. The default assumes a common group. For 2 groups; that is, `length(unique(group)) = 2`, (e.g., `A` and `B`), 
#' four types of peer effects are defined: peer effects of `A` on `A`, of `A` on `B`, of `B` on `A`, and of `B` on `B`.
#' @param parms a vector defining the true value of \eqn{\theta = (\lambda', \Gamma', \delta')'} (see the model specification in details). 
#' Each parameter \eqn{\lambda}, \eqn{\Gamma}, or \eqn{\delta} can also be given separately to the arguments `lambda`, `Gamma`, or `delta`.
#' @param lambda the true value of the vector \eqn{\lambda}.
#' @param Gamma the true value of the vector \eqn{\Gamma}.
#' @param delta the true value of the vector \eqn{\delta}.
#' @param Rmax an integer indicating the theoretical upper bound of `y`. (see the model specification in details).
#' @param Rbar an \eqn{L}-vector, where  \eqn{L} is the number of groups. For large `Rmax` the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to \eqn{\bar{R}} and quadratic beyond \eqn{\bar{R}}). 
#' The `l`-th element of `Rbar` indicates \eqn{\bar{R}} for the `l`-th value of `sort(unique(group))` (see the model specification in details).
#' @param tol the tolerance value used in the Fixed Point Iteration Method to compute the expectancy of `y`. The process stops if the \eqn{\ell_1}-distance 
#' between two consecutive \eqn{E(y)} is less than `tol`.
#' @param maxit the maximal number of iterations in the Fixed Point Iteration Method.
#' @param data an optional data frame, list or environment (or object coercible by \link[base]{as.data.frame} to a data frame) containing the variables
#' in the model. If not found in data, the variables are taken from \code{environment(formula)}, typically the environment from which `simcdnet` is called.
#' @description
#' `simcdnet` simulate the count data model with social interactions under rational expectations developed by Houndetoungan (2024).
#' @details 
#' The count variable \eqn{y_i}{yi} take the value \eqn{r} with probability. 
#' \deqn{P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s}  + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s}  + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).}
#' In this equation, \eqn{\mathbf{z}_i} is a vector of control variables; \eqn{F} is the distribution function of the standard normal distribution;
#' \eqn{\bar{y}_i^{e,s}} is the average of \eqn{E(y)} among peers using the `s`-th network definition;
#' \eqn{a_{h(i),r}} is the `r`-th cut-point in the cost group \eqn{h(i)}. \cr\cr
#' The following identification conditions have been introduced: \eqn{\sum_{s = 1}^S \lambda_s > 0}, \eqn{a_{h(i),0} = -\infty}, \eqn{a_{h(i),1} = 0}, and 
#' \eqn{a_{h(i),r} = \infty} for any \eqn{r \geq R_{\text{max}} + 1}. The last condition implies that \eqn{P_{ir} = 0} for any \eqn{r \geq R_{\text{max}} + 1}.
#' For any \eqn{r \geq 1}, the distance between two cut-points is \eqn{a_{h(i),r+1} - a_{h(i),r} =  \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s}
#' As the number of cut-point can be large, a quadratic cost function is considered for \eqn{r \geq \bar{R}_{h(i)}}, where \eqn{\bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L})}.
#' With the semi-parametric cost-function,
#' \eqn{a_{h(i),r + 1} - a_{h(i),r}= \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s}.  \cr\cr
#' The model parameters are: \eqn{\lambda = (\lambda_1, ..., \lambda_S)'}, \eqn{\Gamma}, and \eqn{\delta = (\delta_1', ..., \delta_L')'}, 
#' where \eqn{\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)'} for \eqn{l = 1, ..., L}. 
#' The number of single parameters in \eqn{\delta_l} depends on  \eqn{R_{\text{max}}} and \eqn{\bar{R}_{l}}. The components \eqn{\delta_{l,2}, ..., \delta_{l,\bar{R}_l}} or/and 
#' \eqn{\bar{\delta}_l} must be removed in certain cases.\cr
#' If \eqn{R_{\text{max}} = \bar{R}_{l} \geq 2}, then \eqn{\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'}.\cr
#' If \eqn{R_{\text{max}} = \bar{R}_{l} = 1} (binary models), then \eqn{\delta_l} must be empty.\cr
#' If \eqn{R_{\text{max}} > \bar{R}_{l} = 1}, then \eqn{\delta_l = \bar{\delta}_l}.
#' @seealso \code{\link{cdnet}}, \code{\link{simsart}}, \code{\link{simsar}}.
#' @return A list consisting of:
#'     \item{yst}{\eqn{y^{\ast}}, the latent variable.}
#'     \item{y}{the observed count variable.}
#'     \item{Ey}{\eqn{E(y)}, the expectation of y.}
#'     \item{GEy}{the average of \eqn{E(y)} friends.}
#'     \item{meff}{a list includinh average and individual marginal effects.}
#'     \item{Rmax}{infinite sums in the marginal effects are approximated by sums up to Rmax.}
#'     \item{iteration}{number of iterations performed by sub-network in the Fixed Point Iteration Method.}
#' @references 
#' Houndetoungan, E. A. (2024). Count Data Models with Social Interactions under Rational Expectations. Available at SSRN 3721250, \doi{10.2139/ssrn.3721250}.
#' @examples 
#' \donttest{
#' set.seed(123)
#' M      <- 5 # Number of sub-groups
#' nvec   <- round(runif(M, 100, 200))
#' n      <- sum(nvec)
#' 
#' # Adjacency matrix
#' A      <- list()
#' for (m in 1:M) {
#'   nm           <- nvec[m]
#'   Am           <- matrix(0, nm, nm)
#'   max_d        <- 30 #maximum number of friends
#'   for (i in 1:nm) {
#'     tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
#'     Am[i, tmp] <- 1
#'   }
#'   A[[m]]       <- Am
#' }
#' Anorm  <- norm.network(A) #Row-normalization
#' 
#' # X
#' X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))
#' 
#' # Two group:
#' group  <- 1*(X[,1] > 0.95)
#' 
#' # Networks
#' # length(group) = 2 and unique(sort(group)) = c(0, 1)
#' # The networks must be defined as to capture:
#' # peer effects of `0` on `0`, peer effects of `1` on `0`
#' # peer effects of `0` on `1`, and peer effects of `1` on `1`
#' G        <- list()
#' cums     <- c(0, cumsum(nvec))
#' for (m in 1:M) {
#'   tp     <- group[(cums[m] + 1):(cums[m + 1])]
#'   Am     <- A[[m]]
#'   G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
#'                               Am * ((1 - tp) %*% t(tp)),
#'                               Am * (tp %*% t(1 - tp)),
#'                               Am * (tp %*% t(tp))))
#' }
#' 
#' # Parameters
#' lambda <- c(0.2, 0.3, -0.15, 0.25) 
#' Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
#' delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) 
#' 
#' # Data
#' data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 =  X[,2])))
#' colnames(data) = c("x1", "x2", "gx1", "gx2")
#' 
#' ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
#'                    lambda = lambda, Gamma = Gamma, delta = delta, group = group,
#'                    data = data)
#' y      <- ytmp$y
#' hist(y, breaks = max(y) + 1)
#' table(y)}
#' @importFrom Rcpp sourceCpp
#' @importFrom stats rnorm
#' @export
simcdnet   <- function(formula,
                       group,
                       Glist,
                       parms,
                       lambda,
                       Gamma,
                       delta,
                       Rmax,
                       Rbar,
                       tol        = 1e-10,
                       maxit      = 500,
                       data) {
  if(missing(Rmax)) Rmax <- Inf
  if(is.null(Rmax)) Rmax <- Inf
  stopifnot((Rmax >= 1) & (Rmax <= Inf))
  if(!missing(Rbar)){
    stopifnot(all(Rbar <= Rmax))
    stopifnot(all(Rbar >= 1))
  }
  # Network
  stopifnot(inherits(Glist, c("list", "matrix", "array")))
  if (!is.list(Glist)) {
    Glist  <- list(Glist)
  }
  
  if(inherits(Glist[[1]], "list")){
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, "list"))))
  } else if(inherits(Glist[[1]], c("matrix", "array"))) {
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, c("matrix", "array")))))
    Glist  <- lapply(Glist, function(x_) list(x_))
  }
  M        <- length(Glist)
  nvec     <- sapply(Glist, function(x_) nrow(x_[[1]]))
  sumn     <- sum(nvec)
  igr      <- matrix(c(cumsum(c(0, nvec[-M])), cumsum(nvec) - 1), ncol = 2)
  nCl      <- length(Glist[[1]])
  
  # group
  if(missing(group)){
    group  <- rep(0, sumn)
  }
  if(is.null(group)){
    group  <- rep(0, sumn)
  }
  if(length(group) != sumn) stop("length(group) != n")
  uCa      <- sort(unique(group))
  nCa      <- length(uCa)
  lCa      <- lapply(uCa, function(x_) which(group == x_) - 1)
  na       <- sapply(lCa, length)
  if(!missing(Rbar)){
    if(length(Rbar) == 1) Rbar = rep(Rbar, nCa)
    if(nCa != length(Rbar)) stop("length(Rbar) is not equal to the number of groups.")
  }
  if((nCa^2) != nCl) stop("The number of network specifications does not match the number of groups.")
  
  # Parameters
  if(!missing(parms)){
    if(missing(lambda) & missing(Gamma) & missing(delta)) warnings("parms is defined: lambda, Gamma, or delta is ignored.")
    lambda  <- parms[1:nCl]
    delta   <- tail(parms, sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar)))
    Gamma   <- parms[(nCl + 1):(length(parms) - sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar)))]
  } else{
    if(missing(lambda) | missing(Gamma)) stop("lambda or Gamma is missing.")
    if(missing(delta)){
      if(Rmax > 1){
        stop("delta is missing.")
      } else {
        delta <- numeric(0)
      }
    }
    if(nCa == 1){
      if(!missing(Rbar)){
        if(ifelse(Rbar == Rmax, Rbar - 1, Rbar) != length(delta)) stop("length(delta) does not match Rbar and Rmax.")
      } else {
        Rbar <- length(delta)
      }
    }
    if(length(delta) != sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar))) stop("length(delta) does not match Rbar and Rmax.")
    if(nCl != length(lambda)) stop("length(lambda) is not equal to the number of network specifications.")
  }
  
  ndelta  <- ifelse(Rbar == Rmax, Rbar - 1, Rbar)
  stopifnot(all(Rbar <= Rmax))
  if(any(Rmax > 1)) stopifnot(all(delta > 0))
  idelta  <- matrix(c(0, cumsum(ndelta)[-length(ndelta)], cumsum(ndelta) - 1), ncol = 2); idelta[ndelta == 0,] <- NA
  delta   <- c(fdelta(deltat = delta, lambda = lambda, idelta = idelta, ndelta = ndelta, nCa = nCa))
  
  # data
  f.t.data  <- formula.to.data(formula = formula, contextual = FALSE, Glist = Glist, M = M, igr = igr, 
                               data = data, type = "sim", theta0  = 0)
  X         <- f.t.data$X
  K         <- length(Gamma)
  if(ncol(X) != K) stop("ncol(X) = ", ncol(X), " is different from length(Gamma) = ", length(Gamma))
  coln      <- c("lambda", colnames(X))
  if(nCl > 1) {
    coln    <- c(paste0(coln[1], ":", 1:nCl), coln[-1])
  }
  
  # xb
  xb        <- c(X %*% Gamma)
  eps       <- rnorm(sumn, 0, 1)
  
  # E(y) and G*E(y)
  Ey        <- rep(0, sumn)
  GEy       <- matrix(0, sumn, nCl)
  t         <- fye(ye = Ey, Gye = GEy, G = Glist, lCa = lCa, nCa = nCa, igroup = igr, ngroup = M, 
                   psi = xb, lambda = lambda, delta = delta, idelta = idelta, n = na, sumn = sumn,
                   Rbar = Rbar, R = Rmax, tol = tol, maxit = maxit)
  if(t >= maxit) warning("Point-fixed: The maximum number of iterations has been reached.")
  # y
  Ztlamda  <- c(GEy %*% lambda + xb)
  yst      <- Ztlamda + eps
  y        <- c(fy(yst = yst, maxyst = sapply(lCa, function(x_) max(yst[x_ + 1])), lCa = lCa, nCa = nCa, delta = delta, 
                   idelta = idelta, n = na, sumn = sumn, Rbar = Rbar, R = Rmax))
  if(nCl == 1){
    GEy    <- c(GEy)
  }
  
  # marginal effects
  Gamma2   <- Gamma
  colnme   <- coln
  if("(Intercept)" %in% coln){
    Gamma2 <- Gamma[tail(coln, K) != "(Intercept)"]
    colnme <- colnme[coln != "(Intercept)"]
  }
  meff     <- fmeffects(ZtLambda = Ztlamda, lambda = lambda, Gamma2 = Gamma2, lCa = lCa, nCa = nCa,
                        delta = delta, idelta = idelta, sumn = sumn, Rbar = Rbar, R = Rmax)
  # Rmax     <- meff$Rmax
  imeff    <- meff$imeff; colnames(imeff) <- colnme
  
  
  list("yst"       = yst,
       "y"         = y,
       "Ey"        = Ey,
       "GEy"       = GEy,
       "meff"      = list(ameff = apply(imeff, 2, mean), imeff = imeff),
       "Rmax"      = Rmax,
       "iteration" = c(t))
}


#' @title Estimating count data models with social interactions under rational expectations using the NPL method
#' @param formula a class object \link[stats]{formula}: a symbolic description of the model. `formula` must be as, for example, \code{y ~ x1 + x2 + gx1 + gx2}
#' where `y` is the endogenous vector and `x1`, `x2`, `gx1` and `gx2` are control variables, which can include contextual variables, i.e. averages among the peers.
#' Peer averages can be computed using the function \code{\link{peer.avg}}.
#' @param Glist adjacency matrix. For networks consisting of multiple subnets, `Glist` can be a list of subnets with the `m`-th element being an \eqn{n_s\times n_s}-adjacency matrix, where \eqn{n_s} is the number of nodes in the `m`-th subnet.
#' For heterogeneous peer effects (`length(unique(group)) = h > 1`), the `m`-th element must be a list of \eqn{h^2} \eqn{n_s\times n_s}-adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024).
#' For heterogeneous peer effects in the case of a single large network, `Glist` must be a one-item list. This item must be a list of \eqn{h^2} network specifications. 
#' The order in which the networks in are specified are important and must match `sort(unique(group))` (see examples).
#' @param group the vector indicating the individual groups. The default assumes a common group. For 2 groups; that is, `length(unique(group)) = 2`, (e.g., `A` and `B`), 
#' four types of peer effects are defined: peer effects of `A` on `A`, of `A` on `B`, of `B` on `A`, and of `B` on `B`.
#' @param Rmax an integer indicating the theoretical upper bound of `y`. (see the model specification in details).
#' @param Rbar an \eqn{L}-vector, where  \eqn{L} is the number of groups. For large `Rmax` the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to \eqn{\bar{R}} and quadratic beyond \eqn{\bar{R}}). 
#' @param starting (optional) a starting value for \eqn{\theta = (\lambda, \Gamma', \delta')'}, where \eqn{\lambda}, \eqn{\Gamma}, and \eqn{\delta} are the parameters to be estimated (see details).
#' @param Ey0 (optional) a starting value for \eqn{E(y)}.
#' @param optimizer is either `fastlbfgs` (L-BFGS optimization method of the package \pkg{RcppNumerical}), `nlm` (referring to the function \link[stats]{nlm}), or `optim` (referring to the function \link[stats]{optim}). 
#' Arguments for these functions such as, `control` and `method` can be set via the argument `opt.ctr`.
#' @param npl.ctr a list of controls for the NPL method (see details).
#' @param opt.ctr a list of arguments to be passed in `optim_lbfgs` of the package \pkg{RcppNumerical}, \link[stats]{nlm} or \link[stats]{optim} (the solver set in `optimizer`), such as `maxit`, `eps_f`, `eps_g`, `control`, `method`, etc.
#' @param cov a Boolean indicating if the covariance should be computed.
#' @param ubslambda a positive value indicating the upper bound of \eqn{\sum_{s = 1}^S \lambda_s > 0}.
#' @param data an optional data frame, list or environment (or object coercible by \link[base]{as.data.frame} to a data frame) containing the variables
#' in the model. If not found in data, the variables are taken from \code{environment(formula)}, typically the environment from which `cdnet` is called.
#' @return A list consisting of:
#'     \item{info}{a list of general information about the model.}
#'     \item{estimate}{the NPL estimator.}
#'     \item{Ey}{\eqn{E(y)}, the expectation of y.}
#'     \item{GEy}{the average of \eqn{E(y)} friends.}
#'     \item{cov}{a list including (if `cov == TRUE`) `parms` the covariance matrix and another list `var.comp`, which includes `Sigma`, as \eqn{\Sigma}, and `Omega`, as \eqn{\Omega}, matrices used for
#'     compute the covariance matrix.}
#'     \item{details}{step-by-step output as returned by the optimizer.}
#' @description
#' `cdnet` estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024). 
#' @details 
#' ## Model
#' The count variable \eqn{y_i}{yi} take the value \eqn{r} with probability. 
#' \deqn{P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s}  + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s}  + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).}
#' In this equation, \eqn{\mathbf{z}_i} is a vector of control variables; \eqn{F} is the distribution function of the standard normal distribution;
#' \eqn{\bar{y}_i^{e,s}} is the average of \eqn{E(y)} among peers using the `s`-th network definition;
#' \eqn{a_{h(i),r}} is the `r`-th cut-point in the cost group \eqn{h(i)}. \cr\cr
#' The following identification conditions have been introduced: \eqn{\sum_{s = 1}^S \lambda_s > 0}, \eqn{a_{h(i),0} = -\infty}, \eqn{a_{h(i),1} = 0}, and 
#' \eqn{a_{h(i),r} = \infty} for any \eqn{r \geq R_{\text{max}} + 1}. The last condition implies that \eqn{P_{ir} = 0} for any \eqn{r \geq R_{\text{max}} + 1}.
#' For any \eqn{r \geq 1}, the distance between two cut-points is \eqn{a_{h(i),r+1} - a_{h(i),r} =  \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s}
#' As the number of cut-point can be large, a quadratic cost function is considered for \eqn{r \geq \bar{R}_{h(i)}}, where \eqn{\bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L})}.
#' With the semi-parametric cost-function,
#' \eqn{a_{h(i),r + 1} - a_{h(i),r}= \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s}.  \cr\cr
#' The model parameters are: \eqn{\lambda = (\lambda_1, ..., \lambda_S)'}, \eqn{\Gamma}, and \eqn{\delta = (\delta_1', ..., \delta_L')'}, 
#' where \eqn{\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)'} for \eqn{l = 1, ..., L}. 
#' The number of single parameters in \eqn{\delta_l} depends on  \eqn{R_{\text{max}}} and \eqn{\bar{R}_{l}}. The components \eqn{\delta_{l,2}, ..., \delta_{l,\bar{R}_l}} or/and 
#' \eqn{\bar{\delta}_l} must be removed in certain cases.\cr
#' If \eqn{R_{\text{max}} = \bar{R}_{l} \geq 2}, then \eqn{\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'}.\cr
#' If \eqn{R_{\text{max}} = \bar{R}_{l} = 1} (binary models), then \eqn{\delta_l} must be empty.\cr
#' If \eqn{R_{\text{max}} > \bar{R}_{l} = 1}, then \eqn{\delta_l = \bar{\delta}_l}.
#' ## \code{npl.ctr}
#' The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach 
#' starts with a guess of \eqn{\theta} and \eqn{E(y)} and constructs iteratively a sequence
#' of \eqn{\theta} and \eqn{E(y)}. The solution converges when the \eqn{\ell_1}-distance
#' between two consecutive \eqn{\theta} and \eqn{E(y)} is less than a tolerance. \cr
#' The argument \code{npl.ctr} must include
#' \describe{
#' \item{tol}{the tolerance of the NPL algorithm (default 1e-4),}
#' \item{maxit}{the maximal number of iterations allowed (default 500),}
#' \item{print}{a boolean indicating if the estimate should be printed at each step.}
#' \item{S}{the number of simulations performed use to compute integral in the covariance by important sampling.} 
#' }
#' @references 
#' Houndetoungan, E. A. (2024). Count Data Models with Social Interactions under Rational Expectations. Available at SSRN 3721250, \doi{10.2139/ssrn.3721250}.
#' @seealso \code{\link{sart}}, \code{\link{sar}}, \code{\link{simcdnet}}.
#' @examples 
#' \donttest{
#' set.seed(123)
#' M      <- 5 # Number of sub-groups
#' nvec   <- round(runif(M, 100, 200))
#' n      <- sum(nvec)
#' 
#' # Adjacency matrix
#' A      <- list()
#' for (m in 1:M) {
#'   nm           <- nvec[m]
#'   Am           <- matrix(0, nm, nm)
#'   max_d        <- 30 #maximum number of friends
#'   for (i in 1:nm) {
#'     tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
#'     Am[i, tmp] <- 1
#'   }
#'   A[[m]]       <- Am
#' }
#' Anorm  <- norm.network(A) #Row-normalization
#' 
#' # X
#' X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))
#' 
#' # Two group:
#' group  <- 1*(X[,1] > 0.95)
#' 
#' # Networks
#' # length(group) = 2 and unique(sort(group)) = c(0, 1)
#' # The networks must be defined as to capture:
#' # peer effects of `0` on `0`, peer effects of `1` on `0`
#' # peer effects of `0` on `1`, and peer effects of `1` on `1`
#' G        <- list()
#' cums     <- c(0, cumsum(nvec))
#' for (m in 1:M) {
#'   tp     <- group[(cums[m] + 1):(cums[m + 1])]
#'   Am     <- A[[m]]
#'   G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
#'                               Am * ((1 - tp) %*% t(tp)),
#'                               Am * (tp %*% t(1 - tp)),
#'                               Am * (tp %*% t(tp))))
#' }
#' 
#' # Parameters
#' lambda <- c(0.2, 0.3, -0.15, 0.25) 
#' Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
#' delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) 
#' 
#' # Data
#' data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 =  X[,2])))
#' colnames(data) = c("x1", "x2", "gx1", "gx2")
#' 
#' ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
#'                    lambda = lambda, Gamma = Gamma, delta = delta, group = group,
#'                    data = data)
#' y      <- ytmp$y
#' hist(y, breaks = max(y) + 1)
#' table(y)
#' 
#' # Estimation
#' est    <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group,
#'                 optimizer = "fastlbfgs", data = data,
#'                 opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11))
#' summary(est)
#' }
#' @importFrom stats quantile
#' @importFrom utils head
#' @importFrom utils tail
#' @importFrom stats runif
#' @export
cdnet    <- function(formula,
                     Glist, 
                     group,
                     Rmax,
                     Rbar,
                     starting  = list(lambda = NULL, Gamma = NULL, delta = NULL), 
                     Ey0       = NULL,
                     ubslambda = 1L,
                     optimizer = "fastlbfgs", 
                     npl.ctr   = list(), 
                     opt.ctr   = list(), 
                     cov       = TRUE,
                     data) {
  if(missing(Rmax)) Rmax <- Inf
  stopifnot(optimizer %in% c("fastlbfgs", "optim", "nlm"))
  stopifnot((Rmax >= 1) & (Rmax <= Inf))
  stopifnot(length(ubslambda) == 1)
  stopifnot(ubslambda > 0)
  if(Rmax == 1) Rbar <- 1
  lb_sl       <- 0
  ub_sl       <- ubslambda
  env.formula <- environment(formula)
  
  # controls
  npl.print   <- npl.ctr$print
  npl.tol     <- npl.ctr$tol
  npl.maxit   <- npl.ctr$maxit
  npl.S       <- npl.ctr$S
  npl.incdit  <- npl.ctr$incdit
  if (is.null(npl.print)) {
    npl.print <- TRUE
  }
  if (is.null(npl.tol)) {
    npl.tol   <- 1e-4
  }
  if (is.null(npl.maxit)) {
    npl.maxit <- 500L
  }
  if (is.null(npl.S)) {
    npl.S     <- 1e3L
  }
  if (is.null(npl.incdit)) {
    npl.incdit<- 30L
  }
  
  # Network
  stopifnot(inherits(Glist, c("list", "matrix", "array")))
  if (!is.list(Glist)) {
    Glist  <- list(Glist)
  }
  
  if(inherits(Glist[[1]], "list")){
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, "list"))))
  } else if(inherits(Glist[[1]], c("matrix", "array"))) {
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, c("matrix", "array")))))
    Glist  <- lapply(Glist, function(x_) list(x_))
  }
  M        <- length(Glist)
  nvec     <- sapply(Glist, function(x_) nrow(x_[[1]]))
  sumn     <- sum(nvec)
  igr      <- matrix(c(cumsum(c(0, nvec[-M])), cumsum(nvec) - 1), ncol = 2)
  nCl      <- length(Glist[[1]])
  
  # group
  if(missing(group)){
    group <- rep(0, sumn)
  }
  if(is.null(group)){
    group <- rep(0, sumn)
  }
  
  if(length(group) != sumn) stop("length(group) != n")
  uCa      <- sort(unique(group))
  nCa      <- length(uCa)
  lCa      <- lapply(uCa, function(x_) which(group == x_) - 1)
  na       <- sapply(lCa, length)
  if(!missing(Rbar)){
    if(length(Rbar) == 1) Rbar = rep(Rbar, nCa)
    if(nCa != length(Rbar)) stop("length(Rbar) is not equal to the number of groups.")
  }
  if((nCa^2) != nCl) stop("The number of network specifications does not match the number of groups.")
  
  # starting values
  thetat   <- NULL
  Deltat   <- NULL
  if(!inherits(starting, "list") & !is.null(starting)) {
    Deltat <- tail(starting, sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar)))
    Lmdt   <- starting[1:nCl]
    Gamt   <- starting[(nCl + 1):(length(starting) - sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar)))]
    thetat <- c(Lmdt, Gamt)
  }
  if(inherits(starting, "list")){
    if(!is.null(starting$lambda)){
      thetat <- c(starting$lambda, starting$Gamma)
      Deltat <- starting$delta
      if(Rmax == 1) Deltat <- numeric(0)
    }
  }
  if(!is.null(Deltat)){
    if(length(Deltat) != sum(ifelse(Rbar == Rmax, Rbar - 1, Rbar))) stop("`length(delta)` does not match Rbar and Rmax.")
  }
  tmp      <- c(is.null(thetat), is.null(Deltat))
  if((all(tmp) != any(tmp))){
    stop("Starting parameters are defined, but not all of them.")
  }
  
  # data
  f.t.data <- formula.to.data(formula, FALSE, Glist, M, igr, data, theta0 = 0) #because G are directly included in X
  formula  <- f.t.data$formula
  y        <- f.t.data$y
  X        <- f.t.data$X
  
  if(!is.null(thetat)){
    if((ncol(X) + nCl) != length(thetat)) stop("ncol(X) == length(Gamma) is not true.")
  }
  coln     <- c("lambda", colnames(X))
  if(nCl > 1) {
    coln   <- c(paste0(coln[1], ":", 1:nCl), coln[-1])
  }
  maxy     <- sapply(lCa, function(x_) max(y[x_ + 1]))
  if(any(maxy > Rmax)) stop("`Rmax` is lower than y.")
  K        <- ncol(X)
  
  # Ey 
  Eyt      <- runif(sumn, y*0.98, y*1.02)
  if (!is.null(Ey0)) {
    Eyt    <- Ey0
  }
  
  # GEyt
  GEyt     <- lapply(1:nCl, function(x_2) unlist(lapply(1:M, function(x_1) Glist[[x_1]][[x_2]] %*% Eyt[(igr[x_1, 1]:igr[x_1, 2]) + 1])))
  GEyt     <- as.matrix(as.data.frame(GEyt)); colnames(GEyt) <- head(coln, nCl)
  
  # Rbar
  if(is.null(Deltat) & missing(Rbar)) {
    Rbar   <- rep(1, nCa)
  }
  stopifnot(all(Rbar >= 1))
  stopifnot(all(Rbar <= Rmax))
  if(any(sapply(1:nCa, function(x_) Rbar[x_] > max(y[lCa[[x_]]])))) {
    stop("Rbar > max(y) in at least one cost group.")
  }
  
  # check starting and computed them if not defined
  lmbd0       <- NULL  
  ndelta      <- ifelse(Rbar == Rmax, Rbar - 1, Rbar)
  if (!is.null(thetat)) {
    if(length(thetat) != (K + nCl)) {
      stop("theta length is inappropriate.")
    }
    t0il      <- c(0, cumsum(rep(nCa, nCa)))
    if(any(sapply(1:nCa, function(x_) sum(thetat[(t0il[x_] + 1):t0il[x_ + 1]])) < 0)) stop("Negative peer effects are not supported in this version.")
    lmbd0     <- head(thetat, nCl)
    thetat    <- c(fcdlambdat(head(thetat, nCl), nCa, lb_sl, ub_sl), tail(thetat, K))
    if(any(Deltat <= 0)) stop("all(delta) > 0 is not true for starting values.")
  } else {
    Gy        <- lapply(1:nCl, function(x_2) unlist(lapply(1:M, function(x_1) Glist[[x_1]][[x_2]] %*% y[(igr[x_1, 1]:igr[x_1,2]) + 1])))
    Gy        <- as.matrix(as.data.frame(Gy)); colnames(Gy) <- head(coln, nCl)
    Xtmp      <- cbind(Gy, X)
    b         <- solve(crossprod(Xtmp), crossprod(Xtmp, y))
    b         <- b/sqrt(sum((y - Xtmp%*%b)^2)/sumn)
    b[1:nCl]  <- sapply(b[1:nCl], function(x_) min(max(x_, 0.01), 0.99)) 
    lmbd0     <- b[1:nCl]
    Deltat    <- rep(1, sum(ndelta))
    thetat    <- c(fcdlambdat(head(b, nCl), nCa, lb_sl, ub_sl), tail(b, K))
  }
  
  idelta      <- matrix(c(0, cumsum(ndelta)[-length(ndelta)], cumsum(ndelta) - 1), ncol = 2); idelta[ndelta == 0,] <- NA
  lDelta      <- log(Deltat)
  thetat      <- c(thetat, lDelta)
  thetat0     <- thetat
  
  # other variables
  cont     <- TRUE
  t        <- 0
  theta    <- NULL
  covout   <- NULL
  REt      <- NULL
  llht     <- 0
  par0     <- NULL
  par1     <- NULL
  like     <- NULL
  var.comp <- NULL
  steps    <- list()
  
  # arguments
  ctr      <- c(list(lb_sl = lb_sl, ub_sl = ub_sl, Gye = GEyt, X = X, lCa = lCa, nCa = nCa, 
                     K = K, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta, Rbar = Rbar, 
                     R = Rmax, y = y, maxy = maxy) , opt.ctr)
  
  # Arguments used in the optimizer
  if (optimizer == "fastlbfgs"){
    ctr    <- c(ctr, list(par = thetat)); 
    optimizer <- "cdnetLBFGS"
    par0   <- "par"
    par1   <- "par"
    like   <- "value"
  } else if (optimizer == "optim") {
    ctr    <- c(ctr, list(fn = foptimREM_NPL, par = thetat))
    par0   <- "par"
    par1   <- "par"
    like   <- "value"
  } else {
    ctr    <- c(ctr, list(f = foptimREM_NPL,  p = thetat))
    par0   <- "p"
    par1   <- "estimate"
    like   <- "minimum"
  }
  
  # NPL algorithm
  ninc.d   <- 0
  dist0    <- Inf
  
  while(cont) {
    # tryCatch({
    Eyt0        <- Eyt + 0    #copy in different memory
    
    # compute theta
    # print(ctr$Gye)
    # print(ctr[[par]])
    REt         <- do.call(get(optimizer), ctr)
    thetat      <- REt[[par1]]
    llht        <- -REt[[like]]
    theta       <- c(fcdlambda(head(thetat, nCl), nCa, lb_sl, ub_sl), thetat[(nCl + 1):(K + nCl)], exp(tail(thetat, sum(ndelta))) + 1e-323)
    
    # compute y
    fL_NPL(ye = Eyt, Gye = GEyt, theta = theta, X = X, G = Glist, lCa = lCa, nCa = nCa, igroup = igr, 
           ngroup = M, K = K, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta, Rbar = Rbar, R = Rmax)
    
    # distance
    var.eps     <- abs(c((ctr[[par0]] - thetat)/thetat, (Eyt0 - Eyt)/Eyt))
    if(all(!is.finite(var.eps))){
      var.eps   <- abs(c(ctr[[par0]] - thetat, Eyt0 - Eyt))
    }
    dist        <- max(var.eps, na.rm = TRUE)
    ninc.d      <- (ninc.d + 1)*(dist > dist0) #counts the successive number of times distance increases
    dist0       <- dist
    cont        <- (dist > npl.tol & t < (npl.maxit - 1))
    t           <- t + 1
    REt$dist    <- dist
    ctr[[par0]] <- thetat
    steps[[t]]  <- REt
    
    if(npl.print){
      cat("---------------\n")
      cat(paste0("Step          : ", t), "\n")
      cat(paste0("Distance      : ", round(dist,3)), "\n")
      cat(paste0("Likelihood    : ", round(llht,3)), "\n")
      cat("Estimate:", "\n")
      print(theta)
    }
    
    if((ninc.d > npl.incdit) | (!is.finite(llht)) | (llht == 1e250)) {
      cont          <- (t < (npl.maxit - 1))
      cat("** Non-convergence ** Redefining theta and computing a new E(y)\n")
      if(t > 1){
        thetat      <- steps[[t - 1]][[par1]]
        if(any(!is.finite(thetat))){
          thetat        <- thetat0
          thetat[1:nCl] <- 1e-7
          theta         <- c(thetat[1:(K + nCl)], exp(tail(thetat, sum(ndelta))) + 1e-323)
          thetat        <- c(fcdlambdat(head(theta, nCl), nCa, lb_sl, ub_sl), theta[(nCl + 1):(K + nCl)], log(tail(theta, sum(ndelta))))
        }
      } else {
        thetat          <- thetat0
        thetat[1:nCl]   <- 1e-7
        theta           <- c(thetat[1:(K + nCl)], exp(tail(thetat, sum(ndelta))) + 1e-323)
        thetat          <- c(fcdlambdat(head(theta, nCl), nCa, lb_sl, ub_sl), theta[(nCl + 1):(K + nCl)], log(tail(theta, sum(ndelta))))
      }
      Eyt           <- runif(sumn, y*0.98, y*1.02)
      GEyt          <- lapply(1:nCl, function(x_2) unlist(lapply(1:M, function(x_1) Glist[[x_1]][[x_2]] %*% Eyt[(igr[x_1, 1]:igr[x_1, 2]) + 1])))
      GEyt          <- as.matrix(as.data.frame(GEyt)); colnames(GEyt) <- head(coln, nCl)
      dist0         <- Inf
      # fnewye(ye = Eyt, Gye = GEyt, theta = theta, X = X, G = Glist, lCa = lCa, nCa = nCa, igroup = igr, 
      #        ngroup = M, K = K, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta, Rbar = Rbar, tol = npl.tol, 
      #        maxit = npl.maxit, R = Rmax)
      ctr[[par0]] <- thetat
    }
  }
  
  # name theta
  namtheta          <- c(coln, unlist(lapply(1:nCa, function(x_){
    if((Rbar[x_] == 1) & (Rmax == 1)) return(c())
    if(Rbar[x_] == Rmax) return(paste0("delta:", x_, ".", 2:(Rbar[x_])))
    if(Rbar[x_] > 1) return(c(paste0("delta:", x_, ".", 2:(Rbar[x_])), paste0("deltabar:", x_)))
    paste0("deltabar:", x_)
  })))
  names(theta)      <- namtheta
  lbda              <- head(theta, nCl)
  Gam               <- theta[(nCl + 1):(nCl + K)]
  Del               <- tail(theta, sum(ndelta))
  
  environment(formula) <- env.formula
  sdata                <- list(
    "formula"       = formula,
    "Glist"         = deparse(substitute(Glist))
  )
  if (!missing(data)) {
    sdata             <- c(sdata, list("data" = deparse(substitute(data))))
  }
  
  if (npl.maxit == t) {
    warning("NPL: The maximum number of iterations has been reached.")
  }
  # covariance and ME
  Gamma2   <- Gam
  colnme   <- coln
  ixWi     <- 0:(K - 1)
  if("(Intercept)" %in% coln){
    Gamma2 <- Gam[tail(coln, K) != "(Intercept)"]
    colnme <- colnme[coln != "(Intercept)"]
    ixWi   <- ixWi[tail(coln, K) != "(Intercept)"]
  }
  
  tmp      <- fcovCDI(theta = theta, Gamma2 = Gamma2, Gye = GEyt, X = X, ixWi = ixWi, G = Glist, lCa = lCa, nCa = nCa, 
                      igroup = igr, ngroup = M, K = K, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta,
                      Rbar = Rbar, R = Rmax, S = npl.S, ccov = cov)
  # Marginal effects
  imeff    <- tmp$imeff; colnames(imeff) <- colnme
  meff     <- list(ameff = apply(imeff, 2, mean), imeff = imeff)
  
  # Covariances
  covt     <- tmp$covt
  covm     <- tmp$covm
  if(cov){
    colnames(covt)  <- namtheta
    rownames(covt)  <- namtheta
    colnames(covm)  <- colnme
    rownames(covm)  <- colnme
  }
  
  AIC                  <- 2*length(theta) - 2*llht
  BIC                  <- length(theta)*log(sumn) - 2*llht
  
  INFO                 <- list("M"          = M,
                               "n"          = nvec,
                               "Kz"         = K,
                               "formula"    = formula,
                               "n.lambda"   = nCl,
                               "bslambda"   = c(lb_sl, ub_sl),
                               "Rbar"       = Rbar,
                               "Rmax"       = Rmax,
                               "group"      = group,
                               "log.like"   = llht, 
                               "npl.iter"   = t,
                               "AIC"        = AIC,
                               "BIC"        = BIC)
  
  out                  <- list("info"       = INFO,
                               "estimate"   = list(parms = theta, lambda =  lbda, Gamma = Gam, delta = Del),
                               "meff"       = meff,
                               "Ey"         = Eyt, 
                               "GEy"        = GEyt,
                               "cov"        = list(parms = covt, ameff = covm),
                               "details"    = steps)
  class(out)           <- "cdnet"
  out
}


#' @title Summary for the estimation of count data models with social interactions under rational expectations
#' @description Summary and print methods for the class `cdnet` as returned by the function \link{cdnet}.
#' @param object an object of class `cdnet`, output of the function \code{\link{cdnet}}.
#' @param x an object of class `summary.cdnet`, output of the function \code{\link{summary.cdnet}}
#' or class `cdnet`, output of the function \code{\link{cdnet}}.
#' @param Glist adjacency matrix. For networks consisting of multiple subnets, `Glist` can be a list of subnets with the `m`-th element being an `ns*ns` adjacency matrix, where `ns` is the number of nodes in the `m`-th subnet.
#' For heterogenous peer effects (e.g., boy-boy, boy-girl friendship effects), the `m`-th element can be a list of many `ns*ns` adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024).
#' For heterogeneous peer effects in the case of a single large network, `Glist` must be a one-item list. This item must be a list of many specifications of large networks.
#' @param data an optional data frame, list or environment (or object coercible by \link[base]{as.data.frame} to a data frame) containing the variables
#' in the model. If not found in data, the variables are taken from \code{environment(formula)}, typically the environment from which `summary.cdnet` is called.
#' @param S number of simulations to be used to compute integral in the covariance by important sampling.
#' @param ... further arguments passed to or from other methods.
#' @return A list of the same objects in `object`.
#' @export 
"summary.cdnet" <- function(object,
                            Glist,
                            data,
                            S = 1e3L,
                            ...) {
  stopifnot(class(object) == "cdnet")
  out           <- c(object, list("..." = ...))
  if(is.null(object$cov$parms)){
    env.formula <- environment(object$info$formula)
    Rbar        <- object$info$Rbar
    Rmax        <- object$info$Rmax
    Kz          <- object$info$Kz
    nCl         <- object$info$n.lambda
    group       <- object$info$group
    M           <- object$info$M
    nvec        <- object$info$n
    sumn        <- sum(nvec)
    nCa         <- length(Rbar)
    theta       <- object$estimate$parms
    lambda      <- object$estimate$lambda
    Gamma       <- object$estimate$Gamma
    delta       <- object$estimate$delta
    ndelta      <- ifelse(Rbar == Rmax, Rbar - 1, Rbar)
    idelta      <- matrix(c(0, cumsum(ndelta)[-length(ndelta)], cumsum(ndelta) - 1), ncol = 2); idelta[ndelta == 0,] <- NA
    
    # Cost group
    uCa         <- sort(unique(group))
    lCa         <- lapply(uCa, function(x_) which(group == x_) - 1)
    na          <- sapply(lCa, length)
    
    # Network
    stopifnot(inherits(Glist, c("list", "matrix", "array")))
    if (!is.list(Glist)) {
      Glist     <- list(Glist)
    }
    if(inherits(Glist[[1]], "list")){
      stopifnot(all(sapply(Glist, function(x_) inherits(x_, "list"))))
    } else if(inherits(Glist[[1]], c("matrix", "array"))) {
      stopifnot(all(sapply(Glist, function(x_) inherits(x_, c("matrix", "array")))))
      Glist     <- lapply(Glist, function(x_) list(x_))
    }
    if(M != length(Glist)) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
    if(any(nvec != sapply(Glist, function(x_) nrow(x_[[1]])))) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
    if(nCl != length(Glist[[1]])) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
    igr         <- matrix(c(cumsum(c(0, nvec[-M])), cumsum(nvec) - 1), ncol = 2)
    
    # Data
    GEyt        <- object$GEy
    npl.S       <- S
    if (is.null(npl.S)) {
      npl.S     <- 1e3L
    }
    formula     <- object$info$formula
    f.t.data    <- formula.to.data(formula, FALSE, Glist, M, igr, data, theta0 = 0) #because G are directly included in X
    X           <- f.t.data$X
    if(Kz != ncol(X)) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
    if(sumn != nrow(X)) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
    if(any(colnames(X) != names(Gamma))) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
    coln        <- c("lambda", colnames(X))
    if(nCl > 1) {
      coln      <- c(paste0(coln[1], ":", 1:nCl), coln[-1])
    }
    
    namtheta    <- c(coln, unlist(lapply(1:nCa, function(x_){
      if((Rbar[x_] == 1) & (Rmax == 1)) return(c())
      if(Rbar[x_] == Rmax) return(paste0("delta:", x_, ".", 2:(Rbar[x_])))
      if(Rbar[x_] > 1) return(c(paste0("delta:", x_, ".", 2:(Rbar[x_])), paste0("deltabar:", x_)))
      paste0("deltabar:", x_)
    })))
    
    Gamma2      <- Gamma
    ixWi        <- 0:(Kz - 1)
    if("(Intercept)" %in% coln){
      Gamma2    <- Gamma[tail(coln, Kz) != "(Intercept)"]
      ixWi      <- ixWi[tail(coln, Kz) != "(Intercept)"]
    }
    tmp         <- fcovCDI(theta = theta, Gamma2 = Gamma2, Gye = GEyt, X = X, ixWi = ixWi, G = Glist, lCa = lCa, nCa = nCa, 
                           igroup = igr, ngroup = M, K = Kz, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta,
                           Rbar = Rbar, R = Rmax, S = npl.S, ccov = TRUE)
    
    # Covariances
    covt        <- tmp$covt
    covm        <- tmp$covm
    colnames(covt)  <- namtheta
    rownames(covt)  <- namtheta
    colnames(covm)  <- names(out$meff$ameff)
    rownames(covm)  <- names(out$meff$ameff)
    
    out$cov         <- list(parms = covt, ameff = covm)
  }
  class(out) <- "summary.cdnet"
  out
}


#' @rdname summary.cdnet
#' @importFrom stats pchisq
#' @export
"print.summary.cdnet"  <- function(x, ...) {
  stopifnot(class(x) == "summary.cdnet")
  
  M                    <- x$info$M
  n                    <- x$info$n
  iteration            <- x$info$npl.iter
  Rbar                 <- x$info$Rbar
  Rmax                 <- x$info$Rmax
  csRbar               <- c(0, cumsum(Rbar - (Rbar == Rmax)))
  formula              <- x$info$formula
  Kz                   <- x$info$Kz
  AIC                  <- x$info$AIC
  BIC                  <- x$info$BIC
  nCa                  <- length(Rbar)
  nCl                  <- x$info$n.lambda
  
  coef                 <- c(x$estimate$lambda, x$estimate$Gamma)
  K                    <- length(coef)
  meff                 <- x$meff$ameff
  std                  <- sqrt(head(diag(x$cov$parms), K))
  std.meff             <- sqrt(diag(x$cov$ameff))
  delta                <- x$estimate$delta
  
  llh                  <- x$info$log.like
  
  tmp                  <- fcoefficients(coef, std)
  out_print            <- tmp$out_print
  out                  <- tmp$out
  out_print            <- c(list(out_print), x[-(1:7)], list(...))
  
  tmp.meff             <- fcoefficients(meff, std.meff)
  out_print.meff       <- tmp.meff$out_print
  out.meff             <- tmp.meff$out
  out_print.meff       <- c(list(out_print.meff), x[-(1:7)], list(...))
  
  cat("Count data Model with Social Interactions\n\n")
  cat("Call:\n")
  print(formula)
  cat("\nMethod: Nested pseudo-likelihood (NPL) \nIteration: ", iteration, sep = "", "\n\n")
  cat("Network:\n")
  cat("Number of groups         : ", M, sep = "", "\n")
  cat("Sample size              : ", sum(n), sep = "", "\n\n")
  
  cat("Coefficients:\n")
  do.call("print", out_print)
  
  cat("\nMarginal Effects:\n")
  do.call("print", out_print.meff)
  cat("---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n")
  
  cat("Cost function -- Number of groups: ", nCa, "\n", sep = "")
  
  for(k in 1:nCa){
    cat("Group ", k, ", Rbar = ", Rbar[k], "\n", sep = "")
    if(Rbar[k] > 1){
      cat("delta:", delta[(csRbar[k] + 1):(csRbar[k + 1] - 1)], "\n", sep = " ")
      if(Rmax > Rbar[k]){
        cat("deltabar: ", delta[csRbar[k + 1]], "\n\n", sep = "")
      } else {
        cat("deltabar not defined: Rbar = Rmax\n\n", sep = "")
      }
    } else {
      if(Rmax > Rbar[k]){
        cat("delta not defined: Rbar = 1\n", sep = " ")
        cat("deltabar: ", delta[k], "\n\n", sep = "")
      } else{
        cat("delta and deltabar not defined: Rbar = Rmax = 1\n\n", sep = " ")
      }
    }
  }
  
  cat("log pseudo-likelihood: ", llh, sep = "", "\n")
  cat("AIC: ", AIC, " -- BIC: ", BIC, sep = "", "\n")
  
  invisible(x)
}

#' @rdname summary.cdnet
#' @export
"print.cdnet" <- function(x, ...) {
  stopifnot(class(x) == "cdnet")
  print(summary(x, ...))
}

#' @title Counterfactual analyses with count data models and social interactions 
#' @param object an object of class `summary.cdnet`, output of the function \code{\link{summary.cdnet}}
#' or class `cdnet`, output of the function \code{\link{cdnet}}.
#' @param Glist adjacency matrix. For networks consisting of multiple subnets, `Glist` can be a list of subnets with the `m`-th element being an `ns*ns` adjacency matrix, where `ns` is the number of nodes in the `m`-th subnet.
#' For heterogenous peer effects (e.g., boy-boy, boy-girl friendship effects), the `m`-th element can be a list of many `ns*ns` adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024).
#' For heterogeneous peer effects in the case of a single large network, `Glist` must be a one-item list. This item must be a list of many specifications of large networks.
#' @param data an optional data frame, list or environment (or object coercible by \link[base]{as.data.frame} to a data frame) containing the variables
#' in the model. If not found in data, the variables are taken from \code{environment(formula)}, typically the environment from which `summary.cdnet` is called.
#' @param S number of simulations to be used to compute integral in the covariance by important sampling.
#' @param tol the tolerance value used in the Fixed Point Iteration Method to compute the expectancy of `y`. The process stops if the \eqn{\ell_1}-distance 
#' between two consecutive \eqn{E(y)} is less than `tol`.
#' @param maxit the maximal number of iterations in the Fixed Point Iteration Method.
#' @param group the vector indicating the individual groups (see function \code{\link{cdnet}}). If missing, the former group saved in `object` will be used.
#' @description
#' `simcdpar` computes the average expected outcomes for count data models with social interactions and standard errors using the Delta method. 
#' This function can be used to examine the effects of changes in the network or in the control variables.
#' @seealso \code{\link{simcdnet}}
#' @return A list consisting of:
#'     \item{Ey}{\eqn{E(y)}, the expectation of y.}
#'     \item{GEy}{the average of \eqn{E(y)} friends.}
#'     \item{aEy}{the sampling mean of \eqn{E(y)}.}
#'     \item{se.aEy}{the standard error of the sampling mean of \eqn{E(y)}.}
#' @export 
simcdEy <- function(object,
                    Glist,
                    data,
                    group,
                    tol          = 1e-10,
                    maxit        = 500,
                    S            = 1e3){
  stopifnot(inherits(object, c("cdnet", "summary.cdnet")))
  covparms    <- object$cov$parms
  if(is.null(covparms)) stop("`object` does not include a covariance matrix")
  env.formula <- environment(object$info$formula)
  Rbar        <- object$info$Rbar
  Rmax        <- object$info$Rmax
  Kz          <- object$info$Kz
  nCl         <- object$info$n.lambda
  if(missing(group)){
    group     <- object$info$group
  }
  uCa         <- sort(unique(group))
  nCa         <- length(uCa)
  lCa         <- lapply(uCa, function(x_) which(group == x_) - 1)
  na          <- sapply(lCa, length)
  if(nCa != length(Rbar)) stop("length(Rbar) is not equal to the number of groups.")
  if((nCa^2) != nCl) stop("The number of network specifications does not match the number of groups.")
  M           <- object$info$M
  nvec        <- object$info$n
  sumn        <- sum(nvec)
  theta       <- object$estimate$parms
  lambda      <- object$estimate$lambda
  Gamma       <- object$estimate$Gamma
  delta       <- object$estimate$delta
  ndelta      <- ifelse(Rbar == Rmax, Rbar - 1, Rbar)
  idelta      <- matrix(c(0, cumsum(ndelta)[-length(ndelta)], cumsum(ndelta) - 1), ncol = 2); idelta[ndelta == 0,] <- NA
  delta       <- c(fdelta(deltat = delta, lambda = lambda, idelta = idelta, ndelta = ndelta, nCa = nCa))
  

  
  # Network
  stopifnot(inherits(Glist, c("list", "matrix", "array")))
  if (!is.list(Glist)) {
    Glist     <- list(Glist)
  }
  if(inherits(Glist[[1]], "list")){
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, "list"))))
  } else if(inherits(Glist[[1]], c("matrix", "array"))) {
    stopifnot(all(sapply(Glist, function(x_) inherits(x_, c("matrix", "array")))))
    Glist     <- lapply(Glist, function(x_) list(x_))
  }
  if(M != length(Glist)) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
  if(any(nvec != sapply(Glist, function(x_) nrow(x_[[1]])))) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
  if(nCl != length(Glist[[1]])) stop("`Glist` structure does not match. Perhaps another `Glist` structure is used in `cdnet`.")
  igr         <- matrix(c(cumsum(c(0, nvec[-M])), cumsum(nvec) - 1), ncol = 2)
  
  # Data
  npl.S       <- S
  if (is.null(npl.S)) {
    npl.S     <- 1e3L
  }
  formula     <- object$info$formula
  f.t.data    <- formula.to.data(formula, FALSE, Glist, M, igr, data, theta0 = 0) #because G are directly included in X
  X           <- f.t.data$X
  if(Kz != ncol(X)) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
  if(sumn != nrow(X)) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
  if(any(colnames(X) != names(Gamma))) stop("`data` structure does not match. Perhaps another `data` is used in `cdnet` or `formula` has been changed.")
  
  # E(y) and G*E(y)
  Ey          <- rep(0, sumn)
  GEy         <- matrix(0, sumn, nCl)
  xb          <- c(X %*% Gamma)
  t           <- fye(ye = Ey, Gye = GEy, G = Glist, lCa = lCa, nCa = nCa, igroup = igr, ngroup = M, 
                   psi = xb, lambda = lambda, delta = delta, idelta = idelta, n = na, sumn = sumn,
                   Rbar = Rbar, R = Rmax, tol = tol, maxit = maxit)
  if(t >= maxit) warning("Point-fixed: The maximum number of iterations has been reached.")
  
  dEy         <- fcddEy(theta = theta, Gye = GEy, X = X, psi = xb, G = Glist, lCa = lCa, nCa = nCa, igroup = igr, 
                        ngroup = M, K = Kz, n = na, sumn = sumn, idelta = idelta, ndelta = ndelta, Rbar = Rbar,
                        R = Rmax, S = npl.S)
  
  if(nCl == 1){
    GEy       <- c(GEy)
  }
  Ey          <- c(Ey)
  aEy         <- mean(Ey)
  
  # if(fixed.lambda){
  #   dEy       <- dEy[, -(1:nCl), drop = FALSE]
  #   covparms  <- covparms[-(1:nCl), -(1:nCl), drop = FALSE]
  # }
  adEy        <- apply(dEy, 2, mean)
  se.aEy      <- sqrt(c(t(adEy) %*% covparms %*% adEy))
  
  out         <- list(Ey     = Ey,
                      GEy    = GEy,
                      aEy    = aEy,
                      se.aEy = se.aEy)
  class(out)  <- "simcdEy"
  out
}

#' @title Printing the average expected outcomes for count data models with social interactions
#' @description Summary and print methods for the class `simcdEy` as returned by the function \link{simcdEy}.
#' @param object an object of class `simcdEy`, output of the function \code{\link{simcdEy}}.
#' @param x an object of class `summary.simcdEy`, output of the function \code{\link{summary.simcdEy}}
#' or class `simcdEy`, output of the function \code{\link{simcdEy}}.
#' @param ... further arguments passed to or from other methods.
#' @return A list of the same objects in `object`.
#' @export 
print.simcdEy <- function(x, ...){
  stopifnot(class(x) == "simcdEy")
  cat("Average expected outcome\n\n")
  tp           <- cbind("Estimate" = x$aEy, "Std. Error" = x$se.aEy)
  rownames(tp) <- ""
  print(tp, ...)
  invisible(x)
}

#' @rdname print.simcdEy
#' @export
"summary.simcdEy"  <- function(object, ...) return(object)

#' @rdname print.simcdEy
#' @export
"print.summary.simcdEy"  <- function(x, ...) print.simcdEy(x, ...)

Try the CDatanet package in your browser

Any scripts or data that you put into this service are public.

CDatanet documentation built on June 22, 2024, 11:14 a.m.