simsar: Simulating Data from Linear-in-Mean Models with Social...
In CDatanet: Econometrics of Network Data

simsar

R Documentation

Simulating Data from Linear-in-Mean Models with Social Interactions

Description

simsar simulates continuous variables under linear-in-mean models with social interactions, following the specifications described in Lee (2004) and Lee et al. (2010). The model incorporates peer interactions: an individual’s outcome depends on their own characteristics, on the average outcome of their peers and, when contextual variables are included, on the average characteristics of those peers.

Usage

simsar(formula, Glist, theta, cinfo = TRUE, data)

Arguments

`formula`	An object of class `formula` giving a symbolic description of the model, for example `y ~ x1 + x2 + gx1 + gx2`, where `y` is the endogenous vector and `x1`, `x2`, `gx1`, and `gx2` are the control variables. The controls may include contextual variables (peer averages), which can be computed using the function `peer.avg`. The Examples section passes a one-sided formula (`~ x1 + x2 + gx1 + gx2`), since `y` is the variable being simulated.
`Glist`	A list of network adjacency matrices representing multiple subnets. The `m`-th element in the list should be an `ns * ns` matrix, where `ns` is the number of nodes in the `m`-th subnet.
`theta`	A numeric vector giving the true values of the model parameters `\theta = (\lambda, \Gamma, \sigma)`, in that order. The parameters are defined in the Details section.
`cinfo`	A Boolean flag indicating whether the information is complete (`cinfo = TRUE`) or incomplete (`cinfo = FALSE`). If information is incomplete, the model operates under rational expectations.
`data`	An optional data frame, list, or environment (or an object coercible by as.data.frame to a data frame) containing the variables in the model. If not provided, the variables are taken from the environment of the function call.
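
As a rough illustration of how these arguments fit together, the base-R sketch below builds a small `Glist` of row-normalized matrices, computes peer averages block by block, and checks that `theta` has one entry for `\lambda`, one per column of the design matrix, and one for `\sigma`. The helper `check_theta` is a hypothetical name introduced here and is not part of the package. The intercept column is an assumption: the Examples pass five `Gamma` values for four regressors, which is consistent with an intercept.

```r
# Hypothetical helper (not part of CDatanet): compare length(theta) with the
# number of design-matrix columns implied by a one-sided formula.
check_theta <- function(formula, data, theta) {
  Z <- model.matrix(formula, data = data)  # adds an intercept column by default
  length(theta) == 1 + ncol(Z) + 1         # lambda + Gamma + sigma
}

set.seed(1)
sizes <- c(4, 6)  # two small subnets
n     <- sum(sizes)

# One row-normalized adjacency matrix per subnet
Glist <- lapply(sizes, function(ns) {
  Gm          <- matrix(rbinom(ns * ns, 1, 0.5), ns, ns)
  diag(Gm)    <- 0          # no self-links
  rs          <- rowSums(Gm)
  rs[rs == 0] <- 1          # isolated nodes keep an all-zero row
  Gm / rs
})

# Own characteristics and their peer averages, computed subnet by subnet
X   <- cbind(x1 = rnorm(n), x2 = rexp(n))
idx <- split(seq_len(n), rep(seq_along(sizes), sizes))
GX  <- do.call(rbind, Map(function(Gm, id) Gm %*% X[id, , drop = FALSE],
                          Glist, idx))
data <- data.frame(X, gx1 = GX[, 1], gx2 = GX[, 2])

theta    <- c(0.4, c(2, -1.9, 0.8, 1.5, -1.2), 1.5)
theta_ok <- check_theta(~ x1 + x2 + gx1 + gx2, data, theta)
row_sums <- unlist(lapply(Glist, rowSums))
```

Here each row of a network matrix sums to one, or to zero for a node with no peers, so multiplying by it yields a peer average.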

Details

In the complete information model, the outcome y_i for individual i is defined as:

y_i = \lambda \bar{y}_i + \mathbf{z}_i'\Gamma + \epsilon_i,

where \bar{y}_i represents the average outcome y among individual i's peers, \mathbf{z}_i is a vector of control variables, and \epsilon_i \sim N(0, \sigma^2) is the error term. In the case of incomplete information models with rational expectations, the outcome y_i is defined as:

y_i = \lambda E(\bar{y}_i) + \mathbf{z}_i'\Gamma + \epsilon_i,

where E(\bar{y}_i) is the expected average outcome of i's peers, as perceived by individual i.
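
In matrix form, with G the row-normalized network matrix so that \bar{y} = Gy, the complete-information equation solves to y = (I - \lambda G)^{-1}(Z\Gamma + \epsilon) whenever |\lambda| < 1. Under rational expectations, and assuming that expectations condition on Z and G, taking expectations gives E(y) = (I - \lambda G)^{-1} Z\Gamma, so that y = \lambda G E(y) + Z\Gamma + \epsilon. The base-R sketch below simulates one subnet from these reduced forms. It only illustrates the equations and is not the package's internal implementation; `sim_one_subnet` and its arguments are hypothetical names.

```r
# Hypothetical helper (not part of CDatanet): simulate y for one subnet from
# the reduced form of the linear-in-means model.
sim_one_subnet <- function(G, Z, lambda, Gamma, sigma, cinfo = TRUE) {
  n   <- nrow(G)
  A   <- diag(n) - lambda * G     # invertible if |lambda| < 1 and G is row-normalized
  zg  <- drop(Z %*% Gamma)
  eps <- rnorm(n, 0, sigma)
  if (cinfo) {
    y    <- solve(A, zg + eps)    # y = (I - lambda G)^{-1} (Z Gamma + eps)
    peer <- drop(G %*% y)         # realized peer average
  } else {
    Ey   <- solve(A, zg)          # E(y) = (I - lambda G)^{-1} Z Gamma
    peer <- drop(G %*% Ey)        # expected peer average
    y    <- lambda * peer + zg + eps
  }
  list(y = y, peer = peer, eps = eps)
}

set.seed(42)
n  <- 50
G  <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(G) <- 0
rs <- rowSums(G); rs[rs == 0] <- 1
G  <- G / rs

Z     <- cbind(1, rnorm(n), rexp(n))   # intercept plus two controls
Gamma <- c(2, -1.9, 0.8)
out   <- sim_one_subnet(G, Z, lambda = 0.4, Gamma = Gamma, sigma = 1.5)

# Residual of the structural equation; it should be numerically zero
resid <- out$y - (0.4 * out$peer + drop(Z %*% Gamma) + out$eps)
max_resid <- max(abs(resid))
```

Solving the linear system with `solve(A, b)` avoids forming the inverse explicitly, which is a design choice of this sketch.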

Value

A list containing the following elements:

`y`	the simulated outcome (a continuous variable).
`Gy`	the average of `y` among peers.

References

Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899-1925, doi:10.1111/j.1468-0262.2004.00558.x.

Lee, L. F., Liu, X., & Lin, X. (2010). Specification and estimation of social interaction models with network structures. The Econometrics Journal, 13(2), 145-176, doi:10.1111/j.1368-423X.2010.00310.x.

Examples


library(CDatanet) # provides simsar() and peer.avg()

# Group sizes
set.seed(123)
M      <- 5 # Number of sub-groups
nvec   <- round(runif(M, 100, 1000))
n      <- sum(nvec)

# Parameters
lambda <- 0.4
Gamma  <- c(2, -1.9, 0.8, 1.5, -1.2)
sigma  <- 1.5
theta  <- c(lambda, Gamma, sigma)

# X
X      <- cbind(rnorm(n, 1, 1), rexp(n, 0.4))

# Network
G      <- list()

for (m in 1:M) {
  nm           <- nvec[m]
  Gm           <- matrix(0, nm, nm)
  max_d        <- 30
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
    Gm[i, tmp] <- 1
  }
  rs           <- rowSums(Gm); rs[rs == 0] <- 1
  Gm           <- Gm/rs
  G[[m]]       <- Gm
}

# data
data   <- data.frame(X, peer.avg(G, cbind(x1 = X[,1], x2 =  X[,2])))
colnames(data) <- c("x1", "x2", "gx1", "gx2")

ytmp    <- simsar(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, 
                  theta = theta, data = data) 
y       <- ytmp$y

CDatanet documentation built on April 3, 2025, 11:07 p.m.