net_dep: Implement a number of modifications to the linear-in-means...
In econet: Estimation of Parameter-Dependent Network Centrality Measures

net_dep

R Documentation

Implement a number of modifications to the linear-in-means model to obtain different weighted versions of Katz-Bonacich centrality.

Description

Implement a number of modifications to the linear-in-means model to obtain different weighted versions of Katz-Bonacich centrality.

Usage

net_dep(
  formula = formula(),
  data = list(),
  G = list(),
  model = c("model_A", "model_B"),
  estimation = c("NLLS", "MLE"),
  hypothesis = c("lim", "het", "het_l", "het_r", "par", "par_split_with",
    "par_split_btw", "par_split_with_btw"),
  endogeneity = FALSE,
  correction = NULL,
  first_step = NULL,
  z = NULL,
  formula_first_step = NULL,
  exclusion_restriction = NULL,
  start.val = NULL,
  to_weight = NULL,
  time_fixed_effect = NULL,
  ind_fixed_effect = NULL,
  mle_controls = NULL,
  kappa = NULL,
  delta = NULL
)

Arguments

`formula`	an object of class `formula`: a symbolic description of the model to be fitted. The constant (i.e. intercept) and the autoregressive parameter need not be specified.
`data`	an object of class `data.frame` containing the variables in the model. If data are longitudinal, observations must be ordered by time period and then by individual.
`G`	an object of class `Matrix` representing the social network. Row and column names must be specified and match the order of the observations in `data`.
`model`	string. One of `c("model_A","model_B")`. See details.
`estimation`	string. One of `c("NLLS","MLE")`. They implement, respectively, a non-linear least squares estimator and a maximum likelihood estimator.
`hypothesis`	string. One of `c("lim","het", "het_l", "het_r", "par", "par_split_with", "par_split_btw", "par_split_with_btw")`. See details.
`endogeneity`	logical. Default is `FALSE`. If `TRUE`, `net_dep` implements a two-step correction procedure to control for the endogeneity of the network.
`correction`	Default is `NULL`. If `endogeneity = TRUE`, it must specify whether the main regression uses an instrumental variable (`"iv"`) or a Heckman (`"heckman"`) approach.
`first_step`	Default is `NULL`. If `endogeneity = TRUE`, it must be one of `c("standard","fe", "shortest", "coauthors", "degree")`. See details.
`z`	numeric vector. It specifies the source of heterogeneity for peer effects when `hypothesis` is equal to `"het"`, `"het_l"`, or `"het_r"`. Alternatively, it specifies the groups into which the network should be partitioned when `hypothesis` is equal to `"par"`, `"par_split_with"`, `"par_split_btw"`, or `"par_split_with_btw"`. See details.
`formula_first_step`	an optional object of class `formula`. If provided, it is used to implement the first step of the estimation when `endogeneity = TRUE`. The name of the dependent variable must be the same as the one used in `formula`.
`exclusion_restriction`	an object of class `Matrix` representing the exogenous matrix used to instrument the endogenous social network, if `endogeneity = TRUE`. Row and column names must be specified and match the order of the observations in `data`.
`start.val`	an optional list containing the starting values for the estimations. Object names must match the names provided in `formula`. It is also required to specify the value of both the constant and the decay parameter(s). See details.
`to_weight`	an optional vector of weights to be used in the fitting process to indicate that different observations have different variances. Should be `NULL` or a numeric vector. If non-`NULL`, it can be used to fit a weighted non-linear least squares (`estimation = "NLLS"`).
`time_fixed_effect`	an optional string. It indicates the name of the time index used in formula. It is used for models with longitudinal data.
`ind_fixed_effect`	an optional string. Default is `NULL`. It indicates the name of the individual index contained in the data. If provided, individual fixed effects are automatically added to the `formula` of the main equation. If `endogeneity = TRUE`, the field `first_step` is overridden and automatically set equal to `"fe"`. It is used for models with longitudinal data. Note that the inclusion of individual fixed effects is still in beta. When `estimation = "MLE"`, `net_dep` is guaranteed to work only if `hypothesis = "lim"`.
`mle_controls`	a list allowing the user to set upper and lower bounds for control variables in MLE estimation and the variance for the ML estimator. See details.
`kappa`	a normalization level, equal to 1 by default, used in MLE estimation.
`delta`	Default is `NULL`. To be used when `estimation = "NLLS"`. It must be a number between zero (included) and one (excluded). When used, `econet` performs a constrained NLLS estimation. In this case, the estimated peer effect parameter, taken in absolute value, is forced to be between the spectral radius of `G` and its opposite value. Specifically, `delta` is a penalizing factor that decreases the goodness of fit of the NLLS estimation when the peer effect parameter approaches one of the two bounds. Note that very high values of `delta` may prevent the NLLS estimation from converging.
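
The ordering and naming requirements above can be checked before calling `net_dep`. The following is a minimal base-R sketch on made-up inputs: the objects `toy` and `G_toy`, the column names `id` and `year`, and all values are assumptions for illustration, and a plain `matrix` stands in for the network. It sorts the data by time period and then by individual, checks that the names of the network follow the same order (assuming one network shared by the individuals observed in a period), and computes the spectral radius of the network, the quantity mentioned in the description of `delta`.

```r
# Hypothetical toy inputs (not shipped with econet)
toy <- data.frame(id   = c("b", "a", "c", "a", "c", "b"),
                  year = c(2, 1, 1, 2, 2, 1),
                  y    = c(0.4, 1.2, 0.7, 0.9, 1.1, 0.3),
                  stringsAsFactors = FALSE)
G_toy <- matrix(c(0, 1, 1,
                  1, 0, 0,
                  1, 0, 0), nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Order observations by time period, then by individual
toy <- toy[order(toy$year, toy$id), ]

# Row and column names of the network should agree with each other and
# with the order of individuals (assumption: checked on the first period)
ids_first_period <- toy$id[toy$year == min(toy$year)]
names_ok <- identical(rownames(G_toy), colnames(G_toy)) &&
  identical(rownames(G_toy), ids_first_period)

# Spectral radius: largest eigenvalue of the network in absolute value
spectral_radius <- max(Mod(eigen(G_toy, only.values = TRUE)$values))
```

If `names_ok` is `FALSE`, reordering the rows and columns of the network by the individual identifier is usually the simplest remedy.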

Details

An agent's parameter-dependent centrality is obtained as a function of:

the agent's characteristics and the performance of its socially connected peers, as in Battaglini, Leone Sciabolazza, Patacchini (2020), if model = "model_B";
the performance of its socially connected peers, as in Battaglini, Patacchini (2018), if model = "model_A".
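
For orientation, the textbook (unweighted) Katz-Bonacich centrality with decay parameter phi is b = (I - phi G)^{-1} 1, which is well defined when phi is smaller than the inverse of the spectral radius of G. The base-R sketch below evaluates that formula on a made-up three-node network. It is not the code that `net_dep` runs internally; the network `G_toy` and the value of `phi` are assumptions, and the weighted versions returned by the package depend on the fitted model.

```r
# Hypothetical 3-node network: a is linked to b and c
G_toy <- matrix(c(0, 1, 1,
                  1, 0, 0,
                  1, 0, 0), nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
phi <- 0.5  # assumed decay parameter, below 1 / sqrt(2)

# b = (I - phi * G)^{-1} %*% 1, solved as a linear system
n <- nrow(G_toy)
katz_bonacich <- drop(solve(diag(n) - phi * G_toy, rep(1, n)))
# a = 4, b = 3, c = 3: the hub a is the most central node
```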

Peer effects are assumed to be homogeneous if hypothesis = "lim". They are assumed to be heterogeneous by setting:

hypothesis = "het", when peers' performance is susceptible to agent's characteristics and model = "model_A".
hypothesis = "het_l", when peers' performance is susceptible to agent's characteristics and model = "model_B".
hypothesis = "het_r", when agent's performance is susceptible to peers' characteristics and model = "model_B".
hypothesis = "par", when model = "model_B", if the network is formed by interactions between and within two different groups.
hypothesis = "par_split_with", when model = "model_B", if the network is formed by interactions between and within two different groups, and interactions within each group are different from the other.
hypothesis = "par_split_btw", when model = "model_B", if the network is formed by interactions between and within two different groups, and interactions between groups are different according to their direction.
hypothesis = "par_split_with_btw", when model = "model_B", if the network is formed by interactions between and within two different groups, interactions within each group are different from the other, and interactions between groups are different according to their direction.

When hypothesis is equal to "het", "het_l", or "het_r", the argument z is used to specify the source of heterogeneity, i.e. the attribute affecting the ability of the agent to influence or be influenced by peers. When hypothesis is equal to "par", "par_split_with", "par_split_btw", or "par_split_with_btw", the argument z is used to partition observations into two groups: e.g. the generic element i of vector z takes the value 1 if agent i is a member of the first group, and it takes the value 2 otherwise.
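
To make the partition idea concrete, the sketch below splits a made-up network into its within-group and between-group links using a grouping vector coded 1 and 2 as described above. The objects `G_toy`, `same_group`, `G_within`, and `G_between` are hypothetical names; how `net_dep` handles the partition internally is not shown here.

```r
# Hypothetical network and group labels (1 = first group, 2 = second group)
G_toy <- matrix(c(0, 1, 1, 0,
                  1, 0, 0, 1,
                  1, 0, 0, 1,
                  0, 1, 1, 0), nrow = 4, byrow = TRUE)
z <- c(1, 1, 2, 2)

# same_group[i, j] is TRUE when agents i and j belong to the same group
same_group <- outer(z, z, "==")

# Split the links into within-group and between-group components
G_within  <- G_toy * same_group
G_between <- G_toy * outer(z, z, "!=")
```

The two components add back up to the original network, so every link is classified as either within or between groups.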

If endogeneity = TRUE, a two-step estimation is implemented to control for network endogeneity. The argument first_step is used to control for the specification of the first-step model, e.g.:

first_step = "standard" is used when agents' connections are predicted by the differences in their characteristics (i.e. those on the right-hand side of formula) and by an exclusion_restriction, i.e. their connections in a different network.
first_step = "fe" adds individual fixed effects to the standard model, as in Graham (2017).
first_step = "shortest" adds to the standard model the shortest distance between i and j, excluding the link between i and j itself, as in Fafchamps et al. (2010).
first_step = "coauthors" adds to the standard model the number of shared connections between i and j, as in Graham (2015).
first_step = "degree" adds to the standard model the difference in the degree centrality of i and j.
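
Two of the dyadic covariates mentioned above are easy to compute by hand. The following base-R sketch works on a made-up undirected, binary network; `G_toy`, `shared`, and `degree_diff` are hypothetical names, and taking the absolute value of the degree difference is an assumption, since the text does not say whether the difference is signed. It is meant only to illustrate the quantities, not to reproduce the first-step regression of `net_dep`.

```r
# Hypothetical undirected, binary network with links 1-2, 1-3, 2-3, 3-4
G_toy <- matrix(c(0, 1, 1, 0,
                  1, 0, 1, 0,
                  1, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Number of shared connections between i and j:
# entry (i, j) of G %*% G counts the agents linked to both i and j
shared <- G_toy %*% G_toy
diag(shared) <- 0

# Difference in degree centrality between i and j (absolute value assumed)
degree <- rowSums(G_toy)
degree_diff <- abs(outer(degree, degree, "-"))
```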

The argument start.val is used to specify starting estimates. This can be done with a named list. If a factor is present, a value for each treatment contrast must be provided. Labels of treatment contrasts must be assigned following R model design standards: e.g., a number is appended to contrast names as in contrasts().
The starting value referring to the intercept (constant) must be labelled as "alpha". The label(s) for decay parameter(s) must be:

"phi", if hypothesis="lim" or hypothesis="het"
"theta_0","theta_1", if hypothesis="het_l"
"eta_0","eta_1", if hypothesis="het_r"
"phi_within","phi_between", if hypothesis="par"
"phi_within_0","phi_within_1","phi_between", if hypothesis="par_split_with"
"phi_within","phi_between_0","phi_between_1", if hypothesis="par_split_btw"
"phi_within_0","phi_within_1","phi_between_0","phi_between_1", if hypothesis="par_split_with_btw"

The interaction term when hypothesis="het" must be labelled "gamma". The label to be used for unobservables when endogeneity = TRUE is "unobservables". When estimation = "MLE", it is required to set also the starting value for the variance of the ML estimator. This should be labelled as "sigma".
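
The labels of treatment contrasts can be read off the design matrix rather than typed by hand. The sketch below does this for a made-up data set; `toy`, `contrast_labels`, and `starting_toy` are hypothetical names, the numeric values are arbitrary placeholders, and the `beta_` prefix follows the convention used in the Examples section of this page (which passes a named numeric vector rather than a list).

```r
# Hypothetical data with a two-level factor
toy <- data.frame(y = c(1.0, 0.5, 0.8, 1.3),
                  gender = factor(c(0, 1, 1, 0)))

# Column names of the design matrix give the treatment-contrast labels;
# the first column is the intercept and is dropped
contrast_labels <- colnames(model.matrix(y ~ gender, data = toy))[-1]

# Named list of starting values for hypothesis = "lim":
# intercept ("alpha"), one entry per contrast, and the decay parameter ("phi")
starting_toy <- c(list(alpha = 0.1),
                  setNames(as.list(rep(0, length(contrast_labels))),
                           paste0("beta_", contrast_labels)),
                  list(phi = 0.2))
names(starting_toy)
```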
The argument mle_controls takes a list of two objects. The first is a named numeric vector used to set upper and lower bounds for control variables. The second object is a vector used to set upper (first value) and lower (second value) bounds for the variance of the Maximum Likelihood estimator.
Names in mle_controls must be equal to those used in start.val. For additional details, see the vignette (doi:10.18637/jss.v102.i08).

Value

A list of three objects: i) estimates of the main regression; ii) the vector of agents' parameter-dependent centrality; iii) estimates of the first-step regression (if endogeneity = TRUE).

References

Battaglini M., E. Patacchini (2018), "Influencing Connected Legislators," Journal of Political Economy, 126(6): 2277-2322.
Battaglini M., V. Leone Sciabolazza, E. Patacchini (2020), "Effectiveness of Connected Legislators," American Journal of Political Science, forthcoming.
Battaglini M., V. Leone Sciabolazza, E. Patacchini, S. Peng (2020), "Econet: An R package for the Estimation of parameter-dependent centrality measures", Mimeo.
Fafchamps M., M. J. Leij, S. Goyal (2010), "Matching and network effects," Journal of the European Economic Association, 8(1): 203-231.
Graham B. (2015), "Methods of identification in social networks," Annual Review of Economics, 7: 465-485.
Graham B. (2017), "An econometric model of network formation with degree heterogeneity," Econometrica, 85(4): 1033-1063.

Examples


# Model A

# Load data
data("a_db_alumni")
data("a_G_alumni_111")
db_model_A <- a_db_alumni
G_model_A <- a_G_alumni_111
are_factors <- c("party", "gender", "nchair", "isolate")
db_model_A[are_factors] <- lapply(db_model_A[are_factors], factor)
db_model_A$PAC <- db_model_A$PAC/1e+06

# Specify formula
f_model_A <- formula("PAC ~ gender + party + nchair + isolate")

# Specify starting values
starting <- c(alpha = 0.47325,
              beta_gender1 = -0.26991,
              beta_party1 = 0.55883,
              beta_nchair1 = -0.17409,
              beta_isolate1 = 0.18813,
              phi = 0.21440)

# Fit Linear-in-means model
lim_model_A <- net_dep(formula = f_model_A, data = db_model_A,
                       G = G_model_A, model = "model_A", estimation = "NLLS",
                       hypothesis = "lim", start.val = starting)

summary(lim_model_A)
lim_model_A$centrality

# Test Heterogeneity

# Heterogeneous factor
z <- as.numeric(as.character(db_model_A$gender))

# Specify formula
f_het_model_A <- formula("PAC ~ party + nchair + isolate")

# Specify starting values
starting <- c(alpha = 0.44835,
              beta_party1 = 0.56004,
              beta_nchair1 = -0.16349,
              beta_isolate1 = 0.21011,
              beta_z = -0.26015,
              phi = 0.34212,
              gamma = -0.49960)

# Fit model
het_model_A <- net_dep(formula = f_het_model_A, data = db_model_A,
                       G = G_model_A, model = "model_A", estimation = "NLLS",
                       hypothesis = "het", z = z, start.val = starting)

summary(het_model_A)
het_model_A$centrality

# Model B

# Load data
data("db_cosponsor")
data("G_cosponsor_111")
data("G_alumni_111")
db_model_B <- db_cosponsor
G_model_B <- G_cosponsor_111
G_exclusion_restriction <- G_alumni_111
are_factors <- c("party", "gender", "nchair")
db_model_B[are_factors] <- lapply(db_model_B[are_factors], factor)

# Specify formula
f_model_B <- formula("les ~ gender + party + nchair")

# Specify starting values
starting <- c(alpha = 0.23952,
              beta_gender1 = -0.22024,
              beta_party1 = 0.42947,
              beta_nchair1 = 3.09615,
              phi = 0.40038,
              unobservables = 0.07714)

# Fit Linear-in-means model
lim_model_B <- net_dep(formula = f_model_B, data = db_model_B,
                       G = G_model_B, model = "model_B", estimation = "NLLS",
                       hypothesis = "lim", endogeneity = TRUE,
                       correction = "heckman", first_step = "standard",
                       exclusion_restriction = G_exclusion_restriction,
                       start.val = starting)

summary(lim_model_B)
lim_model_B$centrality
summary(lim_model_B, print = "first.step")

# Test Heterogeneity

# Heterogeneous factor (node-level)
z <- as.numeric(as.character(db_model_B$gender))

# Specify formula
f_het_model_B <- formula("les ~ party + nchair")

# Specify starting values
starting <- c(alpha = 0.23952,
              beta_party1 = 0.42947,
              beta_nchair1 = 3.09615,
              beta_z = -0.12749,
              theta_0 = 0.42588,
              theta_1 = 0.08007)

# Fit model
het_model_B_l <- net_dep(formula = f_het_model_B,
                         data = db_model_B,
                         G = G_model_B, model = "model_B", estimation = "NLLS",
                         hypothesis = "het_l", z = z, start.val = starting)

# Print results
summary(het_model_B_l)
het_model_B_l$centrality

# Specify starting values
starting <- c(alpha = 0.04717,
              beta_party1 = 0.51713,
              beta_nchair1 = 3.12683,
              beta_z = 0.01975,
              eta_0 = 1.02789,
              eta_1 = 2.71825)

# Fit model
het_model_B_r <- net_dep(formula = f_het_model_B,
                         data = db_model_B,
                         G = G_model_B, model = "model_B", estimation = "NLLS",
                         hypothesis = "het_r", z = z, start.val = starting)

# Print results
summary(het_model_B_r)
het_model_B_r$centrality

# Heterogeneous factor (edge-level)
z <- as.numeric(as.character(db_model_B$party))

# Specify starting values
starting <- c(alpha = 0.242486,
              beta_gender1 = -0.229895,
              beta_party1 = 0.42848,
              beta_nchair1 = 3.0959,
              phi_within  = 0.396371,
              phi_between = 0.414135)

# Fit model
party_model_B <- net_dep(formula = f_model_B, data = db_model_B,
                         G = G_model_B, model = "model_B",
                         estimation = "NLLS", hypothesis = "par",
                         z = z, start.val = starting)

# Print results
summary(party_model_B)
party_model_B$centrality

# WARNING: this toy example is provided only for runtime execution.
# Please refer to the previous examples for sensible calculations.
data("db_alumni_test")
data("G_model_A_test")
db_model_A <- db_alumni_test
G_model_A <- G_model_A_test
f_model_A <- formula("les ~ dw")
lim_model_A_test <- net_dep(formula = f_model_A, data = db_model_A,
                            G = G_model_A, model = "model_A",
                            estimation = "NLLS", hypothesis = "lim",
                            start.val = c(alpha = 0.09030594,
                                          beta_dw = 1.21401940,
                                          phi = 1.47140647))
summary(lim_model_A_test)

econet documentation built on Sept. 11, 2024, 6:46 p.m.