net_dep: Implement a number of modifications to the linear-in-means...


net_dep R Documentation

Implement a number of modifications to the linear-in-means model to obtain different weighted versions of Katz-Bonacich centrality.

Description

Implement a number of modifications to the linear-in-means model to obtain different weighted versions of Katz-Bonacich centrality.

Usage

net_dep(
  formula = formula(),
  data = list(),
  G = list(),
  model = c("model_A", "model_B"),
  estimation = c("NLLS", "MLE"),
  hypothesis = c("lim", "het", "het_l", "het_r", "par", "par_split_with",
    "par_split_btw", "par_split_with_btw"),
  endogeneity = FALSE,
  correction = NULL,
  first_step = NULL,
  z = NULL,
  formula_first_step = NULL,
  exclusion_restriction = NULL,
  start.val = NULL,
  to_weight = NULL,
  time_fixed_effect = NULL,
  ind_fixed_effect = NULL,
  mle_controls = NULL,
  kappa = NULL,
  delta = NULL
)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted. The constant (i.e. intercept) and the autoregressive parameter need not be specified.

data

an object of class data.frame containing the variables in the model. If data are longitudinal, observations must be ordered by time period and then by individual.
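
The ordering requirement for longitudinal data can be sketched in base R as follows. The data frame panel and its columns id, time, and y are hypothetical names chosen for illustration; they are not part of econet.

```r
# Hypothetical longitudinal data: 3 individuals observed in 2 periods,
# deliberately stored in scrambled order.
panel <- data.frame(
  id   = c("b", "a", "c", "a", "c", "b"),
  time = c(2, 1, 1, 2, 2, 1),
  y    = c(0.4, 1.2, 0.7, 1.5, 0.9, 0.3),
  stringsAsFactors = FALSE
)

# Sort by time period first, then by individual within each period.
panel_sorted <- panel[order(panel$time, panel$id), ]
rownames(panel_sorted) <- NULL

print(panel_sorted)
```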

G

an object of class Matrix representing the social network. Row and column names must be specified and match the order of the observations in data.
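
A minimal sketch of aligning the row and column names of a network with the order of the observations. The objects dat and G_raw are hypothetical, and a plain base R matrix is used here as a simplifying assumption.

```r
# Hypothetical agents, in the order they appear in the data.
dat <- data.frame(id = c("a", "b", "c"), y = c(1.2, 0.3, 0.7),
                  stringsAsFactors = FALSE)

# Hypothetical adjacency matrix whose rows/columns are in a different order.
G_raw <- matrix(c(0, 1, 1,
                  1, 0, 0,
                  1, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("c", "a", "b"), c("c", "a", "b")))

# Reorder rows and columns by name so that they follow the order of dat$id.
G_aligned <- G_raw[dat$id, dat$id]

# Check: rows and columns agree with each other and with the data.
stopifnot(identical(rownames(G_aligned), colnames(G_aligned)),
          identical(rownames(G_aligned), dat$id))
```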

model

string. One of c("model_A","model_B"). See details.

estimation

string. One of c("NLLS","MLE"). They are used to implement, respectively, a non-linear least squares and a Maximum Likelihood estimator.

hypothesis

string. One of c("lim","het", "het_l", "het_r", "par", "par_split_with", "par_split_btw", "par_split_with_btw"). See details.

endogeneity

logical. Default is FALSE. If TRUE, net_dep implements a two-step correction procedure to control for the endogeneity of the network.

correction

Default is NULL. If endogeneity = TRUE, it must specify whether the main regression should use an instrumental variable ("iv") or a Heckman ("heckman") approach.

first_step

Default is NULL. If endogeneity = TRUE, it must be set to one of c("standard","fe", "shortest", "coauthors", "degree"). See details.

z

numeric vector. It specifies the source of heterogeneity for peer effects when hypothesis is equal to "het", "het_l", or "het_r". Alternatively, it specifies the groups into which the network should be partitioned when hypothesis is equal to "par", "par_split_with", "par_split_btw", or "par_split_with_btw". See details.

formula_first_step

an optional object of class formula. If provided, it is used to implement the first step of the estimation when endogeneity = TRUE. The name of the dependent variable must be the same as the one used in the argument formula.

exclusion_restriction

an object of class Matrix representing the exogenous matrix used to instrument the endogenous social network, if endogeneity = TRUE. Row and column names must be specified and match the order of the observations in data.

start.val

an optional list containing the starting values for the estimations. Object names must match the names provided in formula. It is also required to specify the value of both the constant and the decay parameter(s). See details.

to_weight

an optional vector of weights to be used in the fitting process to indicate that different observations have different variances. Should be NULL or a numeric vector. If non-NULL, it can be used to fit a weighted non-linear least squares (estimation = "NLLS").

time_fixed_effect

an optional string. It indicates the name of the time index used in formula. It is used for models with longitudinal data.

ind_fixed_effect

an optional string. Default is NULL. It indicates the name of the individual index contained in the data. If provided, individual fixed effects are automatically added to the formula of the main equation. If endogeneity = TRUE, the argument first_step is overridden and automatically set equal to "fe". It is used for models with longitudinal data. Note that the inclusion of individual fixed effects is still in beta. When estimation == "MLE", net_dep is guaranteed to work only if hypothesis == "lim".

mle_controls

a list allowing the user to set upper and lower bounds for the control variables in MLE estimation and for the variance of the ML estimator. See details.

kappa

a normalization level used in MLE estimation. The default is 1.

delta

Default is NULL. To be used when estimation = "NLLS". It has to be a number between zero (included) and one (excluded). When used, econet performs a constrained NLLS estimation. In this case, the estimated peer effect parameter, taken in absolute value, is forced to be between the spectral radius of G and its opposite value. Specifically, delta is a penalizing factor, decreasing the goodness of fit of the NLLS estimation, when the peer effect parameter approaches one of the two bounds. Observe that very high values of delta may cause NLLS estimation not to converge.
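
The description of delta refers to the spectral radius of G. A minimal base R sketch of how that quantity can be computed is given below; the small network G is hypothetical, and this is not necessarily how net_dep computes it internally.

```r
# Hypothetical 3-node undirected network (path a - b - c).
G <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Spectral radius: the largest modulus among the eigenvalues of G.
spectral_radius <- max(Mod(eigen(G, only.values = TRUE)$values))

print(spectral_radius)
```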

Details

Agent's parameter-dependent centrality is obtained as a function of

  • the agent's characteristics and the performance of its socially connected peers, as in Battaglini, Leone Sciabolazza, Patacchini (2020), if model = "model_B";

  • the performance of its socially connected peers, as in Battaglini, Patacchini (2018), if model = "model_A".
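
For orientation, the textbook unweighted Katz-Bonacich centrality, b = (I - phi G)^(-1) 1, can be sketched in base R as below. This is only the standard definition, not the weighted versions estimated by net_dep; the network G and the decay value phi = 0.2 are hypothetical.

```r
# Hypothetical 3-node undirected network (path a - b - c).
G <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Hypothetical decay parameter; it is kept below the inverse of the
# spectral radius of G so that (I - phi * G) is invertible.
phi <- 0.2
n <- nrow(G)

# Solve (I - phi * G) b = 1 rather than forming the inverse explicitly.
centrality <- solve(diag(n) - phi * G, rep(1, n))
names(centrality) <- rownames(G)

print(centrality)
```

In this toy network the middle node b has two neighbours and therefore receives the highest centrality.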

Peer effects are assumed to be homogeneous if hypothesis = "lim". They are assumed to be heterogeneous by setting:

  • hypothesis = "het", when peers' performance is susceptible to agent's characteristics and model = "model_A".

  • hypothesis = "het_l", when peers' performance is susceptible to agent's characteristics and model = "model_B".

  • hypothesis = "het_r", when agent's performance is susceptible to peers' characteristics and model = "model_B".

  • hypothesis = "par", when model = "model_B", if the network is formed by interactions between and within two different groups.

  • hypothesis = "par_split_with", when model = "model_B", if the network is formed by interactions between and within two different groups, and interactions within each group are different from the other.

  • hypothesis = "par_split_btw", when model = "model_B", if the network is formed by interactions between and within two different groups, and interactions between groups are different according to their direction.

  • hypothesis = "par_split_with_btw", when model = "model_B", if the network is formed by interactions between and within two different groups, interactions within each group are different from the other, and interactions between groups are different according to their direction.

When hypothesis is equal to "het", "het_l", or "het_r", the argument z is used to specify the source of heterogeneity, i.e. the attribute affecting the ability of the agent to influence or be influenced by peers. When hypothesis is equal to "par", "par_split_with", "par_split_btw", or "par_split_with_btw", the argument z is used to partition observations into two groups: e.g. the generic element i of vector z takes the value 1 if agent i is a member of the first group, and 2 otherwise.
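
The partition induced by z can be illustrated by splitting a network into within-group and between-group links. This is a sketch under stated assumptions: the network G and the membership vector z are hypothetical, and the decomposition is shown only to make the idea concrete, not to reproduce net_dep internals.

```r
# Hypothetical 4-node undirected network with links a-b, b-c, c-d, a-d.
ids <- c("a", "b", "c", "d")
G <- matrix(c(0, 1, 0, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 0, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical group membership: 1 for the first group, 2 otherwise.
z <- c(1, 1, 2, 2)

# TRUE where the two agents belong to the same group.
same_group <- outer(z, z, "==")

# Links within groups and links between groups.
G_within  <- G * same_group
G_between <- G * (!same_group)

# The two pieces add back up to the original network.
stopifnot(all(G_within + G_between == G))
```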

If endogeneity = TRUE, a two-step estimation is implemented to control for network endogeneity. The argument first_step is used to control for the specification of the first-step model, e.g.:

  • first_step = "standard" is used when agents' connections are predicted by the differences in their characteristics (i.e. those on the right-hand side of formula) and by an exclusion_restriction, i.e. their connections in a different network.

  • first_step = "fe" adds individual fixed effects to the standard model, as in Graham (2017).

  • first_step = "shortest" adds to the standard model the shortest distance between i and j, excluding the link between i and j itself, as in Fafchamps et al. (2010).

  • first_step = "coauthor" adds to the standard model the number of shared connections between i and j, as in Graham (2015).

  • first_step = "degree" adds to the standard model the difference in the degree centrality of i and j.
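
Some of the pairwise quantities named in the list above can be sketched in base R. This is an illustration only: the network G and the characteristic x are hypothetical, the use of absolute differences is an assumption, and the code does not reproduce the first-step regression run by net_dep.

```r
# Hypothetical 4-node undirected network with links a-b, a-c, b-c, c-d.
ids <- c("a", "b", "c", "d")
G <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical individual characteristic.
x <- c(a = 0.2, b = 0.9, c = 0.5, d = 0.4)

# Pairwise difference in characteristics (absolute value is an assumption).
diff_x <- abs(outer(x, x, "-"))

# Number of shared connections: for a binary network, entry (i, j) of
# G %*% G counts the agents linked to both i and j.
shared <- G %*% G
diag(shared) <- 0

# Pairwise difference in degree centrality.
degree <- rowSums(G)
diff_degree <- abs(outer(degree, degree, "-"))
```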

The argument start.val is used to specify starting estimates. This can be done with a named list. If a factor is present, a value for each treatment contrast must be provided. Labels of treatment contrasts must be assigned following R model design standards: e.g., a number is appended to contrast names as in contrasts().
The starting value referring to the intercept (constant) must be labelled as "alpha". The label(s) for decay parameter(s) must be:

  • "phi", if hypothesis="lim" or hypothesis="het"

  • "theta_0","theta_1", if hypothesis="het_l"

  • "eta_0","eta_1", if hypothesis="het_r"

  • "phi_within","phi_between", if hypothesis="par"

  • "phi_within_0","phi_within_1","phi_between", if hypothesis="par_split_with"

  • "phi_within","phi_between_0","phi_between_1", if hypothesis="par_split_btw"

  • "phi_within_0","phi_within_1","phi_between_0","phi_between_1", if hypothesis="par_split_with_btw"

The interaction term when hypothesis="het" must be labelled "gamma". The label to be used for unobservables when endogeneity = TRUE is "unobservables". When estimation = "MLE", it is required to set also the starting value for the variance of the ML estimator. This should be labelled as "sigma".
The argument mle_controls takes a list of two objects. The first is a named numeric vector used to set upper and lower bounds for control variables. The second object is a vector used to set upper (first value) and lower (second value) bounds for the variance of the Maximum Likelihood estimator.
Names in mle_controls must be equal to those used in start.val. For additional details, see the vignette (doi:10.18637/jss.v102.i08).
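
As a hedged illustration of the labelling rules for start.val, treatment contrast names can be inspected with model.matrix() and turned into labels. The "beta_" prefix follows the pattern used in the Examples section; the data frame dat and the placeholder value 0.1 are hypothetical.

```r
# Hypothetical data with two factors.
dat <- data.frame(
  y      = c(1.0, 2.0, 1.5, 0.5),
  gender = factor(c(0, 1, 0, 1)),
  party  = factor(c(1, 1, 0, 0))
)

# Column names of the design matrix show the treatment contrast names.
X <- model.matrix(y ~ gender + party, data = dat)
contrast_names <- setdiff(colnames(X), "(Intercept)")

# Labels: "alpha" for the constant, "beta_<contrast>" for the covariates,
# and "phi" for the decay parameter under hypothesis = "lim".
labels <- c("alpha", paste0("beta_", contrast_names), "phi")

# Placeholder starting values; 0.1 is arbitrary.
starting <- setNames(rep(0.1, length(labels)), labels)

print(names(starting))
```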

Value

A list of three objects: i) Estimates of the main regression; ii) The vector of agents' parameter-dependent centrality; iii) Estimates of the first-step regression (if endogeneity = TRUE)

References

Battaglini M., E. Patacchini (2018), "Influencing Connected Legislators," Journal of Political Economy, 126(6): 2277-2322.
Battaglini M., V. Leone Sciabolazza, E. Patacchini (2020), "Effectiveness of Connected Legislators," American Journal of Political Science, forthcoming.
Battaglini M., V. Leone Sciabolazza, E. Patacchini, S. Peng (2020), "Econet: An R package for the Estimation of parameter-dependent centrality measures," Mimeo.
Fafchamps M., M. J. Leij, S. Goyal (2010), "Matching and network effects," Journal of the European Economic Association, 8(1): 203-231.
Graham B. (2015), "Methods of identification in social networks," Annual Review of Economics, 7: 465-485.
Graham B. (2017), "An econometric model of network formation with degree heterogeneity," Econometrica, 85(4): 1033-1063.

Examples


# Model A

# Load data
data("a_db_alumni")
data("a_G_alumni_111")
db_model_A <- a_db_alumni
G_model_A <- a_G_alumni_111
are_factors <- c("party", "gender", "nchair", "isolate")
db_model_A[are_factors] <- lapply(db_model_A[are_factors], factor)
db_model_A$PAC <- db_model_A$PAC/1e+06

# Specify formula
f_model_A <- formula("PAC ~ gender + party + nchair + isolate")

# Specify starting values
starting <- c(alpha = 0.47325,
              beta_gender1 = -0.26991,
              beta_party1 = 0.55883,
              beta_nchair1 = -0.17409,
              beta_isolate1 = 0.18813,
              phi = 0.21440)

# Fit Linear-in-means model
lim_model_A <- net_dep(formula = f_model_A, data = db_model_A,
                       G = G_model_A, model = "model_A", estimation = "NLLS",
                       hypothesis = "lim", start.val = starting)

summary(lim_model_A)
lim_model_A$centrality

# Test Heterogeneity

# Heterogeneous factor
z <- as.numeric(as.character(db_model_A$gender))

# Specify formula
f_het_model_A <- formula("PAC ~ party + nchair + isolate")

# Specify starting values
starting <- c(alpha = 0.44835,
              beta_party1 = 0.56004,
              beta_nchair1 = -0.16349,
              beta_isolate1 = 0.21011,
              beta_z = -0.26015,
              phi = 0.34212,
              gamma = -0.49960)

# Fit model
het_model_A <- net_dep(formula = f_het_model_A, data = db_model_A,
                       G = G_model_A, model = "model_A", estimation = "NLLS",
                       hypothesis = "het", z = z, start.val = starting)

summary(het_model_A)
het_model_A$centrality

# Model B

# Load data
data("db_cosponsor")
data("G_alumni_111")
db_model_B <- db_cosponsor
G_model_B <- G_cosponsor_111
G_exclusion_restriction <- G_alumni_111
are_factors <- c("party", "gender", "nchair")
db_model_B[are_factors] <- lapply(db_model_B[are_factors], factor)

# Specify formula
f_model_B <- formula("les ~ gender + party + nchair")

# Specify starting values
starting <- c(alpha = 0.23952,
              beta_gender1 = -0.22024,
              beta_party1 = 0.42947,
              beta_nchair1 = 3.09615,
              phi = 0.40038,
              unobservables = 0.07714)

# Fit Linear-in-means model
lim_model_B <- net_dep(formula = f_model_B, data = db_model_B,
                       G = G_model_B, model = "model_B", estimation = "NLLS",
                       hypothesis = "lim", endogeneity = TRUE,
                       correction = "heckman", first_step = "standard",
                       exclusion_restriction = G_exclusion_restriction,
                       start.val = starting)

summary(lim_model_B)
lim_model_B$centrality
summary(lim_model_B, print = "first.step")

# Test Heterogeneity

# Heterogeneous factor (node-level)
z <- as.numeric(as.character(db_model_B$gender))

# Specify formula
f_het_model_B <- formula("les ~ party + nchair")

# Specify starting values
starting <- c(alpha = 0.23952,
              beta_party1 = 0.42947,
              beta_nchair1 = 3.09615,
              beta_z = -0.12749,
              theta_0 = 0.42588,
              theta_1 = 0.08007)

# Fit model
het_model_B_l <- net_dep(formula = f_het_model_B,
                         data = db_model_B,
                         G = G_model_B, model = "model_B", estimation = "NLLS",
                         hypothesis = "het_l", z = z, start.val = starting)

# Store and print results
summary(het_model_B_l)
het_model_B_l$centrality

# Specify starting values
starting <- c(alpha = 0.04717,
              beta_party1 = 0.51713,
              beta_nchair1 = 3.12683,
              beta_z = 0.01975,
              eta_0 = 1.02789,
              eta_1 = 2.71825)

# Fit model
het_model_B_r <- net_dep(formula = f_het_model_B,
                         data = db_model_B,
                         G = G_model_B, model = "model_B", estimation = "NLLS",
                         hypothesis = "het_r", z = z, start.val = starting)

# Store and print results
summary(het_model_B_r)
het_model_B_r$centrality

# Heterogeneous factor (edge-level)
z <- as.numeric(as.character(db_model_B$party))

# Specify starting values
starting <- c(alpha = 0.242486,
              beta_gender1 = -0.229895,
              beta_party1 = 0.42848,
              beta_nchair1 = 3.0959,
              phi_within  = 0.396371,
              phi_between = 0.414135)

# Fit model
party_model_B <- net_dep(formula = f_model_B, data = db_model_B,
                         G = G_model_B, model = "model_B",
                         estimation = "NLLS", hypothesis = "par",
                         z = z, start.val = starting)

# Store and print results
summary(party_model_B)
party_model_B$centrality

# WARNING: this toy example is provided only to keep execution time short.
# Please refer to the previous examples for sensible calculations.
data("db_alumni_test")
data("G_model_A_test")
db_model_A <- db_alumni_test
G_model_A <- G_model_A_test
f_model_A <- formula("les ~ dw")
lim_model_A_test <- net_dep(formula = f_model_A, data = db_model_A,
                       G = G_model_A, model = "model_A", estimation = "NLLS",
                       hypothesis = "lim", start.val = c(alpha = 0.09030594,
                                                         beta_dw = 1.21401940,
                                                         phi = 1.47140647))
summary(lim_model_A_test)

econet documentation built on April 28, 2022, 1:07 a.m.