
View source: R/fakeunion_functions.r

poolChoiceModel    R Documentation

Run fixed effects conditional logit model on multiple datasets and pool results

Description

Given a list of datasets of real and counterfactual unions and a model formula, this function fits a discrete choice (conditional logit) model to each dataset and pools the results, accounting for both within and between variance when calculating the standard errors.

Usage

poolChoiceModel(formula, datasets, method = "exact", parallel = FALSE)

Arguments

formula

An object of class formula specifying the clogit model to fit to each dataset.

datasets

A list of datasets, each produced by the generateCouples function.

method

A character string giving the estimation method passed to clogit: one of "exact" (the default), "approximate", "efron", or "breslow".

parallel

A logical value indicating whether to use the parallel package to speed up estimation by fitting the models on multiple cores.

Details

Because the dataset of real and counterfactual unions is created by sampling from all possible alternate partners, model results will vary from one sampled dataset to another. It can therefore be useful to generate multiple datasets and pool model results across them in the same way as in multiple imputation, adjusting the standard errors of the estimates for the variance in coefficient estimates across datasets.

This convenience function performs the pooling and returns properly adjusted results. The reported coefficient for each parameter is the mean of its estimates across all datasets, and the reported variance V for each parameter is given by:

V=W+(1+1/m)B

where m is the number of datasets, W is the within variance, estimated as the square of the mean standard error across datasets, and B is the between variance, estimated as the variance of the coefficient estimates across datasets.
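As a worked illustration of this formula, the R snippet below pools five estimates of a single coefficient. The numbers are invented for the example, and the snippet assumes the between variance is the sample variance (denominator m - 1) returned by R's var(); the documentation does not say which denominator the function itself uses.

```r
# Made-up estimates and standard errors for ONE parameter,
# one value per dataset (m = 5). Illustrative numbers only.
b  <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se <- c(0.10, 0.11, 0.09, 0.10, 0.10)

m <- length(b)                # number of datasets
b.pool <- mean(b)             # pooled coefficient: mean across datasets
W <- mean(se)^2               # within variance: square of the mean SE
B <- var(b)                   # between variance (assumed: sample variance)
V <- W + (1 + 1/m) * B        # total variance V = W + (1 + 1/m)B
se.pool <- sqrt(V)            # pooled standard error
```

Here W = 0.01 and B = 0.00145, so the between-dataset term adds 1.2 * 0.00145 = 0.00174 to the within variance.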

Models are estimated with the clogit function from the survival package, which must therefore be installed.
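The per-dataset estimation step can be sketched as follows. This is a hypothetical helper (fit_each is not part of fakeunion) showing one plausible way to fit one clogit model per dataset, optionally spreading the fits over cores with parallel::mclapply. It assumes each standard error is the square root of the corresponding diagonal element of the model's covariance matrix. Note that mclapply relies on forking, so on Windows it only runs with a single core.

```r
library(survival)  # clogit(), strata()
library(parallel)  # mclapply()

# Hypothetical helper, not the package's actual code: fit the same clogit
# model to every dataset and collect coefficients and standard errors as
# matrices with one row per dataset and one column per parameter.
fit_each <- function(formula, datasets, method = "exact", parallel = FALSE) {
  fit_one <- function(d) {
    model <- clogit(formula, data = d, method = method)
    list(b = coef(model), se = sqrt(diag(vcov(model))))
  }
  fits <- if (parallel) mclapply(datasets, fit_one) else lapply(datasets, fit_one)
  list(b  = do.call(rbind, lapply(fits, `[[`, "b")),
       se = do.call(rbind, lapply(fits, `[[`, "se")))
}
```

The two matrices returned here are exactly the inputs the pooling formula needs: column means and column variances give the within and between components.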

Value

A list containing the following components:

coefficients

A data.frame with the following columns:

  • b.pool: The average coefficient across datasets.

  • se.pool: The standard error combining within and between variance.

  • z.pool: The z-statistic, b.pool divided by se.pool.

  • pvalue.pool: The p-value for the test of the hypothesis that the coefficient is zero in the population.

  • within.var: The square of the mean standard error across datasets.

  • between.var: The variance of the coefficient across datasets.
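A minimal sketch of how these columns relate to each other, assuming the per-dataset estimates and standard errors are arranged as matrices with one row per dataset and one column per parameter. pool_coefficients is a hypothetical name, not a fakeunion function, and the p-value assumes a two-sided test against the standard normal distribution, which the documentation does not state explicitly.

```r
# Hypothetical helper, not the package's code: build a data.frame with the
# documented columns from matrices of estimates (b) and standard errors (se).
pool_coefficients <- function(b, se) {
  m <- nrow(b)                                   # number of datasets
  within.var  <- colMeans(se)^2                  # square of the mean SE
  between.var <- apply(b, 2, var)                # variance across datasets
  b.pool  <- colMeans(b)                         # mean coefficient
  se.pool <- sqrt(within.var + (1 + 1/m) * between.var)
  z.pool  <- b.pool / se.pool
  pvalue.pool <- 2 * pnorm(-abs(z.pool))         # assumed two-sided normal test
  data.frame(b.pool, se.pool, z.pool, pvalue.pool, within.var, between.var)
}
```

Parameter names carried on the columns of b become the row names of the result.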

deviance

A vector of deviances, one for each model.

bic

A vector of BIC statistics, one for each dataset, relative to the null model.

Examples

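library(fakeunion)
library(survival)  # provides clogit() and strata()

# Generate five datasets of real and counterfactual unions
markets <- replicate(5,
                     generateCouples(3, acs.couples,
                                     acs.malealters, acs.femalealters,
                                     "state", weight = "perwt", verbose = FALSE),
                     simplify = FALSE)

# Fit the same model to each dataset and pool the results
poolChoiceModel(choice ~ ageh + I(ageh^2) + I(ageh - agew) + I((ageh - agew)^2) +
                  strata(group), markets)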

AaronGullickson/fakeunion documentation built on Aug. 6, 2023, 7:19 p.m.