generateCouples: Generate Counterfactual Couples
In AaronGullickson/fakeunion: Create Dataset of Counterfactual and Actual Unions

View source: R/fakeunion_functions.r

generateCouples

R Documentation

Generate Counterfactual Couples

Description

This function samples alternate partners from a data.frame for each actual union in order to generate a set of counterfactual unions. The combined actual and counterfactual unions can be used in a conditional logit model to predict which characteristics are likely to lead to a union match.

Usage

generateCouples(
  n,
  actual,
  men,
  women,
  geo,
  id = "id",
  weight = NULL,
  keep = NULL,
  verbose = TRUE
)

Arguments

`n`	the number of counterfactual unions to create for each real union.
`actual`	a data.frame identifying the actual couples. This dataset should have the variable identified by `geo` and variables for husbands' and wives' ids that end in "h" and "w" respectively. It should also contain all the same variables ending in "h" and "w" that should be kept in the final analysis.
`men`	a data.frame object identifying all the potential "male" partners. This data.frame must include the variable identified by `geo` and it must also contain an id variable whose name starts with the string given by `id` and ends with "h". Any person-specific variable that should be kept in the results should end with an "h" (e.g. ageh, raceh). These variables should correspond exactly to variables in `actual` and should correspond to variables in `women` that end in "w".
`women`	a data.frame object identifying all the potential "female" partners. Its format should be the same as `men`, except variables should end in "w" rather than "h".
`geo`	a character string giving the name of the variable that identifies the clusters that alternate partners should be sampled from. This variable must be named the same way in `men`, `women`, and `actual`.
`id`	a character string giving part of the variable name to identify partners in `actual`, `men`, and `women`. In each of the datasets, the actual variable name should be this string with "h" or "w" appended for husbands and wives respectively (e.g. "idh" and "idw" with the default).
`weight`	a character string that identifies a weight variable for sampling of potential partners. If left `NULL`, partners will be sampled with equal probability. This weight variable must exist in both the `men` and `women` data.
`keep`	a vector of character strings identifying additional variables in `actual` that should be kept in the results.
`verbose`	if true, the function will report its progress.
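As an illustration of the naming scheme described above, the following sketch builds three toy data.frames in the expected layout. All values and the column names other than the "h"/"w" suffix convention (`state`, `perwt`, `hhwt`, the ages and races) are invented for illustration and are not taken from the package's example data.

```r
# Hypothetical pool of potential male partners: id and person-specific
# variables end in "h"; the cluster variable (here "state") has no suffix.
men <- data.frame(
  idh   = 1:4,
  state = c("WA", "WA", "OR", "OR"),
  ageh  = c(31, 45, 28, 52),
  raceh = c("White", "Black", "White", "Asian"),
  perwt = c(100, 80, 120, 90)
)

# Hypothetical pool of potential female partners: same layout, "w" suffix.
women <- data.frame(
  idw   = 1:4,
  state = c("WA", "WA", "OR", "OR"),
  agew  = c(29, 43, 30, 50),
  racew = c("White", "Black", "Asian", "White"),
  perwt = c(110, 70, 95, 105)
)

# Hypothetical actual couples: both the "h" and "w" variables, the cluster
# variable, and an extra variable (hhwt) that could be passed to `keep`.
actual <- data.frame(
  idh   = c(1, 3),
  idw   = c(1, 3),
  state = c("WA", "OR"),
  ageh  = c(31, 28),
  agew  = c(29, 30),
  raceh = c("White", "White"),
  racew = c("White", "Asian"),
  hhwt  = c(100, 120)
)

# The "h" and "w" variables in `actual` should match those in the pools.
vars_h <- grep("h$", names(actual), value = TRUE)
vars_w <- grep("w$", names(actual), value = TRUE)
```

With data in this shape, the cluster variable would be passed as `geo = "state"`, the sampling weight as `weight = "perwt"`, and the extra couple-level variable as `keep = "hhwt"`.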

Details

This program uses the wrswoR package to do fast weighted sampling without replacement. Large sets of alternate spouses and actual unions can be sampled relatively quickly, but processing time will begin to increase exponentially with very large datasets within each cluster.
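The idea of weighted sampling without replacement can be sketched with base R alone. This is only an illustration of the technique: it uses base `sample()` as a stand-in rather than wrswoR, and the pool, ids, and weights are hypothetical.

```r
set.seed(42)

# Hypothetical pool of alternate partners within a single cluster.
pool <- data.frame(
  idw   = 101:110,
  perwt = c(5, 1, 1, 3, 2, 8, 1, 4, 2, 6)
)

n <- 3  # number of counterfactual partners to draw

# Draw n distinct rows; rows with larger weights are more likely to be drawn.
picked <- sample(nrow(pool), size = n, replace = FALSE, prob = pool$perwt)
alternates <- pool[picked, ]
```

Because sampling is without replacement, the same alternate partner cannot appear twice in one draw.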

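For each actual union, the program randomly chooses, with 50/50 odds, which of the two spouses will have alternate partners sampled for them. That spouse is held fixed across the actual union and all of its counterfactual unions, which ensures that all characteristics of a single spouse are fixed within the fixed effects conditional logit model.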
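The 50/50 choice can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the toy ids, the `fixed` column, and the "h"/"w" prefix used to build `group` are all hypothetical.

```r
set.seed(1)

# Hypothetical actual unions identified by husband and wife ids.
actual <- data.frame(idh = 1:6, idw = 11:16)

# For each union, flip a fair coin: TRUE means the husband is held fixed
# and alternate wives are sampled; FALSE means the reverse.
fix_husband <- sample(c(TRUE, FALSE), nrow(actual), replace = TRUE)

actual$fixed <- ifelse(fix_husband, "husband", "wife")

# Build a group identifier from the id of the spouse who is held fixed.
# The prefix is an assumption, used here only to keep ids distinct.
actual$group <- ifelse(fix_husband,
                       paste0("h", actual$idh),
                       paste0("w", actual$idw))
```

Each actual union ends up with exactly one fixed spouse, so every counterfactual union in its group differs only in the characteristics of the sampled partner.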

Currently the format of the datasets must be followed exactly for the function to work correctly.

Value

The output of this program is a data.frame of actual and counterfactual unions. It will keep all variables in the three datasets that end in an "h" or "w" as well as:

`geo`	the cluster identifier used in the function.
`group`	a unique identifier based on the id of the spouse from the actual union who had partners sampled for them. This should be used as the fixed effect in fixed effects models.
`choice`	a boolean variable that is `TRUE` if this is an actual union and `FALSE` if this is a counterfactual union. This variable should be used as the dependent variable in a fixed effects conditional logit model.

Examples

#generate three counterfactual couples for each real couple
#in example ACS data
market <- generateCouples(3,acs.couples,
                          acs.malealters,acs.femalealters,
                          "state",weight="perwt",keep="hhwt")

#check that there is one real marriage and three counterfactual
#marriages for each case
summary(tapply(market$choice,market$group,sum))
summary(tapply(!market$choice,market$group,sum))

## Not run: 
#load the survival package and run the clogit command to estimate how age
#differences and racial exogamy affect the log-odds of union formation
require(survival)
model <- clogit(choice~I(ageh-agew)+I((ageh-agew)^2)+I(raceh!=racew)
                       +strata(group), data=market)
summary(model)

## End(Not run)

AaronGullickson/fakeunion documentation built on Aug. 6, 2023, 7:19 p.m.