View source: R/fakeunion_functions.r
generateCouples | R Documentation |
This function samples a set of alternate partners from a data.frame for each actual union, generating a set of counterfactual unions. The combined actual and counterfactual unions can be used in a conditional logit model to predict which characteristics are likely to lead to a union match.
generateCouples(
n,
actual,
men,
women,
geo,
id = "id",
weight = NULL,
keep = NULL,
verbose = TRUE
)
n |
the number of counterfactual unions to create for each real union. |
actual |
a data.frame identifying the actual couples. This dataset should have the variable identified
by |
men |
a data.frame object identifying all the potential "male" partners. This data.frame must include the
variable identified by |
women |
a data.frame object identifying all the potential "female" partners. Its format should be the
same as |
geo |
a character string giving the name of the variable that identifies the clusters that alternate
partners should be sampled from. This variable must be named the same way in |
id |
a character string giving part of the variable name to identify partners in |
weight |
A character string that identifies a weight variable for sampling of potential partners. If left
|
keep |
a vector of character strings identifying additional variables in |
verbose |
if true, the function will report its progress. |
This program uses the wrswoR package to do fast weighted sampling without replacement. Large sets of alternate spouses and actual unions can be sampled relatively quickly, but processing time rises steeply when individual clusters contain very large datasets.
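As a rough illustration of weighted sampling without replacement within a cluster, the sketch below uses base R's `sample()` as a stand-in for the wrswoR samplers. It is not the package's implementation; the `alters` data frame, its columns, and the `sample_alters()` helper are hypothetical names introduced only for this example.

```r
# Hypothetical pool of alternate partners (illustrative data, not from the package)
set.seed(42)
alters <- data.frame(
  id    = 1:8,
  state = rep(c("CA", "NY"), each = 4),
  perwt = c(10, 20, 30, 40, 5, 5, 5, 5)
)

# Hypothetical helper: draw n alternates from one cluster, without replacement,
# using the weight variable as sampling weights (base R stand-in for wrswoR)
sample_alters <- function(pool, cluster, n, geo = "state", weight = "perwt") {
  # restrict the pool to candidates in the requested cluster
  candidates <- pool[pool[[geo]] == cluster, , drop = FALSE]
  # pick n distinct row positions, weighted by the weight column
  picked <- sample(seq_len(nrow(candidates)), size = n,
                   replace = FALSE, prob = candidates[[weight]])
  candidates[picked, , drop = FALSE]
}

drawn <- sample_alters(alters, "CA", n = 3)
```

Restricting the pool to one cluster before sampling mirrors the role of the `geo` argument: alternates are only ever drawn from the same cluster as the actual union.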
For each actual union, the program randomly chooses, with 50/50 odds, which of the two spouses will have alternate partners sampled for them. This ensures that all characteristics of that spouse stay fixed within each group in the fixed effects conditional logit model.
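The 50/50 choice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the `unions` data frame, the `idh`/`idw` column names, the `sample_wife` flag, and the prefixed `group` label are all hypothetical.

```r
# Hypothetical actual unions with husband and wife ids (illustrative data)
set.seed(1)
unions <- data.frame(idh = c(101, 102, 103, 104),
                     idw = c(201, 202, 203, 204))

# Fair coin flip per union: TRUE means alternate wives are sampled
# and the husband is the spouse held fixed
unions$sample_wife <- runif(nrow(unions)) < 0.5

# Build a group label from the id of the fixed spouse; the "h"/"w" prefix
# is an assumption made here to keep labels unique across the two sexes
unions$group <- paste0(ifelse(unions$sample_wife, "h", "w"),
                       ifelse(unions$sample_wife, unions$idh, unions$idw))
```

Because every counterfactual row in a group shares the same fixed spouse, only the sampled partner's characteristics vary within the group.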
Currently the format of the datasets must be followed exactly for the function to work correctly.
The output of this program is a data.frame of actual and counterfactual unions. It will keep all variables in the three datasets that end in an "h" or "w" as well as:
geo |
the cluster identifier used in the function. |
group |
a unique identifier based on the id of the spouse from the actual union who had partners sampled for them. This should be used as the fixed effect in fixed effects models. |
choice |
a boolean variable that is |
# generate three counterfactual couples for each real couple
# in the example ACS data
market <- generateCouples(3, acs.couples,
                          acs.malealters, acs.femalealters,
                          "state", weight = "perwt", keep = "hhwt")
#check that there is one real marriage and three counterfactual
#marriages for each case
summary(tapply(market$choice,market$group,sum))
summary(tapply(!market$choice,market$group,sum))
## Not run:
# load the survival package and run clogit to estimate how age
# differences and racial exogamy affect the log-odds of union formation
library(survival)
model <- clogit(choice~I(ageh-agew)+I((ageh-agew)^2)+I(raceh!=racew)
+strata(group), data=market)
summary(model)
## End(Not run)