vars_redun_ols: The choice of the optimal matching variables based on OLS.

vars_redun_ols    R Documentation

The choice of the optimal matching variables based on OLS.

Description

Uses OLS to select the optimal subset of the matching variables. Variables that appear in more of the OLS models are considered more important for statistical matching. The final set of variables is checked for multicollinearity.
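The idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the simulated data, the names dv_list, idv_list and freq_df, the 0.05 significance threshold, and the hand-computed variance inflation factor (VIF) are all hypothetical choices made for the example.

```r
# Simulated data: two dependent variables and four candidate matching variables.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y1 <- 2 * dat$x1 + 1.5 * dat$x2 + rnorm(n)
dat$y2 <- 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)

dv_list  <- c("y1", "y2")               # hypothetical dependent variables
idv_list <- c("x1", "x2", "x3", "x4")   # hypothetical matching variables

# Fit one OLS model per dependent variable and keep the predictors whose
# p-value is below an assumed 0.05 threshold.
selected <- lapply(dv_list, function(dv) {
  fit <- lm(reformulate(idv_list, response = dv), data = dat)
  p <- summary(fit)$coefficients[, "Pr(>|t|)"]
  p <- p[names(p) != "(Intercept)"]
  names(p)[p < 0.05]
})

# Count in how many models each predictor was kept.
freq <- table(factor(unlist(selected), levels = idv_list))
freq_df <- data.frame(variable = names(freq), frequency = as.integer(freq))
freq_df <- freq_df[order(-freq_df$frequency), ]

# Multicollinearity check: VIF of each predictor, computed as 1 / (1 - R^2)
# from regressing that predictor on all the others.
vif <- sapply(idv_list, function(v) {
  others <- setdiff(idv_list, v)
  r2 <- summary(lm(reformulate(others, response = v), data = dat))$r.squared
  1 / (1 - r2)
})

print(freq_df)
print(round(vif, 2))
```

In this sketch a predictor that is kept in many models ranks high in freq_df, and a large VIF would flag a predictor that is nearly a linear combination of the others.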

Usage

vars_redun_ols(data = data, DVList = DVList, IDVList = IDVList,
  out = NULL)

Arguments

data

An object such as a data.frame or matrix whose column names include the dependent and independent variables.

DVList

The dependent variables used for variable selection. These should be the variables to be fused from the donor data into the recipient data.

IDVList

The independent variables used for variable selection. These should be the original matching variables that are to be reduced.

out

Output file name. The output is written in spreadsheet format, so the file name should have a spreadsheet extension (.xlsx). If omitted, no spreadsheet output is generated.

Value

A summary data frame with each variable's frequency in the OLS models.

Examples


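library(dplyr)  # provides %>% and the data-manipulation verbs used below

# Define the dependent variables, i.e. the variables to be fused from the donor data.
# `mag` is assumed to be the donor data frame, already loaded.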
DVList <- names(mag %>% select(starts_with("AT")))

# Define the matching variables; some must be factors so that dummy variables can be generated
match.var <- c("NFAC1_2", "NFAC2_2", "NFAC3_2", "NFAC4_2", "NFAC5_2", "NFAC6_2", "NFAC7_2",
               "childhh", "agemid", "incmid", "ethnic", "maritalstat",
               "educat", "homestat", "employstat", "dvryes", "cabdsl")

match.var.factor <- c("NFAC1_2", "NFAC2_2", "NFAC3_2", "NFAC4_2", "NFAC5_2", "NFAC6_2", "NFAC7_2",
                      "childhh", "ethnic", "maritalstat",
                      "educat", "homestat", "employstat", "dvryes", "cabdsl")
match.var.num <- c("agemid", "incmid")

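# Keep only the dependent and independent variables, coercing each group to the right type.
# all_of() makes it explicit that the names come from external character vectors.
don <- mag %>%
  mutate(across(all_of(match.var.factor), as.factor)) %>%
  mutate(across(all_of(match.var.num), as.numeric)) %>%
  mutate(across(all_of(DVList), as.numeric)) %>%
  select(all_of(c(DVList, match.var.factor, match.var.num)))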
  
#generate dummy variables
don.new <- dummy_recodes(don, drop=TRUE, all=TRUE)

IDVListData <- don.new %>%
  select(setdiff(names(don.new), c(DVList))) %>%
  slice(1)

# Matching variable reduction
results <- vars_redun_ols(data = don.new, DVList = DVList, IDVList = names(IDVListData), out = "results.xlsx")


yangx227/SimmonsResearchR documentation built on April 24, 2022, 6:44 a.m.