all_geog_synthetic_new_attribute: Add a new attribute to a set (i.e., list) of synthetic_micro...

View source: R/sim_anneal_wrappers.R

all_geog_synthetic_new_attribute    R Documentation

Add a new attribute to a set (i.e., a list) of synthetic_micro datasets

Description

Add a new attribute to a set (i.e., a list) of synthetic_micro datasets using conditional relationships between the new attribute and existing attributes (e.g., wage rate conditioned on age and education level). The same attribute is added to *each* synthetic_micro dataset, but each dataset is supplied its own relationship (symbol table) for creating the attribute.
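To make "a distinct relationship" per dataset concrete, the sketch below computes conditional shares from two toy count tables, one per geography. The numbers are made up for illustration, and cond_share is a hypothetical helper that is not part of synthACS; how the package itself turns counts into assignment probabilities is not shown here.

```r
# Toy counts of employment status by gender for two geographies
# (made-up numbers, for illustration only).
st_a <- data.frame(gender = rep(c("male", "female"), each = 2),
                   cnts   = c(80, 20, 60, 40),
                   lvls   = rep(c("employed", "unemp"), 2))
st_b <- transform(st_a, cnts = c(50, 50, 90, 10))

# Hypothetical helper (not part of synthACS): share of each level of the
# new attribute within its conditioning cell (here, within each gender).
cond_share <- function(st) {
  ave(st$cnts, st$gender, FUN = function(x) x / sum(x))
}

cond_share(st_a)  # 0.8 0.2 0.6 0.4
cond_share(st_b)  # 0.5 0.5 0.9 0.1
```

The same attribute (employment status) is described for both geographies, but the relationship to gender differs between them, which is why st_list carries one table per dataset.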

Usage

all_geog_synthetic_new_attribute(
  df_list,
  prob_name = "p",
  attr_name = "variable",
  conditional_vars = NULL,
  st_list = NULL,
  leave_cores = 1L
)

Arguments

df_list

A list of R objects each of class "synthetic_micro".

prob_name

A string specifying the column name of each data.frame in df_list containing the probabilities for each synthetic observation.

attr_name

A string specifying the desired name of the new attribute to be added to the data.

conditional_vars

A character vector specifying the existing variables, if any, on which the new attribute (variable) is to be conditioned for each dataset. Variables must be specified in order. Defaults to NULL, i.e., an unconditional new attribute.

st_list

A list of equal length to df_list. Each element of st_list is a data.frame symbol table with N + 2 columns. The last two columns must be: 1. a vector containing the new attribute counts or percentages; 2. a vector of the new attribute levels. The first N columns must match the conditioning scheme imposed by the variables in conditional_vars. See synthetic_new_attribute and the Examples.
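As a rough illustration of the N + 2 column layout, the sketch below builds a symbol table for two conditioning variables (so N = 2) and checks its shape. check_symbol_table is a hypothetical helper, not part of synthACS, and it assumes that "matching the conditioning scheme" means the first N columns are named after conditional_vars in the same order.

```r
# Hypothetical helper (not part of synthACS): TRUE if `st` has N + 2 columns
# and its first N column names equal `conditional_vars`, in order.
# Assumption: name-and-order equality is what "matching" requires.
check_symbol_table <- function(st, conditional_vars = NULL) {
  n <- length(conditional_vars)
  ncol(st) == n + 2L &&
    identical(names(st)[seq_len(n)], as.character(conditional_vars))
}

cond_v  <- c("gender", "pov")
sym_tbl <- data.frame(
  gender = rep(rep(c("male", "female"), each = 3), 2),
  pov    = rep(c("lt_pov", "gt_eq_pov"), each = 6),
  cnts   = c(52, 8, 268, 72, 12, 228, 1338, 93, 297, 921, 105, 554),
  lvls   = rep(c("employed", "unemp", "not_in_LF"), 4)
)

check_symbol_table(sym_tbl, cond_v)        # conditional case: 2 + 2 columns
check_symbol_table(sym_tbl[, 3:4], NULL)   # unconditional case: 0 + 2 columns
```

Running a check like this over every element of st_list before the main call can catch a misordered or missing conditioning column early.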

leave_cores

An integer giving the number of cores to leave free for other processing. Defaults to 1L.
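One plausible reading of leave_cores is that the number of parallel workers is the detected core count minus this value. The sketch below shows that arithmetic with a hypothetical helper, workers_available, which is not part of synthACS; the package's actual worker calculation may differ (for example, it may also cap the count at the length of df_list).

```r
library(parallel)

# Hypothetical helper (not part of synthACS): number of worker processes
# left after reserving `leave_cores` cores, never fewer than one.
workers_available <- function(leave_cores = 1L) {
  total <- detectCores()
  if (is.na(total)) total <- 1L   # detectCores() can return NA
  max(1L, total - as.integer(leave_cores))
}

workers_available(1L)
```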

Value

A list of new synthetic_micro datasets each with class "synthetic_micro".

See Also

synthetic_new_attribute

Examples

## Not run: 
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 age= factor(sample(1:5, size= 100, replace= TRUE)),
                 pov= factor(sample(c("lt_pov", "gt_eq_pov"),
                                    size= 100, replace= TRUE, prob= c(.15,.85))),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")

# example conditioning variables, attribute levels, and symbol table
cond_v <- c("gender", "pov")
levels <- c("employed", "unemp", "not_in_LF")
sym_tbl <- data.frame(gender= rep(rep(c("male", "female"), each= 3), 2),
                      pov= rep(c("lt_pov", "gt_eq_pov"), each= 6),
                      cnts= c(52, 8, 268, 72, 12, 228, 1338, 93, 297, 921, 105, 554),
                      lvls= rep(levels, 4))



df_list <- replicate(10, df, simplify= FALSE)
st_list <- replicate(10, sym_tbl, simplify= FALSE)

# run
library(parallel)
syn <- all_geog_synthetic_new_attribute(df_list, prob_name= "p", attr_name= "variable",
                                        conditional_vars= cond_v, st_list= st_list)

## End(Not run)

alexWhitworth/synthACS documentation built on Nov. 2, 2022, 9:14 a.m.