prepareData: prepareData
In emunozh/GREGWT: Implements the GREGWT algorithm in R.


Prepares the data for simulation

prepareData(census, survey, census_area_id = 1, survey_id = 1,
  convert = TRUE, use_base = TRUE, census_categories = FALSE,
  survey_weights = FALSE, survey_categories = FALSE,
  reference_col = FALSE, group = FALSE, na.rm = FALSE, breaks = FALSE,
  pop_benchmark = FALSE, du_benchmark = FALSE, building_benchmark = FALSE,
  align = FALSE, pop_total_col = FALSE, verbose = FALSE)

`census`	Census data of small areas.
`survey`	A survey of individual records (microdata).
`census_area_id`	(optional, default=1) column name or column index with the area id in the census data. Define as 'FALSE' if the area code should be generated.
`survey_id`	(optional, default=1) ids of the individual records. Define as 'FALSE' to generate an id.
`convert`	(optional, default=TRUE) Converts data to binary format.
`use_base`	(optional, default=TRUE) use the model.matrix function from base R.
`census_categories`	(optional, default=FALSE) column names or column index of the categories to be used in the simulation.
`survey_weights`	(optional, default=FALSE) column name or column index of the initial weights in the survey data. 'FALSE' will use the last column.
`survey_categories`	(optional, default=FALSE) survey categories to be used in the simulation.
`reference_col`	(optional, default=FALSE) Category used as reference.
`group`	(optional, default=FALSE) Variable used to run an integrated re-weighting simulation.
`na.rm`	(optional, default=FALSE) remove records with NA values.
`breaks`	(optional, default=FALSE) define the breaks used to calculate population totals. If FALSE, population totals won't be computed.
`pop_benchmark`	(optional, default=FALSE) define the benchmark to be used for the computation of the total population, passed as a vector containing the breaks of the benchmark (e.g. `pop_benchmark=c(1,5)`). If FALSE, the function will compute the total population as the mean of all the benchmarks.
`align`	(optional, default=FALSE) align values to population totals
`pop_total_col`	(optional, default=FALSE) column containing the population totals
`verbose`	(optional, default=FALSE) be verbose
`du_benchmark`	(optional, default=FALSE) define the benchmark to be used for the computation of total dwelling units. Analogous to `pop_benchmark`
`building_benchmark`	(optional, default=FALSE) define the benchmark to be used for the computation of total building units. Analogous to `pop_benchmark`
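
The `convert` and `use_base` arguments refer to a binary conversion based on base R's model.matrix. As a rough illustration of what such a conversion looks like, the sketch below applies model.matrix to a toy survey. The data frame and its column names are hypothetical, and this is not necessarily how prepareData calls model.matrix internally.

```r
# Toy survey with two categorical variables (hypothetical data,
# not GREGWT.survey).
survey <- data.frame(
  id     = 1:4,
  age    = factor(c("young", "old", "young", "old")),
  tenure = factor(c("owner", "renter", "renter", "owner"))
)

# With the default treatment contrasts, model.matrix builds an intercept
# plus one 0/1 indicator column per non-reference level. The first level of
# each factor ("old", "owner") is dropped and acts as the reference category.
X <- model.matrix(~ age + tenure, data = survey)

print(colnames(X))
print(X)
```

In this sketch the dropped level plays the role of a reference category, which is the general idea behind an argument such as `reference_col`.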

X Prepared survey matrix.

Tx Marginal totals for simulation area.

dx Survey design weights.

area_id Small area ID.

total_pop Mean population totals for each area.

X_complete Binary formatted survey with all categories.

Tx_complete marginal sums with all categories
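
To make the idea behind `total_pop` concrete, the sketch below computes a population total per area as the mean of the totals implied by each benchmark. It uses toy data and an assumed convention in which every benchmark is given explicitly as a range of columns; the names `census`, `benchmarks` and `area` are hypothetical, and the code is an illustration of the concept rather than the package's implementation of `breaks`.

```r
# Toy census: one row per area, area code in the first column
# (hypothetical data, not GREGWT.census).
census <- data.frame(
  area     = c("A", "B"),
  age_0_29 = c(40, 10),
  age_30p  = c(60, 30),
  male     = c(48, 22),
  female   = c(50, 20)
)

# Assumption for this sketch only: each benchmark is a contiguous block of
# columns, listed here explicitly instead of being derived from breaks.
benchmarks <- list(age = 2:3, sex = 4:5)

# Population implied by each benchmark: row sums over its columns.
totals <- sapply(benchmarks, function(cols) rowSums(census[, cols]))

# The mean across benchmarks gives one population total per area.
total_pop <- data.frame(area = census$area, pop = rowMeans(totals))
print(total_pop)
```

Averaging smooths over small disagreements between benchmarks, such as the age and sex totals above, which differ slightly for both areas.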

M. Esteban Munoz H.

data("GREGWT.census")
data("GREGWT.survey")

simulation_data <- prepareData(GREGWT.census, GREGWT.survey,
                               census_categories=seq(2,24),
                               survey_categories=seq(1,3))

simulation_data1 <- prepareData(GREGWT.census, GREGWT.survey,
                                census_categories=seq(2,24),
                                survey_categories=seq(1,3),
                                pop_benchmark=c(2,12),
                                verbose=TRUE)

# Compute the total population as the mean of all benchmarks. The breaks
# parameter needs to be defined. In this case the breaks are displaced by one
# because the area code is in the first column.
simulation_data2 <- prepareData(GREGWT.census, GREGWT.survey,
                                census_categories=seq(2,24),
                                survey_categories=seq(1,3),
                                breaks=c(11, 17),
                                verbose=TRUE)

total_pop1 <- simulation_data1$total_pop
plot(total_pop1$pop)
total_pop2 <- simulation_data2$total_pop
points(total_pop2$pop, col="red", pch="+")