saeplus_modelunitlevel: Estimate the unit level model

View source: R/saeplus_modelunitlevel.R

saeplus_modelunitlevel R Documentation

Estimate the unit level model

Description

This function prepares remote sensing data for imputation by checking the admin level shapefiles for geometric consistency. Prior to imputation, an algorithm selects the set of candidate variables that best predicts the outcome variable. The household survey data is used to create a synthetic census of households. The empirical best predictor (EBP) is estimated from the household survey data and serves as the basis for imputation into the synthetic census.
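This page does not name the variable-selection algorithm. Purely to illustrate the idea of picking predictors of the outcome from a candidate set, the sketch below swaps in backward stepwise AIC selection with base R's step() on simulated data. It is not necessarily the method saeplus_modelunitlevel() uses, and the variable names (x1, x2, x3, log_cons) are hypothetical.

```r
# Illustration only: backward stepwise AIC selection with base R's step().
# Not necessarily the selection algorithm used by saeplus_modelunitlevel();
# the simulated data and the names x1, x2, x3 and log_cons are hypothetical.
set.seed(42)
n <- 500
svy <- data.frame(
  x1 = rnorm(n),  # informative candidate
  x2 = rnorm(n),  # informative candidate
  x3 = rnorm(n)   # pure noise candidate
)
# simulated log per capita consumption driven by x1 and x2 only
svy$log_cons <- 1 + 0.8 * svy$x1 - 0.5 * svy$x2 + rnorm(n, sd = 0.3)

cand_vars <- c("x1", "x2", "x3")
full_fit <- lm(reformulate(cand_vars, response = "log_cons"), data = svy)

# drop one candidate at a time while AIC improves; trace = 0 suppresses output
selected_fit <- step(full_fit, direction = "backward", trace = 0)
selected_vars <- attr(terms(selected_fit), "term.labels")
print(selected_vars)
```

Both informative candidates are retained; whether the noise variable x3 is dropped depends on the random draw, which is typical of AIC-based selection.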

Usage

saeplus_modelunitlevel(
  hhsurvey_dt,
  hhid_var,
  size_hh,
  adminshp_dt,
  target_id,
  geopolycensus_dt,
  geopopvar,
  geopoly_id = "id",
  crs_set = rep(4326, 3),
  agr_set = rep("constant", 3),
  cand_vars,
  cons_var,
  wgt_vartype = "hh",
  weight,
  create_dummy = TRUE,
  dummy_var,
  aggregate_id,
  ...,
  ncpu = 30,
  pline = 5006362,
  pline_transform = "inclusion_line",
  result_dir = getwd()
)

Arguments

hhsurvey_dt

an object of class data.frame or data.table containing the household survey (unit-level) data

hhid_var

the household ID variable from the hhsurvey_dt object

size_hh

the integer/numeric household size variable within the hhsurvey_dt object

adminshp_dt

an object of class sf, data.table and/or data.frame containing administrative level boundaries with polygon/multipolygon geometries

target_id

a character string naming the integer column that identifies the admin level at which small area estimates will be computed for the poverty map

geopolycensus_dt

an object of class sf, data.table and/or data.frame containing polygon/multipolygon geometries and geospatial indicators

geopopvar

a character string for the population count variable name in the geopolycensus_dt

geopoly_id

a character string naming the polygon ID variable within geopolycensus_dt (default "id")

crs_set

an integer vector of EPSG codes giving the coordinate reference systems of hhsurvey_dt, adminshp_dt and geopolycensus_dt, in that order. The default, rep(4326, 3), sets all three to WGS 84 (EPSG:4326).

agr_set

a character vector of attribute-geometry relationships (AGR), one per object and listed in the same order as crs_set. Each value specifies how the non-geometry attribute columns relate to the geometry and must be one of "constant", "aggregate" or "identity", the AGR values used by the sf package (see sf::st_agr). The default is "constant" for all three objects. See details for more.

cand_vars

a character vector of candidate explanatory variables to be included in the model selection process

cons_var

the dependent variable for small area estimation (typically household per capita consumption)

wgt_vartype

a character string giving the weighting type: "hh" for household weights or "pop" for population weights. The default is "hh".
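To show why the weighting type matters, the sketch below assumes the common convention that population weights equal the household weight multiplied by household size (size_hh). This page does not confirm how saeplus_modelunitlevel() builds its "pop" weights, and the column names (hhsize, wgt, pcexp) are hypothetical.

```r
# Hypothetical survey extract; column names are assumptions, not package internals
svy <- data.frame(
  hhid   = 1:4,
  hhsize = c(2, 5, 3, 1),
  wgt    = c(100, 80, 120, 150),  # household sampling weights
  pcexp  = c(10, 4, 7, 20)        # per capita consumption
)

# "hh": each household counts once, scaled by its sampling weight
hh_weights <- svy$wgt

# "pop" (assumed convention): household weight times household size,
# so every member of the household is represented
pop_weights <- svy$wgt * svy$hhsize

# the same welfare variable gives different weighted means under each type
mean_hh  <- weighted.mean(svy$pcexp, hh_weights)
mean_pop <- weighted.mean(svy$pcexp, pop_weights)
```

Because larger households tend to have lower per capita consumption in this toy data, the population-weighted mean is lower than the household-weighted one.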

weight

the numeric/integer survey weight variable within hhsurvey_dt

create_dummy

if TRUE and dummy_var is specified, dummy variables are created from dummy_var. The variable(s) must come from adminshp_dt.

dummy_var

a list of variables from which a dummy variable will be created for each level.
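Creating one indicator per level of an admin attribute can be sketched with base R's model.matrix(). This is one standard way to do it, not necessarily the routine used when create_dummy = TRUE; the data and the column names admin2_id and region are hypothetical.

```r
# Hypothetical admin attribute table; 'admin2_id' and 'region' are assumed names
admin <- data.frame(
  admin2_id = 1:5,
  region    = factor(c("North", "South", "North", "East", "South"))
)

# "- 1" removes the intercept, so every level gets its own 0/1 column
region_dummies <- model.matrix(~ region - 1, data = admin)

# append the indicators (regionEast, regionNorth, regionSouth) to the table
admin <- cbind(admin, region_dummies)
print(names(admin))
```

Keeping all levels suits a candidate pool for variable selection; if the dummies go straight into a model with an intercept, one level would normally be dropped to avoid perfect collinearity.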

aggregate_id

if not NULL, all cand_vars are aggregated to the admin level given by aggregate_id
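This page does not state which statistic is used when cand_vars are aggregated to aggregate_id. The sketch below assumes a population-weighted mean, using a population count like the one named by geopopvar; the data and column names (polygon_id, admin2_id, population, ntl) are hypothetical.

```r
# Hypothetical polygon-level data: a candidate variable, a population count
# and the admin unit each polygon falls in (all names are assumptions)
grid <- data.frame(
  polygon_id = 1:6,
  admin2_id  = c(1, 1, 1, 2, 2, 2),
  population = c(100, 300, 600, 50, 50, 100),
  ntl        = c(1.0, 2.0, 4.0, 0.5, 1.5, 3.0)  # e.g. night-time lights
)

# assumed aggregation rule: population-weighted mean within each admin unit
agg <- sapply(split(grid, grid$admin2_id), function(d) {
  weighted.mean(d$ntl, d$population)
})

agg_dt <- data.frame(admin2_id = as.numeric(names(agg)), ntl = unname(agg))
print(agg_dt)
```

Weighting by population keeps sparsely populated polygons from dominating the admin-level value, which matters when the outcome is measured on households.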

...

additional arguments passed on to the emdi::ebp() function.

ncpu

the number of CPUs used to parallelize the small area estimation algorithm (default 30)

pline

the national poverty line

pline_transform

the method used to convert the poverty line to the scale of the order norm transformation applied to welfare. There are three options. The default, "inclusion_line", appends the stipulated poverty line value to the welfare vector before normalization; the order norm value obtained for it is the converted poverty line. The "interpolation_line" option applies linear interpolation to the welfare variable (cons_var) to estimate the converted value. The "limsup_line" option takes the converted value of the greatest welfare value below the poverty line.
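The three pline_transform options can be sketched in base R as follows. The sketch assumes the order norm is the ordered quantile normalization qnorm((rank(x) - 0.5) / n), in the spirit of bestNormalize::orderNorm(); the helpers order_norm() and convert_pline() are hypothetical and are not functions exported by SAEplus.

```r
# Assumed order norm: ordered quantile normalization with average ranks for ties
order_norm <- function(x) {
  qnorm((rank(x) - 0.5) / length(x))
}

# Hypothetical helper mapping a poverty line onto the order norm scale
convert_pline <- function(welfare, pline,
                          method = c("inclusion_line", "interpolation_line",
                                     "limsup_line")) {
  method <- match.arg(method)
  z <- order_norm(welfare)
  switch(method,
    # append the poverty line, normalize everything, read off the line's value
    inclusion_line = {
      z_all <- order_norm(c(welfare, pline))
      z_all[length(z_all)]
    },
    # linear interpolation between (welfare, z) pairs; NA outside the data range
    interpolation_line = {
      approx(x = welfare, y = z, xout = pline, ties = mean)$y
    },
    # converted value of the largest welfare value below the line
    # (returns numeric(0) if no welfare value lies below the line)
    limsup_line = {
      below <- welfare < pline
      z[below][which.max(welfare[below])]
    }
  )
}

welfare <- c(1, 2, 3, 4, 5)
z_incl  <- convert_pline(welfare, 2.5)                        # default method
z_interp <- convert_pline(welfare, 2.5, "interpolation_line")
z_limsup <- convert_pline(welfare, 2.5, "limsup_line")
```

The three methods give slightly different converted lines for the same data: "limsup_line" is the most conservative because it snaps to an observed value below the line, while "inclusion_line" shifts every rank by adding one observation.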

result_dir

the local directory where all tables and charts will be stored. If result_dir is not specified, it defaults to the result of getwd().

hhsurvey_lat

the latitude variable within the hhsurvey_dt object

hhsurvey_lon

the longitude variable within the hhsurvey_dt object


SSA-Statistical-Team-Projects/SAEplus documentation built on Aug. 24, 2022, 11:26 a.m.