knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6, tibble.print_max = 6)

Introduction

This vignette provides basic information on the use of SAEplus. The purpose of SAEplus is to be a one-stop shop for pulling, cleaning and preparing data for small area estimation, as well as for creating post-estimation tables and poverty maps for report writing. Some of the tools (functions) were also created with a view to being generalized to other areas. While small area estimation is used extensively for poverty analysis, it may be useful for other topic areas as well.

Underlying Philosophy

The basic philosophy behind the SAEplus functions is to provide wrappers around existing R functions that are accessible to those new to R. The package also includes additional functionality for data cleaning, and it works well with the EMDI R package.

The SAEplus Process

Process Overview

In general, the SAEplus functions span the following tasks: data pulling, data cleaning and preparation, and post-SAE functions.

| SAEplus Functions | Usage |
|-------------------------------|-----------------------------------------------------------------------------|
| gengrid, gengrid2 | Creates a grid system from a raster and shapefile, running a raster with grid statistics |
| gee_datapull, gee_datapull2, gee_pullbigdata | Pulls geospatial image collections from Google Earth Engine into shapefiles, with some functionality for optimizing for speed |
| gee_pullimage | Pulls geospatial images from Google Earth Engine into shapefiles |
| hdx_pull | Pulls geospatial data from the UN's Humanitarian Data Exchange server |
| wpopbuilding_check | Checks whether a country has building data in the World Pop data |
| wpopbuilding_pull | Pulls the building data for the countries that have World Pop data |
| osm_datapull | Pulls geospatial OpenStreetMap data on roads and other visible amenities |
| osm_processpoints | Cleans and prepares point count data obtained via osm_datapull() |
| osm_processmp | Cleans and prepares multipolygon data indicators obtained via osm_datapull() |
| osm_processlines | Cleans and prepares the road network obtained via osm_datapull() and uses this for analysis |
| saeplus_gencensus | Generates a synthetic census from grid level data to be used for small area estimation |
| saeplus_hhestpoly | Computes any indicator from grid level geospatial data for any/no groups |
| saeplus_ordernormpl | Converts the outcome indicator to its order norm transformation |
| saeplus_selectmodel | Performs model selection prior to SAE model estimation |
| saeplus_calibratepovrate | Post EMDI estimation function for benchmarking poverty rates |
| saeplus_addbenchmark | Post benchmarking function for processing calibrated results |
| saeplus_ebpreports | Creates post-estimation model diagnostic reports based on EMDI::EBP results |
| saeplus_makesumtable | Creates a summary table from household level data to be used in reports |
| saeplus_gboost (To do!) | Empirical Best Predictor that uses the gradient boosting algorithm to select estimates |
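
To give a feel for what an order norm transformation does (the step listed for saeplus_ordernormpl in the table), here is a minimal base-R sketch. It is an assumption-laden illustration and not the package's implementation: `order_norm()` is a hypothetical helper, the welfare values are simulated, and the formula used is the standard ordered quantile normalization, which maps ranks to standard normal quantiles. The signature and behaviour of saeplus_ordernormpl itself may differ.

```r
# Hypothetical helper, NOT part of SAEplus: a base-R sketch of an
# ordered quantile (order norm) transformation.
order_norm <- function(x) {
  stopifnot(is.numeric(x), !anyNA(x))
  n <- length(x)
  # Rank each value (ties share the average rank), map the ranks to
  # probabilities strictly inside (0, 1), then to standard normal quantiles.
  qnorm((rank(x, ties.method = "average") - 0.5) / n)
}

# Simulated, right-skewed welfare values (illustrative only).
set.seed(1)
welfare <- rlnorm(200, meanlog = 7, sdlog = 1)

# Transformed outcome: same ordering as the input, roughly standard normal.
z <- order_norm(welfare)
summary(z)
```

The transformation only depends on ranks, so it preserves the ordering of households while removing skewness from the outcome, which is why it is commonly applied before fitting a normal-errors model.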

How to start with SAEplus

Data Storage

The data is currently stored on local machines and housed on GitHub. To improve the efficiency of our work, we will be migrating the data to the CWAPOV WB internal server (\cwapov\cwapov), which will also make it easier to integrate our work on the server. Sample scripts are available on our GitHub page.

Future to do

  1. Complete the saeplus_gboost() function

  2. Create a system for efficiently reading different types of data from the CWAPOV server

  3. Complete an internal vignette for STCs to learn how to use SAEplus to pull and process data, showing how to use it with EMDI

Lessons Learnt



SSA-Statistical-Team-Projects/SAEplus documentation built on Aug. 24, 2022, 11:26 a.m.