Here we use the GridSample package (https://doi.org/10.1186/s12942-017-0098-4) to replicate the sample design of the 2010 Rwanda DHS. Further details are available at:
The 2010 Rwanda DHS sampled 12,540 households from 492 PSUs comprising rural villages and urban neighborhoods (30). The sample was stratified by Rwanda's 30 districts, urban areas were oversampled by adding 12 PSUs in Kigali's three districts, and 26 households were sampled from each urban and rural PSU. The average village in Rwanda had 610 occupants according to the sample frame.
Therefore, the corresponding parameters for the gs_sample
algorithm are
cfg_hh_per_stratum <- 416 # cfg_hh_per_urban <- 26 # households from each urban PSU cfg_hh_per_rural <- 26 # households from each rural PSU cfg_pop_per_psu <- 610 # limit PSU size to the average village size
We start with two files, a population raster and a shapefile RWAshp
defining the various strata in the census. Here, the strata shapefile consists of the 30 districts. The population raster can be downloaded from RWA_ppp_v2b_2010_UNadj.tif
.
First, we read in the relevant datasets, and rasterize the strata layer using a field that contains a unique identifier for each polygon.
library(gridsample) library(raster) population_raster <- raster(system.file("extdata", "RWA_ppp_2010_adj_v2.tif", package="gridsample")) plot(population_raster) data(RWAshp) strata_raster <- rasterize(RWAshp,population_raster,field = "STL.2") plot(strata_raster)
We need a raster that defines urban and rural areas as well. Here, we'll define urban areas by determining the population cell density associated with the official proportion of people in an urban area. The 2012 Rwanda census found 16% of people to live in urban areas, so we'll change the most dense cells as urban, until 16% of the population is defined as being in urban areas.
total_pop <- cellStats(population_raster, stat = "sum") urban_pop_value <- total_pop * .16 pop_df <- data.frame(index = 1:length(population_raster[]), pop = population_raster[]) pop_df <- pop_df[!is.na(pop_df$pop), ] pop_df <- pop_df[order(pop_df$pop,decreasing = T), ] pop_df$cumulative_pop <- cumsum(pop_df$pop) pop_df$urban <- 0 pop_df$urban[which(pop_df$cumulative_pop <= urban_pop_value)] <- 1 urban_raster <- population_raster >= min(subset(pop_df,urban == 1)$pop) plot(urban_raster)
gs_sample
Now that we have the population, strata, and urbanization rasters, we are ready to use the gridsample algorithm. We exclude any grid cells with a population less than 0.01 to prevent the algorithm from creating overly large sampling areas (cfg_min_pop_per_cell = 0.01
), also restricting the total PSU size to less than 10km^2 (cfg_max_psu_size = 10
). Finally, because we wanted the sample to be representative of both urban and rural areas, we allowed the algorithm to oversample rural or urban areas by setting cfg_sample_rururb = TRUE
. We save the output shapefile to a temporary directory.
psu_polygons <- gs_sample( population_raster = population_raster, strata_raster = strata_raster, urban_raster = urban_raster, cfg_desired_cell_size = NA, cfg_hh_per_stratum = 416, cfg_hh_per_urban = 26, cfg_hh_per_rural = 26, cfg_min_pop_per_cell = 0.01, cfg_max_psu_size = 10, cfg_pop_per_psu = 610, cfg_sample_rururb = TRUE, cfg_sample_spatial = FALSE, cfg_sample_spatial_scale = 100, output_path = tempdir(), sample_name = "rwanda_psu" ) plot(psu_polygons)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.