knitr::opts_chunk$set( collapse = TRUE, comment = "#>", autodep = TRUE, cache = TRUE )
From r system("git config --get remote.origin.url", intern = TRUE)
on r date()
, generated by r Sys.info()[["effective_user"]]
.
There are two parts to this supplement. This first section aligns and formats maps. The next section, appended below, performs calculations and makes graphs.
This notebook retrieves population maps for Bioko Island, in Equatorial Guinea, and converts it into gridded data about population density. This notebook should retrieve and clean data so that we can pass that to another notebook to do statistics.
The two sections here regrid in two different ways. The first section uses BIMEP GPS data, so we know where each house is. It re-grids the GPS data to each gridded dataset with which we will compare. You'll see that this matters for the 100m datasets but not for the 1km datasets. The second section uses the BIMEP area and section grids. It realigns HRSL, LandScan, and WorldPop to those grids.
The BIMEP GPS data is not publicly available. It comes from the BIMEP Mapping Team, with the support of the National Malaria Control Program and the Ministry of Health and Social Welfare of Equatorial Guinea, as well as Marathon Oil, Noble Energy, AMPCO (Atlantic Methanol Production Company) and the Ministry of Mines and Energy of Equatorial Guinea.
I have an initial concern about population counts near ocean borders. Different source datasets handle the land boundary differently, both in how they demarcate it in the raster image and in how the decide where land and sea meet. LandScan, for instance, errs in favor of more land, so that they can include littoral populations in counts. We can address this by enforcing each incoming dataset to mark sea with the same NA value, and we can avoid questions about mismatches by comparing the grid to the BIMEP GPS-driven data, set on the same grid as the incoming raster. Land areas will still be a little different, but I won't have to worry about population counts off the edge.
We could also rescale populations according to the growth rate for Bioko. A commonly-used adjustment is to multiply values by $\exp(rt)$, where $r$ is the yearly growth rate. For Equatorial Guinea, that's 3.7% in 2017-2018, according to UN sources. This wouldn't affect the structure of zeroes, though.
UN World Population Prospects 2019. Equatorial Guinea, average annual rate of population change (percentage), from https://population.un.org/wpp/DataQuery/:
df = data.frame( five_year = c("2000-2005", "2005-2010", "2010-2015", "2015-2020"), annual_percent_change = c(4.24, 4.61, 4.28, 3.66) )
e^rt == 1.0366 rt = log(1.0366) r = log(1.0366) / t
growth_rate_per_year = log(1.0366) / 1 # year
If WorldPop has a size of 362,962 people in 2020, what would that mean in 2018?
c(362962 * exp(-2 * growth_rate_per_year), exp(-2 * growth_rate_per_year))
So that's 337,784, a decrease to 93% of what it was, and BIMEP is at 239,056.
Pixels in the water should be marked as NA and excluded from calculations later.
All work will use pixels in latitude and longitude (lat-long). This is also known as an unprojected space. We'll need to transform the BIMEP projection to lat-long, but the rest are already in lat-long, and the BIMEP coarse and fine grids are rectilinear in lat-long.
We will ensure each dataset has NA values where it's ocean and 0 values inland of the ocean.
library(popbioko) library(rprojroot) library(sp) library(sf) library(tmap) library(GISTools)
Specify a directory into which to put the data.
The inst/extdata
subdirectory is a popular place.
data_dir <- params$data_directory if (!dir.exists(data_dir)) { dir.create(data_dir, recursive = TRUE) }
The data will be stored in r data_dir
.
The Bioko shapefile has one geometry, a polygon outline of Bioko. We will use it to project population from rasters.
latitude_longitude_projection <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" bioko_sf <- sf::st_read(fs::path(data_dir, "source", "bioko.shp"))
If you have an account at IHME, the following will copy files to
your local drive, under the data directory. Look at the
help for data_configuration
to see about creating a config file
for the download.
#download_worldpop(local_directory = data_dir) #download_bioko_grids(local_directory = data_dir) #download_hrsl_points(local_directory = data_dir)
This section uses the original BIMEP GPS data, so it's a longitude-latitude for each house an an integer count of the number of people. That means there is no section or area grid to consider. We align BIMEP to any location.
The output of this section is a set of GeoTIFFS in the aligned
subdirectory of your data directory (likely inst/extdata/aligned
).
The names are of the form <grid><resolution>_<source>.tif
.
So the HRSL is on a 100m grid. It is saved as HRSL100_HRSL.tif
.
Bioko data on that grid is saved as HRSL100_Bioko.tif
.
The incoming BIMEP data is GPS coordinates, and a few are farther in the ocean than one could comfortably kayak, so let's remove those.
bimep_points_file <- fs::path(data_dir, "bimep_gps.shp") if (!file.exists(bimep_points_file)) { bimep_raw <- bimep_population_as_points() intersected_with_bioko <- sf::st_intersection(bimep_raw, bioko_sf) sf::st_write(intersected_with_bioko, bimep_points_file, driver = "ESRI Shapefile") }
For each grid, the code to project the GPS data is the same.
The HRSL marks where houses are with values and everywhere else with NA. The incoming data is for all of Equatorial Guinea, so we crop it. We need to have zeroes where there is land, so add the zeroes to the HRSL. Pixels with no land are marked NA. Otherwise, we don't modify values.
hrsl_raster <- read_hrsl(local_directory = data_dir) hrsl_raster_crop <- raster::crop(hrsl_raster, bioko_sf, snap = "out") hrsl_zero_mask <- raster::rasterize(bioko_sf, hrsl_raster_crop, field = 0) hrsl_raster_zero <- raster::cover(hrsl_raster_crop, hrsl_zero_mask) plot(hrsl_raster_zero)
Now put Bioko data on that grid:
bimep_on_hrsl <- bimep_on_grid(hrsl_raster_zero, bioko_sf, local_directory = data_dir) plot(bimep_on_hrsl)
popbioko::write_aligned_raster( hrsl_raster_zero, list(source = "HRSL", grid = "HRSL", resolution = 30), local_directory = data_dir ) popbioko::write_aligned_raster( bimep_on_hrsl, list(source = "BIMEP", grid = "HRSL", resolution = 30), local_directory = data_dir )
aggregate_and_write <- function(raster, name) { pixel_area <- square_meters_per_pixel.raster(raster) desired_area <- 924^2 # The largest grid is 924x924, not 1km. fact <- round(sqrt(desired_area / pixel_area)) cat(paste("aggregating factor for", name, "is", fact, "with pixel side", sqrt(pixel_area), "\n")) aggregated <- raster::aggregate(raster, fact = fact, fun = sum, expand = TRUE) agg_pixel_side <- sqrt(square_meters_per_pixel.raster(aggregated)) popbioko::write_aligned_raster( aggregated, list(source = name$source, grid = name$grid, resolution = agg_pixel_side), local_directory = data_dir) }
aggregate_and_write( hrsl_raster_zero, list(source = "HRSL", grid = "HRSL", resolution = 30) )
aggregate_and_write( bimep_on_hrsl, list(source = "BIMEP", grid = "HRSL", resolution = 30) )
LandScan is 1 km data, so we don't aggregate it.
landscan <- read_landscan(local_directory = data_dir) landscan <- raster::crop(landscan, bioko_sf, snap = "out") bimep_on_landscan <- bimep_on_grid( landscan, bioko_sf, local_directory = data_dir) popbioko::write_aligned_raster( landscan, list(source = "LandScan", grid = "LandScan", resolution = 1000), local_directory = data_dir ) popbioko::write_aligned_raster( bimep_on_landscan, list(source = "BIMEP", grid = "LandScan", resolution = 1000), local_directory = data_dir )
Actually, let's disaggregate it, to see what happens if we use it at 100m.
factor <- 10 landscan100 <- raster::disaggregate(landscan, fact = factor) / factor^2 bimep_on_landscan100 <- bimep_on_grid( landscan100, bioko_sf, local_directory = data_dir) popbioko::write_aligned_raster( landscan100, list(source = "LandScan", grid = "LandScan", resolution = 100), local_directory = data_dir ) popbioko::write_aligned_raster( bimep_on_landscan100, list(source = "BIMEP", grid = "LandScan", resolution = 100), local_directory = data_dir )
WorldPop is 100m, like the HRSL data, so we aggregate it. This is the unconstrained, individual-country data.
wp_options <- list(raw = "gnq_ppp_2018.tif", adjusted = "gnq_ppp_2018_UNadj.tif") worldpop <- raster::raster(fs::path( data_dir, "Equatorial_Guinea_100m_Population", wp_options[["adjusted"]])) worldpop_na <- raster::crop(worldpop, bioko_sf, snap = "out") worldpop_zero_mask <- raster::rasterize(bioko_sf, worldpop_na, field = 0) worldpop <- raster::cover(worldpop_na, worldpop_zero_mask) worldpop_id <- list(source = "WorldPop-U", grid = "WorldPop", resolution = 100) bimep_id <- worldpop_id bimep_id$source <- "BIMEP" popbioko::write_aligned_raster( worldpop, worldpop_id, local_directory = data_dir ) bimep_on_worldpop <- bimep_on_grid( worldpop, bioko_sf, local_directory = data_dir) popbioko::write_aligned_raster( bimep_on_worldpop, bimep_id, local_directory = data_dir ) aggregate_and_write( worldpop, worldpop_id ) aggregate_and_write( bimep_on_worldpop, bimep_id )
tm_shape(worldpop) + tm_raster()
tm_shape(bimep_on_worldpop) + tm_raster()
This WorldPop dataset is the constrained, individual-country data.
Called gnq_ppp_2020_UNadj_constrained.tif.
wp_options <- list(adjusted = "gnq_ppp_2020_UNadj_constrained.tif") worldpop_constrained <- raster::raster(fs::path( data_dir, "worldpop_constrained", wp_options[["adjusted"]])) worldpop_constrained_na <- raster::crop(worldpop_constrained, bioko_sf, snap = "out") worldpop_c_zero_mask <- raster::rasterize(bioko_sf, worldpop_constrained_na, field = 0) worldpop_constrained2020 <- raster::cover(worldpop_constrained_na, worldpop_c_zero_mask) # Here we apply a growth rate to go from 2020 to 2018. It's rough but an improvement. worldpop_constrained <- exp(-2 * growth_rate_per_year) * worldpop_constrained2020 worldpop_constrained_id <- list(source = "WorldPop-C", grid = "WorldPop", resolution = 100) popbioko::write_aligned_raster( worldpop_constrained, worldpop_constrained_id, local_directory = data_dir ) aggregate_and_write( worldpop_constrained, worldpop_constrained_id )
tm_shape(worldpop_constrained) + tm_raster()
This is on the same grid as LandScan.
gpw = raster::raster(fs::path( data_dir, "gpw", "gpw_v4_population_count_rev11_2020_30_sec.tif")) gpw <- raster::crop(gpw, bioko_sf, snap = "out") gpw_id <- list(source = "GPW", grid = "LandScan", resolution = 1000) stopifnot(as.character(raster::crs(gpw)) == "+proj=longlat +datum=WGS84 +no_defs") stopifnot(raster::extent(gpw) == raster::extent(landscan)) stopifnot(raster::res(gpw) == raster::res(landscan))
Therefore, disaggregate this like we did LandScan.
factor <- 10 gpw100 <- raster::disaggregate(gpw, fact = factor) / factor^2 popbioko::write_aligned_raster( gpw, gpw_id, local_directory = data_dir ) popbioko::write_aligned_raster( gpw100, list(source = "GPW", grid = "LandScan", resolution = 100), local_directory = data_dir )
tm_shape(gpw) + tm_raster()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.