From r system("git config --get remote.origin.url", intern = TRUE)
on r date()
, generated by r Sys.info()[["effective_user"]]
.
Both the BIMEP gold standard population data and the maps from LandScan, WorldPop, and HRSL were aligned and formatted by the get-data vignette. This notebook reads those datasets and calculates metrics for each population density map by comparing it with the BIMEP map.
```r
library(dplyr)
library(ggplot2)
library(maptools)
library(plotrix)
library(sf)
library(sp)
library(spatstat)
# library(viridis)  # Crashing with R 4.0.0.
library(popbioko)
knitr::opts_chunk$set(dpi = 150)
```
The maps were aligned and formatted by the "get-data" vignette. We load them all at once here for convenience. The naming scheme is dataset-on-grid, where grids are 100 m or 1 km, according to where they come from.
These maps are estimates of the number of people in each pixel.
```r
data_dir <- params$data_directory
map_root <- fs::path(data_dir, "aligned")
files <- list.files(map_root, pattern = "*.tif$")
files_df <- filenames_to_description(files)
maps <- lapply(files, function(x) raster::raster(fs::path(map_root, x)))
names(maps) <- rownames(files_df)
files_df
```
There are 16 rows: the 5 data sources at 2 resolutions each, plus a BIMEP comparison map on each of the 3 grids at both resolutions.
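Individual maps are pulled out of this list by their dataset-on-grid names, as later chunks do. For example:

```r
# Select one aligned map by its "dataset on grid resolution" name.
pop_1km <- maps[["LandScan on LandScan coarse"]]
pop_1km
```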
We will need some canonical projections, too. Universal Transverse Mercator (UTM) is a projection that will give us the island measured in meters.
utm_projection <- "+proj=utm +zone=32N +ellps=WGS84 +no_defs +units=m +datum=WGS84" bioko_sf <- sf::st_read(fs::path(data_dir, "source", "bioko.shp"))
The first figure shows summary statistics, computed at the pixel level.
How should we calculate urban fraction? There isn't a single, standard definition. Each country chooses its own way to assess the demographically important movement from rural to urban. These assessments often combine population density, administrative boundaries, resource availability (such as sewer and water), and functional use patterns. We have one kind of data here, population density. We're interested in how this data contributes to more nuanced urban metrics, but for now let's stick to density.
Equatorial Guinea defines urban population density as 1500 people per square kilometer. That's 1500 people per pixel for a grid with 1 km pixels and 15 people per pixel for a grid with 100 m pixels. This is equivalent to `r 1500 / (10^3)^2` people per square meter.
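As a check on that conversion, here is a small sketch that turns a people-per-square-kilometer cutoff into a per-pixel cutoff for an arbitrary pixel size; the function name and example calls are illustrative, not part of popbioko.

```r
# Convert an urban cutoff in people per square kilometer into a per-pixel
# cutoff, given the pixel side length in meters.
urban_cutoff_per_pixel <- function(people_per_km2, pixel_side_m) {
  people_per_m2 <- people_per_km2 / 10^6  # 1 km^2 is 10^6 m^2.
  people_per_m2 * pixel_side_m^2          # people per pixel
}
urban_cutoff_per_pixel(1500, 1000)  # 1500 people for a 1 km pixel
urban_cutoff_per_pixel(1500, 100)   # 15 people for a 100 m pixel
```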
We know we want to measure population density, but there are three ways to measure it here: a per-pixel threshold on the raw counts, a kernel density estimate, and a disc-averaged density.
We measure the pixel threshold as a number of people per square meter, which means we need to know the size of each pixel. It also helps to know how many pixels we're talking about, so we return the numerator and denominator of each fraction.
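The function referred to here lives in popbioko. A minimal sketch of the idea, with an illustrative name and signature, might look like the following; it assumes the raster is projected so that its resolution is in meters.

```r
# Count pixels at or above an urban cutoff, returning both the count of urban
# pixels (numerator) and the count of non-NA pixels (denominator).
urban_pixel_fraction <- function(pop_raster, people_per_m2) {
  pixel_area_m2 <- prod(raster::res(pop_raster))  # assumes units of meters
  cutoff <- people_per_m2 * pixel_area_m2
  values <- raster::getValues(pop_raster)
  c(numerator = sum(values >= cutoff, na.rm = TRUE),
    denominator = sum(!is.na(values)))
}
```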
We can reuse this function for the kernel density estimation by applying it to the estimated density instead of the raw density.
Kernel density estimation is a way to determine density from a point set. We have a grid, not points, so we take a few steps.
This work will use the raster and spatstat packages for statistics, and it will use the proj4 package to project the points from latitude-longitude to UTM.
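Roughly, those steps are: project the raster, turn each populated pixel into a weighted point, and smooth the resulting point pattern. The sketch below illustrates that idea under the assumptions noted in its comments; the vignette's own implementation is population_count_estimator from popbioko.

```r
# Sketch: a kernel density estimate from a gridded population map.
# pop_raster is assumed to already be projected to UTM (meters); sigma and
# the function name are illustrative, not the package's choices.
raster_to_density_im <- function(pop_raster, sigma = 1000) {
  xy <- raster::xyFromCell(pop_raster, seq_len(raster::ncell(pop_raster)))
  weights <- raster::getValues(pop_raster)
  keep <- !is.na(weights) & weights > 0
  # A rectangular window from the raster extent; the vignette instead clips
  # to a buffered island outline (bioko_owin, built below).
  win <- spatstat::owin(
    xrange = c(raster::xmin(pop_raster), raster::xmax(pop_raster)),
    yrange = c(raster::ymin(pop_raster), raster::ymax(pop_raster)))
  # One point per populated pixel, weighted by that pixel's population.
  pts <- spatstat::ppp(xy[keep, 1], xy[keep, 2], window = win)
  spatstat::density.ppp(pts, sigma = sigma, weights = weights[keep])
}
```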
pop_raster <- maps[["LandScan on LandScan coarse"]] sum(raster::values(pop_raster) > 1500, na.rm = TRUE) / nrow(pop_raster) / ncol(pop_raster)
It will help to use the Bioko outline for windowing in spatstat. You need to translate sf to sp, then to sp geometry, and then to a spatstat owin.
```r
bioko_sp <- as(bioko_sf, Class = "Spatial")
bioko_sp_polygon <- as(bioko_sp, "SpatialPolygons")
# Cannot make an owin from an unprojected space. Apparently.
bioko_proj_sf <- sf::st_transform(bioko_sf, crs = utm_projection)
# The projected space will have points outside. Let's make a little
# buffer to catch those points.
meters <- 1
bioko_buffer_sf <- sf::st_buffer(bioko_proj_sf, 500 * meters)
bioko_proj_sp <- as(bioko_proj_sf, Class = "Spatial")
bioko_proj_polygon_sp <- as(bioko_proj_sp, "SpatialPolygons")
bioko_owin <- as.owin(bioko_proj_polygon_sp)
```
LandScan measures people in thousands, so urban pixels are those above `r 1500 / (10^3)^2 / 1000`. The population density is on a map measured in meters, so let's count the pixel size in square meters.
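The package helper for this is square_meters_per_pixel.raster, used later in this vignette. A sketch of the underlying arithmetic (the function name here is illustrative):

```r
# Pixel area in square meters for a raster whose coordinates are in meters,
# e.g. after projection to UTM.
pixel_area_m2 <- function(projected_raster) {
  prod(raster::res(projected_raster))  # x-resolution times y-resolution
}
# For a raster still in latitude-longitude, raster::area() returns an
# approximate cell area in square kilometers instead.
```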
Let's make a kernel density map for every incoming population map.
This step approximates the map values by projecting them from lat-long to UTM coordinates. It uses bilinear interpolation. We avoid this kind of approximation through most of our work. This step is less sensitive to interpolation because it will construct point process models during `population_count_estimator`.
```r
kde_dir <- fs::path(params$data_directory, "kde_raster")
if (!dir.exists(kde_dir)) {
  dir.create(kde_dir)
}
for (map_idx in 1:nrow(files_df)) {
  filename <- fs::path(kde_dir, files_df[map_idx, "filename"])
  if (!file.exists(filename)) {
    # or method = "ngb" for nearest-neighbor
    proj_raster <- raster::projectRaster(
      maps[[map_idx]], crs = utm_projection, method = "bilinear")
    # Bilinear interpolation, which can cause negative values.
    proj_raster <- raster::clamp(proj_raster, lower = 0)
    pop_density_im <- population_count_estimator(proj_raster)
    pop_density_raster <- im_to_raster(pop_density_im, utm_projection)
    raster::writeRaster(pop_density_raster, filename = filename, format = "GTiff")
  }
}
```
This makes density maps in a way that's very clear: for each pixel, it averages the density within a circular neighborhood of one square kilometer.
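The density_from_disc function in the next chunk comes from popbioko. For intuition, a disc average can be sketched with a focal filter over a circular weight matrix; the radius arithmetic and function name here are assumptions of this sketch, not the package's code.

```r
# Sketch: average each pixel's value over a disc with an area of one square
# kilometer, for a raster whose units are meters (e.g., UTM).
disc_average <- function(projected_raster) {
  radius_m <- sqrt(10^6 / pi)  # a disc of 1 km^2 has radius ~564 m
  # focalWeight() with type "circle" gives weights that sum to one, so
  # focal() with the default fun = sum computes a mean over the disc.
  weights <- raster::focalWeight(projected_raster, d = radius_m, type = "circle")
  raster::focal(projected_raster, w = weights)
}
```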
```r
density_dir <- fs::path(params$data_directory, "density_raster")
if (!dir.exists(density_dir)) {
  dir.create(density_dir)
}
for (map_idx in 1:nrow(files_df)) {
  filename <- fs::path(density_dir, files_df[map_idx, "filename"])
  if (!file.exists(filename)) {
    # method = "ngb" is nearest-neighbor interpolation.
    proj_raster <- raster::projectRaster(
      maps[[map_idx]], crs = utm_projection, method = "ngb")
    pop_density_raster <- density_from_disc(proj_raster)
    raster::writeRaster(pop_density_raster, filename = filename, format = "GTiff")
  }
}
```
The density maps are saved as GeoTIFFs in the `density_raster` subdirectory, so load them here before making the summary table. Loading from `density_dir` means we are using the disc-averaged density rather than the kernel density estimate.
```r
density <- lapply(files_df$filename, function(x) raster::raster(fs::path(density_dir, x)))
names(density) <- names(maps)
```
This function makes the first table in the paper. It builds data for two urban densities, 1000 people per square km and 1500 people per square km. We use the former to compare with the current numbers; the latter is what Equatorial Guinea uses to define urban.
```r
summary_help <- function(urban_cutoff) {
  function(map_idx) {
    popbioko::summary_statistics(
      maps[[map_idx]],
      density[[map_idx]],
      urban_per_kilometer_sq = urban_cutoff
    )
  }
}
make_summary_df <- function(urban_cutoff) {
  summary_stats <- lapply(1:length(maps), summary_help(urban_cutoff))
  summary_list <- do.call(rbind, summary_stats)
  rownames(summary_list) <- names(maps)
  summary_df <- as.data.frame(summary_list)
  summary_df["name"] <- rownames(summary_list)
  summary_df <- merge(summary_df, files_df, by = "name")
}
summary1000_df <- make_summary_df(1000)
summary1500_df <- make_summary_df(1500)
names(summary1500_df)
```
Those dataframes hold both string and numeric columns. When we rotate them for display, everything becomes a string, so let's do the conversion by hand.
for_show <- function(summary_df) { urban_raw_percent <- as.numeric(summary_df[, "urban_raw"]) * 100 urban_percent <- as.numeric(summary_df[, "urban_fit"]) * 100 data.frame( source = summary_df$source, grid = summary_df$grid, resolution = as.character(sprintf("%.1f", summary_df$side_meters)), total = sprintf("%.0f", summary_df$total), max_density = sprintf("%.0f", summary_df$max_density), empty_percent = sprintf("%.4g", summary_df$empty_percent), na_percent = sprintf("%.4g", summary_df$na_percent), urban_raw = sprintf("%.4g", urban_raw_percent), urban_fit = sprintf("%.4g", urban_percent), pareto_fraction = sprintf("%.4g", 100 * as.numeric(summary_df$pareto_fraction)), stringsAsFactors = FALSE ) } show1000_df <- for_show(summary1000_df) show1500_df <- for_show(summary1500_df)
We're looking for trends that show there are fewer urban areas when we raise the urban criterion from 1000 people to 1500 people per square kilometer.
as.data.frame(t(show1000_df[show1000_df$grid == "HRSL", ]))
as.data.frame(t(show1500_df[show1500_df$grid == "HRSL", ]))
We would like to see whether the estimated urban fraction has a reasonable-looking Gaussian kernel size.
```r
sq_m_per <- c(
  square_meters_per_pixel.raster(maps[[2]]),
  square_meters_per_pixel.raster(density[[2]])
)
sq_m_per
```
```r
cutoffs <- data.frame(urban_per_meter_sq = c(1000, 1500) / 10^6)
cutoffs["raw"] <- cutoffs[["urban_per_meter_sq"]] * square_meters_per_pixel.raster(maps[[2]])
cutoffs["density"] <- cutoffs[["urban_per_meter_sq"]] * square_meters_per_pixel.raster(density[[2]])
cutoffs
```
Plot these with cutoffs to see that the maps are similar.
```r
plot(maps[[2]] / cutoffs[1, "raw"])
plot(maps[[2]] / cutoffs[2, "raw"])
plot(density[[2]] / cutoffs[1, "density"])
plot(density[[2]] / cutoffs[2, "density"])
```
This shows all maps on the HRSL grid together, then all on the WorldPop grid, then all on the LandScan grid.
as.data.frame(t(show1000_df[show1000_df$grid == "HRSL", ]))
as.data.frame(t(show1000_df[show1000_df$grid == "WorldPop", ]))
as.data.frame(t(show1000_df[show1000_df$grid == "LandScan", ]))
This groups data with the same resolution.
as.data.frame(t(show1000_df[as.numeric(show1000_df$resolution) < 500, ]))
as.data.frame(t(show1000_df[as.numeric(show1000_df$resolution) > 500, ]))
display_order <- data.frame( source = c("BIMEP", "HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW", "BIMEP", "HRSL", "WorldPop-C", "WorldPop-U"), granularity = c(rep("coarse", 6), rep("fine", 4)), grid = c("HRSL", "HRSL", "LandScan", "WorldPop", "WorldPop", "LandScan", "HRSL", "HRSL", "WorldPop", "WorldPop") ) short_source <- c( BIMEP = "BIMEP", HRSL = "HRSL", LandScan = "LS", "WorldPop-C" = "WP-C", "WorldPop-U" = "WP-U", GPW = "GPW") show1000_df$granularity <- ifelse(as.numeric(show1000_df$resolution) < 500, rep("fine", nrow(show1000_df)), rep("coarse", nrow(show1000_df))) show1000_df display_order do1000_df <- merge(display_order, show1000_df, by = c("source", "granularity", "grid"), sort = TRUE) stopifnot(nrow(do1000_df) == nrow(display_order)) ord1000_df <- do.call(rbind, lapply(1:nrow(display_order), function(row_idx) { with(do1000_df, do1000_df[source == display_order[row_idx, "source"] & granularity == display_order[row_idx, "granularity"], ]) })) col_ord_df <- ord1000_df[, c("source", "resolution", "total", "max_density", "empty_percent", "urban_raw", "pareto_fraction")] col_ord_df$source <- short_source[col_ord_df$source] col_ord_df
Use this on the command line: `knitr::kable(t(col_ord_df), format = "latex")`
Carlos asked how much the exact alignment of the grid matters. The gold-standard data is point data, so grids that are shifted slightly will have slightly different values in the tables. Let's shift each major grid a number of times and see how the values change.
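The function summary_statistics_multiple_grids below does the real work. The underlying idea, sketched here, is to nudge the grid origin by a fraction of a pixel and re-aggregate the point data each time. The point layer, its population column, and the use of raster::shift with dx/dy arguments (recent raster versions) are all assumptions of this sketch.

```r
# Sketch: sensitivity of a gridded summary to the grid origin.
# households_sp is assumed to be a SpatialPointsDataFrame with a "pop" column;
# template is an existing raster on the grid being tested.
shifted_grid_urban_pixels <- function(households_sp, template, n_shifts = 10) {
  dx <- raster::res(template)[1]
  offsets <- seq(0, dx, length.out = n_shifts + 1)[1:n_shifts]
  sapply(offsets, function(offset) {
    shifted <- raster::shift(template, dx = offset, dy = offset)
    counts <- raster::rasterize(households_sp, shifted, field = "pop", fun = sum)
    # Any summary statistic could go here; urban pixel count is one example.
    sum(raster::getValues(counts) > 1500, na.rm = TRUE)
  })
}
```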
```r
# With side_len = 10, this can take two hours to run.
error_dir <- fs::path(params$data_directory, "errors")
if (!dir.exists(error_dir)) {
  dir.create(error_dir)
}
side_len <- 10
for (grid in c("landscan", "hrsl", "worldpop")) {
  stats_file <- fs::path(error_dir, paste(grid, "csv", sep = "."))
  if (!file.exists(stats_file)) {
    popbioko::summary_statistics_multiple_grids(grid, side_len, stats_file)
  } else {
    warning(paste("file already exists", stats_file))
  }
}
```
```r
error_dir <- fs::path(params$data_directory, "errors")
csvs <- do.call(
  rbind,
  lapply(
    list.files(error_dir, pattern = ".csv$", full.names = TRUE),
    function(csv) read.csv(csv, header = TRUE, stringsAsFactors = TRUE)
  )
)
```
I'm curious how much the urban fit varies. What I see is that the mean doesn't change much, but the coarser grids have a larger standard deviation.
```r
ggplot(csvs, aes(x = urban_fit, group = grid)) +
  geom_histogram(bins = 50) +
  facet_grid(grid ~ .)
```
The percent of NA values reflects the size of the border around the island, so it acts as a kind of control. We see it doesn't vary much.
```r
ggplot(csvs, aes(x = na_percent, group = grid)) +
  geom_histogram(bins = 200) +
  facet_grid(grid ~ .)
```
I don't know how I expect the Pareto fraction to vary as the grid shifts. I'm somewhat surprised by the result. I think it comes from having small statistics and using a non-statistical estimator.
```r
ggplot(csvs, aes(x = pareto_fraction, group = grid)) +
  geom_histogram(bins = 200) +
  facet_grid(grid ~ .)
```
This is the maximum pixel value of the population, not of the population density. It looks like there's more noise in the coarse grid.
```r
ggplot(csvs, aes(x = max_density, group = grid)) +
  geom_histogram(bins = 200) +
  facet_grid(grid ~ .)
```
Let's look more closely by dividing by the mean in each case. We see that they all have about the same relative uncertainty in the maximum once you divide by the mean.
```r
mean_max <- aggregate(max_density ~ grid, FUN = mean, data = csvs)
names(mean_max)[names(mean_max) %in% "max_density"] <- "mean_max"
with_mean <- merge(csvs, mean_max)
with_mean["nmax"] <- with_mean$max_density / with_mean$mean_max
ggplot(with_mean, aes(x = nmax, group = grid)) +
  geom_histogram(bins = 50) +
  facet_grid(grid ~ .)
```
OK, so the plots aren't terribly skewed, which means we can use regular confidence intervals instead of bootstrapping. Maybe small values should use a Wald interval. Let's compute values with confidence intervals for each of these datasets and take a look.
add_ci_to_column <- function(df, column) { names(df)[names(df) %in% column] <- "target" counted <- aggregate(target ~ grid, data = df, FUN = function(x) sum(!is.na(x))) names(counted)[names(counted) %in% "target"] <- "n" mean_col <- aggregate(target ~ grid, FUN = mean, data = df) names(mean_col)[names(mean_col) %in% "target"] <- "mean_val" sd_col <- aggregate(target ~ grid, FUN = sd, data = df) names(sd_col)[names(sd_col) %in% "target"] <- "sd_val" with_mean <- merge(merge(counted, mean_col), sd_col) lower <- paste(column, "low", sep = "_") upper <- paste(column, "up", sep = "_") with_mean[lower] <- with(with_mean, mean_val - 1.96 * sd_val / sqrt(n)) with_mean[upper] <- with(with_mean, mean_val + 1.96 * sd_val / sqrt(n)) names(with_mean)[names(with_mean) %in% "mean_val"] <- column with_mean[names(with_mean) %in% c(column, lower, upper, "grid")] } add_sd_to_column <- function(df, column) { names(df)[names(df) %in% column] <- "target" counted <- aggregate(target ~ grid, data = df, FUN = function(x) sum(!is.na(x))) names(counted)[names(counted) %in% "target"] <- "n" mean_col <- aggregate(target ~ grid, FUN = mean, data = df) names(mean_col)[names(mean_col) %in% "target"] <- column sd_col <- aggregate(target ~ grid, FUN = sd, data = df) sd_name <- paste(column, "sd", sep = "_") names(sd_col)[names(sd_col) %in% "target"] <- sd_name together <- merge(mean_col, sd_col, by = "grid") together[names(together) %in% c(column, sd_name, "grid")] } zdf <- data.frame( rate = c(rnorm(10, mean = 0.5, sd = 0.2), rnorm(10, mean = 0.7, sd = 0.1)), grid = c(rep("HRSL", 10), rep("LandScan", 10)), stringsAsFactors = FALSE ) # A little test zzdf <- add_ci_to_column(zdf, "rate") stopifnot(nrow(zzdf) == 2) stopifnot(all(zzdf$rate_low < zzdf$rate)) stopifnot(all(zzdf$rate < zzdf$rate_up)) zzdf <- add_sd_to_column(zdf, "rate") zzdf
consider_error <- c("max_density", "empty_percent", "urban_fit", "pareto_fraction") stopifnot(all(consider_error %in% names(csvs))) single_ci <- function(col) add_sd_to_column(csvs, col) ans_df <- single_ci(consider_error[1]) for (col in consider_error[2:length(consider_error)]) { ans_df <- merge(ans_df, single_ci(col)) } t(ans_df)
This section makes figure 4, scatter plots of population data versus ground-truth population data. The density plots use maps at either resolution to plot estimated density per square kilometer. The population-size plots instead use the count of people in a pixel.
Set up consistent colors for the different data sets.
sources_ordered <- c("BIMEP", "HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW") # Viridis is causing problems in one installation, so let's make our own. color_component <- function(x) sprintf("%02x", floor(ifelse(x >= 1.0, 255, x * 256.0))) as_color <- function(xxx) paste( c("#", color_component(xxx[1]), color_component(xxx[2]), color_component(xxx[3])), collapse = "") viridis_mock <- function(n) { float_colors <- read.csv(rprojroot::find_rstudio_root_file("vignettes/plasma.csv")) key_colors <- sapply(1:nrow(float_colors), function(n) as_color(as.numeric(float_colors[n, ]))) colorRampPalette(key_colors)(n) } #viridis_colors <- viridis::plasma(8, alpha = 1, begin = 0, end = 1)[c(1, 3, 5, 7)] viridis_colors <- viridis_mock(length(sources_ordered))
```r
colors <- c(viridis_colors, grDevices::grey(0.7))
names(colors) <- c(sources_ordered, "grey")
logarithmic_axis_labels <- c(
  "1", "10", "100", "1,000", expression(10^4), expression(10^5), expression(10^6))
```
log_offset <- function(offset) { function(x) { log(x + offset) } }
logoff <- log_offset(1) symbol_size <- c(0.5, 0.25) names(symbol_size) <- c("coarse", "fine") symbol_choice <- c(19, 15) names(symbol_choice) <- c("coarse", "fine") check_name <- function(value, which, res, source) { if (length(value) != 1) { cat(sprintf("Could not find %s-%s-%s\n", which, res, source)) } value } map_from_source_resolution <- function(source, resolution, map_list) { grid_to_use <- unique(files_df[files_df$source == source, "grid"]) xy_df <- files_df[files_df$resolution == resolution & files_df$grid == grid_to_use, ] x_map_name <- check_name(xy_df[xy_df$source == "BIMEP", "name"], "bimep", resolution, source) y_map_name <- check_name(xy_df[xy_df$source == source, "name"], "y", resolution, source) gv <- function(a_map_name) { if (!a_map_name %in% names(map_list)) { msg <- sprintf("%s not in map names", a_map_name) cat(paste(msg, "\n")) stop(msg) } raster::getValues(map_list[[a_map_name]]) } x <- gv(x_map_name) y <- gv(y_map_name) list(x = logoff(x), y = logoff(y)) } #' This gets population counts, not density. xy_from_source_resolution <- function(source, resolution) { map_from_source_resolution(source, resolution, maps) } density_from_source_resolution <- function(source, resolution) { map_from_source_resolution(source, resolution, density) } pop_scatter <- function( y_name, resolution, labels, title, maximum, background = NULL, counts = NULL ) { plot( vector(mode = "numeric", length = 0), xlim = c(0, maximum), ylim = c(0, maximum), xlab = labels[[1]], ylab = labels[[2]], main = title, xaxt = "n", yaxt = "n" ) axis(1, logoff(10^c(0:5)), 10^c(0:5)) axis(2, logoff(10^c(0:5)), 10^c(0:5)) segments(logoff(0), logoff(0), logoff(10^5), logoff(10^5)) segments(logoff(1), logoff(1), logoff(1), logoff(10^5), col = colors["grey"]) segments(logoff(1), logoff(1), logoff(10^5), logoff(1), col = colors["grey"]) if (is.null(counts)) { data_function <- density_from_source_resolution } else { data_function <- xy_from_source_resolution } for (plot_idx in 1:length(y_name)) { y_source <- y_name[plot_idx] y_resolution <- resolution[plot_idx] xy <- data_function(y_source, y_resolution) cex <- symbol_size[as.character(y_resolution)] pch <- symbol_choice[as.character(y_resolution)] if (resolution[plot_idx] == "coarse") { color <- colors[y_source] } else { color <- scales::alpha(colors[y_source], 0.2) } if (!is.null(background) & plot_idx %in% background) { color <- colors["grey"] } points(xy, col = color, cex = cex, pch = pch) } }
density_label <- c("BIMEP Population Density", "Mapped Population Density") maximum1k <- 1.01 * max(density_from_source_resolution("LandScan", "coarse")$y, na.rm = TRUE) scatter1k_together <- function() { maximum1k <- 1.01 * max(density_from_source_resolution("LandScan", "coarse")$y, na.rm = TRUE) subjects <- c("HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW", "HRSL", "WorldPop-C", "WorldPop-U") uniq_cnt <- length(unique(subjects)) resolution <- c(rep("coarse", uniq_cnt), rep("fine", length(subjects) - uniq_cnt)) pop_scatter(subjects, resolution, density_label, "A", maximum1k) uniq <- unique(subjects) legend("topleft", legend = uniq, col = colors[uniq], lty = 1, lwd = 2, cex = 1, bg = "white", bty = "n") } scatter1k_together()
scatter1k_ls <- function() pop_scatter( "LandScan", "coarse", density_label, "A) LandScan", maximum1k) scatter1k_ls()
scatter1k_hrsl <- function() pop_scatter( rep("HRSL", 2), c("fine", "coarse"), density_label, "B) HRSL", maximum1k, background = 1) scatter1k_hrsl()
scatter1k_wpc <- function() pop_scatter( rep("WorldPop-C", 2), c("fine", "coarse"), density_label, "C) WorldPop-C", maximum1k, background = 1) scatter1k_wpc()
scatter1k_wpu <- function() pop_scatter( rep("WorldPop-U", 2), c("fine", "coarse"), density_label, "D) WorldPop-U", maximum1k, background = 1) scatter1k_wpu()
scatter1k_gpw <- function() pop_scatter( rep("GPW", 2), c("fine", "coarse"), density_label, "E) GPW", maximum1k, background = 1) scatter1k_gpw()
maximum100 <- 6.2 pop_scatter_label <- c("BIMEP 100x100 m", "Population Size") scatter1k_hrsl_count <- function() pop_scatter( rep("HRSL", 1), c("fine"), pop_scatter_label, "F) HRSL, 100x100 m", maximum100, counts = TRUE) scatter1k_hrsl_count()
maximum100 <- 6.2 pop_scatter_label <- c("BIMEP 100x100 m", "Population Size") scatter1k_wpc_count <- function() pop_scatter( rep("WorldPop-C", 1), c("fine"), pop_scatter_label, "G) WP-C, 100x100 m", maximum100, counts = TRUE) scatter1k_wpc_count()
maximum100 <- 6.2 pop_scatter_label <- c("BIMEP 100x100 m", "Population Size") scatter1k_wpu_count <- function() pop_scatter( rep("WorldPop-U", 1), c("fine"), pop_scatter_label, "H) WP-U, 100x100 m", maximum100, counts = TRUE) scatter1k_wpu_count()
make_figure <- function() { #par(mfrow = c(5, 2), mar = c(5, 4.2, 3, 2)) par(mfrow = c(4, 2), mar = c(4, 3, 3, 2)) # scatter1k_together() scatter1k_ls() scatter1k_hrsl() scatter1k_wpc() scatter1k_wpu() scatter1k_gpw() scatter1k_hrsl_count() scatter1k_wpc_count() scatter1k_wpu_count() } popbioko::save_plot(make_figure, "scatter1k") make_figure()
This section builds the cumulative plots in figure 5. It separates the work into one part that generates the x and y data and another part that decides axes and styling for that data.
linear_from_source_resolution <- function(source, resolution, map_list) { grid <- unique(files_df[files_df$source == source, "grid"]) xy_df <- files_df[files_df$resolution == resolution & files_df$grid == grid, ] x_map_name <- xy_df[xy_df$source == "BIMEP", "name"] y_map_name <- xy_df[xy_df$source == source, "name"] x <- raster::getValues(map_list[[x_map_name]]) y <- raster::getValues(map_list[[y_map_name]]) list(x = x, y = y) } sources <- c("BIMEP", "HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW") # This draws each line for the plots, taking care of color and line type. lines_for_datasets <- function(datasets, linewidth) { for (data_idx in 1:length(datasets)) { lines( datasets[[data_idx]][["x"]], datasets[[data_idx]][["y"]], type = "l", col = colors[datasets[[data_idx]][["source"]]], lwd = linewidth, lty = ifelse(datasets[[data_idx]][["resolution"]] == "coarse", 1, 2) ) } }
cdfAreaLines <- function(H, source, resolution, linewidth = 2) { H <- sort(H) area <- c(1:length(H)) / length(H) lines( area, cumsum(H) / sum(H), col = colors[source], lwd = linewidth, lty = ifelse(resolution == "coarse", 1, 2) ) } cdfAreaFigure <- function(llwd = 1, label = "") { plot( vector(mode = "numeric", length = 0), xlim = c(0, 1), ylim = c(0, 1), xlab = "% Area", ylab = "Proportion of Population", main = label ) sources <- c("BIMEP", "HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW") for (source in sources) { for (resolution in c("coarse", "fine")) { if (source == "BIMEP") { data <- linear_from_source_resolution("HRSL", resolution, density)$x } else { data <- linear_from_source_resolution(source, resolution, density)$y } cdfAreaLines(data, source, resolution, linewidth = llwd) } } legend("topleft", legend = sources, col = colors[sources], lty = 1, lwd = 2, cex = 1) } cdfAreaFigure(2)
cdfDens <- function(H, log_base) { H <- sort(H) mx <- ceiling(log(max(H), base = log_base)) c2 <- 1:mx ixz <- which(H == 0) Pz <- length(ixz) for (i in c2) { ix <- which(H < log_base^i) Pz <- c(Pz, sum(H[ix])) } pop <- c(0, log_base^c2) cbind(pop, Pz) } logPopAreaLines <- function(H, source, resolution, log_base, linewidth = 2) { dd <- cdfDens(H,log_base)[-1,] P <- dd[,1] H <- dd[,2] l <- log10(max(H)) lines( P, H / max(H), col = colors[source], lwd = linewidth, lty = ifelse(resolution == "coarse", 1, 2) ) } logPopAreaFigure <- function(llwd = 1, log_base = 2, label = "") { plot( vector(mode = "numeric", length = 0), log = "x", xlim = c(1.05, 0.5 * 10^5), ylim = c(0, 1), xlab = "log(Pop Dens)", ylab = "Proportion of Population", xaxt = "n", main = label ) log_max <- 0 for (source in sources_ordered) { for (resolution in c("coarse", "fine")) { if (source == "BIMEP") { data <- linear_from_source_resolution("HRSL", resolution, density)$x } else { data <- linear_from_source_resolution(source, resolution, density)$y } log_max <- max(log_max, log10(max(data, na.rm = TRUE))) logPopAreaLines(data, source, resolution, log_base, linewidth = llwd) } } axis(1, 10^(0:(log_max + 1)), logarithmic_axis_labels[1:(log_max + 2)]) legend("topleft", legend = sources, col = colors[sources], lty = 1, lwd = 2, cex = 1) } logPopAreaFigure(2)
Now the third panel of Figure 5, a density plot that shows just the density, not the cumulative distribution.
log_density <- function(H, base_power = 2) { H <- H[!is.na(H)] c2 <- ceiling(max(log(H[H > 0], base = base_power))) ixz <- which(H <= 0) upper <- 0 if (length(ixz) > 0) { HH <- H[-ixz] } else { HH <- H } Pz <- length(ixz) for (i in 1:c2) { upper <- c(upper, base_power^(i - 1)) ix <- which(HH < base_power^(i - 1)) if (length(ix) > 0) { Pz <- c(Pz, sum(HH[ix])) HH <- HH[-ix] } else { Pz <- c(Pz, 0) } } while (Pz[length(Pz)] == 0) { Pz <- Pz[-length(Pz)] upper <- upper[-length(Pz)] } cbind(round(Pz), upper) } single_test <- linear_from_source_resolution("LandScan", "coarse", density)$y dens_test <- log_density(single_test, base_power = 1.2) density_kernel_data <- function(base_power, offset = 0) { datasets <- vector(mode = "list", length = 2 * length(sources)) running_idx <- 1 for (source in sources_ordered) { for (resolution in c("coarse", "fine")) { if (source == "BIMEP") { data <- linear_from_source_resolution("HRSL", resolution, density)$x } else { data <- linear_from_source_resolution(source, resolution, density)$y } D <- log_density(data, base_power)[-1, 1] l <- length(D) x <- base_power^c(c(1:(l + 1))) * base_power^offset y <- c(D / sum(data, na.rm = TRUE), 0) datasets[[running_idx]] <- list(source = source, resolution = resolution, x = x, y = y, l = l) running_idx <- running_idx + 1 } } datasets } density_kernel_figure <- function(base_power, linewidth = 1, label = "") { datasets <- density_kernel_data(base_power) l = max(sapply(datasets, function(ds) ds$l)) plot( vector(mode = "numeric", length = 0), xaxt = "n", log = "x", xlim = c(5, base_power^(l + 2)), ylim = c(0, 0.4), xlab = "log(Pop Dens)", ylab = "Proportion of Population", main = label ) lines_for_datasets(datasets, linewidth) L <- 4 axis(1, 10^(0:(L + 1)) + 0.5, logarithmic_axis_labels[1:(L + 2)]) legend("topleft", legend = sources, col = colors[sources], lty = 1, lwd = 2, cex = 1) } density_kernel_figure(1.2, 2, "C")
```r
cdf_figure <- function() {
  par(mfrow = c(3, 1), mar = c(5, 5, 3, 2))
  cdfAreaFigure(2, label = "A")
  logPopAreaFigure(2, label = "B")
  density_kernel_figure(1.2, 2, "C")
}
save_plot(cdf_figure, "CDF")
cdf_figure()
```
Figure 6 is proportion, accuracy, recall, and precision.
The following functions compute the specificity and sensitivity of a map compared with a gold standard:
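The truth_table and accuracy_profile functions come from popbioko. As a rough sketch of what such a profile computes from a pixel-level classification (this shows the idea, not the package's exact implementation):

```r
# Classification metrics for a predicate such as "is this pixel urban?",
# applied to both the gold standard and the map being evaluated.
sketch_accuracy_profile <- function(predicate, gold, observed) {
  keep <- !is.na(gold) & !is.na(observed)
  g <- predicate(gold[keep])
  o <- predicate(observed[keep])
  tp <- sum(g & o); tn <- sum(!g & !o); fp <- sum(!g & o); fn <- sum(g & !o)
  list(
    acc = (tp + tn) / (tp + tn + fp + fn),  # accuracy
    sens = tp / (tp + fn),                  # sensitivity, also called recall
    spec = tn / (tn + fp),                  # specificity
    ppv = tp / (tp + fp)                    # precision, positive predictive value
  )
}
# Example: urban classification at 1500 people per pixel on a 1 km grid.
# sketch_accuracy_profile(function(x) x >= 1500, gold_values, map_values)
```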
breakpoints <- c( empty = 0, rural1 = 1, rural2 = 50, periurban = 250, urban = 1000 ) break_colors <- c( empty = grey(0.95), rural1 = "ivory", rural2 = "lavender", periurban = "cornsilk", urban = "aliceblue" ) stopifnot(all(names(breakpoints) == names(break_colors))) lessThanCat <- function(tau, map, gold, km2 = 1) { tau <- tau * km2 accuracy_profile(truth_table(function(observed) {observed < tau}, gold, map)) } greaterThanCat <- function(tau, map, gold, km2 = 1) { tau <- tau * km2 accuracy_profile(truth_table(function(observed) {tau <= observed}, gold, map)) } betweenCat <- function(L, U, map, gold, km2 = 1) { L <- L * km2 U <- U * km2 # Choose comparison so that it is cadlag. accuracy_profile(truth_table(function(obs) {L <= obs & obs < U }, gold, map)) } fullCat <- function(map, gold, km2 = 1) { with(as.list(breakpoints), { data.frame(as.vector(rbind( empty = lessThanCat(rural1, map, gold, km2), rural1 = betweenCat(rural1, rural2, map, gold, km2), rural2 = betweenCat(rural2, periurban, map, gold, km2), periurban = betweenCat(periurban, urban, map, gold, km2), urban = greaterThanCat(urban, map, gold, km2) ))) }) }
landAA <- function(H, breakpoints, area = 1) { stopifnot(class(breakpoints) == "numeric") breakpoints <- c(breakpoints, Inf) / area Pz <- rep(0, length(breakpoints)) for (cut_idx in 1:length(breakpoints)) { ixz <- which(H < breakpoints[cut_idx]) Pz[cut_idx] <- length(ixz) if (Pz[cut_idx] > 0) { H <- H[-ixz] } # Nothing to remove } Pz / sum(Pz) } over_arrays <- function(transform) { datasets <- vector(mode = "list", length = 2 * length(sources)) running_idx <- 1 for (source in sources) { for (resolution in c("coarse", "fine")) { if (source == "BIMEP") { data <- linear_from_source_resolution("HRSL", resolution, density)$x } else { data <- linear_from_source_resolution(source, resolution, density)$y } datasets[[running_idx]] <- list(source = source, resolution = resolution, x = transform(data)) running_idx <- running_idx + 1 } } datasets } proportion_data <- function(breakpoints) { over_arrays(function(data) { landAA(data, breakpoints) }) } example_data <- proportion_data(c(1, 50, 250, 1000))
#' Make shaded backgrounds for all four figures. #' @param splits The sides of the boxes, where colors change. make_boxes <- function(splits) { stopifnot(length(splits) == length(break_colors) + 1) boxes <- vector(mode = "list", length = length(break_colors)) for (color_idx in 1:length(break_colors)) { boxes[[color_idx]] <- list(splits[color_idx], splits[color_idx + 1], break_colors[color_idx]) } makebox <- function(x1, x2, color) { xx <- c(x1, x1, x2, x2) yy <- c(0,1,1,0) polygon(cbind(xx, yy), border = NA, col = color) } for (box_idx in 1:length(boxes)) do.call(makebox, boxes[[box_idx]]) } pdfAreaFigure <- function(lb = "") { # Start from 2 because this function uses the upper cutoff. prop_data <- proportion_data(breakpoints[2:length(breakpoints)]) plot( vector(mode = "numeric", length = 0), xlim = c(1, 7), ylim = c(0, 1), xaxt = "n", xlab = "", ylab = "Proportion", main = lb ) axis(1, c(2, 2:5 + 0.5), c(0,1,50,250,1000)) make_boxes(1:6 + 0.5) offset_idx <- 0 y_data <- NULL for (find_source in sources_ordered) { for (data_idx in 1:length(prop_data)) { y_data <- with(prop_data[[data_idx]], { if ((source == find_source) & (resolution == "coarse")) { return(x) } }) if (!is.null(y_data)) break } offset <- -.21 + 0.14 * offset_idx stopifnot(!is.null(y_data)) lines(2:6 + offset, y_data, type = "h", lwd = 5, col = colors[find_source], lty = 1) offset_idx <- offset_idx + 1 } legend("topright", legend = sources_ordered, col = colors[sources_ordered], lty = 1, lwd = 2, cex = 1, bg = "white") #mtext("A", side = 3, line = 0.2, outer = FALSE, cex = 1.6) } pdfAreaFigure()
Precision is also known as positive predictive value, which is why it appears here as the ppv variable.
```r
get_measure <- function(x, map, gold, measure) {
  if (x == 0) {
    lessThanCat(x, map, gold)[[measure]]
  } else {
    greaterThanCat(x, map, gold)[[measure]]
  }
}
getPPV <- function(x, map, gold) get_measure(x, map, gold, "ppv")
```
compare_sources <- sources[!sources %in% "BIMEP"] ppv_data <- sapply(compare_sources, function(x) { linear_from_source_resolution(x, "coarse", density) }) dimnames(ppv_data) <- list(c("x", "y"), compare_sources) mn <- 1 ppv_mx <- 0.7 * max(ppv_data["y", "LandScan"][[1]], na.rm = TRUE) tk <- c(breakpoints, 4000, 10000) xx <- c(0, seq(sqrt(mn), sqrt(ppv_mx), length.out = 100)^2) xxp <- sqrt(xx) tk0 <- sqrt(tk) pw <- 1 / 3.5 xx <- c(0, seq(mn^pw, ppv_mx^pw, length.out = 100)^(1 / pw)) xxp <- xx^pw tk0 <- tk^pw ppv_splits <- c(breakpoints, ppv_mx)^pw generate_accuracy_y <- function(xx, measure) { ppv_y <- matrix(0, nrow = length(xx), ncol = length(compare_sources)) for (source_idx in 1:length(compare_sources)) { source <- compare_sources[source_idx] gold <- ppv_data["x", source][[1]] observed <- ppv_data["y", source][[1]] ppv_y[1:length(xx), source_idx] <- as.vector(unlist(sapply( xx, function(x, map, gold) get_measure(x, map, gold, measure), map = observed, gold = gold ))) } colnames(ppv_y) <- compare_sources ppv_y } ppv_y <- generate_accuracy_y(xx, "ppv")
PPVProfile <- function(ppv_y, name, lb = "") { plot( vector(mode = "numeric", length = 0), xlim = c(0, max(xxp)), ylim = c(0, 1), xaxt = "n", xlab = "Population Density", ylab = name, main = lb ) axis(1, tk0, tk) make_boxes(ppv_splits) for (col_idx in 1:ncol(ppv_y)) { source <- colnames(ppv_y)[col_idx] lines(xxp, as.numeric(ppv_y[, col_idx]), type = "s", col = colors[source], lwd = 2) } } PPVProfile(ppv_y, "Precision")
```r
acc_y <- generate_accuracy_y(xx, "acc")
PPVProfile(acc_y, "Accuracy")
```
```r
sens_y <- generate_accuracy_y(xx, "sens")
PPVProfile(sens_y, "Sensitivity")
```
```r
area_figure <- function() {
  par(mfrow = c(2, 2), mar = c(4, 3, 2, 2))
  pdfAreaFigure(lb = "A")
  PPVProfile(acc_y, "Accuracy", lb = "B")
  PPVProfile(sens_y, "Sensitivity", lb = "C")
  PPVProfile(ppv_y, "Precision", lb = "D")
}
save_plot(area_figure, "AccuracyProfile")
area_figure()
```
This is table 3, which has accuracy, recall, and precision for ranges of values.
HRSL <- fullCat(ppv_data["y", "HRSL"][[1]], ppv_data["x", "HRSL"][[1]]) LS <- fullCat(ppv_data["y", "LandScan"][[1]], ppv_data["x", "LandScan"][[1]]) WP <- fullCat(ppv_data["y", "WorldPop-C"][[1]], ppv_data["x", "WorldPop-C"][[1]]) WPU <- fullCat(ppv_data["y", "WorldPop-U"][[1]], ppv_data["x", "WorldPop-U"][[1]]) GPW <- fullCat(ppv_data["y", "GPW"][[1]], ppv_data["x", "GPW"][[1]]) # ppv <- cbind(HRSL = as.vector(HRSL$ppv), LS = as.vector(LS$ppv), WP = as.vector(WP$ppv)) # sens <- cbind(HRSL = as.vector(HRSL$sens), LS = as.vector(LS$sens), WP = as.vector(WP$sens)) # acc <- cbind(HRSL = as.vector(HRSL$acc), LS = as.vector(LS$acc), WP = as.vector(WP$acc)) ppv <- cbind(HRSL = as.numeric(HRSL$ppv), LS = as.numeric(LS$ppv), WPC = as.numeric(WP$ppv), WPU = as.numeric(WPU$ppv), GPW = as.numeric(GPW$ppv)) sens <- cbind(HRSL = as.numeric(HRSL$sens), LS = as.numeric(LS$sens), WPC = as.numeric(WP$sens), WPU = as.numeric(WPU$sens), GPW = as.numeric(GPW$sens)) acc <- cbind(HRSL = as.numeric(HRSL$acc), LS = as.numeric(LS$acc), WPC = as.numeric(WP$acc), WPU = as.numeric(WPU$acc), GPW = as.numeric(GPW$acc)) accuracy_as_table <- signif(cbind(acc, sens, ppv), 3) rownames(accuracy_as_table) <- rownames(HRSL) t(accuracy_as_table)
Remember that you can turn this into LaTeX using the command `knitr::kable(t(accuracy_as_table), format = "latex")`.
The goodness of fit ratio (GOFR) is Dave's metric. It's described both in the paper and in "Accuracy Metrics" under the section SSD Profiles. Sean made an important comment about the GOFR: it doesn't use an estimator where he expected one in a statistical model. He suggested we look at variance calculations and make the GOFR more similar to those.
We need the admin2 units for Bioko in order to recreate a table with the GOFR for both the whole island and the four admin2 regions on the island.
```r
# The input map is in UTM, so we transform it to lat-long in order to agree
# with the raster coordinates of all rasters.
admin2_sf <- sf::st_transform(
  sf::st_read(rprojroot::find_package_root_file(
    "inst/extdata/admin2/bioko_admin2_fulldistricts.shp")),
  raster::crs(maps[1][[1]]))
admin2_st <- as(st_geometry(admin2_sf), "Spatial")
admin2_st_df <- as(admin2_sf[, "OBJECTID"], "Spatial")
tmap::tm_shape(admin2_sf) + tmap::tm_polygons("admin2")
```
The GOFR itself is the sum of squared errors divided by a variance. We want to calculate it for the whole island and for the districts on the island. Each GOFR calculation takes two maps as input: a source map, such as HRSL-fine, and a comparison map, which would be BIMEP on the HRSL fine grid. So we iterate through source maps, taking each source and its comparison map.
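Stripped of the table bookkeeping in the next chunk, the ratio for one pair of pixel vectors looks roughly like this sketch; the vignette's own definitions sit inside calculate_gofr below.

```r
# Goodness-of-fit ratio for one map against the gold standard: mean squared
# pixel error divided by the variance of the gold-standard pixels.
gofr_sketch <- function(map_values, bimep_values) {
  mse <- mean((map_values - bimep_values)^2, na.rm = TRUE)
  mse / var(bimep_values, na.rm = TRUE)
}
```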
calculate_gofr <- function(normalize_gofr, gofr_map_type) { # For each grid and resolution, there should be a map and a BIMEP version of that map. admin_names <- c("Baney", "Malabo", "Luba", "Riaba") # If these are out of order, then the names in admin_names are in the wrong order. stopifnot(admin2_sf$OBJECTID == 1:4) gofr <- files_df[files_df$source != "BIMEP", ] rownames(gofr) <- paste(gofr$source, gofr$resolution) gofr$GOFR <- numeric(nrow(gofr)) gofr$Baney <- numeric(nrow(gofr)) gofr$Malabo <- numeric(nrow(gofr)) gofr$Luba <- numeric(nrow(gofr)) gofr$Riaba <- numeric(nrow(gofr)) # This is the GOFR equation. This defines it. gofr_of_maps_gold_variance <- function(pops_map, pops_bimep, normalize = FALSE) { if (normalize) { pops_bimep <- pops_bimep / sum(pops_bimep, na.rm = TRUE) pops_map <- pops_map / sum(pops_map, na.rm = TRUE) } pixel_cnt <- sum(is.finite(pops_map)) # Ignore NA because they tell us where there is water. Rasters are designed to agree on water location. pixel_error <- sum((pops_map - pops_bimep)^2, na.rm = TRUE) / pixel_cnt pixel_error / var(pops_bimep, na.rm = TRUE) } # This is the GOFR equation. This defines it. gofr_of_maps <- function(pops_map, pops_bimep, normalize = FALSE) { if (normalize) { pops_bimep <- pops_bimep / sum(pops_bimep, na.rm = TRUE) pops_map <- pops_map / sum(pops_map, na.rm = TRUE) } pixel_cnt <- sum(is.finite(pops_map)) # Ignore NA because they tell us where there is water. Rasters are designed to agree on water location. pixel_error <- sum((pops_map - pops_bimep)^2, na.rm = TRUE) pixel_error / sum((pops_map - mean(pops_bimep, na.rm = TRUE))^2, na.rm = TRUE) } for (gofr_make_idx in 1:nrow(gofr)) { gofr_source <- gofr[gofr_make_idx, "source"] gofr_grid <- gofr[gofr_make_idx, "grid"] gofr_resolution <- gofr[gofr_make_idx, "resolution"] bimep_name <- files_df[ files_df$grid == gofr_grid & files_df$resolution == gofr_resolution & files_df$source == "BIMEP", "name" ] pops_bimep <- raster::getValues(gofr_map_type[bimep_name][[1]]) compare_map <- gofr_map_type[gofr[gofr_make_idx, "name"]][[1]] pops_map <- raster::getValues(compare_map) stopifnot(length(pops_bimep) == length(pops_map)) # GOFR for the whole island. gofr$GOFR[gofr_make_idx] <- gofr_of_maps(pops_bimep, pops_map, normalize_gofr) # Iterate through the four admin2 by ID. segmented <- raster::getValues(raster::rasterize( admin2_st, compare_map, field = admin2_sf$OBJECTID)) for (adm2_idx in 1:length(admin_names)) { sub_bimep <- pops_bimep[segmented == adm2_idx] sub_pops <- pops_map[segmented == adm2_idx] gofr[[admin_names[adm2_idx]]][gofr_make_idx] <- gofr_of_maps(sub_bimep, sub_pops, normalize_gofr) } } gofr } # Set to maps or density for original map or density surface. gofr_map_type <- maps gofr_absolute <- calculate_gofr(FALSE, gofr_map_type) gofr_normalized <- calculate_gofr(TRUE, gofr_map_type) choose <- 1:nrow(gofr_normalized) cols <- c("grid", "resolution", "GOFR", "Baney", "Luba", "Malabo", "Riaba") gofr_table <- rbind(gofr_absolute[choose, cols], gofr_normalized[choose, cols]) gofr_table
The incoming maps are HRSL fine, WorldPop fine, and LandScan coarse.
```r
# This is here to show how to get the table into LaTeX. Add format = "latex"
# to the line below. You'll want to run this in the terminal and copy to the
# document.
knitr::kable(t(gofr_table[, !names(gofr_table) %in% c("resolution")]), digits = 4)
```
Let's remind ourselves what the WorldPop map looks like, because those numbers are huge.
plot(maps[["WorldPop-U on WorldPop fine"]])
We need a couple of pretty plots.
library(tmap) # I made this outline from the admin2 borders using QGis b/c the file # made with sf::st_combine wasn't valid. bbioko <- sf::st_transform(sf::st_read(rprojroot::find_package_root_file( "inst/extdata/Bioko_outline.shp")), raster::crs(maps[1][[1]])) qleg <- FALSE qbreaks <- c(0, 1, 10, 100, 1000, 10000, 100000) qnames <- c("<1", "1-10", "10-100", "100-1000", "1000-10000", ">10000") qpal <- c("#BBBBBBFF", viridis_mock(length(qnames) - 1)) # raster::projection(density[["BIMEP on HRSL coarse"]]) <- "+proj=longlat +datum=WGS84 +no_defs" # tmap::tm_shape(bbioko) + tmap::tm_borders(col = "black", lwd = 1) + for (set_crs_idx in 1:length(density)) { raster::crs(density[[set_crs_idx]]) <- utm_projection } qbm <- tm_shape(density[["BIMEP on HRSL coarse"]]) + tm_raster(title = "Population", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "BIMEP", frame = FALSE) + tm_legend(show = TRUE, legend.position = c(0.02, .01)) qwpc <- tm_shape(density[["WorldPop-C on WorldPop coarse"]]) + tm_raster(title = "qwpc", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "WorldPop-C", frame = FALSE) + tm_legend(show = qleg) qwpu <- tm_shape(density[["WorldPop-U on WorldPop coarse"]]) + tm_raster(title = "qwpu", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "WorldPop-U", frame = FALSE) + tm_legend(show = qleg) gpw <- tm_shape(density[["GPW on LandScan coarse"]]) + tm_raster(title = "gpw", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "GPW", frame = FALSE) + tm_legend(show = qleg) qls <- tm_shape(density[["LandScan on LandScan coarse"]]) + tm_raster(title = "qls", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "LandScan", frame = FALSE) + tm_legend(show = qleg) qhrsl <- tm_shape(density[["HRSL on HRSL coarse"]]) + tm_raster(title = "qhrsl", palette = qpal, breaks = qbreaks, labels = qnames) + tm_layout(title = "HRSL", frame = FALSE) + tm_legend(show = qleg) quadmap <- tmap_arrange(qbm, qls, qhrsl, qwpc, gpw, qwpu, ncol = 2, nrow = 3) tmap_save(quadmap, "islandquad.png") tmap_save(quadmap, "islandquad.pdf") quadmap
library(tmap) bcrop <- raster::extent(c(8.685, 8.85, 3.65, 3.80)) # for maps, not densities. #bcrop <- raster::extent(c(440000, 490000, 350000, 424000)) original_map_names <- c( "BIMEP on HRSL fine", "HRSL on HRSL fine", "LandScan on LandScan coarse", "WorldPop-C on WorldPop fine", "GPW on LandScan coarse", "WorldPop-U on WorldPop fine" ) short_names <- vapply(strsplit(original_map_names, " "), function(x) x[[1]], FUN.VALUE = character(1)) bmaps <- maps[original_map_names] bbreaks <- c(0, 1, 5, 10, 50, 100) bnames <- c("0-1", "1-5", "5-10", "10-50", "50-100") #bnames[seq(2, length(bnames), 2)] <- "" bpal <- c("#BBBBBBFF", viridis_mock(length(bbreaks) - 1))
The LandScan plot doesn't look right, so let's take a closer look before including it with the rest.
map_idx <- 3 landscan_closeup <- raster::crop(bmaps[map_idx][[1]], bcrop) raster::res(landscan_closeup)[[1]] prod(raster::res(landscan_closeup)) raster::plot(landscan_closeup, colNA="black") raster::hist(landscan_closeup) vv <- raster::getValues(landscan_closeup) c(min(vv, na.rm = TRUE), max(vv, na.rm = TRUE)) vv tm_shape(landscan_closeup + 0.1) + tmap_options(max.categories = 1936) + tm_raster(colorNA = "black") + tm_layout(frame = TRUE)
bcropped <- raster::crop(bmaps[1][[1]], bcrop) fig2_order <- c("BIMEP", "LandScan", "HRSL", "WorldPop-C", "GPW", "WorldPop-U") stopifnot(all(fig2_order %in% short_names)) stopifnot(all(short_names %in% fig2_order)) # Adding 0.1 is a way to put integers in the middle of the breakpoints. small_maps <- lapply(1:length(fig2_order), function(map_idx) { chosen_bmap <- which(startsWith(names(bmaps), fig2_order[map_idx])) crop_map <- raster::crop(bmaps[chosen_bmap][[1]], bcrop) + 0.1 # For maps with 10x the side length. density_adjust <- (.3 / 360)^2 / prod(raster::res(crop_map)) tm_shape(crop_map * density_adjust) + tm_raster(title = short_names[map_idx], palette = bpal, breaks = bbreaks, labels = bnames) + tm_legend(show = FALSE) + tm_layout(title = LETTERS[map_idx], frame = FALSE) }) bleg <- tm_shape(bcropped) + tm_raster(title = "Population", palette = bpal, breaks = bbreaks, labels = bnames) + tm_legend(legend.only = TRUE, position = c("center", "top"), title.size = 2, text.size = 1.5) small_maps[length(small_maps) + 1] <- bleg small_maps[c("ncol", "nrow")] <- c(2, 3) #closeup <- tmap_arrange(bp1, bp2, bp3, bleg, ncol = 2, nrow = 2) # closeup <- do.call(tmap_arrange, small_maps) sm <- small_maps closeup <- tmap_arrange(sm[[1]], sm[[2]], sm[[3]], sm[[4]], sm[[5]], sm[[6]], nrow = 3, ncol = 2) tmap_save(closeup, "closeup.png") tmap_save(closeup, "closeup.pdf") closeup
bp1 <- tm_shape(bcropped) + tm_raster(title = "BIMEP", palette = bpal, breaks = bbreaks, labels = bnames) + tm_legend(show = FALSE) + tm_layout(title = "A", frame = FALSE) bp2 <- tm_shape(raster::crop(bmaps[2][[1]], bcrop)) + tm_raster(title = "WorldPop", palette = bpal, breaks = bbreaks, labels = bnames) + tm_legend(show = FALSE) + tm_layout(title = "B", frame = FALSE) bp3 <- tm_shape(raster::crop(bmaps[3][[1]], bcrop)) + tm_raster(title = "HRSL", palette = bpal, breaks = bbreaks, labels = bnames) + tm_legend(show = FALSE) + tm_layout(title = "C", frame = FALSE)
Note, for comparison, how diffuse the density map is, because the density map averages over a disc that's a square kilometer in area.
```r
bcrop <- raster::extent(c(465000, 490000, 400000, 424000))
bmaps <- density[c(3, 12, 4)]  # bimep, worldpop, hrsl
raster::plot(raster::crop(bmaps[1][[1]], bcrop))
```
This makes the plot for figure 7.
cdfAreaLines <- function(H, source, resolution, linewidth = 2) { H <- sort(H) area <- c(1:length(H)) / length(H) lines( area, cumsum(H) / sum(H), col = colors[source], lwd = linewidth, lty = ifelse(resolution == "coarse", 1, 2) ) } cdfAreaFigure <- function(llwd = 1, label = "") { plot( vector(mode = "numeric", length = 0), xlim = c(0, 1), ylim = c(0, 1), xlab = "% Area", ylab = "Proportion of Population", main = label ) sources <- c("BIMEP", "HRSL", "LandScan", "WorldPop-C", "WorldPop-U", "GPW") for (source in sources) { for (resolution in c("coarse", "fine")) { if (source == "BIMEP") { data <- linear_from_source_resolution("HRSL", resolution, density)$x } else { data <- linear_from_source_resolution(source, resolution, density)$y } cdfAreaLines(data, source, resolution, linewidth = llwd) } } legend("topleft", legend = sources, col = colors[sources], lty = 1, lwd = 2, cex = 1) } #cdfAreaFigure(2)
# Get the pfpr data. source_data <- "/home/adolgert/dev/popbioko/inst/extdata/source" load(file.path(source_data, "popData_image.RData")) load(file.path(source_data, "pfpr_pops.RData")) popk2 <- distinct(popk, popk$areaId, .keep_all = T) poppr <- merge(mcprk, popk2, by = "areaId", all.x = T) poppr <- poppr[!is.na(poppr$pop1k), ] PfPR <- poppr$pfpr H0 <- poppr$pop1k / sum(poppr$pop1k, na.rm = T) H1 <- poppr$wppop / sum(poppr$wppop, na.rm = T) H2 <- poppr$lspop / sum(poppr$lspop, na.rm = T) H3 <- poppr$popfb / sum(poppr$popfb, na.rm = T)
library(st) library(sf) areas_sf <- sf::st_as_sf(areasgr) areas_df <- sf::st_set_geometry(areas_sf, NULL) stopifnot(all(sort(areas_df[[1]]) == areas_df[[1]])) library(raster) sources <- c( "BIMEP on HRSL fine", "HRSL on HRSL fine", "LandScan on LandScan fine", "WorldPop-C on WorldPop fine", "WorldPop-U on WorldPop fine", "GPW on LandScan fine") aligned <- list() for (align_source in sources) { base_name <- strsplit(align_source, " ")[[1]][1] save_name <- paste0(base_name, "_gr.Rda") if (file.exists(save_name)) { aligned[[align_source]] <- local({ load(save_name) bimep_gr }) } else { density_map <- density[[align_source]] bimep_gr <- raster::extract( density_map, areasgr, cellnumbers = TRUE, method = "bilinear", na.rm = TRUE, fun = mean) save(bimep_gr, file = save_name) aligned[[align_source]] <- bimep_gr } } on_bgrid <- lapply(aligned, function(pop_matrix) { pm <- as.array(pop_matrix) pm[is.nan(pm)] <- 0 # Also an artifact. pm[pm < 0] <- 0 # largest negative is 10^-11. From bilinear interpolation. pm }) on_bgrid[["areaId"]] <- areas_df[["FID"]] on_bgrid_df <- do.call(data.frame, on_bgrid) poppr <- merge(mcprk, on_bgrid_df, by = "areaId", all.x = TRUE) poppr <- poppr[complete.cases(poppr),] row.is.finite <- apply(poppr, 1, function(x) any(is.finite(x))) poppr <- poppr[row.is.finite, ]
prBYh_cdf <- function(Pf, H, B = 100) { xx <- seq(0, 1, length.out = B + 1) cdf <- 0 * xx for (i in 0:B) { ix <- which(Pf <= xx[i + 1]) cdf[i + 1] <- ifelse(length(ix) == 0, 0, sum(H[ix])) } cbind(xx, cdf) } prBYh_pdf <- function(Pf, H, B = 100) { xx <- prBYh_cdf(Pf, H, B) xmid <- (xx[-1, 1] + xx[-B - 1, 1]) / 2 pdf <- diff(xx[, 2]) cbind(xmid, pdf) } cdfPfPlot <- function(Pf, H, B = 100, clr = "black", llwd = 2) { xx <- prBYh_cdf(Pf, H, B) plot(xx[, 1], xx[, 2], type = "l", xlab = "PfPR", ylab = "Population fraction", main = "A", lwd = llwd, col = clr, xlim = c(0, .45), ylim = c(0, 1) ) } cdfPfLines <- function(Pf, H, B = 100, clr = "red", llwd = 2) { xx <- prBYh_cdf(Pf, H, B) lines(xx[, 1], xx[, 2], lwd = llwd, col = clr) } pdfPfPlot <- function(Pf, H, B = 100, llwd = 2, clr = "black") { xx <- prBYh_pdf(Pf, H, B) plot(xx[, 1], xx[, 2], col = clr, type = "l", lwd = llwd, xlim = c(0, .45), ylim = c(0, 0.30), xlab = "PfPR", ylab = "Population fraction", main = "B") } pdfPfLines <- function(Pf, H, B = 100, clr = "red", llwd = 2) { xx <- prBYh_pdf(Pf, H, B) lines(xx[, 1], xx[, 2], col = clr, lwd = llwd) } PfPR <- poppr$pfpr sources <- c("BIMEP", "HRSL", "LandScan", "WorldPop.C", "WorldPop.U", "GPW") pop_cols <- vapply(sources, function(x) names(poppr)[startsWith(names(poppr), x)], character(1)) pfprFigure <- function() { par(mfrow = c(1, 2)) for (cdf_idx in seq(pop_cols)) { BB <- 80 pops <- poppr[[pop_cols[cdf_idx]]] normed_data <- pops / sum(pops) short_name <- gsub("\\.", "-", names(pop_cols)[cdf_idx]) if (cdf_idx == 1) { cdfPfPlot(PfPR, normed_data, BB, clr = colors[short_name]) } else { cdfPfLines(PfPR, normed_data, BB, clr = colors[short_name]) } } short_names <- character(0) colors_used <- character(0) for (pdf_idx in seq(pop_cols)) { pops <- poppr[[pop_cols[pdf_idx]]] normed_data <- pops / sum(pops) short_name <- gsub("\\.", "-", names(pop_cols)[pdf_idx]) if (pdf_idx == 1) { pdfPfPlot(PfPR, normed_data, BB, clr = colors[short_name]) } else { pdfPfLines(PfPR, normed_data, BB, clr = colors[short_name]) } short_names <- c(short_names, short_name) colors_used <- c(colors_used, colors[short_name]) } legend("topright", legend = short_names, col = colors_used, lty = 1, lwd = 2, cex = 1 ) } pfprFigure() pdf("PRbypop_.pdf", width = 10, height = 5) pfprFigure() a <- dev.off()