From `r system("git config --get remote.origin.url", intern = TRUE)` on `r date()`, generated by `r Sys.info()[["effective_user"]]`.
This is the GATHER [@stevens2016guidelines] compliance checklist for the paper "Measuring the accuracy of gridded human population density surfaces: a case study in Bioko Island, Equatorial Guinea." The checklist summarizes the data and methods so that the work is transparent. GATHER was written for global health audiences, but it is general enough to bring some clarity to this paper.
Define the indicator(s), populations (including age, sex, and geographic entities), and time period(s) for which estimates were made. - The study area is Bioko Island, Equatorial Guinea. The Bioko Island data are from 2018. The LandScan data are from 2018, and WorldPop Unconstrained is the 2018 version. WorldPop Constrained is for 2020, so we adjusted it, with caveats. HRSL is the current release, which makes it a 2019 version, but it allocates population totals taken from older census data. Estimates cover all ages and both sexes; the indicator is a total population count.
List the funding sources for the work. - Bill & Melinda Gates Foundation (BMGF) grant OPP1110495 funded the analysis done here. This is separate from the funding for the Bioko Island Malaria Elimination Project (BIMEP), which collected the data. BIMEP lists its donors on its website: https://www.mcdinternational.org/donors.
Describe how the data were identified and how the data were accessed. - The BIMEP GPS-level data were central to a previous collaboration among IHME, BIMEP, and other organizations to estimate malaria prevalence on Bioko Island.
Gridded datasets were included if (a) they are publicly available, (b) they cover most of Africa, (c) they cover the last five years, and (d) they include spatial information at a subnational level.
Gridded Population of the World (GPWv4.11).
Excluded datasets of note
U.S. Census Bureau's country grids (Demobase). These cover only three countries: Haiti, South Sudan, and Pakistan.
Specify the inclusion and exclusion criteria. Identify all ad-hoc exclusions. - We chose all public gridded maps that cover most of Africa. HRSL, in particular, excludes South Sudan for ethical reasons but otherwise covers Africa. We excluded the Gridded Population of the World because it is derived directly from census data and is therefore, by design, constant across Bioko.
Provide information about all included data sources and their main characteristics.
a) BIMEP GPS data. These data are not public because they are household-level GPS coordinates. The source is the Bioko Island Malaria Elimination Project, part of MCDI. Contact: MCD International Office, 8401 Colesville Rd, Suite 425, Silver Spring, MD 20910, P: 301-562-1920, F: 301-562-1921, Email: mcdi@mcd.org.
b) WorldPop. Direct download: ftp://ftp.worldpop.org.uk/GIS/Population/Individual_countries/GNQ/Equatorial_Guinea_100m_Population.7z
c) HRSL. Direct download: https://ciesin.columbia.edu/repository/hrsl/hrsl_gin_v1.zip
d) LandScan. Registration on the LandScan website is required. The dataset is called "LandScan Global 2018": https://landscan.ornl.gov/landscan-datasets
Identify data that have potentially important biases. - The BIMEP data cover 88% of households, not 100%. The missing 12% are not geographically biased: they are households where nobody was home on several repeat visits. Because people tend to move seasonally for work, some of those households were likely empty at the time, so the undercount of people is smaller than the undercount of households. We chose to use the data as is. We did not identify specific biases in the other datasets, which are public goods.
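The reasoning about the missing 12% can be made concrete with a pair of bounds. This is a hypothetical illustration, not part of the analysis: it assumes the missed households hold somewhere between zero people and the same average number of people as the enumerated households, and `population_bounds` and the count of 880 are made up for the example.

```r
# Hypothetical illustration: bound the true population when only a fraction
# of households were enumerated. `counted` and `coverage` are assumed inputs,
# not values taken from the BIMEP data.
population_bounds <- function(counted, coverage = 0.88) {
  stopifnot(counted >= 0, coverage > 0, coverage <= 1)
  # Lower bound: every missed household was empty (e.g., seasonal migration).
  lower <- counted
  # Upper bound: missed households hold as many people, on average,
  # as the enumerated ones.
  upper <- counted / coverage
  c(lower = lower, upper = upper)
}

population_bounds(880)
```

Seasonal migration pushes the true total toward the lower bound, which is why we expect the population undercount to be smaller than 12%.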
Describe and give sources for any other data inputs. - We use a shapefile of Bioko Island, produced locally on the island, to define the island boundaries. The LandScan and WorldPop island boundaries include extra area along the southwest shoreline. The difference is less than one percent, and we used the Bioko shapefile because the GPS coordinates of households support it.
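As a rough sketch of how a boundary difference like this can be expressed as a percentage, the shoelace formula gives a polygon's area from its vertices. The helper names and toy squares below are hypothetical; the sketch assumes coordinates in a projected CRS (meters), and the real comparison would use the actual shapefiles.

```r
# Hypothetical check: percent difference in area between two boundary
# polygons. Coordinates are assumed to be in a projected CRS (meters).
polygon_area <- function(x, y) {
  # Shoelace formula; vertices in order, polygon not explicitly closed.
  n <- length(x)
  j <- c(2:n, 1)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

area_difference_pct <- function(reference, other) {
  a_ref <- polygon_area(reference$x, reference$y)
  a_other <- polygon_area(other$x, other$y)
  100 * (a_other - a_ref) / a_ref
}

# Toy example: a 10 km square versus one with 50 m of extra width on one side.
bioko <- list(x = c(0, 10000, 10000, 0), y = c(0, 0, 10000, 10000))
gridded <- list(x = c(-50, 10000, 10000, -50), y = c(0, 0, 10000, 10000))
area_difference_pct(bioko, gridded)  # 0.5 percent
```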
Provide all input datasets in a format which can be extracted. - We address this by providing the code we used for the calculations. That code automatically downloads all of the data except the BIMEP GPS data and the Bioko shapefile. Without those two inputs, others cannot rerun the code, but it is the exact code that produced the outputs.
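A download step of this kind might look like the sketch below, which fetches each public file once and reuses the cached copy afterwards. `fetch_if_missing` is a hypothetical helper written for illustration, not a function from the population_comparison_bioko repository; the URLs are the ones listed above.

```r
# Sketch of a cached download step, assuming each source is a single file
# at a fixed URL. `fetch_if_missing` is hypothetical.
fetch_if_missing <- function(url, dest_dir = tempdir()) {
  dest <- file.path(dest_dir, basename(url))
  if (!file.exists(dest)) {
    # mode = "wb" keeps binary archives (.zip, .7z) intact on Windows.
    utils::download.file(url, dest, mode = "wb")
  }
  dest
}

sources <- c(
  worldpop = "ftp://ftp.worldpop.org.uk/GIS/Population/Individual_countries/GNQ/Equatorial_Guinea_100m_Population.7z",
  hrsl = "https://ciesin.columbia.edu/repository/hrsl/hrsl_gin_v1.zip"
)
# LandScan requires registration, and the BIMEP data and Bioko shapefile
# are not public, so none of them can be fetched this way.
# paths <- vapply(sources, fetch_if_missing, character(1))
```

Caching by file name means a rerun does not download the archives again.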
Provide a conceptual overview of the data analysis method. - The paper text describes the analysis method.
Provide a detailed description of all steps. - The paper describes the mathematics of each step, and the code we make available implements them.
Describe how candidate models were evaluated and the final models selected. - This item seems aimed at statistical models. We did evaluate candidates for the goodness-of-fit ratio. We settled on the sum of squared errors divided by the gold standard variance because it most resembles relative error, (map estimate - gold standard) / gold standard. This work is in the repository as "gofr_comparison.pdf."
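A minimal sketch of the ratio, under one reading of "divided by the gold standard variance": the sum of squared errors is divided by the sum of squared deviations of the gold standard about its mean (n times its population variance), which makes the ratio scale-free. The exact normalization used in the paper may differ, and `gofr` and `relative_error` here are illustrative names.

```r
# Goodness-of-fit ratio, assuming the variance term is the gold standard's
# sum of squared deviations about its mean (an assumption, see above).
gofr <- function(estimate, gold) {
  stopifnot(length(estimate) == length(gold))
  sse <- sum((estimate - gold)^2)
  sse / sum((gold - mean(gold))^2)
}

# Relative error for comparison, per cell: (estimate - gold) / gold.
relative_error <- function(estimate, gold) (estimate - gold) / gold

gold <- c(10, 20, 40)
gofr(gold, gold)                 # 0: a perfect map
gofr(rep(mean(gold), 3), gold)   # 1: no better than the gold-standard mean
```

Under this normalization, a map scores 0 when it matches the gold standard and 1 when it does no better than predicting the mean everywhere.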
Provide results of an evaluation of model performance. - This item, too, seems aimed at linear models. We did run a sensitivity analysis of the metrics by shifting the grids slightly north or south. This work is in the supplement section on map comparison.
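The shifting idea can be sketched on toy grids: move an estimated grid north or south by whole cells and recompute an error metric against the gold standard at each offset. This sketch is hypothetical. It treats grids as plain matrices with row 1 at the north edge, fills vacated rows with 0, and uses a plain sum of squared errors rather than the paper's metrics.

```r
# Shift a grid by k rows: positive k moves content south (higher row
# indices), negative k moves it north. Vacated rows are filled with 0.
shift_rows <- function(m, k) {
  n <- nrow(m)
  out <- matrix(0, nrow = n, ncol = ncol(m))
  if (abs(k) >= n) return(out)
  if (k > 0) {
    out[(k + 1):n, ] <- m[1:(n - k), ]
  } else if (k < 0) {
    out[1:(n + k), ] <- m[(1 - k):n, ]
  } else {
    out <- m
  }
  out
}

sse <- function(a, b) sum((a - b)^2)

# Toy 4 x 3 gold-standard grid and a map misregistered one cell south.
gold <- matrix(c(0, 5, 9, 2,
                 1, 7, 3, 0,
                 4, 0, 6, 8), nrow = 4)
estimate <- shift_rows(gold, 1)

shifts <- -2:2
errors <- sapply(shifts, function(k) sse(shift_rows(estimate, k), gold))
names(errors) <- shifts
errors  # smallest at k = -1, where shifting north realigns the map
```

How much the metric changes across small offsets gives a sense of how sensitive it is to registration error between grids.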
Describe methods of calculating uncertainty of the estimates. - A central point of this paper is that it is very hard for map producers to estimate uncertainty. Our premise is that the gold standard data have no uncertainty. We could have estimated how much undercounting actually occurred, but we did not approach that statistically.
State how analytic or statistical source code can be accessed. - https://github.com/dd-harp/population_comparison_bioko
Provide published estimates in a file from which it can be extracted. - The values in the tables are in the GitHub repository, in the vignettes directory, as summary_statistics.csv, gofr_table.csv, and accuracy.csv.
Report quantitative measures of the uncertainty of the estimates. - These are in the summary statistics table. Our best measure of uncertainty comes from translating the grids.
Interpret results in light of existing evidence. - In the paper.
Discuss limitations on estimates. - In the paper's last section.