README.md
In popRF: Random Forest-Informed Population Disaggregation

High resolution, recent data on human population distributions are important for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. popRF is a population modelling R package utilizing Random Forests to inform a dasymetric redistribution of census-based population count data. A description of using Random Forests machine learning method in popRF is described in Stevens et al.

The popRF package can be installed directly from Github.

install.packages("devtools")
devtools::install_github("wpgp/popRF")

The popRF package has a demo function popRFdemo to generate a population layer using the WorldPop geospatial covariates and subnational census-based population estimates for 230 countries. All necessary covariates will be downloaded and used to disaggregat population. All input datasets use a geographical coordinate system (GCS) with WGS 1984 datum (EPSG:4326) in Geotiff format at a resolution of 3 arc-second (0.00083333333 decimal degree, approximately 100m at the equator).

The following script will produce a population layer for Nepal (NPL) using 4 cores.

library("popRF")

popRFdemo(project_dir="/home/user/demo",
          country="NPL", 
          cores=4)

library("popRF")

# Specifying a name of the file from which the unique area ID and corresponding 
# population values are to be read from. The file should contain two columns 
# comma-separated with the value of administrative ID and population without
# columns names. If it does not contain an absolute path, the file name is 
# relative to the current working directory

pop_table <- list("NPL"="/user/npl_population.csv")


# Specifying a nested list of named list(s), i.e. where each element of the
# first list is a named list object with atomic elements. The name of
# each named list corresponds to the 3-letter ISO code of a specified
# country. The elements within each named list define the specified
# input covariates to be used in the random forest model, i.e. the name
# of the covariates and the corresponding, if applicable and local, path
# to them. If the path is not a full path, it is assumed to be relative
# to the current working directory

input_cov <- list(
                  "NPL"= list(
                             "cov1" = "covariate1.tif",
                             "cov2" = "covariate2.tif"
                     )
                  )

# Specifying a named list where each element of the list defines the
# path to the input mastergrid(s), i.e. the template gridded raster(s)
# that contains the unique area IDs as their value. The name(s)
# corresponds to the 3-letter ISO code(s) of a specified country(ies).
# Each corresponding element defines the path to the mastergrid(s). If
# the path is local and not a full path, it is assumed to be relative to
# the current working directory                     

input_mastergrid <- list("NPL" = "npl_mastergrid.tif")

# Specifying a named list where each element of the list defines the path
# to the input country-specific watermask. The name corresponds to the
# 3-letter ISO code of a specified country. Each corresponding element
# defines the path to the watermask, i.e. the binary raster that
# delineates the presence of water (1) and non-water (0), that is used
# to mask out areas from modelling. If the path is local and not a full
# path, it is assumed to be relative to the current working directory.


input_watermask <- list("NPL" = "npl_watermask.tif")

# Specifying a named list where each element of the list defines the path
# to the input raster(s) containing the pixel area. The name corresponds
# to the 3-letter ISO code of a specified country. Each corresponding
# element defines the path to the raster whose values indicate the area
# of each unprojected (WGS84) pixel. If the path is local and not a full
# path, it is assumed to be relative to the current working directory.

input_px_area <- list("NPL" = "npl_px_area.tif")

# Running a model

res <- popRF(pop=pop_table,
             cov=input_cov,
             mastergrid=input_mastergrid,
             watermask=input_watermask,
             px_area=input_px_area,
             output_dir="/user/output",
             cores=4)

# Plot populataion raster
plot(res$pop)

# Plot Error via Trees
plot(res$popfit)

Population raster layer in GeoTiff format.

Contributions are welcome. Please raise or respond to an issue, or create a new branch to develop a feature/modification and submit a pull request.

#> citation("popRF")

#> To cite popRF in publications use:
#> 
#> Bondarenko M., Nieves J.J., Forrest R.S., Andrea E.G., Jochem C., Kerr D., and Sorichetta A. (2021): popRF: Random Forest-informed Population
#> Disaggregation R package, _Comprehensive R Archive Network (CRAN)_, url:https://cran.r-project.org/package=popRF.
#>
#> A BibTeX entry for LaTeX users is
#> 
#> @Manual{,
#>  title = {popRF: Random Forest-informed Population Disaggregation R package.},
#>  author = {Maksym Bondarenko and Jeremiah J Nieves and Forrest R. Stevens and Andrea E. Gaughan and Chris Jochem and David Kerr and Alessandro Sorichetta},
#>  year = {2021},
#>  journal = {Comprehensive R Archive Network (CRAN)},
#>  url = {https://cran.r-project.org/package=popRF},
#>  language = {English},
#> }

GNU General Public License v3.0 (GNU GPLv3)