popRF | R Documentation |
Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data.
popRF(pop, cov, mastergrid, watermask, px_area, output_dir, cores=0,
quant=FALSE, set_seed=2010, fset=NULL, fset_incl=FALSE,
fset_cutoff=20, fix_cov=FALSE, check_result=TRUE, verbose=TRUE,
log=FALSE, ...)
pop |
Character vector containing the name of the file from which the unique area IDs and corresponding population values are read. The file should contain two comma-separated columns, the administrative unit ID and the population value, with no column names. If the file name is not an absolute path, it is taken relative to the current working directory. |
cov |
A nested list of named list(s), i.e. each element of the outer list is a named list with atomic elements. The name of each inner list is the 3-letter ISO code of a specified country. The elements within each inner list define the input covariates to be used in the random forest model: each element name is a covariate name and each value is the path to that covariate. If a path is local and not a full path, it is assumed to be relative to the current working directory. Example for Nepal (NPL):
list(
  "NPL" = list(
    "covariate1" = "covariate1.tif",
    "covariate2" = "covariate2.tif"
  )
)
#> $NPL
#> $NPL$covariate1
#> [1] "covariate1.tif"
#>
#> $NPL$covariate2
#> [1] "covariate2.tif" |
mastergrid |
A named list where each element defines the path to an input mastergrid, i.e. the template gridded raster whose cell values are the unique area IDs. Each name corresponds to the 3-letter ISO code of a specified country. If a path is local and not a full path, it is assumed to be relative to the current working directory. Example: list( "NPL" = "npl_mastergrid.tif" ) |
watermask |
A named list where each element defines the path to the input country-specific watermask, i.e. the binary raster that delineates water (1) and non-water (0) and is used to mask out areas from modelling. Each name corresponds to the 3-letter ISO code of a specified country. If a path is local and not a full path, it is assumed to be relative to the current working directory. Example: list( "NPL" = "npl_watermask.tif" ) |
px_area |
A named list where each element of the list defines the path to the input raster(s) containing the pixel area. The name corresponds to the 3-letter ISO code of a specified country. Each corresponding element defines the path to the raster whose values indicate the area of each unprojected (WGS84) pixel. If the path is local and not a full path, it is assumed to be relative to the current working directory. Example:
list( "NPL" = "npl_px_area.tif" )
#> $NPL
#> [1] "npl_px_area.tif" |
output_dir |
Character vector containing the path to the directory for writing output files. Default is the temp directory. |
cores |
Integer vector containing an integer. Indicates the number of
cores to use in parallel when executing the function. If set to 0
|
quant |
Logical vector indicating whether to produce the quantile
regression forests (TRUE) to generate prediction intervals.
Default is FALSE. |
set_seed |
Integer used to set the random seed. Default is 2010. |
fset |
Named list of character vectors giving the path(s) to the directory(ies) containing the random forest model objects (.RData) to be used as a "fixed set" in this modelling run, i.e. when this RF model run is parameterized, in part or in full, on the RF model object(s) of another country(ies). The list should have two named character vectors, "final" and "quant", giving the directory paths of the folders that hold the random forest model objects and the quantile regression random forest model objects, respectively. Each folder ("./final/" and "./quant/") can hold numerous model objects representing numerous countries, with the understanding that the model being run will incorporate all model objects in the folder, e.g. if a model object for Mexico and |
fset_incl |
Logical vector indicating whether the RF model object
will be combined with an RF model object from a model run for another
country(ies). Default is FALSE. |
fset_cutoff |
Numeric vector containing an integer. This parameter is
only used if |
fix_cov |
Logical vector indicating whether the raster extent of the
covariates will be corrected if the extent does not match the mastergrid.
Default is FALSE. |
check_result |
Logical vector indicating whether the results will be
compared with the input data. Default is TRUE. |
verbose |
Logical vector indicating whether to print
intermediate output from the function to the console, which might be
helpful for model debugging. Default is TRUE. |
log |
Logical vector indicating whether to print intermediate
output from the function to the log.txt file.
Default is FALSE. |
... |
Additional arguments: |
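The input conventions described in the arguments can be sketched in base R as follows. This is only an illustration: the census IDs and counts are made up, the file is written to the session temp directory, and `same_iso_codes()` is a hypothetical helper that is not part of popRF. It writes a headerless two-column population file and checks that the named input lists all use the same ISO codes.

```r
# Hypothetical census table; the IDs and counts are made up for illustration.
census <- data.frame(
  adm_id = c(1001, 1002, 1003),
  pop    = c(15230, 8420, 23110)
)

# Write two comma-separated columns with no header row and no row names,
# matching the layout described for the 'pop' argument.
pop_file <- file.path(tempdir(), "npl_population.csv")
write.table(census, pop_file, sep = ",", row.names = FALSE, col.names = FALSE)

# Input lists keyed by the 3-letter ISO code, as in the argument descriptions.
input_cov <- list("NPL" = list("covariate1" = "covariate1.tif",
                               "covariate2" = "covariate2.tif"))
input_mastergrid <- list("NPL" = "npl_mastergrid.tif")
input_watermask  <- list("NPL" = "npl_watermask.tif")
input_px_area    <- list("NPL" = "npl_px_area.tif")

# same_iso_codes() is a hypothetical helper, not a popRF function.
# It reports whether every list passed in is named with the same ISO codes.
same_iso_codes <- function(...) {
  codes <- lapply(list(...), function(x) sort(names(x)))
  all(vapply(codes, identical, logical(1), codes[[1]]))
}

inputs_ok <- same_iso_codes(input_cov, input_mastergrid,
                            input_watermask, input_px_area)
```

A check of this kind is cheap to run before a long model fit, since a mismatched or misspelled country code in one list is easy to miss by eye.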
This function produces gridded population density estimates using a Random Forest model as described in Stevens et al. (2015) (doi:10.1371/journal.pone.0107042). The unit-average log-transformed population density and covariate summary values for each census unit are used to train a Random Forest model (doi:10.1023/A:1010933404324) to predict log population density. Random Forest models are an ensemble, nonparametric modelling approach that grows a "forest" of individual classification or regression trees and improves upon bagging by using the best of a random selection of predictors at each node in each tree. The Random Forest is used to produce grid-level, i.e. pixel-level, population density estimates that are used as unit-relative weights to dasymetrically redistribute the census-based areal population counts. This function also allows for modelling based upon a regional parameterisation (doi:10.1080/17538947.2014.965761) of other previously run models as well as the creation of models based upon multiple countries at once (doi:10.1016/j.compenvurbsys.2019.01.006). This function assumes that all data are unprojected and in the WGS84 coordinate system.
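The two arithmetic steps in this description, the unit-level log density used as the training response and the dasymetric redistribution of census counts, can be illustrated with a toy base R sketch. It is a simplified illustration under stated assumptions, not the package's internal code: pixels are plain vectors rather than rasters, all values are made up, and the `weight` vector stands in for the model's pixel-level density predictions.

```r
# Hypothetical toy data: six pixels belonging to two census units.
unit_id  <- c(1, 1, 1, 2, 2, 2)        # mastergrid value of each pixel
px_area  <- c(1, 1, 2, 2, 1, 1)        # area of each pixel (arbitrary units)
unit_pop <- c("1" = 400, "2" = 100)    # census count of each unit

# Training response: log of population density for each census unit.
unit_area   <- tapply(px_area, unit_id, sum)
log_density <- log(unit_pop / as.numeric(unit_area[names(unit_pop)]))

# Stand-in for pixel-level density predictions (made-up values).
weight <- c(0.5, 1.5, 2.0, 1.0, 1.0, 2.0)

# Dasymetric step: each pixel receives the share of its unit's census count
# given by its weight divided by the sum of weights within the same unit.
weight_sum <- ave(weight, unit_id, FUN = sum)
pixel_pop  <- unname(unit_pop[as.character(unit_id)]) * weight / weight_sum

# The redistributed pixel values sum back to the census count of each unit.
unit_totals <- tapply(pixel_pop, unit_id, sum)
```

Because the weights are normalised within each unit, only their relative sizes inside a unit matter, which is why the predictions act as unit-relative weights rather than as final population values.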
Raster* object of gridded population.
Maksym Bondarenko mb4@soton.ac.uk, Jeremiah J. Nieves J.J.Nieves@liverpool.ac.uk, Forrest R. Stevens forrest.stevens@louisville.edu, Andrea E. Gaughan ae.gaughan@louisville.edu, David Kerr dk2n16@soton.ac.uk, Chris Jochem W.C.Jochem@soton.ac.uk and Alessandro Sorichetta as1v13@soton.ac.uk
Stevens, F. R., Gaughan, A. E., Linard, C. & A. J. Tatem. 2015. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 10, e0107042. doi:10.1371/journal.pone.0107042
Breiman, L. 2001. Random Forests. Machine Learning, 45: 5-32. doi:10.1023/A:1010933404324
Gaughan, A. E., Stevens, F. R., Linard, C., Patel, N. N., & A. J. Tatem. 2015. Exploring Nationally and Regionally Defined Models for Large Area Population Mapping. International Journal of Digital Earth, 8(12): 989-1006. doi:10.1080/17538947.2014.965761
Sinha, P., Gaughan, A. E., Stevens, F. R., Nieves, J. J., Sorichetta, A., & A. J. Tatem. 2019. Assessing the Spatial Sensitivity of a Random Forest Model: Application in Gridded Population Modeling. Computers, Environment and Urban Systems, 75: 132-145. doi:10.1016/j.compenvurbsys.2019.01.006
## Not run:
library("popRF")
pop_table <- list("NPL"="/user/npl_population.csv")
input_cov <- list(
  "NPL" = list(
    "cov1" = "covariate1.tif",
    "cov2" = "covariate2.tif"))
input_mastergrid <- list("NPL" = "npl_mastergrid.tif")
input_watermask <- list("NPL" = "npl_watermask.tif")
input_px_area <- list("NPL" = "npl_px_area.tif")
res <- popRF(pop=pop_table,
             cov=input_cov,
             mastergrid=input_mastergrid,
             watermask=input_watermask,
             px_area=input_px_area,
             output_dir="/user/output",
             cores=4)
# Plot population raster
plot(res$pop)
# Plot Error via Trees
plot(res$popfit)
## End(Not run)