knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Areal Interpolation may be defined as the process of transforming data reported over one set of spatial units (source) to another (target). Its application to population data has attracted considerable attention during the last few decades, and a large number of methods have been reported in the scientific literature. Most of them focus on improving accuracy through more sophisticated techniques rather than on developing standardized methods. As a result, only a few implementation tools exist within the R community.
One of the most common, easy and straightforward methods of Areal Interpolation is Areal Weighting Interpolation (AWI). AWI proportionately interpolates the population values of the source features based on areal (or spatial) weights calculated by the area of intersection between the source and the target zones.
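The mechanics of AWI can be sketched in base R without any spatial package. The intersection table, identifiers and population values below are invented for illustration; weights are normalized by the summed intersection area of each source zone, which is only one of the possible weighting choices.

```r
# Hypothetical intersection table: one row per source/target overlap
inter <- data.frame(
  sid  = c("A", "A", "B", "B"),  # source zone id
  tid  = c(1, 2, 2, 3),          # target id (target 2 spans both zones)
  area = c(60, 40, 20, 80)       # area of intersection
)
pop <- c(A = 200, B = 50)        # invented source populations

# Areal weight of each overlap within its source zone
inter$w <- inter$area / ave(inter$area, inter$sid, FUN = sum)

# Population share carried by each overlap
inter$est <- unname(pop[as.character(inter$sid)]) * inter$w

# Sum the shares per target to obtain the interpolated values
by_target <- tapply(inter$est, inter$tid, sum)
by_target
```

Because the weights of each source zone sum to one, the interpolated values add up to the total source population.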
The sf and areal packages provide Areal Interpolation functionality within the R
ecosystem, and both implement AWI. sf offers extensive and intensive
interpolation options and calculates the areal weights based on the total area
of the source features (total weights), which makes it suitable for completely
overlapping data. areal extends the existing functionality of the sf package by
introducing an additional formula for data without complete overlap. In this
case weights are calculated using the sum of the remaining source areas after
the intersection (sum weights).
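The difference between the two weighting schemes comes down to the denominator, as the following base R sketch illustrates. All numbers are invented: a single source zone whose targets cover only a small part of its area.

```r
# Hypothetical toy data: one source zone and three target features
source_area <- 1000           # full area of the source zone
source_pop  <- 100            # population reported for the source zone
inter_area  <- c(50, 30, 20)  # area of intersection with each target

# Total weights: intersection area over the full source area
w_total <- inter_area / source_area

# Sum weights: intersection area over the summed intersection areas
w_sum <- inter_area / sum(inter_area)

est_total <- source_pop * w_total
est_sum   <- source_pop * w_sum

sum(est_total)  # only the covered share of the population is allocated
sum(est_sum)    # the whole source population is allocated
```

In this toy case the targets cover a tenth of the source zone, so total weights allocate 10 of the 100 residents, whereas sum weights allocate all 100.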
When Areal Interpolation is applied to urban population data (small-scale
applications), the source features (such as city blocks or census tracts) are
larger than the target features (such as buildings) in terms of footprint area.
In this setting the sf functionality (total weights) cannot calculate areal
weights properly and is therefore not ideal. The areal functionality may be
confusing for novice R (or GIS) users, as it is not obvious that the weight
option should be set to sum to calculate areal weights correctly.
populR is introduced to overcome these limitations. It is suitable for Areal
Interpolation of urban population and provides an AWI approach that matches the
existing functionality of areal using sum weights. In addition, it proposes a
Volume Weighting Interpolation (VWI) approach which, to our knowledge, extends
the existing Areal Interpolation functionality within the R ecosystem. VWI uses
the area of intersection between source and target features multiplied by the
building height or number of floors (volume) to guide the interpolation process.
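The effect of volume weighting can be sketched in base R with invented numbers. The sketch assumes that volume weights are normalized within each source zone in the same way as sum weights; this is an assumption made for illustration, not a description of populR's internals.

```r
# Hypothetical toy data for a single source zone
source_pop <- 100
inter_area <- c(50, 30, 20)  # intersection area per building
floors     <- c(1, 4, 2)     # number of floors per building

# AWI: weights from footprint area only
w_awi   <- inter_area / sum(inter_area)
est_awi <- source_pop * w_awi

# VWI: weights from volume (area multiplied by floors)
volume  <- inter_area * floors
w_vwi   <- volume / sum(volume)
est_vwi <- source_pop * w_vwi

data.frame(inter_area, floors, est_awi, est_vwi)
```

The four-storey building receives a larger share under VWI than under AWI, while both sets of estimates still sum to the source population.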
This vignette carries out a comparative analysis of Areal Interpolation
alternatives within the R programming environment. Results from sf, areal and
populR are obtained and compared to a more realistic population distribution.
A small part of the city of Mytilini, Lesvos, Greece was chosen as the case study
(figure below). The study area consists of 9 city blocks (source) counting 911
residents and 179 building units (target) including floor number information.
These data are included in the populR package for further experimentation.
# attach library
library(populR)

# load data
data('src')
data('trg')

source <- src
target <- trg

# plot data
plot(source['geometry'], col = "#634B56", border = NA)
plot(target['geometry'], col = "#FD8D3C", add = TRUE)
In this section a demonstration of the sf, areal and populR packages takes place.
First, the packages are attached to the script and next populR built-in data
are loaded. Then, Areal Interpolation functions are executed for each one of the
aforementioned packages.
The st_interpolate_aw() function of the sf package takes:

- x: an object of class sf with data to be interpolated
- to: the target geometries (sf object)
- extensive: whether to use extensive (TRUE) or intensive interpolation (FALSE)

areal provides the aw_interpolate() function which requires:

- data: an sf object to be used as target
- tid: target identification numbers
- source: an sf object with data to be interpolated
- sid: source identification numbers
- weight: either sum or total for extensive interpolation, and sum for intensive interpolation
- output: whether to return an sf object or a tibble
- extensive: a vector of quoted (extensive) variable names - optional if intensive is specified
- intensive: a vector of quoted (intensive) variable names - optional if extensive is specified

Finally, populR offers the pp_estimate() function which takes:

- target: an sf object to be used as target
- source: an sf object with data to be interpolated
- sid: source identification numbers
- spop: source population values to be interpolated
- volume: target volume information (number of floors or height) - required for the vwi approach
- point: whether to return point geometries (TRUE) or not (FALSE) - optional
- method: whether to use awi or vwi

Evidently, the st_interpolate_aw() function of the sf package requires only 3
arguments, which makes it very easy to use, while populR requires at least 5 and
areal at least 7 arguments, which potentially increases the implementation
complexity.
On the other hand, only areal may be used for multiple interpolations at once
as the extensive or intensive argument takes a vector of quoted values 
(not included in this vignette).
For the reader's convenience names were shortened as follows:

- awi: populR awi approach
- vwi: populR vwi approach
- aws: areal using extensive interpolation and sum weights
- awt: areal using extensive interpolation and total weights
- sf: sf using extensive interpolation

# attach libraries
library(populR)
library(areal)
library(sf)

# load data
data('src')
data('trg')

source <- src
target <- trg

# populR - awi
awi <- pp_estimate(target = target, source = source, spop = pop, sid = sid,
                   method = awi)

# populR - vwi
vwi <- pp_estimate(target = target, source = source, spop = pop, sid = sid,
                   volume = floors, method = vwi)

# areal - sum weights
aws <- aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                      weight = 'sum', output = 'sf', extensive = 'pop')

# areal - total weights
awt <- aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                      weight = 'total', output = 'sf', extensive = 'pop')

# sf - total weights
sf <- st_interpolate_aw(source['pop'], target, extensive = TRUE)
The study area counts 911 residents, as mentioned in the previous section. The
code chunk below shows that awi, vwi and aws preserve this total, as their
estimates sum to 911, while awt and sf underestimate it. This is expected, as
both awt and sf use the total area of the source features during the
interpolation process and are therefore appropriate only when source and target
features completely overlap.
# sum initial values
sum(source$pop)

# populR - awi
sum(awi$pp_est)

# populR - vwi
sum(vwi$pp_est)

# areal - awt
sum(awt$pop)

# areal - aws
sum(aws$pop)

# sf
sum(sf$pop)
Moreover, the awi and aws approaches produced identical results, while vwi
produced somewhat different ones, as shown in the code block below.
# order values using tid
awi <- awi[order(awi$tid),]
vwi <- vwi[order(vwi$tid),]

# get values and create a df
awi_values <- awi$pp_est
vwi_values <- vwi$pp_est
awt_values <- awt$pop
aws_values <- aws$pop
sf_values <- sf$pop

df <- data.frame(vwi = vwi_values, awi = awi_values, aws = aws_values,
                 awt = awt_values, sf = sf_values)

df[1:20,]
Due to confidentiality concerns, population data at building level are not available in Greece. Therefore, an alternate population distribution previously published in Batsaris et al. (2019) was used as the reference data set to compare the results.
These reference population values are included in the built-in data set in the field rf, as shown below.
target
In the code chunk below the first 20 features are presented for comparison.
rf <- awi$rf
df <- cbind(rf, df)

df[1:20,]
populR provides a function, pp_compare(), to compare the results with alternate
population data. pp_compare() produces a scatter diagram, a linear regression
model, the coefficient of determination ($R^2$), the MAE (Mean Absolute Error)
and the RMSE (Root Mean Squared Error) to investigate the relationship of the
results with the reference (or other) data.
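For readers unfamiliar with these measures, the base R sketch below computes them by hand on invented values. It is not a reproduction of pp_compare() internals; the vectors actual and estimated are hypothetical.

```r
# Hypothetical reference and estimated values for five buildings
actual    <- c(4, 10, 7, 3, 12)
estimated <- c(5, 9, 7, 4, 10)

# Errors between estimates and reference values
err <- estimated - actual

# Mean Absolute Error and Root Mean Squared Error
mae  <- mean(abs(err))
rmse <- sqrt(mean(err^2))

# Linear regression of reference on estimated values and its R squared
fit <- lm(actual ~ estimated)
r2  <- summary(fit)$r.squared

c(mae = mae, rmse = rmse, r2 = r2)
```

Lower MAE and RMSE indicate estimates closer to the reference values, and an $R^2$ closer to 1 indicates a stronger linear relationship.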
Generally, the diagrams suggest strong and positive relationships in all cases.
However, vwi provides the strongest relationship and the highest $R^2$. It also
provides the smallest MAE value in comparison with the other methods, as shown
below.
awi_error <- pp_compare(df, estimated = awi, actual = rf, title = "awi vs actual")
awi_error

vwi_error <- pp_compare(df, estimated = vwi, actual = rf, title = "vwi vs actual")
vwi_error

sf_error <- pp_compare(df, estimated = sf, actual = rf, title = "sf vs actual")
sf_error

awt_error <- pp_compare(df, estimated = awt, actual = rf, title = "awt vs actual")
awt_error

aws_error <- pp_compare(df, estimated = aws, actual = rf, title = "aws vs actual")
aws_error
RMSE (Root Mean Squared Error) is also calculated. Again, vwi provides
the smallest error value as shown in the code block below.
Finally, a performance comparison (execution times) is carried out in this
section using the microbenchmark package. Execution time measurements suggest
that populR executed much faster than areal and sf, as shown below. Both awi and
vwi achieved the best mean execution time (about 76.74 milliseconds). aws
follows with 136.67 milliseconds and, finally, awt with 180.53 milliseconds.
library(microbenchmark)

# performance comparison
microbenchmark(
  suppressWarnings(pp_estimate(target = target, source = source, spop = pop,
                               sid = sid, method = awi)),
  suppressWarnings(pp_estimate(target = target, source = source, spop = pop,
                               sid = sid, volume = floors, method = vwi)),
  aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                 weight = 'sum', output = 'sf', extensive = 'pop'),
  aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                 weight = 'total', output = 'sf', extensive = 'pop'),
  suppressWarnings(st_interpolate_aw(source['pop'], target, extensive = TRUE))
)
This vignette presented a demonstration and a comparative analysis of areal
interpolation packages applied to urban population data. Both the sf and areal
packages provide general purpose AWI functionality, while the populR package
focuses on areal interpolation of population data. Additionally, populR
provides VWI, which extends R's existing functionality.
The city of Mytilini, Greece was used as the case study to investigate three
main pillars: a) implementation, b) results, c) performance. Notes on
implementation indicate that the sf package requires only 3 arguments, while
populR requires at least 5 and areal at least 7. The results suggest that sf and
awt may not be ideal for data that are not completely overlapping. Moreover, aws
and awi obtained the same results, while vwi outperformed the others in
comparison to the reference data set. Finally, populR performs much faster than
the sf and areal packages.