knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Areal Interpolation may be defined as the process of transforming data reported over one set of spatial units (source) to another (target). Its application to population data has attracted considerable attention during the last few decades. A large number of methods have been reported in the scientific literature. Most of them focus on improving accuracy through more sophisticated techniques rather than on developing standardized methods. As a result, only a few implementation tools exist within the R community.
One of the most common, easy and straightforward methods of Areal Interpolation is Areal Weighting Interpolation (AWI). AWI proportionately interpolates the population values of the source features based on areal (or spatial) weights calculated by the area of intersection between the source and the target zones.
The sf and areal packages provide Areal Interpolation functionality within the R ecosystem.
Both packages implement AWI. The sf functionality offers extensive and intensive
interpolation options and calculates the areal weights based on the total area of the
source features (total weights). It is suitable for completely overlapping data.
The areal package extends the existing functionality of the sf package by introducing an
additional formula for data without complete overlap. In this case, weights are calculated
using the sum of the remaining source areas after the intersection (sum weights).
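The two weighting schemes can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from either package's documentation: $P_s$ is the population of source feature $s$, $A_s$ its total area, and $A_{st}$ the area of intersection between source $s$ and target $t$.

$$
\hat{P}_t = \sum_{s} w_{st}\, P_s, \qquad
w_{st}^{\text{total}} = \frac{A_{st}}{A_s}, \qquad
w_{st}^{\text{sum}} = \frac{A_{st}}{\sum_{t'} A_{st'}}
$$

With total weights, the weights of a source add up to one only if the targets cover it completely. With sum weights, they always add up to one, so the source population is preserved.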
When the case involves Areal Interpolation of urban population data (small scale
applications), where the source features (such as city blocks or census tracts)
are larger than the target features (such as buildings) in terms of footprint
area, the sf functionality (total weights) is unable to calculate areal weights
properly, because the targets cover only part of each source, and is therefore not
ideal for such applications. The areal functionality may be confusing for novice
R (or GIS) users, as it is not obvious that the weight option should be set to sum
to calculate areal weights correctly.
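The effect of the two weight options can be illustrated with a small base R sketch. All numbers are hypothetical and the variable names are chosen here for illustration only. No package function is called, so this is not the internal code of sf or areal.

```r
# Hypothetical city block: 100 residents, 1000 m^2 footprint,
# containing two buildings that cover only part of the block.
block_pop  <- 100
block_area <- 1000
int_area   <- c(b1 = 100, b2 = 300)  # block/building intersection areas

# Total weights: intersection area divided by the full block area.
w_total   <- int_area / block_area
est_total <- w_total * block_pop

# Sum weights: intersection area divided by the sum of intersected areas.
w_sum   <- int_area / sum(int_area)
est_sum <- w_sum * block_pop

sum(est_total)  # 40: population is underestimated
sum(est_sum)    # 100: block population is preserved
```

Because the buildings cover only 40% of the block, total weights assign only 40% of the residents, while sum weights distribute all of them.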
To overcome these limitations, populR is introduced. populR is suitable for Areal
Interpolation of urban population and provides an AWI approach that matches the
existing functionality of areal using sum weights. Additionally, it proposes a VWI
approach which, to our knowledge, extends the existing Areal Interpolation
functionality within the R ecosystem. VWI uses the area of intersection between
source and target features multiplied by the building height or number of floors
(volume) to guide the interpolation process.
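The difference between area-based and volume-based weights can be sketched in base R with hypothetical numbers. The normalization by the sum of volumes within a block is an assumption made here by analogy with sum weights. It is not a statement about how populR computes its weights internally.

```r
# Hypothetical block with 100 residents and two buildings.
block_pop <- 100
int_area  <- c(b1 = 100, b2 = 300)  # intersection (footprint) areas
floors    <- c(b1 = 6,   b2 = 2)    # number of floors per building

# Area-based weights (AWI with sum weights).
w_awi   <- int_area / sum(int_area)
est_awi <- w_awi * block_pop         # b1 = 25, b2 = 75

# Volume-based weights: footprint area multiplied by floors
# (assumed normalization: divide by the block's total volume).
volume  <- int_area * floors
w_vwi   <- volume / sum(volume)
est_vwi <- w_vwi * block_pop         # b1 = 50, b2 = 50
```

The small but tall building receives a larger share under the volume-based weights than under the area-based ones, which is the behaviour the VWI approach is meant to capture.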
In this vignette, a comparative analysis of Areal Interpolation alternatives within the
R programming environment is carried out. Results from sf, areal and populR
are obtained and compared to a more realistic population distribution.
A small part of the city of Mytilini, Lesvos, Greece was chosen as the case study
(figure below). The study area consists of 9 city blocks (source) counting 911
residents and 179 building units (target) including floor number information.
These data are included in the populR package for further experimentation.
# attach library
library(populR)

# load data
data('src')
data('trg')
source <- src
target <- trg

# plot data
plot(source['geometry'], col = "#634B56", border = NA)
plot(target['geometry'], col = "#FD8D3C", add = TRUE)
In this section, the sf, areal and populR packages are demonstrated.
First, the packages are attached to the script and the populR built-in data
are loaded. Then, the Areal Interpolation functions of each package are executed.
The st_interpolate_aw() function of the sf package takes:
- x: an object of class sf with data to be interpolated
- to: the target geometries (sf object)
- extensive: whether to use extensive (TRUE) or intensive (FALSE) interpolation

The areal package provides the aw_interpolate() function, which requires:
- .data: an sf object to be used as target
- tid: target identification numbers
- source: an sf object with data to be interpolated
- sid: source identification numbers
- weight: either sum or total for extensive interpolation, and sum for intensive interpolation
- output: whether to return an sf object or a tibble
- extensive: a vector of quoted (extensive) variable names - optional if intensive is specified
- intensive: a vector of quoted (intensive) variable names - optional if extensive is specified

Finally, populR offers the pp_estimate() function, which takes:
- target: an sf object to be used as target
- source: an sf object with data to be interpolated
- sid: source identification numbers
- spop: source population values to be interpolated
- volume: target volume information (number of floors or height) - required for the vwi approach
- point: whether to return point geometries (TRUE) or not (FALSE) - optional
- method: whether to use awi or vwi

Evidently, the st_interpolate_aw() function of the sf package requires only 3 arguments,
which makes it very easy to use, while populR requires at least 5 and areal
at least 7 arguments, which potentially increases implementation complexity.
On the other hand, only areal may be used for multiple interpolations at once,
as its extensive or intensive argument takes a vector of quoted variable names
(not included in this vignette).
For the reader's convenience, names were shortened as follows:
- awi: populR awi approach
- vwi: populR vwi approach
- aws: areal using extensive interpolation and sum weights
- awt: areal using extensive interpolation and total weights
- sf: sf using extensive interpolation

# attach libraries
library(populR)
library(areal)
library(sf)

# load data
data('src')
data('trg')
source <- src
target <- trg

# populR - awi
awi <- pp_estimate(target = target, source = source, spop = pop, sid = sid,
                   method = awi)

# populR - vwi
vwi <- pp_estimate(target = target, source = source, spop = pop, sid = sid,
                   volume = floors, method = vwi)

# areal - sum weights
aws <- aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                      weight = 'sum', output = 'sf', extensive = 'pop')

# areal - total weights
awt <- aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                      weight = 'total', output = 'sf', extensive = 'pop')

# sf - total weights
sf <- st_interpolate_aw(source['pop'], target, extensive = TRUE)
The study area counts 911 residents, as mentioned in the previous section. The
code chunk below shows that awi, vwi and aws correctly estimated population
values, as they sum to 911, while awt and sf underestimated them. This is
expected, as both methods use the total area of the source features during the
interpolation process and are useful when source and target features completely overlap.
# sum initial values
sum(source$pop)

# populR - awi
sum(awi$pp_est)

# populR - vwi
sum(vwi$pp_est)

# areal - awt
sum(awt$pop)

# areal - aws
sum(aws$pop)

# sf
sum(sf$pop)
Moreover, identical results were obtained by the awi and aws approaches, and
somewhat different results by vwi, as shown in the code block below.
# order values using tid
awi <- awi[order(awi$tid),]
vwi <- vwi[order(vwi$tid),]

# get values and create a df
awi_values <- awi$pp_est
vwi_values <- vwi$pp_est
awt_values <- awt$pop
aws_values <- aws$pop
sf_values <- sf$pop

df <- data.frame(vwi = vwi_values, awi = awi_values, aws = aws_values,
                 awt = awt_values, sf = sf_values)

df[1:20,]
Due to confidentiality concerns, population data at building level are not available in Greece. Therefore, an alternate population distribution previously published in Batsaris et al. 2019 was used as reference data set to compare the results.
These reference population values are included in the built-in data set in the field rf, as shown below.
target
In the code chunk below the first 20 features are presented for comparison.
rf <- awi$rf
df <- cbind(rf, df)
df[1:20,]
populR provides a function (pp_compare()) to compare the results with alternate
population data. pp_compare() produces a scatter diagram, a linear regression model, the
correlation coefficient ($R^2$), the MAE (Mean Absolute Error) and the RMSE (Root
Mean Squared Error) to investigate the relationship of the results with the
reference (or other) data.
Generally, the diagrams suggest strong and positive relationships
in all cases. However, vwi provides the strongest relationship and the highest $R^2$
coefficient. It also provides the smallest MAE value in comparison with
the other methods, as shown below.
awi_error <- pp_compare(df, estimated = awi, actual = rf, title = "awi vs actual")
awi_error

vwi_error <- pp_compare(df, estimated = vwi, actual = rf, title = "vwi vs actual")
vwi_error

sf_error <- pp_compare(df, estimated = sf, actual = rf, title = "sf vs actual")
sf_error

awt_error <- pp_compare(df, estimated = awt, actual = rf, title = "awt vs actual")
awt_error

aws_error <- pp_compare(df, estimated = aws, actual = rf, title = "aws vs actual")
aws_error
RMSE (Root Mean Squared Error) is also reported by pp_compare(). Again, vwi provides
the smallest error value.
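For readers unfamiliar with these error measures, the following base R sketch computes them by hand on made-up values. The vectors are hypothetical and the code is not the internal implementation of pp_compare(). It only shows the standard definitions of MAE, RMSE, the correlation coefficient and a linear fit.

```r
# Hypothetical reference and estimated populations for four buildings.
actual    <- c(10, 4, 7, 12)
estimated <- c(8, 5, 7, 15)

err <- estimated - actual

mae  <- mean(abs(err))       # Mean Absolute Error: 1.5
rmse <- sqrt(mean(err^2))    # Root Mean Squared Error: sqrt(3.5)

r   <- cor(estimated, actual)   # Pearson correlation coefficient
fit <- lm(estimated ~ actual)   # linear regression of estimates on reference
r2  <- summary(fit)$r.squared   # equals r^2 for a simple linear model
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how the estimates deviate from the reference.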
Finally, execution times are compared in this section using the microbenchmark
package. The measurements suggest that the populR functionality executed
much faster than areal and sf, as shown below. Both awi and vwi achieved
the best mean execution time (about 76.74 milliseconds), followed by aws with
136.67 milliseconds and awt with 180.53 milliseconds.
library(microbenchmark)

# performance comparison
microbenchmark(
  suppressWarnings(pp_estimate(target = target, source = source, spop = pop,
                               sid = sid, method = awi)),
  suppressWarnings(pp_estimate(target = target, source = source, spop = pop,
                               sid = sid, volume = floors, method = vwi)),
  aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                 weight = 'sum', output = 'sf', extensive = 'pop'),
  aw_interpolate(target, tid = tid, source = source, sid = 'sid',
                 weight = 'total', output = 'sf', extensive = 'pop'),
  suppressWarnings(st_interpolate_aw(source['pop'], target, extensive = TRUE))
)
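Mean execution times such as those quoted above can be read from the benchmark summary. The sketch below uses two trivial, hypothetical expressions instead of the interpolation calls so that it runs without the spatial packages. The labels sqrt and power are chosen here for illustration.

```r
library(microbenchmark)

x <- runif(1000)

# Benchmark two toy expressions, labelled for readability.
mb <- microbenchmark(
  sqrt  = sqrt(x),
  power = x^0.5,
  times = 50
)

# Summarize in milliseconds and keep the mean and median columns.
res <- summary(mb, unit = "ms")
res[, c("expr", "mean", "median")]
```

Timings depend on the machine and on the number of repetitions, so only the relative ordering of the methods should be compared across runs.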
In this vignette, a demonstration and comparative analysis of areal
interpolation packages applied to urban population data was undertaken.
Both the sf and areal packages provide general purpose AWI functionality, while
the populR package focuses on areal interpolation of population data. Additionally,
populR provides VWI, which extends R's existing functionality.
The city of Mytilini, Greece was used as the case study to investigate three main pillars:
a) implementation, b) results, c) performance. Notes on implementation indicate
that the sf package requires only 3 arguments, while populR requires at least 5
and areal at least 7.
The results indicate that sf and awt may not be ideal for data that
do not completely overlap. Moreover, aws and awi obtained the same
results, while vwi outperformed the others in comparison to the reference data set.
Finally, populR performs much faster than the sf and areal packages.