| ddbs_interpolate_aw | R Documentation |
Transfers attribute data from a source spatial layer to a target spatial layer based on the area of overlap between their geometries. This function executes all spatial calculations within DuckDB, enabling efficient processing of large datasets without loading all geometries into R memory.
ddbs_interpolate_aw(
  target,
  source,
  tid,
  sid,
  extensive = NULL,
  intensive = NULL,
  weight = "sum",
  mode = NULL,
  keep_NA = TRUE,
  na.rm = FALSE,
  join_crs = NULL,
  conn = NULL,
  name = NULL,
  overwrite = FALSE,
  quiet = FALSE
)
target |
An |
source |
An |
tid |
Character. The name of the column in |
sid |
Character. The name of the column in |
extensive |
Character vector. Names of columns in |
intensive |
Character vector. Names of columns in |
weight |
Character. Determines the denominator calculation for extensive variables.
Either |
mode |
Character. Controls the return type. Options:
Can be set globally via |
keep_NA |
Logical. If |
na.rm |
Logical. If |
join_crs |
Numeric or Character (optional). EPSG code or WKT for the CRS to use
for area calculations. If provided, both |
conn |
A connection object to a DuckDB database. If |
name |
A character string of length one specifying the name of the table,
or a character string of length two specifying the schema and table
names. If |
overwrite |
Boolean. Whether to overwrite the existing table if it exists. Defaults
to |
quiet |
A logical value. If |
Areal-weighted interpolation is used when the source and target geometries are incongruent (they do not align). It relies on the assumption of uniform distribution: values in the source polygons are assumed to be spread evenly across the polygon's area.
Coordinate Systems:
Area calculations are highly sensitive to the Coordinate Reference System (CRS).
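While the function can run on geographic coordinates (lon/lat), it is strongly recommended
to use a projected CRS, ideally an equal-area one (e.g., Albers, or a local UTM zone), to ensure accurate area measurements.
Web Mercator (EPSG:3857) is a poor choice here because it inflates areas increasingly with latitude.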
Use the join_crs argument to project data on-the-fly during the interpolation.
Extensive vs. Intensive Variables:
Extensive variables are counts or absolute amounts (e.g., total population, number of voters). When a source polygon is split, the value is divided proportionally to the area.
Intensive variables are ratios, rates, or densities (e.g., population density, cancer rates). When a source polygon is split, the value remains constant for each piece.
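The difference can be illustrated with a small base-R sketch. The numbers are made up for illustration (one source polygon of area 10 split into two pieces) and nothing here calls the package itself:

```r
# Toy numbers (assumed for illustration): one source polygon split into two pieces
source_area <- 10                         # area of the source polygon
population  <- 100                        # extensive variable (a count)
density     <- population / source_area   # intensive variable (a rate)

overlap_area <- c(t1 = 4, t2 = 6)         # area of each piece of the source

# Extensive: the count is divided in proportion to area
pop_pieces <- population * overlap_area / source_area

# Intensive: every piece keeps the source value unchanged
dens_pieces <- rep(density, length(overlap_area))
names(dens_pieces) <- names(overlap_area)

print(pop_pieces)
print(dens_pieces)
```

The pieces of the extensive variable add back up to the source count, while the intensive value is simply repeated for each piece.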
Mass Preservation (The weight argument):
For extensive variables, the choice of weight determines the denominator used in calculations:
"sum" (default): The denominator is the sum of all overlapping areas
for that source feature. The weights for each source feature therefore sum to 1, so its full
value is allocated to the overlapping targets, even if those targets cover only part of the
source polygon. This matches areal::aw_interpolate(weight="sum").
"total": The denominator is the full geometric area of the source feature.
This assumes the source value is distributed over the entire source polygon. If the target
covers only 50% of the source, only 50% of the value is transferred; the share that falls
outside the target area is not allocated. Source totals are therefore preserved only where
the targets fully cover the source. This matches sf::st_interpolate_aw(extensive=TRUE).
Note: Intensive variables are always calculated using the "sum" logic (averaging
based on intersection areas) regardless of this parameter.
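As a rough illustration of the two denominators, the following base-R sketch applies them to a hand-written matrix of intersection areas. The areas, counts, and rates are assumed toy values, and the code is not the package's internal implementation (which runs in DuckDB):

```r
# Toy intersection areas (assumed): rows are source features, columns are targets.
# Source s1 has area 10, but only 7 units of it overlap any target.
A <- matrix(c(4, 3,
              0, 8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("t1", "t2")))
source_area <- c(s1 = 10, s2 = 8)
count <- c(s1 = 100, s2 = 80)   # extensive variable
rate  <- c(s1 = 10,  s2 = 20)   # intensive variable

# weight = "sum": denominator is the overlapping area of each source
w_sum <- A / rowSums(A)
# weight = "total": denominator is the full area of each source
w_total <- A / source_area

# Extensive estimates per target: sum of weight * source value
ext_sum   <- colSums(w_sum * count)
ext_total <- colSums(w_total * count)

# Intensive estimate per target: area-weighted mean of the source values
int_est <- colSums(A * rate) / colSums(A)

print(rbind(ext_sum, ext_total, int_est))
print(c(sum = sum(ext_sum), total = sum(ext_total), source = sum(count)))
```

In this toy case "sum" allocates all 180 units of the source counts, while "total" allocates 150, because 3 of the 10 area units of s1 fall outside both targets.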
Depends on the mode argument (or global preference set by ddbs_options):
duckspatial (default): A duckspatial_df (lazy spatial data frame) backed by dbplyr/DuckDB.
sf: An eagerly collected object in R memory, of the same data type that the
sf equivalent would return (e.g. an sf object or a units vector).
When name is provided, the result is also written as a table or view in DuckDB and the function returns TRUE (invisibly).
Prener, C. and Revord, C. (2019). areal: An R package for areal weighted interpolation. Journal of Open Source Software, 4(37), 1221. doi:10.21105/joss.01221
areal::aw_interpolate() — reference implementation.
library(sf)
library(duckspatial) # assumed package name; provides ddbs_interpolate_aw()
# 1. Prepare Data
# Load NC counties (Source) and project to Albers (EPSG:5070)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 5070)
nc$sid <- seq_len(nrow(nc)) # Create Source ID
# Create a target grid
g <- st_make_grid(nc, n = c(10, 5))
g_sf <- st_as_sf(g)
g_sf$tid <- seq_len(nrow(g_sf)) # Create Target ID
# 2. Extensive Interpolation (Counts)
# Use weight = "total" for strict mass preservation (e.g., total births)
res_ext <- ddbs_interpolate_aw(
  target = g_sf, source = nc,
  tid = "tid", sid = "sid",
  extensive = "BIR74",
  weight = "total",
  mode = "sf"
)
# Check mass preservation
sum(res_ext$BIR74, na.rm = TRUE) / sum(nc$BIR74) # Should be ~1
# 3. Intensive Interpolation (Density/Rates)
# Calculates area-weighted average (e.g., assumption of uniform density)
res_int <- ddbs_interpolate_aw(
  target = g_sf, source = nc,
  tid = "tid", sid = "sid",
  intensive = "BIR74",
  mode = "sf"
)
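# 4. Quick Visualization
# key.pos = NULL and reset = FALSE stop plot.sf() from overriding the
# two-panel layout set by par(mfrow = ...)
par(mfrow = c(1, 2))
plot(res_ext["BIR74"], main = "Extensive (Total Count)", border = NA,
     key.pos = NULL, reset = FALSE)
plot(res_int["BIR74"], main = "Intensive (Weighted Avg)", border = NA,
     key.pos = NULL, reset = FALSE)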