knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7 ) knitr::opts_knit$set( global.par = TRUE )
par(mar = c(1, 1, 1, 1))
This vignette builds upon the sample data for São Miguel (introduced in
the previous vignette) to demonstrate the use of exactextractr
with categorical land cover data. A sample of the
CORINE 2018
raster dataset is included in exactextractr
.
As in the previous vignette, the following packages are used:
library(exactextractr) library(dplyr) library(sf) library(raster)
First, we load the CORINE land cover data and the concelho boundaries.
clc <- raster(system.file('sao_miguel/clc2018_v2020_20u1.tif', package = 'exactextractr')) concelhos <- st_read(system.file('sao_miguel/concelhos.gpkg', package = 'exactextractr'), quiet = TRUE)
corine_palette <- c( "#e6004d", "#ff0000", "#cc4df2", "#cc0000", "#e6cccc", "#e6cce6", "#a600cc", "#a64d00", "#ff4dff", "#ffa6ff", "#ffe6ff", "#ffffa8", "#ffff00", "#e6e600", "#e68000", "#f2a64d", "#e6a600", "#e6e64d", "#ffe6a6", "#ffe64d", "#e6cc4d", "#f2cca6", "#80ff00", "#00a600", "#4dff00", "#ccf24d", "#a6ff80", "#a6e64d", "#a6f200", "#e6e6e6", "#cccccc", "#ccffcc", "#000000", "#a6e6cc", "#a6a6ff", "#4d4dff", "#ccccff", "#e6e6ff", "#a6a6e6", "#00ccf2", "#80f2e6", "#00ffa6", "#a6ffe6", "#e6f2ff", "#ffffff") plot(clc, col = corine_palette, axes = FALSE, legend = FALSE) plot(st_geometry(concelhos), add = TRUE)
The land cover class descriptions are provided in a separate DBF file. We read
this in to a data frame, then use levels()
to associate the class descriptions
with the raster.
clc_classes <- foreign::read.dbf(system.file('sao_miguel/clc2018_v2020_20u1.tif.vat.dbf', package = 'exactextractr'), as.is = TRUE) %>% dplyr::select(value = Value, landcov = LABEL3) levels(clc) <- list(data.frame(ID = clc_classes$value, landcov = clc_classes$landcov))
This association provides us with a way to look up the description for a given
ID. Alternatively, we can relate the values using merge
or a `dplyr
join.
factorValues(clc, c(2, 18, 24))
One of the most basic questions we might ask is which land cover classification
is predominant in each concelho. We can do this with the built-in mode
summary operation. The minority
and variety
operations are also applicable
to categorical data and provide the least-common classification and number of
distinct classifications, respectively.
landcov_mode <- exact_extract(clc, concelhos, 'mode', append_cols = 'name', progress = FALSE) %>% inner_join(clc_classes, by=c(mode = 'value'))
landcov_mode %>% dplyr::select(-mode) %>% knitr::kable()
While mode
provides a handy way to see the most common land cover category, we
need to write a custom summary function if we want to see the frequency of
different land cover types in an area.
Summary functions are called once per feature from the input sf
object. They
can return either:
a scalar, in which case the return value of exact_extract
will be a vector
whose entries correspond with the rows of the input sf
object, or
a data frame, in which case exact_extract
will return a rowwise combination
of the data frames for each feature. If the data frame returned by the
summary function will have than a single row, it is useful for some
identifying information to be included in the returned data frame.
If we are going to perform typical data frame operations on the raster values
and coverage fractions, it can be more convenient for the summary function to
accept a single data frame argument, instead of separate arguments for the cell
values and coverage fractions. This behavior can be enabled with the
summarize_df
argument.
Using this method, we can calculate the fraction of each concelho that is covered by each land cover category:
landcov_fracs <- exact_extract(clc, concelhos, function(df) { df %>% mutate(frac_total = coverage_fraction / sum(coverage_fraction)) %>% group_by(name, value) %>% summarize(freq = sum(frac_total)) }, summarize_df = TRUE, include_cols = 'name', progress = FALSE)
Here we use the include_cols
argument here to include the name
column from
concelhos
in the data frame passed to the summary function. (Although the
value of name
will be the same for all rows in df
, we include name
in the
group_by
expression so that it is not removed by summarize
.) Other similar
arguments include include_xy
to get the cell center coordinates,
include_area
to get the cell area, and include_cell
to get the cell index
used by the raster
package.
This provides us with a correspondence between each numeric land cover category and its frequency in each concelho:
head(landcov_fracs)
Joining this table to clc_classes
, we can associate the descriptions with the
numeric types and view the three most common land cover classes in each
concelho:
landcov_fracs %>% inner_join(clc_classes, by = 'value') %>% group_by(name) %>% arrange(desc(freq)) %>% slice_head(n = 3) %>% mutate(freq = sprintf('%0.1f%%', 100*freq)) %>% knitr::kable()
Similarly, we can find the top land covers by area:
landcov_areas <- exact_extract(clc, concelhos, function(df) { df %>% group_by(name, value) %>% summarize(area_km2 = sum(coverage_area) / 1e6) }, summarize_df = TRUE, coverage_area = TRUE, include_cols = 'name', progress = FALSE)
landcov_areas %>% inner_join(clc_classes, by = 'value') %>% dplyr::select(-value) %>% group_by(name) %>% arrange(desc(area_km2)) %>% slice_head(n = 3) %>% knitr::kable()
One extension of the analysis above is to see which land covers are associated with human population in a given concelho. Is the population primary urban or rural?
As described in the previous vignette, the population density raster provides the most robust results in the presence of partially-covered pixels.
pop_density <- raster(system.file('sao_miguel/gpw_v411_2020_density_2020.tif', package = 'exactextractr'))
We are able to perform this analysis because the CORINE sample distributed with
exactextractr
has been reprojected from its native Lambert Equal Area
projection into geographic coordinates consistent with GPW. Otherwise, working
with multiple rasters in different projections requires transformation to a
common grid using tools such as raster::projectRaster
or the gdalwarp
command-line utility.
Very little about the call to exact_extract
requires changing to incorporate
population. We set weights = pop_density
and, since we are using the
summarize_df
option, a column called weight
will be added to the data frame
passed to the summary function. We also set coverage_area = TRUE
so that we
can multiply the density by the covered pixel area to get a population count.
landcov_pop_areas <- exact_extract(clc, concelhos, function(df) { df %>% group_by(name, value) %>% summarize(pop = sum(coverage_area * weight / 1e6)) %>% mutate(pop_frac = pop / sum(pop)) }, weights = pop_density, coverage_area = TRUE, summarize_df = TRUE, include_cols = 'name')
Looking at the highest-population land cover type in each concelho, we can can see that the western/central concelhos of Lagoa, Ponta Delgada, Ribeira Grande, and Vila Franca do Campo have a more urban population than Nordeste or Povoação to the east.
landcov_pop_areas %>% inner_join(clc_classes, by = 'value') %>% group_by(name) %>% arrange(desc(pop_frac)) %>% slice_head(n = 1) %>% dplyr::select(name, landcov, pop, pop_frac) %>% mutate(pop = round(pop), pop_frac = round(pop_frac, 3)) %>% knitr::kable()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.