In mqueinnec/RMFinventory:

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette presents the sampling and ABA modeling approaches that were used for the enhanced forest inventory (EFI) pilot project at Romeo Malette Forest (RMF) using Leica SPL100 lidar (SPL) data collected wall-to-wall during the summer of 2018. The package (and this vignette) are written for the RMNF project but the intent is to make the overall approach applicable to other forest management units.

First, we will perform a structurally-guided stratified sampling to select locations for new plots establishment. Then, we will detail how plot-level forest attribute data can be calculated or modeled from published allometric equations. Finally, we will use the field inventory data and the SPL point cloud to build random forest predictive models and generate wall-to-wall maps of forest attributes.

We start by loading the packages that we are going to use

library(RMFinventory)
library(raster)
library(sf)
library(dplyr)
library(RStoolbox)
library(ggplot2)

Select candidate cells from inventory layers

For this section we assume that wall-to-wall lidar metrics are available:

input <- "G:/ROMEO/SPL_processing/2_Metrics/2_mosaic"
flist <- list.files(input, pattern = "tif$", full.names = TRUE)
test <- stack(flist)

wall_metrics <- brick(system.file("extdata","wall_metrics_small.tif", package = "RMFinventory"))
names(wall_metrics) <- c("avg", "cov", "std","p10", "p20","p50","p70","p95", "p99","d0","d2","d4","dns")

plot(wall_metrics$avg)

#Coordinate system of wall-to-wal raster
wall_crs <- raster::crs(wall_metrics)

We will only select cells that are suitable for plot establishment using the followqign process:

Open a shapefile that contains the areas to keep/mask as a sf object
Perform some operations such as querying some attributes or create buffers
Convert the sf object as a sp object for cpompatibility with the raster package
USe the mask function of the raster package that will assign NA values to all cells not covered (i.e. centroid not inside polygon) by the sp object.

PCI/BMI polygons

poly <- st_read(system.file("extdata/inventory_polygons","inventory_polygons.shp", package = "RMFinventory"))

#Match CRS of wall-to-wall raster layer
poly <- st_transform(poly, wall_crs@projargs)
plot(poly["POLYTYPE"], axes = TRUE)
plot(poly["OWNER"], axes = TRUE)

poly_subset <- poly[poly$POLYTYPE == "FOR" & poly$OWNER == 1, ]
plot(st_geometry(poly_subset), axes = TRUE, col = "red")

poly_subset <- st_union(poly_subset)
plot(st_geometry(poly_subset), axes = TRUE, col = "red")

#Transform to SpatialPolygonDataFrame object for compatibility with the raster package
poly_subset <- sf::as_Spatial(poly_subset)

#Mask wall-to-wall layer
wall_poly <- raster::mask(wall_metrics, mask = poly_subset)
plot(wall_poly$avg)

Road layer

# Open roads layer, select suitable road types and create a buffer 

roads <- st_read(system.file("extdata/roads","roads.shp", package = "RMFinventory"))
roads <- st_transform(roads, wall_crs@projargs)
plot(roads["RDTYPE"])

# Select only suitable road types
# Highway (H), Municipal (M), Primary (P), Branch (B), Clay/mineral surface haul (C) and roads accessed when dry or frozen (d) , graveled (g), industrial grade road (i), highway or municipal road(r), yearly accessible (y)

roads_subset <- roads[roads$RDTYPE %in% c("H", "M", "P", "B", "C") &
                        roads$RDMOD %in% c("d", "g", "i", "r", "y"), ]

# Dissolve roads layer to work with only 1 feature
roads_subset <- st_union(roads_subset)
plot(st_geometry(roads_subset))

#Create buffers of 30 m and 200 m
roads_buffer_30m <- st_buffer(roads_subset, dist = 30)
roads_buffer_200m <- st_buffer(roads_subset, dist = 200)

#Take the symetrical difference between buffers to keep only roads >- 30 m AND <= 200 m
roads_buffer <- st_sym_difference(roads_buffer_200m, roads_buffer_30m)
plot(st_geometry(roads_buffer))

#Transform to SpatialPolygonDataFrame object for compatibility with the raster package
roads_buffer <- sf::as_Spatial(roads_buffer)

#Mask wall-to-wall layer
wall_poly_roads <- raster::mask(wall_poly, mask = roads_buffer)
plot(wall_poly_roads$avg)

Additional masking layers

In this example we only mask by inventory polygons and distance to road but other layers can be considered (such as past or planned depletions). The same approach as the one described above can be used.

Stratified random sampling of cells for new plots establishement

When generating forest inventory attributes using an area-based-approach (ABA), the ground plots data used to calibrate ABA regression models need to be representative as much as possible of the full range of forest structure variability within the study area. If this is not the case, regression models might perform poorly in underrepresented forest types (White et al., 2013). Lidar metrics such as height percentiles, cover or height variability can be used to design a sampling network driven by forest structure.

Principal Component Analysis (PCA) is a method used to summarize the variability of a large number of highly correlated LiDAR structural metrics into a smaller number of uncorrelated variables. The feature space created by the generated principal components can then be stratified into classes that will represent specific types of forest structural conditions. Random sampling can then be performed within each of these classes to ensure a representative characterization of all forest structures occurring across the study area.

Calculate PCA values of candidate cells

We assume that a set of lidar metrics have been calculated across the entire study area and that candidate cells for new plots establishment have been determined using criteria such as distance and access to roads, forested land cover etc.

Note that the resolution of the lidar metrics maps is determined based on the size of the plots that will be established. It is important the the area of each cell (pixel) corresponds to the area of the plots. For fixed-radius circular plots of 11.28 m radius, corresponding to 400 m2, the resolution should be 20 m x 20 m.

In the example of RMF, we selected only cells located between 30 m and 200 m from roads and forest inventory polygons determined from photo interpretation, were used to select only cells considered to be forested.

# Plot wall-to-wall
plot(wall_metrics$p95, zlim = c(0, 30), main = "Wall to wall")

candidate_metrics <- wall_poly_roads #From previous processing
names(candidate_metrics) <- c("avg", "cov", "std","p10", "p20","p50","p70","p95", "p99","d0","d2","d4","dns")


# Plot candidate metrics
plot(candidate_metrics$p95, zlim = c(0, 30), main = "Candidates")

We calculate PCA values from the lidar metrics located in forested cells using the rasterPCA function from the RStoolbox package.

poly <- st_read("E:/ROMEO/RMF_Pilot_Data/Scripts/RMFinventory/inst/extdata/inventory_polygons/inventory_polygons.shp")

#Match CRS of wall-to-wall raster layer
poly <- st_transform(poly, wall_crs@projargs)

#Select FOR polygons and transform to sp
poly_FOR <- poly[poly$POLYTYPE == "FOR", ]
wall_metrics_FOR <- mask(wall_metrics, poly_FOR)

PCA <- RStoolbox::rasterPCA(wall_metrics_FOR, nComp = 2, spca = TRUE, maskCheck = FALSE)
# nComp = 2, we return the two first principal components
# spca = TRUE, since metrics have different ranges of values the fucntion will center and scale the metrics
# maskCheck = FALSE, we don't check if some pixels have NA value in one or more layers. If not sure, set to TRUE

# The output of rasterPCA is a list with an element model which contains the PCA model and an element map which contains the map of PCA values

PCA_wall <- PCA$map
plot(PCA_wall)

PCA_model <- PCA$model

We can check the porportion of variance contained in each principal component:

summary(PCA_model)

Use PCA model to get PCA values of an existing set of plots

A network of 182 plots was already established in RMF. The objective of the structural guided sampling was to check if the existing plot network was covering the entire range of structural variability and if not, selecting new plots in underrepresented forest types.

We load the existing set of plots. All the ALS metrics that were calculated to make the PCA model have also been calculated at the plot-level.

Note: Make sure that the name of metrics used to get the PCA model correspond to the name of the plot-level metrics

We can use the predict function to get PCA values of the existing set of plots and off the candidate cells

# Plot-level
plots_metrics <- st_read(system.file("extdata", "plots_metrics.shp", package = "RMFinventory"))

plots_metrics_coords <- st_coordinates(plots_metrics)
plots_metrics_df <- plots_metrics %>% st_drop_geometry()

plots_PCA <- as.data.frame(predict(PCA_model, plots_metrics_df))[,c(1,2)]
colnames(plots_PCA) <- c("PC1", "PC2")

#Candidate cells

# This is the sricpt that would be ran. Since candidates are just a subset of the wall to wall cells, we could also just mask the PCA_wall layer cells tha are not part of the candidate cells. 
PCA_candidates <- raster::predict(candidate_metrics, model = PCA_model, index = c(1,2),  filename = "path/to/file.tif")

PCA_candidates <- raster::predict(object = candidate_metrics, model = PCA_model,index = c(1,2))
names(PCA_candidates) <- c("PC1", "PC2")

#Also possible to mask wall to wall PCA using a candidate metrics layer
#PCA_candidates <- raster::mask(PCA_wall, candidate_metrics$avg)

Once PCA values of all forested cells, candidate cells and existing plots have been calculated, it is useful to plot them to visualize their distribution.

# In the following steps we get raster cells values in a data.frame
# With large raster object it can creates very large objects in memory
df_PCA_wall <- as.data.frame(PCA_wall, na.rm = TRUE, xy = FALSE)
df_PCA_candidates <- as.data.frame(PCA_candidates, na.rm = TRUE, xy = FALSE)

## Function to calculate point density in scatterplot
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}

#Calculate point density for scatterplot visualization
df_PCA_candidates <- df_PCA_candidates %>%
  mutate(dens = get_density(PC1, PC2, n = 300))

# Get convex hulls 

hulls_wall_idx <- chull(df_PCA_wall$PC1, df_PCA_wall$PC2)
hulls_wall <- dplyr::slice(df_PCA_wall[,c("PC1","PC2")],hulls_wall_idx)

hulls_cand_idx <- chull(df_PCA_candidates$PC1, df_PCA_candidates$PC2)
hulls_cand <- dplyr::slice(df_PCA_candidates[,c("PC1","PC2")],hulls_cand_idx)

ggplot(mapping = aes(x = PC1, y = PC2)) + 
  geom_polygon(data = hulls_wall, colour = "#440154FF", fill ="#440154FF", alpha = 0.2) +
  geom_polygon(data = hulls_cand, colour = "orange", fill = NA) + 
  geom_point(data = df_PCA_candidates, aes(color = dens)) + 
  scale_color_viridis_c() + 
  geom_point(data = plots_PCA) + 
  theme_bw() +
  coord_equal() +
  theme(panel.grid = element_blank(),
        legend.position = "right")


#Function to calculate point density in scatterplot
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}

Stratification

Once PCA values have been calculated we can stratify the PC1 / PC2 feature space. The approach chosen for the RMF was to stratify using equal intervals.

From the previous plot, we can decide to stratify the feature space into 7 equal interval on the PC1 axis and 5 intervals on the PC2 axis (7 x 5 grid). We can use the function getPCAstrata to automatically create the stratification:

# Determine the number of breaks for the first and second features (PC1 and PC2) 
strata <- RMFinventory::getPCAstrata(PCA_layer = PCA_candidates,
                                  nbreaks = c(8, 6), #Since we want a 7 x 5 grid we need 8 x 6 breaks
                                  summary = TRUE)


#getPCAstrata returns a list of three objects

strata_candidates <- strata$strata_layer
strata_candidates
plot(strata_candidates)

breaks <- strata$breaks
breaks

strata$summary

strata$matrix

We can use the breaks returned by the getPCAstrata function to get the strata of the existing set of plots:

strata <- getPCAstrata(PCA_layer = plots_PCA,
                             breaks = breaks, # From the precedent call to getPCAstrata
                             summary = TRUE)

strata_plots <- strata$strata_layer
strata_plots

strata$summary

Make a new plots with break lines between strata

ggplot(mapping = aes(x = PC1, y = PC2)) + 
  geom_polygon(data = hulls_wall, colour = "#440154FF", fill ="#440154FF", alpha = 0.2) +
  geom_polygon(data = hulls_cand, colour = "orange", fill = NA) + 
  geom_point(data = df_PCA_candidates, aes(color = dens)) + 
  scale_color_viridis_c() + 
  geom_point(data = plots_PCA, color = "red") + 
  geom_vline(xintercept = breaks$PC1, linetype = "dashed") + 
  geom_hline(yintercept = breaks$PC2, linetype = "dashed") + 
  theme_bw() +
  coord_equal() +
  theme(panel.grid = element_blank(),
        legend.position = "right")

Select new plots

The function sampleCells that performs the sampling requires the number of cells to sample for each strata. This can be determined based on the previous feature space plots, total number of plots than can be selected, number of existing plots, how many of them should be re-measured etc.

The following figure illustrates the sampling process:

knitr::include_graphics(system.file("extdata","figures","Flow_RMF_sampling.png", package = "RMFinventory"))

toSample <- read.csv(system.file("extdata","cells_to_sample.csv", package = "RMFinventory"), stringsAsFactors = F)
toSample

existing_plots <- data.frame(plotID = plots_metrics$plotID, 
                             x = plots_metrics_coords[,1], 
                             y = plots_metrics_coords[,2], 
                             strata = strata_plots)
existing_plots


new_plots <- RMFinventory::sampleCells(strata_layer = strata_candidates,
                         matrix_strata = strata$matrix, 
                         existing_sample = existing_plots, # You can provide a set of existing plots or output of previous call to sampleCells and these cells won't be sampled again.
                         toSample = toSample, 
                         mindist = 150, 
                         message = T)

# There might be a warning like this: 
#no non-missing arguments to min; returning Infno non-missing arguments to max; returning -Inf
# If think safe to ignore but need to look more into that
new_plots 

#Assign plotID to sampled plots
new_plots <- new_plots %>%
  mutate(plotID = ifelse(type == "New", paste0("S_", seq_len(n())), plotID))

#Convert to sf object
new_plots_sf <- st_as_sf(new_plots, coords = c("x", "y"))

plot(strata_candidates)
plot(st_geometry(new_plots_sf), add = T)

#Get metrics and PCA of new plots
new_plots_metrics <- extract(wall_metrics, new_plots_sf) #could also get from point cloud directly
new_plots_PCA <- as.data.frame(predict(PCA_model, new_plots_metrics))[,c(1,2)]
colnames(new_plots_PCA) <- c("PC1", "PC2")


ggplot(mapping = aes(x = PC1, y = PC2)) + 
  geom_polygon(data = hulls_wall, colour = "#440154FF", fill ="#440154FF", alpha = 0.2) +
  geom_polygon(data = hulls_cand, colour = "orange", fill = NA) + 
  geom_point(data = df_PCA_candidates, aes(color = dens)) + 
  scale_color_viridis_c() + 
  geom_point(data = new_plots_PCA, color = "red") + 
  geom_vline(xintercept = breaks$PC1, linetype = "dashed") + 
  geom_hline(yintercept = breaks$PC2, linetype = "dashed") + 
  theme_bw() +
  coord_equal() +
  theme(panel.grid = element_blank(),
        legend.position = "right")

Field inventory data

Forest attributes using allometric equations

To be completed

Calculate plot-level ALS metrics

Plot-level ALS metrics need to be calculated in order to build predictive models of forest attributes. These task can be performed in various softwares/packages. Here, we present two of them: LAStools and the lidR R package.

The lidR package offers more flexibility to calculate user-defined metrics. At the plot-level, computing time difference between lidR and LAStools is not significant. However, ALS metrics also need to be calculated wall-to-wall in order to generate forest attribute maps once predictive models have been developed on plot data. Over larger areas, LAStools will generally be more efficient than lidR.

In the end, the choice between lidR and LAStools is a trade-off between flexibility in ALS metrics and computational efficiency when calculating wall-to-wall maps.

Using the lidR package

The functionalities of the lidR package are well described in the following wiki: https://github.com/Jean-Romain/lidR/wiki.

The Area-based approach from A to Z is a great resource to have a look at for ABA.

Clip to plots extent

lastiles <- lidR::readLAScatalog("path/to/ALS tiles")

plots <- st_read("path/to/plots_coordinates.shp")

opt_output_files(lastiles) <- "path/to/output/plot_{PlotID}" # Here plotID is a field of plots_coordinates.shp

plot_radius = 11.28
plots_las <- lasclip(lastiles, plots, radius = plot_radius)

Calculate plot_level metrics

To be completed

Using lastools

To be completed

Clip to plots extent

To be completed

Calculate plot-level metrics

It is possible to execute operative system commands through R using the system function by passing it a character containing the command.

files <- "path/to/lasfiles"
odir <- "path/to/output/file.csv"
lastools_cmd <- paste0("lascanopy -i ", files, "*.las -files_are_plots -height_cutoff 1.3 -cover_cutoff 2 -p 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 99 -avg -cov -dns -std -kur -ske -d 2 5 10 15 20 25 -names -o ", odir)
system(lastools_cmd)

Make predictive models

The makeModels function implemented in the package builds random forest predictive models for each forest attribute and assess their accuracy using a k-fold cross-validation.

# Open plot-level ALS metrics

als_metrics <- read.csv(system.file("extdata","plot_metrics.csv", package = "RMFinventory"))
head(als_metrics)

toKeep <- colnames(als_metrics)[colnames(als_metrics) %in% names(wall_metrics)]

als_metrics <- als_metrics[,c("plotID", toKeep)]

pred_names <- colnames(als_metrics)[!colnames(als_metrics) %in% "plotID"]
pred_names

# Open plot-level forest attributes 

plot_data <- read.csv(system.file("extdata","tblPlotLevelSummary.csv", package = "RMFinventory"))
head(plot_data)

# Combine field data and ALS metrics
combine_dat <- merge(als_metrics, plot_data, by = "plotID")

# Attributes to model
attNames <- c("top_height", "ba_ha", "V_ha") 
# Title to add to plots (optional)
titles <- list(ba_ha =  expression("Basal area (m"^"2"*"/ha)"), 
               top_height = "Top height (m)", 
               V_ha = expression("Whole stem volume (m"^"3"*"/ha)"))

# Run modeling: k-fold cross validation, random forest regression models and accuracy assessment
rfList <- RMFinventory::makeModels(dat = combine_dat,
                                   attNames = attNames,
                                   preds = pred_names, 
                                   k = 5, 
                                   titles = titles,
                                   saveModel = F,
                                   saveFigure = F)
rfList$accuracy$`all folds`
rfList$accuracy$summary

# The accuracy measures reported on the plot are currently not extracted from rfList$accuracy$summary as they should be. They are calculated based on all plotted points. 
# The mean (and std) of accuracy measures of the cross-validation are better indicators of the "true" accuracy assessment. 
rfList$top_height$plot
rfList$ba_ha$plot
rfList$V_ha$plot

Calculate wall-to-wall ALS metrics

# Wall to wall metrics (already opened at the beginnign of the vignette)
wall_metrics <- brick(system.file("extdata", "wall_metrics_small.tif", package = "RMFinventory"))
names(wall_metrics) <- c("avg", "cov", "std","p10", "p20","p50","p70","p95", "p99","d0","d2","d4","dns")

Get maps of forest attributes

top_height_wall <- raster::predict(wall_metrics, rfList$top_height$finalModel)
ba_ha_wall <- raster::predict(wall_metrics, rfList$ba_ha$finalModel)
V_ha_wall <- raster::predict(wall_metrics, rfList$V_ha$finalModel)

plot(top_height_wall, main = titles$top_height)
plot(ba_ha_wall, main = titles$ba_ha, zlim = c(0, 70))
plot(V_ha_wall, main = titles$V_ha, zlim = c(0, 500))