knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, 
  fig.height=4
)
library(nhdplusTools)
library(hyRefactor)
library(dplyr)
library(sf)
options(scipen = 9999)
options(rmarkdown.html_vignette.check_title = FALSE)

NHDPlus Network Refactor

This vignette summarizes the workflow implemented by the refactor_nhdplus() function to normalize flowline length in the NHDPlus network by splitting long flow lines and collapsing short flowlines. After running this, you may want to read vignette("refactor_catchment") to see how to apply the refactored network to catchment polygons.

The code developed for this is parameterizable and this vignette is presented with target lengths of 2km for long flowlines splits and 0.5km for minimum flowlines length. While this workflow can be run on the entire network, for this example, we will use the sample network subset included in the nhdplusTools package.

The first section describes the network as it exists without modification. Following that, each processing step is illustrated and final results are summarized.

Network Preparation

Prior to summarizing the network, some NHDPlus conditioning can be performed. The following function removes coastal and non-dendritic flowlines. It can also remove networks with an outlet drainage area under a threshold or a longest path length under a threshold, but those inputs are set to 0 such that all networks are included here. Note that this is especially important when running the whole country -- little change is applied to network subsets that follow the network upstream with tributaries.

source(system.file("extdata",
                   "sample_flines.R",
                   package = "nhdplusTools"))

nhdplus_flines <- sample_flines
min_network_size <- 0
min_path_length <- 0
flines <- prepare_nhdplus(st_set_geometry(nhdplus_flines, NULL), 
                          min_network_size, 
                          min_path_length)

The following histograms and plots show characteristics of the NHDPlusV2 network flowline lengths.

hist(flines$LENGTHKM, breaks = 100, main = "NHDPlus Flowlines",
     xlab = "flowline length (km)")

excluded <- length(flines$LENGTHKM[which(flines$LENGTHKM > 10)])

hist(flines$LENGTHKM[which(flines$LENGTHKM < 10)], breaks = 100,
     main = paste0("NHDPlus Flowlines Less than 10km long. \n", excluded, " of ",
                   nrow(flines), " truncated."),
     xlab = "flowline length (km)")

tot_flowlines <- nrow(flines)

flow_length_ecdf <- ecdf(flines$LENGTHKM)
flow_length <- data.frame(l_km = c(0.01, 0.1,1,2,3,4,5,7,10))
flow_length$percent <- flow_length_ecdf(flow_length$l_km)*100
plot(flow_length, main = "Percent of flowlines shorter than ...", xlab = "km")
grid(col="black")

max_l <- .5 #km

flow_length_low <- data.frame(l_km = c(0.01, 0.05, .1, .2, .3, 
                                       .4, .5, .6, .7, .8, .9, 1))
flow_length_low$count <- flow_length_ecdf(flow_length_low$l_km)*tot_flowlines
plot(flow_length_low, main = "Number of flowlines shorter than ...", xlab = "km")
grid(col="black")

flow_length_high <- data.frame(l_km = c(4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30))
flow_length_high$count <- tot_flowlines - 
                          flow_length_ecdf(flow_length_high$l_km) * 
                          tot_flowlines
plot(flow_length_high, main = "Number of flowlines longer than ...", xlab = "km")
grid(col="black")

refactor_nhdplus function

The complete refactor workflow has been packaged into a single function that can be called as shown below. For the purposes of demonstration this function is not used and individual package functions called by it are shown.

## Not Run ##
refactor_nhdplus(nhdplus_flines = nhdplus_flines,
                 split_flines_meters = 2000, 
                 split_flines_cores = 2,
                 collapse_flines_meters = 500, 
                 collapse_flines_main_meters = 500,
                 out_refactored = "nhdplus_collapsed.gpkg",
                 out_reconciled = "nhdplus_reconciled.gpkg",
                 three_pass = TRUE)

The inputs to this are:

Split flowlines

The split_flowlines feature requires flowlines with geometry but for performance reasons, the prep_nhdplus function works with attributes only. The code below, joins the geometry back, coerces it to the required geometry type, and projects it to a projection suited to distance measurement at the contental scale of NHDPlus. It then executes the split_flowlines function.

flines <- inner_join(select(nhdplus_flines, COMID), flines, by = "COMID") %>%
    st_as_sf() %>%
    st_cast("LINESTRING") %>%
    st_transform(5070)

flines <- split_flowlines(flines, 
                          max_length = 2000, 
                          para = 3)
hist(flines$LENGTHKM, breaks = 100, main = "Split NHDPlus Flowlines",
     xlab = "flowline length (km)")

As can be seen here, split_flowlines breaks long flowlines into equal length pieces, which results in the new distribution of flowline lengths shown.

Collapse Flowlines

Now we can collapse very short flowlines.

collapse_flines_meters <- 500
collapse_flines_main_meters <- 500
collapsed_flines <- collapse_flowlines(st_set_geometry(flines, NULL),
                                           (0.25*collapse_flines_meters/1000),
                                           TRUE,
                                           (0.25*collapse_flines_main_meters/1000))
excluded <- length(collapsed_flines$LENGTHKM[which(collapsed_flines$LENGTHKM == 0)])

hist(collapsed_flines$LENGTHKM[which(collapsed_flines$LENGTHKM > 0)], breaks = 100,
     main = paste0("Collapsed NHDPlus Flowlines. \n", excluded, " of ",
                   nrow(collapsed_flines), " removed in collapse."),
     xlab = "flowline length (km)")

As can be seen here, some collapsed flowlines extend the split flowlines. This is a result of chains of very short flowlines being collapsed together. This problem is more or less unavoidable but is improved by running the collapse step a few times with small to large input thresholds as is done next.

Collapse step two and three

Now we can collapse twice more

collapsed_flines <- collapse_flowlines(collapsed_flines,
                                       (0.5*collapse_flines_meters/1000),
                                       TRUE,
                                       (0.5*collapse_flines_main_meters/1000),
                                       warn = FALSE)

collapsed_flines <- collapse_flowlines(collapsed_flines,
                                       (collapse_flines_meters/1000),
                                       TRUE,
                                       (collapse_flines_main_meters/1000),
                                       warn = FALSE)

# Now we can save the output
 select(flines, COMID) %>%
    inner_join(collapsed_flines, by = "COMID") %>%
    st_as_sf() %>%
    st_transform(4326) %>%
    st_write("nhdplus_collapsed.gpkg", layer_options = "OVERWRITE=YES", quiet = TRUE)
excluded <- length(collapsed_flines$LENGTHKM[which(collapsed_flines$LENGTHKM == 0)])

hist(collapsed_flines$LENGTHKM[which(collapsed_flines$LENGTHKM > 0)], breaks = 100,
     main = paste0("Collapsed NHDPlus Flowlines. \n", excluded, " of ",
                   nrow(collapsed_flines), " removed in collapse."),
     xlab = "flowline length (km)")

As shown here, the distribution of flowline lengths is much tighter now but very small headwaters and long chains of collapsed flowlines remain out of the desired range. For the sake of completness,

collapsed_flines_one_pass <- collapse_flowlines(st_set_geometry(flines, NULL),
                                                (collapse_flines_meters/1000),
                                                TRUE,
                                                (collapse_flines_main_meters/1000))
excluded <- length(collapsed_flines_one_pass$LENGTHKM[which(collapsed_flines_one_pass$LENGTHKM == 0)])

hist(collapsed_flines_one_pass$LENGTHKM[which(collapsed_flines_one_pass$LENGTHKM > 0)], breaks = 100,
     main = paste0("Collapsed NHDPlus Flowlines. \n", excluded, " of ",
                   nrow(collapsed_flines_one_pass), " removed in collapse."),
     xlab = "flowline length (km)")

As shown in this historigram, the problem of long chains of collapsed flowlines is much worse when the collapse is run only once.

Reconcile collapsed flowlines

Now that we have our flowlines collapsed, we need to reconcile the results into merged geometries with updated ids. The reconcile_collapsed_flowlines function does this including a column that contains a comma seperated list of catchment ids that were merged together to form each new flowline.

collapsed <- reconcile_collapsed_flowlines(collapsed_flines, 
                                           geom = select(flines, COMID), 
                                           id = "COMID")

collapsed$member_COMID <- sapply(collapsed$member_COMID, paste, collapse = ",")

st_write(st_transform(collapsed, 4326), "nhdplus_reconciled.gpkg", layer_options = "OVERWRITE=YES")
hist(collapsed$LENGTHKM, breaks = 100,
     main = "Collapsed and reconciled NHDPlus Flowlines",
     xlab = "flowline length (km)")

unlink("nhdplus_collapsed.gpkg")
unlink("nhdplus_reconciled.gpkg")

Examples


This image (and those below) show the original NHDPlus network in blue with streams greater than first order rendered with a thick blue line. Blue dots are where the original NHDPlus flowline breaks were and red dots are where the new refactored flowline breaks are. Notice that lone red dots are the result of splitting flowlines and lone blue dots are the result of collapsing them.


This image shows an example of a chain combination. All the small tributaries that intersect the highlighted mainstem are routed to the outlet of the mainstem flowline and have a length attribute that is the distance to that outlet. This problem is especially common where there is a high density of small tributaries meeting a mainstem like in the case shown here. The longest tributary here routed to the oulet of the mainstem is the longest single flowline in the result set (5.5km) shown in the histogram above.


This image shows how short headwater flowlines can be collapsed downstream. The two open blue dots are where three short flowlines were collapsed into one 2km long flowline.


This image shows a coastal system with very long flowlines that have largely been split into many pieces.


This image is a zoomed in view of the outlet and coastal drainages of the system in the image above. Not that there are numerous single-flowling coastal systems in the NHDPlus.



dblodgett-usgs/hyRefactor documentation built on Aug. 25, 2023, 9:09 p.m.