library(nhdplusTools) library(dplyr) local <- (Sys.getenv("BUILD_VIGNETTES") == "TRUE") if(local) { cache_path <- file.path(nhdplusTools_data_dir(), "nhdpt_v_cache") } else { cache_path <- tempdir() } knitr::opts_chunk$set( collapse = TRUE, warning = FALSE, comment = "#>", fig.width=4, fig.height=4, fig.align = "center", eval=local, cache=local, cache.path=(cache_path) ) oldoption <- options(scipen = 9999, "rgdal_show_exportToProj4_warnings"="none")
The terms used below are derived from concepts of graph theory, the HY_Features data model, and the NHDPlus data model. Many of the concepts here are also presented in: Mainstems: A logical data model implementing mainstem and drainage basin feature types based on WaterML2 Part 3: HY Features concepts.
The NHDPlus data model includes many 'value added attributes' (VAA). This vignette discusses a core set of VAA's that nhdplusTools
can create from readily available hydrograhic inputs. The vignette begins with a background needed to understand what these attributes are, and then demonstrates how to create them based on some basic input data. These attributes are documented in the NHDPlus manual, and every effort has been made to faithfully implement their meaning.
While the nhdplusTools
package contains other functions to generate network attributes, (e.g. get_pfaf()
for Pfafstetter codes and get_streamorder()
for stream orders) this vignette focuses on the advanced network VAAs from the NHDPlus data model that revolve around the hydrosequence and levelpath.
A network of flowpaths can be represented as an edge-to-edge (e.g. edge list) or edge-node topology. An edge list only expresses the connectivity between edges (flowpaths in the context of rivers), requiring nodes (confluences in the context of rivers) to be inferred.
As of 2/2022 nhdplusTools
works on edge list representations only. The table and simple graphics below depict both an edge-node and edge-to-edge topology.
print.data.frame(data.frame(ID = c(1, 2, 3), toID = c(3, 3, NA), fromnode = c("N1", "N2", "N3"), tonode = c("N3", "N3", "N4")), row.names = FALSE)
x <- c(1, 5, 3, 3) y <- c(5, 5, 3, 1) par(mar = c(0, 0, 0, 0)) plot(x, y, col = NA) arrows(x[1] + 0.1, y[1] - 0.1, x[3] - 0.1, y [3] + 0.1, 0.1) arrows(x[2] - 0.1, y[2] -0.1, x[3] + 0.1, y [3] + 0.1, 0.1) arrows(x[3], y[3] - 0.1, x[4], y [4] + 0.1, 0.1) text(c(2, 4, 3.15), c(4.2, 4.2, 2), c("1", "2", "3")) par(mar = c(0, 0, 0, 0)) plot(x, y) arrows(x[1] + 0.1, y[1] - 0.1, x[3] - 0.1, y [3] + 0.1, 0.1) arrows(x[2] - 0.1, y[2] -0.1, x[3] + 0.1, y [3] + 0.1, 0.1) arrows(x[3], y[3] - 0.1, x[4], y [4] + 0.1, 0.1) text(c(2, 4, 3.1), c(4.2, 4.2, 2), c("1", "2", "3")) text(c(1, 5, 3, 3.25), c(4.8, 4.8, 3.4, 1), c("N1", "N2", "N3", "N4"))
The "toID" of a terminal flowpath can be either NA or, by convention, 0. Using 0 is preferred within nhdplusTools
but both are handled in most cases. Further, as long as 0 is not in the set of IDs, there is little practical difference.
In nhdplusTools
, edge-to-edge topology is referred to with "comid and tocomid" attributes, or a more general, "ID and toID" depending on the function in question.
source(system.file("extdata/new_hope_data.R", package = "nhdplusTools")) new_hope_flowline <- get_sorted(new_hope_flowline[c("Hydroseq", "DnHydroseq")]) new_hope_flowline$sort <- seq(nrow(new_hope_flowline), 1) par(mar = c(0, 0, 0, 0)) plot(sf::st_geometry(new_hope_flowline), col = NA) plot(new_hope_flowline["sort"], add = TRUE, lwd = 2)
The NHDPlus data model includes an attribute called hydrosequence that is functionally a topological sort of the flowpath network. It provides an integer identifier guaranteed to decrease in the downstream direction. For flowpaths that are not connected by a single direction navigation (e.g. parallel tributaries) the hydrosequence has no significance. However, when two flowpaths have a direct navigation, the downstream flowpath will always have the smaller hydrosequence. nhdplusTools
supports creation of hydrosequence with the get_sorted()
function.
It is hard to overstate the importance of hydrosequence as any function that requires understanding upstream-downstream relationships requires a sorted version of the flowpath network. In the NHDPlus data model, a edge-list topology is stored in the form of a hydrosequence and 'to hydrosequence' attribute. The equivalent is available in nhdplusTools
, but does not use the to hydrosequence convention, preferring the primary identifier (ID or comid) and an accompanying toID/tocomid.
source(system.file("extdata/new_hope_data.R", package = "nhdplusTools")) lp <- data.frame(lp = sort(unique(new_hope_flowline$LevelPathI))) lp$id <- seq_len(nrow(lp)) new_hope_flowline <- dplyr::left_join(new_hope_flowline, lp, by = c("LevelPathI" = "lp")) par(mar = c(0, 0, 0, 0)) plot(sf::st_geometry(new_hope_flowline), col = NA) plot(new_hope_flowline["id"], add = TRUE, lwd = 2)
A level path is derived from "stream level" which assigns an integer value to mainstem rivers from outlet up the network (see NHDPlus documentation for more). Rivers terminating to the ocean are given level 1 and this level extends all the way to the headwaters. Rivers terminating in level 1 rivers are given level 2, and so on.
"Stream leveling", then, is the process of establishing uniquely identified "level paths" through a stream network. This is accomplished with a set of rules that determine which tributary should be considered dominant at every confluence to establish the "mainstem rivers" for each "drainage basin" in a network. nhdplusTools
supports creation of streamlevel with the get_streamlevel()
function, and the creation of level path with get_levelpath()
. The convention used in NHDPlus is to assign the levelpath as the hydrosequence of the path's outlet.
See Mainstems: A logical data model implementing mainstem and drainage basin feature types based on WaterML2 Part 3: HY Features concepts for an in depth discussion of these concepts.
A number of additional attributes can be derived once levelpath
and hydrosequence
are established. These include:
terminalpa
): the identifier (hydrosequence or primary id) of the terminal flowpath of network.uphydroseq
): the identifier of the upstream flowpath on the mainstemdnhydroseq
): the identifier of the downstream flowpath on the mainstemuplevelpat
): the identifier of the next upstream levelpath on the mainstemdnlevelpat
): the identifier of the next downstream levelpath on the mainstempathlength
): The distance to the network outlet downstream along the main path.totdasqkm
): Total accumulated area from upstream flowpath's catchment area.arbolatsu
): The total accumulated length of upstream flowpaths.terminalfl
): A simple 0 or 1 indicating whether a flowpath is a terminal path or not.Creating levelpath and hydrosequence identifiers requires a set of base attributes that include:
fromnode
/ tonode
or ID
/ toID
: from and to nodes can be used to generate an edge to edge flowpath topology. Note that "ID/toID" is "comid/tocomid" in some nhdplusTools
functions.
length
: a length is required for each flowpath in the network to determine a flow distance, and, if using the arbolate sum for stream leveling.
area
: the local drainage area of each flowpath is useful in many contexts but is primarily used to calculate total drainage area.
weight
: a weight metric is required for stream leveling to determine the dominant upstream flowpath. In the NHD, the arbolate sum is used however alternative metrics (e.g. total drainage area) can be used instead.
nameID
: Many times it is preferable to follow a consistently named path (e.g. GNIS) rather a strict physical weight when stream leveling. In these cases a nameID
can be provided.
divergence
: in order to create a [many:1] upstream to downstream topology, diverted paths must be labeled as such. This attribute, is 0 for normal (already [many:1]) connections, 1 for the main path through a divergence, and 2 for any diverted path.
feature type (ftype
): used to determine whether a feature is a stream, a coastal path, or some other type of connected network feature. This is the 'ftype' in the NHD data model.
NOTE: The nhdplusTools
package does not support creation of all attributes discussed here, however, those that are not directly supported can be created based on basic transformations of attributes that nhdplusTools
does support.
To illustrate the above concepts and attributes, we'll start with the "New Hope" demo data included in the nhdplusTools
package and add a tocomid attribute based on the edge-node topology included in the data.
# Import data source(system.file("extdata/new_hope_data.R", package = "nhdplusTools")) # Strip the data back to the required base attributes fpath <- get_tocomid( dplyr::select(new_hope_flowline, COMID, FromNode, ToNode, Divergence, FTYPE, AreaSqKM, LENGTHKM, GNIS_ID) ) # Print head(fpath <- select(sf::st_cast(fpath, "LINESTRING"), -tonode, -fromnode, -divergence, -ftype))
After removing attributes used to generate the tocomid
attribute, we have a comid
and tocomid
relation representing the connectivity of the network as well as attributes required to generate a sorted network with get_sorted()
head(fpath <- get_sorted(fpath, split = TRUE))
The get_sorted()
function sorts the flowpaths such that headwaters come first and the terminal flowpath last. Additionally, it produces a terminalID
representing the outlet ID of the network. If multiple terminal networks had been provided, the terminalID
would allow us to group the data by complete sub networks (a convenient parallelization scheme).
In contrast to the NHD, where the terminal path is identified by the hydrosequence ID of the outlet flowpath (meaning the outlet of a level path is left to a user to generate), nhdplusTools
uses the more stable primary ID for identifying outlets to allow the hydrosequence / topo sort attribute to be generated and discarded as needed.
We can visualize this sorting by assigning a temporary "hydrosequence" value to the sorted network row wise. Here, we assign the first rows in the sorted set to large values and the last rows to small values in line with the hydrosequence order convention in NHDPlus.
fpath['hydrosequence'] <- seq(nrow(fpath), 1) plot(fpath['hydrosequence'], key.pos = NULL)
To generate a levelpath attribute, a "physical" weight is needed to determine the upstream mainstem at divergences. For this example, we'll follow the NHD convention and calculate the arbolate sum explicitly. The get_levelpaths()
function will add arbolate sum internally if no weight is explicitly defined.
# Rename and compute weight fpath[["arbolatesum"]] <- calculate_arbolate_sum( dplyr::select(fpath, ID = comid, toID = tocomid, length = lengthkm)) plot(sf::st_geometry(fpath), lwd = fpath$arbolatesum / 10)
A nameID
identifier can also be provided to override the physical weight
when a "smaller" river has the same name. There is an optional override_factor
parameter that signifies if the physical weight is override_factor
times (e.g. 5) larger on an unnamed or differently named upstream path, the physical weight will be used in favor of the named ID.
As mentioned above, nhdplusTools
favors more general naming of to/from nodes then the NHD, so names are modified accordingly.
# Get levelpaths lp <- get_levelpaths( dplyr::select(fpath, ID = comid, toID = tocomid, nameID = gnis_id, weight = arbolatesum), status = FALSE, override_factor = 5) # Print head(fpath <- dplyr::left_join(fpath, lp, by = c("comid" = "ID"))) plot(fpath["topo_sort"], key.pos = NULL, reset = FALSE) plot(fpath["levelpath"], key.pos = NULL)
Note that the get_levelpaths
adds an outletID
signifying the overall network outlet ID (not topo sort / hydrosequence) by primary identifier.
Finally, let's visualize these advanced VAAs!
In the below animation, each newly added level path is shown in blue, with the outlet flowpath colored in red. Remembering that get_sorted()
sorts flowpaths such that headwaters come first and the terminal flowpaths last we invert the network so that level paths fill in from outlet to head waters. For clarity, only levelpaths with more than 2 flowlines are shown.
# Invert plotting order fpath <- dplyr::arrange(fpath, topo_sort) # Level Paths with more then 2 flowpaths lp <- dplyr::group_by(fpath, levelpath) %>% dplyr::filter(n() > 2) # Unique Level Path ID lp <- unique(lp$levelpath) # Terminal Flowpath terminal_fpath <- dplyr::filter(fpath, comid %in% terminalID) gif_file <- "levelpath.gif" gifski::save_gif({ for(i in 1:length(lp)) { lp_plot <- dplyr::filter(fpath, levelpath == lp[i]) outlet_plot <- dplyr::filter(lp_plot, comid %in% outletID) plot(sf::st_geometry(fpath), lwd = 0.5, col = "grey") plot(sf::st_geometry(terminal_fpath), lwd = 3, col = "red", add = TRUE) plot(sf::st_geometry(dplyr::filter(fpath, levelpath %in% lp[1:i])), add = TRUE) plot(sf::st_geometry(lp_plot), col = "blue", add = TRUE) plot(sf::st_geometry(outlet_plot), col = "red", lwd = 1.5, add = TRUE) } }, gif_file, delay = 0.5) knitr::include_graphics(gif_file)
This entire process of sorting the network and building hydrosequence, levelpath, and derivative variables is wrapped in the add_plus_network_attributes()
function to provide performance and simplicity. It supports paralellization and will print status updates in the case when the input network is very large. add_plus_network_attributes()
returns NHDPlus attribute names (truncated per shapefile rules as is done in the NHDPlus database).
The terminalpa
, levelpathi
, dnlevelpat
, and dnhydroseq
attributes are hydroseq
identifiers as is the convention in NHDPlus, not primary identifiers (comid
s) as is returned from the base functions demonstrated above.
As of 2/2022, this function does not support the up level path (uplevelpat
) or up hydrosequence (uphydroseq
) noted in "Other Derived Network Attributes".
head(add_plus_network_attributes(dplyr::select(fpath, comid, tocomid, lengthkm, areasqkm, nameID = gnis_id), status = TRUE))
options(oldoption) if(Sys.getenv("BUILD_VIGNETTES") != "TRUE") { unlink(work_dir, recursive = TRUE) }
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.