knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) devtools::load_all('.') VPU = "01" library(sf) library(glue) library(mapview) mapviewOptions(fgb = FALSE)
The first question generally raised is, "what is a "hydrofabric"? To date, the term has been a bit nebulous and has been used to describe artifacts as narrow as a set of cartographic lines, all the way to encompass the entire spatial data architecture needed to map and model the flow of water and flood extents.
Here, a "Ngen Hydrofabric" will include the following four parts:
The blue line (flowpaths) and per-flowline catchments feature
The flowline/catchment topologies
Attributes to support routing and runoff modeling
The software and data models to make the creation of these open, reproducible, and flexible.
In this breakdown, features define the computational elements for hydrology and hydraulic modeling; topologies link data together for space/time processing (modular elements that act as a whole); attributes provide the information for model execution (physics based, conceptual and ML/AI model formulations); and software and data models develop community standards, reproducibility, and flexibility to support analysis at scale.
knitr::include_graphics('img/what-is-a-hydrofabric.png')
Discritizing the land surface into computational elements is fundamental to all modeling tasks. Without it, distributed and lumped models have no way to apply the needed model formulations or computer science applications to achieve meaningful results. Therefore anyone who cares about the science and application of water resource modeling should care about the underlying data as it drives the locations where forecasts are made, the attributes that inform a model, and the spatial elements in which formulation are valid.
However, describing the earths surface - particularly at continental scales - is a tricky task. Automated techniques can get us a long way in representation, however the modeling task at hand and local knowledge should be used in developing an authoritative product. Through time, local knowledge has been collected in a number of places, but never centralized. Further, one off products (like the NHDPlus) have been used to guide all modeling task even in cases when its resolution, or representation is not well suited.
The aim of NOAAs work in this space is to develop a federal reference fabric to support all flavors of modeling, and a national instance of that reference fabric to support heterogeneous model application.
Equally important is the software tools to support flexibility and community uptake; the data models to support interoperability, community engagement, and long term stability; and a reference data set with the quality assurances that when one uses the product they are getting a well vetted resource that will be able to play nicely with the growing Ngen framework.
Ngen aims to provide a framework in which heterogeneous models can be used to achieve the best possible results in the highest number of places. In the framework the hydrofabric can is a malleable product that can be modified to support specific applications. However, the starting point for all variations of this product should stem from a consistent, quality reference dataset.
The central role of a reference dataset cannot be overstated as it not only allows people to avoid replication (and possible error) but also provides a source in which all variations of the data can be linked together.
For example - while there might be specific hydrofabric traits needed for runoff modeling, hydraulic routing, and inundation mapping - if all of these groups built their own product from scratch, the ability to link them together to provide actionable information would take a a number of conflation processes that may or may not be easy, let alone feasible.
Instead, if all groups were to start from a common reference system - and track the origins of that system - data conflation is simple meaning time can be spent on science rather then brute force data integration.
As part of NOAA's efforts to build Ngen, this reference fabric is being built with partners at the USGS, the private sector and the Internet of Water.
In addition to the reference fabric, software that uses this reference to support scale dependent modeling tasks are in development. The foundational layer of the Ngen model instance used as the National Water Model will be based on an authoritative instance of a hydrofabric that will dictate where forecasts are made, where model formulations are run, how the model engine passes data through time and space, and how model as a service visualizes, subsets, and disseminates data.
So, can you build your own? For now all of these products are under heavy development but are not close to final. The anticipated goal for finishing these products in a beta release are fiscal year 2023. When this time comes, you will be able to both generate your own modelling task based data product using the NOAA reference fabric and software tools, and/or utilize subsets of the fabric used for the operational modeling task.
In either case, building or finding a Ngen ready hydrofabric requires the hyAggregate
R package that can be installed from Github:
remotes::install_github("NOAA-OWP/hyAggregate")
The goal of hyAggregate
is to find and/or develop aggregated hydrologic and hydraulic networks to inform modeling tasks reliant on uniform length (flowpath) and area (catchment) distributions.
In this document we illustrate the tool set for working with the established reference fabric to create Ngen ready data products. We focus on VPU 01 (east coast USA), as it is the most heavily tested. The workflow described here will work on any reference fabric artifact once they are produced, validated, and published to ScienceBase.
The following steps walk you through the concepts and tools for building a Ngen ready dataset, what the outputs look like, and how you might interact with them. hyAggregate
is part of a larger family of projects and packages aiming to support federal (USGS/NOAA) water modeling efforts. The whole suite of development tools can be installed with:
remotes::install_github("NOAA-OWP/hydrofabric")
Attaching this library, similar to the tidyverse
, installs and loads a canon of software designed to manipulate, modify, describe, process, and quantify hydrologic networks and land surface attributes:
library(hydrofabric)
It includes the following:
| Repo | Purpose | | ------------- |:-------------:| | USGS-R/nhdplusTools | Tools for for network manipulation | | dblodgett-usgs/hyRefactor | Tools for network refactoring | | NOAA-OWP/hyAggregate | Tools for distribution based network aggregation and Ngen file creation | | mikejohnson51/opendap.catalog | Tools for for accessing remote data resources for parameter and attributes estimation | | NOAA-OWP/zonal | Tools for rapid catchment parameter summarization |
hyAggregate
relies on data products within the Geospatial Fabric for National Hydrologic Modeling, version 2.0 project, the general outline of which, can be seen below:
knitr::include_graphics("img/roadmap.png")
In the first row of Figure 2, there are three (3) reference products.
An updated network attributes table that provides attributes for the network features in the data model of NHDPlusV2, but with substantial improvements based on contributions from the USGS, NOAA OWP, NCAR and others.
A set of reference catchment geometries in which geometric errors and artifacts in the NHDPlus CatchmentSP layer are corrected/removed.
A set of reference flowline geometries where headwater flowlines have been replaced with the NHDPlus Burn lines
The CONUS reference files for these datasets can be downloaded here respectively (attributes, catchments, flowpaths)
In the second row of Figure 2, the reference products are refactored based on a minimum flowpath criterion. This process is facilitated by the hyRefactor
^[Blodgett, D., Johnson, J.M., 2022, hyRefactor: Tools for Manipulating the NHDPlus Network in Preparation for Hydrologic Modeling] R package. The concept of refactoring includes (1) Splitting large or long catchments in the reference data to create a more uniform catchment size distribution and (2) collapsing catchment topology by removing very small inter and intra confluence segments, and to merge very small headwaters. The goal of refactoring is to NOT reduce the fidelity of the network, but instead to move the network to a more uniform/coherent version of the network.
As with all function from here on out, refactoring is a parameter-based workflow and the selection parameters will impact the resulting network. For the National refactored fabric, the following were selected:
| Parameter | Purpose | Elected Value | | ------------- |:-------------:| -----:| | split_flines_meters | the maximum length flowpath desired in the output. | 10,000| | collapse_flines_meters | the minimum length of inter-confluence flowpath desired in the output. | 1,000 | | collapse_flines_main_meters | the minimum length of between-confluence flowpaths.| 1,000 |
The refactored output is shared under the Refactored Parent item of the above ScienceBase resource (available here)
::: {#hello .blackbox .left} NOTE:
There is a difference between the use of flowline and flowpath. Per the hy_features standard, a catchment can have one primary flowpath (1:1) but multiple flowlines (1:many).
There is also a difference in the concept of a catchment, which is the holistic unit of hydrology, and a catchment divide which is the edge bounded by an inflown and outflow node
For the development of the reference fabric and modeling task outputs, we seek to define a set of divides and corresponding flowpaths from the reference flowlines and catchments.
:::
All reference, refactored and Ngen hydrofabric archives live on ScienceBase. They can be accessed with the web interface or can be downloaded programatically.
The hyAggregate::get_reference_fabric()
utility will download the most current geofabric for a Vector Processing Unit (VPU). Options include downloading the "refactored" (default) or "reference" data. If the requested file already exists the file path will be returned. Here we can find the local path to the reference fabric for VPU=01 in the ./data
directory, and explore the layers contained within:
VPU = "01" ref_file = get_reference_fabric(VPU = VPU, type = "reference", dir = "data") st_layers(ref_file)
Within the reference fabric artifacts - the reference_flowline
and reference_catchment
layers are those associated with the reference data (row 1 in figure 2). (NOTE: In some artifacts, these are called nhd_flowlines
and nhd_catchments
per this issue)
We can also request the data products associated with the national refactor:
file = get_reference_fabric(VPU = VPU, type = "refactored", dir = "data") st_layers(file)
Within the refactored fabric artifacts - the refactored_flowpaths
and refactored_divides
layers are the output of the refactoring process (row 2 in figure 2). The remaining layers are those central to the USGS gfv2.0
modeling task (row 3 in figure 2). (NOTE: In some artifacts, these are called reconciled
and divides
per this issue)
The NOAA Next Generation Water Resource Modeling Framework (Ngen) is a specific modeling task. hyAggregate
houses the workflow(s) for generating the needed output for the NOAA NextGen
modeling task, that starts from the refactored_flowpaths
and refactored_divides
layers.
It operates under the assumption that there is no "one model to rule them all" and that different model formulations (e.g. topmodel, wrf-hydro, LSTM) will work better in different locations. Formulations are then modules that can execute at the scale of the catchment. The aim of the hydrofabric is to encapsulate the higher level notion of a catchment (see the HY_features specification) such that a variety of hydraulic and hydrologic models can execute and exchange information.
knitr::include_graphics("img/right-tool-right-place.png")
Because of the notion is to model the runoff process in a heterogeneous way, there is an interest in trying to align the scale of the catchment artifacts to the scale of the hydrologic processes being simulated. The scale identified for this initial run is the 3 - 15 square kilometers, with an ideal size of 10 sqkm.
As seen below, this is not even close to the area/length distributions found in either the reference or refactored artifacts.
library(dplyr) library(ggplot2) reference = read_sf(ref_file, "reference_flowline") |> st_drop_geometry() |> select(areasqkm = AreaSqKM, lengthkm = LENGTHKM) refactored_length = read_sf(file, "refactored_flowpaths") |> st_drop_geometry() |> select(ID, lengthkm = LENGTHKM) refactored = read_sf(file, "refactored_divides") |> st_drop_geometry() |> select(ID, areasqkm) |> left_join(refactored_length, by = 'ID') ggplot() + geom_density(data = reference, aes(x = areasqkm), color="#69b3a2", size = 2) + geom_density(data = refactored, aes(x = areasqkm), color= "#404080", size = 2) + xlim(0,25) + ylim(0,.6) + geom_vline(xintercept = 3) + geom_vline(xintercept = 10, size = 1) + geom_vline(xintercept = 15) + theme_light() + labs(x = expression("Area (km²)"), title = "Area Distribution", subtitle = paste0(sum(reference$areasqkm >= 25), ' catchments removed from reference (>25 km²)\n', sum(refactored$areasqkm >= 25), ' divides removed from refactored (>25 km²)')) + annotate("rect", xmin = 3, xmax = 15, ymin = 0, ymax = .6, alpha = .1) + scale_y_continuous("Density", expand = c(0,0)) + geom_label( aes(x=10, y=0.5, label= "Idealized Range"), color="black") + geom_label( aes(x=4, y=0.4, label=paste(nrow(reference), "\nreference catchments")), color="#69b3a2", fill = "white") + geom_label( aes(x=6.5, y=0.2, label= paste(nrow(refactored), "\nrefactored divides")), color="#404080") ### ggplot() + geom_density(data = reference, aes(x = lengthkm), color="#69b3a2", size = 2) + geom_density(data = refactored, aes(x = lengthkm), color= "#404080", size = 2) + xlim(0,10) + ylim(0,.75) + geom_vline(xintercept = 1, size = 1) + theme_light() + labs(x = "Length (km)", y = "Density", title = "Length Distribution", subtitle = paste0(sum(reference$lengthkm >= 10), ' flowlines from reference (>10 km)\n', sum(refactored$lengthkm >= 10), ' flowpaths removed from reference (>10 km)')) + geom_label( aes(x=1, y=0.75, label= "Minimum Length"), color="black") + geom_label( aes(x=2, y=0.5, label=paste(nrow(reference), "\nreference flowlines")), color="#69b3a2", fill = "white") + geom_label( aes(x=5, y=0.2, label= paste(nrow(refactored), "\nrefactored flowpaths")), color="#404080")
To overcome the mismatched scale in the reference/reference fabric and the Ngen modelling task, we need to aggregate the catchments to a defined user threshold/distribution. Therefore, Ngen is a distribution based modeling task that seeks to align the scale of process to scale of catchment representation.
hyAggregate
allows users to prescribe an ideal catchment size, a minimum catchment area, and a minimum flowpath length. The default parameters used for Ngen are an ideal catchment size of 10 km², a minimum catchment size of 3 km², and a minimum flowpath length of 1 km. A refactored_fabric
can be aggregated to this distribution using hyAggregate
:
ngen_v1 = aggregate_network_to_distribution(gf = get_reference_fabric(VPU = VPU, dir = "data"), outfile = glue("data/01_ngen_{VPU}.gpkg"), nexus_topology = FALSE) st_layers(ngen_v1)
You'll see that aggregating the network to this scale requires losing resolution of the network. In its current iteration Ngen requires a 1:1 flowpath to catchment relationship. In time the ability to support 1:many relationship will be developed so that the flowpath and/or catchment network can be densified in areas where it is needed. Until then, the 1:1 requirement mean that for each catchment a primary mainstem must be elected, and it is this mainstem only that is reflected in the final aggregated product.
The outputs of the base hyAggregate
function create a geopackage with a set of catchment and flowpath features. Below we see the desired distribution has been enforced on the resulting hydrofabric. A more complete discussion of what happens within the aggregation can be found in the code and will be documented latter on.
agg = read_sf(ngen_v1, "aggregate_flowpaths") |> st_drop_geometry() |> select(areasqkm, lengthkm) ggplot() + geom_density(data = reference, aes(x = areasqkm), color="#69b3a2", size = 2) + geom_density(data = refactored, aes(x = areasqkm), color= "#404080", size = 2) + geom_density(data = agg, aes(x = areasqkm), color= "red", size = 2) + xlim(0,25) + ylim(0,.6) + geom_vline(xintercept = 3) + geom_vline(xintercept = 10, size = 1) + geom_vline(xintercept = 15) + theme_light() + labs(x = expression("Area (km²)"), title = "Area Distribution", subtitle = paste0(sum(reference$areasqkm >= 25), ' removed from reference (>25 km²)\n', sum(refactored$areasqkm >= 25), ' removed from refactored (>25 km²)\n', sum(agg$areasqkm >= 25), ' removed from ngen (>25 km²)\n')) + annotate("rect", xmin = 3, xmax = 15, ymin = 0, ymax = .6, alpha = .1) + scale_y_continuous("Density", expand = c(0,0)) + geom_label( aes(x=10, y=0.5, label= "Idealized Range"), color="black") + geom_label( aes(x=4, y=0.4, label=paste(nrow(reference), "\nreference catchments")), color="#69b3a2", fill = "white") + geom_label( aes(x=6.5, y=0.2, label= paste(nrow(refactored), "\nrefactored divides")), color="#404080") + geom_label( aes(x=15, y=0.1, label= paste(nrow(agg), "\naggregate divides")), color="red") ggplot() + geom_density(data = reference, aes(x = lengthkm), color="#69b3a2", size = 2) + geom_density(data = refactored, aes(x = lengthkm), color= "#404080", size = 2) + geom_density(data = agg, aes(x = lengthkm), color= "red", size = 2) + xlim(0,10) + ylim(0,.75) + geom_vline(xintercept = 1, size = 1) + theme_light() + labs(x = expression("Length (km)"), title = "Length Distribution", subtitle = paste0(sum(reference$lengthkm >= 10), ' removed from reference (>10 km)\n', sum(reference$lengthkm >= 10), ' removed from refactored (>10 km)')) + geom_label( aes(x=1, y=0.75, label= "Minimum Length"), color="black") + geom_label( aes(x=2, y=0.6, label=paste(nrow(reference), "\nreference flowlines")), color="#69b3a2", fill = "white") + geom_label( aes(x=4, y=0.35, label= paste(nrow(refactored), "\nrefactored flowpaths")), color="#404080") + geom_label( aes(x=6, y=0.2, label= paste(nrow(agg), "\naggregate flowpaths")), color="red")
Below is a spatial view of this process. The gray catchments with white outlines are the "refactored_divides" while the yellow lines are the "refactored_flowpaths". The red, hollow catchments are the "aggregated_divides" while the blue lines are the "aggregated_flowpaths". This image hopefully illustrates the level of manipulation that occurs in network as a result of the requested parameters. When viewing this image the white visible edges are dissolved and the yellow flowpaths are dropped. In cases where catchments are dissolved, the associated flowpaths and topologies are modified to reflect the changes.
knitr::include_graphics("img/example.png")
All core hydrofabric data is distributed as a geopackage segmented by vector processing unit. Geopackages are spatial databases that can be used with many software suites, as well as programming languages that call GDAL (e.g. R, python, Rust). Once read in, the data layers can be mapped with base products, packages (In R, sf, ggplot2, leaflet, mapview -- In python fiona, geopandas, geoplot, matlibplot, etc), and GUI's (QGIS).
In R, here is are a few basic examples:
agg = read_sf(ngen_v1, "aggregate_flowpaths") plot(agg$geom)
#Define area and project to same Coordinate Reference System AOI = AOI::aoi_get("Boston") |> st_transform(st_crs(agg)) boston_flow = st_intersection(agg, AOI) plot(boston_flow$geom) plot(AOI$geometry, add = TRUE)
mapview::mapview(boston_flow)
In QGIS, double clicking the gpkg file will allow you to select which layers to load.
The flowpath/divides produced in the aggregation provide the features that discritize the landscape and river network into the computational elements that will be used. The next construct is that we need a description of how these features are connected and which
To date, the NHDPlus topology (and many other flow network models) rely on flowpath-to-flowpath connectivity that describes the network in terms of how flowpaths connect to other flowpaths (therefore, which divides connect to other divides). The reference and refactored products retain this 'flowline-to-flowline' connectivity as does the ngen_aggregation when nexus_topology=FALSE
.
topology = read_sf(ngen_v1, 'aggregate_flowpaths') |> st_drop_geometry() |> select(id, toid) head(topology)
# Get the first id/toid relation: example = read_sf(ngen_v1, 'aggregate_flowpaths') |> filter(id %in% topology[1,]) agg_cat = read_sf(ngen_v1, "aggregate_divides") |> filter(id %in% topology[1,]) ggplot(data = example, aes(color = as.factor(id))) + geom_sf(size = 4) + labs(title = "Row 1 of the Topology Table:", subtitle = glue("Flowpath {topology$id} flows to Flowpath {topology$toid}"), color = "ID") + theme_light() + theme(legend.position = "bottom")
Ngen is moving away from 'flowline-to-flowline' model towards 'area-to-point' model. Here the idea of 'area' is wrapped up into the holistic idea of a HY_feature catchment and 'point' is captured by the conceptual idea of a hydro nexus.
::: {#hello .blackbox .left} hydro(-logic) Catchment:
A physiographic unit where hydrologic processes take place. This class denotes a physiographic unit, which is defined by a hydrologically determined outlet to which all waters flow. While a catchment exists, it may or may not be clearly identified for repeated study. This approach considers the catchment concept to be the basic unit of study in hydrology, water resources management, and environmental reporting. The approach is meant to be holistic, referring to the continuous interaction of surface and subsurface waters within a catchment, even if a particular representation of the catchment refers to only surface or subsurface aspects of the catchment.
Conceptual outlet for water contained by a catchment. The hydro nexus concept represents the place where a catchment interacts with another catchment. Every catchment flows to a hydro nexus, conversely every location in a hydrologic system can be thought of as a hydro nexus that drains some catchment. Similar to catchments, hydro nexuses can be realized in several hydrology-specific ways.
If a given hydro nexus does not have a known hydrology-specific realization or is undetermined, it is termed 'nillable' in this standard. For example, a hydro nexus exists in the form of flow to the subsurface or atmosphere but may be undetermined and unrepresented within implementations focused on surface water hydrology and would not be included or referenced.
:::
We are doing this to be more compliant with the HY_Features conceptual model for representing surface water components and to ultimately help develop the community data model needed for implementing the hy_feature concepts. This type of flow model was needed due to the scale of aggregation Ngen requires, and the
The long term advantage of this approach is the ability to use multi-scale networks, leverage community efforts in hydrologic addressing and representation, and to have a more defined computational element for Ngen comparability.
In practice, this yields a scenario like below, where catchments may not flow to the coincident catchment but sometimes to the downstream nexus. This is highlighted where 'cat-4' not longer flows to 'cat-3' but instead to 'nex-2'. At the same time, 'cat-3' is also flowing to 'nex-2' suggesting two areas (holistic catchments) are flowing to one point (a hydro nexus).
knitr::include_graphics("img/nexus-topology-example.png")
There are two principle types of nexus locations, those that are implied by the network, and those that occur at locations of interest (hydrolocations). So far, Ngen only works with the first. To generate the 'nexus_topology' (opposed to flowline) and materialize the implied nexus locations, we can set the nexus_topology=TRUE
and rerun the aggregation code.
ngen_v2 = aggregate_network_to_distribution( gf = get_reference_fabric(VPU = VPU, dir = "data"), outfile = glue("data/02_ngen_{VPU}.gpkg")) st_layers(ngen_v2)
This option produces a non-spatial "flowpath_edge_list" table layer within the geopackage. Opening this table highlights the area to point topology where the area (holistic catchment) is realized as the waterbody ('wb-') of the flowpath.
head(read_sf(ngen_v2, 'flowpath_edge_list'))
The choice to use the waterbody realization, is tied to the need for network topology to support hydrologic and hydraulic routing.
The core of the Ngen modeling system uses t-route for hydrologic and hydraulic routing. t-route
was designed to support the existing NWM and as such relies on routing parameters defined in the NWM Routelink file.
The get_routelink_path()
function will search your hyAggregate
R directory for a fst representation of the most current routelink file. If this file does not exist, then the code will (1) Check for the latest version of the nwm domain files on ncep. (2) Download the NetCDF file (3) extract it into a fst file shown below, and (4) save it to the specified directory.
path = get_routelink_path() fst::metadata_fst(path)
By specifying a routelink_path
, aggregate_network_to_distribution
knows to develop length-averaged routelink parameters and associations (gages and NHDWaterbodies). The length averaging is done by determining the proportion of each reference flowline (COMID) that falls within each aggregated catchment. The associated values are then weighted by the relative length:
ngen_v3 = aggregate_network_to_distribution( get_reference_fabric(VPU = VPU, dir = "data"), outfile = glue("data/03_ngen_{VPU}.gpkg"), nexus_topology = TRUE, routelink_path = get_routelink_path())
Lets peak at the developed layers and the contents of the new flowpath_attributes table:
st_layers(ngen_v3) head(read_sf(ngen_v3, 'flowpath_attributes'))
While routing parameters and topologies are critical for modeling the flow of water, runoff
formulations are critical for simulating how much water will be in the channel.
The capacity for Ngen to encapsulate the idea of a catchment and implement a suite of different model formulations requires that critical attribute information is available to inform the wide range of models.
Since Ngen runs runoff models at the catchment scale, and not at the gridded scale (at least as of now), there is a need to summarize gridded information to the catchment scale in a fast, flexible, and performant way.
This requires the ability to identify and then quickly and accurately summarizes raster values over polygonal areas including partially covered grid cells. Once the divide/raster cell relationships are established, we must define the way in which those values are summarized (collapsed). For example, do we want the "max temperature per catchment" or the "mean temperature per catchment"?
For more information on how this is accomplished please see the zonal package which has produced the ability to summarize decades of data, for areas as large as the east coast, in minutes on a personal laptop (this package is included in the hydrofabric
meta package).
For the initial Ngen modules (cfe and NOAH-OWP-modular) the following variables from the NWM gridded data are summarized using the following functions:
| Tables | Description | Layer(s) | Summary Function | Source | | ------------- |:-------------:| -----:| -----:| -----:| | bexp | Beta Parameter| 4 | mode | soilproperties_CONUS_FullRouting.nc | | IVGTYP | Dominant category | 1 | mode | wrfinput_CONUS.nc | | ISLTYP | Dominant category | 1 | mode | wrfinput_CONUS.nc | | dksat | Saturated Soil Connectivity | 4 | geometric mean | soilproperties_CONUS_FullRouting.nc | | psisat | Saturated soil matric potential | 4 | geometric mean | soilproperties_CONUS_FullRouting.nc | | slope | Slope Index | 1 | mean | soilproperties_CONUS_FullRouting.nc | | smcmax | Saturated value of soil moisture [volumetric] | 4 | mean | soilproperties_CONUS_FullRouting.nc | | smcwlt | Wilting point soil moisture [volumetric] | 4 | mean | soilproperties_CONUS_FullRouting.nc | | refkdt | Parameter in the surface runoff parameterization | 1 | mean | soilproperties_CONUS_FullRouting.nc | | cwpvt | Empirical canopy wind parameter | 1 | mean | soilproperties_CONUS_FullRouting.nc | | vcmx25 | Maximum rate of carboxylation at 25 C [ umol CO2/m2/s] | 1 | mean | soilproperties_CONUS_FullRouting.nc | | mp | Slope of Conductance to photosynthesis relationship | 1 | mean | soilproperties_CONUS_FullRouting.nc | | mfsno | Snowmelt m parameter | 1 | mean | soilproperties_CONUS_FullRouting.nc | | Coef | Coefficient | 1 | mean | GWBUCKPARM_CONUS_FullRouting.nc | | Zmax | Zmax | 1 | mean | GWBUCKPARM_CONUS_FullRouting.nc | | Expon | Exponent | 1 | mode | GWBUCKPARM_CONUS_FullRouting.nc |
For convenience, these layers have been combined and stored as a single NetCDF on ScienceBase. The get
get_nwm_grids
function in hyAggregate will either return a file.path
if it exists in the provided directory; download the cached file from ScienceBase (if cache_download = TRUE); or, build it from the files available on NCEP.
This summarization is wrapped in the function aggregate_cfe_noahowp
, but can be reworked using zonal
if more variables, or different summary functions are desired:
ngen_v4 = glue("data/04_ngen_{VPU}.gpkg") #file.copy(ngen_v3, ngen_v4)
ngen_v4 = aggregate_cfe_noahowp(gpkg = ngen_v4, dir = 'data', add_to_gpkg = TRUE)
st_layers(ngen_v4) dplyr::glimpse(read_sf(ngen_v4, 'cfe_noahowp_attributes')) tmp = left_join(read_sf(ngen_v4, "aggregate_divides"), read_sf(ngen_v4, "cfe_noahowp_attributes"), by = "id") ggplot() + geom_sf(data = tmp, aes(fill = gw_Zmax), color = NA) + scale_fill_viridis_c()
Soon we will be producing the data, weight grids, and routines to produce and validate all attributes from CAMELS for a given, arbutrary Nextgen catchment. This will aim to support machine learning models and further extend the parameter space that model formulation have access too. When complete the weight grids and parameters will be added to the released geopackages.
National aggregates will also be produced as parquet files to support more optimal cloud based data access and visualization routines.
It is worth highlighting that the geopackage is a critical file format for sharing this type of data because it is (1) self containing (2) compact and (3) language agnostic. There are two broad paths to accessing the data. The first is a GDAL based approach (in R this is using sf, but in python Fiona or geopandas works well).
# Read Options pacman::p_load(sf, DBI, RSQLite, dplyr) ### 1. GDAL/geopackage st_layers(ngen_v4) nex = read_sf(ngen_v4, "nexus") head(nex)
The second is as a SQLite database (in R, I am doing this with RSQLite/DBI/dplyr):
### 2. SQLite/Database db <- dbConnect(SQLite(), ngen_v4) dbListTables(db) fpa = collect(tbl(db, "flowpath_attributes")) head(fpa) dbDisconnect(db)
While the geopackage serves as a concise, interoperaple, and efficient format for distributing packaged data that is tabular and spatial it does not meet everyone's need. In particular ngen operates using JSON/geoJSON files. Similarly the ESRI family of tools (e.g. ArcMap/ArcPro/ArcGIS Online) cannot read geopackges.
As such, we offer a final function that generates ngen files, and optionally shapefiles, from a final GPKG.
write_ngen_dir(ngen_v4, export_shapefiles = TRUE)
This function creates a directory at the same level and named as the GPKG and outputs the following files:
fs::dir_tree(glue("data/04_ngen_{VPU}"))
If you have made it this far - thanks! A small detail you might notice is that in the data/ngen_01
directory created from the GPKG, there is a new file called crosswalk-mapping.json
.
This file is critical for data disambiguation, mapping the output files to the original NHD comids (NWM forecasts locations), to finding NWIS stream gages, and to linking maistem IDs to the PIDs within the Internet of Water knowledge graph. Looking in this file we find a nested list object:
library(jsonlite) cw = fromJSON('data/04_ngen_01/crosswalk-mapping.json') length(cw)
In this crosswalk, all features have an outlet comid, member comids (when aggregated) and a Mainstem Identifier. Remember COMIDs are the "common identifier in the NHDPlus and were carried into the reference fabric:
cw$`wb-16354`
Some, have reference to a USGS gage if it is located along the flowpath:
# Feature gage cw$`wb-22117`
Bty providing the outlet of each aggregated feature, we are able to extract stream flow from the 42 year retrospective run of the NWM, this is advantageous when trying to compare output from Nextgen experiments to the current state of the art. nwmTools
allows you to subset the reanalysis product by time and NHDPlus COMID.
library(ggplot2) nwm = nwmTools::readNWMdata(comid = cw$`wb-17307`$outlet_comid) ggplot() + geom_line(data = nwm, aes(x = dateTime, y = flow_cms_v2.1)) + theme_light() + labs(x = 'DateTime', y = "Q (cms)", title = "NWM v2.1 Q at the outlet of wb-17307")
The cross walk also reports the collection of "member_comids" that have been aggregated into a single feature. Often there is an interest in extracting these original features which is possible with the nhdplusTools get_* family:
data = nhdplusTools::get_nhdplus(comid = cw$`wb-16354`$member_comids) plot(data['comid'])
For now, we are only indexing NWIS stations based on there COMID mapping in the NWM routelink file. As we make more progress, the crosswalk file will contain references to the POIs created as part of the reference fabric seen below.
catchments = sf::read_sf(ngen_v4, "aggregate_divides") pois = sf::read_sf(ref_file, 'POIs') grep("Type_", names(pois), value = TRUE) mapview(catchments) + pois
The indexing of these locations will not only provide more opportunity for modeling tasks and more realistic network aggregation, but it will allow for the inclusion of explicit and implicit nexus selection in the Ngen hydrofabric.
So the last step here is a non-function-ized way to use these files in conjunction to subset the network based on the flowpaths that flow to a USGS gage.
In the above code chunk, we saw that wb-16354
was associated with NWIS:01172010
. To find the network upstream of that gage, we can use the topology (edge list) and the determined identifier to 'sort' the network. The returned results is a sorted set of IDs upstream of the provided outlet. These IDs can then be used to extract the flowpath or divide network.
library(nhdplusTools) system.time({ trace = get_sorted(read_sf(ngen_v4, 'flowpath_edge_list'), split = TRUE, outlets = 'wb-16354') flowpaths = filter(read_sf(ngen_v4, 'aggregate_flowpaths'), id %in% trace$id) }) plot(st_geometry(flowpaths), main = paste(nrow(flowpaths), "flowpaths upstream of 'wb-16354'"))
The hyAggregate
repo has a "help wanted" tag that can be applied to Issues. I am hopeful the communitee can engage there, ask questions, point out all the things that we have wrong (I likely own all the mistakes r emo::ji("smile")
). Otherwise please email me at [mike.johnson@noaa.gov](mike.johnson@noaa.gov].
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.