This study leverages the respective strengths of R (for data wrangling, statistics, and figure-making) and Python (for spatial analysis and mapping). As a result, reproducing it requires going back and forth between these two languages and platforms. At the broadest level, the main steps of this analysis were the following:
[1. Pre-processing and formatting global river network environmental attributes] in Python.
[2. Pre-processing and formatting spatial datasets aside from hydro-environmental attributes] in Python.
[3. R analysis] in R.
[4. Final formatting of analysis outputs for mapping] in Python.
[5. Map results and comparisons] in ArcMap.
Below, we briefly explain how each of these steps was implemented. However, additional data that are not currently available publicly are needed to fully reproduce the analysis. Please contact mathis.messager@mail.mcgill.ca and/or bernhard.lehner(at)mcgill.ca for additional information should you want to reproduce the results of this study. In addition, please note that processing these data takes weeks of continuous computing on a standard workstation.
Main purpose: download and compute additional hydro-environmental attributes for the global river network beyond those already included in RiverATLAS.
utility_functions.py:
- imports key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.
runUplandWeighting.py:
- defines functions for routing data along the river network.
Downloading data requires creating a file called "configs.json" containing login information for Earthdata and ALOS. For guidance on formatting the JSON configuration file, see here.
Execute:
1. scripts for downloading data in any order
2. format_MODISmosaic.py
3. format_HydroSHEDS.py
4. format_WorldClim2.py
5. other formatting scripts in any order
6. runUplandWeighting_batch.py
7. runHydroATLASStatistics.py
Main purpose: compile and pre-process the global river network; download and spatially pre-process streamflow gauging stations (reference data for model training and testing), national hydrographic datasets, and on-the-ground visual observations of flow intermittence.
utility_functions.py:
- imports key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.
setup_localIRformatting.py:
- defines the folder structure for formatting data to compare modeled estimates of global flow intermittence to national hydrographic datasets (Comparison_databases) and to in-situ/field-based observations of flow intermittence (Insitu_databases).
- defines functions used in formatting data for these comparisons.
Execute:
1. scripts for downloading data in any order
2. format_RiverATLAS.py
3. format_stations.py
4. format_FROndeEaudata.py
5. format_PNWdata.py
Main purpose: QA/QC streamflow gauging station records; develop and validate random forest models; compare predictions to hydrographic datasets and on-the-ground observations; generate tables and non-spatial figures; and generate tabular predictions for the global river network as well as spatial data for subsequent mapping.
The structure of the Github repository stems from the fact that this project is formatted as an R package, relies on drake for organizing the analysis workflow and on renv for dependency management, and includes all documents used for this workflowr website. All files and directories whose role is mainly structural (and whose content can therefore be ignored for the analysis) are marked with an X; directories are in bold, files are in italic.
After downloading the project from Github (see Getting started), launch the R project (globalIRmap.Rproj) and execute the following lines:
renv::restore() # respond y; restores all R packages with their specific versions
remotes::install_github('messamat/globalIRmap') # install the project package so that the help documentation can be accessed for project functions (e.g., ?format_gaugestats)
To check the steps of the analysis, see R/IRmapping_plan.R, which contains the drake "plan": the high-level catalog of all the steps in the workflow (see the corresponding chapter in the drake user manual). This plan defines the order in which functions are used, their inputs and outputs (usually, targets), and the relationships among targets and steps. The R object for a plan is a data.frame with columns named target and command; each row represents a step in the workflow. Each command is a concise expression that makes use of our functions, and each target is the return value of its command (the target column holds the names of the targets, not their values). In this analysis, functions from planutil_functions.R were used to create complex branches in the plan (e.g., re-running the entire model and result-formatting pipeline, but defining non-perennial watercourses as those that cease to flow at least 30 days per year rather than at least 1 day per year).
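For illustration, here is a minimal sketch of what a drake plan looks like in general; the target names, functions, and file paths below are hypothetical placeholders, not the actual contents of R/IRmapping_plan.R.

```r
library(drake)

# Hypothetical plan for illustration only: each row pairs a target name
# with the command (an R expression) that produces it.
example_plan <- drake_plan(
  gauge_data  = format_gauges(file_in("data/gauges.csv")),
  rf_model    = train_rf(gauge_data),
  predictions = predict_network(rf_model, file_in("data/riveratlas.csv"))
)

example_plan  # a data.frame with 'target' and 'command' columns
```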
To get information on a function, simply type ?function_name in the R console, for instance ?comp_GRDCqstats.
Provided that you were given the necessary data, the entire analysis can simply be re-run with the following code found in interactive.R:
library(drake)
r_make() # recreates the analysis
r_make() is the central, most important function of the drake approach. It runs all the steps of the workflow in the correct order, skipping any work that is already up to date. Because of how drake tracks global functions and objects as dependencies of targets, the use of r_make() is needed to run the analysis pipeline in a clean, reproducible environment. If all targets are up to date in the .drake/ directory, then nothing will be run.
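Before launching a full run, it can be useful to check what drake would actually rebuild. The helpers below are generic drake functions (not project-specific) and are shown only as an optional sketch:

```r
library(drake)

# Both helpers run in a clean background session, like r_make():
r_outdated()         # list targets that are missing or out of date
r_vis_drake_graph()  # interactive dependency graph of the plan
```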
If you were provided intermediate targets (i.e., a .drake/ directory), or once you have re-run the analysis, you can load individual targets into the environment with the following commands (even if the targets are not up to date due to, e.g., a change in a source path).
``` {r loadtarg, eval = FALSE}
loadd(globaltables_gad_id_cmj)
print(globaltables_gad_id_cmj)
tab <- readd(globaltables_gad_id_cmj)
print(tab)
```
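If you are unsure of the available target names, the drake cache can be queried directly; this is a generic drake helper, not a project-specific function:

```r
library(drake)

# List the targets currently stored in the .drake/ cache, e.g. to find
# names to pass to loadd() or readd():
cached()
```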
To reproduce the tables and (non-map) figures in the manuscript, run the following line in the R console:
rmarkdown::render('figtabres.Rmd', encoding = 'UTF-8')
Note that nearly all figures and tables were manually adjusted for aesthetic purposes prior to inclusion in the manuscript, so the rendered output of figtabres.Rmd will not exactly match the final versions in form (but it will match them in content).
The R analysis produces a suite of outputs necessary for subsequent mapping and analysis, notably:
- RiverATLAS_predbasic800_20210216.csv: comma-separated value table of model predictions for the RiverATLAS global river network.
- BasinATLAS_v10_lev03_errors.gpkg: polygons of BasinATLAS level 3 subdivisions with attributes. Used to produce Figure 3 in the manuscript main text and Figure S6 in the Supplementary Information.
- GRDCstations_predbasic800.gpkg and GRDCstations_predbasic800_mdur30.gpkg: points of gauging stations used in model development and validation, with reference attributes and model predictions. Used to produce Extended Data Fig. 2 and Supplementary Information Fig. S3.
- pnwobs_IPR_predbasic800cat%Y%m%d%H%M%S.shp and ondeobs_IPR_predbasic800cat%Y%m%d%H%M%S.shp: on-the-ground observations of flow intermittence from ONDE (France) and PROSPER (U.S. Pacific Northwest), with reference attributes and model predictions. Used to produce Extended Data Fig. 6.
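As an optional check, the exported network predictions can be inspected in R once the pipeline has completed. This is a sketch only; it assumes the CSV has been copied to the working directory and that the data.table package is installed (neither is required by the workflow itself).

```r
library(data.table)

# Sketch: quick inspection of the exported network predictions.
# The table covers the full RiverATLAS network, so fread() is
# preferable to read.csv() for speed.
preds <- fread("RiverATLAS_predbasic800_20210216.csv")
str(preds)   # column names and types
head(preds)  # preview the first rows
```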
Run predict_RF.py in the globalIRmap_py Github repository, which:
- formats global river network attributes for mapping;
- divides and exports the RiverATLAS global river network (with predictions) into subsets by discharge and drainage area size classes for mapping;
- joins ONDE (France) and PROSPER (U.S. Pacific Northwest) on-the-ground observations to the river network for mapping (Extended Data Fig. 6).
Please contact Mathis Messager for ArcGIS map packages to reproduce specific maps from the manuscript.