Changes to NWIS QW services"

library(knitr)
library(dataRetrieval)
library(dplyr)

options(continue = " ")

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  fig.height = 7,
  fig.width = 7
)

IMPORTANT These recommendations have been updated using WQX3.0 profiles.

Changes to NWIS water quality services

USGS discrete water samples data are undergoing modernization, and NWIS services will no longer be updated with the latest data starting mid-March 2024, with a full decommission expected 6 months later.

For the latest news on USGS water quality data, see: https://doi-usgs.github.io/dataRetrieval/articles/Status.html. Learn more about the changes and where to find the new samples data in the WDFN blog.

What does this mean for dataRetrieval users? Eventually water quality data will ONLY be available from the Water Quality Portal rather than the NWIS services. There are 3 major dataRetrieval functions that will be affected: readNWISqw, whatNWISdata, and readNWISdata. This vignette will describe the most common workflows conversions needed to update existing scripts.

How to find more help

This vignette is being provided in advance of any breaking changes, and more information and guidance will be provided. These changes are big, and initially sound overwhelming. But in the end, they are thoughtful changes that will make understanding USGS water data more intuitive. Please reach out for questions and comments to: gs-w-IOW_PO_team@usgs.gov

WQX2 -> WQX3

If you have already converted your workflow from NWIS to WQP, much of the hard work has been done! The final step of the process is to make sure workflows are using the most modern WQX3 formats. WQX stands for Water Quality Exchange. WQX2 is the format that has been historically available on the Water Quality Portal, WQX3 is the more modern format that all the data is being converted to.

This is available as of dataRetrieval version 2.7.16. Starting with 2.7.16, all WQP functions default to the newer "WQX3" profiles if available. See WQX Conversions below to get a table of column names in WQX3 vs WQX2.

NWIS -> WQX3

readNWISqw

This function was retired as of Oct. 24, 2024.

So...what do you use instead? The function you will need to move to is readWQPqw. First, you'll need to convert the numeric USGS site ID's into something that the Water Quality Portal will accept, which requires the agency prefix. For most USGS sites this will mean pasting 'USGS-' before the site number, although it is important to note that there are some USGS sites that begin with a different prefix: it is up to users to determine the agency code..

Here's an example:

wqpData <- readWQPqw(paste0("USGS-", site_ids), parameterCd)

Let's say we have a data frame that we got from the retired readNWISqw function and we saved it as nwisData.

nwisData <- readRDS("nwisData.rds")
wqpData <- readRDS("wqpData.rds")

First we compare the number of rows, number of columns, and attributes to each return:

nrow(nwisData)
nrow(wqpData)

So, same number of rows returned. That's good, since it's the same data.

ncol(nwisData)
ncol(wqpData)

Different columns!

names(attributes(nwisData))
names(attributes(wqpData))

Slightly different attributes. You can explore the differences of those attributes:

site_NWIS <- attr(nwisData, "siteInfo")
site_WQP <- attr(wqpData, "siteInfo")

The next big task is figuring out which columns from the WQP output map to the original columns from the NWIS output. Look at your workflow and determine what columns from the original NWIS output are needed to preserve the integrity of the workflow.

Let's use the dplyr package to pull out the columns are used in this example workflow, and make sure both NWIS and WQP are ordered in the same way.

library(dplyr)

nwisData_relevant <- nwisData |> 
  select(
    site_no, startDateTime, parm_cd,
    remark_cd, result_va
  ) |> 
  arrange(startDateTime, parm_cd)

knitr::kable(head(nwisData_relevant))

If we explore the output from WQP, we can try to find the columns that include the same relevant information:

wqpData_relevant <- wqpData |> 
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    remark_cd = Result_ResultDetectionCondition,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA
  ) |> 
  arrange(startDateTime, parm_cd)
knitr::kable(head(wqpData_relevant))

Now we can start looking at the results and trying to decide how future workflows should be setup. Here are some decisions for this example that we can consider:

Censored values

The result_va in the NWIS service came back with a value. However, the data is actually censored, meaning we only know it's below the detection limit. With some lazier coding, it might have been really easy to not realize these values are left-censored. So, while we could substitute the detection levels into the measured values if there's an NA in the measured value, this might be a great time to update your workflow to handle censored values more robustly. We are probably interested in maintaining the detection level in another column.

For this theoretical workflow, let's think about what we are trying to find out. Let's say that we want to know if a value is "left-censored" or not. Maybe in this case, what would make the most sense is to have a column that is a logical TRUE/FALSE. For this example, there was only the text "Not Detected" in the "ResultDetectionConditionText" column PLEASE NOTE that other data may include different messages about detection conditions, you will need to examine your data carefully. Here's an example from the EGRET package on how to decide if a "ResultDetectionConditionText" should be considered a censored value:

censored_text <- c(
  "Not Detected",
  "Non-Detect",
  "Non Detect",
  "Detected Not Quantified",
  "Below Quantification Limit"
)

wqpData_relevant <- wqpData |> 
  mutate(left_censored = grepl(paste(censored_text, collapse = "|"),
    Result_ResultDetectionCondition,
    ignore.case = TRUE
  )) |> 
  select(
    site_no = Location_Identifier,
    startDateTime = Activity_StartDateTime,
    parm_cd = USGSpcode,
    left_censored,
    result_va = Result_Measure,
    detection_level = DetectionLimit_MeasureA,
    dl_units = DetectionLimit_MeasureUnitA
  ) |> 
  arrange(startDateTime, parm_cd)

knitr::kable(head(wqpData_relevant))

NWIS codes

Another difference that is going to require some thoughtful decisions is how to interpret additional NWIS codes. They will now be descriptive text. Columns such as samp_type_cd and medium_cd will now all be reported with descriptive words rather than single letters or numbers. It will be the responsibility of the user to consider the best way to deal with these changes.

If you use the readNWISqw function, you WILL need to adjust your workflows, and you may find there are more codes you will need to account for. Hopefully this section helped get you started. It does not include every scenario, so you may find more columns or codes or other conditions you need to account for.

whatNWISdata

This function will continue to work for any service EXCEPT "qw" (water quality discrete data). "qw" results will eventually no longer be returned, and are currently showing values that were frozen in March 2024.

The function to replace this functionality for discrete water quality data is currently whatWQPdata:

whatNWIS <- whatNWISdata(
  siteNumber = site_ids,
  service = "qw"
)
WARNING: NWIS does not deliver
new discrete water quality data or updates to existing data. 
For additional details, see:
https://doi-usgs.github.io/dataRetrieval/articles/Status.html
whatWQP <- whatWQPdata(siteNumber = paste0("USGS-", site_ids))

There are some major differences in the output. The NWIS services offers back one row per site/parameter code to learn how many samples are available. This is not currently available from the Water Quality Portal, however there are new summary services being developed. When those become available, we will include new documentation on how to get this information.

readNWISdata

If you get your water quality data from the readNWISdata function, no new data will be available and the function will generate a warning message. The other services are working as before. This is not an especially common dataRetrieval workflow, so there are not a lot of details here. Please reach out if more information is needed to update your workflows.

See ?readWQPdata to see all the ways to query data in the Water Quality Portal. Use the suggestions above to convert the output of the readWQPdata function to convert the WQP output to what is important for your workflow.

WQX Conversion {#conversion}

A table is provided on the EPA website that shows the conversions of WQX3 to the WQX2 mappings. This table may change periodically while WQP services are under active development, which is expected to last through Fall 2024.

Here is an example for pulling the EPA table and comparing column names.

schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")

Within that schema table, the column "FieldName3.0" shows the column names for the new WQX3 profile. There are several "FieldName2.0.XXXX" columns that show how the older 2.0 profiles align with the newer columns.

For example, the 2.0 "narrow" dataProfile match up with these new WQP3 columns:

sub_schema <- schema |> 
  select(WQX3 = FieldName3.0,
         WQX2 = FieldName2.0.Narrow) |> 
  filter(!is.na(WQX2))

knitr::kable(sub_schema)

Disclaimer

This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.



Try the dataRetrieval package in your browser

Any scripts or data that you put into this service are public.

dataRetrieval documentation built on Oct. 31, 2024, 9:07 a.m.