A little package for making new netcdfs as subsets of existing netcdfs
NetCDF is a commonly used format for storing scientific data. Several useful R packages already exist for reading, writing, and manipulating netcdf data, including ncdf4 and tidync. Unfortunately, none of them facilitate simple subsetting of netcdf files to produce new netcdf files. This is the job of subsetnc.
subsetnc features:
Possible use cases include:
devtools::install_github("markwh/subsetnc")
An ncdf4 object can be subset by its dimension(s), by 1-dimensional variables, or both, using logical statements similar to those passed to dplyr::filter(). Any number of subsetting statements can be combined, for example:
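library(subsetnc)
library(magrittr)  # provides %>%
# ncfile is a path to an existing netcdf file; var1-var3 and dim1 are placeholder names
nc <- ncdf4::nc_open(ncfile)
ssnc <- nc %>%
  nc_subset(var1 %in% 1:5,
            var2 > var1 + 7,
            var3 %in% (median(var3) + (-3:3)),
            dim1 < 100,
            filename = "newnc.nc")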
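To make the combination rule concrete, here is a base R sketch. It assumes, as with dplyr::filter(), that comma-separated conditions are combined with logical AND. It evaluates quoted conditions against a named list of toy vectors standing in for a dimension and a 1-D variable defined along it. The names `vars` and `conditions` are hypothetical, and this is not necessarily how nc_subset() works internally.

```r
# Toy stand-ins for a dimension and a 1-D variable defined along it
# (hypothetical data, not read from any netcdf file)
vars <- list(
  dim1 = 1:8,
  var1 = c(2, 9, 4, 1, 6, 3, 8, 5)
)

# Conditions written like nc_subset() arguments, captured unevaluated
conditions <- list(
  quote(var1 %in% 1:5),
  quote(dim1 < 7)
)

# Evaluate each condition using the list as its data environment,
# then combine the resulting logical vectors with AND
keep <- Reduce(`&`, lapply(conditions, eval, envir = vars))

# Positions along dim1 that satisfy every condition: 1 3 4 6
idx <- which(keep)
print(idx)
```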
The output of nc_subset() is an ncdf4 object pointing to a newly created on-disk netcdf file, whose path is given by the filename argument. If filename is omitted, the new netcdf is written to a tempfile() that will be deleted when the R session exits. Because the result is an ncdf4 object, it can be used with any of the ncdf4:: functions.
The netcdf produced by nc_subset() contains additional variables corresponding to the dimension values in the original netcdf. Because netcdf dimensions are often required to be valued 1:length(dim) (unless they are specified with units, which in my experience is rare), these new variables are needed to keep track of the original dimension indices, for example to facilitate joining to other datasets. Each has the same name as its corresponding dimension, with two underscores appended. For example, the variable group1/dim1__ keeps track of the values of the dimension group1/dim1 from the original netcdf.
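The bookkeeping role of these __ variables can be illustrated without a netcdf file. In this base R sketch (toy data; the column names only mimic subsetnc's naming), subsetting renumbers the dimension from 1, while the dim1__ column preserves each retained element's original index. That is what lets the subset be joined to a dataset keyed on the original positions.

```r
# Original dimension values: 1:length(dim), here a hypothetical length of 10
orig_index <- seq_len(10)
height <- c(5.1, 8.2, 7.9, 9.4, 6.0, 8.8, 10.1, 4.3, 8.5, 9.9)  # toy values

keep <- height > 8

# In the subset the dimension is again valued 1:length(dim), so a companion
# column (named like subsetnc's "dim1__") records where each element came from
sub <- data.frame(
  dim1   = seq_len(sum(keep)),
  dim1__ = orig_index[keep],
  height = height[keep]
)

# Another dataset keyed on the ORIGINAL dimension index (hypothetical)
other <- data.frame(
  dim1__   = orig_index,
  reach_id = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)

# Joining on dim1__ pairs each subset row with its original element;
# joining on the renumbered dim1 would pair rows with the wrong originals
joined <- merge(sub, other, by = "dim1__")
print(joined)
```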
An example netcdf dataset is included, containing simulated node- and reach-level SWOT data for the Sacramento River in California, including measurements of river height, width, slope, and other variables.
Because the example dataset contains groups, its variables and dimensions must be accessed in the <group>/<variable> format. Since these names contain a /, they must be enclosed in backticks (`) when subsetting.
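The backticks matter because R would otherwise parse nodes/width as a division of two objects. The base R illustration below uses toy widths and a hypothetical list name, `vals`. It also evaluates a condition against named data, roughly as a filter()-style function might, though this is not necessarily how nc_subset() evaluates its arguments.

```r
# Without backticks, nodes/width is a call to `/` (division)
print(is.call(quote(nodes/width)))    # TRUE
# With backticks, it is a single symbol whose name contains "/"
print(is.name(quote(`nodes/width`)))  # TRUE

# Toy node widths stored under a "<group>/<variable>" style name
vals <- list(`nodes/width` = c(120, 85, 240, 60, 150))

# Evaluate a median-width condition against the list (median is 120)
res <- eval(quote(`nodes/width` < median(`nodes/width`)), envir = vals)
print(res)  # FALSE TRUE FALSE TRUE FALSE
```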
library(ggplot2)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(ncdf4)
library(subsetnc)
# Locate and open the example dataset shipped with subsetnc
ncfile <- system.file("extdata", "sacriver.nc", package = "subsetnc", mustWork = TRUE)
orignc <- nc_open(ncfile)
# Read node-level coordinates and height into a data frame
plotvars <- c("latitude", "longitude", "height")
mapdf_orig <- paste0("nodes/", plotvars) %>%
  lapply(ncvar_get, nc = orignc) %>%
  lapply(as.vector) %>%
  setNames(plotvars) %>%
  as.data.frame()
# Map all nodes in the original file, colored by height
mapdf_orig %>%
  ggplot(aes(x = longitude, y = latitude, color = height)) +
  geom_point(size = 4) +
  coord_equal()
# Keep every 3rd node along the `nodes/nodes` dimension
ssnc_dim <- nc_subset(orignc, `nodes/nodes` %% 3 == 0)
# Read the same variables from the subset file and map them
mapdf_dimss <- paste0("nodes/", plotvars) %>%
  lapply(ncvar_get, nc = ssnc_dim) %>%
  lapply(as.vector) %>%
  setNames(plotvars) %>%
  as.data.frame()
mapdf_dimss %>%
  ggplot(aes(x = longitude, y = latitude, color = height)) +
  geom_point(size = 4) +
  coord_equal()
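The condition `nodes/nodes` %% 3 == 0 keeps positions 3, 6, 9, and so on, assuming the nodes dimension is valued 1:length(dim), which the text above notes is common. A quick base R check on a hypothetical dimension of length 10:

```r
# Hypothetical stand-in for the values of the `nodes/nodes` dimension
node_index <- seq_len(10)

# %% is R's modulo operator; multiples of 3 leave remainder 0
kept <- node_index[node_index %% 3 == 0]
print(kept)  # 3 6 9
```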
# Keep only nodes narrower than the median node width
ssnc_var <- nc_subset(orignc, `nodes/width` < median(`nodes/width`))
mapdf_varss <- paste0("nodes/", plotvars) %>%
  lapply(ncvar_get, nc = ssnc_var) %>%
  lapply(as.vector) %>%
  setNames(plotvars) %>%
  as.data.frame()
mapdf_varss %>%
  ggplot(aes(x = longitude, y = latitude, color = height)) +
  geom_point(size = 4) +
  coord_equal()
# Combine variable-based and dimension-based conditions: narrower than the
# median width, height above 8, and every 3rd node
ssnc_vardim <- nc_subset(orignc,
                         `nodes/width` < median(`nodes/width`),
                         `nodes/height` > 8,
                         `nodes/nodes` %% 3 == 0)
mapdf_vardimss <- paste0("nodes/", plotvars) %>%
  lapply(ncvar_get, nc = ssnc_vardim) %>%
  lapply(as.vector) %>%
  setNames(plotvars) %>%
  as.data.frame()
mapdf_vardimss %>%
  ggplot(aes(x = longitude, y = latitude, color = height)) +
  geom_point(size = 4) +
  coord_equal()
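The read-into-a-data-frame pipeline is repeated for each subset in the example. One way to factor it out is a small helper like the sketch below. nc_group_df() is a hypothetical name, not a subsetnc or ncdf4 function, and the `getter` argument (defaulting to ncdf4::ncvar_get) exists only so the helper can be exercised without a netcdf file.

```r
# Hypothetical helper (not part of subsetnc or ncdf4): read 1-D variables
# from one group of an open netcdf into a data frame, one column per variable.
# `getter` defaults to ncdf4::ncvar_get(nc, varid); R evaluates the default
# lazily, so ncdf4 is only needed when no getter is supplied.
nc_group_df <- function(nc, vars, group = "nodes", getter = ncdf4::ncvar_get) {
  cols <- lapply(paste0(group, "/", vars),
                 function(v) as.vector(getter(nc, v)))
  as.data.frame(setNames(cols, vars))
}

# Usage with the example objects, e.g.:
# mapdf_vardimss <- nc_group_df(ssnc_vardim, plotvars)
```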