Writing ensemble series with the 'efts' R package

knitr::opts_chunk$set(
  comment = "#>",
  error = FALSE,
  tidy = FALSE,
  out.extra='style="display:block; margin: auto"', 
  fig.align="center", 
  fig.width=6, 
  fig.height=6, 
  dev='png'
)

Overview

efts is an R package to access ensemble forecast time series stored (EFTS) in netCDF format. It offers convenient functions to access time series, hiding the bug-prone details of netCDF array manipulations.

EFTS netCDF data sets follow the schema described at this location at the time of writing.

The package comes with API documentation, as well as the present vignette. You can access both navigating via

?efts

Creating EFTS netCDF files

Perhaps as an unusual introduction, we will learn to write a EFTS netCDF data set prior to reading it.

Typical EFTS data is multidimensional, and the netCDF schema reflects this. To create a new data set, you need to define these dimensions. To create the information about the main time axis, you can use the create_time_info helper function in the package. The following command creates an hourly data set over two days

library(efts)
timeAxisStart <- ISOdate(year=2010, month=08, day=01, hour = 12, min = 0, sec = 0, tz = "UTC")
timeDimInfo <- create_time_info(from=timeAxisStart, n=48, time_step='hours since', time_step_delta=1L)
str(timeDimInfo)

Other dimensions are easier to create. Variables are more involved; one way to define several variables is via a data frame. For instance, you can start from a stub such as created by the following:

variable_names <- c('rain_sim','pet_sim')
varDef <- create_variable_definition_dataframe(variable_names=variable_names, long_names = rep('synthetic data', 2), dimensions=4L)
str(varDef)

Do note the column names of the data frame varDef; the package is picky about these, to comply with the EFTS netCDF data schema.

Let's create a data set with 2 point stations, 3 ensembles and four time step forecast lead.

stationIds <- c(123,456)
nEns <- 3
nLead <- 4
fname <- tempfile() # or something you prefer.

It is mandatory to provide some global file attributes, the function create_global_attributes provides a starting point.

global_attr <- create_global_attributes(
  title="data set title", 
  institution="my org", 
  catchment="My_Catchment", 
  source="A journal reference, URL", 
  comment="example for vignette")

Similarly for optional geographic metadata variables:

(opt_metadatavars <- default_optional_variable_definitions_v2_0())

Similarly for optional geographic metadata variables:

snc <- create_efts(
  fname=fname,
  data_var_definitions=varDef,
  optional_vars=opt_metadatavars,
  time_dim_info=timeDimInfo,
  stations_ids=stationIds,
  station_names=NULL,
  nc_attributes=global_attr,
  lead_length=nLead,
  ensemble_length=nEns)

The default print method for this object snc is the same output as objects of class ncdf4

snc

snc is a type of object that few R aficionados are aware of; this is a reference class. Without entering into unnecessary technical details, this is mostly a design choice done to achieve better memory usage and performance in some contexts.

The following command displays the main characteristics of this reference object, of class EftsDataSet Note that you should really use the methods, and not access directly the fields.

str(snc, max.level=2)

You can get to the documentation page for the methods in this class with:

?efts::EftsDataSet

Our EFTS data set object is ready to be populated with data. Let's create synthetic data. The basic idea of the object's methods is to offer intuitive and concise means to get/set ensemble of forecast time series. A method can be called in a syntax that may be unfamiliar to most R users, but similar to most object oriented languages: theObject$theMethodName(someArgumentsIfAny), for instance snc$get_time_dim() in our EFTS data set, to retrieve its time axis.

TODO: test, document and demonstrate missing value handling.

set.seed(42)
td <- snc$get_time_dim()
for (i in 1:length(td)) {
    for (station in stationIds) {
        rain <- 6 * rnorm(nEns*nLead)
        rain <- matrix(pmax(as.numeric(rain), 0), nrow=nLead) # nEns replicates of a forecast of length nLead
        pet <- 6.0 + rnorm(nEns*nLead)
        pet <- matrix(pmax(as.numeric(pet), 0), nrow=nLead)
        dtime = td[i]
        snc$put_ensemble_forecasts(rain, variable_name = variable_names[1], identifier = station, start_time = dtime) 
        snc$put_ensemble_forecasts(pet,  variable_name = variable_names[2], identifier = station, start_time = dtime) 
    }
}

Reading data

Now we can demonstrate how to retrieve data. If you had closed the previous EFTS data set object snc and deleted the variable, you'd reopen it for reading with the following command:

if (!exists('snc')) snc <- open_efts(fname)

You get the ensemble forecast for a variable (pet in this case) for a point in time with the following command

td <- snc$get_time_dim()
timeStamp <- td[5] # for instance
d <- snc$get_ensemble_forecasts(variable_names[2], stationIds[1], start_time=timeStamp)
str(d)

The object returned is of class xts. The package xts has been chosen as the default time series structure for efts. The rationale is empirical: previous experience with the xts in conjunction with plyr (note to self: possibly dplyr in the future) showed good performance to calculate hydrologic statistics on ensemble of time series.

To quickly visualize it (if the data is not too big), using the function plot.zoo has the advantage of stacking the series:

zoo::plot.zoo(d)

As of version 0.5-1, there is a convenience function to retrieve the UTC offset of the units of time dimension. Here there is none, hence a duration of zero returned:

paste0( "UTC offset as a string: ", snc$get_utc_offset())
paste0( "UTC offset as a difftime: ", snc$get_utc_offset(as_string=FALSE))

Close the data set with the following commands:

snc$close()
rm(snc)

This vignette creates by default a temporary file. This shall be cleaned up by default with:

if(file.exists(fname)) file.remove(fname)


Try the efts package in your browser

Any scripts or data that you put into this service are public.

efts documentation built on May 2, 2019, 3:39 p.m.