library(dplyr)
library(ggplot2)

Introduction

The waterinfo.be API uses a system of identifiers, called ts_id to define individual time series. For example, the identifier ts_id = 78073042 corresponds to the time series of air pressure data for the measurement station in Liedekerke, with a 15 min time resolution. Hence, the ts_id identifier defines a variable of interest from a measurement station of interest with a specific frequency (e.g. 15 min, hourly,...). The knowledge of the proper identifier is essential to be able to download the corresponding data.

library(wateRinfo)

Download with known ts identifier

In case you already know the ts_id identifier that defines your time serie, the package provides the function get_timeseries_tsid() to download a specific period of the time series.

As an example, to download the air pressure time series data of Liedekerke with a 15 min resolution (ts_id = 78073042) for the first of January 2016:

my_data <- get_timeseries_tsid("78073042", from = "2016-01-01", to = "2016-01-02")
knitr::kable(head(my_data), align = "lcc")

For more information on defining the date period of the download, see this vignette. Let's have a visual check of our data, using the ggplot2 package:

ggplot(my_data, aes(Timestamp, Value)) + 
    geom_line()

As such, knowing the identifier is the most straightforward way of downloading a time series. In order to find these ts_id identifier, the package supports looking for identifiers based on a supported variable name (limited set of supported variables by VMM) or looking for identifiers by checking all variables for an individual station. These methods are explained in the next sections.

Search identifier based on variable name

For a number of variables, the documentation of the waterinfo.be API provides a direct overview option of all available VMM measurement stations, using the so-called Timeseriesgroup_id. For these variables, the package provides the function get_stations() to download an overview of available measurement stations and the related ts_id identifiers. The latter can be used to download the time series.

get_stations("air_pressure")

By default, the expected frequency is the 15 min frequency of the time series. However, for some of the variables, multiple frequencies are supported by the API. The package provides a check on the supported variables and frequencies. An overview of the currently supported variables can be requested with the command supported_variables() (either in Dutch, nl, or in English, en). Actually, more variables are available with the API (see next section), but for each of these variables the get_stations() function is supported (i.e. the Timeseriesgroup_id is documented by VMM).

supported_variables("en") %>% 
    as.list()

To check which predefined frequencies are provided by the waterinfo.be API for a given variable, the supported_frequencies() function is available:

supported_frequencies(variable_name = "air_pressure")

Hence, for air pressure data, only the 15 min resolution is supported. Compared to evaporation derived by the Monteith equation:

supported_frequencies(variable_name = "evaporation_monteith")

Multiple resolutions are available. Using the coarser time resolutions can be helpful when you want to download longer time series while keeping the number of records to download low (if the frequency would be sufficient for your analysis):

stations <- get_stations("evaporation_monteith", frequency = "year")
subset_of_columns <- stations %>% select(ts_id, station_no, station_name, 
                                         parametertype_name, ts_unitsymbol)
knitr::kable(subset_of_columns)

When interested in the data of Herentals_ME, we can use the corresponding ts_id to download the time series of PET with a yearly frequency and make a plot with ggplot:

pet_yearly <- get_timeseries_tsid("94526042", period = "P10Y")
pet_yearly %>% 
    na.omit() %>%
    ggplot(aes(Timestamp, Value)) + 
    geom_bar(stat = "identity") +
    scale_x_datetime(date_labels = "%Y", date_breaks = "1 year") + 
    xlab("") + ylab("PET Herentals (mm)")

See this vignette to understand the period = "P10Y" format.

Remark: the get_stations() function only works for those measurement stations belonging to the VMM meetnet (network), related to the so-called datasource = 1. For other networks, i.e. datasource = 4, the enlisting is not supported. Still, a search for data is provided starting from a given station name, as explained in the next section.

Search identifier based on station name

In addition to the option to check the measurement stations that can provide data for a given variable, the package provides the function get_variables() to get an overview of the available variables for a given station, using the station_no. The advantage compared to the ts_id is that these station_no names are provided by the waterinfo.be website itself when exploring the data. When clicking on a measurement station on the map and checking the time series graph, the station_no is provided in the upper left corner in between brackets.

Waterinfo.be example printscreen of time series

So, for the example in the figure, i.e. station_no = zes42a-1066, the available time series are retrieved by using the get_variables() command:

available_variables <- get_variables("zes42a-1066")
available_variables %>% select(ts_id, station_name, ts_name, 
                               parametertype_name)

The available number of variables depends on the measurement station. The representation is not standardized and also depends on the type of meetnet. Nevertheless, one can derive the required ts_id from the list when interpreting the field names. Remark that the datasource can be 4 instead of 1 for specific meetnetten (networks). The datasource to use is printed when asking the variables for a station.

In order to download the 10 min time series water level data for the station in Sint-Amands tij/Zeeschelde, the ts_id = 55419010 can be used in the get_timeseries_tsid() function, taking into account the datasource = 4 (default is 1):

tide_stamands <- get_timeseries_tsid("55419010", 
                                     from = "2017-06-01", to = "2017-06-05",
                                     datasource = 4)
ggplot(tide_stamands, aes(Timestamp, Value)) + 
    geom_line() + xlab("") + ylab("waterlevel")

For some measurement stations, the number of variables can be high (lots of precalculated derivative values) and extracting the required time series identifier is not always straightforward. For example, the dat Etikhove/Schuif/Nederaalbeek (K06_221), provides the following number of variables:

available_variables <- get_variables("K06_221")
nrow(available_variables)

As the measured variables at a small time resolution are of most interest, filtering on P.15 (or P.1, P.60,...) will help to identify measured time series, for those stations belonging to the meetnet of VMM (datasource = 1):

available_variables <- get_variables("K06_221")
available_variables %>% 
    filter(ts_name == "P.15")

Loading and visualizing the last day (period P1D) of available data for the water level downstream (Hafw):

afw_etikhove <- get_timeseries_tsid("22302042", 
                                    period = "P1D",
                                    datasource = 1) # 1 is default

ggplot(afw_etikhove, aes(Timestamp, Value)) + 
    geom_line() + xlab("") + ylab("Volume")

We can do similar filtering to check for time series on other stations, for example the Molenbeek in Etikhove:

available_variables <- get_variables("L06_347")
available_variables %>%
    filter(ts_name == "P.15")

And use the ts_id code representing discharge to create a plot of the discharge during a storm in 2010:

etikhove <- get_timeseries_tsid("276494042", 
                                from = "2010-11-09", to = "2010-11-16")

ggplot(etikhove, aes(Timestamp, Value)) + 
    geom_line() + xlab("") + ylab("Q (m3/s)")

Check the URL used to request the data

As each of these data requests is actually a call to the waterinfo.be API, the call can be tested in a browser as well. To retrieve the URL used to request certain data (using get_variables(), get_stations() or get_timeseries_tsid()), check the comment() attribute of the returned data.frame, for example:

air_stations <- get_stations("air_pressure")
comment(air_stations)

or

etikhove <- get_timeseries_tsid("276494042", 
                                from = "2010-11-09", to = "2010-11-16")
comment(etikhove)


inbo/wateRinfo documentation built on July 12, 2022, 11:29 a.m.