This vignette provides an introduction into the basic handling of the climex web app and how its run on your local computer.

Prerequisites

Installation path

The climex web app needs several files to be powered and all of them will be stored in a folder defined in the option climex.path, which will point to ~/R/climex/ per default. If your are fine with this choice, feel free to skip the remainder of this subsection.

A web app, like the shiny-based one provided by this package, needs loads of different files: .css files to shape and beautify its appearance, .js files enabling the app to perform logic and calculations within your browser to provide further customization and a more natural interaction, all the displayed images and gifs etc.

Fortunately, you don't have to care about all these files since the wrapper function climex() will take them from the package's installation path and copy them all in the climex.path folder the app will be run in. The only thing you have to care about is where all these things are copied to.

By default all resources will be stored in your home in a folder called ~/R/climex/. To change its path, add the following lines to the R configuration file ~/.Rprofile in your home.

## Replace PATH by the directory you want to store all files related
## to the climex app into.
options( climex.path = "PATH" )

Input data

The web app provided in the climexUI package is meant to be a convenient interface to perform the extreme value analysis interactively on a large set of station data. But the main ingredient, the data itself, has to be provided by the user herself. If you haven't gathered lots of data yet, don't worry. You can use the dwd2r package to retrieve quite a number of climatological time series measured within Germany (and provided by the German weather service (DWD)).

The app itself should be started using the auxiliary function climex() and the data has to be provided using its list.data.sources input argument in a very particular format. The basic time series can have two different formats. They must be either of class xts or data.frames containing two columns called value and date where the former is of class numeric and the latter is of class Date. All these time series have to be elements of a named list with the names corresponding to the ones of the individual stations the data was measured at. From one or many of those lists another named list will be constructed with its names corresponding to the ones of the individual data sets (lists of time series). This is, finally, the proper format of the list.data.sources input argument. The elements of the top level (data sets) and second level lists (stations) can be selected using two drop-down menus in the sidebar of the application. Examples for well formatted second level lists can be obtained by running the dwd.download() function of the dwd2r package with the batch.choices = c( 1, 1, 5, 1) argument. The original intent was to group different climatological quantities in the top level of the list and the corresponding stations into the second level. But you can use the first level of hierarchy of list.data.sources as you like and group the data in a way that fits you best. However, the second level must be named according to the corresponding stations and the name of the list elements containing the time series will be used to extract the position data from the second input argument data.frame.positions.

data.frame.positions is an object linking the names of the stations used in list.data.sources with the corresponding geographical positions and has to be, again, provided in a very particular format. One can either use a data.frame or a SpatialPointsDataFrame provided by the sp package. The former one has to contain at least three columns named longitude, latitude, and name with the first two being of class numeric and the last of class character. The spatial object, on the other hand, has to contain the corresponding names of the station in a column called name of the data.frame in its data slot. An example for such an object can be found in the station.positions object created when invoking the dwd.download() function of the dwd2r package with the batch.choices = c( 1, 1, 5, 1 ) argument. Please note that the names are supposed to be unique and that all time series in list.data.sources who's names are not present in data.frame.positions won't be handled by the web app at all.

Starting the climex web application

It is strongly advised to start the web application provided in the climexUI package using the climex() auxiliary function and the input arguments described in the previous section. It will ensure the data is properly formatted, it will resort to some fallback data if this is not the case, and will copy all required scripts and files to the folder the application is started in.

library( climexUI )
climex()

Now, your R shell will open a server on your computer and the climex app will be started in a new tab of your default web browser (Firefox preferred. Don't feed the Google!). To stop the app just go to your R shell and press Ctrl-c two times in a row.

Internally, the climex() function will run runApp() of the shiny package in the prepared folder. The climex.server() and climex.ui() functions are taking care of the actual work. The former one will load the a file called input.RData from the prerequisite folder, which was written by the climex() function.

Map tab

This leaflet-based tab is the default entry point of the climex app and its main purpose is to provide a convenient way of navigation between the data of the different stations.

While working with tens or hundreds of time series from different geographical sites, it is always hard to keep their spatial relations, like the distance between the stations, in mind. To overcome this problem, a map-based selection interface was introduced in addition to the Station selector in the sidebar.

Apart from being a selection interface for the station data the Map tab features the following functions:

All the changes to the preprocessing options in the General tab will be considered while calculating the summary statistics and the return levels of all displayed stations.

NOTE: The map is based on my fork of the leaflet package instead of the original one since the developer seem to dislike my pull request for horizontal and sizable legends. Without this the legend of the colored return level markers would look just to weird and misplaced.

The sidebar

As described in a previous section, the available data is structured into two levels. The top level can be selected via the Data set drop-down menu and the corresponding station of the second level via the Station menu.

Right below all these drop-down menus there is a very important radio button: the Remove incomplete years or Declustering of the data button (depending on the choice of the extreme value distribution in the Options box in the General tab). This button will ensure your time series is getting cleaned and short-range correlations as well as artifacts are getting removed before the extreme value analysis does take place. So, only disable it if you really know what you are doing!

But apart from the input data provided to the climex() function, the web app is also able to draw and analyse random series distributed according to the generalized extreme value (GEV) or generalize Pareto (GP) distribution by selecting Artificial data in the Data set drop-down menu. You can control the sampling process by adjusting the Location, Scale, and Shape slider for the distribution's parameters and the Length one for the number of points to be sampled. Every time you update one of these sliders the time series is getting redrawn. In addition, you can also use Draw button to redraw it manually.

General tab

This tab provides both an interface to change the preprocessing options used throughout the whole application as well as a variety of tools to analyze an individual time series. Keep in mind: whenever you change an option within this tab it will affect the analysis performed in the entire app.

GEV fit box

This box contains four different plots:

Options box

Via the settings in this box all the preprocessing options are controlled throughout the app.

Limiting distribution

This radio buttons control whether to fit the generalized extreme value (GEV) distribution or the generalized Pareto (GP) one.

Box length in days | Threshold

Via this slider the extreme events will be obtained from your time series. Depending on the choice of limiting distribution, it will either determine the number of days of a time series constituting a block in the GEV approach or the height of the threshold above which all events will be considered extreme. The default value of one year for the block method is recommended especially since the user can change the deseasonalization method and may end up with seasonal correlations in the data.

Remember: The approximation of the histogram using both the GEV and the GP distribution is only valid for an asymptotic block length and threshold height. So don't pick too low values!

Type of extreme

If set to Min, the GEV distribution will be fitted to the block minima and the GP distribution to all events below the specified threshold.

Deseasonalization method

Using this selector you can choose if and how to get rid of the short-range correlations in your time series introduced by the annual cycle. Per default the anomalies will be calculated. But you can also pick other R-based implementations like stl, decompose, and the ds function from the deseasonalize package.

Results box

This table displays several details of the GEV fitting procedure:

For the distribution's parameter and the return level a small change while varying the preprocessing options does suggest the stability of the fitting procedure and thus a high quality of the time series. This will be indicated by a green color whereas large changes are indicated in red. For the nllh, AIC, and BIC the values should be as low as possible. That's why all decreases are marked green and all increases red.

To better review the influence of the individual parameter changes, not just the current results and statistics but also the ones from the three last fits (in the hist_1, hist_2, and hist_3 column) are contained in the table.

Time series box

In the Pure tab you can view the raw and unprocessed time series. The Deseasonalized tab uses the former series and applies the function specified in the Deseasonalization method drop-down menu of the Options box to it. Both plots contain the extracted extreme events as additional orange points and are generated using the dygraphs library. That's why you are also able to zoom into them using your mouse!

The Remaining tab contains all the events in your time series, which got extracted in the blocking or thresholding. Those can be considered the extreme events of the series. Since it is often quite interesting to see what will happen to your fitted parameters when discarding individual events (like the most extreme ones), you can toggle all the points by clicking them or brush them using your mouse. When deactivated they do not contribute to the fitting anymore and the optimization is being redone instantly.



theGreatWhiteShark/climexUI documentation built on May 22, 2019, 2:25 p.m.