This vignette provides an introduction into the basic handling of the climex web app and how its run on your local computer.
The climex web app needs several files to be powered and all of them
will be stored in a folder defined in the option climex.path
, which
will point to ~/R/climex/ per default. If your are fine with this
choice, feel free to skip the remainder of this subsection.
A web app, like the shiny-based one provided by this package, needs loads of different files: .css files to shape and beautify its appearance, .js files enabling the app to perform logic and calculations within your browser to provide further customization and a more natural interaction, all the displayed images and gifs etc.
Fortunately, you don't have to care about all these files since the
wrapper function climex()
will take them from the package's
installation path and copy them all in the climex.path
folder the
app will be run in. The only thing you have to care about is where
all these things are copied to.
By default all resources will be stored in your home in a folder called ~/R/climex/. To change its path, add the following lines to the R configuration file ~/.Rprofile in your home.
## Replace PATH by the directory you want to store all files related ## to the climex app into. options( climex.path = "PATH" )
The web app provided in the climexUI package is meant to be a convenient interface to perform the extreme value analysis interactively on a large set of station data. But the main ingredient, the data itself, has to be provided by the user herself. If you haven't gathered lots of data yet, don't worry. You can use the dwd2r package to retrieve quite a number of climatological time series measured within Germany (and provided by the German weather service (DWD)).
The app itself should be started using the auxiliary function
climex()
and the data has to be provided using its
list.data.sources
input argument in a very particular format. The
basic time series can have two different formats. They must be either
of class xts or data.frames containing two columns called
value and date where the former is of class numeric and the
latter is of class Date. All these time series have to be elements
of a named list with the names corresponding to the ones of the
individual stations the data was measured at. From one or many of
those lists another named list will be constructed with its names
corresponding to the ones of the individual data sets (lists of time
series). This is, finally, the proper format of the
list.data.sources
input argument. The elements of the top level
(data sets) and second level lists (stations) can be selected using
two drop-down menus in the sidebar of the application. Examples for
well formatted second level lists can be obtained by running the
dwd.download()
function of the dwd2r package with the
batch.choices = c( 1, 1, 5, 1)
argument. The original intent was to
group different climatological quantities in the top level of the list
and the corresponding stations into the second level. But you can use
the first level of hierarchy of list.data.sources
as you like and
group the data in a way that fits you best. However, the second level
must be named according to the corresponding stations and the name of
the list elements containing the time series will be used to extract the
position data from the second input argument data.frame.positions
.
data.frame.positions
is an object linking the names of the stations
used in list.data.sources
with the corresponding geographical
positions and has to be, again, provided in a very particular
format. One can either use a data.frame or a
SpatialPointsDataFrame provided by the sp package. The former
one has to contain at least three columns named longitude,
latitude, and name with the first two being of class numeric
and the last of class character. The spatial object, on the other
hand, has to contain the corresponding names of the station in a
column called name of the data.frame in its data slot. An
example for such an object can be found in the station.positions
object created when invoking the dwd.download()
function of the
dwd2r package with the batch.choices = c( 1, 1, 5, 1 )
argument. Please note that the names are supposed to be unique and
that all time series in list.data.sources
who's names are not
present in data.frame.positions
won't be handled by the web app at
all.
It is strongly advised to start the web application provided in the
climexUI package using the climex()
auxiliary function and the
input arguments described in the previous section. It will ensure the
data is properly formatted, it will resort to some fallback data if
this is not the case, and will copy all required scripts and files to
the folder the application is started in.
library( climexUI ) climex()
Now, your R shell will open a server on your computer and the climex app will be started in a new tab of your default web browser (Firefox preferred. Don't feed the Google!). To stop the app just go to your R shell and press Ctrl-c two times in a row.
Internally, the climex()
function will run runApp()
of the
shiny package in the prepared folder. The climex.server()
and
climex.ui()
functions are taking care of the actual work. The former
one will load the a file called input.RData from the prerequisite
folder, which was written by the climex()
function.
This leaflet-based tab is the default entry point of the climex app and its main purpose is to provide a convenient way of navigation between the data of the different stations.
While working with tens or hundreds of time series from different geographical sites, it is always hard to keep their spatial relations, like the distance between the stations, in mind. To overcome this problem, a map-based selection interface was introduced in addition to the Station selector in the sidebar.
Apart from being a selection interface for the station data the Map tab features the following functions:
All the changes to the preprocessing options in the General tab will be considered while calculating the summary statistics and the return levels of all displayed stations.
NOTE: The map is based on my fork of the leaflet package instead of the original one since the developer seem to dislike my pull request for horizontal and sizable legends. Without this the legend of the colored return level markers would look just to weird and misplaced.
As described in a previous section, the available data is structured into two levels. The top level can be selected via the Data set drop-down menu and the corresponding station of the second level via the Station menu.
Right below all these drop-down menus there is a very important radio button: the Remove incomplete years or Declustering of the data button (depending on the choice of the extreme value distribution in the Options box in the General tab). This button will ensure your time series is getting cleaned and short-range correlations as well as artifacts are getting removed before the extreme value analysis does take place. So, only disable it if you really know what you are doing!
But apart from the input data provided to the climex()
function,
the web app is also able to draw and analyse random series distributed
according to the generalized extreme value (GEV) or generalize Pareto
(GP) distribution by selecting Artificial data in the Data set
drop-down menu. You can control the sampling process by adjusting the
Location, Scale, and Shape slider for the distribution's
parameters and the Length one for the number of points to be
sampled. Every time you update one of these sliders the time series is
getting redrawn. In addition, you can also use Draw button to
redraw it manually.
This tab provides both an interface to change the preprocessing options used throughout the whole application as well as a variety of tools to analyze an individual time series. Keep in mind: whenever you change an option within this tab it will affect the analysis performed in the entire app.
This box contains four different plots:
Via the settings in this box all the preprocessing options are controlled throughout the app.
This radio buttons control whether to fit the generalized extreme value (GEV) distribution or the generalized Pareto (GP) one.
Via this slider the extreme events will be obtained from your time series. Depending on the choice of limiting distribution, it will either determine the number of days of a time series constituting a block in the GEV approach or the height of the threshold above which all events will be considered extreme. The default value of one year for the block method is recommended especially since the user can change the deseasonalization method and may end up with seasonal correlations in the data.
Remember: The approximation of the histogram using both the GEV and the GP distribution is only valid for an asymptotic block length and threshold height. So don't pick too low values!
If set to Min, the GEV distribution will be fitted to the block minima and the GP distribution to all events below the specified threshold.
Using this selector you can choose if and how to get rid of the short-range correlations in your time series introduced by the annual cycle. Per default the anomalies will be calculated. But you can also pick other R-based implementations like stl, decompose, and the ds function from the deseasonalize package.
This table displays several details of the GEV fitting procedure:
For the distribution's parameter and the return level a small change while varying the preprocessing options does suggest the stability of the fitting procedure and thus a high quality of the time series. This will be indicated by a green color whereas large changes are indicated in red. For the nllh, AIC, and BIC the values should be as low as possible. That's why all decreases are marked green and all increases red.
To better review the influence of the individual parameter changes, not just the current results and statistics but also the ones from the three last fits (in the hist_1, hist_2, and hist_3 column) are contained in the table.
In the Pure tab you can view the raw and unprocessed time series. The Deseasonalized tab uses the former series and applies the function specified in the Deseasonalization method drop-down menu of the Options box to it. Both plots contain the extracted extreme events as additional orange points and are generated using the dygraphs library. That's why you are also able to zoom into them using your mouse!
The Remaining tab contains all the events in your time series, which got extracted in the blocking or thresholding. Those can be considered the extreme events of the series. Since it is often quite interesting to see what will happen to your fitted parameters when discarding individual events (like the most extreme ones), you can toggle all the points by clicking them or brush them using your mouse. When deactivated they do not contribute to the fitting anymore and the optimization is being redone instantly.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.