knitr::opts_chunk$set(echo = FALSE, warning = FALSE, fig.width = 7) Sys.setlocale(locale="english")
This package gathers different tools for managing aerobiological databases, elaborating the main calculations and visualization of results. In a first step, data may be checked using tools for quality control and all missing gaps can be completed. Then, the main parameters of the pollen season can be calculated and represented graphically. Multiple graphical tools are available: pollen calendars, phenological plots, time series, tendencies, interactive plots, abundance plots...
The first thing you have to do is to install from CRAN repository and to load the package AeRobiology.
library ("knitr")
library ("AeRobiology")
install.packages("AeRobiology") library (AeRobiology)
During this tutorial we are going to use the pollen data from Munich which are integrated in the package. This will allow you to follow the tutorial obtaining the same results. If you want to follow the tutorial by using your own data, check the next section.
munich_pollen
is a data set containing information of daily concentrations of pollen in the atmosphere of Munich during the years 2010_2015. Pollen types included: "Alnus", "Betula", "Taxus", "Fraxinus", "Poaceae", "Quercus", "Ulmus" and "Urtica".
Data were obtained at Munich (Zentrum Allergie und Umwelt, ZAUM) using a Hirst_type volumetric pollen trap (Hirst, 1952) following the standard methodology (VDI4252_4_2016). Some gaps have been added to test some functions of the package (e.g. quality_control(), interpollen()).
The data were obtained by the research team of Prof. Jeroen Buters (Christine Weil & Ingrid Weichenmeier). At the Zentrum Allergie und Umwelt (ZAUM, directed by Prof. Carsten B. Schmidt_Weber).
You can load the data in your working environment by typing:
data("munich_pollen")
This section has been developed for people who are not familiar with R. If you have experience in loading databases in R, you might not be interested in this section and you can skip it.
There are several functions to import data from Excel. Arbitrarily, we are going to use the package readxl.
install.packages("readxl") library (readxl)
Now we can import the data with the function read_xlsx(). I strongly recommend you to have your Excel file in the same folder in which you are working with R. This will avoid long paths to your files and also will prevent broken paths when some files have been moved. If you are working with an old database, you might be interested in the read_xls() function.
Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx")
You can also select the sheet you want to import:
Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx", sheet=2)
Now you should have loaded your data in the object Mydata
. Let's check if there is any mistake:
str(munich_pollen)
Note: I'm using munich_pollen
database as example but you should use Mydata
.
As you can see, the object is a data.frame
, the first column is "Date" type and the rest are "num", which means numeric. This is the format you database must have. Normally it is automatically recognized by the import function, but if not, check the functions: as.Date(), as.numeric() and as.data.frame().
**Your database must have this format to start working with the package**. The functions are designed to warn you in case the format is not correct but there may be some exceptions. **This is the most important step when using the package and some strange errors reported by the functions can be solved by doing this**. Sometimes the date column has 2 different types simultaneously ("POSIXlt" and "POSIXct") and it might cause mistakes. You can use one or another, but not both. Furthermore, it is strongly recommended to use "Date" format instead of "POSIXlt" or "POSIXct" despite theoretically there may not be problems by using them.
**Please, don`t exasperate**. This is the **slowest step of using the package and the most important**. Once you have your data loaded, you only have to spend a few minutes in each function to have your results. If you have some unsolving problems, search on the internet. Some other people must have had your same problem and it should be solved in some forums. If not, write us and we will help you as soon as possible: aerobiology.package@gmail.com
If you want to import your data from **csv files**, you might be interested in **read.csv()** function.
This function was designed to check the quality of an historical database of several pollen types. Since many of the quality requirements depend on the criteria selected to establish the main pollen season, this function has the calculate_ps() function integrated. You can insert arguments of calculate_ps() in this function to select the method you want for calculating the main pollen season. calculate_ps() details can be consulted later in this same manuscript.
If **result = "plot"**, this function returns a **graphical resume** of the quality of the database marking with different red intensities the "weak-points" of it, and if **result = "table"** a `data.frame` with the detailed reasons of their "weakness".
It establishes the quality according to 5 different criteria:
It checks if the main pollen season cannot be calculated according to the minimal requirements: lack of data for this pollen type and year. This criteria appears as "Complete" in the generated data.frame.
If the start, end or peak dates of the main pollen season have been interpolated (or a date near them specified by int.window
argument). It is based on the following premise: If this day or a nearby date is missing/interpolated, the selected date as start/peak/end might not be the real one. These criteria appear as "Start", "Peak" and "End" in the generated data.frame.
The percentage of missing data within the main pollen season. It calculates the number of days which have been interpolated by interpollen() function and their percentage of the main pollen season. If a high percentage of the main pollen season has been interpolated, the information of this season might not be reliable. The maximal percentage allowed can be specified by the perc.miss
argument. This filter appears as "Comp.MPS" in the generated data.frame.
When running the function, it gives you a graphical resume with different color intensity depending on the risk of including each pollen/spore type for a concrete year and a data.frame with detailed information about the reasons of this evaluation.
QualityControl<-quality_control(munich_pollen, result = "table")
If the filter has been successfully passed, it appears as TRUE
in the generated data.frame
. If not, it appears as FALSE
.
head(QualityControl)
quality_control(munich_pollen, result = "plot")
There are lots of arguments for this function. You can consult them in the "help" section of the function in R or in the official document of the package.
Let's suppose we want to check the quality of our database, but we want to calculate it for a 80% defined Main Pollen Season (arguments ps.method
and perc
). We don't want to take into account years with a missing data less than 4 days away from the start, peak or end dates (argument int.window
). Furthermore, we only want to exclude years with more than 50% of interpolated data within the main pollen season (argument perc.miss
):
quality_control(munich_pollen, int.window = 4, perc.miss = 50, ps.method = "percentage", perc = 80)
As you would have noticed, there are several differences between this result and the previous one.
The function interpollen() was designed to complete all the missing gaps within a database in just one step. All the gaps of each pollen/spore type are completed simultaneously. There are different methods to do so which may be more or less appropriate according to the particularities of your database. The arguments are:
data.frame
object. This data.frame should include a first column in "Date" format and the rest of the columns in "numeric" format belonging to each pollen/spore type by column.character
string specifying the method applied to calculate and generate the pollen missing data. The implemented methods that can be used are: "lineal", "movingmean", "spline", "tseries" or "neighbour". The method argument will be "lineal" by default.numeric (integer)
value specifying the maximum number of consecutive days with missing data that the algorithm is going to interpolate. If the gap is bigger than the argument value, the gap will not be interpolated. Not valid with "tseries" method. The maxdays argument will be 30 by default. This argument might be very interesting to avoid long interpolated gaps.logical (integer)
argument. If TRUE
, graphical previews of the input database will be plot at the end of the interpolation process. All the interpolated gaps will be marked in red. The plot
argument will be TRUE
by default.Interpolated<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = TRUE, result = "long")
As you can see above, when result = "long", the missing gaps which have been completed by the algorithm are marked in red. It displays similar graphs for each pollen/spore type within your database. Furthermore, the "Interpolated" data.frame is produced as "long" with another new column indicating if each data has been interpolated or not (1 means interpolated and 0 means original data).
head(Interpolated)
If you want to integrate all the new data in your database, you can assign the function to a new object and specify the argument result = "wide":
CompleteData<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = FALSE, result = "wide")
Now your database without gaps appears as "CompleteData".
As you might have noticed, if you add plot=FALSE
, the graphs are not shown.
You don't have to do this before applying another function of this package. "interpollen" function has been integrated in the main functions of the package so you can choose whether you want your data to be completed or not by a specific argument in each function.E.g.:
calculate_ps(munich_pollen, method="percentage", interpolation=TRUE, int.method = "lineal", plot = F)
In this function there are 5 different methods to interpolate missing gaps. Each method can be more appropriate than the others under some specific circumstances. Interpolated data are not real data, but including such estimations can reduce the error of some calculations such as the main pollen season by percentage or a pollen calendar, i.e.: if you keep the gap, these days would count as 0 for the calculations and this would suppose a bigger error than estimating pollen concentrations for these days. Obviously, this will also suppose a bigger error than having the real data.
As mentioned above, there are many methods to interpolate your missing data. The simplest one was the showed in the previous example: "lineal". It traces a straight line between each extreme of the gap. The pollen/spore concentration of the missing days is calculated according to this line. This method may be appropriate for dates without pollen/spores or for small gaps.
Nevertheless, there are other method which can be more effective during the pollen season.
This method calculates the moving mean of the pollen daily concentrations. The gaps are fulfilled by calculating the moving mean of the concentrations for this particular day. The window of the moving mean is centered on this day and its size is the result of multiplicating the gap size by the factor
argument. E.g.: If you have a gap of 3 days and your factor is 2, the gaps are replaced by the value of a moving mean of 6 days (3 x 2) centered in this particular day and not taking into account the missing days for the mean value. Furthermore, it is a dynamic function: for each gap of the database the window size of the moving mean changes depending of each gap size.
i<-interpollen(munich_pollen[,c(1,6)], method="movingmean", factor = 2, plot = TRUE)
This method carries out the interpolation by tracing a polinomic function to link both extremes of the gap. The polinomic function is chosen according to the best fitting equation to the previous and posterior days of the gap (second, third, fourth degree...). The number of days of each side of the gap which will be taken into account for calculating the spline regression are specified by ndays
argument. The smoothness of the adjustment can be specified by the spar
argument.
i2<-interpollen(munich_pollen[,c(1,6)], method="spline", ndays=3, spar=0.7, plot = FALSE)
Note: By changing the ndays
and spar
argument, very different results can be obtained. E.g.:
i3<-interpollen(munich_pollen[,c(1,6)], method="spline", ndays=5, spar=0.2, plot = TRUE)
If you have long time series of data, this might be the most suitable method for you. This method analyses the time series of the pollen/spore database and performs a seasonal-trend decomposition based on LOESS (Cleveland et al., 1990). It extracts the seasonality of the historical database and uses it to predict the missing data by performing a linear regression with the target year.
i4<-interpollen(munich_pollen, method="tseries", plot = TRUE)
Seasonality is represented in grey, real data in black and interpolated data in red. This method improves with long time series of years. Strange results may be caused by lack of years or small sampling periods.
Other near stations provided by the user are used to interpolate the missing data of the target station. First of all, a Spearman correlation is performed between the target station and the neighbour stations to discard the neighbour stations with a correlation coefficient smaller than mincorr
value. For each gap, a linear regression is performed between the neighbour stations and the target stations to determine the equation which converts the pollen concentrations of the neighbour stations into the pollen concentration of the target station. Only neighbour stations without any missing data during the gap period are taken into account for each gap.
You can include 4 different databases of neighbour stations with the arguments **`data2`, `data3`, `data4` and `data5`.** **The format of these databases must be the same than the original one and the name of the pollen/spore types must be exactly the same.**
With the mincorr
argument you can specify the minimal correlation coefficient (Spearman correlation) that the neighbour stations must have with the target station to be taken into account for a concrete gap. This process is completely independent for each gap of each pollen type.
The function calculate_ps() is one of the core functions of the AeRobiology package. It was designed to calculate the main parameters of the pollen season with regard to phenology and pollen intensity from a historical database of several pollen types. The function can use the most common methods to define the main pollen season.
This function has several arguments, but don't worry, we are going to perform examples later. The main arguments are:
data.frame
object including the general database where calculation of the pollen season must be applied. This data.frame must include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column.character
string specifying the method applied to calculate the pollen season and the main parameters. The implemented methods that can be used are: "percentage", "logistic", "moving", "clinical" or "grains". A more detailed information about the different methods for defining the pollen season may be consulted here) or later in this same document.numeric
value. The number of days whose pollen concentration is bigger than this threshold is calculated for each year and pollen type. This value will be obtained in the results of the function. The th.day
argument will be 100 by default.numeric
value ranging 0_100. This argument is valid only for method = "percentage"
. This value represents the percentage of the total annual pollen included in the pollen season, removing (100_perc)/2% of the total pollen before and after of the pollen season. The perc
argument will be 95 by default.character
string specifying the method for selecting the best annual period to calculate the pollen season. The pollen seasons may occur within the natural year between two years. The implemented options that can be used are: "natural", "interannual" or "peak". The def.season
argument will be "natural" by default. A more detailed information about the different methods for selecting the best annual period to calculate the pollen season may be consulted here.logical
value. This argument is valid only for the "logistic" method. If FALSE
the reduction of the pollen data is not applicable. If TRUE
a reduction of the peaks above a certain level (red.level
argument) will be carried out before the definition of the pollen season. The reduction argument will be FALSE
by default. A more detailed information about the reduction process may be consulted here.numeric
value ranging 0_1 specifying the percentile used as level to reduce the peaks of the pollen series before the definition of the pollen season. This argument is valid only for the "logistic" method. The red.level
argument will be 0.90 by default, specifying the percentile 90.numeric (integer)
value belonging to options of 4, 5 or 6 specifying the derivative that will be applied to calculate the asymptotes which determines the pollen season using the "logistic" method. This argument is valid only for the "logistic" method. The derivative
argument will be 5 by default. More information may be consulted here.numeric (integer)
value specifying the order of the moving average applied to calculate the pollen season using the "moving" method. This argument is valid only for the "moving" method. The man
argument will be 11 by default.numeric
value specifying the threshold used for the "moving" method for defining the beginning and the end of the pollen season. This argument is valid only for the "moving" method. The th.ma
argument will be 5 by default.numeric (integer)
value specifying the number of days which must exceed a given threshold (th.pollen argument) for defining the beginning and the end of the pollen season. This argument is valid only for the "clinical" method. The n.clinical
argument will be 5 by default.numeric (integer)
value specifying the time window during which the conditions must be evaluated for defining the beginning and the end of the pollen season using the "clinical" method. This argument is valid only for the "clinical" method. The window.clinical
argument will be 7 by default.numeric (integer)
value specifying the time window during which the conditions must be evaluated for defining the beginning and the end of the pollen season using the "grains" method. This argument is valid only for the "grains" method. The window.grains
argument will be 5 by default.numeric
value specifying the threshold that must be exceeded during a given number of days (n.clinical or window.grains arguments) for defining the beginning and the end of the pollen season using the "clinical" or "grains" methods. This argument is valid only for the "clinical" or "grains" methods. The th.pollen
argument will be 10 by default.numeric
value specifying the pollen threshold that must be exceeded by the sum of daily pollen during a given number of days (n.clinical
argument) exceeding a given daily threshold (th.pollen
argument) for defining the beginning and the end of the pollen season using the "clinical" method. This argument is valid only for the "clinical" method. The th.sum
argument will be 100 by default.character
string specifying the parameters considered according to a specific pollen type for calculating the pollen season using the "clinical" method. The implemented pollen types that may be used are: "birch", "grasses", "cypress", "olive" or "ragweed". As result for selecting any of these pollen types the parameters n.clinical
, window.clinical
, th.pollen
and th.sum
will be automatically adjusted for the "clinical" method. If no pollen types are specified (type = "none"
), these parameters will be considered by the user. This argument is valid only for the "clinical" method. The type
argument will be "none" by default.logical
value. If FALSE
the interpolation of the pollen data is not applicable. If TRUE
an interpolation of the pollen series will be applied to complete the gaps with no data before the calculation of the pollen season. The interpolation
argument will be TRUE
by default. A more detailed information about the interpolation method may be consulted here.character
string specifying the method selected to apply the interpolation method in order to complete the pollen series. The implemented methods that may be used are: "lineal", "movingmean", "spline" or "tseries". The int.method
argument will be "lineal" by default.numeric
(integer value) specifying the maximum number of consecutive days with missing data that the algorithm is going to interpolate. If the gap is bigger than the argument value, the gap will not be interpolated. Not valid with int.method = "tseries"
. The maxdays
argument will be 30 by default.logical
value specifying if a set of plots based on the definition of the pollen season and saved in the working directory will be required or not. If FALSE
graphical results will not be saved. If TRUE
a pdf file for each pollen type showing graphically the definition of the pollen season for each studied year will be saved within the plot_AeRobiology directory created in the working directory. The export.plot
argument will be FALSE
by default.logical
value specifying if a excel file including all parameters for the definition of the pollen season saved in the working directory will be required or not. If FALSE
the results will not exported. If TRUE
the results will be exported as xlsx file including all parameters calculated from the definition of the pollen season within the table_AeRobiology directory created in the working directory. The export.result
argument will be FALSE
by default.logical
value specifying if the plots are generated in the plot history. The plot
argument will be TRUE
by default.character
string specifying the output for the function. The implemented outputs that may be obtained are: "table"
and "list"
. The argument result
will be "table"
by default.This function allows to calculate the pollen season using five different methods which are described below. After calculating the start_date, end_date and peak_date for the pollen season all rest of parameters are calculated:
Although the results can be extracted in the working directory in a xlsx file, they can also be assigned to an object, and visualized, for example (an extract):
pollen_season <- calculate_ps(munich_pollen)
knitr::kable(pollen_season[24:31, ] , format = "html", booktabs = TRUE)
The result is also plotted by default (plot = TRUE).
calculate_ps(munich_pollen[,c(1,6)], plot = TRUE)
This is a commonly used method for defining the pollen season based on the elimination of a certain percentage in the beginning and the end of the pollen season ( Nilsson and Persson, 1981; Andersen, 1991). For example, if the pollen season is based on the 95% of the total annual pollen (perc = 95
), the start_date of the pollen season is marked as the day in which 2.5% of the total pollen is registered and the end_date of the pollen season is marked as the day in which 97.5% of the total pollen is registered.
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 90, export.result = TRUE)
In this case we have calculated the main pollen season based on the 90% of the total annual pollen. Results are stored in the "table_AeRobiology" folder since export.result=TRUE
.
You can select different methods of interpolation or even to not interpolate gaps:
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 90, export.result = FALSE, int.method = "spline")
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 75, export.result = FALSE, interpolation = FALSE)
This method was developed by Ribeiro et al. (2007) and modified by Cunha et al. (2015). It is based on fitting annually a non-linear logistic regression model to the daily accumulated curve for each pollen type. This logistic function and the different derivatives were considered to calculate the start_date and end_date of the pollen season, based on the asymptotes when pollen amounts are stabilized on the beginning and the end of the accumulated curve. For more information about the method, see Ribeiro et al. (2007) and Cunha et al. (2015). Three different derivatives may be used (derivative
argument) 4, 5 or 6 that represent from higher to lower restrictive criterion for defining the pollen season.
This method may be complemented with an optional method for reduction the peaks values (reduction = TRUE
), thus avoiding the effect of the great influence of extreme peaks. In this sense, peaks values will be cut below a certain level that the user may select based on a percentile analysis of peaks. For example, red.level = 0.90
will cut all peaks above the percentile 90.
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 5, reduction=FALSE)
knitr::kable(pollen_season, format = "html", booktabs = TRUE )
In the previous case, the reduction wasn't carried out (reduction=FALSE
) and all the peaks were conserved. We can cut some peaks and change the derivative
:
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 6, reduction=FALSE, red.level = 0.8)
knitr::kable(pollen_season, format = "html", booktabs = TRUE)
As you can observe, results are a bit different.
This method was proposed by Pfaar et al. (2017). It is based on the expert consensus in relation to pollen exposure and the relationship with allergic symptoms derived of the literature.
Different periods may be defined by this method: the pollen season, the high pollen season and the high pollen days:
1) The start_date and end_date of the pollen season were defined as a certain number of days (n.clinical
argument) within a time window period (window.clinical
argument) exceeding a certain pollen threshold (th.pollen
argument) which summation is above a certain pollen sum (th.sum
argument).
All these parameters are established for each pollen type according to Pfaar et al. (2017) and using the type
argument these parameters may be automatically adjusted for the specific pollen types ("birch", "grasses", "cypress", "olive" or "ragweed"). Furthermore, the user may change all parameters to do a customized definition of the pollen season.
2) The start_date and end_date of the high pollen season were defined as three consecutive days exceeding a certain pollen threshold (th.day
argument).
3) The number of high pollen days will also be calculated exceeding this pollen threshold (th.day
). For more information about the method, see Pfaar et al. (2017).
Running the following example, the main pollen season will be established according to the birch requirements: more than 5 days within a week in which more than 10 pollen grains/m3 are registered and whose sum exceeds 100 pollen grains/m3. The high pollen days are those which exceed 100 pollen grains/m3 (n.clinical=5, window.clinical=7, th.pollen=10, th.sum=100, th.day=100
)
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "clinical", type = "birch")
knitr::kable(pollen_season, format = "html", booktabs = TRUE)
The code above returns the same result than:
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "clinical", n.clinical = 5, window.clinical = 7, th.pollen = 10, th.sum = 100, th.day = 100)
knitr::kable(pollen_season, format = "html", booktabs = TRUE)
We have resumed all these parameters under the argument type=birch
to facilitate its application.
This method was proposed by Galan et al. (2001) originally in olive pollen but after also applied in other pollen types.
The start_date and end_date of the pollen season were defined as a certain number of days (window.grains
argument) exceeding a certain pollen threshold (th.pollen
argument). For more information about the method, see Galan et al. (2001).
We want to establish the main pollen season start in the first day when more than 3 consecutive days have more than 2 pollen grains/m3, and the end in the last day in which these conditions are fulfilled:
calculate_ps(munich_pollen[,c(1,6)], method = "grains", window.grains = 3, th.pollen = 2 )
The definition of the pollen season is based on the application of a moving average to the pollen series in order to obtain the general seasonality of the pollen curve avoiding the great variability of the daily fluctuations. Thus, the start_date and the end_date will be established when the curve of the moving average reaches a given pollen threshold (th.ma
argument). Also the order of the moving average may be customized by the user (man
argument). By default, man = 11
and th.ma = 5
.
The idea of this method is to be able to calculate the start of the main pollen season when the season has not finished yet. Moreover, it allows to establish the start and end even if there is a huge amount of lacking data within the main pollen season. Its similar to the "grains" method but taking into account the moving mean to avoid daily variability.
You might understand it better by consulting the plots obtained.
Let's calculate it with a moving average of 7 days. We are going to establish the start and end of the main pollen season when the moving average reaches 4 pollen grains/m3:
calculate_ps(munich_pollen[,c(1,6)], method = "moving", man = 7, th.ma = 4)
As you may have noticed, we have been using the function under an "European" point of view: the calculations are from 1st January to 31th December. The researchers of the Southern Hemisphere are used to work with interanual pollen seasons. Don't worry, we haven't forgotten you!
1) You can work from 1st June to 31th May by means of the argument def.season="interannual"
:
calculate_ps(munich_pollen[,c(1,3)], method = "moving", man = 7, th.ma = 4, def.season = "interannual")
In this method, the season belongs to the first year of the pair of years, i.e: from June 2017 to May 2018 -> season "2017".
2) You can center the main pollen season in the average peak day (182 days before and after the average date of the peak):
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc=95, def.season = "peak")
In this last method, the season belongs to the year in which the average peak date - 182 days is located, i.e.: if the average peak date is in January 2013, the season is called "2012" in the data.frames.
Pollen time series frequently have different gaps with no data and this fact could be a problem for the calculation of specific methods for defining the pollen season even providing incorrect results. In this sense by default a linear interpolation will be carried out to complete these gaps before to define the pollen season (interpolation = TRUE
). Additionally, the users may select other interpolation methods using the int.method
argument, as "lineal", "movingmean", "spline" or "tseries".
Some advanced users may have noticed that you can't use directly all the arguments of interpollen() through calculate_ps(). You only are able to select the interpolation method. Nevertheless, it is not impossible to use them:
1) Use interpollen() with all the arguments you want and overwrite your interpolated database in an object:
CompleteData<-interpollen(munich_pollen, method="spline", ndays=3, spar=0.7, plot = TRUE, maxdays = 3, result = "wide")
2) Then, use "CompleteData" instead of "munich_pollen" (or your database) in the following steps:
plot_ps(CompleteData, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units")
**Note**: You might select **`interpolation=FALSE`** in the other functions you use after interpolating manually. In some cases it doesn't matter, but if you want to interpolate only gaps with less than 3 days and you did in the first step, if you don't use `interpolation=FALSE` in calculate_ps function, a second interpolation will be carried out by using the default method (gaps with less than 30 days and lineal interpolation). You will obtain a database with the gaps of less than 3 days interpolated by "spline" method and the rest with the "lineal" method.
This function calculates the pollen calendar from a historical database of several pollen types using different designs. The main arguments are:
data.frame
with the first column in "Date" format and the rest of the columns in "numeric" format (pollen types).export.format
).Data exportation:
export.plot = TRUE
this plot displaying the pollen calendar will be exported within the Plot_AeRobiology directory created in the working directory.export.plot = TRUE
and export.format = pdf
, a pdf file with the pollen calendar will be saved within the plot_AeRobiology directory, created in the working directory. Additional characteristics
may be incorporated to the exportation as pdf file.export.plot = TRUE
and export.format = png
, a png file with the pollen calendar will be saved within the plot_AeRobiology directory, created in the working directory. Additional characteristics may be incorporated to the exportation as png file.Exclusive arguments for pollen calendars generated as "heatplot":
method.classes = custom
.Exclusive arguments for pollen calendars generated as "phenological":
perc1
) and the "early/late pollination" (perc2
) based on the "phenological" method proposed by Werchan
et al. (2018).This pollen calendar is constructed based on the daily or weekly average of pollen concentrations (depending on the preferences of the user, who may select "daily" or "weekly" as period
argument). Then, these averages may be classified in different categories following different methods selected by the user. An example of this pollen calendar may be consulted in Rojo et al. (2016). This method to design pollen calendars is an adaptation from the pollen calendar proposed by Spieksma (1991), who considered 10-day periods instead of daily or weekly periods.
First, we are going to generate a pollen calendar based on the heatplot, designed with green color and constructed with daily averages:
pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE)
In all cases, the table used by the pollen calendar with the averaged values will be created. This table can be visualized. To do that we have to set the argument result = "table" For example (an extract):
average_values<-pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE, result = "table") knitr::kable(average_values[82:90, ], format = "html", booktabs = TRUE)
By default the method for defining the classes for the pollen calendar is according to the exponential method. Nevertheless, the classes can be customized by classes
argument:
pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "red", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), interpolation = FALSE)
In addition, for species with their pollen season occurring between two natural years, the start of the pollen calendar can be selected by the start.month
argument:
pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "purple", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), start.month = 11, na.remove = FALSE, interpolation = FALSE)
**`NA` (no data) can be removed by using `na.remove` argument.**
Also we can generate a pollen calendar based on the heatplot design with weekly averages.
pollen_calendar(data = munich_pollen, method = "heatplot", period = "weekly", color = "blue", method.classes = "exponential", n.types = 4, y.start = 2011, y.end = 2014, interpolation = FALSE)
In this case, we have included other restrictive arguments as n.types
limiting the number of pollen types and y.start
and y.end
, limiting the period to be considered for the pollen calendar.
This pollen calendar is based on the phenological definition of the pollen season and adapted from the methodology proposed by Werchan et al. (2018).
After obtaining the daily average pollen concentrations for the most abundant pollen types, different pollination periods are calculated. The main pollination period is calculated according to the percentage defined by perc1
argument (selected by the user, 80% by default; red) of the annual total pollen. For example, if perc1 = 80
, the beginning of the high season is marked when the 10% of the annual value is reached; the end is selected when 90% is reached. The early/late pollination period is defined with the perc2
argument (selected by the user, 99% of the total annual pollen by default; orange), i.e.: the start of this period will be when the 0.5% is reached and the end when the 99.5% is reached. For this kind of pollen calendar the th.pollen
argument defines the possible occurrence period as adapted by Werchan et al. (2018), considering the entire period between the first and the last day when this pollen level is reached (yellow).
pollen_calendar(data = munich_pollen, method = "phenological", n.types = 5, y.start = 2011, y.end = 2014, interpolation = FALSE)
Furthermore, different criteria can be customized:
pollen_calendar(data = munich_pollen, method = "phenological", perc1 = 90, perc2 = 95, th.pollen = 5, interpolation = FALSE)
In this last case, the pollen calendar has been generated with more restrictive criteria for perc1
, perc2
and th.pollen
.
This pollen calendar is based on the pollen intensity, and adapted from the pollen calendar published by ORourke (1990). At first, the daily averages of the pollen concentrations are calculated and then, these averages are represented by using the violin plot graph.
pollen_calendar(data = munich_pollen, method = "violinplot", y.start = 2012, y.end = 2015, interpolation = FALSE)
In addition, th.pollen
can be established, specifying the minimum pollen concentration considered (E.g.: 10 pollen grains/m3):
pollen_calendar(data = munich_pollen, method = "violinplot", th.pollen = 10, na.rm = FALSE, interpolation = FALSE)
These functions have been designed for a quick view of your data for discussions or interpretations of them. Interactive plots are displayed. It might be interesting for group meetings or real time exposition of results. The functions create a emergent screen in which you can select the pollen/spore type or the years you want to plot.
The function iplot_pollen() is to plot the pollen data during one season. The arguments are:
data.frame
object. This data.frame should include a first column in "Date" format and the rest of columns in format "numeric" belonging to each pollen type by column. integer
value specifying the year to display. This is a mandatory argument. The function iplot_years() is to plot the pollen data during one season. The arguments are:
data.frame
object. This data.frame should include a first column in "Date" format and the rest of columns in format "numeric" belonging to each pollen type by column. character
string with the name of the particle to show. This character must match with the name of a column in the input database. This is a mandatory argument.iplot_pollen(munich_pollen, year = 2012) iplot_years(munich_pollen, pollen = "Betula")
**Note:** We are not able to plot interactive figures in this document. Please, run the codes above in your R session.
The function plot_summary() is to plot the pollen data during several seasons. Also to plot the averaged pollen season over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments are:
data.frame
object. This data.frame should include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column. character
string with the name of the particle to show. This character must match with the name of a column in the input database. This is a mandatory argument. integer
value specifying the order of the moving average applied to the data. By default, mave = 1
. logical
value specifying if the visualization shows real pollen data (normalized = FALSE
) or the percentage of every day over the whole pollen season (normalized = TRUE
). By default, normalized = FALSE
. character
string specifying the title of the y axis. By default, axisname = "Pollen grains / m3"
plot_summary(munich_pollen, pollen = "Betula")
In some cases the user may want to reduce the "noise" of the daily values by calculating moving means (E.g.: 5 days moving mean; mave = 5
):
plot_summary(munich_pollen, pollen = "Betula", mave = 5)
You might be also interested in representing as background the percentage each day suppose to the main pollen season (normalized = TRUE
):
plot_summary(munich_pollen, pollen = "Betula", mave = 5, normalized = TRUE)
The function plot_normsummary() has been designed to plot the pollen data amplitude during several seasons: daily average pollen concentration over the study period, maximum pollen concentration of each day over the study period and minimum pollen concentration of each day value over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments similar to plot_summary(), but as a result you will obtain the the max-min range of the study period instead of the values for every year.
plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "red")
The maximum values are marked in red and the minimum values in white. The average is the black line.
Of course, you can change the color (color.plot
argument):
plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "green", mave = 5, normalized = TRUE)
The function analyse_trend() has been created to calculate the main seasonal indexes of the pollen season ("Start Date", "Peak Date", "End Date" and "Pollen Integral"), as well as trends analysis of those parameters over the seasons. It is a summary dot plot showing the distribution of the main seasonal indexes over the years.
The results can also be stored in two folders termed "plot_AeRobiology" and "table_AeRobiology", which will be created in your working directory (only if export.result=TRUE
& export.plot=TRUE
). You can decide which result to be returned from the function by setting the argument result: result = "table" or result = "plot".
The function allows you to decide if you want to interpolate the data or not, by the argument interpolation
, it also allows you to select the interpolation method by the argument int.method
. Furthermore, it allows you to select the pollen season definition method, by the argument method
and additional arguments for the function calculate_ps(). Some arguments about the visualization are:
logical
argument. If split = TRUE
, the plot is separated in two according to the nature of the variables (i.e. dates or pollen concentrations). This argument was a solution to reduce the scale of the x-axis when "total pollen"" variable has a very high/low slope. By default, split = TRUE
.numeric
value (between 0 and 1) indicating the quantile of data to be displayed in the graphical output of the function. quantil = 1
would show all the values, however a lower quantile will exclude the most extreme values of the sample. This argument was designed after realizing of a common problem: when plotting the results with split=FALSE
, some "outlier" results increased a lot the scale of the x-axis, making unreadable the rest of results. Our solution was to create an argument to exclude these outliers results and to be able to observe the main results in an appropriate scale. Furthermore, this argument may be used to split the parameters using a different sampling units (e.g. dates and pollen concentrations) can be used low vs high values of quantil argument (e.g. 0.5 vs 1). Also can be used an extra argument: split
. By default, quantil = 0.75
. This argument only works when split=FALSE
.numeric
value indicating the significant level to be considered in the linear trends analysis. This p level is displayed in the graphical output of the function (as a number in the legend and as a black ring in the graphical representation). By default, significant = 0.05
. analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, result="plot")
Lets see what happen if we don't split the graphics:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split = FALSE, quantil = 1, result="plot")
Now the results are less readable, but this kind of graphical representation may be interesting for some people. When plotting all the results together, it might be useful to exclude some outliers:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split=FALSE, quantil = 0.5, result="plot")
As you can appreciate, a lot of points have been omitted.
You can also change the significance level as mentioned above. Why don`t we try?:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result = "plot")
Now everything is significant!
Have a look to the numbers by setting result = "table" (the default output).
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result="table")
The function plot_trend() has been created to calculate the main seasonal indexes of the pollen season ("Start Date", "Peak Date", "End Date" and "Pollen Integral") and their trends analysis over the seasons. It produces plots showing the distribution of the main seasonal indexes over the years.
The results are stored in two folders termed "plot_AeRobiology" and "table_AeRobiology" which will be located in your working directory (only if export.result=TRUE
& export.plot=TRUE
).
plot_trend(munich_pollen, interpolation = FALSE, export.plot = TRUE, export.result = TRUE)
The **interval of confidence** only appears if more than 6 dots are plotted. If not, a line crossing all the dots is plotted.
The function iplot_abundance() generates a barplot based on the relative abundance in the air (as percentage) of each pollen/spore type with respect to the total amounts.
The main arguments are:
numeric
(integer) value specifying the number of the most abundant pollen types that must be represented in the plot of the relative abundance. A more detailed information about the selection of the considered pollen types may be consulted here. The n.types
argument will be 15 types by default.numeric
(integer) value specifying the period selected to calculate relative abundances of the pollen types (start year - end year). If y.start
and y.end
are not specified (NULL
), the entire database will be used to generate the pollen calendar. The y.start
and y.end
arguments will be NULL
by default.character
string specifying the color of the bars to generate the graph showing the relative abundances of the pollen types. The color
argument will be #E69F00 by default, but any color may be selected.character
string specifying the type of plot selected to show the plot showing the relative abundance of the pollen types. The implemented types that may be used are: "static" (generates a static ggplot object) and "dynamic" (generates a dynamic plotly object).iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE)
Now we are going to reduce the number of types:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3)
We can also select the abundance for only one year and change the color:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131")
Furthermore, we can make it interactive:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131", type.plot = "dynamic")
The function iplot_pheno() generates a boxplot based on phenological parameters (start_dates and end_dates) which are calculated by the estimation of the main parameters of the pollen season. The main arguments are:
data.frame
object including the general database where calculation of the pollen season must be applied in order to generate the phenological plot based on the start_dates and end_dates. This data.frame must include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column.character
string specifying the method applied to calculate the pollen season and the main parameters. The implemented methods that can be used are: "percentage", "logistic", "moving", "clinical" or "grains". A more detailed information about the different methods for defining the pollen season may be consulted in calculate_ps function.numeric
(integer) value specifying the number of the most abundant pollen types that must be represented. A more detailed information about the selection of the considered pollen types may be consulted here. The n.types
argument will be 15 by default.character
string specifying the type of plot selected to show the phenological plot. The implemented types that may be used are: "static" (generates a static ggplot object) and "dynamic" (generates a dynamic plotly object).iplot_pheno(munich_pollen, method= "percentage", perc=80, int.method="spline", n.types = 8)
We can change the method to establish the main pollen season and the number of pollen types to show:
iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4)
Furthermore, we can make the plot interactive to obtain more information by clicking in each object:
iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4, type.plot = "dynamic")
The function plot_ps() function was designed to plot the main pollen season of a single pollen type and year. Some of the arguments are:
data.frame
object including the general database where interpolation must be performed. This data.frame must include a first column in "Date" format and the rest of the columns in "numeric" format. Each column must contain information of one pollen type. It is not necessary to insert missing gaps, the function will automatically detect them.character
string specifying the name of the pollen type which will be plotted. The name must be exactly the same that appears in the column name. Mandatory argument with no default.numeric
(integer) value specifying the season to be plotted. The season does not necessary fit a natural year. See calculate_ps for more details. Mandatory argument with no default.numeric
(integer) specifying the number of days beyond each side of the main pollen season that will be represented. The days
argument will be 30 by default.character
string specifying the name of the color to fill the main pollen season in the plot. See ggplot function for more details. The fill.col
argument will be "turquoise4" by default. It uses the ggplot color codes.character
string specifying the title of the y axis. By default, axisname = expression(paste("Pollen grains / m"^"3"))
plot_ps(munich_pollen, pollen.type="Alnus", year=2011)
If you misspell some pollen type name, the function will tell you:
plot_ps(munich_pollen, pollen.type="Alnuscdscscr", year=2011)
Let`s test more arguments:
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "lineal")
As you have noticed, the arguments method
, perc
and int.method
are from calculate_ps function.
What about changing the interpolation method?
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "movingmean")
Do you want a larger scale?
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, days = 90)
Maybe a different color and y-axis name?
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units")
Please keep in mind that the function plot_hour() is only available after package version 2.0.
The input data must be a data.frame object with the structure long. Where the first two columns are factors indicating the pollen and the location. The 3 and 4 columns are POSIXct, showing the date with the hour. Where the third column is the beginning of the concentration from and the fourth column is the end time of the concentration data to. The fifth column shows the concentrations of the different pollen types as numeric. Please see the example 3-hourly data from the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data("POMO_pollen"), supplied by ePIN Network supported by the Bavarian Government.
We will load an example dataset about 3-hourly data from the ePIN network in Bavaria (Germany). The dataset contains information of 3-hourly concentrations of pollen in the atmosphere of Munich (DEBIED) and Viechtach (DEVIEC) during the year 2018. Pollen types included: "Poaceae" and "Pinus".
The data were obtained by the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data("POMO_pollen"), supplied by the public ePIN Network supported by the Bavarian Government. The ePIN Network was built by Das Bayerische Landesamt für Gesundheit und Lebensmittelsicherheit (LGL) in collaboration with Zentrum Allergie und Umwelt (ZAUM).
library(ggplot2) library(dplyr)
data("POMO_pollen")
The function plots pollen data expressed in concentrations with time resolution higher than 1 day (e.g. hourly, bi-hourly concentrations). If the argument result = "plot", the function returns a list of objects of class ggplot2; if result = "table", the function returns a data.frame with the hourly patterns.
plot_hour(POMO_pollen)
To display a table we have to set the argument result = "table".
TO<-plot_hour(POMO_pollen, result ="table") knitr::kable(TO[1:10,], caption = "3-Hourly patterns", row.names = FALSE, digits = 1, format = "html", booktabs = TRUE)
We can also split the different stations by setting the argument locations = TRUE.
plot_hour(POMO_pollen, locations = TRUE)
An alternative to plot_hour() is plot_heathour(), which shows a summary of all particles with a heatplot. The input data should have the same format as for plot_hour().
plot_heathour(POMO_pollen)
You can also set the colors by the arguments: low.col, mid.col and high.col. E.g.
plot_heathour(POMO_pollen, low.col = "darkgreen", mid.col = "moccasin", high.col = "brown")
By setting locations = TRUE you can split the result by locations:
plot_heathour(POMO_pollen, locations = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.