kernelExceed: Kernel density plot for daily mean exceedance statistics


View source: R/kernelExceed.R


Description

This function is used to explore the conditions leading to exceedances of air quality limits. Currently the focus is on understanding the conditions under which daily mean PM10 concentrations exceed a specified threshold, such as the daily limit value. Kernel density estimates are calculated and plotted to highlight those conditions.


Usage

kernelExceed(
  polar,
  x = "wd",
  y = "ws",
  pollutant = "pm10",
  type = "default",
  by = c("day", "dayhour", "all"),
  limit = 50,
  data.thresh = 0,
  more.than = TRUE,
  cols = "default",
  nbin = 256,
  auto.text = TRUE,
  ...
)



Arguments

polar: A data frame minimally containing date and at least three other numeric variables, typically ws, wd and a pollutant.


x: x-axis variable. Mandatory.


y: y-axis variable. Mandatory.


pollutant: Mandatory. A pollutant name corresponding to a variable in a data frame should be supplied, e.g. pollutant = "nox".


type: The type of analysis to be done. The default, type = "default", will produce a single plot using the entire data. Other types include "hour" (for hour of the day), "weekday" (for day of the week), "month" (for month of the year) and "year" (for a separate plot for each year). It is also possible to choose type as another variable in the data frame. For example, type = "o3" will plot four kernel exceedance plots for different levels of ozone, split into four quantiles (approximately equal numbers of counts in each of the four splits). This offers great flexibility for understanding how one variable varies depending on another. See function cutData for further details.


by: Determines how data above the limit are selected. by = "day" will select all data (typically hours) on days where the daily mean value is above limit. by = "dayhour" will select only those data above limit on days where the daily mean value is above limit. by = "all" will select all data above limit.


limit: The threshold above which the pollutant concentration will be considered (or below which, if more.than = FALSE).


data.thresh: The data capture threshold (%) to use when aggregating the data to daily means using timeAverage. A value of zero means that all available data will be used in a particular period regardless of the number of values available. Conversely, a value of 100 means that all data must be present for the average to be calculated; otherwise it is recorded as NA.


more.than: If TRUE, data will be selected that are greater than limit. If FALSE, data will be selected that are less than limit.


cols: Colours to be used for plotting. Options include "default", "increment", "heat", "spectral", "hue", "brewer1" and user defined (see manual for more details). The same line colour can be set for all pollutants, e.g. cols = "black".


nbin: Number of bins to be used for the kernel density estimate.


auto.text: Either TRUE (default) or FALSE. If TRUE, titles and axis labels will automatically try to format pollutant names and units properly, e.g. by subscripting the ‘2’ in NO2.


...: Other graphical parameters passed on to lattice::levelplot and cutData. For example, kernelExceed passes the option hemisphere = "southern" on to cutData to provide southern (rather than default northern) hemisphere handling of type = "season". Similarly, common axis and title labelling options (such as xlab, ylab, main) are passed to levelplot via quickText to handle routine formatting.
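The data.thresh rule described above can be illustrated with a short base R sketch. It assumes a regular hourly series in which missing hours are present as NA, so that data capture is the percentage of non-missing values in each day. The helper daily_mean_thresh() is hypothetical and not part of openair; kernelExceed itself delegates this step to timeAverage, whose handling may differ in detail.

```r
# Hypothetical helper (not part of openair): daily means set to NA when the
# percentage of non-missing hourly values falls below data.thresh.
# Assumes a regular hourly series with missing hours stored as NA.
daily_mean_thresh <- function(df, pollutant, data.thresh = 0) {
  day <- as.Date(df$date, tz = "UTC")
  vals_by_day <- split(df[[pollutant]], day)
  # data capture (%) = share of non-missing values in each day
  capture <- vapply(vals_by_day, function(v) 100 * mean(!is.na(v)), numeric(1))
  means <- vapply(vals_by_day, function(v) mean(v, na.rm = TRUE), numeric(1))
  means[is.nan(means)] <- NA          # days with no data at all
  means[capture < data.thresh] <- NA  # days failing the capture threshold
  data.frame(date = as.Date(names(means)), value = unname(means),
             capture = unname(capture))
}

# Two days of hourly PM10: a complete day, then a day with half the hours missing
hours <- seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
hourly <- data.frame(date = hours, pm10 = c(rep(60, 24), rep(c(40, NA), 12)))
res <- daily_mean_thresh(hourly, "pm10", data.thresh = 75)
# Day 1: mean 60 (100% capture); day 2: NA because its capture (50%) is below 75%
```

With the default data.thresh = 0 the second day would instead keep its mean of 40, computed from the 12 available hours.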


Details

The kernelExceed function is for exploring the conditions under which exceedances of air pollution limits occur. Currently it is focused on the daily mean (European) Limit Value for PM10 of 50 µg/m³, which is not to be exceeded on more than 35 days per calendar year. However, the function is sufficiently flexible to consider other limits, e.g. it could be used to explore days where the daily mean is greater than 100 µg/m³.
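As a rough illustration of the limit described above, the sketch below counts the days whose daily mean exceeds 50 µg/m³ and compares that count with the 35-day allowance. The helper count_exceedance_days() is hypothetical (not an openair function), it mirrors the limit and more.than arguments only by analogy, and the daily means are made-up illustrative values.

```r
# Hypothetical helper: number of days whose daily mean is above `limit`
# (or below it when more.than = FALSE); missing daily means are ignored.
count_exceedance_days <- function(daily_means, limit = 50, more.than = TRUE) {
  if (more.than) {
    sum(daily_means > limit, na.rm = TRUE)
  } else {
    sum(daily_means < limit, na.rm = TRUE)
  }
}

# Illustrative daily PM10 means (ug/m3) for one year: 40 days at 55, the rest at 30
daily_pm10 <- c(rep(55, 40), rep(30, 325))
n_exceed <- count_exceedance_days(daily_pm10, limit = 50)
complies <- n_exceed <= 35  # limit allows at most 35 exceedance days
# n_exceed is 40, so complies is FALSE
```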

By default the function will plot the kernel density estimate of wind speed and wind direction for all data on days where the daily mean concentration of pollutant is greater than limit. Understanding the conditions under which exceedances occur can help with source identification.

The function offers different ways of selecting the data on days where pollutant is greater than limit, through the by argument. With the default setting, by = "day", it will select all data on days where the daily mean of pollutant is greater than limit, even if individual data (e.g. hours) on those days are less than limit. Setting by = "dayhour" will additionally require each individual value on those days to be greater than limit. Finally, by = "all" will select all values of pollutant above limit, regardless of when they occur.
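The three selection rules can be written out in base R to make the difference concrete. This is a minimal sketch, assuming hourly data with a date column and ignoring data.thresh; select_exceed() is a hypothetical name and the code is not openair's internal implementation.

```r
# Hypothetical sketch of the three `by` selection rules (not openair's code).
select_exceed <- function(df, pollutant = "pm10", limit = 50,
                          by = c("day", "dayhour", "all")) {
  by <- match.arg(by)
  x <- df[[pollutant]]
  if (by == "all") {
    # every value above the limit, regardless of the daily mean
    return(df[!is.na(x) & x > limit, , drop = FALSE])
  }
  day <- as.Date(df$date, tz = "UTC")
  # daily mean repeated for every row belonging to that day
  day_mean <- ave(x, day, FUN = function(v) mean(v, na.rm = TRUE))
  keep <- !is.na(day_mean) & day_mean > limit
  if (by == "dayhour") {
    # additionally require the individual value to be above the limit
    keep <- keep & !is.na(x) & x > limit
  }
  df[keep, , drop = FALSE]
}

# Day 1: 12 h at 80 and 12 h at 30 (daily mean 55, an exceedance day)
# Day 2: 2 h at 90 and 22 h at 30 (daily mean 35, not an exceedance day)
hours <- seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
hourly <- data.frame(date = hours,
                     pm10 = c(rep(c(80, 30), each = 12), rep(c(90, 30), c(2, 22))))
counts <- sapply(c("day", "dayhour", "all"),
                 function(b) nrow(select_exceed(hourly, by = b)))
# counts: day = 24, dayhour = 12, all = 14
```

Note that by = "all" picks up the two high hours on day 2 even though that day's mean is below the limit, whereas the two day-based rules ignore them.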

The usefulness of the function is greatly enhanced through using type, which conditions the data according to the level of another variable. For example, type = "season" will show the kernel density estimate by spring, summer, autumn and winter and type = "so2" will attempt to show the kernel density estimates by quantiles of SO2 concentration. By considering different values of type it is possible to develop a good understanding of the conditions under which exceedances occur.
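A numeric type such as "so2" is split into quantile bands before plotting. The sketch below shows the general idea with base R's quantile() and cut(); the four-band split follows the description of type above, but split_by_quantile() is a hypothetical helper, the SO2 values are simulated, and openair's cutData may handle breaks and labels differently.

```r
# Hypothetical helper: cut a numeric conditioning variable into n bands
# containing approximately equal numbers of observations.
split_by_quantile <- function(x, n = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1),
                     na.rm = TRUE, names = FALSE)
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

set.seed(1)
so2 <- rexp(1000, rate = 0.2)  # simulated, right-skewed "SO2" values
so2_band <- split_by_quantile(so2)
table(so2_band)  # about 250 values in each of the four bands
```

Because the bands hold similar numbers of observations, the kernel density panels for each band are estimated from comparable amounts of data, even though the band widths in concentration units differ greatly for skewed data.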

To aid interpretation, the plot will also show the estimated number of days or hours on which exceedances occur. For type = "default" the number of days should exactly correspond to the actual number of exceedance days. However, with other values of type the number of days is an estimate, because conditioning breaks up individual days; the estimate is based on the proportion of the data falling in each level of type.
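One plausible reading of this proportional estimate is sketched below: the total number of exceedance days is shared out across the levels of type in proportion to the number of selected data points in each level. This is an assumption about how the numbers are derived, not a copy of openair's calculation; estimate_days() and the band labels are hypothetical.

```r
# Hypothetical: apportion the total number of exceedance days across the
# levels of `type` in proportion to the number of selected rows in each level.
estimate_days <- function(type_levels, n_exceed_days) {
  counts <- table(type_levels)
  setNames(n_exceed_days * as.vector(counts) / sum(counts), names(counts))
}

# 3 exceedance days (72 selected hours) whose hours fall into two SO2 bands
so2_band <- rep(c("high SO2", "low SO2"), times = c(60, 12))
est <- estimate_days(so2_band, n_exceed_days = 3)
# est: high SO2 = 2.5 days, low SO2 = 0.5 days (summing to 3)
```

The fractional values show why the per-panel numbers are estimates: a single exceedance day can contribute hours to more than one level of type.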


Value

To be completed.


Note

This function automatically chooses the bandwidth for the kernel density estimate. We generally find that most data sets are not overly sensitive to the choice of bandwidth. One important reason for this insensitivity is likely to be the characteristics of air pollution itself. Due to atmospheric dispersion processes, pollutant plumes generally mix rapidly in the atmosphere. This means that pollutant concentrations are ‘smeared out’, and the extra fidelity offered by narrower bandwidths does not recover any more detail.
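To see how a kernel density surface with an automatically chosen bandwidth is obtained, and how halving that bandwidth changes it, the sketch below uses MASS::kde2d with the normal-reference bandwidth from MASS::bandwidth.nrd (the default rule in kde2d). This is a swapped-in method for illustration only; it is not necessarily the estimator or bandwidth rule that kernelExceed uses internally. The wind data are simulated, peak() is a hypothetical helper, and the 64 x 64 grid stands in for nbin.

```r
library(MASS)  # recommended package shipped with R; provides kde2d() and bandwidth.nrd()

set.seed(42)
# Simulated hours on exceedance days: mostly light winds from the south-east
wd <- rnorm(500, mean = 135, sd = 30) %% 360
ws <- rgamma(500, shape = 2, scale = 1.5)

# Automatic normal-reference bandwidths (the kde2d default rule)
h_auto <- c(bandwidth.nrd(wd), bandwidth.nrd(ws))

# Note: kde2d() treats wd as linear, so wrap-around at 0/360 degrees is ignored
dens_auto <- kde2d(wd, ws, h = h_auto, n = 64)
dens_narrow <- kde2d(wd, ws, h = h_auto / 2, n = 64)

# Hypothetical helper: grid location of the highest estimated density
peak <- function(d) {
  idx <- which(d$z == max(d$z), arr.ind = TRUE)[1, ]
  c(wd = d$x[idx[1]], ws = d$y[idx[2]])
}
peak(dens_auto)
peak(dens_narrow)
```

Comparing the two peaks (and, for example, image() plots of the two surfaces) is a quick way to check how sensitive a particular data set is to the bandwidth choice.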


Author(s)

David Carslaw

See Also

polarAnnulus, polarFreq, polarPlot


Examples

# Note! The manual contains other, more illuminating examples.
library(openair)  # provides kernelExceed() and the example data set mydata

# basic plot
kernelExceed(mydata, pollutant = "pm10")

# condition by NOx concentrations
## Not run:
kernelExceed(mydata, pollutant = "pm10", type = "nox")
## End(Not run)

openair documentation built on Oct. 22, 2021, 5:08 p.m.