Nothing
#' Plot heat maps of atmospheric composition data
#'
#' The [trendLevel()] function provides a way of rapidly showing a large amount
#' of data in a condensed form. In one plot, the variation in the concentration
#' of one pollutant can to shown as a function of between two and four
#' categorical properties. The default arguments plot hour of day on the x-axis
#' and month of year on the y-axis. However, `x`, `y` and `type` and summarising
#' statistics can all be modified to provide a range of other similar plots, all
#' being passed to [cutData()] for discretisation. The average wind speed and
#' direction in each bin can also be plotted using the `windflow` argument.
#'
#' [trendLevel()] allows the use of third party summarising functions via the
#' `statistic` option. Any additional function arguments not included within a
#' function called using `statistic` should be supplied as a list of named
#' parameters and sent using `stat.args`. For example, the encoded option
#' `statistic = "mean"` is equivalent to `statistic = mean, stat.args =
#' list(na.rm = TRUE)` or the R command `mean(x, na.rm = TRUE)`. Many R
#' functions and user's own code could be applied in a similar fashion, subject
#' to the following restrictions: the first argument sent to the function must
#' be the data series to be analysed; the name 'x' cannot be used for any of the
#' extra options supplied in `stat.args`; and the function should return the
#' required answer as a numeric or `NA`. Note: If the supplied function returns
#' more than one answer, currently only the first of these is retained and used
#' by [trendLevel()]. All other returned information will be ignored without
#' warning. If the function terminates with an error when it is sent an empty
#' data series, the option `stat.safe.mode` should not be set to `FALSE` or
#' [trendLevel()] may fail. Note: The `stat.safe.mode = TRUE` option returns an
#' NA without warning for empty data series.
#'
#' @inheritParams shared_openair_params
#'
#' @param mydata The openair data frame to use to generate the [trendLevel()]
#' plot.
#'
#' @param pollutant The name of the data series in `mydata` to sample to produce
#' the [trendLevel()] plot.
#'
#' @param x,y,type The name of the data series to use as the [trendLevel()]
#' x-axis, y-axis or conditioning variable, passed to [cutData()]. These are
#' used before applying `statistic`. [trendLevel()] does not allow duplication
#' in `x`, `y` and `type` options.
#'
#' @param rotate.axis The rotation to be applied to `trendLevel` `x` and `y`
#' axes. The default, `c(90, 0)`, rotates the x axis by 90 degrees but does
#' not rotate the y axis. If only one value is supplied, this is applied to
#' both axes; if more than two values are supplied, only the first two are
#' used.
#'
#' @param n.levels The number of levels to split `x`, `y` and `type` data into
#' if numeric. The default, `c(10, 10, 4)`, cuts numeric `x` and `y` data into
#' ten levels and numeric `type` data into four levels. This option is ignored
#' for date conditioning and factors. If less than three values are supplied,
#' three values are determined by recursion; if more than three values are
#' supplied, only the first three are used.
#'
#' @param limits The colour scale range to use when generating the
#' [trendLevel()] plot.
#'
#' @param min.bin The minimum number of records required in a bin to show a
#' value. Bins with fewer than `min.bin` records are set to `NA`. The default
#' is 1, i.e., all bins with no records are set to `NA`. Setting `min.bin` to
#' a value greater than 1 can be useful to exclude bins with very few records
#' that might produce unreliable statistic values.
#'
#' @param statistic The statistic to apply when aggregating the data; default is
#' the mean. Can be one of `"mean"`, `"max"`, `"min"`, `"median"`,
#' `"frequency"`, `"sum"`, `"sd"`, `"percentile"`. Note that `"sd"` is the
#' standard deviation, `"frequency"` is the number (frequency) of valid
#' records in the period and `"data.cap"` is the percentage data capture.
#' `"percentile"` is the percentile level (%) between 0-100, which can be set
#' using the `"percentile"` option. Functions can also be sent directly via
#' `statistic`; see 'Details' for more information.
#'
#' @param percentile The percentile level used when `statistic = "percentile"`.
#' The default is 95%.
#'
#' @param stat.args Additional options to be used with `statistic` if this is a
#' function. The extra options should be supplied as a list of named
#' parameters; see 'Details' for more information.
#'
#' @param stat.safe.mode An addition protection applied when using functions
#' directly with `statistic` that most users can ignore. This option returns
#' `NA` instead of running `statistic` on binned sub samples that are empty.
#' Many common functions terminate with an error message when applied to an
#' empty dataset. So, this option provides a mechanism to work with such
#' functions. For a very few cases, e.g., for a function that counted missing
#' entries, it might need to be set to `FALSE`; see 'Details' for more
#' information.
#'
#' @param drop.unused.types Hide unused/empty `type` conditioning cases. Some
#' conditioning options may generate empty cases for some data sets, e.g. a
#' hour of the day when no measurements were taken. Empty `x` and `y` cases
#' generate 'holes' in individual plots. However, empty `type` cases would
#' produce blank panels if plotted. Therefore, the default, `TRUE`, excludes
#' these empty panels from the plot. The alternative `FALSE` plots all `type`
#' panels.
#'
#' @param col.na Colour to be used to show missing data.
#'
#' @param ... Addition options are passed on to [cutData()] for `type` handling.
#' Some additional arguments are also available:
#' - `xlab`, `ylab` and `main` override the x-axis label, y-axis label, and plot title.
#' - `layout` sets the layout of facets - e.g., `layout(2, 5)` will have 2 columns and 5 rows.
#' - `fontsize` overrides the overall font size of the plot.
#' - `border` sets the border colour of each tile.
#'
#' @export
#' @return an [openair][openair-package] object.
#' @author Karl Ropkins
#' @author David Carslaw
#' @author Jack Davison
#' @examples
#' # basic use
#' # default statistic = "mean"
#' trendLevel(mydata, pollutant = "nox")
#'
#' # applying same as 'own' statistic
#' my.mean <- function(x) mean(x, na.rm = TRUE)
#' trendLevel(mydata, pollutant = "nox", statistic = my.mean)
#'
#' # alternative for 'third party' statistic
#' # trendLevel(mydata, pollutant = "nox", statistic = mean,
#' # stat.args = list(na.rm = TRUE))
#'
#' \dontrun{
#' # example with categorical scale
#' trendLevel(mydata,
#' pollutant = "no2",
#' border = "white", statistic = "max",
#' breaks = c(0, 50, 100, 500),
#' labels = c("low", "medium", "high"),
#' cols = c("forestgreen", "yellow", "red")
#' )
#' }
trendLevel <- function(
mydata,
pollutant = "nox",
x = "month",
y = "hour",
type = "default",
rotate.axis = c(90, 0),
n.levels = c(10, 10, 4),
windflow = NULL,
limits = NULL,
min.bin = 1,
cols = "default",
auto.text = TRUE,
key.title = paste("use.stat.name", pollutant, sep = " "),
key.position = "right",
strip.position = "top",
labels = NULL,
breaks = NULL,
statistic = c(
"mean",
"max",
"min",
"median",
"frequency",
"sum",
"sd",
"percentile"
),
percentile = 95,
stat.args = NULL,
stat.safe.mode = TRUE,
drop.unused.types = TRUE,
col.na = "white",
plot = TRUE,
key = NULL,
...
) {
# check key.position
key.position <- check_key_position(key.position, key)
extra.args <- rlang::list2(...)
# check length of x
if (length(x) > 1) {
x <- x[1]
cli::cli_warn(
c(
"!" = "{.fun openair::trendLevel} does not allow multiple {.field x} values.",
"i" = "Setting {.field x} to '{x}'."
)
)
}
# check length of y
if (length(y) > 1) {
y <- y[1]
cli::cli_warn(
c(
"!" = "{.fun openair::trendLevel} does not allow multiple {.field y} values.",
"i" = "Setting {.field y} to '{y}'."
)
)
}
# check length of type
if (length(type) > 2) {
type <- type[1:2]
cli::cli_warn(
c(
"!" = "{.fun openair::trendLevel} does not allow more than two {.field type} values.",
"i" = "Setting {.field type} to '{type}'."
)
)
}
# ensure x, y and type are unique
vars <- c(pollutant, x, y, type)
if (length(vars) != length(unique(vars))) {
cli::cli_abort(
c(
"x" = "{.fun openair::trendLevel} could not rationalise plot structure.",
"i" = "Duplicate term(s) in {.field pollutant} ('{pollutant}'), {.field x} ('{x}'), {.field y} ('{y}'), and {.field type} ('{type}')."
)
)
}
# make sure windflow is a list
# lower the default range - good for heatmap plots
windflow <- resolve_windflow_opts(windflow, range = c(0.01, 0.5))
if (
windflow$windflow &&
(x %in%
c("ws", "wd") ||
y %in% c("ws", "wd") ||
any(type %in% c("ws", "wd")))
) {
cli::cli_abort(
"In {.fun openair::trendLevel}, cannot use {.arg windflow} and have {.arg x}, {.arg y} and/or {.arg type} be 'ws' or 'wd'."
)
}
# number vector handling
ls.check.fun <- function(vector, vector.name, len) {
if (!is.numeric(vector)) {
cli::cli_warn(
c(
"!" = "{.fun openair::trendLevel} ignored unrecognised {.field vector.name} option.",
"i" = "See {.fun openair::trendLevel} for more information."
),
call. = FALSE
)
# use current default
vector <- eval(formals(trendLevel)[[vector.name]])
}
if (length(vector) < len) {
vector <- rep(vector, len)[1:len]
}
# insert default if not given
ifelse(is.na(vector), eval(formals(trendLevel)[[vector.name]]), vector)
}
rotate.axis <- ls.check.fun(rotate.axis, "rotate.axis", 2)
n.levels <- ls.check.fun(n.levels, "n.levels", 3)
# statistic handling
if (is.character(statistic) || is.function(statistic)) {
if (is.character(statistic)) {
# hardcoded statistic options
statistic <- rlang::arg_match(statistic)
stat.name <- statistic
if (statistic == "mean") {
stat.fun <- mean
stat.args <- list(na.rm = TRUE)
}
if (statistic == "median") {
stat.fun <- stats::median
stat.args <- list(na.rm = TRUE)
}
if (statistic == "sd") {
stat.fun <- stats::sd
stat.args <- list(na.rm = TRUE)
}
if (statistic == "max") {
stat.fun <- function(x, ...) {
if (all(is.na(x))) {
NA
} else {
max(x, ...)
}
}
stat.args <- list(na.rm = TRUE)
}
if (statistic == "min") {
stat.fun <- function(x, ...) {
if (all(is.na(x))) {
NA
} else {
min(x, ...)
}
}
stat.args <- list(na.rm = TRUE)
}
if (statistic == "sum") {
stat.fun <- function(x, ...) {
if (all(is.na(x))) {
NA
} else {
sum(x, ...)
}
}
stat.args <- list(na.rm = TRUE)
}
if (statistic == "frequency") {
stat.fun <- function(x, ...) {
if (all(is.na(x))) {
NA
} else {
length(stats::na.omit(x))
}
}
stat.args <- NULL
}
if (statistic == "percentile") {
if (percentile < 0 || percentile > 100) {
cli::cli_abort("{.field percentile} outside {0}-{100}.")
}
probs <- percentile / 100
stat.fun <- function(x, ...) {
stats::quantile(x, probs = probs, names = FALSE, ...)
}
stat.args <- list(na.rm = TRUE)
lastnum <- substr(percentile, nchar(percentile), nchar(percentile))
numend <- dplyr::recode_values(
lastnum,
"1" ~ "st",
"2" ~ "nd",
"3" ~ "rd",
default = "th"
)
stat.name <- paste0(percentile, numend, " Perc.")
}
} else {
# user defined function handling
# default unnamed stats to 'level'
stat.name <- substitute(statistic)
if (length(stat.name) != 1) {
stat.name <- "level"
}
if (stat.safe.mode) {
stat.fun <- function(x, ...) {
if (all(is.na(x))) NA else statistic(x, ...)[1]
}
} else {
stat.fun <- function(x, ...) statistic(x, ...)[1]
}
}
} else {
# early end for bad stats
cli::cli_abort(
c(
"x" = "{.fun openair::trendLevel} could not apply given {.field statistic} option.",
"i" = "Please provide a valid function or character vector.",
"i" = "Valid character vectors: {.emph {eval(formals(trendLevel)$statistic)}}"
)
)
}
# checkPrep
temp <- c(pollutant)
if (x %in% dateTypes || y %in% dateTypes || any(type %in% dateTypes)) {
temp <- c(temp, "date")
}
if (windflow$windflow) {
temp <- c(temp, "ws", "wd")
}
# all of x, y, temp need to be handled as type here
mydata <- checkPrep(mydata, temp, type = c(x, y, type), remove.calm = FALSE)
# if using windflow, need the u & v components of wind
if (windflow$windflow) {
mydata <- mydata |>
dplyr::mutate(
wind_u = -1 * .data$ws * sin(.data$wd * pi / 180),
wind_v = -1 * .data$ws * cos(.data$wd * pi / 180)
) |>
dplyr::select(
-c("ws", "wd")
)
}
# cutData
# different n.levels for axis and type, axes get `is.axis = TRUE`
newdata <-
mydata |>
cutData(x, n.levels = n.levels[1], is.axis = TRUE, ...) |>
cutData(y, n.levels = n.levels[2], is.axis = TRUE, ...) |>
cutData(type, n.levels = n.levels[3], drop = "outside", ...) |>
tidyr::complete(!!!rlang::syms(c(x, y, type)))
# select only pollutant and axis/facet columns
newdata <- dplyr::select(
newdata,
dplyr::any_of(c(pollutant, x, y, type, "wind_u", "wind_v"))
)
# apply statistic
calc_stat <- function(x) {
args <- append(stat.args, list(x = x))
rlang::exec(stat.fun, !!!args)
}
# NB: don't overwrite `pollutant` initially so that `n` is still valid
newdata_poll <-
newdata |>
dplyr::summarise(
poll_avg = calc_stat(.data[[pollutant]]),
n = length(stats::na.omit(.data[[pollutant]])),
.by = dplyr::all_of(c(x, y, type))
)
names(newdata_poll)[names(newdata_poll) == "poll_avg"] <- pollutant
# if using windflow, calculate mean wind_u and wind_v for each bin
if (windflow$windflow) {
newdata_met <- newdata |>
dplyr::summarise(
wind_u = mean(.data[["wind_u"]], na.rm = TRUE),
wind_v = mean(.data[["wind_v"]], na.rm = TRUE),
.by = dplyr::all_of(c(x, y, type))
)
newdata <-
dplyr::left_join(
newdata_poll,
newdata_met,
by = c(x, y, type)
)
} else {
newdata <- newdata_poll
}
# if using windflow, convert wind_u and wind_v back into ws & wd
if (windflow$windflow) {
newdata <- newdata |>
dplyr::mutate(
ws = sqrt(.data$wind_u^2 + .data$wind_v^2),
wd = (atan2(-.data$wind_u, -.data$wind_v) * 180 / pi) %% 360
) |>
dplyr::select(-c("wind_u", "wind_v"))
}
# remove bins with fewer than min.bin records
vars <- c(pollutant)
if (windflow$windflow) {
vars <- c(vars, "ws", "wd")
}
newdata <- dplyr::mutate(
newdata,
dplyr::across(
dplyr::any_of(vars),
\(x) ifelse(.data$n < min.bin, NA, x)
)
)
# sub out 'use.stat.name' with stat.name
key.title <- gsub("use.stat.name", stat.name, key.title)
# check if key.header / key.footer are being used
key.title <- check_key_header(key.title, extra.args)
# categorical colour scale or not?
categorical <- FALSE
if (!is.null(breaks)) {
# get breaks
labels <- get_labels_from_breaks(breaks, labels)
# cut data into categories
newdata$cuts <- cut(
newdata[[pollutant]],
breaks = breaks,
labels = labels,
include.lowest = TRUE
)
categorical <- TRUE
}
# construct plot
thePlot <-
ggplot2::ggplot(
newdata,
ggplot2::aes(
x = num_convert(.data[[x]]),
y = num_convert(.data[[y]]),
fill = .data[[ifelse(categorical, "cuts", pollutant)]]
)
) +
ggplot2::geom_tile(
colour = extra.args$border %||% "transparent",
show.legend = TRUE
) +
ggplot2::coord_cartesian(clip = "off", expand = FALSE) +
theme_openair(key.position) +
set_extra_fontsize(extra.args) +
ggplot2::labs(
x = quickText(extra.args$xlab %||% x, auto.text = auto.text),
y = quickText(extra.args$ylab %||% y, auto.text = auto.text),
title = quickText(extra.args$main %||% "", auto.text = auto.text),
fill = quickText(key.title, auto.text = auto.text)
) +
ggplot2::guides(
x = ggplot2::guide_axis(check.overlap = TRUE, angle = rotate.axis[1]),
y = ggplot2::guide_axis(check.overlap = TRUE, angle = rotate.axis[2])
) +
get_facet(
type,
extra.args = extra.args,
scales = "fixed",
auto.text = auto.text,
drop = drop.unused.types,
strip.position = strip.position
)
# make key full width/height
if (key.position %in% c("left", "right")) {
thePlot <- thePlot +
ggplot2::theme(
legend.key.height = ggplot2::unit(1, "null"),
legend.key.spacing.y = ggplot2::unit(0, "cm")
)
}
if (key.position %in% c("top", "bottom")) {
thePlot <- thePlot +
ggplot2::theme(
legend.key.width = ggplot2::unit(1, "null"),
legend.key.spacing.x = ggplot2::unit(0, "cm")
)
}
# colours
if (categorical) {
thePlot <-
thePlot +
ggplot2::scale_fill_manual(
values = openColours(
scheme = cols,
n = dplyr::n_distinct(levels(newdata$cuts))
),
na.value = col.na,
breaks = levels(newdata$cuts),
drop = FALSE
) +
ggplot2::guides(
fill = ggplot2::guide_legend(
reverse = key.position %in% c("left", "right"),
theme = ggplot2::theme(
legend.title.position = ifelse(
key.position %in% c("left", "right"),
"top",
key.position
),
legend.text.position = key.position
),
nrow = if (key.position %in% c("left", "right")) NULL else 1
)
)
} else {
thePlot <-
thePlot +
ggplot2::scale_fill_gradientn(
colours = openColours(cols),
na.value = col.na,
oob = scales::oob_squish,
limit = limits
) +
ggplot2::guides(
fill = ggplot2::guide_colorbar(
theme = ggplot2::theme(
legend.title.position = ifelse(
key.position %in% c("left", "right"),
"top",
key.position
),
legend.text.position = key.position
)
)
)
}
if (windflow$windflow) {
thePlot <- thePlot +
layer_windflow_opts(newdata, windflow)
}
if (plot) {
plot(thePlot)
}
output <- list(
plot = thePlot,
data = dplyr::tibble(newdata),
call = match.call()
)
class(output) <- "openair"
invisible(output)
}
num_convert <- function(x) {
y <- utils::type.convert(x, as.is = TRUE)
if (is.numeric(y) || is.integer(y)) {
return(y)
} else {
return(x)
}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.