Nothing
#' Plot time series, perhaps for multiple pollutants, grouped or in separate
#' panels.
#'
#' The [timePlot()] is the basic time series plotting function in `openair`. Its
#' purpose is to make it quick and easy to plot time series for pollutants and
#' other variables. The other purpose is to plot potentially many variables
#' together in as compact a way as possible.
#'
#' The function is flexible enough to plot more than one variable at once. If
#' more than one variable is chosen plots it can either show all variables on
#' the same plot (with different line types) *on the same scale*, or (if `group
#' = FALSE`) each variable in its own panels with its own scale.
#'
#' The general preference is not to plot two variables on the same graph with
#' two different y-scales. It can be misleading to do so and difficult with more
#' than two variables. If there is in interest in plotting several variables
#' together that have very different scales, then it can be useful to normalise
#' the data first, which can be down be setting the `normalise` option.
#'
#' The user has fine control over the choice of colours, line width and line
#' types used. This is useful for example, to emphasise a particular variable
#' with a specific line type/colour/width.
#'
#' [timePlot()] works very well with [selectByDate()], which is used for
#' selecting particular date ranges quickly and easily. See examples below.
#'
#' @inheritParams shared_openair_params
#' @inheritParams timeAverage
#'
#' @param mydata A data frame of time series. Must include a `date` field and at
#' least one variable to plot.
#'
#' @param pollutant Name of variable to plot. Two or more pollutants can be
#' plotted, in which case a form like `pollutant = c("nox", "co")` should be
#' used.
#'
#' @param group Controls how multiple lines/series are grouped. Three options
#' are available:
#'
#' - `FALSE` (default): each pollutant is plotted in its own panel with its
#' own scale.
#' - `TRUE`: all pollutants are plotted together on the same panel and scale,
#' coloured by pollutant name.
#' - A character string giving the name of a column in `mydata` (e.g.
#' `group = "site"` or `group = "pollutant"`): lines are coloured by the
#' values in that column. With a single `pollutant` all groups appear in one
#' panel; with multiple `pollutant`s each pollutant gets its own panel and
#' lines within each panel are coloured by the group column. This is
#' particularly useful for long-format data where multiple species are stored
#' in one column.
#'
#' @param stack If `TRUE` the time series will be stacked by year. This option
#' can be useful if there are several years worth of data making it difficult
#' to see much detail when plotted on a single plot.
#'
#' @param normalise Should variables be normalised? The default is is not to
#' normalise the data. `normalise` can take two values, either `"mean"` or a
#' string representing a date in UK format e.g. "1/1/1998" (in the format
#' dd/mm/YYYY). If `normalise = "mean"` then each time series is divided by
#' its mean value. If a date is chosen, then values at that date are set to
#' 100 and the rest of the data scaled accordingly. Choosing a date (say at
#' the beginning of a time series) is very useful for showing how trends
#' diverge over time. Setting `group = TRUE` is often useful too to show all
#' time series together in one panel.
#'
#' @param percentile The percentile level in percent used when `statistic =
#' "percentile"` and when aggregating the data with `avg.time`. More than one
#' percentile level is allowed for `type = "default"` e.g. `percentile = c(50,
#' 95)`. Not used if `avg.time = "default"`.
#'
#' @param date.pad Should missing data be padded-out? This is useful where a
#' data frame consists of two or more "chunks" of data with time gaps between
#' them. By setting `date.pad = TRUE` the time gaps between the chunks are
#' shown properly, rather than with a line connecting each chunk. For
#' irregular data, set to `FALSE`. Note, this should not be set for `type`
#' other than `default`.
#'
#' @param log Should the y-axis appear on a log scale? The default is `FALSE`.
#' If `TRUE` a well-formatted log10 scale is used. This can be useful for
#' plotting data for several different pollutants that exist on very different
#' scales. It is therefore useful to use `log = TRUE` together with `group =
#' TRUE`.
#'
#' @param smooth Should a smooth line be applied to the data? The default is
#' `FALSE`.
#'
#' @param smooth_k An integer controlling the number of basis functions used in
#' the GAM smooth. In a GAM, `k` sets the maximum degrees of freedom for the
#' smooth term: larger values allow more flexibility and can capture finer
#' structure in the data, while smaller values produce smoother, less wiggly
#' fits. The default (`NULL`) lets `ggplot2` choose automatically (typically
#' `k = 10`). Increase `k` if the smooth appears too rigid; decrease it to
#' avoid over-fitting.
#'
#' @param ci If a smooth fit line is applied, then `ci` determines whether the
#' 95 percent confidence intervals are shown.
#'
#' @param ref.x See `ref.y` for details. In this case the correct date format
#' should be used for a vertical line e.g. `ref.x = list(v =
#' as.POSIXct("2000-06-15"), lty = 5)`.
#'
#' @param ref.y A list with details of the horizontal lines to be added
#' representing reference line(s). For example, `ref.y = list(h = 50, lty =
#' 5)` will add a dashed horizontal line at 50. Several lines can be plotted
#' e.g. `ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green",
#' "blue"))`.
#'
#' @param name.pol This option can be used to give alternative names for the
#' variables plotted. Instead of taking the column headings as names, the user
#' can supply replacements. For example, if a column had the name "nox" and
#' the user wanted a different description, then setting `name.pol = "nox
#' before change"` can be used. If more than one pollutant is plotted then use
#' `c` e.g. `name.pol = c("nox here", "o3 there")`.
#'
#' @param ... Addition options are passed on to [cutData()] for `type` handling.
#' Some additional arguments are also available:
#' - `xlab`, `ylab` and `main` override the x-axis label, y-axis label, and plot title.
#' - `layout` sets the layout of facets - e.g., `layout(2, 5)` will have 2 columns and 5 rows.
#' - `lwd`, `lty`, and `pch` control various graphical parameters.
#' - `fontsize` overrides the overall font size of the plot.
#' - `border` sets the border colour of each tile.
#' - `ylim` and `xlim` control axis limits.
#'
#' @export
#' @return an [openair][openair-package] object
#' @author David Carslaw
#' @author Jack Davison
#' @family time series and trend functions
#' @examples
#' # basic use, single pollutant
#' timePlot(mydata, pollutant = "nox")
#'
#' # two pollutants in separate panels
#' \dontrun{
#' timePlot(mydata, pollutant = c("nox", "no2"))
#'
#' # two pollutants in the same panel with the same scale
#' timePlot(mydata, pollutant = c("nox", "no2"), group = TRUE)
#'
#' # group by a column (e.g. long-format data with a 'site' column)
#' d <- rbind(
#' cbind(mydata[, c("date", "nox")], site = "London"),
#' cbind(transform(mydata[, c("date", "nox")], nox = nox * 1.5), site = "Manchester")
#' )
#' timePlot(d, pollutant = "nox", group = "site")
#'
#' # alternative by normalising concentrations and plotting on the same scale
#' timePlot(
#' mydata,
#' pollutant = c("nox", "co", "pm10", "so2"),
#' group = TRUE,
#' avg.time = "year",
#' normalise = "1/1/1998",
#' lwd = 3,
#' lty = 1
#' )
#'
#' # examples of selecting by date
#'
#' # plot for nox in 1999
#' timePlot(selectByDate(mydata, year = 1999), pollutant = "nox")
#'
#' # select specific date range for two pollutants
#' timePlot(
#' selectByDate(mydata, start = "6/8/2003", end = "13/8/2003"),
#' pollutant = c("no2", "o3")
#' )
#'
#' # choose different line styles etc
#' timePlot(mydata, pollutant = c("nox", "no2"), lty = 1)
#'
#' # choose different line styles etc
#' timePlot(
#' selectByDate(mydata, year = 2004, month = 6),
#' pollutant = c("nox", "no2"),
#' lwd = c(1, 2),
#' col = "black"
#' )
#'
#' # different averaging times
#'
#' # daily mean O3
#' timePlot(mydata, pollutant = "o3", avg.time = "day")
#'
#' # daily mean O3 ensuring each day has data capture of at least 75%
#' timePlot(mydata, pollutant = "o3", avg.time = "day", data.thresh = 75)
#'
#' # 2-week average of O3 concentrations
#' timePlot(mydata, pollutant = "o3", avg.time = "2 week")
#' }
#'
timePlot <- function(
mydata,
pollutant = "nox",
group = FALSE,
stack = FALSE,
normalise = NULL,
avg.time = "default",
data.thresh = 0,
statistic = "mean",
percentile = NA,
date.pad = FALSE,
type = "default",
cols = "brewer1",
log = FALSE,
windflow = NULL,
smooth = FALSE,
smooth_k = NULL,
ci = TRUE,
x.relation = "same",
y.relation = "same",
ref.x = NULL,
ref.y = NULL,
key.columns = NULL,
key.position = "bottom",
strip.position = "top",
name.pol = pollutant,
date.breaks = 7,
date.format = NULL,
auto.text = TRUE,
plot = TRUE,
key = NULL,
...
) {
# check key.position
key.position <- check_key_position(key.position, key)
# Args setup
extra.args <- rlang::list2(...)
# detect if group is a column name rather than a boolean
group_is_col <- is.character(group) && length(group) == 1
# warning messages and other checks
if (length(percentile) > 1 && length(pollutant) > 1) {
cli::cli_abort(
"Only one {.arg pollutant} allowed when considering more than one {.arg percentile}."
)
}
if (stack && length(type) > 1) {
cli::cli_abort(
"Cannot {.arg stack} and have more than one {.arg type}."
)
}
if (
isFALSE(group) &&
length(type) > 1 &&
(length(pollutant) > 1 || length(percentile) > 1)
) {
cli::cli_abort(
"{.arg group} cannot be {FALSE} and have more than one {.arg type}."
)
}
if (length(type) > 2) {
cli::cli_abort(
"Cannot have more than 2 {.arg type}s."
)
}
if (!missing(statistic) && missing(avg.time)) {
cli::cli_inform("No {.field avg.time} specified; using 'month'.")
avg.time <- "month"
}
rlang::arg_match(x.relation, c("same", "free"))
rlang::arg_match(y.relation, c("same", "free"))
if (group_is_col) {
if (!group %in% names(mydata) && !group %in% dateTypes) {
cli::cli_abort("Column {.val {group}} not found in {.arg mydata}.")
}
if (group %in% type) {
cli::cli_abort(
"{.arg group} column {.val {group}} is also used in {.arg type}. Choose one or the other."
)
}
if (length(type) > 1 && length(pollutant) > 1) {
cli::cli_abort(
"Cannot combine multiple {.arg pollutant}s, multiple {.arg type}s, and a {.arg group} column simultaneously."
)
}
}
# ensure windflow is a list
windflow <- resolve_windflow_opts(windflow)
# style controls
extra.args$pch <- extra.args$pch %||% NA
extra.args$layout <- extra.args$layout %||% NULL
# check & cut data
mydata <- prepare_timeplot_data(
mydata = mydata,
pollutant = pollutant,
type = type,
avg.time = avg.time,
date.pad = date.pad,
windflow = windflow,
group = group,
...
)
# time average & reshape data
mydata <- time_average_timeplot_data(
mydata = mydata,
pollutant = pollutant,
type = type,
statistic = statistic,
avg.time = avg.time,
data.thresh = data.thresh,
percentile = percentile,
windflow = windflow,
group = group,
...
)
pollutant <- mydata$pollutant
mydata <- mydata$data
# normalise data (if required)
mydata <- normalise_timeplot_data(mydata, normalise = normalise)
# need to group pollutants if conditioning
if (avg.time != "default" && length(percentile) > 1L && missing(group)) {
group <- TRUE
}
# label controls
extra.args$xlab <- quickText(extra.args$xlab %||% "", auto.text)
extra.args$main <- quickText(extra.args$main, auto.text)
extra.args$ylab <- quickText(
extra.args$ylab %||%
dplyr::case_when(
!is.null(normalise) ~
paste("normalised", paste(pollutant, collapse = ", ")),
.default = paste(pollutant, collapse = ", ")
),
auto.text
)
# if stacking of plots by year is needed
if (stack || all(type == "year")) {
mydata$year <- as.character(lubridate::year(mydata$date))
if (is.null(extra.args$layout)) {
extra.args$layout <- c(1, length(unique(mydata$year)))
}
lubridate::year(mydata$date) <- lubridate::year(mydata$date)[1]
date.format <- date.format %||% "%b"
}
# make sure order is correct
mydata$variable <- factor(
mydata$variable,
levels = pollutant,
labels = name.pol
)
# x-axis scale function
if (lubridate::is.Date(mydata$date)) {
x_scale_fun <- ggplot2::scale_x_date
} else {
x_scale_fun <- ggplot2::scale_x_datetime
}
# number of distinct pollutants (for panel layout / faceting)
npol <- length(unique(mydata$variable))
# number of groups used for colour/linetype/linewidth aesthetics
if (group_is_col) {
group_levels <- unique(as.character(mydata[[group]]))
mydata[[group]] <- factor(mydata[[group]], levels = group_levels)
n_groups <- length(group_levels)
} else {
n_groups <- npol
}
extra.args$lty <- extra.args$lty %||% 1
while (length(extra.args$lty) < n_groups) {
extra.args$lty <- c(extra.args$lty, extra.args$lty)
}
extra.args$lty <- extra.args$lty[1:n_groups]
extra.args$lwd <- extra.args$lwd %||% 1
while (length(extra.args$lwd) < n_groups) {
extra.args$lwd <- c(extra.args$lwd, extra.args$lwd)
}
extra.args$lwd <- extra.args$lwd[1:n_groups]
# layout - stack vertically
if (
is.null(extra.args$layout) &&
!isTRUE(group) &&
!stack &&
all(type == "default")
) {
extra.args$layout <- c(1, npol)
}
# deal with type
# when group is a string, !isTRUE(group) is TRUE, so multiple pollutants
# still get their own panels (coloured by the group column within each)
if (!isTRUE(group) && npol > 1) {
type <- c(type, "variable")
}
if (stack) {
type <- c(type, "year")
}
if (length(type) > 1) {
type <- type[type != "default"]
}
theGuide <-
ggplot2::guide_legend(
reverse = key.position %in% c("left", "right"),
theme = ggplot2::theme(
legend.title.position = ifelse(
key.position %in% c("left", "right"),
"top",
key.position
)
),
ncol = if (missing(key.columns)) {
if (key.position %in% c("left", "right")) {
1
} else {
n_groups
}
} else {
key.columns
}
)
# aesthetic column: group column name when group is a string, else "variable"
aes_col <- if (group_is_col) group else "variable"
# legend title: show group column name when group is a string, else no title
legend_title <- if (group_is_col) quickText(group, auto.text) else NULL
# built plot
thePlot <-
ggplot2::ggplot(
mydata,
ggplot2::aes(
x = .data$date,
y = .data$value,
color = .data[[aes_col]],
fill = .data[[aes_col]],
linetype = .data[[aes_col]]
)
) +
ggplot2::geom_line(
ggplot2::aes(linewidth = .data[[aes_col]]),
show.legend = TRUE
) +
gg_ref_x(ref.x) +
gg_ref_y(ref.y) +
theme_openair(key.position = ifelse(n_groups == 1, "none", key.position)) +
ggplot2::labs(
x = extra.args$xlab,
y = extra.args$ylab,
title = extra.args$main,
fill = legend_title,
colour = legend_title,
linetype = legend_title,
linewidth = legend_title
) +
get_facet(
type,
extra.args,
scales = relation_to_facet_scales(x.relation, y.relation),
auto.text = auto.text,
drop = FALSE,
strip.position = strip.position
) +
ggplot2::coord_cartesian(
xlim = extra.args$xlim,
ylim = extra.args$ylim
) +
x_scale_fun(
breaks = scales::breaks_pretty(date.breaks),
date_labels = date.format %||% ggplot2::waiver(),
expand = ggplot2::expansion(c(0.025, 0.025))
) +
ggplot2::scale_y_continuous(
expand = if (is.null(extra.args$ylim)) {
ggplot2::expansion(mult = c(0.1, 0.1))
} else {
ggplot2::expansion()
},
transform = ifelse(log, "log10", "identity")
) +
ggplot2::scale_color_manual(
values = openColours(scheme = cols, n = n_groups),
drop = FALSE,
label = \(x) label_openair(x, auto_text = auto.text),
aesthetics = c("fill", "colour")
) +
ggplot2::scale_linetype_manual(
values = extra.args$lty,
drop = FALSE,
label = \(x) label_openair(x, auto_text = auto.text)
) +
ggplot2::scale_linewidth_manual(
values = extra.args$lwd / 2,
drop = FALSE,
label = \(x) label_openair(x, auto_text = auto.text)
) +
ggplot2::guides(
fill = theGuide,
color = theGuide,
linetype = theGuide,
linewidth = theGuide
)
if (stack) {
thePlot <-
thePlot +
ggplot2::theme(
panel.spacing.y = ggplot2::unit(0, "cm")
)
}
if (smooth) {
smooth_formula <- if (!is.null(smooth_k)) {
stats::as.formula(paste0("y ~ s(x, k = ", smooth_k, ")"))
} else {
y ~ s(x)
}
thePlot <- thePlot +
ggplot2::stat_smooth(method = "gam", formula = smooth_formula, se = ci)
}
if (windflow$windflow) {
thePlot <-
thePlot +
layer_windflow_opts(data = NULL, windflow_opts = windflow)
}
# output
if (plot) {
plot(thePlot)
}
output <- list(plot = thePlot, data = mydata, call = match.call())
class(output) <- "openair"
invisible(output)
}
#' Prepare data for the timeplot function
#' @noRd
prepare_timeplot_data <- function(
mydata,
pollutant,
type,
avg.time,
date.pad,
windflow,
group = FALSE,
...
) {
# determine necessary variables
vars <- c("date", pollutant)
if (windflow$windflow) {
vars <- unique(c(vars, "wd", "ws"))
}
if (is.character(group)) {
if (!group %in% dateTypes) {
vars <- unique(c(vars, group))
}
}
# standard data checks
mydata <- checkPrep(mydata, vars, type, remove.calm = FALSE)
# pad out any missing date/times so that line don't extend between areas of missing data
if (date.pad) {
mydata <- datePad(mydata, type = type)
}
# cut data
mydata <- cutData(mydata, type, ...)
# check for duplicates - can't really have duplicate data in a timeplot
# when group is a column, duplicate check must also split by that column
if (avg.time == "default") {
check_type <- if (is.character(group)) c(type, group) else type
check_duplicate_rows(mydata, check_type, fn = cli::cli_abort)
}
# return data
return(mydata)
}
#' timeAverage and reshape timeplot data
#' @noRd
time_average_timeplot_data <- function(
mydata,
pollutant,
type,
statistic,
avg.time,
data.thresh,
percentile,
windflow,
group = FALSE,
...
) {
# when group is a column name, average within each group separately
avg_type <- if (is.character(group)) c(type, group) else type
# average the data if necessary (default does nothing)
if (avg.time != "default") {
# deal with multiple percentile values
if (length(percentile) > 1) {
prefix <- paste(pollutant, "percentile ")
mydata <-
calcPercentile(
mydata,
type = avg_type,
pollutant = pollutant,
avg.time = avg.time,
data.thresh = data.thresh,
percentile = percentile,
prefix = prefix,
...
)
pollutant <- paste0(prefix, percentile)
} else {
mydata <- timeAverage(
mydata,
pollutant = pollutant,
type = avg_type,
statistic = statistic,
avg.time = avg.time,
data.thresh = data.thresh,
percentile = percentile,
...
)
}
} else if (is.character(group)) {
mydata <- cutData(mydata, type = group, ...)
}
# timeAverage drops type if default
if (any(type == "default")) {
mydata$default <- "default"
}
# need to flag if ws/wd are being plotted *and* used for windflow
flag_wind_pollutant <- windflow$windflow &&
("ws" %in% pollutant || "wd" %in% pollutant)
# retain ws/wd if needed later
if (flag_wind_pollutant) {
# only select what is being pivoted, as it will need joining back on
met_vars <- c("ws", "wd")
met_vars <- met_vars[met_vars %in% pollutant]
met_data <- dplyr::select(mydata, dplyr::any_of(c("date", met_vars)))
}
# reshape
mydata <-
tidyr::pivot_longer(
mydata,
cols = dplyr::all_of(pollutant),
names_to = "variable",
values_to = "value"
)
# bind on ws/wd if needed for windflow
if (flag_wind_pollutant) {
mydata <- dplyr::left_join(
mydata,
met_data,
by = dplyr::join_by("date")
) |>
dplyr::relocate(dplyr::any_of(c("ws", "wd")), .after = "date")
}
# need to return pollutant as it can change
return(list(
data = mydata,
pollutant = pollutant
))
}
#' Apply normalisation to timeplot data
#' @noRd
normalise_timeplot_data <-
function(mydata, normalise) {
# preserve order of pollutants
mydata <- dplyr::mutate(
mydata,
variable = factor(.data$variable, levels = unique(.data$variable))
)
# if normalise isn't given, just return the data
if (is.null(normalise)) {
return(mydata)
}
# handle normalisation
if (normalise == "mean") {
mydata <-
dplyr::mutate(
mydata,
value = .data$value / mean(.data$value, na.rm = TRUE),
.by = "variable"
)
} else {
# convert string to date
target_date <- as.POSIXct(strptime(
normalise,
format = "%d/%m/%Y",
tz = "GMT"
))
if (is.na(target_date)) {
cli::cli_abort(c(
"x" = "Provided {.field normalise} option not recognised.",
"i" = "{.field normalise} must be either 'mean' or a string in the format 'DD/MM/YYYY'."
))
}
# find nearest values to each date
target_date_values <-
tidyr::drop_na(mydata) |>
dplyr::slice_min(
abs(.data$date - target_date),
n = 1L,
with_ties = FALSE,
by = "variable"
) |>
dplyr::select("variable", "target_value" = "value")
# scale value to 100 at specific date
mydata <-
mydata |>
dplyr::left_join(
target_date_values,
by = dplyr::join_by("variable")
) |>
dplyr::mutate(
value = 100 * (.data$value / .data$target_value)
) |>
dplyr::select(-"target_value")
}
# return input data
return(mydata)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.