Nothing
#' Trajectory level plots with conditioning
#'
#' This function plots gridded back trajectories. This function requires that
#' data are imported using the [importTraj()] function.
#'
#' An alternative way of showing the trajectories compared with plotting
#' trajectory lines is to bin the points into latitude/longitude intervals. For
#' these purposes [trajLevel()] should be used. There are several trajectory
#' statistics that can be plotted as gridded surfaces. First, `statistic` can be
#' set to "frequency" to show the number of back trajectory points in a grid
#' square. Grid squares are by default at 1 degree intervals, controlled by
#' `lat.inc` and `lon.inc`. Such plots are useful for showing the frequency of
#' air mass locations. Note that it is also possible to set `method = "hexbin"`
#' for plotting frequencies (not concentrations), which will produce a plot by
#' hexagonal binning.
#'
#' If `statistic = "difference"` the trajectories associated with a
#' concentration greater than `percentile` are compared with the the full set of
#' trajectories to understand the differences in frequencies of the origin of
#' air masses of the highest concentration trajectories compared with the
#' trajectories on average. The comparison is made by comparing the percentage
#' change in gridded frequencies. For example, such a plot could show that the
#' top 10\% of concentrations of PM10 tend to originate from air-mass origins to
#' the east.
#'
#' If `statistic = "pscf"` then the Potential Source Contribution Function is
#' plotted. The PSCF calculates the probability that a source is located at
#' latitude \eqn{i} and longitude \eqn{j} (Pekney et al., 2006).The basis of
#' PSCF is that if a source is located at (i,j), an air parcel back trajectory
#' passing through that location indicates that material from the source can be
#' collected and transported along the trajectory to the receptor site. PSCF
#' solves \deqn{PSCF = m_{ij}/n_{ij}} where \eqn{n_{ij}} is the number of times
#' that the trajectories passed through the cell (i,j) and \eqn{m_{ij}} is the
#' number of times that a source concentration was high when the trajectories
#' passed through the cell (i,j). The criterion for determining \eqn{m_{ij}} is
#' controlled by `percentile`, which by default is 90. Note also that cells with
#' few data have a weighting factor applied to reduce their effect.
#'
#' A limitation of the PSCF method is that grid cells can have the same PSCF
#' value when sample concentrations are either only slightly higher or much
#' higher than the criterion. As a result, it can be difficult to distinguish
#' moderate sources from strong ones. Seibert et al. (1994) computed
#' concentration fields to identify source areas of pollutants. The
#' Concentration Weighted Trajectory (CWT) approach considers the concentration
#' of a species together with its residence time in a grid cell. The CWT
#' approach has been shown to yield similar results to the PSCF approach. The
#' openair manual has more details and examples of these approaches.
#'
#' A further useful refinement is to smooth the resulting surface, which is
#' possible by setting `smooth = TRUE`.
#'
#' @note This function is under active development and is likely to change
#'
#' @inheritParams trajPlot
#' @param smooth Should the trajectory surface be smoothed?
#' @param statistic Statistic to use for [trajLevel()]. By default, the function
#' will plot the trajectory frequencies (`statistic = "frequency"`). As an
#' alternative way of viewing trajectory frequencies, the argument `method =
#' "hexbin"` can be used. In this case hexagonal binning of the trajectory
#' \emph{points} (i.e., a point every three hours along each back trajectory).
#' The plot then shows the trajectory frequencies uses hexagonal binning.
#'
#' There are also various ways of plotting concentrations.
#'
#' It is possible to set `statistic = "difference"`. In this case trajectories
#' where the associated concentration is greater than `percentile` are
#' compared with the the full set of trajectories to understand the
#' differences in frequencies of the origin of air masses. The comparison is
#' made by comparing the percentage change in gridded frequencies. For
#' example, such a plot could show that the top 10\% of concentrations of PM10
#' tend to originate from air-mass origins to the east.
#'
#' If `statistic = "pscf"` then a Potential Source Contribution Function map
#' is produced. This statistic method interacts with `percentile`.
#'
#' If `statistic = "cwt"` then concentration weighted trajectories are
#' plotted.
#'
#' If `statistic = "sqtba"` then Simplified Quantitative Transport Bias
#' Analysis is undertaken. This statistic method interacts with `.combine` and
#' `sigma`.
#' @param percentile The percentile concentration of `pollutant` against which
#' the all trajectories are compared.
#' @param lon.inc,lat.inc The longitude and latitude intervals to be used for
#' binning data.
#' @param min.bin The minimum number of unique points in a grid cell. Counts
#' below `min.bin` are set as missing.
#' @param .combine When statistic is "SQTBA" it is possible to combine lots of
#' receptor locations to derive a single map. `.combine` identifies the column
#' that differentiates different sites (commonly a column named `"site"`).
#' Note that individual site maps are normalised first by dividing by their
#' mean value.
#' @param sigma For the SQTBA approach `sigma` determines the amount of back
#' trajectory spread based on the Gaussian plume equation. Values in the
#' literature suggest 5.4 km after one hour. However, testing suggests lower
#' values reveal source regions more effectively while not introducing too
#' much noise.
#' @param plot Should a plot be produced? `FALSE` can be useful when analysing
#' data to extract plot components and plotting them in other ways.
#' @param ... other arguments are passed to [cutData()] and [scatterPlot()].
#' This provides access to arguments used in both these functions and
#' functions that they in turn pass arguments on to. For example,
#' [trajLevel()] passes the argument \code{cex} on to [scatterPlot()] which in
#' turn passes it on to [lattice::xyplot()] where it is applied to set the
#' plot symbol size.
#' @export
#' @return an [openair][openair-package] object
#' @family trajectory analysis functions
#' @author David Carslaw
#' @references
#'
#' Pekney, N. J., Davidson, C. I., Zhou, L., & Hopke, P. K. (2006). Application
#' of PSCF and CPF to PMF-Modeled Sources of PM 2.5 in Pittsburgh. Aerosol
#' Science and Technology, 40(10), 952-961.
#'
#' Seibert, P., Kromp-Kolb, H., Baltensperger, U., Jost, D., 1994. Trajectory
#' analysis of high-alpine air pollution data. NATO Challenges of Modern Society
#' 18, 595-595.
#'
#' Xie, Y., & Berkowitz, C. M. (2007). The use of conditional probability
#' functions and potential source contribution functions to identify source
#' regions and advection pathways of hydrocarbon emissions in Houston, Texas.
#' Atmospheric Environment, 41(28), 5831-5847.
#' @examples
#'
#' # show a simple case with no pollutant i.e. just the trajectories
#' # let's check to see where the trajectories were coming from when
#' # Heathrow Airport was closed due to the Icelandic volcanic eruption
#' # 15--21 April 2010.
#' # import trajectories for London and plot
#' \dontrun{
#' lond <- importTraj("london", 2010)
#' }
#' # more examples to follow linking with concentration measurements...
#'
#' # import some measurements from KC1 - London
#' \dontrun{
#' kc1 <- importAURN("kc1", year = 2010)
#' # now merge with trajectory data by 'date'
#' lond <- merge(lond, kc1, by = "date")
#'
#' # trajectory plot, no smoothing - and limit lat/lon area of interest
#' # use PSCF
#' trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20),
#' pollutant = "pm10", statistic = "pscf"
#' )
#'
#' # can smooth surface, suing CWT approach:
#' trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20),
#' pollutant = "pm2.5", statistic = "cwt", smooth = TRUE
#' )
#'
#' # plot by season:
#' trajLevel(subset(lond, lat > 40 & lat < 70 & lon > -20 & lon < 20),
#' pollutant = "pm2.5",
#' statistic = "pscf", type = "season"
#' )
#' }
trajLevel <- function(
mydata,
lon = "lon",
lat = "lat",
pollutant = "height",
type = "default",
smooth = FALSE,
statistic = "frequency",
percentile = 90,
map = TRUE,
lon.inc = 1.0,
lat.inc = 1.0,
min.bin = 1,
.combine = NA,
sigma = 1.5,
map.fill = TRUE,
map.res = "default",
map.cols = "grey40",
map.alpha = 0.3,
projection = "lambert",
parameters = c(51, 51),
orientation = c(90, 0, 0),
grid.col = "deepskyblue",
origin = TRUE,
plot = TRUE,
...) {
## mydata can be a list of several trajectory files; in which case combine them
## before averaging
hour.inc <- N <- NULL
## variables needed in trajectory plots
vars <- c("date", "lat", "lon", "hour.inc", pollutant)
statistic <- tolower(statistic)
# to combine the effects of several receptors
if (!is.na(.combine)) {
vars <- c(vars, .combine)
}
mydata <- checkPrep(mydata, vars, type, remove.calm = FALSE)
if (statistic == "sqtba" && type != "default") {
warning("'type' option not available yet with SQTBA", call. = FALSE)
type <- "default"
}
## Args
Args <- list(...)
## set graphics
current.strip <- trellis.par.get("strip.background")
current.font <- trellis.par.get("fontsize")
## reset graphic parameters
on.exit(trellis.par.set(
fontsize = current.font
))
if (!"ylab" %in% names(Args)) {
Args$ylab <- ""
}
if (!"xlab" %in% names(Args)) {
Args$xlab <- ""
}
if (!"main" %in% names(Args)) {
Args$main <- ""
}
if (!"border" %in% names(Args)) {
Args$border <- NA
}
if ("fontsize" %in% names(Args)) {
trellis.par.set(fontsize = list(text = Args$fontsize))
}
if (!"key.header" %in% names(Args)) {
if (statistic == "frequency") Args$key.header <- "% trajectories"
if (statistic == "pscf") Args$key.header <- "PSCF \nprobability"
if (statistic == "sqtba") Args$key.header <- paste0("SQTBA \n", pollutant)
if (statistic == "sqtba" && !is.na(.combine)) {
Args$key.header <- paste0("SQTBA \n(Normalised)\n)", pollutant)
}
if (statistic == "difference") {
Args$key.header <- quickText(paste("gridded differences", "\n(", percentile, "th percentile)", sep = ""))
}
}
if (!"key.footer" %in% names(Args)) {
Args$key.footer <- ""
}
## xlim and ylim set by user
if (!"xlim" %in% names(Args)) {
Args$xlim <- range(mydata$lon)
# tweak for SQTBA
Args$xlim <- c(
round(quantile(mydata$lon, probs = 0.002)),
round(quantile(mydata$lon, probs = 0.998))
)
}
if (!"ylim" %in% names(Args)) {
Args$ylim <- range(mydata$lat)
# tweak for SQTBA
Args$ylim <- c(
round(quantile(mydata$lat, probs = 0.002)),
round(quantile(mydata$lat, probs = 0.998))
)
}
## extent of data (or limits set by user) in degrees
trajLims <- c(Args$xlim, Args$ylim)
## need *outline* of boundary for map limits
Args <- setTrajLims(mydata, Args, projection, parameters, orientation)
Args$trajStat <- statistic
if (!"method" %in% names(Args)) {
method <- "traj"
} else {
method <- Args$method
statistic <- "XX" ## i.e. it wont touch the data
}
## location of receptor for map projection, used to show location on maps
origin_xy <- mydata %>%
filter(hour.inc == 0) %>%
group_by(lat, lon) %>%
slice_head(n = 1)
tmp <- mapproject(
x = origin_xy[["lon"]],
y = origin_xy[["lat"]],
projection = projection,
parameters = parameters,
orientation = orientation
)
receptor <- tibble(x = tmp$x, y = tmp$y)
if (method == "hexbin") {
## transform data for map projection
tmp <- mapproject(
x = mydata[["lon"]],
y = mydata[["lat"]],
projection = projection,
parameters = parameters,
orientation = orientation
)
mydata[["lon"]] <- tmp$x
mydata[["lat"]] <- tmp$y
}
if (method == "density") stop("Use trajPlot with method = 'density' instead")
if (is.list(mydata)) mydata <- bind_rows(mydata)
mydata <- cutData(mydata, type, ...)
## bin data
if (method == "traj") {
mydata$ygrid <- round_any(mydata[[lat]], lat.inc)
mydata$xgrid <- round_any(mydata[[lon]], lon.inc)
} else {
mydata$ygrid <- mydata[[lat]]
mydata$xgrid <- mydata[[lon]]
}
rhs <- c("xgrid", "ygrid", type)
rhs <- paste(rhs, collapse = "+")
if (statistic == "sqtba") {
mydata <- mydata %>%
select(any_of(na.omit(c("date", "lon", "lat", "hour.inc", type, pollutant, .combine))))
} else {
mydata <- mydata[, c("date", "xgrid", "ygrid", "hour.inc", type, pollutant)]
ids <- which(names(mydata) %in% c("xgrid", "ygrid", type))
}
# grouping variables
vars <- c("xgrid", "ygrid", type)
## plot mean concentration - CWT method
if (statistic %in% c("cwt", "median")) {
## calculate the mean of points in each cell
mydata <- mydata %>%
group_by(across(vars)) %>%
summarise(
N = length(date),
date = head(date, 1),
count = mean(.data[[pollutant]], na.rm = TRUE)
)
mydata[[pollutant]] <- mydata$count
## adjust at edges
id <- which(mydata$N > 20 & mydata$N <= 80)
mydata[id, pollutant] <- mydata[id, pollutant] * 0.7
id <- which(mydata$N > 10 & mydata$N <= 20)
mydata[id, pollutant] <- mydata[id, pollutant] * 0.42
id <- which(mydata$N <= 10)
mydata[id, pollutant] <- mydata[id, pollutant] * 0.05
attr(mydata$date, "tzone") <- "GMT" ## avoid warning messages about TZ
# prep output data
out_data <- dplyr::ungroup(mydata) %>%
dplyr::select(-dplyr::any_of(c("date", "count"))) %>%
dplyr::rename("count" = "N") %>%
dplyr::mutate(
statistic = statistic,
.before = dplyr::everything()
)
}
## plot trajectory frequecies
if (statistic == "frequency" && method != "hexbin") {
## count % of times a cell contains a trajectory point
## need date for later use of type
mydata <- mydata %>%
group_by(across(vars)) %>%
summarise(count = length(date), date = head(date, 1))
mydata[[pollutant]] <- 100 * mydata$count / max(mydata$count)
# prep output data
out_data <- dplyr::ungroup(mydata) %>%
dplyr::select(-dplyr::any_of(c("date"))) %>%
dplyr::mutate(
statistic = statistic,
.before = dplyr::everything()
)
}
## Poential Source Contribution Function
if (statistic == "pscf") {
## high percentile
Q90 <- quantile(mydata[[pollutant]], probs = percentile / 100, na.rm = TRUE)
## calculate the proportion of points in cell with value > Q90
mydata <- mydata %>%
group_by(across(vars)) %>%
summarise(
N = length(date),
date = head(date, 1),
count = length(which(.data[[pollutant]] > Q90)) / N
)
mydata[[pollutant]] <- mydata$count
## ## adjust at edges
n <- mean(mydata$N)
id <- which(mydata$N > n & mydata$N <= 2 * n)
mydata[id, pollutant] <- mydata[id, pollutant] * 0.75
id <- which(mydata$N > (n / 2) & mydata$N <= n)
mydata[id, pollutant] <- mydata[id, pollutant] * 0.5
id <- which(mydata$N <= (n / 2))
mydata[id, pollutant] <- mydata[id, pollutant] * 0.15
# prep output data
out_data <- dplyr::ungroup(mydata) %>%
dplyr::select(-dplyr::any_of(c("date", "count"))) %>%
dplyr::rename("count" = "N") %>%
dplyr::mutate(
statistic = statistic,
percentile = percentile,
.before = dplyr::everything()
)
}
# simplified quantitative transport bias analysis ------------------------
if (tolower(statistic) == "sqtba") {
# calculate sigma
mydata <- mydata %>%
mutate(sigma = sigma * abs(hour.inc)) %>%
drop_na({{ pollutant }})
# receptor grid
# use trajectory data to determine grid size - don't go to extremes
r_grid <- expand_grid(
lat = seq(round(quantile(mydata$lat, probs = 0.002)),
round(quantile(mydata$lat, probs = 0.998)),
by = lat.inc
),
lon = seq(round(quantile(mydata$lon, probs = 0.002)),
round(quantile(mydata$lon, probs = 0.998)),
by = lon.inc
)
) %>%
as.matrix(.)
# just run
if (is.na(.combine)) {
mydata <- calc_SQTBA(mydata, r_grid, pollutant, min.bin) %>%
rename({{ pollutant }} := SQTBA)
} else {
# process by site, normalise contributions by default
mydata <- mydata %>%
group_by(across(.combine)) %>%
group_modify(~ calc_SQTBA(.x, r_grid, pollutant, min.bin)) %>%
mutate(SQTBA_norm = SQTBA / mean(SQTBA)) %>%
group_by(ygrid, xgrid) %>%
summarise(
SQTBA = mean(SQTBA),
SQTBA_norm = mean(SQTBA_norm)
) %>%
ungroup() %>%
mutate(SQTBA_norm = SQTBA_norm * mean(SQTBA)) %>%
rename({{ pollutant }} := SQTBA_norm)
}
# prep output data
names(mydata)[names(mydata) == "n"] <- "count"
out_data <- dplyr::ungroup(mydata) %>%
dplyr::select(-dplyr::any_of(c("lat_rnd", "lon_rnd", "Q", "Q_c", "SQTBA"))) %>%
dplyr::relocate(dplyr::any_of("count"), .before = pollutant) %>%
dplyr::relocate("xgrid", .before = "ygrid") %>%
dplyr::mutate(
statistic = statistic,
sigma = sigma,
combine = .combine,
.before = dplyr::everything()
)
}
## plot trajectory frequency differences e.g. top 10% concs cf. mean
if (statistic == "difference") {
## calculate percentage of points for all data
base <- mydata %>%
group_by(across(vars)) %>%
summarise(count = length(date), date = head(date, 1))
base[[pollutant]] <- 100 * base$count / max(base$count)
## high percentile
Q90 <- quantile(mydata[[pollutant]], probs = percentile / 100, na.rm = TRUE)
## calculate percentage of points for high data
high <- mydata %>%
group_by(across(vars)) %>%
summarise(
N = length(date),
date = head(date, 1),
count = length(which(.data[[pollutant]] > Q90))
)
high[[pollutant]] <- 100 * high$count / max(high$count)
## calculate percentage absolute difference
mydata <- base
mydata[[pollutant]] <- high[[pollutant]] - mydata[[pollutant]]
## select only if > min.bin points in grid cell
mydata <- subset(mydata, count >= min.bin)
# prep output data
out_data <- dplyr::ungroup(mydata) %>%
dplyr::select(-dplyr::any_of(c("date"))) %>%
dplyr::mutate(
statistic = statistic,
percentile = percentile,
.before = dplyr::everything()
)
}
## change x/y names to gridded values
lon <- "xgrid"
lat <- "ygrid"
## the plot, note k is the smoothing parameter when surface smooth fitted
scatterPlot.args <- list(
mydata,
x = lon, y = lat, z = pollutant,
type = type, method = method, smooth = smooth,
map = map, x.inc = lon.inc, y.inc = lat.inc,
map.fill = map.fill, map.res = map.res,
map.cols = map.cols, map.alpha = map.alpha, traj = TRUE,
projection = projection,
parameters = parameters, orientation = orientation,
grid.col = grid.col, trajLims = trajLims,
receptor = receptor, origin = origin, dist = 0.05, k = 50
)
## reset for Args
scatterPlot.args <- listUpdate(scatterPlot.args, Args)
scatterPlot.args <- listUpdate(scatterPlot.args, list(plot = plot))
## plot
plt <- do.call(scatterPlot, scatterPlot.args)
output <-
list(plot = plt$plot,
data = if (exists("out_data")) {out_data} else {mydata},
call = match.call())
class(output) <- "openair"
invisible(output)
}
# SQTBA functions
# use matrices for speed; Haversine distances away from Gaussian plume
# centreline
pred_Q <- function(i, traj_data, r_grid, pollutant) {
x_origin <- traj_data$lon[i]
y_origin <- traj_data$lat[i]
Q <- numeric(nrow(r_grid))
Q_c <- Q
# only calculate near receptor (within 4 degrees)
id <- which(r_grid[, "lat"] > y_origin - 4 &
r_grid[, "lat"] < y_origin + 4 &
r_grid[, "lon"] > x_origin - 4 &
r_grid[, "lon"] < x_origin + 4)
# subtract 1E-7 from acos - numerical imprecision results in value being greater than 1
dist <- acos(sin(y_origin * pi / 180) * sin(r_grid[id, "lat"] * pi / 180) +
cos(y_origin * pi / 180) *
cos(r_grid[id, "lat"] * pi / 180) *
cos(r_grid[id, "lon"] * pi / 180 - x_origin * pi / 180) - 1e-7) * 6378.137
Q[id] <- (1 / traj_data$sigma[i]^2) * exp(-0.5 * (dist / traj_data$sigma[i])^2)
Q_c[id] <- Q[id] * traj_data[[pollutant]][i]
cbind(Q, Q_c)
}
make_grid <- function(traj_data, r_grid, pollutant) {
# go through points on back trajectory
# makes probability surface
out_grid <- purrr::map(
2:max(abs(traj_data$hour.inc)),
pred_Q, traj_data, r_grid, pollutant
)
out_grid <- reduce(out_grid, `+`) / length(out_grid)
return(out_grid)
}
calc_SQTBA <- function(mydata, r_grid, pollutant, min.bin) {
q_calc <- mydata %>%
select(date, lat, lon, hour.inc, {{ pollutant }}, sigma) %>%
group_by(date) %>%
nest() %>%
mutate(out = purrr::map(data, make_grid, r_grid, pollutant))
# combine all trajectory grids, add coordinates back in
output <- reduce(q_calc$out, `+`) %>%
cbind(r_grid) %>%
as_tibble(.) %>%
mutate(
SQTBA = Q_c / Q,
lat_rnd = round(lat),
lon_rnd = round(lon)
)
# number of trajectories in a grid square
grid_out <- mydata %>%
mutate(lat_rnd = round(lat), lon_rnd = round(lon)) %>%
group_by(lat_rnd, lon_rnd) %>%
tally() %>%
filter(n >= min.bin)
# add in number of trajectories in each grid square to act as a filter
mydata <- inner_join(output, grid_out, by = c("lat_rnd", "lon_rnd"))
mydata <- rename(mydata, xgrid = lon, ygrid = lat)
return(mydata)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.