##' Diurnal, day of the week and monthly variation
##'
##' Plots the diurnal, day of the week and monthly variation for
##' different variables, typically pollutant concentrations. Four
##' separate plots are produced.
##'
##' The variation of pollutant concentrations by hour of the day and
##' day of the week etc. can reveal many interesting features that
##' relate to source types and meteorology. For traffic sources, there
##' are often important differences in the way traffic volumes vary by
##' vehicle type, e.g. fewer heavy vehicles at weekends.
##'
##' The \code{timeVariation} function makes it easy to see how
##' concentrations (and many other variable types) vary by hour of the
##' day and day of the week.
##'
##' The plots also show the 95\% confidence intervals in the mean,
##' calculated through bootstrap simulations. Bootstrapping provides
##' more robust estimates of the confidence intervals, particularly
##' when there are relatively few data.
##'
##' The function can handle multiple pollutants and uses the flexible
##' \code{type} option to provide separate panels for each 'type' ---
##' see \code{cutData} for more details. \code{timeVariation} can also
##' accept a \code{group} option which is useful if data are
##' stacked. This will work in a similar way to having multiple
##' pollutants in separate columns.
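##'
##' For example, a minimal sketch of both options (the stacked data
##' frame and its \code{site} column are hypothetical):
##' \preformatted{
##' # one panel per season
##' timeVariation(mydata, pollutant = "nox", type = "season")
##'
##' # stacked data: one line per level of 'site' in each panel
##' # timeVariation(stacked, pollutant = "nox", group = "site")
##' }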
##'
##' The user can supply their own \code{ylim} e.g. \code{ylim = c(0,
##' 200)} that will be used for all plots. \code{ylim} can also be a
##' list of length four to control the y-limits on each individual
##' plot e.g. \code{ylim = list(c(-100,500), c(200, 300), c(-400,400),
##' c(50,70))}. These pairs correspond to the hour, weekday, month and
##' day-hour plots respectively.
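##'
##' For example, a sketch with one range shared by all plots and with a
##' separate range for each plot (the values shown are arbitrary):
##' \preformatted{
##' timeVariation(mydata, pollutant = "nox", ylim = c(0, 300))
##' timeVariation(mydata, pollutant = "nox",
##'               ylim = list(c(0, 300), c(0, 200), c(0, 200), c(0, 300)))
##' }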
##'
##' The option \code{difference} will calculate the difference in
##' means of two pollutants together with bootstrap estimates of the
##' 95\% confidence intervals in the difference in the mean. This works
##' in two ways: either two pollutants are supplied in separate
##' columns e.g. \code{pollutant = c("no2", "o3")}, or there are two
##' unique values of \code{group}. The difference is calculated as the
##' second pollutant minus the first and is labelled as
##' such. Considering differences in this way can provide many useful
##' insights and is particularly useful for model evaluation when
##' information is needed about where a model differs from
##' observations over different time scales. The manual contains
##' various examples of using \code{difference = TRUE}.
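##'
##' A sketch of the two forms (see also the examples below):
##' \preformatted{
##' # two pollutants in separate columns: plots "o3 - no2"
##' timeVariation(mydata, pollutant = c("no2", "o3"), difference = TRUE)
##'
##' # two levels of a grouping variable, e.g. a 'split.by' column
##' # created by splitByDate():
##' # timeVariation(mydata, pollutant = "no2", group = "split.by",
##' #               difference = TRUE)
##' }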
##'
##' Note also that the \code{timeVariation} function works well on a
##' subset of data and in conjunction with other plots. For example, a
##' \code{\link{polarPlot}} may highlight an interesting feature for a
##' particular wind speed/direction range. By filtering for those
##' conditions \code{timeVariation} can help determine whether the
##' temporal variation of that feature differs from other features ---
##' and help with source identification.
##'
##' In addition, \code{timeVariation} will work well with other variables if
##' available. Examples include meteorological and traffic flow data.
##'
##' Depending on the choice of statistic, a subheading is added. Users
##' can control the text in the subheading through the use of
##' \code{sub} e.g. \code{sub = ""} will remove any subheading.
##'
##' @param mydata A data frame of hourly (or higher temporal resolution) data.
##' Must include a \code{date} field and at least one variable to plot.
##' @param pollutant Name of variable to plot. Two or more pollutants can be
##' plotted, in which case a form like \code{pollutant = c("nox", "co")}
##' should be used.
##' @param local.tz Should the results be calculated in local time
##' that includes a treatment of daylight savings time (DST)? The
##' default is not to consider DST issues, provided the data were
##' imported without a DST offset. Emissions activity tends to occur
##' at local time e.g. rush hour is at 8 am every day. When the clocks
##' go forward in spring, the emissions are effectively released into
##' the atmosphere typically 1 hour earlier during the summertime
##' i.e. when DST applies. When plotting diurnal profiles, this has
##' the effect of \dQuote{smearing-out} the concentrations. Sometimes,
##' a useful approach is to express time as local time. This
##' correction tends to produce better-defined diurnal profiles of
##' concentration (or other variables) and allows a better comparison
##' to be made with emissions/activity data. If set to \code{FALSE}
##' then GMT is used. Examples of usage include \code{local.tz =
##' "Europe/London"}, \code{local.tz = "America/New_York"}. See
##' \code{cutData} and \code{import} for more details.
##' @param normalise Should variables be normalised? The default is
##' \code{FALSE}. If \code{TRUE} then the variable(s) are divided by their
##' mean values. This helps to compare the shape of the diurnal trends for
##' variables on very different scales.
##' @param xlab x-axis labels, one for each sub-plot: the hour, day-hour,
##' month and weekday plots respectively.
##' @param name.pol Names to be given to the pollutant(s). This is useful if
##' you want to give a fuller description of the variables, maybe also
##' including subscripts etc.
##' @param type \code{type} determines how the data are split
##' i.e. conditioned, and then plotted. The default will produce a
##' single plot using the entire data. \code{type} can be one of the built-in
##' types as detailed in \code{cutData} e.g. \dQuote{season},
##' \dQuote{year}, \dQuote{weekday} and so on. For example, \code{type
##' = "season"} will produce four plots --- one for each season.
##'
##' It is also possible to choose \code{type} as another variable in the data
##' frame. If that variable is numeric, then the data will be split into four
##' quantiles (if possible) and labelled accordingly. If type is an existing
##' character or factor variable, then those categories/levels will be used
##' directly. This offers great flexibility for understanding the variation
##' of different variables and how they depend on one another.
##'
##' Only one \code{type} is allowed in \code{timeVariation}.
##' @param group This sets the grouping variable to be used. For example, if a
##' data frame has a column \code{site}, setting \code{group = "site"} will
##' plot all sites together in each panel. See examples below.
##' @param difference If two pollutants are chosen then setting
##' \code{difference = TRUE} will also plot the difference in means
##' between the two variables as \code{pollutant[2] -
##' pollutant[1]}. Bootstrap 95\% confidence intervals of the
##' difference in means are also calculated. A horizontal dashed line
##' is shown at y = 0.
##' @param statistic Can be \dQuote{mean} (default) or
##' \dQuote{median}. If the statistic is \sQuote{mean} then the mean
##' line and the 95\% confidence interval in the mean are plotted by
##' default. If the statistic is \sQuote{median} then the median line
##' is plotted together with the 5/95th and 25/75th
##' quantiles. Users can control the confidence intervals with
##' \code{conf.int}.
##' @param conf.int The confidence intervals to be plotted. If
##' \code{statistic = "mean"} then the confidence intervals in the
##' mean are plotted. If \code{statistic = "median"} then the
##' \code{conf.int} and \code{1 - conf.int} \emph{quantiles} are
##' plotted. \code{conf.int} can be of length 2, which is most useful
##' for showing quantiles. For example \code{conf.int = c(0.75, 0.99)}
##' will yield a plot showing the median, 25/75th and 1/99th quantiles.
##' @param B Number of bootstrap replicates to use. It can be useful to
##' reduce this value when there are a large number of observations,
##' which speeds up the calculations with little effect on the
##' confidence interval estimates.
##' @param ci Should confidence intervals be shown? The default is \code{TRUE}.
##' Setting this to \code{FALSE} can be useful if multiple pollutants are
##' chosen, where overlapping confidence intervals can make plots difficult to read.
##' @param cols Colours to be used for plotting. Options include
##' \dQuote{default}, \dQuote{increment}, \dQuote{heat}, \dQuote{jet}
##' and \code{RColorBrewer} colours --- see the \code{openair}
##' \code{openColours} function for more details. User-defined colours
##' can be supplied as a vector of colour names recognised by R (type
##' \code{colours()} to see the full list), for example
##' \code{cols = c("yellow", "green", "blue")}.
##' @param ref.y A list with details of the horizontal lines to be
##' added representing reference line(s). For example, \code{ref.y =
##' list(h = 50, lty = 5)} will add a dashed horizontal line at
##' 50. Several lines can be plotted e.g. \code{ref.y = list(h = c(50,
##' 100), lty = c(1, 5), col = c("green", "blue"))}. See
##' \code{panel.abline} in the \code{lattice} package for more details
##' on adding/controlling lines.
##' @param key By default \code{timeVariation} produces four plots on
##' one page. While it is useful to see these plots together, it is
##' sometimes necessary just to use one for a report. If \code{key} is
##' \code{TRUE}, a key is added to all plots allowing the extraction
##' of a single plot \emph{with} key. See below for an example.
##' @param key.columns Number of columns to be used in the key. With many
##' pollutants a single column can make the key too wide. The user can
##' therefore use several columns by setting \code{key.columns} to be less
##' than the number of pollutants.
##' @param start.day What day of the week should the plots start on?
##' The user can change the start day by supplying an integer between
##' 0 and 6. Sunday = 0, Monday = 1, \ldots For example to start the
##' weekday plots on a Saturday, choose \code{start.day = 6}.
##' @param auto.text Either \code{TRUE} (default) or \code{FALSE}. If
##' \code{TRUE} titles and axis labels will automatically try and format
##' pollutant names and units properly e.g. by subscripting the \sQuote{2} in NO2.
##' @param alpha The alpha transparency used for plotting confidence intervals.
##' 0 is fully transparent and 1 is opaque. The default is 0.4.
##' @param ... Other graphical parameters passed onto \code{lattice::xyplot}
##' and \code{cutData}. For example, in the case of \code{cutData} the option
##' \code{hemisphere = "southern"}.
##'
##' @import ggplot2
##' @importFrom lubridate ymd
##' @importFrom tidyr gather spread
##' @importFrom dplyr group_by group_by_ do bind_rows one_of %>%
##' @export
##' @return As well as generating the plot itself, \code{timeVariation} also
##' returns an object of class ``openair''. The object includes three main
##' components: \code{call}, the command used to generate the plot;
##' \code{data}, the data used to make the four components of the plot (or
##' subplots); and \code{plot}, the associated subplots. If retained, e.g.
##' using \code{output <- timeVariation(mydata, "nox")}, this output can be
##' used to recover the data, reproduce or rework the original plot or
##' undertake further analysis.
##'
##' An openair output can be manipulated using a number of generic operations,
##' including \code{print}, \code{plot} and \code{summary}.
##'
##' The four components of \code{timeVariation} are: \code{day.hour}, \code{hour},
##' \code{day} and \code{month}. Associated data frames can be extracted
##' directly using the \code{subset} option, e.g. as in \code{plot(object,
##' subset = "day.hour")}, \code{summary(output, subset = "hour")}, etc., for
##' \code{output <- timeVariation(mydata, "nox")}.
##' @author David Carslaw
##' @seealso \code{\link{polarPlot}}, \code{\link{linearRelation}}
##' @keywords methods
##' @examples
##'
##'
##' # basic use
##' timeVariation(mydata, pollutant = "nox")
##'
##' # for a subset of conditions
##' \dontrun{
##' timeVariation(subset(mydata, ws > 3 & wd > 100 & wd < 270),
##' pollutant = "pm10", ylab = "pm10 (ug/m3)")
##' }
##'
##' # multiple pollutants with concentrations normalised
##' \dontrun{timeVariation(mydata, pollutant = c("nox", "co"), normalise = TRUE)}
##'
##' # show BST/GMT variation (see ?cutData for more details)
##' # the NOx plot shows the profiles are very similar when expressed in
##' # local time, showing that the profile is dominated by a local source
##' # that varies by local time and not by GMT i.e. road vehicle emissions
##'
##' \dontrun{timeVariation(mydata, pollutant = "nox", type = "dst", local.tz = "Europe/London")}
##'
##' ## In this case it is better to group the results for clarity:
##' \dontrun{timeVariation(mydata, pollutant = "nox", group = "dst", local.tz = "Europe/London")}
##'
##' # By contrast, a variable such as wind speed shows a clear shift when
##' # expressed in local time. These two plots can help show whether the
##' # variation is dominated by man-made influences or natural processes
##'
##' \dontrun{timeVariation(mydata, pollutant = "ws", group = "dst", local.tz = "Europe/London")}
##'
##' ## It is also possible to plot several variables and set type. For
##' ## example, consider the NOx and NO2 split by levels of O3:
##'
##' \dontrun{timeVariation(mydata, pollutant = c("nox", "no2"), type = "o3", normalise = TRUE)}
##'
##' ## difference in concentrations
##' \dontrun{timeVariation(mydata, pollutant = c("pm25", "pm10"), difference = TRUE)}
##'
##' # It is also useful to consider how concentrations vary by
##' # considering two different periods e.g. in intervention
##' # analysis. In the following plot NO2 has clearly increased but much
##' # less so at weekends - perhaps suggesting vehicles other than cars
##' # are important because flows of cars are approximately invariant by
##' # day of the week
##'
##' \dontrun{
##' mydata <- splitByDate(mydata, dates = "1/1/2003",
##'                       labels = c("Before Jan. 2003", "After Jan. 2003"))
##' timeVariation(mydata, pollutant = "no2", group = "split.by", difference = TRUE)
##' }
##'
##' ## sub plots can be extracted from the openair object
##' \dontrun{
##' myplot <- timeVariation(mydata, pollutant = "no2")
##' plot(myplot, subset = "day.hour") # top weekday and hour plot
##' }
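##'
##' ## as described for the 'key' option, adding a key to each plot means a
##' ## single extracted plot keeps its key (a sketch)
##' \dontrun{
##' myplot <- timeVariation(mydata, pollutant = c("nox", "no2"), key = TRUE)
##' plot(myplot, subset = "hour")
##' }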
##'
##' ## individual plots
##' ## plot(myplot, subset="day.hour") for the weekday and hours subplot (top)
##' ## plot(myplot, subset="hour") for the diurnal plot
##' ## plot(myplot, subset="day") for the weekday plot
##' ## plot(myplot, subset="month") for the monthly plot
##'
##' ## numerical results (mean, lower/upper uncertainties)
##' ## results(myplot, subset = "day.hour") # the weekday and hour data set
##' ## summary(myplot, subset = "hour") # summary of hour data set
##' ## head(myplot, subset = "day") # head/top of day data set
##' ## tail(myplot, subset = "month") # tail/bottom of month data set
##'
##' ## plot quantiles and median
##' \dontrun{
##' timeVariation(mydata, statistic = "median", pollutant = "pm10",
##'               cols = "firebrick")
##'
##' ## with different intervals
##' timeVariation(mydata, statistic = "median", pollutant = "pm10",
##'               conf.int = c(0.75, 0.99), cols = "firebrick")
##' }
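##'
##' ## give the pollutants fuller names for labelling and start the
##' ## weekday-based plots on a Saturday (a sketch; any labels can be used)
##' \dontrun{
##' timeVariation(mydata, pollutant = c("nox", "no2"),
##'               name.pol = c("NOx (as NO2)", "NO2"), start.day = 6)
##' }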
##'
timeVariation <- function(mydata, pollutant = "nox", local.tz = NULL,
normalise = FALSE, xlab = c("hour", "hour", "month",
"weekday"), name.pol = pollutant,
type = "default", group = NULL, difference = FALSE,
statistic = "mean", conf.int = 0.95, B = 100, ci = TRUE, cols = "hue",
ref.y = NULL, key = NULL, key.columns = 1, start.day = 1,
auto.text = TRUE, alpha = 0.4, ...) {
## get rid of R check annoyances
variable = NULL
## Args setup
Args <- list(...)
## label controls
## xlab handled in formals and code because unique
Args$ylab <- if ("ylab" %in% names(Args))
quickText(Args$ylab, auto.text) else
quickText(paste(pollutant, collapse=", "), auto.text)
Args$main <- if ("main" %in% names(Args))
quickText(Args$main, auto.text) else quickText("", auto.text)
if (statistic == "median" && missing(conf.int)) conf.int <- c(0.75, 0.95)
## sub heading stat info
if (statistic == "mean") {
sub.text <- paste("mean and ", 100 * conf.int[1], "% confidence interval in mean",
sep = "")
} else {
if (length(conf.int) == 1L) {
sub.text <- paste("median and ", 100 * (1 - conf.int[1]), "/", 100 * conf.int[1],
"th quantiles", sep = "")
} else {
sub.text <- paste("median, ", 100 * (1 - conf.int[1]), "/", 100 * conf.int[1],
" and ", 100 * (1 - conf.int[2]), "/", 100 * conf.int[2],
"th quantiles", sep = "")
}
}
Args$sub <- if ("sub" %in% names(Args))
quickText(Args$sub, auto.text) else quickText(sub.text, auto.text)
Args$lwd <- if ("lwd" %in% names(Args)) Args$lwd else 1.5
ylim.handler <- if ("ylim" %in% names(Args))
FALSE else TRUE
## if user supplies separate ylims for each plot
ylimList <- FALSE
if ("ylim" %in% names(Args)) {
if (is.list(Args$ylim)) {
if (length(Args$ylim) != 4) stop("ylim should be a list of 4")
ylim.list <- Args$ylim
ylimList <- TRUE
}
}
vars <- c("date", pollutant)
# various check to make sure all the data are available ######################################
if (!missing(group) & length(pollutant) > 1) {
stop("Can only have one pollutant with a grouping variable, or several pollutants and no grouping variable.")}
## only one type for now
if (length(type) > 1) stop("Can only have one type for timeVariation.")
if (type %in% pollutant) stop("Cannot have type the same as a pollutant name.")
if (!missing(group)) {
if (group %in% pollutant) stop("Cannot have group the same as a pollutant name.")
}
## if differences between two pollutants are calculated
if (difference) {
if (missing(group)) {
if (length(pollutant) != 2)
stop("Need to specify two pollutants to calculate their difference.")
}
if (!missing(group)) {
if (length(unique(mydata[ , group])) != 2)
stop("Need to specify two pollutants to calculate their difference.")
}
}
## statistic check
if (!statistic %in% c("mean", "median")) {
statistic <- "mean"
warning("statistic should be 'mean' or 'median', setting it to 'mean'.")
}
## check to see if type = "variable" (word used in code, so change)
if (type == "variable") {
mydata <- rename(mydata, c(variable = "tempVar"))
type <- "tempVar"
}
## if group is present, need to add that list of variables
if (!missing(group)){
if (group %in% dateTypes) {
vars <- unique(c(vars, "date"))
} else {
vars <- unique(c(vars, group))
}
}
## data checks
mydata <- checkPrep(mydata, vars, type, remove.calm = FALSE)
if (!missing(group)) mydata <- cutData(mydata, group, local.tz = local.tz, ...)
mydata <- cutData(mydata, type, local.tz = local.tz, ...)
# if using a type, make sure there are no NA
if (type != "default") {
id <- which(is.na(mydata[[type]]))
if (length(id) > 0) mydata <- mydata[-id, ]
}
## put in local time if needed
if (!is.null(local.tz)) attr(mydata$date, "tzone") <- local.tz
## title for overall and individual plots
overall.main <- Args$main
Args$main <- ""
overall.sub <- Args$sub
Args$sub <- ""
## labels for pollutants, can be user-defined, special handling when difference = TRUE
poll.orig <- pollutant
if (difference && missing(group)) pollutant <- c(pollutant, paste(pollutant[2], "-", pollutant[1]))
if (missing(name.pol)) mylab <- sapply(seq_along(pollutant), function(x)
quickText(pollutant[x], auto.text))
if (!missing(name.pol)) mylab <- sapply(seq_along(name.pol), function(x)
quickText(name.pol[x], auto.text))
if (missing(group)) {
mydata <- gather(mydata, key = variable, value = value, one_of(poll.orig))
mydata$variable <- factor(mydata$variable, levels = pollutant) ## drop unused factor levels
} else {
## group needs to be 'variable' and pollutant 'value'
id <- which(names(mydata) == poll.orig)
names(mydata)[id] <- "value"
id <- which(names(mydata) == group)
names(mydata)[id] <- "variable"
mydata$variable <- factor(mydata$variable) ## drop unused factor levels
the.names <- levels(mydata[["variable"]])
if (difference)
the.names <- c(the.names, paste(levels(mydata$variable)[2], "-",
levels(mydata$variable)[1]))
mylab <- sapply(the.names, function(x) quickText(x, auto.text))
}
## function to normalise
divide.by.mean <- function(x) {
Mean <- mean(x$Mean, na.rm = TRUE)
x$Mean <- x$Mean / Mean
x$Lower <- x$Lower / Mean
x$Upper <- x$Upper / Mean
x
}
if (normalise) Args$ylab <- "normalised level"
## get days in right order
days <- format(ISOdate(2000, 1, 2:8), "%A")
days.abb <- format(ISOdate(2000, 1, 2:8), "%a")
if (start.day < 0 || start.day > 6) stop ("start.day must be between 0 and 6.")
if (start.day > 0) {
day.ord <- c(days[(1 + start.day):7], days[1:(1 + start.day - 1)])
day.ord.abb <- c(days.abb[(1 + start.day):7], days.abb[1:(1 + start.day - 1)])
} else {
day.ord <- days
day.ord.abb <- days.abb
}
## calculate temporal components
mydata <- within(mydata, {
wkday <- format(date, "%A")
wkday <- ordered(wkday, levels = day.ord)
hour <- as.numeric(format(date, "%H"))
mnth <- as.numeric(format(date, "%m"))}
)
## y range taking account of expanded uncertainties
rng <- function(x) {
## if no CI information, just return
if (all(is.na(x[, c("Lower", "Upper")]))) {
lims <- NULL
return(lims)
}
if (ci) {
lims <- range(c(x$Lower, x$Upper), na.rm = TRUE)
inc <- 0.04 * abs(lims[2] - lims[1])
lims <- c(lims[1] - inc, lims[2] + inc)
} else {
lims <- range(x$Mean, na.rm = TRUE)
inc <- 0.04 * abs(lims[2] - lims[1])
lims <- c(lims[1] - inc, lims[2] + inc)
if (diff(lims) == 0) lims <- NULL
}
lims
}
npol <- length(levels(mydata$variable)) ## number of pollutants
if (difference) {
npol <- 3 ## 3 pollutants if difference considered
if (missing(group)) {
poll1 <- pollutant[1]
poll2 <- pollutant[2]
} else {
poll1 <- levels(mydata$variable)[1]
poll2 <- levels(mydata$variable)[2]
}
}
## number of columns for key
if (missing(key.columns)) key.columns <- npol
myColors <- openColours(cols, npol)
## for individual plot keys - useful if only one of the plots is extracted after printing
if (!is.null(key)) {
key <- list(rectangles = list(col = myColors[1:npol], border = NA), title = "",
text = list(lab = mylab), space = "bottom", columns = key.columns,
lines.title = 1)
Args$main <- overall.main
}
# hour calculations ---------------------------------------------
if (difference) {
data.hour <- errorDiff(mydata, vars = "hour", type = type, poll1 = poll1,
poll2 = poll2, B = B, conf.int = conf.int)
} else {
data.hour <- plyr::ldply(conf.int, proc, mydata, vars = "hour",
pollutant, type, B = B,
statistic = statistic)
}
if (normalise) data.hour <- group_by(data.hour, variable) %>%
do(divide.by.mean(.))
if (is.null(xlab[1]) | is.na(xlab[1])) xlab[1] <- "hour"
temp <- paste(type, collapse = "+")
myform <- formula(paste("Mean ~ hour | ", temp, sep = ""))
## ylim handler
if (ylim.handler)
Args$ylim <- rng(data.hour)
## user supplied separate ylim
if (ylimList)
Args$ylim <- ylim.list[[1]]
## plot
pltHr <- ggplot(data.hour, aes(x = hour, y = Mean, ymin = Lower, ymax = Upper,
col = variable, fill = variable))
# plot uncertainty intervals?
if (ci) pltHr <- pltHr +
geom_ribbon(alpha = alpha, colour = NA)
pltHr <- pltHr +
geom_line(size = Args$lwd) +
scale_x_continuous(breaks = c(0, 6, 12, 18, 23),
expand = c(0.02, 0),
minor_breaks = seq(0, 23, 3)) +
scale_colour_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
scale_fill_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
theme(legend.position = "none") +
ylab(Args$ylab) +
xlab(xlab[1]) +
ylim(Args$ylim)
# weekday -------------------------------------------------------
if (difference) {
data.weekday <- errorDiff(mydata, vars = "wkday", type = type, poll1 = poll1,
poll2 = poll2, B = B, conf.int = conf.int)
} else {
data.weekday <- plyr::ldply(conf.int, proc, mydata, vars = "wkday",
pollutant, type, B = B,
statistic = statistic)
}
if (normalise) data.weekday <- group_by(data.weekday, variable) %>%
do(divide.by.mean(.))
data.weekday$wkday <- ordered(data.weekday$wkday, levels = day.ord)
data.weekday$wkday <- as.numeric(as.factor(data.weekday$wkday))
if (is.null(xlab[4]) | is.na(xlab[4])) xlab[4] <- "weekday"
temp <- paste(type, collapse = "+")
myform <- formula(paste("Mean ~ wkday | ", temp, sep = ""))
## ylim handler
if (ylim.handler)
Args$ylim <- rng(data.weekday)
## user supplied separate ylim
if (ylimList)
Args$ylim <- ylim.list[[2]]
## plot
pltWkday <- ggplot(data.weekday, aes(x = wkday, y = Mean,
xmin = wkday - 0.4, xmax = wkday + 0.4,
ymin = Lower, ymax = Upper,
col = variable, fill = variable))
if (ci) pltWkday <- pltWkday +
geom_rect(alpha = alpha, colour = NA)
pltWkday <- pltWkday +
geom_line(size = Args$lwd) +
scale_x_continuous(breaks = 1:7,
expand = c(0.02, 0),
minor_breaks = seq(0, 7, 1),
labels = day.ord.abb) +
scale_colour_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
scale_fill_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
theme(legend.position = "none") +
ylab(Args$ylab) +
xlab(xlab[4]) +
ylim(Args$ylim)
## month ############################################################################
if (difference) {
data.month <- errorDiff(mydata, vars = "mnth", type = type, poll1 = poll1,
poll2 = poll2, B = B, conf.int = conf.int)
} else {
data.month <- plyr::ldply(conf.int, proc, mydata, vars = "mnth",
pollutant, type, B = B,
statistic = statistic)
}
if (normalise) data.month <- group_by(data.month, variable) %>%
do(divide.by.mean(.))
if (is.null(xlab[3]) | is.na(xlab[3])) xlab[3] <- "month"
temp <- paste(type, collapse = "+")
myform <- formula(paste("Mean ~ mnth | ", temp, sep = ""))
## ylim handler
if (ylim.handler)
Args$ylim <- rng(data.month)
## user supplied separate ylim
if (ylimList)
Args$ylim <- ylim.list[[3]]
## plot
pltMnth <- ggplot(data.month, aes(x = mnth, y = Mean,
xmin = mnth - 0.4, xmax = mnth + 0.4,
ymin = Lower, ymax = Upper,
col = variable, fill = variable))
if (ci) pltMnth <- pltMnth +
geom_rect(alpha = alpha, colour = NA)
pltMnth <- pltMnth +
geom_line(size = Args$lwd) +
scale_x_continuous(breaks = 1:12,
expand = c(0.02, 0),
minor_breaks = seq(0, 12, 1),
labels = substr(format(seq(ymd("2000-01-01"),
ymd("2000-12-31"), "month"),
"%B"), 1, 1)) +
scale_colour_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
scale_fill_manual(values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
theme(legend.position = "none") +
ylab(Args$ylab) +
xlab(xlab[3]) +
ylim(Args$ylim)
# day and hour --------------------------------------------------
if (difference) {
data.day.hour <- errorDiff(mydata, vars = "day.hour", type = type, poll1 = poll1,
poll2 = poll2, B = B, conf.int = conf.int)
} else {
data.day.hour <- plyr::ldply(conf.int, proc, mydata, vars = "day.hour", pollutant,
type, B = B, statistic = statistic)
}
if (normalise) data.day.hour <- group_by(data.day.hour, variable) %>%
do(divide.by.mean(.))
ids <- which(is.na(data.day.hour$Lower)) ## missing Lower ci, set to mean
data.day.hour$Lower[ids] <- data.day.hour$Mean[ids]
ids <- which(is.na(data.day.hour$Upper)) ## missing Upper ci, set to mean
data.day.hour$Upper[ids] <- data.day.hour$Mean[ids]
if (is.null(xlab[2]) | is.na(xlab[2])) xlab[2] <- "hour"
## proper names of labelling #####################################################################
if (type != "default") {
stripName <- sapply(levels(factor(data.day.hour[[type]])), function(x) quickText(x, auto.text))
}
temp <- paste(type, collapse = "+")
if (type == "default") {
myform <- formula("Mean ~ hour | wkday")
} else {
myform <- formula(paste("Mean ~ hour | wkday *", temp, sep = ""))
}
## ylim handler
if (ylim.handler)
Args$ylim <- rng(data.day.hour)
## user supplied separate ylim
if (ylimList)
Args$ylim <- ylim.list[[4]]
## plot
pltDayHr <- ggplot(data.day.hour, aes(x = hour, y = Mean, ymin = Lower, ymax = Upper,
col = variable, fill = variable))
if (ci) pltDayHr <- pltDayHr +
geom_ribbon(alpha = alpha, colour = NA)
pltDayHr <- pltDayHr +
geom_line(size = Args$lwd) +
scale_x_continuous(breaks = c(0, 6, 12, 18, 23),
expand = c(0.02, 0),
minor_breaks = seq(0, 23, 3)) +
scale_colour_manual(name = NULL,
values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
scale_fill_manual(name = NULL,
values = openColours(cols, npol),
breaks = levels(data.hour$variable),
labels = mylab) +
theme(legend.position = "bottom", legend.text.align = 0) +
guides(col = guide_legend(ncol = key.columns)) +
facet_grid(~ wkday) +
ylab(Args$ylab) +
xlab(xlab[2]) +
ylim(Args$ylim)
subsets = c("pltDayHr", "pltHr", "pltwday", "pltmnth")
# for facetting
if (type != "default") {
pltHr <- pltHr + facet_grid(reformulate(type))
pltMnth <- pltMnth + facet_grid(reformulate(type))
pltWkday <- pltWkday + facet_grid(reformulate(type))
pltDayHr <- pltDayHr + facet_grid(paste(type, "~ wkday"))
}
# plot all 4 plots
layout <- matrix(c(1, 1, 1, 2, 3, 4), nrow = 2, byrow = TRUE)
multiplot(plotlist = list(pltDayHr, pltHr, pltMnth, pltWkday), layout = layout)
# return individual plots with legends
output <- list(plot = list(pltDayHr,
pltHr + theme(legend.position = "right", legend.text.align = 0),
pltWkday + theme(legend.position = "right", legend.text.align = 0),
pltMnth + theme(legend.position = "right", legend.text.align = 0),
subsets = subsets),
data = list(data.day.hour, data.hour, data.weekday,
data.month, subsets = subsets),
call = match.call())
names(output$data)[1:4] <- subsets
names(output$plot)[1:4] <- subsets
class(output) <- "openair"
#
invisible(output)
}
proc <- function(conf.int = conf.int, mydata, vars = "day.hour",
pollutant, type, B = B,
statistic = statistic) {
## get rid of R check annoyances
variable = value = NULL
if (statistic == "mean") {
stat <- bootMean
} else {
stat <- median.hilow
}
summary.values <- function(conf.int = conf.int, mydata, vars = vars, FUN, type = type, B = B,
statistic = statistic) {
if (vars == "hour") myform <- formula(paste("value ~ variable + hour +", type))
if (vars == "day.hour") myform <- formula(paste("value ~ variable + wkday + hour +", type))
if (vars == "wkday") myform <- formula(paste("value ~ variable + wkday +", type))
if (vars == "mnth") myform <- formula(paste("value ~ variable + mnth +", type))
mydata <- aggregate(myform, data = mydata, FUN, B = B, statistic = statistic,
conf.int = conf.int)
mydata
}
## function to calculate statistics dealing with wd properly
if (any(!pollutant %in% "wd")) {
data1 <- subset(mydata, variable != "wd")
data1 <- summary.values(data1, vars, stat, type, B = B, statistic = statistic,
conf.int = conf.int)
data1 <- data.frame(subset(data1, select = -value), data1$value)
}
if ("wd" %in% pollutant) {
if (length(pollutant) > 1) mydata <- subset(mydata, variable == "wd")
data2 <- summary.values(conf.int, mydata, vars, wd.smean.normal, type,
B = B, statistic = statistic)
data2 <- data.frame(subset(data2, select = -value), data2$value)
}
if (length(pollutant) > 1 & "wd" %in% pollutant) data2 <- bind_rows(data1, data2)
if (!"wd" %in% pollutant) data2 <- data1
if (length(pollutant) == 1 & "wd" %in% pollutant) data2 <- data2
data2$ci <- conf.int
data2
}
wd.smean.normal <- function(wd, B = B, statistic, conf.int) {
## function to calculate mean and 95% CI of the mean for wd
u <- mean(sin(pi * wd / 180), na.rm = TRUE)
v <- mean(cos(pi * wd / 180), na.rm = TRUE)
Mean <- as.vector(atan2(u, v) * 360 / 2 / pi)
ids <- which(Mean < 0) ## ids where wd < 0
Mean[ids] <- Mean[ids] + 360
## to calculate SD and conf int, need to know how much the angle changes from one point
## to the next. Also cannot be more than 180 degrees. Example change from 350 to 10 is not
## 340 but 20.
wd.diff <- diff(wd)
ids <- which(wd.diff < 0)
wd.diff[ids] <- wd.diff[ids] + 360
ids <- which(wd.diff > 180)
wd.diff[ids] <- abs(wd.diff[ids] - 360)
if (statistic == "mean") {
intervals <- bootMean(wd.diff, B = B, conf.int)
} else {
intervals <- median.hilow(wd.diff, conf.int)
}
Lower <- intervals[2]
names(Lower) <- NULL
Upper <- intervals[3]
names(Upper) <- NULL
diff.wd <- (Upper - Lower) / 2
c(Mean = Mean, Lower = Mean - diff.wd, Upper = Mean + diff.wd)
}
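## A quick sketch of the idea: for wd = c(350, 10) the vector-averaged
## Mean is 0 degrees (not the arithmetic mean of 180).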
errorDiff <- function(mydata, vars = "day.hour", poll1, poll2, type, B = B,
conf.int = conf.int) {
## bootstrap mean difference confidence intervals
## rearrange data
mydata <- spread(mydata, key = variable, value = value)
if (vars == "hour") splits <- c("hour", type)
if (vars == "day.hour") splits <- c("hour", "wkday", type)
if (vars == "wkday") splits <- c("wkday", type)
if (vars == "mnth") splits <- c("mnth", type)
## warnings from dplyr seem harmless FIXME
res <- suppressWarnings(group_by_(mydata, .dots = splits) %>%
do(bootMeanDiff(., x = poll1, y = poll2, B = B)))
# make sure we keep the order correct
res$variable <- ordered(res$variable, levels = res$variable[1:3])
res$ci <- conf.int[1]
res
}
## function to calculate median and lower/upper quantiles
median.hilow <- function (x, conf.int = 0.95, na.rm = TRUE, ...) {
quant <- quantile(x, probs = c(0.5, (1 - conf.int) / 2,
(1 + conf.int) / 2), na.rm = na.rm)
names(quant) <- c("Mean", "Lower", "Upper")
quant
}
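## A quick sketch (note the median is returned under the name "Mean"):
## median.hilow(1:100) gives c(Mean = 50.5, Lower = 3.475, Upper = 97.525)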
## make poly function ########################################
mkpoly <- function(dat, x = "hour", y = "Mean", group.number, myColors, alpha) {
len <- length(unique(dat$ci)) ## number of confidence intervals to consider
ci <- sort(unique(dat$ci))
fac <- 2
if (len == 1L) id1 <- which(dat$ci == ci[1]) ## ids of higher band
if (len == 2L) {
id1 <- which(dat$ci == ci[2])
id2 <- which(dat$ci == ci[1])
fac <- 1
}
poly.na(dat[[x]][id1], dat$Lower[id1], dat[[x]][id1], dat$Upper[id1],
group.number, myColors, alpha = fac * alpha / 2)
if (len == 2L) poly.na(dat[[x]][id2], dat$Lower[id2], dat[[x]][id2], dat$Upper[id2],
group.number, myColors, alpha = alpha)
}
mkrect <- function(dat, x = "wkday", y = "Mean", group.number, myColors, alpha) {
len <- length(unique(dat$ci)) ## number of confidence intervals to consider
ci <- sort(unique(dat$ci))
fac <- 2
if (len == 1L) id1 <- which(dat$ci == ci[1]) ## ids of higher band
if (len == 2L) {
id1 <- which(dat$ci == ci[2])
id2 <- which(dat$ci == ci[1])
fac <- 1
}
panel.rect(dat[[x]][id1] - 0.15 * fac, dat$Lower[id1], dat[[x]][id1] + 0.15 * fac,
dat$Upper[id1], fill = myColors[group.number],
border = NA, alpha = fac * alpha / 2)
if (len == 2L) panel.rect(dat[[x]][id2] - 0.3, dat$Lower[id2], dat[[x]][id2] + 0.3,
dat$Upper[id2], fill = myColors[group.number],
border = NA, alpha = alpha)
}