timePlot: Plot time series
In openair: Tools for the Analysis of Air Pollution Data

timePlot

R Documentation

Plot time series

Description

Plot time series quickly, perhaps for multiple pollutants, grouped or in separate panels.

Usage

timePlot(
  mydata,
  pollutant = "nox",
  group = FALSE,
  stack = FALSE,
  normalise = NULL,
  avg.time = "default",
  data.thresh = 0,
  statistic = "mean",
  percentile = NA,
  date.pad = FALSE,
  type = "default",
  cols = "brewer1",
  plot.type = "l",
  key = TRUE,
  log = FALSE,
  windflow = NULL,
  smooth = FALSE,
  ci = TRUE,
  y.relation = "same",
  ref.x = NULL,
  ref.y = NULL,
  key.columns = 1,
  key.position = "bottom",
  name.pol = pollutant,
  date.breaks = 7,
  date.format = NULL,
  auto.text = TRUE,
  plot = TRUE,
  ...
)

Arguments

`mydata`	A data frame of time series. Must include a `date` field and at least one variable to plot.
`pollutant`	Name of variable to plot. Two or more pollutants can be plotted, in which case a form like `pollutant = c("nox", "co")` should be used.
`group`	If more than one pollutant is chosen, should they all be plotted on the same graph together? The default is `FALSE`, which means they are plotted in separate panels with their own scales. If `TRUE` then they are plotted on the same plot with the same scale.
`stack`	If `TRUE` the time series will be stacked by year. This option can be useful if there are several years' worth of data, making it difficult to see much detail when plotted on a single plot.
`normalise`	Should variables be normalised? The default is not to normalise the data. `normalise` can take two values, either “mean” or a string representing a date in UK format e.g. "1/1/1998" (in the format dd/mm/YYYY). If `normalise = "mean"` then each time series is divided by its mean value. If a date is chosen, then values at that date are set to 100 and the rest of the data scaled accordingly. Choosing a date (say at the beginning of a time series) is very useful for showing how trends diverge over time. Setting `group = TRUE` is often useful too to show all time series together in one panel.
`avg.time`	This defines the time period to average to. Can be “sec”, “min”, “hour”, “day”, “DSTday”, “week”, “month”, “quarter” or “year”. For much greater flexibility, a number followed by a space can precede these options. For example, a 2-month average would be `avg.time = "2 month"`. See function `timeAverage` for further details on this.
`data.thresh`	The data capture threshold to use when aggregating the data using `avg.time`. A value of zero means that all available data will be used in a particular period regardless of the number of values available. Conversely, a value of 100 will mean that all data will need to be present for the average to be calculated, else it is recorded as `NA`. Not used if `avg.time = "default"`.
`statistic`	The statistic to apply when aggregating the data; default is the mean. Can be one of “mean”, “max”, “min”, “median”, “frequency”, “sd”, “percentile”. Note that “sd” is the standard deviation and “frequency” is the number (frequency) of valid records in the period. “percentile” is the percentile level between 0-100, which can be set using the “percentile” option - see below. Not used if `avg.time = "default"`.
`percentile`	The percentile level in percent used when `statistic = "percentile"` and when aggregating the data with `avg.time`. More than one percentile level is allowed for `type = "default"` e.g. `percentile = c(50, 95)`. Not used if `avg.time = "default"`.
`date.pad`	Should missing data be padded-out? This is useful where a data frame consists of two or more "chunks" of data with time gaps between them. By setting `date.pad = TRUE` the time gaps between the chunks are shown properly, rather than with a line connecting each chunk. For irregular data, set to `FALSE`. Note, this should not be set for `type` other than `default`.
`type`	`type` determines how the data are split, i.e. conditioned, and then plotted. The default will produce a single plot using the entire data. Type can be one of the built-in types as detailed in `cutData` e.g. “season”, “year”, “weekday” and so on. For example, `type = "season"` will produce four plots, one for each season. It is also possible to choose `type` as another variable in the data frame. If that variable is numeric, then the data will be split into four quantiles (if possible) and labelled accordingly. If type is an existing character or factor variable, then those categories/levels will be used directly. This offers great flexibility for understanding the variation of different variables and how they depend on one another. Only one `type` is currently allowed in `timePlot`.
`cols`	Colours to be used for plotting. Options include “default”, “increment”, “heat”, “jet” and `RColorBrewer` colours; see the `openair` `openColours` function for more details. For user-defined colours, the user can supply a vector of colour names recognised by R (type `colours()` to see the full list). An example would be `cols = c("yellow", "green", "blue")`.
`plot.type`	The `lattice` plot type, which is a line (`plot.type = "l"`) by default. Another useful option is `plot.type = "h"`, which draws vertical lines.
`key`	Should a key be drawn? The default is `TRUE`.
`log`	Should the y-axis appear on a log scale? The default is `FALSE`. If `TRUE` a well-formatted log10 scale is used. This can be useful for plotting data for several different pollutants that exist on very different scales. It is therefore useful to use `log = TRUE` together with `group = TRUE`.
`windflow`	This option allows a scatter plot to show the wind speed/direction as an arrow. The option is a list e.g. `windflow = list(col = "grey", lwd = 2, scale = 0.1)`. This option requires wind speed (`ws`) and wind direction (`wd`) to be available. The maximum length of the arrow plotted is a fraction of the plot dimension with the longest arrow being `scale` of the plot x-y dimension. Note, if the plot size is adjusted manually by the user it should be re-plotted to ensure the correct wind angle. The list may contain other options to `panel.arrows` in the `lattice` package. Other useful options include `length`, which controls the length of the arrow head and `angle`, which controls the angle of the arrow head. This option works best where there are not too many data to ensure over-plotting does not become a problem.
`smooth`	Should a smooth line be applied to the data? The default is `FALSE`.
`ci`	If a smooth fit line is applied, then `ci` determines whether the 95 percent confidence intervals are shown.
`y.relation`	This determines how the y-axis scale is plotted. "same" ensures all panels use the same scale and "free" will use panel-specific scales. The latter is a useful setting when plotting data with very different values.
`ref.x`	See `ref.y` for details. In this case the correct date format should be used for a vertical line e.g. `ref.x = list(v = as.POSIXct("2000-06-15"), lty = 5)`.
`ref.y`	A list with details of the horizontal lines to be added representing reference line(s). For example, `ref.y = list(h = 50, lty = 5)` will add a dashed horizontal line at 50. Several lines can be plotted e.g. `ref.y = list(h = c(50, 100), lty = c(1, 5), col = c("green", "blue"))`. See `panel.abline` in the `lattice` package for more details on adding/controlling lines.
`key.columns`	Number of columns to be used in the key. With many pollutants a single column can make the key too wide. The user can thus choose to use several columns by setting `key.columns` to be less than the number of pollutants.
`key.position`	Location where the scale key is to be plotted. Can include “top”, “bottom”, “right” and “left”.
`name.pol`	This option can be used to give alternative names for the variables plotted. Instead of taking the column headings as names, the user can supply replacements. For example, if a column had the name “nox” and the user wanted a different description, then setting `name.pol = "nox before change"` can be used. If more than one pollutant is plotted then use `c` e.g. `name.pol = c("nox here", "o3 there")`.
`date.breaks`	Number of major x-axis intervals to use. The function will try and choose a sensible number of dates/times as well as formatting the date/time appropriately to the range being considered. This does not always work as desired automatically. The user can therefore increase or decrease the number of intervals by adjusting the value of `date.breaks` up or down.
`date.format`	This option controls the date format on the x-axis. While `timePlot` generally sets the date format sensibly there can be some situations where the user wishes to have more control. For format types see `strptime`. For example, to format the date like “Jan-2012” set `date.format = "%b-%Y"`.
`auto.text`	Either `TRUE` (default) or `FALSE`. If `TRUE` titles and axis labels will automatically try and format pollutant names and units properly e.g. by subscripting the ‘2’ in NO2.
`plot`	Should a plot be produced? `FALSE` can be useful when analysing data to extract plot components and plotting them in other ways.
`...`	Other graphical parameters are passed on to `cutData` and `lattice::xyplot`. For example, `timePlot` passes the option `hemisphere = "southern"` on to `cutData` to provide southern (rather than default northern) hemisphere handling of `type = "season"`. Similarly, most common plotting parameters, such as `layout` for panel arrangement, `pch` and `cex` for plot symbol type and size, and `lty` and `lwd` for line type and width, are passed to `xyplot`, although some may be locally managed by `openair` en route, e.g. axis and title labelling options (such as `xlab`, `ylab`, `main`) are passed via `quickText` to handle routine formatting. See examples below.
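As a rough illustration of how several of the arguments above combine, the sketch below builds a small synthetic hourly data frame (the values and the name `dat` are invented for illustration) and plots daily maxima with a 75% data capture threshold and a dashed reference line at 50. The plotting call only runs if `openair` is installed.

```r
# Synthetic hourly data: 60 days of made-up "nox" values (illustration only)
set.seed(1)
n <- 24 * 60
dat <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "GMT"),
             by = "hour", length.out = n),
  nox  = rlnorm(n, meanlog = 3, sdlog = 0.5)
)

if (requireNamespace("openair", quietly = TRUE)) {
  openair::timePlot(
    dat,
    pollutant   = "nox",
    avg.time    = "day",                  # aggregate hourly values to days
    statistic   = "max",                  # daily maximum rather than the mean
    data.thresh = 75,                     # days with < 75% of hours become NA
    ref.y       = list(h = 50, lty = 5)   # dashed horizontal reference line
  )
}
```

Because `avg.time` is not `"default"` here, `data.thresh` and `statistic` both take effect; with the default averaging time they would be ignored.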

Details

timePlot is the basic time series plotting function in openair. Its purpose is to make it quick and easy to plot time series for pollutants and other variables. A further purpose is to plot potentially many variables together in as compact a way as possible.

The function is flexible enough to plot more than one variable at once. If more than one variable is chosen, it can either show all variables on the same plot and the same scale, using different line types (group = TRUE), or show each variable in its own panel with its own scale (group = FALSE).
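A minimal sketch of the two layouts, using synthetic data (the data frame `dat` and its values are made up for illustration); the plotting calls are skipped if `openair` is not installed:

```r
# Two made-up hourly series on somewhat different levels
set.seed(42)
n <- 24 * 30
dat <- data.frame(
  date = seq(as.POSIXct("2021-03-01 00:00", tz = "GMT"),
             by = "hour", length.out = n),
  nox  = rlnorm(n, meanlog = 4, sdlog = 0.4),
  no2  = rlnorm(n, meanlog = 3, sdlog = 0.4)
)

if (requireNamespace("openair", quietly = TRUE)) {
  # One panel per variable (the default, group = FALSE)
  openair::timePlot(dat, pollutant = c("nox", "no2"), group = FALSE)

  # Both variables in a single panel on a shared scale
  openair::timePlot(dat, pollutant = c("nox", "no2"), group = TRUE)
}
```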

The general preference is not to plot two variables on the same graph with two different y-scales. It can be misleading to do so and difficult with more than two variables. If there is interest in plotting several variables together that have very different scales, then it can be useful to normalise the data first, which can be done by setting the normalise option.
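To make the idea of date-based normalisation concrete, the sketch below indexes two made-up monthly series to 100 at a baseline date in base R. The helper `index_to_100` is hypothetical and only illustrates the arithmetic described for `normalise`; it is not the package's internal code. The final call, which runs only if `openair` is installed, asks `timePlot` to do the equivalent scaling itself.

```r
# Two made-up monthly series on very different scales
dat <- data.frame(
  date  = seq(as.POSIXct("2020-01-01", tz = "GMT"),
              by = "month", length.out = 6),
  big   = c(200, 180, 170, 150, 140, 120),
  small = c(2.0, 2.1, 1.9, 2.2, 2.4, 2.6)
)

# Hypothetical helper: scale a series so its value at `baseline` becomes 100
index_to_100 <- function(x, dates, baseline) {
  100 * x / x[dates == baseline]
}

baseline  <- as.POSIXct("2020-01-01", tz = "GMT")
big_idx   <- index_to_100(dat$big, dat$date, baseline)
small_idx <- index_to_100(dat$small, dat$date, baseline)

# Both indexed series start at 100, so their trends are directly comparable
print(round(cbind(big_idx, small_idx), 1))

if (requireNamespace("openair", quietly = TRUE)) {
  # Date given in UK format (dd/mm/YYYY), as the normalise argument expects
  openair::timePlot(dat, pollutant = c("big", "small"),
                    group = TRUE, normalise = "1/1/2020")
}
```

After indexing, the two series share a common starting value, so a single y-axis is enough to show how they diverge.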

The user has fine control over the choice of colours, line widths and line types used. This is useful, for example, to emphasise a particular variable with a specific line type/colour/width.
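One possible way to emphasise a single variable is sketched below: a thin grey line for one series and a thicker red line for the other. The data are synthetic and the colour choices are arbitrary; `lwd` and `lty` are passed through `...` as described under Arguments. The plot is drawn only if `openair` is installed.

```r
# Made-up hourly data for two variables
set.seed(7)
n <- 24 * 14
dat <- data.frame(
  date = seq(as.POSIXct("2022-06-01 00:00", tz = "GMT"),
             by = "hour", length.out = n),
  nox  = rlnorm(n, meanlog = 4, sdlog = 0.3),
  no2  = rlnorm(n, meanlog = 3.5, sdlog = 0.3)
)

# User-defined colours must be names that R recognises (see colours())
line_cols <- c("grey60", "red")
stopifnot(all(line_cols %in% colours()))

if (requireNamespace("openair", quietly = TRUE)) {
  openair::timePlot(
    dat,
    pollutant = c("nox", "no2"),
    group     = TRUE,
    cols      = line_cols,   # nox in grey, no2 in red
    lwd       = c(1, 3),     # thicker line for no2
    lty       = 1            # solid lines for both series
  )
}
```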

timePlot works very well with selectByDate(), which is used for selecting particular date ranges quickly and easily. See examples below.

By default plots are shown with a colour key at the bottom and in the case of multiple pollutants or sites, strips on the left of each plot. Sometimes this may be overkill and the user can opt to remove the key and/or the strip by setting key and/or strip to FALSE. One reason to do this is to maximise the plotting area and therefore the information shown.

Value

an openair object
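A short sketch of working with the returned object follows. It assumes, based on how openair objects are commonly structured, that the result is a list with a `data` component holding the values used for plotting; treat that component name as an assumption and check `names()` on your own result. The data are synthetic and the code only calls `timePlot` if `openair` is installed.

```r
# Made-up hourly data covering ten days
set.seed(3)
n <- 24 * 10
dat <- data.frame(
  date = seq(as.POSIXct("2023-01-01 00:00", tz = "GMT"),
             by = "hour", length.out = n),
  nox  = rlnorm(n, meanlog = 3, sdlog = 0.5)
)

if (requireNamespace("openair", quietly = TRUE)) {
  # plot = FALSE suppresses drawing but still returns the object
  res <- openair::timePlot(dat, pollutant = "nox",
                           avg.time = "day", plot = FALSE)

  print(class(res))      # class of the returned object
  print(names(res))      # inspect which components are available
  print(head(res$data))  # assumed component: data behind the plot
}
```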

Author(s)

David Carslaw

Examples



# basic use, single pollutant
timePlot(mydata, pollutant = "nox")

# two pollutants in separate panels
## Not run: timePlot(mydata, pollutant = c("nox", "no2"))

# two pollutants in the same panel with the same scale
## Not run: timePlot(mydata, pollutant = c("nox", "no2"), group = TRUE)

# alternative by normalising concentrations and plotting on the same scale
## Not run: 
timePlot(mydata, pollutant = c("nox", "co", "pm10", "so2"), group = TRUE,
  avg.time = "year", normalise = "1/1/1998", lwd = 3, lty = 1)

## End(Not run)

# examples of selecting by date

# plot for nox in 1999
## Not run: timePlot(selectByDate(mydata, year = 1999), pollutant = "nox")

# select specific date range for two pollutants
## Not run: 
timePlot(selectByDate(mydata, start = "6/8/2003", end = "13/8/2003"),
pollutant = c("no2", "o3"))

## End(Not run)

# choose different line styles etc
## Not run: timePlot(mydata, pollutant = c("nox", "no2"), lty = 1)

# choose different line widths and a single colour
## Not run: 
timePlot(selectByDate(mydata, year = 2004, month = 6),
  pollutant = c("nox", "no2"), lwd = c(1, 2), col = "black")

## End(Not run)

# different averaging times

# daily mean O3
## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day")

# daily mean O3 ensuring each day has data capture of at least 75%
## Not run: timePlot(mydata, pollutant = "o3", avg.time = "day", data.thresh = 75)

# 2-week average of O3 concentrations
## Not run: timePlot(mydata, pollutant = "o3", avg.time = "2 week")

openair documentation built on May 29, 2024, 11:07 a.m.