| scatterPlot | R Documentation |
Scatter plots with conditioning and three main approaches: conventional scatter plot, hexagonal binning and kernel density estimates. The first also has options for adding smooth fits and linear models with uncertainties shown.
scatterPlot(
mydata,
x = "nox",
y = "no2",
z = NA,
method = "scatter",
group = NA,
avg.time = "default",
data.thresh = 0,
statistic = "mean",
percentile = NA,
type = "default",
smooth = FALSE,
spline = FALSE,
linear = FALSE,
ci = TRUE,
mod.line = FALSE,
cols = "hue",
plot.type = "p",
key.title = group,
key.columns = 1,
key.position = "right",
strip.position = "top",
log.x = FALSE,
log.y = FALSE,
x.inc = NULL,
y.inc = NULL,
limits = NULL,
windflow = NULL,
y.relation = "same",
x.relation = "same",
ref.x = NULL,
ref.y = NULL,
k = NA,
dist = 0.02,
auto.text = TRUE,
plot = TRUE,
key = NULL,
...
)
mydata |
A data frame containing at least two numeric variables to plot. |
x |
Name of the x-variable to plot. Note that x can be a date field or a
factor. For example, |
y |
Name of the numeric y-variable to plot. |
z |
Name of the numeric z-variable to plot for |
method |
Methods include “scatter” (conventional scatter plot),
“hexbin” (hexagonal binning using the |
group |
The grouping variable to use, if any. Setting this to a variable in the data frame has the effect of plotting several series in the same panel using different symbols/colours etc. If set to a variable that is a character or factor, those categories or factor levels will be used directly. If set to a numeric variable, it will split that variable into quantiles. |
avg.time |
This defines the time period to average to. Can be Note that |
data.thresh |
The data capture threshold to use (%). A value of zero
means that all available data will be used in a particular period
regardless of the number of values available. Conversely, a value of 100
will mean that all data will need to be present for the average to be
calculated, else it is recorded as |
statistic |
The statistic to apply when aggregating the data; default is
the mean. Can be one of |
percentile |
The percentile level used when |
type |
Character string(s) defining how data should be split/conditioned
before plotting.
Most |
smooth |
A smooth line is fitted to the data if |
spline |
A smooth spline is fitted to the data if |
linear |
A linear model is fitted to the data if |
ci |
Should the confidence intervals for the smooth/linear fit be shown? |
mod.line |
If |
cols |
Colours to use for plotting. Can be a pre-set palette (e.g.,
|
plot.type |
Type of plot: “p” (points, default), “l” (lines) or “b” (both points and lines). |
key.title |
Used to set the title of the legend. The legend title is
passed to |
key.columns |
Number of columns to be used in a categorical legend. With
many categories a single column can make the key too wide. The user can thus
choose to use several columns by setting |
key.position |
Location where the legend is to be placed. Allowed
arguments include |
strip.position |
Location where the facet 'strips' are located when
using |
log.x, log.y |
Should the x-axis/y-axis appear on a log scale? The
default is |
x.inc, y.inc |
The x/y-interval to be used for binning data when |
limits |
For |
windflow |
If |
x.relation, y.relation |
This determines how the x- and y-axis scales are
plotted. |
ref.x, ref.y |
A list with details of the horizontal or vertical lines to
be added representing reference line(s). For example, |
k |
Smoothing parameter supplied to |
dist |
When plotting smooth surfaces ( |
auto.text |
Either |
plot |
When |
key |
Deprecated; please use |
... |
Additional options are passed on to
|
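The group description above says that a numeric grouping variable is split into quantiles. A minimal base R sketch of that idea follows. It is not openair's internal code: the synthetic variable co and the choice of four groups (quartiles) are assumptions made for illustration.

```r
# Sketch only: split a numeric variable into quantile-based groups.
# The data are synthetic and the use of quartiles is an assumption.
set.seed(3)
co <- rlnorm(400)

# break points at the 0%, 25%, 50%, 75% and 100% quantiles
breaks <- quantile(co, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value in the first group
co.group <- cut(co, breaks = breaks, include.lowest = TRUE)

# number of observations falling in each group
table(co.group)
```

Each level of co.group could then be drawn with its own symbol or colour, which is the effect the group argument is described as having.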
scatterPlot() is the basic function for plotting scatter plots in flexible
ways in openair. It is flexible enough to consider lots of conditioning
variables and takes care of fitting smooth or linear relationships to the
data.
There are four main ways of plotting the relationship between two variables,
which are set using the method option. The default "scatter" will plot a
conventional scatter plot. In cases where there are lots of data and
over-plotting becomes a problem, then method = "hexbin" or method = "density" can be useful. The former requires the hexbin package to be
installed.
There is also a method = "level" which will bin the x and y data
according to the intervals set for x.inc and y.inc and colour the bins
according to levels of a third variable, z. Sometimes however, a far better
understanding of the relationship between three variables (x, y and z)
is gained by fitting a smooth surface through the data. See examples below.
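The binning idea behind method = "level" can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the data are synthetic, the bin widths are arbitrary, and assigning points to bins with floor() is an assumption about how the intervals are formed.

```r
# Sketch only: bin x and y at fixed intervals and average z in each bin.
# Synthetic data; bin widths and the floor() rule are assumptions.
set.seed(1)
dat <- data.frame(
  x = runif(500, 0, 100),
  y = runif(500, 0, 20)
)
dat$z <- dat$x + 2 * dat$y + rnorm(500)

x.inc <- 10 # width of each x bin
y.inc <- 2  # width of each y bin

# label each point with the lower edge of the bin it falls in
dat$x.bin <- floor(dat$x / x.inc) * x.inc
dat$y.bin <- floor(dat$y / y.inc) * y.inc

# mean of z within each occupied (x, y) bin
binned <- aggregate(z ~ x.bin + y.bin, data = dat, FUN = mean)
head(binned)
```

Each row of binned corresponds to one cell that would be coloured by its mean z value.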
A smooth fit is shown if smooth = TRUE, which can help show the overall form
of the data, e.g. whether the relationship appears to be linear or not. Also,
a linear fit can be shown using linear = TRUE as an option.
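To make the idea of a linear fit with its uncertainty concrete, here is a small base R sketch using lm() and predict(). It is not taken from openair: the synthetic data, the 95 % level and the variable names are assumptions.

```r
# Sketch only: a linear model with a confidence band, using base R.
# Synthetic data; the 95 % level is an assumption for illustration.
set.seed(42)
x <- runif(200, 0, 100)
y <- 5 + 0.4 * x + rnorm(200, sd = 3)

fit <- lm(y ~ x)

# evaluate the fitted line and its confidence interval on a regular grid
new.x <- data.frame(x = seq(min(x), max(x), length.out = 50))
pred <- predict(fit, newdata = new.x, interval = "confidence", level = 0.95)

coef(fit)  # intercept and slope
head(pred) # columns: fit, lwr, upr
```

The lwr and upr columns bound the fitted line and correspond to the kind of interval that ci = TRUE is described as showing.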
The user has fine control over the choice of colours and symbol type used.
Another way of reducing the number of points plotted, which can
sometimes be useful, is to aggregate the data. For example, hourly data can be
aggregated to daily data. See timePlot() for examples.
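The effect of aggregating hourly data to daily means can be sketched in base R as below. Within scatterPlot() itself this is controlled by the avg.time argument (e.g. avg.time = "day"). The sketch uses synthetic data and aggregate(), which is an assumption for illustration and not how openair performs the averaging.

```r
# Sketch only: reduce hourly data to daily means with base R.
# Synthetic data covering ten days; values are arbitrary.
set.seed(7)
hours <- seq(
  as.POSIXct("2004-01-01 00:00", tz = "UTC"),
  by = "hour", length.out = 240
)
hourly <- data.frame(
  date = hours,
  nox = rlnorm(240, meanlog = 4),
  no2 = rlnorm(240, meanlog = 3)
)

# calendar day of each hourly record
hourly$day <- as.Date(hourly$date, tz = "UTC")

# one row per day holding the mean of each pollutant
daily <- aggregate(cbind(nox, no2) ~ day, data = hourly, FUN = mean)

nrow(hourly) # number of hourly records
nrow(daily)  # number of daily means
```

Plotting daily rather than hourly would place far fewer points on the scatter plot, which is the reduction in over-plotting described above.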
an openair object
David Carslaw
timePlot() and timeAverage() for details on selecting averaging
times and other statistics in a flexible way
# load openair, which provides scatterPlot() and the example data set mydata
library(openair)
dat2004 <- selectByDate(mydata, year = 2004)
# basic use, single pollutant
scatterPlot(dat2004, x = "nox", y = "no2")
## Not run:
# scatterPlot by year
scatterPlot(mydata, x = "nox", y = "no2", type = "year")
## End(Not run)
# scatterPlot by day of the week, removing key at bottom
scatterPlot(dat2004,
  x = "nox", y = "no2", type = "weekday",
  key = FALSE
)
# example of a continuous colour scale, where colour is used to show
# different levels of a third (numeric) variable
# plot daily averages and choose a filled plot symbol (pch = 16)
# using 2004 data only
## Not run:
scatterPlot(dat2004, x = "nox", y = "no2", z = "co", avg.time = "day", pch = 16)
# show linear fit, by year
scatterPlot(mydata,
  x = "nox", y = "no2", type = "year",
  smooth = FALSE, linear = TRUE
)
# do the same, but for daily means...
scatterPlot(mydata,
  x = "nox", y = "no2", type = "year",
  smooth = FALSE, linear = TRUE, avg.time = "day"
)
# log scales
scatterPlot(mydata,
  x = "nox", y = "no2", type = "year",
  smooth = FALSE, linear = TRUE, avg.time = "day",
  log.x = TRUE, log.y = TRUE
)
# also works with the x-axis in date format (alternative to timePlot)
scatterPlot(mydata,
  x = "date", y = "no2", avg.time = "month",
  key = FALSE
)
## multiple types and grouping variable and continuous colour scale
scatterPlot(mydata, x = "nox", y = "no2", z = "o3", type = c("season", "weekend"))
# use hexagonal binning
scatterPlot(mydata, x = "nox", y = "no2", method = "hexbin")
# scatterPlot by year
scatterPlot(mydata,
  x = "nox", y = "no2", type = "year",
  method = "hexbin"
)
## bin data and plot it - can see how for high NO2, O3 is also high
scatterPlot(mydata, x = "nox", y = "no2", z = "o3", method = "level", dist = 0.02)
## fit surface for clearer view of relationship
scatterPlot(mydata,
  x = "nox", y = "no2", z = "o3", method = "level",
  x.inc = 10, y.inc = 2, smooth = TRUE
)
## End(Not run)