summaryPlot | R Documentation |
This function provides a quick graphical and numerical summary of data. The
presence and absence of data are shown, along with summary statistics and
plots of variable distributions. summaryPlot() can also provide summaries
of a single pollutant across many sites.
summaryPlot(
mydata,
na.len = 24,
clip = TRUE,
percentile = 0.99,
type = "histogram",
pollutant = "nox",
period = "years",
avg.time = "day",
print.datacap = TRUE,
breaks = NULL,
plot.type = "l",
col.trend = "darkgoldenrod2",
col.data = "lightblue",
col.mis = rgb(0.65, 0.04, 0.07),
col.hist = "forestgreen",
cols = NULL,
date.breaks = 7,
auto.text = TRUE,
plot = TRUE,
debug = FALSE,
...
)
mydata |
A data frame to be summarised. Must contain a |
na.len |
Missing data are only shown with at least |
clip |
When data contain outliers, the histogram or density plot can
fail to show the distribution of the main body of data. Setting |
percentile |
This is used to clip the data. For example, |
type |
|
pollutant |
|
period |
|
avg.time |
This defines the time period to average the time series
plots. Can be "sec", "min", "hour", "day" (the default), "week", "month",
"quarter" or "year". For much increased flexibility a number can precede
these options followed by a space. For example, a |
print.datacap |
Should the data capture % be shown for each period? |
breaks |
Number of histogram bins. Sometimes useful, but it is not easy to set a single value for a range of very different variables. |
plot.type |
The |
col.trend |
Colour to be used to show the monthly trend of the data,
shown as a shaded region. Type |
col.data |
Colour to be used to show the presence of data. Type
|
col.mis |
Colour to be used to show missing data. |
col.hist |
Colour for the histogram or density plot. |
cols |
Predefined colour scheme, currently only enabled for
|
date.breaks |
Number of major x-axis intervals to use. The function will
try to choose a sensible number of dates/times as well as formatting the
date/time appropriately to the range being considered. This does not
always work as desired automatically. The user can therefore increase or
decrease the number of intervals by adjusting the value of |
auto.text |
Either |
plot |
Should a plot be produced? |
debug |
Should data types be printed to the console? |
... |
Other graphical parameters. Commonly used examples include the
axis and title labelling options (such as |
summaryPlot() produces two panels of plots: one showing the
presence/absence of data and the other the distributions. The left panel
shows time series and codes the presence or absence of data in different
colours. By stacking the plots one on top of another it is easy to compare
different pollutants/variables. Overall statistics are given for each
variable: mean, maximum, minimum, missing hours (also expressed as a
percentage), median and the 95th percentile. For each year the data capture
rate (expressed as a percentage of hours in that year) is also given.
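The statistics listed above can be reproduced by hand to sanity-check a plot. The following is a minimal base R sketch, not the function's own code: the hourly series `nox`, its 51-hour gap and all object names are made up for illustration.

```r
# Hypothetical hourly series covering two full years (illustration only).
set.seed(1)
dates <- seq(as.POSIXct("2020-01-01 00:00", tz = "GMT"),
             as.POSIXct("2021-12-31 23:00", tz = "GMT"), by = "hour")
nox <- rgamma(length(dates), shape = 2, scale = 20)
nox[100:150] <- NA  # introduce a 51-hour gap in 2020

# Overall statistics of the kind reported for each variable.
stats <- c(
  mean        = mean(nox, na.rm = TRUE),
  max         = max(nox, na.rm = TRUE),
  min         = min(nox, na.rm = TRUE),
  median      = median(nox, na.rm = TRUE),
  p95         = unname(quantile(nox, 0.95, na.rm = TRUE)),
  missing     = sum(is.na(nox)),
  missing_pct = 100 * mean(is.na(nox))
)

# Data capture per year: percentage of hours with a non-missing value.
year <- format(dates, "%Y", tz = "GMT")
capture <- tapply(!is.na(nox), year, function(x) 100 * mean(x))
```

Here `capture` has one entry per year, so a year with no gaps gives 100 and the year containing the gap gives slightly less.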
The right panel shows either a histogram or a density plot depending on the
choice of type. Density plots avoid the issue of arbitrary bin sizes that
can sometimes provide a misleading view of the data distribution. Density
plots are often more appropriate, but their effectiveness will depend on the
data in question.
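The effect of the clip and percentile arguments on the right panel can be illustrated with base R. This sketch assumes that clipping means dropping values above the chosen quantile before the distribution is drawn. The exact rule used internally is not stated here, and the data and object names are invented.

```r
# Hypothetical skewed data with two extreme outliers (illustration only).
set.seed(42)
x <- c(rlnorm(1000, meanlog = 3, sdlog = 0.5), 5000, 8000)

# Assumed clipping rule: keep values at or below the 99th percentile.
upper <- quantile(x, probs = 0.99, na.rm = TRUE)
x_clipped <- x[x <= upper]

# Distribution summaries of the clipped data; plot = FALSE avoids drawing.
h <- hist(x_clipped, plot = FALSE)
d <- density(x_clipped)
```

Without clipping, the two outliers would stretch the x-axis so far that the main body of the data would collapse into a single bin.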
summaryPlot() will only show data that are numeric or integer type. This is
useful for checking that data have been imported properly. For example, if
for some reason a column representing wind speed erroneously had one or more
fields with characters in, the whole column would be either character or
factor type. The absence of a wind speed variable in the summaryPlot() plot
would therefore indicate a problem with the input data. In this particular
case, the user should either go back to the source data and remove the
characters, or remove them using R functions.
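A quick way to spot and repair such a column before plotting is sketched below in base R. The data frame `raw`, its column names and the stray "n/a" entry are hypothetical.

```r
# Hypothetical import where one text entry turned wind speed into character.
raw <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "GMT") + 3600 * 0:4,
  ws   = c("2.1", "3.4", "n/a", "1.8", "2.9"),
  nox  = c(41, 38, 55, 60, 47),
  stringsAsFactors = FALSE
)

# Columns that are numeric (or integer) and so would be eligible for display.
numeric_before <- names(raw)[vapply(raw, is.numeric, logical(1))]

# Repair in R: non-numeric entries become NA, the rest are converted.
raw$ws <- suppressWarnings(as.numeric(raw$ws))
numeric_after <- names(raw)[vapply(raw, is.numeric, logical(1))]
```

After the conversion `ws` is numeric with one missing value, so it appears in `numeric_after` alongside `nox`.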
If there is a field site, which would generally mean there is more than one
site, summaryPlot() will provide information on a
single pollutant across all sites, rather than provide details on all
pollutants at a single site. In this case the user should also provide the
name of a pollutant, e.g. pollutant = "nox". If a pollutant is not provided,
the first numeric field will automatically be chosen.
It is strongly recommended that summaryPlot() is
applied to all newly imported data sets to ensure the data are imported as
expected.
David Carslaw
# do not clip density plot data
## Not run:
summaryPlot(mydata, clip = FALSE)
## End(Not run)
# exclude highest 5 % of data etc.
## Not run:
summaryPlot(mydata, percentile = 0.95)
## End(Not run)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
## Not run:
summaryPlot(mydata, na.len = 96)
## End(Not run)
# show data in green
## Not run:
summaryPlot(mydata, col.data = "green")
## End(Not run)
# show missing data in yellow
## Not run:
summaryPlot(mydata, col.mis = "yellow")
## End(Not run)
# show density plot line in black
## Not run:
summaryPlot(mydata, type = "density", col.hist = "black")
## End(Not run)
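The na.len example above depends on the idea of a run of contiguous missing values. A minimal base R sketch of finding runs of at least a given length follows. It illustrates the concept only and is not taken from the function's source. The vector `x` and the threshold of 3 are made up.

```r
# Hypothetical series with gaps of length 2, 4 and 1.
x <- c(1, NA, NA, 4, NA, NA, NA, NA, 9, NA)
na_len <- 3

# Run-length encode the missing-value indicator.
runs <- rle(is.na(x))

# A run qualifies if it is a run of NAs and is at least na_len long.
long_gap <- runs$values & runs$lengths >= na_len

# Expand back to one flag per observation.
flag <- rep(long_gap, runs$lengths)
which(flag)  # 5 6 7 8
```

Only the four-value gap is flagged. The shorter gaps fall below the threshold and would not be marked as missing.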