knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = TRUE,
  message = FALSE
)
library(ggplot2)
Outliers can be evaluated once the required data are successfully imported into R (see the data input and checks vignette for an overview). The results data file with the monitoring data is required. The data quality objectives file for accuracy is also required to determine plot axis scaling as arithmetic (linear) or logarithmic and to fill results data that are below detection or above quantitation limits. The example data included with the package are imported here to demonstrate how to use the analysis functions:
library(MassWateR)

# import results data
respth <- system.file("extdata/ExampleResults.xlsx", package = "MassWateR")
resdat <- readMWRresults(respth)

# import accuracy data
accpth <- system.file("extdata/ExampleDQOAccuracy.xlsx", package = "MassWateR")
accdat <- readMWRacc(accpth)
Outliers can be identified using the anlzMWRoutlier() function. Evaluating data for outliers is a critical step of quality control, and this function allows a user to quickly identify them for removal or additional follow-up. Outliers are defined using the standard boxplot rule: an observation is an outlier if it falls more than 1.5 times the interquartile range (the distance between the 25th and 75th percentiles) above the 75th percentile or below the 25th percentile of a parameter, calculated within each group used for the function. They are visually identified as points above or below the whiskers in the boxplots that describe the distribution of the data, i.e., the box defines the interquartile range, the horizontal line is the median, and the whiskers extend to the most extreme observations within 1.5 times the interquartile range of the box.
knitr::include_graphics("boxplotex.PNG")
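As a rough illustration of the 1.5 times interquartile range rule described above, the following base R sketch flags outliers in a small vector. The values are made up for illustration and are not from the example data. The quantile method used internally by anlzMWRoutlier() is not shown here and may differ in detail.

```r
# Hypothetical values for illustration only (not from the example data)
x <- c(6.1, 6.8, 7.2, 7.5, 7.9, 8.3, 8.6, 14.2)

# 25th and 75th percentiles and the interquartile range
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Limits beyond which a point is treated as an outlier
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the limits
outliers <- x[x < lower | x > upper]
outliers
```

In this sketch only the largest value falls outside the limits, so it is the only point that would be drawn beyond the whiskers.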
The anlzMWRoutlier() function accepts the results data as a file path or data frame. The data quality objectives file for accuracy is also required. The inputs are passed to the function using the res and acc arguments, but a named list with the same elements can also be passed to the fset argument for convenience. The results for a selected parameter are shown as boxplots, with the outliers labelled accordingly.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, group = "month")
Outliers can be shown by month using group = "month", by site using group = "site", or by week of year using group = "week". Above, outliers are shown for dissolved oxygen grouped by month. The same data can also be grouped by site.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, group = "site")
The same data can also be grouped by week using group = "week". The week of the year is shown on the plot as an integer. Because a given week does not start on the same month and day in every year, an integer week number is the only way to compare summaries when the results data span multiple years.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, group = "week")
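The following base R sketch illustrates why the week is reported as an integer. It assumes, for illustration only, that weeks start on Sunday (the %U format code); the week convention used internally by anlzMWRoutlier() may differ. The helper week_start() is hypothetical and not part of the package.

```r
# Hypothetical helper: first calendar date in a given year that falls in a
# given week number, assuming weeks start on Sunday (format code "%U")
week_start <- function(year, week) {
  days <- seq(
    as.Date(paste0(year, "-01-01")),
    as.Date(paste0(year, "-12-31")),
    by = "day"
  )
  wk <- as.integer(format(days, "%U"))
  min(days[wk == week])
}

# The same week number begins on a different month/day in each year
starts <- c(week_start(2021, 24), week_start(2022, 24))
format(starts, "%Y-%m-%d")
```

Under this assumption the two start dates do not share a month and day, so the week number is the only label that lines up across years.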
Results can also be filtered by date using the dtrng argument. The argument must include two entries, each in YYYY-MM-DD format.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, group = "week", dtrng = c("2022-05-01", "2022-07-31"))
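A date range can be checked before it is passed to the function. The sketch below is an optional pre-check in base R and is not part of the package; the name dtrng_ok is hypothetical, and the function may perform its own, different validation.

```r
# Date range to check before passing it to the dtrng argument
dtrng <- c("2022-05-01", "2022-07-31")

# Parse as YYYY-MM-DD; entries that cannot be parsed become NA
parsed <- as.Date(dtrng, format = "%Y-%m-%d")

# TRUE only if there are exactly two entries and both parsed
dtrng_ok <- length(parsed) == 2 && !anyNA(parsed)
dtrng_ok
```

An entry written in another layout, such as "05/01/2022", fails to parse and would make the check return FALSE.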
Points that are not outliers can also be jittered over the boxplots using type = "jitterbox" (default is type = "box"). Outlier labels that overlap are also offset by default. This can be suppressed using repel = FALSE.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, type = "jitterbox", group = "month", repel = FALSE)
Specifying type = "jitter" will suppress the boxplots.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, type = "jitter", group = "month")
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objectives file for accuracy, i.e., parameters with "log" in any of the columns are plotted on the log10 scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10 scale, respectively, regardless of the information in the data quality objectives file for accuracy. Below, the axis for E. coli is plotted on the log10 scale automatically. The y-axis scaling does not need to be specified explicitly in the function call because the default setting is yscl = "auto".
anlzMWRoutlier(res = resdat, param = "E.coli", acc = accdat, group = "site")
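The automatic choice described above can be pictured with a small base R sketch. The data frame acc_demo is a hypothetical, simplified stand-in for the accuracy data: its column names and entries are invented for illustration and do not reproduce the actual file layout or the package's internal code.

```r
# Hypothetical, simplified stand-in for the accuracy data; the column names
# and entries are invented for illustration
acc_demo <- data.frame(
  parameter = c("DO", "E.coli"),
  check_a = c("<= 10%", "<= 30% log"),
  check_b = c("<= 0.5", "<= 30% log"),
  stringsAsFactors = FALSE
)

# A parameter gets a log10 axis if any of its entries contains "log"
has_log <- apply(acc_demo[, c("check_a", "check_b")], 1, function(row) {
  any(grepl("log", row, fixed = TRUE))
})

# Scaling that would be chosen for each parameter under this rule
yscl_auto <- ifelse(has_log, "log", "linear")
names(yscl_auto) <- acc_demo$parameter
yscl_auto
```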
To force a linear y-axis for E. coli, yscl = "linear" must be used. Note that the linear scaling for dissolved oxygen in the plots above was determined automatically.
anlzMWRoutlier(res = resdat, param = "E.coli", acc = accdat, group = "site", yscl = "linear")
Finally, a data frame of the identified outliers can also be returned by setting outliers = TRUE. This table can be used to identify the outliers in the original data for removal or additional follow-up.
anlzMWRoutlier(res = resdat, param = "DO", acc = accdat, group = "month", outliers = TRUE)
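One way to use such a table is to flag the matching rows in the original data. The base R sketch below uses hypothetical, simplified data frames (res_demo and out_demo) with invented column names; the actual results data and the returned outlier table have their own columns, so the key would need to be adapted.

```r
# Hypothetical, simplified results; column names are invented for illustration
res_demo <- data.frame(
  site = c("A", "A", "B", "B"),
  date = as.Date(c("2022-05-03", "2022-06-07", "2022-05-03", "2022-06-07")),
  value = c(7.2, 14.2, 6.9, 7.4)
)

# Stand-in for a returned outlier table
out_demo <- res_demo[2, ]

# Build a key from the identifying columns and flag matching rows
make_key <- function(d) paste(d$site, d$date, d$value, sep = "|")
res_demo$is_outlier <- make_key(res_demo) %in% make_key(out_demo)

# Keep flagged rows for follow-up, or drop them for analysis
res_followup <- res_demo[res_demo$is_outlier, ]
res_clean <- res_demo[!res_demo$is_outlier, ]
```

Flagging first and subsetting afterwards keeps the original records intact, which suits cases where an outlier needs follow-up rather than removal.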
Outlier plots for all parameters in the results data file can be created using the anlzMWRoutlierall() function. This can be used to create a Word document with all plots embedded in the file or separate png images saved to a specified directory. The relevant arguments for the function include format = "word" or format = "png" to specify the output type, fig_height and fig_width for the plot dimensions (default as 4 and 8, respectively), and output_dir and output_file for the output directory and file name. The remaining arguments are the same as those required for anlzMWRoutlier(), except the param argument because plots for all parameters are created in the output.
Create a Word document for all outlier plots in a temporary directory:
anlzMWRoutlierall(res = resdat, acc = accdat, group = 'month', format = 'word', output_dir = tempdir())
Create the same output but as separate png plots for each parameter:
anlzMWRoutlierall(res = resdat, acc = accdat, group = 'month', format = 'png', output_dir = tempdir())
The inputs can also be passed to the fset argument as a named list for convenience.
# named list of inputs
fsetls <- list(
  res = resdat,
  acc = accdat
)

anlzMWRoutlierall(fset = fsetls, group = 'month', format = 'word', output_dir = tempdir())
Once the function has finished running, it returns a message indicating success and the location of the output file(s).