knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = T, message = F, fig.align = "center" ) library(dplyr)
Five types of data are used with the MassWateR package.
qcMWRreview()
, specifically for the quality control completeness checks in the table produced by tabMWRcom()
.Templates with instructions for each of the types of input data are available for download in the Resources tab. Additionally, example files, described below, are provided with the package to demonstrate and test the functions.
Load the package in an R session after installation:
library(MassWateR)
The following shows how to specify a path and import each required data file. These are hypothetical files and the path will need to be changed to where your data are located on your computer.
# import results data respth <- "C:/Documents/MassWateR/MyResults.xlsx" resdat <- readMWRresults(respth) # import dqo accuracy data accpth <- "C:/Documents/MassWateR/MyDQOAccuracy.xlsx" accdat <- readMWRacc(accpth) # import dqo frequency and completeness data frecompth <- "C:/Documents/MassWateR/MyDQOFreCom.xlsx" frecomdat <- readMWRfrecom(frecompth) # import site data sitpth <- "C:/Documents/MassWateR/MySites.xlsx" sitdat <- readMWRsites(sitpth) # import WQX meta data wqxpth <- "C:/Documents/MassWateR/MyWQXMeta.xlsx" wqxdat <- readMWRwqx(wqxpth) # censored data censpth <- "C:/Documents/MassWateR/MyCensored.xlsx" censdat <- readMWRcens(censpth)
Example files included with the package are imported for demonstration in this vignette. The paths to these files are identified using the system.file()
function and used in the examples below. In practice, alternative data files that follow the same format as the examples will be used with the functions and imported using code similar to above.
Make sure to close any data files on your desktop before importing them in MassWateR. Some operating systems may require the file to be closed for successful import.
The MassWateR package is developed for quality control and exploratory analysis of surface water quality data. Before these analyses can occur, the data must be formatted correctly. There are several checks included in the data import functions to ensure the files are formatted correctly for downstream use. If any of the checks fail, an error message will be returned that prompts the necessary changes that must be made to the Excel file before the data can be used.
For many of the checks, parameter names and units need to match the following columns in the paramsMWR
file included with the package. Specifically parameter names (either Characteristic Name
in the results file, or Parameter
in the data quality objectives files) can be the simple or WQX format as below. The units can be any that apply for a given parameter, although only one is allowed per parameter. All entries are case-sensitive. This file is also available in the Resources tab.
paramsMWR %>% select(`Simple Parameter`, `WQX Parameter`, `Units of measure`) %>% arrange(`Simple Parameter`, .locale = 'en') %>% knitr::kable(.)
The readMWRresultsview()
function can be used to help troubleshoot issues that are encountered importing the water quality results file (next section). This function can be used to create a .csv spreadsheet that shows the unique values within columns of the results file. This information can be used to verify if the values in each conform to the requirements for the data import checks, including acceptable values for the table above. By default, a .csv is created for all columns. The columns
argument can be used to select columns of interest. Below shows how to view the unique entries for parameters ("Characteristic Name"
) and units ("Result Unit"
). The .csv is created in the directory specified by output_dir
. Visually evaluating the results for conformance to the package requirements and manually editing the input file can help with the import checks, described below.
# find path to the file included with the package, replace with a path to your file as needed respth <- system.file("extdata/ExampleResults.xlsx", package = "MassWateR") # create a .csv file for the two columns that show unique values in each readMWRresultsview(respth, columns = c("Characteristic Name", "Result Unit"), output_dir = tempdir())
First, the surface water quality results can be imported with the readMWRresults()
function. This is designed to import an Excel file external to R, run checks on the data, and provide some minor formatting for downstream quality control or exploratory analysis. In this example, the system file ExampleResults.xlsx
is imported. In practice, the pth
argument will point to an external file in the WQX format. See the Resources tab for the Excel file template and detailed instructions (in the template's instructions worksheet). Note that runchk = TRUE
is set to run the checks on data import. This is the default setting and it is not necessary to explicitly set this argument on import.
respth <- system.file("extdata/ExampleResults.xlsx", package = "MassWateR") resdat <- readMWRresults(respth, runchk = TRUE) head(resdat)
Several checks are run automatically when the data are imported. These file checks are as follows (also viewed from the help file for checkMWRresults()
):
Activity Depth/Height Measure
or Activity Relative Depth Name
for all rows where Activity Type
is Field Msr/Obs or Sample-Routineft
, m
, or blankSimple Parameter
or WQX Parameter
column of the paramsMWR
data (warning only)Result Unit
, except pH which can be blankCharacteristic Name
should have only one entry in Result Unit
(excludes entries for lab spikes reported as %
or % recovery
)Characteristic Name
should have an entry in Result Unit
that matches one of the acceptable values in the Units of measure
column of the paramsMWR
data (excludes entries for lab spikes reported as %
or % recovery
), see the table above.An informative error is returned if the input data fail any of the checks. The input data should be corrected by hand in the Excel file by altering the appropriate rows or column names indicated in the error. Checks with warnings can be fixed at the discretion of the user before proceeding.
Here is an example of an error that might be returned for an incorrect data entry (using the checkMWRresults()
function, which is used inside of readMWRresults()
). To remedy the issue, change the entries in rows 4 and 135 in the Activity Type column to Sample-Routine and Field Msr/Obs, respectively. This must be done in the original Excel file. Import the data again in R to verify the data are corrected.
chk <- resdat chk[4, 2] <- "Sample" chk[135, 2] <- "Field" checkMWRresults(chk)
Data imported with readMWRresults()
are also formatted to address a few minor issues for downstream analysis. This formatting includes:
NA
. Characteristic Name
are converted to Simple Parameter
in paramsMWR
as needed.A few other points about the Activity Type
column are worth mentioning. As noted in the list above, the package uses specific activity types for data organization. Sample-Routine should be used for collected water samples. Field Msr/Obs should be used for in-situ measurements. The rest of the activity types are all for QC data. The input file should only contain the activity types in the Input Activity Type column below. The WQX output that is generated by the package will create extra rows with the activity types in the second column below. The new WQX output rows will take the value from the QC Reference Value
column as their Result Value
. Please view the Water Quality Exchange output vignette for a complete description of creating output for WQX upload with MassWateR.
tibble::tibble( `Input Activity Type` = c('Sample-Routine', 'Quality Control Sample-Field Blank', 'Quality Control Sample-Lab Duplicate', 'Quality Control Sample-Lab Blank', 'Quality Control Sample-Lab Spike', 'Field Msr/Obs', 'Quality Control-Meter Lab Duplicate', 'Quality Control-Meter Lab Blank', 'Quality Control-Calibration Check'), `WQX output new row Activity Type` = c('Quality Control Sample-Field Replicate', 'NA', 'Quality Control Sample-Lab Duplicate 2', 'NA', 'Quality Control Sample-Lab Spike Target', 'Quality Control Field Replicate Msr/Obs', 'Quality Control-Meter Lab Duplicate 2', 'NA', 'Quality Control-Calibration Check Buffer') ) %>% flextable::flextable(.) %>% flextable::border_outer() %>% flextable::border_inner(part = 'all') %>% flextable::hline(part = 'header', border = flextable::fp_border_default(width = 2)) %>% flextable::autofit() %>% flextable::bold(part = 'header') %>% flextable::bg(i = 1, j = 1, bg = 'lightgrey') %>% flextable::bg(i = 6, j = 1, bg = 'lightgrey')
To use the quality control functions in MassWateR, Excel files that describe the data quality objectives for accuracy, frequency, and completeness must be provided. The system files included with the package, described above, demonstrate the required information and format for these files. They can be imported into R using the readMWRacc()
and readMWRfrecom()
functions for the accuracy, frequency, and completeness files. The pth
argument will point to the location of the external files on your computer. See the Resources tab for the Excel file template and detailed instructions (in the template's instructions worksheet). As above, the system files included with the package are used for the examples.
# import data quality objectives for accuracy accpth <- system.file("extdata/ExampleDQOAccuracy.xlsx", package = "MassWateR") accdat <- readMWRacc(accpth) head(accdat)
# import data quality objectives for frequency and completeness frecompth <- system.file("extdata/ExampleDQOFrequencyCompleteness.xlsx", package = "MassWateR") frecomdat <- readMWRfrecom(frecompth) head(frecomdat)
Both the readMWRacc()
and readMWRfrecom()
functions will run a series of checks to ensure the imported data are formatted correctly. The checkMWRacc()
and checkMWRfrecom()
functions run these checks when the readMWRacc()
and readMWRfrecom()
functions are executed, respectively. The checks for each are as follows.
File checks for accuracy:
Value Range
column na check: The character string "na"
should not be in the Value Range
column, "all"
should be used if the entire range applies"\%"
, "BDL"
, "AQL"
, "log"
, or "all"
Value Range
column: Entries in Value Range
should not overlap for a parameter (excludes ascending ranges)Value Range
column: Entries in Value Range
should not include a gap for a parameter, warning onlySimple Parameter
or WQX Parameter
columns of the paramsMWR
datauom
), except pH which can be blankParameter
should have only one type for the units (uom
)Parameter
should have an entry in the units (uom
) that matches one of the acceptable values in the Units of measure
column of the paramsMWR
data, see the table above.NA
values will return a warning File checks for frequency and completeness:
Simple Parameter
or WQX Parameter
columns of the paramsMWR
dataNA
values will return a warning Minor formatting of the input files is also done to address a few minor issues for downstream analysis. This formatting includes:
NA
for the uom
column (readMWRacc()
only). Parameter
are converted to Simple Parameter
in paramsMWR
as needed.qcMWRacc()
, e.g., replace $\geq$ with $>=$.MDL
and UQL
columns in the accuracy file to numericAn Excel file for site metadata that describes spatial location and any other grouping factors for the sites in the results file can be imported using readMWRsites()
. The system file included with the package, described above, demonstrates the required information and format for the file. The pth
argument will point to the location of the external file on your computer. See the Resources tab for the Excel file template. As above, the system file included with the package is used for the example.
# import site metadata sitpth <- system.file("extdata/ExampleSites.xlsx", package = "MassWateR") sitdat <- readMWRsites(sitpth) head(sitdat)
The readMWRsites()
function runs several checks on the file using the checkMWRsites()
function. Most of the checks are to ensure the latitude and longitude data are present and properly formatted. It is assumed that latitude and longitude data are entered in decimal degrees. The projection can be entered in other functions used in exploratory analysis. Details on the file checks are as follows:
An Excel file for wqx metadata that is required to generate output for upload to WQX can be imported using readMWRwqx()
. The system file included with the package, described above, demonstrates the required information and format for the file. The pth
argument will point to the location of the external file on your computer. See the Resources tab for the Excel file template and detailed instructions (in the template's instructions worksheet). As above, the system file included with the package is used for the example.
# import wqx metadata wqxpth <- system.file("extdata/ExampleWQX.xlsx", package = "MassWateR") wqxdat <- readMWRwqx(wqxpth) head(wqxdat)
The readMWRwqx()
function runs a few checks on the file using the checkMWRwqx()
function. Details on the file checks are as follows:
Parameter
should be unique (no duplicates)Simple Parameter
or WQX Parameter
column of the paramsMWR
data (warning only)An informative error is returned if the input data fail any of the checks. The input data should be corrected by hand in the Excel file by altering the appropriate rows or column names indicated in the error. Checks with warnings can be fixed at the discretion of the user before proceeding.
An Excel file for the number of missed or censored observations by parameter is required for use with some of the quality control functions, specifically qcMWRreview()
to generate the QC report and tabMWRcom()
to generate the completeness table. This file is optional and not a requirement for producing the report and table. The parameters in this file must match those in the data quality objectives file for frequency and completeness. The censored data can be imported using the readMWRcens()
function. The system file included with the package, described above, demonstrates the required information and format for the file. The pth
argument will point to the location of the external file on your computer. See the Resources tab for the Excel file template and detailed instructions (in the template's instructions worksheet). As above, the system file included with the package is used for the example.
# import cens metadata censpth <- system.file("extdata/ExampleCensored.xlsx", package = "MassWateR") censdat <- readMWRcens(censpth) head(censdat)
The readMWRcens()
function runs a few checks on the file using the checkMWRcens()
function. Details on the file checks are as follows:
Simple Parameter
or WQX Parameter
column of the paramsMWR
data (warning only)An informative error is returned if the input data fail any of the checks. The input data should be corrected by hand in the Excel file by altering the appropriate rows or column names indicated in the error. Checks with warnings can be fixed at the discretion of the user before proceeding.
All of the quality control, outlier, analysis, and wqx functions in MassWateR require the inputs described above, depending on the function. These are generally passed to each function using the appropriate arguments, e.g., res
for the surface water quality results, acc
and frecom
for the data quality objective files for accuracy, frequency, and completeness, sit
for the site metadata, wqx
for the wqx metadata, and cens
for the censored data. The values passed to these functions can be file paths that specify the location of the file or as data frames returned by the relevant import functions (i.e., readMWRresults()
, readMWRacc()
, readMWRfrecom()
, readMWRsites()
, readMWRwqx()
, readMWRcens()
).
Because it can be tedious to specify each of the input files in the arguments for the MassWateR functions, the fset
(file set) argument can be used as an alternative method. The fset
argument for each function accepts a named list with the relevant input file locations or data frames. This can be created at the top of a script and recycled as necessary for an analysis workflow. The examples below demonstrate creating the list as file paths or as exported data frames from the import functions. Examples in the vignettes show how this list can be used as alternative input using the fset
argument.
# a list of input file paths fsetls <- list( res = respth, acc = accpth, frecom = frecompth, sit = sitpth, wqx = wqxpth, cens = censpth ) # a list of input data frames fsetls <- list( res = resdat, acc = accdat, frecom = frecomdat, sit = sitdat, wqx = wqxdat, cens = censdat )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.