knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  warning = T, 
  message = F
)

The quality control functions in MassWateR can be used once the required data are successfully imported into R (see the data input and checks vignette for an overview). The required data includes the results data file that describe the monitoring data, the data quality objective files for accuracy, frequency, and completeness, and a censored data file (optional) showing number of missing or censored data by parameter. The example data files included with the package are imported here to demonstrate how to use the quality control functions:

library(MassWateR)

# import results data
respth <- system.file("extdata/ExampleResults.xlsx", package = "MassWateR")
resdat <- readMWRresults(respth)

# import data quality objectives for accuracy
accpth <- system.file("extdata/ExampleDQOAccuracy.xlsx", package = "MassWateR")
accdat <- readMWRacc(accpth)

# import data quality objectives for frequency and completeness
frecompth <- system.file("extdata/ExampleDQOFrequencyCompleteness.xlsx", package = "MassWateR")
frecomdat <- readMWRfrecom(frecompth)

# import censored data
censpth <- system.file("extdata/ExampleCensored.xlsx", package = "MassWateR")
censdat <- readMWRcens(censpth)

Creating the review report

The qcMWRreview() function compiles a review report as a Word document for all quality control checks included in the MassWateR package. The report shows several tables, including the data quality objectives files for accuracy, frequency, and completeness, summary results for all accuracy checks, summary results for all frequency checks, summary results for all completeness checks, and individual results for all accuracy checks. The report uses individual table functions described in the sections below to return the results, which include tabMWRacc(), tabMWRfre(), and tabMWRcom().

The workflow for using this function is to import the required data (results, data quality objective files and censored data, as above) and to fix any errors noted on import prior to creating the review report. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults(), readMWRacc(), readMWRfrecom(), and readMWRcens(). For the former, the full suite of data checks can be evaluated with runkchk = T (default) or suppressed with runchk = F, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly.

The report can be created as follows by including the required files and specifying an output directory where the Word document is saved (a temporary directory is used here). Once the function is done running, a message indicating success and where the file is located is returned. The Word file can be further edited by hand as needed.

qcMWRreview(res = resdat, acc = accdat, frecom = frecomdat, cens = censdat, warn = FALSE, output_dir = tempdir())

As a convenience, the input files can also be passed to the qcMWRreview() function as a named list using the fset argument. This eliminates the need to individually specify the input arguments.

# names list of inputs
fsetls <- list(
  res = resdat, 
  acc = accdat, 
  frecom = frecomdat,
  cens = censdat
)

qcMWRreview(fset = fsetls, output_dir = tempdir())

Note that the warnings are suppressed above with warn = FALSE. By default, this argument is set to TRUE to view the warnings in the R console after the report is created. The warnings indicate notable concerns to consider for the input data that may need to be addressed. Details on these warnings are described in the sections below for each quality control table.

Optional arguments for qcMWRreview() that can be changed as needed include specifying the file name with output_file, suppressing the raw data summaries at the end of the report with rawdata = FALSE, and changing the table font sizes (dqofontsize for the data quality objectives on the first page, tabfontsize for the remainder). A spreadsheet of the tables in the Word document can also be created by setting savesheet = TRUE. The spreadsheet is saved in the same directory as the Word document.

Quality control for accuracy

The quality control checks for accuracy assess several characteristics of the data in the results file by referencing appropriate values in the data quality objectives file for accuracy. In short, the accuracy checks evaluate field blanks, lab blanks, field duplicates, lab duplicates, lab spikes, and instrument checks. The accuracy checks require results data (as in resdat above) and data quality objectives files for accuracy (accdat) and frequency and completeness (frecomdat).

The tabMWRacc() function is used to create tabular results for the accuracy checks for each parameter. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults(), readMWRacc(), and readMWRfrecom. For the former, the full suite of data checks can be evaluated with runkchk = T (default) or suppressed with runchk = F, as explained in the data inputs and checks vignette. In the latter case, downstream analyses may not work if data are formatted incorrectly. Also note that accuracy is only evaluated on parameters that are shared between the results file and the data quality objectives accuracy file. A warning is returned if there are parameters that do not match. The warnings can be suppressed by setting warn = FALSE.

The tabMWRacc() function can return three types of tables as specified with the type argument: "individual", "summary", or "percent". The individual tables are specific to each type of accuracy check for each parameter (e.g., field blanks, lab blanks, etc.). The summary table summarizes all accuracy checks by the number of checks and how many hit/misses are returned for each across all parameters. The percent table is similar to the summary table, but showing only percentages with appropriate color-coding for hit/misses. The data quality objectives file for frequency and completeness is also required if type = "summary" or type = "percent".

For type = "individual", the quality control tables for accuracy are retrieved by specifying the check with the accchk argument. The accchk argument can be used to specify one of the following values to retrieve the relevant tables: "Field Blanks", "Lab Blanks", "Field Duplicates", "Lab Duplicates", or "Lab Spikes / Instrument Checks". Below shows how to retrieve each table using the data frames from above for the results and quality objectives file for accuracy. The warnings are suppressed with warn = FALSE.

tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Field Blanks", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat,type = "individual", accchk = "Lab Blanks", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Field Duplicates", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Lab Duplicates", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Lab Spikes / Instrument Checks", warn = FALSE)

For type = "summary", the function summarizes all accuracy checks by counting the number of quality control checks, number of misses, and percent acceptance for each parameter. All accuracy checks are used and the accchk argument does not apply.

tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "summary", warn = FALSE)

For type = "percent", the function returns a similar table as for the summary option, except only the percentage of checks that pass for each parameter are shown in wide format. Cells are color-coded based on the percentage of checks that have passed using the percent thresholds from the % Completeness column of the data quality objectives file for frequency and completeness. Parameters without an entry for % Completeness are not color-coded and an appropriate warning is returned. All accuracy checks are used and the accchk argument does not apply.

tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "percent", warn = FALSE)

The tabMWRacc() function uses the qcMWRacc() function internally to create the table. This function creates the raw summaries of accuracy from the input data. Typically, qcMWRacc() is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.

Below, the qcMWRacc() function is executed with the data frames for the results and quality objectives file for accuracy. As with tabMWRacc, the accchk argument can be used to return one to all of the checks. The results are returned as a list, with each element of the list corresponding to a specific accuracy check. Here, the lab duplicate checks are returned. The warnings are also suppressed.

qcMWRacc(res = resdat, acc = accdat, frecom = frecomdat, accchk = "Lab Duplicates", warn = FALSE)

Quality control for frequency

The quality control checks for frequency are used to verify an appropriate number of quality control samples have been collected or analyzed for each parameter. These are checks on the quantity of samples and not the values, as for the accuracy checks. The frequency checks require results data (as in resdat above), a data quality objectives file for accuracy (as in accdat), and a data quality objectives file for frequency and completeness (as in frecomdat).

The tabMWRfre() function is used to create tabular results for the frequency checks for each parameter. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults(), readMWRacc(), and readMWRfrecom(). For the former, the full suite of data checks can be evaluated with runkchk = T (default) or suppressed with runchk = F, as explained in the data inputs and checks vignette. In the latter case, downstream analyses may not work if data are formatted incorrectly. Also note that completeness is only evaluated on parameters that are shared between the results file and the data quality objectives completeness file. A warning is returned if there are parameters that do not match. The warnings can be suppressed by setting warn = FALSE.

The quality control tables for frequency show the number of records that apply to a given check (e.g., Lab Blank, Field Blank, etc.) relative to the number of "regular" data records (e.g., field samples or measures) for each parameter. A summary of all frequency checks for each parameter is provided if type = "summary". The function is executed with the data frames for the results and quality objectives file for frequency.

tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = "summary", warn = FALSE)

A color-coded table showing similar information as percentages for each parameter is provided if type = "percent". Cells are shown as green or red if the required frequency checks are met as specified in the data quality objectives file.

tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = "percent", warn = FALSE)

The tabMWRfre() function uses the qcMWRfre() function internally to create the table. This function creates the raw summaries of frequency from the input data. Typically, qcMWRfre() is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.

Below, the qcMWRfre() function is executed with the data frames for the results and quality objectives file for frequency and completeness. The warnings are suppressed.

qcMWRfre(res = resdat, acc = accdat, frecom = frecomdat, warn = FALSE)

The output shows the frequency checks from the combined files. Each row applies to a frequency check for a parameter. The Parameter column shows the parameter, the obs column shows the total records that apply to regular activity types, the check column shows the relevant activity type for each frequency check, the count column shows the number of records that apply to a check, the standard column shows the relevant percentage required for the quality control check from the quality control objectives file, and the met column shows if the standard was met by comparing if percent is greater than or equal to standard.

Quality control for completeness

The quality control checks for completeness are used to assess the number of regular samples relative to qualified samples that apply to each parameter. Regular samples are those with Field Msr/Obs or Sample-Routine in the Activity Type column of the results file and qualified samples are those marked as Q in the Result Measure Qualifier column of the results file. The completeness checks require results data (as in resdat above), a data quality objectives file for frequency and completeness (as in frecomdat), and the censored data indicating number of missing or censored observations by parameter (as in censdat). The censored data is optional.

The tabMWRcom() function is used to create a table that shows the completeness percentages for each parameter. As explained in the previous section, the function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults(), readMWRfrecom(), and readMWRcens().

A summary table showing the number of data records, number of qualified records, and percent completeness can be created with tabMWRcom(). The % Completeness column shows cells as green or red if the required percentage of observations for completeness are present as specified in the data quality objectives file. The Hit/Miss column shows similar information but in text format, i.e., MISS is shown if the quality control standard for completeness is not met. The function is also executed with the data frames from above, since they have already passed the internal checks.

tabMWRcom(res = resdat, frecom = frecomdat, cens = censdat, warn = FALSE)

The tabMWRcom() function uses the qcMWRcom() function internally to create the table. This function creates the raw summaries of completeness from the input data. Typically, qcMWRcom() is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.

Below, the qcMWRcom() function is executed with the data frames for the results, quality objectives file for frequency and completeness, and censored data. The warnings are suppressed.

qcMWRcom(res = resdat, frecom = frecomdat, cens = censdat, warn = FALSE)

The output shows the completeness checks from the combined files. Each row applies to a completeness check for a parameter. The datarec and qualrec columns show the number of data records and qualified records, respectively. The datarec column specifically shows only records not for quality control by excluding those as duplicates, blanks, or spikes in the count. The standard column shows the relevant percentage required for the quality control check from the quality control objectives file, the complete column shows the calculated completeness taken from the input data, and the met column shows if the standard was met by comparing if complete is greater than or equal to standard.



massbays-tech/MassWateR documentation built on April 12, 2025, 7:53 p.m.