knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = T, message = F )
The quality control functions in MassWateR can be used once the required data are successfully imported into R (see the data input and checks vignette for an overview). The required data includes the results data file that describe the monitoring data, the data quality objective files for accuracy, frequency, and completeness, and a censored data file (optional) showing number of missing or censored data by parameter. The example data files included with the package are imported here to demonstrate how to use the quality control functions:
library(MassWateR) # import results data respth <- system.file("extdata/ExampleResults.xlsx", package = "MassWateR") resdat <- readMWRresults(respth) # import data quality objectives for accuracy accpth <- system.file("extdata/ExampleDQOAccuracy.xlsx", package = "MassWateR") accdat <- readMWRacc(accpth) # import data quality objectives for frequency and completeness frecompth <- system.file("extdata/ExampleDQOFrequencyCompleteness.xlsx", package = "MassWateR") frecomdat <- readMWRfrecom(frecompth) # import censored data censpth <- system.file("extdata/ExampleCensored.xlsx", package = "MassWateR") censdat <- readMWRcens(censpth)
The qcMWRreview()
function compiles a review report as a Word document for all quality control checks included in the MassWateR package. The report shows several tables, including the data quality objectives files for accuracy, frequency, and completeness, summary results for all accuracy checks, summary results for all frequency checks, summary results for all completeness checks, and individual results for all accuracy checks. The report uses individual table functions described in the sections below to return the results, which include tabMWRacc()
, tabMWRfre()
, and tabMWRcom()
.
The workflow for using this function is to import the required data (results, data quality objective files and censored data, as above) and to fix any errors noted on import prior to creating the review report. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults()
, readMWRacc()
, readMWRfrecom()
, and readMWRcens()
. For the former, the full suite of data checks can be evaluated with runkchk = T
(default) or suppressed with runchk = F
, as explained in the relevant help files. In the latter case, downstream analyses may not work if data are formatted incorrectly.
The report can be created as follows by including the required files and specifying an output directory where the Word document is saved (a temporary directory is used here). Once the function is done running, a message indicating success and where the file is located is returned. The Word file can be further edited by hand as needed.
qcMWRreview(res = resdat, acc = accdat, frecom = frecomdat, cens = censdat, warn = FALSE, output_dir = tempdir())
As a convenience, the input files can also be passed to the qcMWRreview()
function as a named list using the fset
argument. This eliminates the need to individually specify the input arguments.
# names list of inputs fsetls <- list( res = resdat, acc = accdat, frecom = frecomdat, cens = censdat ) qcMWRreview(fset = fsetls, output_dir = tempdir())
Note that the warnings are suppressed above with warn = FALSE
. By default, this argument is set to TRUE
to view the warnings in the R console after the report is created. The warnings indicate notable concerns to consider for the input data that may need to be addressed. Details on these warnings are described in the sections below for each quality control table.
Optional arguments for qcMWRreview()
that can be changed as needed include specifying the file name with output_file
, suppressing the raw data summaries at the end of the report with rawdata = FALSE
, and changing the table font sizes (dqofontsize
for the data quality objectives on the first page, tabfontsize
for the remainder). A spreadsheet of the tables in the Word document can also be created by setting savesheet = TRUE
. The spreadsheet is saved in the same directory as the Word document.
The quality control checks for accuracy assess several characteristics of the data in the results file by referencing appropriate values in the data quality objectives file for accuracy. In short, the accuracy checks evaluate field blanks, lab blanks, field duplicates, lab duplicates, lab spikes, and instrument checks. The accuracy checks require results data (as in resdat
above) and data quality objectives files for accuracy (accdat
) and frequency and completeness (frecomdat
).
The tabMWRacc()
function is used to create tabular results for the accuracy checks for each parameter. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults()
, readMWRacc()
, and readMWRfrecom
. For the former, the full suite of data checks can be evaluated with runkchk = T
(default) or suppressed with runchk = F
, as explained in the data inputs and checks vignette. In the latter case, downstream analyses may not work if data are formatted incorrectly. Also note that accuracy is only evaluated on parameters that are shared between the results file and the data quality objectives accuracy file. A warning is returned if there are parameters that do not match. The warnings can be suppressed by setting warn = FALSE
.
The tabMWRacc()
function can return three types of tables as specified with the type
argument: "individual"
, "summary"
, or "percent"
. The individual tables are specific to each type of accuracy check for each parameter (e.g., field blanks, lab blanks, etc.). The summary table summarizes all accuracy checks by the number of checks and how many hit/misses are returned for each across all parameters. The percent table is similar to the summary table, but showing only percentages with appropriate color-coding for hit/misses. The data quality objectives file for frequency and completeness is also required if type = "summary"
or type = "percent"
.
For type = "individual"
, the quality control tables for accuracy are retrieved by specifying the check with the accchk
argument. The accchk
argument can be used to specify one of the following values to retrieve the relevant tables: "Field Blanks"
, "Lab Blanks"
, "Field Duplicates"
, "Lab Duplicates"
, or "Lab Spikes / Instrument Checks"
. Below shows how to retrieve each table using the data frames from above for the results and quality objectives file for accuracy. The warnings are suppressed with warn = FALSE
.
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Field Blanks", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat,type = "individual", accchk = "Lab Blanks", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Field Duplicates", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Lab Duplicates", warn = FALSE)
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "individual", accchk = "Lab Spikes / Instrument Checks", warn = FALSE)
For type = "summary"
, the function summarizes all accuracy checks by counting the number of quality control checks, number of misses, and percent acceptance for each parameter. All accuracy checks are used and the accchk
argument does not apply.
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "summary", warn = FALSE)
For type = "percent"
, the function returns a similar table as for the summary option, except only the percentage of checks that pass for each parameter are shown in wide format. Cells are color-coded based on the percentage of checks that have passed using the percent thresholds from the % Completeness
column of the data quality objectives file for frequency and completeness. Parameters without an entry for % Completeness
are not color-coded and an appropriate warning is returned. All accuracy checks are used and the accchk
argument does not apply.
tabMWRacc(res = resdat, acc = accdat, frecom = frecomdat, type = "percent", warn = FALSE)
The tabMWRacc()
function uses the qcMWRacc()
function internally to create the table. This function creates the raw summaries of accuracy from the input data. Typically, qcMWRacc()
is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.
Below, the qcMWRacc()
function is executed with the data frames for the results and quality objectives file for accuracy. As with tabMWRacc
, the accchk
argument can be used to return one to all of the checks. The results are returned as a list, with each element of the list corresponding to a specific accuracy check. Here, the lab duplicate checks are returned. The warnings are also suppressed.
qcMWRacc(res = resdat, acc = accdat, frecom = frecomdat, accchk = "Lab Duplicates", warn = FALSE)
The quality control checks for frequency are used to verify an appropriate number of quality control samples have been collected or analyzed for each parameter. These are checks on the quantity of samples and not the values, as for the accuracy checks. The frequency checks require results data (as in resdat
above), a data quality objectives file for accuracy (as in accdat
), and a data quality objectives file for frequency and completeness (as in frecomdat
).
The tabMWRfre()
function is used to create tabular results for the frequency checks for each parameter. The function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults()
, readMWRacc()
, and readMWRfrecom()
. For the former, the full suite of data checks can be evaluated with runkchk = T
(default) or suppressed with runchk = F
, as explained in the data inputs and checks vignette. In the latter case, downstream analyses may not work if data are formatted incorrectly. Also note that completeness is only evaluated on parameters that are shared between the results file and the data quality objectives completeness file. A warning is returned if there are parameters that do not match. The warnings can be suppressed by setting warn = FALSE
.
The quality control tables for frequency show the number of records that apply to a given check (e.g., Lab Blank, Field Blank, etc.) relative to the number of "regular" data records (e.g., field samples or measures) for each parameter. A summary of all frequency checks for each parameter is provided if type = "summary"
. The function is executed with the data frames for the results and quality objectives file for frequency.
tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = "summary", warn = FALSE)
A color-coded table showing similar information as percentages for each parameter is provided if type = "percent"
. Cells are shown as green or red if the required frequency checks are met as specified in the data quality objectives file.
tabMWRfre(res = resdat, acc = accdat, frecom = frecomdat, type = "percent", warn = FALSE)
The tabMWRfre()
function uses the qcMWRfre()
function internally to create the table. This function creates the raw summaries of frequency from the input data. Typically, qcMWRfre()
is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.
Below, the qcMWRfre()
function is executed with the data frames for the results and quality objectives file for frequency and completeness. The warnings are suppressed.
qcMWRfre(res = resdat, acc = accdat, frecom = frecomdat, warn = FALSE)
The output shows the frequency checks from the combined files. Each row applies to a frequency check for a parameter. The Parameter
column shows the parameter, the obs
column shows the total records that apply to regular activity types, the check
column shows the relevant activity type for each frequency check, the count
column shows the number of records that apply to a check, the standard
column shows the relevant percentage required for the quality control check from the quality control objectives file, and the met
column shows if the standard was met by comparing if percent
is greater than or equal to standard
.
The quality control checks for completeness are used to assess the number of regular samples relative to qualified samples that apply to each parameter. Regular samples are those with Field Msr/Obs
or Sample-Routine
in the Activity Type
column of the results file and qualified samples are those marked as Q
in the Result Measure Qualifier
column of the results file. The completeness checks require results data (as in resdat
above), a data quality objectives file for frequency and completeness (as in frecomdat
), and the censored data indicating number of missing or censored observations by parameter (as in censdat
). The censored data is optional.
The tabMWRcom()
function is used to create a table that shows the completeness percentages for each parameter. As explained in the previous section, the function can be used with inputs as paths to the relevant files or as data frames returned by readMWRresults()
, readMWRfrecom()
, and readMWRcens()
.
A summary table showing the number of data records, number of qualified records, and percent completeness can be created with tabMWRcom()
. The % Completeness
column shows cells as green or red if the required percentage of observations for completeness are present as specified in the data quality objectives file. The Hit/Miss
column shows similar information but in text format, i.e., MISS
is shown if the quality control standard for completeness is not met. The function is also executed with the data frames from above, since they have already passed the internal checks.
tabMWRcom(res = resdat, frecom = frecomdat, cens = censdat, warn = FALSE)
The tabMWRcom()
function uses the qcMWRcom()
function internally to create the table. This function creates the raw summaries of completeness from the input data. Typically, qcMWRcom()
is not used by itself, but it is explained here to demonstrate how the raw summaries can be obtained.
Below, the qcMWRcom()
function is executed with the data frames for the results, quality objectives file for frequency and completeness, and censored data. The warnings are suppressed.
qcMWRcom(res = resdat, frecom = frecomdat, cens = censdat, warn = FALSE)
The output shows the completeness checks from the combined files. Each row applies to a completeness check for a parameter. The datarec
and qualrec
columns show the number of data records and qualified records, respectively. The datarec
column specifically shows only records not for quality control by excluding those as duplicates, blanks, or spikes in the count. The standard
column shows the relevant percentage required for the quality control check from the quality control objectives file, the complete
column shows the calculated completeness taken from the input data, and the met
column shows if the standard was met by comparing if complete
is greater than or equal to standard
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.