knitr::opts_chunk$set(collapse = FALSE, comment = "#>")

The myrwaR package includes a set of functions for retrieving data from the MyRWA Water Quality Access Database. First, we need to load the package:

library(myrwaR)

Accessing the Database

Connecting to the Database

To connect to the database, use db_connect() and specify the path to the database, which returns a connection handler that is later used to retrieve the data.

base_dir <- 'D:/Dropbox/Work/mystic/db'
db_path <- file.path(base_dir, "MysticDB_20160208.accdb")
ch <- db_connect(path = db_path)

Retrieving Data from Any Table

The db_list_tables() function lists the names of tables in the database (note that dplyr has a function with the same name, so you may need to namespace it with myrwaR::):

myrwaR::db_list_tables(ch)

To load the data from any one of these tables, use db_table(). For example, to get the Location table:

locations <- db_table(ch, table_name = "Location")
str(locations)

Retrieving Water Quality Data

If a database connection has already been created, water quality data can be retrieved using db_results(). This function will automatically join the Visit and Result tables to provide the location IDs, datetimes, and useful attributes for each row in the Result table. Note that this function will not join the Location table, and thus not contain latitudes and longitudes. See the load_wq() function in the next section.

results <- db_results(ch)

Water Quality Dataset Function

Finally, to connect to the database and retrieve the water quality data with the location information, use the load_wq() function (note that the name of this function differs from other database functions to maintain consistency with older R scripts used by MyRWA). The load_wq() function will connect to the database, and then merge the Result, Visit, and Location database tables into a single dataframe.

wq <- load_wq(path = db_path, sample_types = c("S"), exclude_flags = TRUE)
str(wq)

The sample_types parameter lets you specify which sample types to include in the data frame. By default, it will only return routine samples ("S"). To include other sample types such as field duplicates and field blanks, just add other SampleType IDs sample_types = c("S", "FD", "FB"), or set sample_types = NULL to retrieve all sample types. To see a complete list of sample types, load the SampleType table: db_table(ch, table_name="SampleType").

The exclude_flags parameter specifies whether the resulting data frame should exclude flagged samples (default is TRUE). To include the flagged samples, set this to FALSE.

Closing the Connection

To close a connection, simply use the close() function:

close(ch)

QAQC Procedures

Version Comparison

The compare_database() function summarizes changes between to database versions. This function loads the Result, Visit, and Location tables from each database version, and then summarizes how many rows have been added or removed from the first to the second version. Note that this does not look at changes to the values of each row, it only checks for row additions and removals.

compare_database(old_path = file.path(base_dir, "MysticDB_20160120.accdb"),
                 new_path = file.path(base_dir, "MysticDB_20160208.accdb"))

The output of the compare_database() function can also be saves to a text file by setting the log_file argument.

compare_database(old_path = file.path(base_dir, "MysticDB_20150227.accdb"),
                 new_path = file.path(base_dir, "MysticDB_20150529.accdb"),
                 log_file = file.path(base_dir, 'compare_20150227_20150529.txt'))

Database QAQC Checks

The db_qaqc_suite() function runs a series of checks on the Result and Visit tables These checks are designed to find invalid values in each column of these tables. The types of checks include:

Note that the first three checks are included because Microsoft Access does not enforce consistency across these issues and thus may go unnoticed within the database. However, these issues can cause problems when importing the results into R, which does require these consistencies.

For each set of checks, db_qaqc_suite() function will report which columns failed, the reason, and the number of rows that failed each check. The specific rows that failed can be shown by setting print.values=TRUE (not shown here for brevity).

ch <- db_connect(db_path)
db_qaqc_suite(ch, print.rows=FALSE)
close(ch)


walkerjeffd/myrwaR documentation built on May 3, 2019, 10:46 p.m.