knitr::opts_chunk$set(collapse = FALSE, comment = "#>")
The myrwaR
package includes a set of functions for retrieving data from the MyRWA Water Quality Access Database. First, we need to load the package:
library(myrwaR)
To connect to the database, use db_connect()
and specify the path to the database, which returns a connection handler that is later used to retrieve the data.
base_dir <- 'D:/Dropbox/Work/mystic/db' db_path <- file.path(base_dir, "MysticDB_20160208.accdb") ch <- db_connect(path = db_path)
The db_list_tables()
function lists the names of tables in the database (note that dplyr
has
a function with the same name, so you may need to namespace it with myrwaR::
):
myrwaR::db_list_tables(ch)
To load the data from any one of these tables, use db_table()
. For example, to get the Location
table:
locations <- db_table(ch, table_name = "Location") str(locations)
If a database connection has already been created, water quality data can be retrieved using db_results()
. This function will automatically join the Visit
and Result
tables to provide the location IDs, datetimes, and useful attributes for each row in the Result
table. Note that this function will not join the Location
table, and thus not contain latitudes and longitudes. See the load_wq()
function in the next section.
results <- db_results(ch)
Finally, to connect to the database and retrieve the water quality data with the location information, use the load_wq()
function (note that the name of this function differs from other database functions to maintain consistency with older R scripts used by MyRWA). The load_wq()
function will connect to the database, and then merge the Result
, Visit
, and Location
database tables into a single dataframe.
wq <- load_wq(path = db_path, sample_types = c("S"), exclude_flags = TRUE) str(wq)
The sample_types
parameter lets you specify which sample types to include in the data frame. By default, it will only return routine samples ("S"
). To include other sample types such as field duplicates and field blanks, just add other SampleType IDs sample_types = c("S", "FD", "FB")
, or set sample_types = NULL
to retrieve all sample types. To see a complete list of sample types, load the SampleType
table: db_table(ch, table_name="SampleType")
.
The exclude_flags
parameter specifies whether the resulting data frame should exclude flagged samples (default is TRUE
). To include the flagged samples, set this to FALSE
.
To close a connection, simply use the close()
function:
close(ch)
The compare_database()
function summarizes changes between to database versions. This function loads the Result
, Visit
, and Location
tables from each database version, and then summarizes how many rows have been added or removed from the first to the second version. Note that this does not look at changes to the values of each row, it only checks for row additions and removals.
compare_database(old_path = file.path(base_dir, "MysticDB_20160120.accdb"), new_path = file.path(base_dir, "MysticDB_20160208.accdb"))
The output of the compare_database()
function can also be saves to a text file by setting the log_file
argument.
compare_database(old_path = file.path(base_dir, "MysticDB_20150227.accdb"), new_path = file.path(base_dir, "MysticDB_20150529.accdb"), log_file = file.path(base_dir, 'compare_20150227_20150529.txt'))
The db_qaqc_suite()
function runs a series of checks on the Result
and Visit
tables These checks are designed to find invalid values in each column of these tables. The types of checks include:
""
)"FD "
)NA
in columns that should not have missing values Result.Units->Units.ID
, Result.FlagID->ResultFlag.ID
, Visit.SampleTypeID->SampleType.ID
, and Visit.ProjectID->Project.ID
)Note that the first three checks are included because Microsoft Access does not enforce consistency across these issues and thus may go unnoticed within the database. However, these issues can cause problems when importing the results into R, which does require these consistencies.
For each set of checks, db_qaqc_suite()
function will report which columns failed, the reason, and the number of rows that failed each check. The specific rows that failed can be shown by setting print.values=TRUE
(not shown here for brevity).
ch <- db_connect(db_path) db_qaqc_suite(ch, print.rows=FALSE) close(ch)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.