knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "fig/README-" ) # library(rems) # remove_data_cache("all")
r as.character(read.dcf("DESCRIPTION", "Version"))
An R package to download, import, and filter data from B.C.'s Environmental Monitoring System (EMS) into R.
The package pulls data from the B.C. Data Catalogue EMS Results, which is licenced under the Open Government Licence - British Columbia.
The package is not available on CRAN, but can be installed using the devtools package:
# install.packages("devtools") # if not already installed library(devtools) install_github("bcgov/rems")
If you are asked during installation "Would you like to install from source the packages which require compilation", choose "No".
NOTE: If you are using Windows, you must be running the 64-bit version of R, as the 32-bit version cannot handle the size of the EMS data. In RStudio, click on Tools -> Global Options and ensure the 64 bit version is chosen in the R version box.
You can use the get_ems_data()
function to get last two years of data
(you can also specify which = "4yr"
to get the last four years of
data):
library(rems) two_year <- get_ems_data(which = "2yr", ask = FALSE) nrow(two_year) head(two_year)
By default, get_ems_data
imports only a subset of columns that are
useful for water quality analysis. This is controlled by the cols
argument, which has a default value of "wq"
. This can be set to
"all"
to download all of the columns, or a character vector of column
names (see ?get_ems_data
for details).
You can filter the data to just get the records you want:
filtered_2yr <- filter_ems_data(two_year, emsid = c("0121580", "0126400"), parameter = c("Aluminum Total", "Cadmium Total", "Copper Total", " Zinc Total", "Turbidity"), from_date = "2011/02/06", to_date = "2015/12/31")
You can also get the entire historic dataset, which has records back to 1964.
First download the dataset using download_historic_data
, which
downloads the data and stores it in a DuckDB
database:
download_historic_data(ask = FALSE)
There are two ways to pull data from the historic dataset into R:
read_historic_data()
Read in the historic data, supplying constraints to only import the records you want:
filtered_historic <- read_historic_data(emsid = c("0121580", "0126400"), parameter = c("Aluminum Total", "Cadmium Total", "Copper Total", "Zinc Total", "Turbidity"), from_date = "2001/02/05", to_date = "2011/12/31", check_db = FALSE)
dplyr
You can also query the historic database using dplyr
, which ultimately
gives you more flexibility than using read_historic_data
:
First, create a connection to the database using
connect_historic_db()
, then attach the historic database table to your
R session using attach_historic_data()
. This creates an object which
behaves like a data frame, which you can query with dplyr. The advantage
is that the computation is done in the database rather than importing
all of the records into R (which would likely be impossible). This is
illustrated below:
library(dplyr) hist_db_con <- connect_historic_db() hist_tbl <- attach_historic_data(hist_db_con)
You can then query this object with dplyr:
filtered_historic2 <- hist_tbl %>% select(EMS_ID, PARAMETER, COLLECTION_START, RESULT) %>% filter(EMS_ID %in% c("0121580", "0126400"), PARAMETER %in% c("Aluminum Total", "Cadmium Total", "Copper Total", " Zinc Total", "Turbidity"))
Finally, to get the results into your R session as a regular data frame,
you must collect()
it. Note that date/times are returned to R in the
Pacific Standard Time timezone (PST; UTC-8).
You can combine the previously imported historic and two_year data sets
using bind_ems_data
:
all_data <- bind_ems_data(filtered_2yr, filtered_historic) head(all_data)
There are many cases in EMS data where the unit of the RESULT
(in the
UNIT
column) is different from that of METHOD_DETECTION_LIMIT
(MDL_UNIT
column). The standardize_mdl_units()
function converts the
METHOD_DETECTION_LIMIT
values to the same unit as RESULT
, and
updates the MDL_UNIT
column accordingly:
# look at data with mismatched units: filter(all_data, UNIT != MDL_UNIT) %>% select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>% head() all_data <- standardize_mdl_units(all_data) # Check again filter(all_data, UNIT != MDL_UNIT) %>% select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>% head()
Then you can plot your data with ggplot2:
library(ggplot2) ggplot(all_data, aes(x = COLLECTION_START, y = RESULT)) + geom_point() + facet_grid(PARAMETER ~ EMS_ID, scales = "free_y")
When you are finished querying the historic database, you should close
the database connection using disconnect_historic_db()
:
disconnect_historic_db(hist_db_con)
When the data are downloaded from the B.C. Data Catalogue, they are
cached so that you don't have to download it every time you want to use
it. If there is newer data available in the Catalogue, you will be
prompted the next time you use get_ems_data
or
download_historic_data
.
If you want to remove the cached data, use the function
remove_data_cache
. You can remove all the data, or just the
"historic", "2yr", or "4yr":
remove_data_cache("2yr")
There are two ways to select active sites in the long-term lake
monitoring program. The lt_lake_sites
function selects the EMS_ID
of
active sites. The lt_lake_req
function selects the REQUISITION_ID
of
active sites. Using the lt_lake_sites
will provide all data collected
under the EMS_ID
, whereas using lt_lake_req
will filter data
collected by the long-term lakes monitoring group. Both functions can be
used with filter_ems_data
to easily pull data from active long-term
lake monitoring sites.
head(lt_lake_sites()) head(lt_lake_req()) #use with filter_ems_data filtered_2yr_lt_lakes_ems <- filter_ems_data(two_year, emsid = lt_lake_sites(), parameter = c("Aluminum Total", "Cadmium Total", "Copper Total", " Zinc Total", "Turbidity")) head(filtered_2yr_lt_lakes_ems) filtered_2yr_lt_lakes_req <- filter_ems_data(two_year, req_id = lt_lake_req(), parameter = c("Aluminum Total", "Cadmium Total", "Copper Total", " Zinc Total", "Turbidity")) head(filtered_2yr_lt_lakes_req)
Under development, but stable. Unlikely to break or change substantially.
To report bugs/issues/feature requests, please file an issue.
If you would like to contribute to the package, please see our CONTRIBUTING guidelines.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Copyright 2016 Province of British Columbia Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This repository is maintained by Environmental Reporting BC. Click here for a complete list of our repositories on GitHub.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.