library(purexposure) library(dplyr)
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "vignettes" )
The California Department of Pesticide Regulation publishes free copies of Pesticide Use Reports online through the California Pesticide Information Portal (CalPIP). The purexposure
package makes it easy to download Pesticide Use Report (PUR) raw data, output reports in a tidy format, calculate exposure to applied pesticides, and visualize application for counties or more specific locations. For example, you can use the package to produce plots like this one:
fresno <- pull_clean_pur(1995:1998, "Fresno") %>% plot_county_application() fresno$map
knitr::include_graphics("vignettes/figures/fresno_example.png")
The purexposure
package has five main categories of functions: find_*
, pull_*
, calculate_*
, plot_*
, and write_*
. find_*
functions are designed to explore both pesticide products and active ingredients present in applied pesticides, return county names or codes as they are used in PUR data sets, and find the county of a particular location. pull_*
functions pull raw or cleaned PUR data sets through the CA Department of Pesticide Regulation's FTP server. The calculate_exposure
function calculates exposure to applied pesticides for a given location, and plot_*
functions return visualizations of application and exposure. The write_exposure
function calculates exposure to applied pesticides for a vector of functions and writes out files to a specified directory.
purexposure
currently exists in a development version on GitHub. You can install and load the package with the following code:
devtools::install_github("leighseverson/purexposure") library(purexposure)
Many examples in this vignette also use functions from the dplyr
package:
library(dplyr)
Pesticide applications are reported on a product basis, but are recorded by the California Department of Pesticide Regulation (CDPR) by active ingredient. There are usually several active ingredients contained in a single product, so a single pesticide product application is represented by several rows in a data set. In PUR data sets, the prodno
and chem_code
/chemname
variables indicate product names and active ingredient codes/names, respectively.
Records for the amount of active ingredient applied are reliably recorded by day. Raw PUR data sets have a variable called applic_time
, the time that the pesticide product application was completed; however, application time is only required for production agricultural reports and was not added to PUR data sets until 1999. Application time is therefore often missing in raw PUR data sets, and is not retained in data sets cleaned using this package.
PUR data is typically about two years behind. For example, 2017 data was made available in June of 2019. You can check CDPR's FTP site (ftp://transfer.cdpr.ca.gov/pub/outgoing/pur_archives/) to see the most recent year of data available to pull. You can use the package to pull data from 1990 up to that most recent year.
PUR data sets are organized and pulled by county. Application is recorded by township and section, which are units of land resulting from the Public Land Survey System (PLSS). The PLSS is used to subdivide and describe land in the United States, and is regulated by the Bureau of Land Management. Most Midwestern, some southern, and all western states are included in the PLSS. There are several initial points from which PLSS surveys begin from. The line running north-south through the initial points is called the Principle Meridian, and the east-west line running through the points is called the base line. California contains three of these initial points, and three Principle Meridians: Humboldt (H), Mount Diablo (M), and San Bernardino (S).
knitr::include_graphics("vignettes/figures/calneva.jpg")
Townships are six-mile-square areas that are identified by a combination of the location north or south of the base line, and range, the location east or west of the principle meridian. Townships are divided into 36 one-mile-square sections. California townships are uniquely identified by a combination of Principle Meridian (H, M, or S), location north or south of the base line (01-48), base line direction (N or S), location east or west of the Principle Meridian (01-47), and Principle Meridian direction (E or W). Sections are identified by the same combination, with the addition of the section number (01-36). For example, one township in California is identified by "M15S24E", and a section within that township is identified by "M15S24E01". The diagram below shows an example of a township (outlined in blue) and one of the 36 sections within that township (outlined in red).
knitr::include_graphics("vignettes/figures/section_townships.png")
purexposure
packagefind_*
functions: search for active ingredients, product names, and countiesBy default, the package works with all recorded active ingredients in the specified counties and date range. However, if you're interested in analyzing application of a specific active ingredient or chemical class of active ingredients, the find_chemical_codes
function is useful as a starting point to see what active ingredients were present in applied pesticides in a given year or years.
For example, to see how active ingredients matching "methyl bromide" or "abietic" were recorded in the year 2000, we can use the find_chemical_codes
function, which pulls the PUR Chemical Lookup table for a particular year:
find_chemical_codes(year = 2000, chemicals = c("methyl bromide", "abietic"))
The chem_code
column gives the PUR chemical code for the active ingredient, and chemname
gives the PUR chemical name. chemical
gives the search terms input in the chemicals
argument. This function can also be useful to find the number of unique active ingredients that were recorded in applied pesticides in a given year or years:
find_chemical_codes(2000) %>% nrow
3,637 unique active ingredients were present in applied pesticides in California in the year 2000. If you're interested in separating applied chemicals by year, you can set the by_year
argument to TRUE
:
find_chemical_codes(c(2000, 2001), chemicals = "methyl bromide", by_year = TRUE)
This often results in identical search results for multiple years. However, occasionally the chemicals present in PUR data sets differs from year to year. For example, the active ingredient imazosulfuron was not applied in 2006, but was applied in 2007:
find_chemical_codes(c(2006, 2007), chemicals = "imazosulfuron", by_year = TRUE)
If the by_year
argument is left to its default value of FALSE
, unique active ingredients applied across multiple years are returned:
find_chemical_codes(c(2006, 2007), chemicals = c("methyl bromide", "imazosulfuron"))
The find_chemical_codes
function works using grep
to perform pattern matching. Therefore, it may not be effective in returning all active ingredients that should fall under your search term. To find an exhaustive list of all active ingredients that fall under a chemical class, you may find it useful to refer to the CA Department of Pesticide Regulation (DPR)'s Summary of Pesticide Use Report Data, Indexed by Chemical (2008). Chemical classes available in the Summary include:
You can use active ingredients listed under the relevant class to include in the chemicals
argument of find_chemical_codes
to find if those active ingredients were present in applied pesticides in a given year or years.
The find_product_name
works similarly: you can use it to download a product table and search for applied pesticide products for a given year or years:
product_table <- find_product_name(2017, "insecticide") save(product_table, file = "data/product_table.RData")
load("vignettes/data/product_table.RData")
head(product_table, 3)
The prodno
column in the product table can be matched with the same column in a PUR data set.
The find_product_name
uses the pull_product_table
function internally, which saves downloaded tables within the current R session to prevent downloading the same table multiple times. Caching in other pull
functions is discussed further below. This function also has a by_year
argument, which should be set to TRUE
if you would like to pull product names for multiple years and separate results by year. Otherwise, if by_year
is left as its default value of FALSE
, unique results across years are returned.
Counties are indicated in raw PUR data sets by county_cd
: a two-digit integer code. You can pull data for for counties with either their name, PUR code or FIPS code. You can use the find_counties
function to move between county names (return = "names"
), PUR codes (return = "pur_codes"
), and FIPS codes (return = "fips_codes"
). For example:
find_counties(c("Alameda", "Alpine", "Amador"), return = "fips_codes")
Additionally, the county_codes
data set included with this package lists all 58 counties in California with county names and PUR codes as they are recorded in PUR data sets, as well as FIPS codes, which could be useful to link PUR data with other county-level data sets.
purexposure::county_codes %>% slice(1:3)
The find_location_county
function takes a California address or coordinate pair (longitude, latitude) and returns the county of that location.
find_location_county("12906 South Fowler Ave., Selma, CA 93662")
"Fresno"
This function is useful later on if you want to pull PUR data to calculate exposure at a particular location. You can choose to return a county name
(the default), pur_code
, or fips_code
with the return
argument.
pull_*
functions: pull raw or cleaned PUR data setsThe pull_raw_pur
function pulls raw PUR data from CDPR's FTP server:
fresno_raw <- pull_raw_pur(years = 2004, counties = "fresno") fresno_raw <- fresno_raw[1:2,] save(fresno_raw, file = "data/fresno_raw.RData")
load("vignettes/data/fresno_raw.RData")
head(fresno_raw, 2)
Raw data pulled using pull_raw_pur
may differ from data sets downloaded manually from CDPR's FTP server in a few ways:
pull_raw_pur
always has 33 columns. The data sets for some years and counties included columns called error_flag
and comtrs
. Since these variables are inconsistently present and are not documented in the User Guide & Documentation manual, they were removed. applic_dt
(application date) and county_cd
(county code). This may not be the case for data sets downloaded manually. For documentation of raw PUR data, you can reference the Pesticide Use Report Data User Guide & Documentation document published by the CA Department of Pesticide Regulation. The file is saved as "cd_doc.pdf" in any "pur[year].zip" file found here: ftp://transfer.cdpr.ca.gov/pub/outgoing/pur_archives/.
The pull_clean_pur
function pulls and cleans PUR data. You can similarly pull data by years and by counties, with additional arguments to specify the format and contents of your cleaned data set. Here's what a cleaned data set for Fresno county looks like for 2004:
fresno_clean <- pull_clean_pur(2004, "fresno") fresno_clean <- fresno_clean[1:2,] save(fresno_clean, file = "data/fresno_clean.RData")
load("vignettes/data/fresno_clean.RData")
head(fresno_clean, 2)
Pesticide application is recorded by active ingredient (chem_code
and chemname
), kilograms of active ingredient applied (kg_chm_used
), section, township, county (county_name
, pur_code
, and fips_code
), date, method of application (aerial_ground
) and product number (prodno
). Raw PUR data sets are recorded in units of pounds, and cleaned data sets convert applied pesticides to units of kilograms by dividing pounds of active ingredient applied by 2.20562. The use_no
variable uniquely identifies pesticide product uses.
Outliers
The outlier
column, in the case when the record has been flagged as an outlier and kg_chm_used
has been replaced with a calculated maximum rate of application, lists the raw value of recorded kilograms of applied chemical. This column is only included in the data frame returned from pull_clean_pur()
if sum_application = FALSE
. The algorithm used to identify and correct economically unfeasible and therefore erroneously high records of application was developed based on methods used by Gunier et al. (2001).
For each active ingredient and each year, a calculated maximum rate of application was calculated as mean pounds per acre plus two standard deviations. Rates were calculated from pounds applied by dividing the raw lbs_chm_used
column by either the acre_treated
column as is for unit_treated == "A"
(acres), or by the acre_treated
column multiplied by $\scriptsize2.29568\times10^{-5}$ for unit_treated == "S"
(square feet).
If the pounds per acre applied on a particular day exceeded the calculated maximum rate of application for the relevant year and active ingredient, the recorded rate was replaced with the calculated maximum rate. This calculated maximum rate of pounds per acre was then converted back to pounds of chemical applied by multiplying the rate by the number of acres treated, before being converted to kilograms of chemical applied. The outlier
column lists the raw value of kilograms of chemical applied. If the pounds per acre applied on a particular day did not exceed the calculated maximum rate of application for the relevant year and active ingredient, the outlier
column is set to NA
.
Table format
By default, application records are organized by active ingredient, section, township, date, and application method. You can change the format of the cleaned data with additional arguments to pull_clean_pur
.
The chemicals
argument uses the find_chemical_code
function to filter by the chemname
column. You can filter by listing search terms in the chemicals
argument:
nevada_sulfur <- pull_clean_pur(years = 2000, counties = "nevada", chemicals = "sulfur") nevada_sulfur <- nevada_sulfur[1:2,] save(nevada_sulfur, file = "data/nevada_sulfur.RData")
load("vignettes/data/nevada_sulfur.RData")
head(nevada_sulfur, 2) unique(nevada_sulfur$chemname)
There are several records for active ingredients matching the search term sulfur
for 2003 that were applied across California in 2000 (find_chemical_codes(2000, "sulfur")
); however, only two of these were applied in Nevada county in 2000. If you're using the chemicals
argument to filter a PUR data set, you may want to do additional filtering (for example, if you were only interested "sulfur" active ingredients and not "lime-sulfur").
Application method (aerial_ground
), which indicates if pesticides were applied aerially (A
), by ground (B
), or with another method (O
), can be retained or discarded with the aerial_ground
argument set to TRUE
or FALSE
. This choice becomes more important when summing applied active ingredients and calculating exposure; both situations are discussed further below.
By default, cleaned PUR data sets often have many records per day. There are a few different options for how to sum application so that there is one record per active ingredient or chemical class, PLS unit (section or township), and potentially per method of application (aerial, ground, or other). The sum_application
, unit
, sum
, aerial_ground
, and chemical_class
arguments are all relevant here.
When sum_application
is set to TRUE
, the unit
argument indicates what Public Land Survey (PLS) unit you would like to sum application by ("section"
, the default, or "township"
). If aerial_ground
is set to TRUE
, records will be summed by each method of application as well. The sum
argument can be set to "all"
, the default, or "chemical_class"
. "all"
indicates that you would like to sum all active ingredients present in the data set (which could be filtered using the chemicals
argument) to give a daily value of kilograms of pesticides applied per PLS unit. For example, we can pull data for 2010 in Tulare county, summed by active ingredient and township:
tulare <- pull_clean_pur(2010, "tulare", sum_application = TRUE, unit = "township", aerial_ground = FALSE) tulare <- tulare %>% arrange(township) %>% slice(1:3) save(tulare, file = "data/tulare.RData")
load("vignettes/data/tulare.RData")
tulare %>% arrange(township) %>% slice(1:3)
There is one record per active ingredient per township per day. Alternatively, sum = "chemical_class"
indicates that you would like to group certain active ingredients together to give a daily value for each chemical class that those active ingredients fall into. If sum = "chemical_class"
, the active ingredients that belong to each class are specified with the chemical_class
argument, which takes a data frame with three columns: chem_code
, chem_name
, and chemical_class
. For example, a data frame input to the chemical_class
argument might look like this:
chemical_class_df <- rbind(find_chemical_codes(2000, "methylene"), find_chemical_codes(2000, "aldehyde")) %>% rename(chemical_class = chemical) head(chemical_class_df, 2) tail(chemical_class_df, 2)
You may need to do some additional filtering if the results returned by find_chemical_codes
(which relies on grep
for pattern matching) do not give the appropriate active ingredients for your chemical class of interest. In the case of the chemical_class
data frame above, there would be at most three summed values per day and per PLS unit, one for all active ingredients falling under the class "methylene", one for all active ingredients falling under the class "aldehyde", and one for "other" with all active ingredients that don't fit into the two specified classes.
fresno_classes <- pull_clean_pur(2008, "fresno", sum_application = TRUE, unit = "section", sum = "chemical_class", chemical_class = chemical_class_df, aerial_ground = FALSE) fresno_classes <- rbind(fresno_classes[1:3, ], filter(fresno_classes, (chemical_class == "methylene") | (chemical_class == "aldehyde" & section == "M13S21E36" & date == lubridate::ymd("2008-01-12")))) save(fresno_classes, file = "data/fresno_classes.RData")
load("vignettes/data/fresno_classes.RData")
head(fresno_classes, 3)
unique(fresno_classes$chemical_class)
By default, pull_clean_pur
will print a downloading progress bar for each year of data that you are pulling. You can turn this off by setting the quiet
argument to TRUE
.
If you've already pulled data using pull_raw_pur
, you can clean the data without having to pull it again by inputting the raw data frame to the raw_pur_df
argument. In this case, the years
and counties
arguments should list years and counties present in the raw data frame.
fresno_clean <- pull_clean_pur(2004, "fresno", raw_pur_df = fresno_raw)
If you happen to get an error after a call to either pull_raw_pur
or pull_clean_pur
(because you choose to stop the function while it's running, or if the FTP site is down, for example), check your working directory. You'll probably want to change it back from a temporary directory.
The pull
functions in this package use caching to prevent multiple downloads of the same data from occurring in the current R session. Downloaded PUR data sets, PUR Product Tables, and CA county SpatialPolygonsDataFrame objects (pulled using pull_raw_pur
or pull_clean_pur
, find_product_name
or pull_product_table
, and pull_spdf
, respectively) are saved in a temporary environment that's deleted at the end of the current R session. pull
functions will first check the temporary environment for the requested data, and will only download it from CDPR's FTP server if it has not already been downloaded and temporarily saved in the current session.
The temporary environment is called purexposure_package_env
, and data are saved as objects in a list called pur_lst
. Raw PUR data sets are saved in the format "[year]_[pur county code]" (e.g., "2000_10"), product tabes are saved by year (e.g., "2000"), and SpatialPolygonsDataFrame objects are saved by county name (e.g., "Fresno").
calculate_exposure
to applied pesticidesThe calculate_exposure
function calculates exposure to applied pesticides at a particular location for a given buffer extending from the location, time period, and group of active ingredients. The required arguments are clean_pur_df
, a data frame returned from pull_clean_pur
, location
, which can be an address or a coordinate pair, and radius, which gives the radius of the buffer extending from the location in meters.
Exposure is calculated by summing the amount of applied pesticides in the fraction of sections or townships that are overlapped by the buffer described by radius
. For example, if a radius covers 50% of section A, 25% of section B, 25% of section C, and 30% of section D, the pesticides applied in those section are multiplied by 0.5, 0.25, 0.25, and 0.3, respectively. The time period over which application is summed is determined by the time_period
or start_date
and end_date
arguments. If all three arguments are NULL
, application is summed over the date range of the clean_pur_df
data set. time_period
takes strings of units of time ("2 weeks"
or "6 months"
, for example) and calculates exposure separately using that unit of time as a break point. The first time period will always begin on the first day of the year. If the clean_pur_df
data set has data from February through August, for example, and time_period
is set to "6 months"
, exposure will be calculated separately for January through June and July through December. Alternatively, you can specify a single period of time with start_date
and end_date
.
If the input clean_pur_df
data frame has a chemical_class
column and the chemicals
argument in calculate_exposure
is set to "chemical_class"
, exposure will be calculated for each unique value of the chemical_class
column.
For example, this call calculates exposure in 2015 in a 1,500 m radius buffer extending from Monroe Elementary School for all active ingredients present in applied pesticides:
monroe <- pull_clean_pur(2015, "fresno") %>% calculate_exposure(location = "11842 South Chestnut Ave., Fresno, CA", radius = 1500) monroe$clean_pur_df <- monroe$clean_pur_df[1:2,] monroe$county_plot <- "null" save(monroe, file = "data/monroe.RData") monroe_plot <- df_plot(monroe$buffer_plot_df)
calculate_exposure
returns a list with five elements:
load("vignettes/data/monroe.RData")
names(monroe)
The first is the exposure
data frame. There will be one row per exposure value ($\frac{kg}{m^2}$):
monroe$exposure
The cumulative estimated exposure calculated for the year 2015 for a 1,500 m radius buffer extending from Monroe Elementary is about 0.0046 $\frac{kg}{m^2}$.
The second is a meta_data
data frame, with one row per section or township that intersects with the specified buffer. The values in the meta_data
data frame can be used to re-calculate exposure. The pls
column will give information at the township level if the clean_pur_df
was summed by township; otherwise, it will give information at the section level. The chemicals
column specifies either "all" or a chemical class, and percent
gives the percent intersection of each PLS unit with the buffer. kg
gives the kilograms of pesticides applied in that PLS unit for the given active ingredients and time period, and kg_intersection
gives the values of kg
multiplied by percent
, or the amount of pesticides applied within the specified buffer. start_date
and end_date
give time periods over which application was summed. The values of aerial_ground
will be non-missing if the aerial_ground
argument is set to TRUE
. The none_recorded
column is TRUE
if there were not records of application for the specified PLS unit, active ingredients, and time period. Other columns are location
, radius
and area
(giving the area of the specified buffer in $m^2$).
monroe$meta_data %>% slice(1:3)
To make the process of calculating exposure more clear, we can re-calculate exposure by multiplying percent
by kg
to get kg_intersection
for each section, then summing across all sections and dividing by the buffer area:
monroe$meta_data %>% mutate(kg_intersection = percent*kg) %>% group_by(chemicals, start_date, end_date, aerial_ground) %>% summarise(sum = sum(kg_intersection), area = first(area)) %>% mutate(exposure = sum/area) %>% select(exposure, 1:4)
The buffer_plot_df
element of the calculate_exposure
list is a data frame with spatial plotting data for the buffer and overlapping PLS units. Using df_plot
, we can get a rough idea of the area for which exposure was calculated. The plot_exposure
function, explained further below, returns a similar plot with more information.
df_plot(monroe$buffer_plot_df)
knitr::include_graphics("vignettes/figures/monroe_plot.png")
The county_plot
element shows the specified buffer in the context of the entire county:
monroe$county_plot
knitr::include_graphics("vignettes/figures/monroe_county_plot.png")
And the clean_pur_df
element is the cleaned PUR data set returned from pull_clean_pur
that was input to the clean_pur_df
argument:
monroe$clean_pur_df %>% head(2)
If we wanted to calculate exposure for the same location in four month increments, we can specify time_period = "4 months"
.
monroe2 <- pull_clean_pur(2015, "fresno") %>% calculate_exposure(location = "11842 South Chestnut Ave., Fresno, CA", radius = 1500, time_period = "4 months") monroe2$meta_data <- "null" monroe2$buffer_plot_df <- "null" monroe2$county_plot <- "null" monroe2$clean_pur_df <- "null" save(monroe2, file = "data/monroe2.RData")
load("vignettes/data/monroe2.RData")
monroe2$exposure
In this case, there is one exposure value calculated for each four-month increment in the year 2015. There can similarly be different exposure values calculated for different values of chemical_class
(if the clean_pur_df
had a chemical_class
column and chemicals
was set to "chemical_class"
) and aerial_ground
(if the clean_pur_df
had an aerial_ground
column and aerial_ground
was set to TRUE
).
plot_*
functions: visualize application of and exposure to applied pesticidesPlot exposure for a single location
The plot_exposure
function returns a list with three elements: maps
, pls_data
, and exposure
. pls_data
and exposure
data frames are similar to the meta_data
and exposure
data frames returned by calculate_exposure
. For each exposure value that is calculated, there is a separate map, PLS data frame (with one row per section or township), and exposure data frame (with one row).
The first argument of plot_exposure is a list returned by calculate_exposure
.
plot_monroe <- plot_exposure(monroe) plot_monroe$maps <- "null" plot_monroe$pls_data <- "null" plot_monroe$exposure <- "null" save(plot_monroe, file = "data/plot_monroe.RData")
load("vignettes/data/plot_monroe.RData")
names(plot_monroe)
If we input the monroe
exposure list, the pls_data
and exposure
data frames will be identical to the meta_data
and exposure
data frames returned from calculate_exposure
. The maps
element gives a plot of the location, specified buffer, and PLS units colored according to the amount of pesticide applied.
There are a few different ways to visualize the same data on a maps
plot. The default of the color_by
argument is "amount"
, specifying a scale legend, and that of the buffer_or_county
argument is "county"
, indicating that colors should be scaled according to the range of application for the same time period and active ingredients for the entire county:
plot_monroe$maps
knitr::include_graphics("vignettes/figures/m_map1.png")
Alternatively, pesticide application can be colored according to the percentile of county-level application it falls into (color_by = "percentile"
). The number of cut-points can be specified with the percentile
argument. The default is c(0.25, 0.5, 0.75)
:
plot_exposure(monroe, color_by = "percentile")$maps
knitr::include_graphics("vignettes/figures/m_map2.png")
If color_by = "percentile"
, a fourth list element is returned called "cutoff_values"
, a data frame with two columns (percentile
and kg
) that gives the cutoff value of kilograms of pesticide applied for each percentile.
plot_m_percentile <- plot_exposure(monroe, color_by = "percentile") plot_m_percentile$maps <- "null" plot_m_percentile$pls_data <- "null" plot_m_percentile$exposure <- "null" save(plot_m_percentile, file = "data/plot_m_percentile.RData")
load("vignettes/data/plot_m_percentile.RData")
plot_m_percentile$cutoff_values
In this case, sections with $\leq$ 1,352.539 kilograms of pesticides applied in the portion that intersects with the specified buffer will fall under the lowest percentile category ("<= 25th percentile"
).
By default, fill colors of pesticide application are scaled according to the range of application across the entire county for the appropriate time period and active ingredients. If you'd like fill colors of pesticide application to be scaled according to range of application over the PLS units plotted, you can specify buffer_or_county
to be "buffer"
:
plot_exposure(monroe, buffer_or_county = "buffer")$maps
knitr::include_graphics("vignettes/figures/m_map3.png")
There are a few other arguments that can change the appearance of the returned plot in other ways. fill
takes any palette from the colormap
package. alpha
sets the transparency of fill colors, and is set by default to 0.7
. You can also include section or township labels with pls_labels
set to TRUE
. pls_labels_size
controls the size of these labels (the default is 4
). For example:
plot_exposure(monroe, fill = "density", alpha = 0.5, pls_labels = TRUE, pls_labels_size = 3.5)$maps
knitr::include_graphics("vignettes/figures/m_map4.png")
The plots above visualize a single exposure value. When you specify in your call to calculate_exposure
that you would like to calculate multiple exposure values for a single location (for example, in four-month increments over 2015), the exposure
list element has three rows:
monroe2$exposure
In this case, separate list elements are returned from plot_exposure
for each exposure value. If we input the monroe2
list into plot_exposure
, one map, PLS data frame, and exposure row is output in the form of a separate list element for each exposure value. maps[[1]]
visualizes exposure for January 1st through April 30th, maps[[2]]
for May 1st through August 31st, and maps[[3]]
for September 1st through December 31st.
(Note: Code for the multiplot
function below can be found here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/)
plot_monroe2 <- plot_exposure(monroe2, color_by = "percentile") maps <- plot_monroe2$maps multiplot(maps[[1]], maps[[2]], maps[[3]], cols = 2)
knitr::include_graphics("vignettes/figures/m_plots.png")
Other plot_exposure
list elements are organized in the same way. For example, the first plot_monroe2$maps
element corresponds to plot_monroe2$pls_data[[1]]
, plot_monroe2$exposure[[1]]
, and plot_monroe2$cutoff_values[[1]]
.
Plot exposure for multiple locations in a county
After calling write_exposure
(explained in more detail further below), the plot_locations_exposure
function could be useful to visually compare exposure at multiple locations. The first argument is an exposure_df
data frame returned from write_exposure
. If there are more than one exposure values calculated per location, the data frame should be manually filtered first. The exposure_df
data frame in this example was written to the "fresno_example" directory in the write_exposure
example below.
exposure_df <- readRDS("~/Documents/fresno_example/exposure_df.rds") nrow(exposure_df) exposure_df[1:3,]
load("vignettes/data/exposure_df.RData") nrow(exposure_df) exposure_df[1:3,]
Since the data frame has four exposure values calculated per location, we need to manually filter the data frame and choose the combination of chemicals, date range, aerial/ground application, and radius to plot:
exposure_df %>% dplyr::filter(radius == 3000 & end_date %in% c(lubridate::ymd(c("2000-04-01", "2005-06-01", "2005-08-01")))) %>% plot_locations_exposure()
knitr::include_graphics("vignettes/figures/locations_plot.png")
Each colored point represents the exposure at that location. You can control the background county's PLS units with section_township
(left as its default, "section"
, in this example), and the fill color of points with the fill
option, which takes any one of the palettes from the colormap
package. The alpha
argument can change the transparency of these points.
Contextualize application
The plot_county_locations
function plots a county's location in California. The first argument, counties_or_df
, can be a vector of county names or codes or a data frame with a county_cd
, county_name
, and/or county_code
column (returned from pull_raw_pur
or pull_clean_pur
).
plot_county_locations(fresno_clean)
knitr::include_graphics("vignettes/figures/ca_plot.png")
You can change the county color with fill_color
, and change transparency with alpha
. If you'd like to plot multiple counties, you can specify that they be on the same plot or on different plots with the seprarate_plots
argument. If separate_plots = TRUE
, plot_county_locations
will return a list of plots.
Plot application by county and PLS unit
The plot_county_application
function returns a list. The first element (map
) is a plot of application by PLS units in county. data
is a data frame with two columns (pls
, giving either section or township IDs, and kg
, giving the plotted values of pesticides). cutoff_values
is a data frame with percentile
and kg
values, and is only returned if application is plotted by percentile (specified with color_by = "percentile"
).
The first argument of plot_county_application
should be a data frame returned by pull_clean_pur
.
fresno <- plot_county_application(fresno_clean) fresno$data <- fresno$data[1:2, ] fresno$map <- "null" save(fresno, file = "data/fresno.RData")
load("vignettes/data/fresno.RData")
names(fresno)
fresno$map
knitr::include_graphics("vignettes/figures/fresno_application.png")
head(fresno$data, 2)
There are additional arguments you can pass to plot_county_application
to change the appearance of the returned map. You can choose to map by township by specifying pls = "township"
, choose to scale colors by amount or percentile with color_by
, specify percentile
break points, choose the date range that you would like to plot with start_date
and end_date
, and choose certain chemicals
to plot. You can also change the color palette with fill
(you can specify any palette from the colormap
package), crop
the plot to PLS units with recorded application, and change the transparency of colors with alpha
.
plot_county_application(fresno_clean, pls = "township", color_by = "percentile", percentile = c(0.2, 0.4, 0.6, 0.8), fill = "magma", crop = TRUE)$map
knitr::include_graphics("vignettes/figures/fresno_plot2.png")
If your clean_pur_df
data frame has data for more than one county, you specify which county
you would like to plot data for.
Plot a time series of application
The plot_application_timeseries
function takes a data frame returned from pull_clean_pur
as its first argument, and returns a time series plot of application summed by day. This plot shows total amounts of pesticides applied per day in Fresno County in 2004:
plot_application_timeseries(fresno_clean)
knitr::include_graphics("vignettes/figures/timeseries1.png")
By setting facet
to be TRUE
, the plot will be faceted by unique values of either chemname
or chemical_class
, depending on which is present in the clean_pur_df
data frame.
fresno_clean %>% filter(chemname %in% toupper(c("sulfur", "sodium chlorate"))) %>% plot_application_timeseries(facet = TRUE)
knitr::include_graphics("vignettes/figures/timeseries2.png")
You can also change the axes scaling with axes
, which passes its value to the ggplot2 function facet_wrap
.
fresno_clean %>% filter(chemname %in% toupper(c("sulfur", "sodium chlorate"))) %>% plot_application_timeseries(facet = TRUE, axes = "free_y")
knitr::include_graphics("vignettes/figures/timeseries3.png")
plot_application_timeseries
returns a ggplot2
plot, which means you can add additional ggplot2
arguments:
plot_application_timeseries(fresno_clean) + ggplot2::theme_classic()
knitr::include_graphics("vignettes/figures/timeseries4.png")
write_exposure
for multiple locations in the same countyYou can use write_exposure
to calculate exposure for multiple locations in the same county at once, and write out files to a specified directory. For example, if your directory is a path to "fresno_example", there will be an exposure_df
data frame written with exposure values for each unique combination of location, date range, chemicals, aerial/ground application, and radius, a meta_data
list of data frames with application data for each PLS unit corresponding to each exposure value, and an "exposure_plots"
subdirectory with plot_exposure
plots for each exposure value:
knitr::include_graphics("vignettes/figures/ex_directory.png")
The function takes a data frame returned from pull_clean_pur
as its first argument, which should have data for the county of your vector of locations. The locations_dates_df
argument takes a data frame specifying start and end dates for each location. This is meant to accomodate situations when you might want to calculate exposure for different time periods for each location, with differing start dates. In the example below, we would be calculating exposure for 3 and 6 month periods for three locations with three different start dates:
locations_dates <- data.frame(location = rep(c("3333 American Ave., Fresno, CA 93725", "1616 South Fruit Ave., Fresno, CA 93706", "295 West Saginaw Ave., Caruthers, CA 93609"), each = 2), start_date = c("2000-01-01", "2000-01-01", "2005-03-01", "2005-03-01", "2005-05-01", "2005-05-01"), end_date = c("2000-04-01", "2000-07-01", "2005-06-01", "2005-09-01", "2005-08-01", "2005-11-01")) locations_dates
The radii
argument takes a vector of numeric radius values in meters. If we wanted to calculate exposure for 1,500 and 3,000 meter buffers extending from each location, we would set radii = c(1500, 3000)
. The chemicals
argument, similarly to calculate_exposure
, can be set to either "all"
or "chemical_class"
(depending on if the data frame pulled using pull_clean_pur
has a chemical_class
column that you would like to group by in exposure calculations), and aerial_ground
can be set to TRUE
or FALSE
.
The directory
argument specifies the directory where you would like the data to be saved. If the folder specified doesn't exist yet, the function will create it. The write_plots
argument is TRUE/FALSE
specifying if you would like to save plots returned from plot_exposure
for each exposure value. write_exposure
takes additional arguments that are passed on to plot_exposure
to control the appearance of these plots.
This example will save four exposure values for each location (3 and 6 month time periods and two radii: 1,500 and 3,000 m). First, we can use find_location_county
to check that all of the addresses are in the same county:
test <- find_location_county(as.character(unique(locations_dates$location))) unique(test$county)
"Fresno"
And pull PUR data for Fresno county and the relevant years:
fresno_data <- pull_clean_pur(c(2000, 2005), "fresno")
A message will be printed out for each of the 12 unique exposure calculations. You can turn this off by setting verbose
to FALSE
. This call will save two .rds files to "~/Documents/fresno_example"
.
write_exposure(clean_pur_df = fresno_data, locations_dates_df = locations_dates, radii = c(1500, 3000), directory = "~/Documents/fresno_example") exposure_df <- readRDS("~/Documents/fresno_example/exposure_df.rds") save(exposure_df, file = "vignettes/data/exposure_df.RData") meta_data <- readRDS("~/Documents/fresno_example/meta_data.rds") save(meta_data, file = "vignettes/data/meta_data.RData")
exposure_df.rds
.This is a data frame that lists exposure values for each unique combination of location, date range, chemicals, aerial/ground application, and radius specified in the function's arguments. There are 12 rows, with four exposure values calculated for each of the three locations:
exposure_df <- readRDS("~/Documents/fresno_example/exposure_df.rds") nrow(exposure_df) head(exposure_df)
load("vignettes/data/exposure_df.RData") nrow(exposure_df) head(exposure_df)
meta_data.rds
. This is a list of data frames. Each data frame has one row per PLS unit (section or township) that is intersected by the relevant buffer. Each list element corresponds to a row of the exposure_df
data frame. For example, the first element of the meta_data
list corresponds to the first row of the exposure_df
data frame.
meta_data <- readRDS("~/Documents/fresno_example/meta_data.rds") length(meta_data) head(meta_data[[1]]) file.copy(from = "~/Documents/fresno_example/exposure_plots/01_exposure_plot.png", to = "vignettes/figures/01_exposure_plot.png")
load("vignettes/data/meta_data.RData") length(meta_data) head(meta_data[[1]])
Additionally, if write_plots
is left as its default (TRUE
), plots returned from plot_exposure
will be saved in a subdirectory called exposure_plots
. Each plot is saved as "#_exposure_plot.png".
list.files("~/Documents/fresno_example/exposure_plots")
c("01_exposure_plot.png", "02_exposure_plot.png", "03_exposure_plot.png", "04_exposure_plot.png", "05_exposure_plot.png", "06_exposure_plot.png", "07_exposure_plot.png", "08_exposure_plot.png", "09_exposure_plot.png", "10_exposure_plot.png", "11_exposure_plot.png", "12_exposure_plot.png")
The # corresponds to the row of the exposure_df
data frame and the element of the meta_data
list. For example, 01_exposure_plot.png
corresponds to exposure_df[1,]
and meta_data[[1]]
:
knitr::include_graphics("vignettes/figures/01_exposure_plot.png")
If color_by
is set to "percentile"
, there is an additional subdirectory ("exposure_plots/cutoff_values") where data frames with cutoff values for each percentile are saved. These files are similarly named ("#_cutoff_values.rds") with file numbers corresponding to #_exposure_plot.png file numbers.
You can browse the source code for this package here: https://github.com/leighseverson/purexposure.
If you find a bug, please report it here: https://github.com/leighseverson/purexposure/issues.
Gunier, Robert B., Martha E. Harnly, Peggy Reynolds, Andrew Hertz, and Julie Von Behren. 2001. “Agricultural pesticide use of California: Pesticide prioritization, use densities, and population distributions for a childhood cancer study.” Environmental Health Perspectives 109 (10): 1071–8.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.