pull_clean_pur: Pull cleaned PUR data by counties, years, and active...


View source: R/03-pull.R

Description

pull_clean_pur returns a data frame of cleaned Pesticide Use Report data filtered by counties, years, and active ingredients. Active ingredients or chemical classes present in applied pesticides can be summed by either Public Land Survey (PLS) section or township.

Usage

pull_clean_pur(
  years = "all",
  counties = "all",
  chemicals = "all",
  sum_application = FALSE,
  unit = "section",
  sum = "all",
  chemical_class = NULL,
  aerial_ground = TRUE,
  verbose = TRUE,
  quiet = FALSE,
  ...
)

Arguments

years

A four-digit numeric year or vector of years, starting with 1990. Indicates the years for which you would like to pull PUR data sets. years = "all" will pull data from 1990 through the most recent year of available data.

counties

A vector of character strings giving either a county name, a two-digit PUR county code, or a five-digit FIPS county code for each county. Not case sensitive. California county names, county codes as they appear in PUR data sets, and FIPS county codes can be found in the county_codes data set available with this package. For example, to return data for Alameda county, enter either "alameda", "01", or "06001" for the counties argument. counties = "all" will return data for all 58 California counties (this will take a while to run).
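As a rough illustration of how the three accepted formats relate, the sketch below resolves a mixed vector of inputs against a small hand-built lookup table. The table toy_codes and the helper resolve_county are hypothetical stand-ins written for this example, not part of the package; the full set of values lives in the county_codes data set.

```r
# Hypothetical two-row stand-in for the county_codes data set.
toy_codes <- data.frame(
  county_name = c("Alameda", "Fresno"),
  pur_code    = c("01", "10"),
  fips_code   = c("06001", "06019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the row matching a county name, PUR code,
# or FIPS code, ignoring case and surrounding whitespace.
resolve_county <- function(x, codes = toy_codes) {
  x <- tolower(trimws(x))
  hit <- tolower(codes$county_name) == x |
    codes$pur_code == x |
    codes$fips_code == x
  codes[hit, , drop = FALSE]
}

resolved <- do.call(rbind, lapply(c("ALAMEDA", "10", "06001"), resolve_county))
print(resolved$county_name)
```

All three spellings of Alameda map to the same row, which is why any of them can be mixed freely in the counties vector.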

chemicals

A string or vector of strings giving search terms of chemicals to match with active ingredients present in pesticides applied in the given years. The default value is "all", which returns records for all active ingredients applied in a given year. See the CDPR's Summary of PUR Data document here: http://www.cdpr.ca.gov/docs/pur/pur08rep/chmrpt08.pdf for comprehensive classifications of active ingredients.

sum_application

TRUE / FALSE indicating if you would like to sum the amounts of applied active ingredients by day, the geographic unit given in unit, and by either active ingredients or chemical class (indicated by sum and chemical_class). The default value is FALSE.

unit

A character string giving either "section" or "township". Specifies whether applications of each active ingredient should be summed by California section (the default) or by township. Only used if sum_application is TRUE.

sum

A character string giving either "all" (the default) or "chemical_class". If sum_application = TRUE, sum indicates whether you would like to sum across all active ingredients, giving an estimation of the total pesticides applied in a given section or township ("all"), or by a chemical class specified in a data frame given in the argument chemical_class.
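To make the effect of sum_application, unit, and sum = "all" concrete, here is a minimal base R sketch of the same kind of grouping on invented records. It is not the package's implementation; the toy values and the assumption that section strings look like "M15S20E03" are for illustration only.

```r
# Toy records shaped like a few columns of the unsummed output (values invented).
toy <- data.frame(
  kg_chm_used = c(1.5, 2.0, 0.5, 4.0),
  section     = c("M15S20E03", "M15S20E04", "M15S20E03", "M16S21E10"),
  township    = c("M15S20E", "M15S20E", "M15S20E", "M16S21E"),
  date        = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-01")),
  stringsAsFactors = FALSE
)

# Sum across all active ingredients by township and day,
# analogous to unit = "township" with sum = "all".
by_township <- aggregate(kg_chm_used ~ township + date, data = toy, FUN = sum)
print(by_township)
```

Replacing township with section in the formula gives the section-level analogue.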

chemical_class

A data frame with only three columns: chem_code, chemname, and chemical_class. chem_code should have integer values giving PUR chemical codes, and chemname should have character strings with corresponding PUR chemical names (these can be searched for using the find_chemical_codes function or with the chemical_list data set included with this package). The chemical_class column should have character strings indicating the chemical class corresponding to each chem_code. The chemical_class for a group of active ingredients should be decided upon by the user. Only used if sum = "chemical_class". See the CDPR's Summary of PUR Data document here: http://www.cdpr.ca.gov/docs/pur/pur08rep/chmrpt08.pdf for comprehensive classifications of active ingredients.
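A data frame of the required shape can be built by hand, as in the sketch below. The chemical codes, names, and class labels here are placeholders invented for illustration; real codes and names should be looked up with find_chemical_codes or the chemical_list data set.

```r
# Placeholder codes and names, for illustration only.
chemical_class_df <- data.frame(
  chem_code      = c(101L, 102L, 201L),
  chemname       = c("EXAMPLE CHEMICAL A", "EXAMPLE CHEMICAL B", "EXAMPLE CHEMICAL C"),
  chemical_class = c("class_one", "class_one", "class_two"),
  stringsAsFactors = FALSE
)

# Check the shape described above: exactly these three columns,
# with an integer code and character name and class.
stopifnot(
  identical(names(chemical_class_df), c("chem_code", "chemname", "chemical_class")),
  is.integer(chemical_class_df$chem_code),
  is.character(chemical_class_df$chemname),
  is.character(chemical_class_df$chemical_class)
)
str(chemical_class_df)
```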

aerial_ground

TRUE / FALSE indicating if you would like to retain aerial/ground application data ("A" = aerial, "G" = ground, and "O" = other). The default is TRUE.

verbose

TRUE / FALSE indicating whether you would like a single message printed indicating which counties and years you are pulling data for. The default value is TRUE.

quiet

TRUE / FALSE indicating whether you would like a message and progress bar printed for each year of PUR data that is downloaded. The default value is FALSE.

...

Used internally.

Details

PUR data sets are pulled by county from the CDPR's FTP server. Downloaded PUR data sets are saved in a temporary environment, which is deleted at the end of the current R session.

Value

A data frame:

chem_code

An integer value giving the PUR chemical code for the active ingredient applied. Not included if sum_application = TRUE and sum = "chemical_class".

chemname

A character string giving PUR chemical active ingredient names. Unique values of chemname are matched with terms provided in the chemicals argument. Not included if sum_application = TRUE and sum = "chemical_class".

chemical_class

If sum_application = TRUE and sum = "chemical_class", this column will give values of the chemical_class column in the input chemical_class data frame. If there are active ingredients pulled based on the chemicals argument that are not present in the chemical_class data frame, these chemicals will be summed under the class "other".
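The fallback to "other" can be pictured as a left join followed by filling in missing classes. The base R sketch below shows that idea on invented data; it is an assumption about the general approach, not the package's actual code, and all codes and amounts are made up.

```r
# Toy applications: chem_code 999 has no entry in the class table.
toy <- data.frame(
  chem_code   = c(101L, 102L, 999L),
  kg_chm_used = c(1.0, 2.0, 4.0),
  township    = "M15S20E",
  date        = as.Date("2000-01-01"),
  stringsAsFactors = FALSE
)
classes <- data.frame(
  chem_code      = c(101L, 102L),
  chemical_class = c("class_one", "class_one"),
  stringsAsFactors = FALSE
)

# Left join keeps the unmatched chemical; label its missing class "other".
joined <- merge(toy, classes, by = "chem_code", all.x = TRUE)
joined$chemical_class[is.na(joined$chemical_class)] <- "other"

# Sum by class, township, and day.
by_class <- aggregate(kg_chm_used ~ chemical_class + township + date,
                      data = joined, FUN = sum)
print(by_class)
```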

kg_chm_used

A numeric value giving the amount of the active ingredient applied (kilograms).

section

A string nine characters long indicating the section of application. PLS sections are uniquely identified by a combination of base line meridian (S, M, or H), township (01-48), township direction (N or S), range (01-47), range direction (E or W) and section number (01-36). This column is not included if sum_application = TRUE and unit = "township".

township

A string seven characters long indicating the township of application. PLS townships are uniquely identified by a combination of base line meridian (S, M, or H), township (01-48), township direction (N or S), range (01-47), and range direction (E or W).
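If the components are concatenated in the order listed above (an assumption; check a few values in your own output before relying on it), a section string can be split with substr. The helper parse_section and the example value "M15S20E03" are hypothetical.

```r
# Hypothetical helper: split nine-character section strings into parts,
# assuming the order meridian, township, township direction, range,
# range direction, section number.
parse_section <- function(x) {
  stopifnot(is.character(x), all(nchar(x) == 9))
  data.frame(
    meridian     = substr(x, 1, 1),
    township_no  = substr(x, 2, 3),
    township_dir = substr(x, 4, 4),
    range_no     = substr(x, 5, 6),
    range_dir    = substr(x, 7, 7),
    section_no   = substr(x, 8, 9),
    township     = substr(x, 1, 7),  # first seven characters
    stringsAsFactors = FALSE
  )
}

parts <- parse_section("M15S20E03")
print(parts)
```

Under the same assumption, the seven-character township value is the section string with the trailing section number removed.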

county_name

A character string giving the county name where application took place.

pur_code

A string two characters long giving the PUR county code where application took place.

fips_code

A string five characters long giving the FIPS county code where application took place (for example, "06001" for Alameda county).

date

The date of application (yyyy-mm-dd).

aerial_ground

A character giving the application method. "A" = aerial, "G" = ground, and "O" = other. Not included if aerial_ground = FALSE.

use_no

A character string identifying a unique application of an active ingredient across years. This value is a combination of the raw PUR use_no column and the year of application. Not included if sum_application = TRUE.

outlier

If the amount listed in kg_chm_used has been corrected for large amounts entered in error, this column lists the raw value of recorded kilograms of applied chemicals. Otherwise NA. The algorithm for identifying and replacing outliers was developed based on methods used by Gunier et al. (2001). Please see the package vignette for more detail regarding these methods. Not included if sum_application = TRUE.

prodno

Integer. The California Registration Number for the applied pesticide (will be repeated for different active ingredients present in the product). You can match product registration numbers with product names, which can be pulled using the pull_product_table function. This column is not returned if sum_application = TRUE.

Note

Examples

library(magrittr)


df <- pull_clean_pur(years = 2000:2001,
                     counties = c("06001", "29", "riverside"),
                     chemicals = "methylene",
                     aerial_ground = TRUE)

# filter to active ingredients present in particular products
prod_nos <- find_product_name(2003, "insecticide") %>%
    dplyr::select(prodno) %>%
    tibble_to_vector()

df2 <- pull_clean_pur(2003, "10") %>%
    dplyr::filter(prodno %in% prod_nos)

# Sum application by active ingredients
df3 <- pull_clean_pur(years = 2009:2010,
                      counties = c("01", "29", "riverside"),
                      unit = "township",
                      sum_application = TRUE)

# Or by chemical classes
chemical_class_df <- rbind(find_chemical_codes(2000, "methylene"),
                           find_chemical_codes(2000, "aldehyde")) %>%
   dplyr::rename(chemical_class = chemical)

df4 <- pull_clean_pur(years = 1995,
                      counties = "fresno",
                      chemicals = chemical_class_df$chemname,
                      sum_application = TRUE,
                      sum = "chemical_class",
                      unit = "township",
                      chemical_class = chemical_class_df)

leighseverson/purexposure documentation built on Aug. 13, 2021, 6:34 p.m.