auk_filter: Filter the EBD using AWK

Description

Convert the filters defined in an auk_ebd object into an AWK script and run this script to produce a filtered eBird Basic Dataset (EBD). The auk_ebd object should first be created with auk_ebd(), and filters can then be defined using the other functions in this package, e.g. auk_species() or auk_country(). Note that this function typically takes at least a couple of hours to run on the full EBD.

Usage

auk_filter(x, file, file_sampling, awk_file, sep, filter_sampling, execute,
  overwrite)

Arguments

x

auk_ebd object; reference to EBD file created by auk_ebd() with filters defined.

file

character; output file.

file_sampling

character; optional output file for EBD sampling data.

awk_file

character; output file to optionally save the awk script to.

sep

character; the input field separator. The EBD is tab separated by default. Must be a single character, and cannot be a space, since spaces appear in many of the fields.

filter_sampling

logical; whether the EBD sampling event data should also be filtered.

execute

logical; whether to execute the awk script, or output it to a file for manual execution. If this flag is FALSE, awk_file must be provided.

overwrite

logical; whether to overwrite the output file if it already exists.

Details

If an EBD sampling file is provided in the auk_ebd object, this function will filter both the EBD and the sampling data using the same set of filters. This ensures that the files are in sync, i.e. that they contain data on the same set of checklists.

The AWK script can be saved for future reference by providing an output filename to awk_file. By default, this function generates and runs the AWK script. Setting execute = FALSE generates the script without running it; in this case, file is ignored and awk_file must be specified.
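
The script-only workflow described above might look like the following minimal sketch. It assumes the auk package is installed and reuses the sample file from the Examples section. The variable name awk_script and the temporary file location are illustrative choices, not part of the package.

```r
library(auk)

# Define filters on the sample EBD extract that ships with auk.
filters <- system.file("extdata/ebd-sample.txt", package = "auk") %>%
  auk_ebd() %>%
  auk_species(species = "Blue Jay") %>%
  auk_complete()

# Illustrative location for the saved script (an assumption, any path works).
awk_script <- tempfile(fileext = ".awk")

# Generate the AWK script only; with execute = FALSE no data are filtered.
auk_filter(filters, awk_file = awk_script, execute = FALSE)

# Inspect the first few lines of the generated script.
head(readLines(awk_script))
```

The saved script can then be reviewed or run manually from the command line against the EBD file.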

Calling this function requires that the command line utility AWK is installed. Linux and Mac machines should have AWK by default; Windows users will likely need to install Cygwin.
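
Before starting a long filtering run, it may be worth confirming that AWK can be found. The sketch below uses only base R and checks the system PATH. It is an assumption that this matches how AWK is located on a given machine: a Cygwin installation on Windows, for example, may not be on the PATH, so a negative result here is a hint rather than proof that AWK is missing.

```r
# Look up `awk` on the PATH; Sys.which() returns "" when nothing is found.
awk_path <- unname(Sys.which("awk"))
awk_available <- nzchar(awk_path)

if (awk_available) {
  message("AWK found at: ", awk_path)
} else {
  message("AWK not found on the PATH; install it before calling auk_filter().")
}
```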

Value

An auk_ebd object with the output files set. If execute = FALSE, then the path to the AWK script is returned instead.

Examples

library(auk)

# define filters
filters <- system.file("extdata/ebd-sample.txt", package = "auk") %>%
  auk_ebd() %>%
  auk_species(species = c("Gray Jay", "Blue Jay")) %>%
  auk_country(country = c("US", "Canada")) %>%
  auk_extent(extent = c(-100, 37, -80, 52)) %>%
  auk_date(date = c("2012-01-01", "2012-12-31")) %>%
  auk_time(time = c("06:00", "09:00")) %>%
  auk_duration(duration = c(0, 60)) %>%
  auk_complete()
## Not run: 
# temp output file
out_file <- tempfile()
auk_filter(filters, file = out_file) %>%
  read_ebd() %>%
  str()
# clean
unlink(out_file)

## End(Not run)

mstrimas/auk documentation built on May 20, 2019, 5:26 p.m.