In jotegui/rgeospatialquality: Wrapper for the Geospatial Data Quality REST API

add_flags

R Documentation

Calculate flags for a set of records and add them to the provided data frame

Description

add_flags calls the POST method of the API to retrieve the flags for a set of records. NOTE: the API currently imposes a hard limit of 1000 records per request, to avoid malfunctions caused by limitations in some third-party libraries. This function will not work if a data.frame with more than 1000 rows is provided.
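
Because of the 1000-record limit, a larger data frame has to be split before it is passed to add_flags. The following is a minimal base-R sketch of one way to do that. The helper name split_into_batches is hypothetical and not part of the package, and the mock data only stands in for real records.

```r
# Hypothetical helper (not part of rgeospatialquality): split a data frame
# into consecutive batches of at most `size` rows.
split_into_batches <- function(df, size = 1000) {
  if (nrow(df) == 0) return(list())
  batch_id <- ceiling(seq_len(nrow(df)) / size)
  unname(split(df, batch_id))
}

# Mock data standing in for a real set of records
records <- data.frame(id = seq_len(2500))
batches <- split_into_batches(records)
batch_sizes <- vapply(batches, nrow, integer(1))
print(batch_sizes)

# Each batch could then be sent to the API separately, for example:
# flagged <- lapply(batches, add_flags, show_summary = FALSE)
```

How the per-batch results are best recombined depends on the structure of the returned flags column, so that step is left out of this sketch.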

Usage

add_flags(indf, guess_fields = FALSE, show_summary = TRUE, quiet = FALSE,
  ...)

## Default S3 method:
add_flags(indf, guess_fields = FALSE, show_summary = TRUE,
  quiet = FALSE, ...)

## S3 method for class 'data.frame'
add_flags(indf, guess_fields = FALSE,
  show_summary = TRUE, quiet = FALSE, ...)

Arguments

`indf`	Required. Properly formatted data frame containing a row per record
`guess_fields`	Optional. Whether to try to guess key fields when names don't follow the DarwinCore standard (see Details). Defaults to FALSE, in which case the function does not try to guess field names and throws a warning for each missing field. Set to TRUE to try to guess field names; the function will then stop if no match can be found for any of the key fields
`show_summary`	Optional. Show a summary of the quality flags after the process has finished. Defaults to TRUE
`quiet`	Optional. Don't show any logging message at all. Defaults to FALSE
`...`	Any extra parameters for `httr` `POST`

Details

Internally, the function takes the provided data.frame, transforms it to JSON and makes a POST request to the underlying API with the JSON object in the body of the request. To work properly and give comprehensive results, the data.frame should have the four key fields this API works with. See flags for details. If a field is missing, the function will show a warning. If the field names in your data.frame don't conform to the DarwinCore standard, add_flags can try to map them to the standard names when the parameter guess_fields is set to TRUE. In that case, if no match is found, the function will stop and give instructions on how to resume. If a match is found, the original name in the data.frame will not change.
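
Before calling the API, it can help to check which key fields a data frame already has. The sketch below assumes the four key fields are the DarwinCore terms decimalLatitude, decimalLongitude, countryCode and scientificName. That list is an assumption made for illustration and should be confirmed on the flags help page. missing_key_fields is a hypothetical helper, not a package function.

```r
# ASSUMPTION: names of the four key fields; confirm them in ?flags
key_fields <- c("decimalLatitude", "decimalLongitude",
                "countryCode", "scientificName")

# Hypothetical helper: report which key fields a data frame lacks
missing_key_fields <- function(df, fields = key_fields) {
  setdiff(fields, names(df))
}

# Mock record whose coordinate columns use non-standard names
d <- data.frame(lat = 40.4, lon = -3.7, countryCode = "ES",
                scientificName = "Apis mellifera")
missing_before <- missing_key_fields(d)
print(missing_before)

# Renaming by hand is one alternative to guess_fields = TRUE
names(d)[names(d) == "lat"] <- "decimalLatitude"
names(d)[names(d) == "lon"] <- "decimalLongitude"
missing_after <- missing_key_fields(d)
```

Unlike guess_fields = TRUE, which leaves the original names untouched, renaming by hand changes the column names in the data frame itself.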

After finishing, the function returns the provided data.frame with a new column, flags, which holds for each record a list of the geospatial quality flags. If show_summary is TRUE (default value), it also shows a summary of the results, indicating how many records showed different types of issues.
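
A count of failed flags per type can also be computed by hand from the flags column. The sketch below uses a mock result and assumes that flags is a nested data frame with one logical column per flag, as the access pattern dd$flags$coordinatesInsideCountry in the package examples suggests. The flag name hasCoordinates is an assumption made for illustration.

```r
# Mock result standing in for the output of add_flags.
# ASSUMPTION: flags is a nested data frame of logical columns;
# hasCoordinates is an assumed flag name.
dd <- data.frame(id = 1:4)
dd$flags <- data.frame(
  coordinatesInsideCountry = c(TRUE, FALSE, NA, FALSE),
  hasCoordinates = c(TRUE, TRUE, FALSE, TRUE)
)

# Number of records failing each flag, skipping missing (NA) values
failed <- vapply(dd$flags, function(x) sum(!x, na.rm = TRUE), integer(1))
print(failed)

# NA-safe selection of the records that fail one flag
outside <- dd[which(!dd$flags$coordinatesInsideCountry), , drop = FALSE]
```

Wrapping the condition in which() drops records whose flag is NA, which a plain logical subset would return as rows of NA values.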

Value

The provided data frame with the quality flags added in a new column, flags

Examples

## Not run: 
# Using the rgbif package
if (requireNamespace("rgbif", quietly=TRUE)) {
 library("rgbif")

 # Prepare data
 d <- occ_data(scientificName="Apis mellifera", limit=50, minimal=FALSE)
 d <- d$data

 # Format data.frame
 d <- format_gq(d, source="rgbif")

 # Execute the call to the API, showing output and
 # logging information, and store the results
 dd <- add_flags(d)

 # Alternatively, instead of formatting with 'format_gq', make the function
 # guess the correct name of the fields.
 dd <- add_flags(d, guess_fields=TRUE)

 # Execute the call without showing summary output, but
 # showing logging information
 dd <- add_flags(d, show_summary=FALSE)

 # Execute the call without showing any logging at all
 # (except errors, obviously)
 dd <- add_flags(d, quiet=TRUE)

 # Data quality output will be stored in a new field called flags
 names(dd$flags)

 # You can check records with certain flags as usual
 # See records with coordinates-country mismatch
 dd[dd$flags$coordinatesInsideCountry == FALSE,]
}

## End(Not run)

jotegui/rgeospatialquality documentation built on May 16, 2022, 5:26 p.m.