checkOccurrence: Quality control for input occurrence data


Description

Given a dataframe occurrence of occurrence records and rasters giving the evidence consensus (consensus) and admin units (admin), run a series of quality control checks on the data. The checks are:

  1. The column names of occurrence are checked to ensure that the following required fields are present and of the correct class:

    • 'UniqueID' (integer): the record's unique identifying number

    • 'Admin' (integer, either 0, 1, 2, 3, or -999): the admin level for polygons or -999 for points

    • 'Year' (integer): the year of the occurrence record

    • 'Longitude' (numeric): the longitude either of a point or a polygon centroid

    • 'Latitude' (numeric): the latitude either of a point or a polygon centroid

    • 'Area' (numeric): the area in square decimal degrees of the polygon, or NA for a pixel

    An error is thrown if any of these are not present or are of the wrong class.

  2. Polygon records are checked for duplicated polygon/year combinations and an error is thrown if any are present.

  3. Polygons with an 'Area' value greater than area_threshold square decimal degrees are removed.

  4. Records with coordinates which fall in cells with an evidence consensus value below consensus_threshold are removed.

  5. Coordinates are checked to make sure they don't fall outside the mask (i.e. in consensus cells which are NA). If any do fall in NA, they are moved to the nearest non-NA cell, provided it is less than max_distance decimal degrees away. If this isn't possible the record is removed.

    The GAUL code for any polygons

If the occurrence data fail any check outright, the function stops with an error message. Otherwise the cleaned and corrected occurrence dataframe is returned. consensus is assumed to conform to the mastergrid template and to have a projected WGS84 coordinate reference. It may be worth checking this with checkRasters first.
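
As a rough illustration of check 1, the required fields and classes can be tested in base R before calling checkOccurrence. This is a minimal sketch, not the package's implementation: check_columns, required and the toy dataframe are hypothetical names and data introduced here for illustration only.

```r
# Hypothetical helper, not part of seegSDM: it only checks column names,
# column classes and the permitted 'Admin' codes listed above.
required <- c(UniqueID = "integer", Admin = "integer", Year = "integer",
              Longitude = "numeric", Latitude = "numeric", Area = "numeric")

check_columns <- function(occurrence, required) {
  # every required column must be present
  missing_cols <- setdiff(names(required), names(occurrence))
  if (length(missing_cols) > 0) {
    stop("missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # compare the first class of each column with the expected class
  actual <- vapply(occurrence[names(required)],
                   function(x) class(x)[1], character(1))
  wrong <- names(required)[actual != required]
  if (length(wrong) > 0) {
    stop("columns of the wrong class: ", paste(wrong, collapse = ", "))
  }
  # 'Admin' must be an admin level (0 to 3) or -999 for points
  if (any(!occurrence$Admin %in% c(0L, 1L, 2L, 3L, -999L))) {
    stop("Admin must be 0, 1, 2, 3 or -999")
  }
  invisible(TRUE)
}

# made-up records: one point and two polygons
toy <- data.frame(
  UniqueID  = 1:3,
  Admin     = c(-999L, 1L, 2L),
  Year      = c(2001L, 2005L, 2005L),
  Longitude = c(10.5, 11.2, 12.0),
  Latitude  = c(-3.1, -2.8, -1.5),
  Area      = c(NA, 0.4, 2.3)
)

check_columns(toy, required)
```

Note that a column read in as numeric (for example a 'Year' of 2005 rather than 2005L) would fail this sketch, so converting with as.integer beforehand may be needed.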

Usage

checkOccurrence(occurrence, consensus, admin, consensus_threshold = -25,
                             area_threshold = 1, max_distance = 0.05,
                             spatial = TRUE, verbose = TRUE) 

Arguments

occurrence

A dataframe containing details of occurrence records

consensus

A RasterLayer object with projected WGS 1984 coordinate reference giving the regional evidence consensus scores (between -100 and 100). This should conform to the mastergrid template

admin

A RasterBrick or RasterStack object with four layers giving the GAUL codes for different admin levels. The layers must be in the order 0, 1, 2, 3, as they are in the example object admin.

consensus_threshold

A minimum evidence consensus value. Occurrence records will be removed if they fall in a cell of consensus with a value less than this

area_threshold

The maximum area (in square decimal degrees) of polygons for inclusion. Polygon records with a greater 'Area' value will be removed.

max_distance

The maximum distance (in decimal degrees) to search for a non-NA cell when attempting to reassign coordinates which fall in NA cells of consensus

spatial

Whether to return a SpatialPointsDataFrame (if TRUE) or just a dataframe (if FALSE)

verbose

Whether to print information on non-critical issues to the console.
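
The effect of the two thresholds (checks 3 and 4 in the Description) can be sketched on a toy dataframe. This is an illustration under stated assumptions rather than the function's own code: consensus_value is a hypothetical column standing in for the value of consensus extracted at each record's coordinates, and the records are made up.

```r
# Made-up records; 'consensus_value' is a hypothetical stand-in for the
# evidence consensus extracted from the raster at each record's coordinates.
toy <- data.frame(
  UniqueID        = 1:5,
  Admin           = c(-999L, 1L, 2L, -999L, 0L),
  Area            = c(NA, 0.4, 2.3, NA, 0.9),
  consensus_value = c(60, -10, 80, -40, 35)
)

# default values from the Usage section
consensus_threshold <- -25
area_threshold <- 1

# polygons (Admin other than -999) whose area exceeds area_threshold
too_big <- toy$Admin != -999L & !is.na(toy$Area) & toy$Area > area_threshold

# records in cells with an evidence consensus below consensus_threshold
low_consensus <- !is.na(toy$consensus_value) &
  toy$consensus_value < consensus_threshold

# keep only records that pass both filters
kept <- toy[!too_big & !low_consensus, ]
kept$UniqueID
```

Here record 3 is dropped as an over-large polygon and record 4 for low evidence consensus, leaving records 1, 2 and 5.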

Value

Provided none of the checks were failed outright, either a dataframe (if spatial = FALSE) or a SpatialPointsDataFrame (if spatial = TRUE) containing the cleaned and checked occurrence data.

See Also

checkRasters, nearestLand
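
The coordinate reassignment described in check 5 amounts to a nearest-neighbour search limited by max_distance. The sketch below shows the idea in base R only. snap_to_nearest is a hypothetical helper written for this illustration, it is not nearestLand and does not reproduce its interface, and the cell centres are made up.

```r
# Hypothetical helper (not seegSDM's nearestLand): returns the coordinates of
# the nearest candidate cell centre if it is less than max_distance away,
# otherwise a pair of NAs.
snap_to_nearest <- function(point, cell_centres, max_distance) {
  # straight-line distance in decimal degrees from the point to each centre
  d <- sqrt((cell_centres[, 1] - point[1])^2 +
            (cell_centres[, 2] - point[2])^2)
  nearest <- which.min(d)
  if (length(nearest) == 0 || d[nearest] >= max_distance) {
    return(c(NA_real_, NA_real_))
  }
  unname(cell_centres[nearest, ])
}

# centres of non-NA cells as (longitude, latitude), made up for illustration
centres <- cbind(c(10.00, 10.05, 10.10), c(-3.00, -3.00, -3.05))

# close to a valid cell: the point is moved to that cell's centre
snap_to_nearest(c(10.02, -3.01), centres, max_distance = 0.05)

# no valid cell within max_distance: NAs, so the record would be removed
snap_to_nearest(c(11.00, -3.00), centres, max_distance = 0.05)
```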

Examples

# load occurrence data, consensus and admin objects
data(occurrence)
data(consensus)
data(admin)

head(occurrence)
class(occurrence)

# run checkOccurrence
occ <- checkOccurrence(occurrence, consensus, admin)

head(occ)
class(occ)

SEEG-Oxford/seegSDM documentation built on May 9, 2019, 11:08 a.m.