gdm.formatsitepair: Converts common biodiversity (response) / environmental...


View source: R/GDM_Table_Funcs.R

Description

This function takes input biodiversity and environmental data in various formats and builds a site-pair formatted table required for fitting Generalized Dissimilarity Models. NOTE: Sample site x-y coordinates MUST be present in either the biodiversity or environmental data.

The input biodiversity data can be in one of the following commonly used formats (note that "species" can be replaced with almost any relevant response variable):

1: site-species matrix

2: x, y, species list

3: site-site distance (dissimilarity) matrix
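As a rough illustration of the three layouts, the base-R sketch below builds one small made-up data set in each format. The column names (site, x, y, spA, spB, spC) and all values are hypothetical, and the Jaccard distance from dist(method = "binary") is only a stand-in for whichever dissimilarity metric would be used in practice.

```r
# Format 1: site-species matrix (one row per site, one column per species).
# All names and values here are invented for illustration.
siteSpp <- data.frame(
  site = c("s1", "s2", "s3"),
  x    = c(10.5, 11.0, 12.2),
  y    = c(45.1, 45.6, 46.0),
  spA  = c(1, 0, 1),
  spB  = c(1, 1, 0),
  spC  = c(0, 1, 1)
)

# Format 2: x, y, species list (one row per species record).
sppCols <- c("spA", "spB", "spC")
recs <- lapply(sppCols, function(sp) {
  present <- siteSpp[[sp]] > 0
  data.frame(x = siteSpp$x[present], y = siteSpp$y[present],
             species = sp, stringsAsFactors = FALSE)
})
sppList <- do.call(rbind, recs)

# Format 3: site-site dissimilarity matrix.
# dist(method = "binary") gives the Jaccard distance on presence-absence data.
occ <- as.matrix(siteSpp[, sppCols])
rownames(occ) <- siteSpp$site
dissim <- as.matrix(dist(occ, method = "binary"))
```

The same three sites appear in every object, so the sketch also shows how a species list or a distance matrix can be derived from a site-species matrix.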

Environmental (predictor) data can be either: (i) a site-predictor matrix with a column for each predictor variable; or (ii) a raster stack, with one raster for each predictor variable. NOTE: Optionally, additional site-site dissimilarity predictor matrices (e.g., describing pairwise site dissimilarities of another taxon to be used as a predictor) can be supplied separately using the "distPreds" argument (see below).

Usage

formatgdmData(bioData, bioFormat, dist = "bray", abundance = F, siteColumn = NULL, XColumn, YColumn, sppColumn = NULL, abundColumn = NULL, sppFilter = 0, predData, distPreds = NULL, weightType = "def", custWeightVect = NULL)

Arguments

bioData

The input biodiversity (response) data table, in one of the three formats mentioned above (See Details).

bioFormat

An integer code specifying the format of bioData. Acceptable values are 1, 2, or 3 (See Details).

dist

Default = "bray". A character code indicating the metric to quantify pairwise site distances / dissimilarities. Uses the vegdist function from the vegan package to calculate dissimilarity and therefore accepts any method available from vegdist.
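For readers unfamiliar with the default metric, the following base-R sketch writes out the Bray-Curtis dissimilarity for a single pair of sites. The helper name brayCurtis and the abundance values are hypothetical; vegdist with method = "bray" computes the same quantity for every pair of rows in a community matrix.

```r
# Bray-Curtis dissimilarity between two sites, written out in base R.
# brayCurtis() is an illustrative helper, not a function from any package.
brayCurtis <- function(a, b) {
  sum(abs(a - b)) / sum(a + b)
}

siteA <- c(spA = 1, spB = 0, spC = 3)  # hypothetical abundances
siteB <- c(spA = 0, spB = 2, spC = 3)

bc <- brayCurtis(siteA, siteB)  # (1 + 2 + 0) / (1 + 2 + 6) = 1/3
```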

abundance

Default = FALSE. Indicates whether the biodiversity data are abundance (TRUE) or presence-absence (FALSE).

siteColumn

The name of the column in either the biodiversity or environmental data table containing site codes/names. If a site column is provided in both the biodiversity and environmental data, the column name must be identical.

XColumn

The name of the column containing x-coordinates of sample sites. X-coordinates can be provided in either the biodiversity or environmental data tables, but MUST be in at least one of them. If an x-coordinate column is provided in both the biodiversity and environmental data, the column name must be identical.

YColumn

The name of the column containing y-coordinates of sample sites. Y-coordinates can be provided in either the biodiversity or environmental data tables, but MUST be in at least one of them. If a y-coordinate column is provided in both the biodiversity and environmental data, the column name must be identical.

sppColumn

The name of the column containing unique identifiers for each species. Only used if bioFormat = 2 (x, y, species list).

abundColumn

The name of the column containing species abundance or presence-absence data. Only used if bioFormat = 2 (i.e., x, y, species list).

sppFilter

Default = 0. To account for limited sampling effort at some sites, sppFilter removes all sites with richness less than the specified value. For example, if sppFilter = 5, all sites with fewer than 5 recorded species will be removed.
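The filtering rule described above can be sketched in base R as follows. The matrix and threshold are made up, and this only mirrors the documented behavior (drop sites whose richness is below the threshold); it is not the package's internal code.

```r
# Hypothetical presence-absence matrix: rows are sites, columns are species.
occ <- matrix(
  c(1, 1, 1, 0,
    1, 0, 0, 0,
    1, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("spA", "spB", "spC", "spD"))
)

sppFilter <- 2                  # minimum richness a site needs to be kept
richness  <- rowSums(occ > 0)   # number of species recorded at each site
kept      <- occ[richness >= sppFilter, , drop = FALSE]
```

Here site s2 has a richness of 1 and is dropped, while s1 and s3 (richness 3 each) are retained.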

predData

The environmental predictor data. Accepts either a site-predictor matrix or a raster stack.

distPreds

A list of distance matrices to be used as predictors, either in combination with or as a substitute for predData. For example, pairwise dissimilarity for one taxon (e.g., ferns) can be used as a predictor for another group (e.g., trees).

weightType

Default = "equal". Defines the model weighting for sites. Must be one of (1) "equal" (all weights set to 1.0), (2) "richness" (sum of presence records), or (3) "cust" (user defined). If "cust", the user must provide a vector of site weights of suitable length (number of site pairs).

custWeightVect

A vector of user-defined site weights. Required when weightType = "cust". Ignored otherwise.

Details

bioData and bioFormat: The function accepts biodiversity data in the following commonly used formats:

bioData = site-species matrix; bioFormat = 1: assumes that the response data are provided with a site ID column (specified by siteColumn) and, optionally, two columns for the x & y coordinates. All remaining columns contain the biodiversity data, with a column for each entity (most commonly species). If a raster stack is provided for the environmental data (predData), x-y coordinates MUST be provided in bioData. The x-y coordinates will be intersected with the raster stack, and if the number of unique cells intersected by the points is less than the number of unique site IDs (e.g., multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Model fitting will therefore be sensitive to raster cell size. If the environmental data are in tabular format, they should have the same number of sites (i.e., same number of rows) as bioData. The x-y coordinate and site ID columns must have the same names in bioData and predData.
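To make the cell-aggregation idea concrete, here is a minimal base-R sketch that assigns sites to cells of a regular grid and merges sites sharing a cell. It does not use a raster stack: the grid origin at (0, 0), the cell size, the cell labels, and the rule for combining records (a species counts as present if any site in the cell recorded it) are all assumptions made for illustration, since the text above does not state how the function combines records.

```r
# Hypothetical sites; s1 and s2 are close enough to share a grid cell.
sites <- data.frame(
  site = c("s1", "s2", "s3"),
  x    = c(0.2, 0.7, 1.4),
  y    = c(0.3, 0.6, 0.5),
  spA  = c(1, 0, 1),
  spB  = c(0, 1, 0)
)

cellSize <- 1                               # assumed raster resolution
cellCol  <- floor(sites$x / cellSize)       # column index, grid origin at (0, 0)
cellRow  <- floor(sites$y / cellSize)       # row index
cell     <- paste(cellCol, cellRow, sep = "_")  # label used as the new site ID

# Merge sites per cell; max() keeps a species as present if any site in the
# cell recorded it (an assumption, not necessarily the package's rule).
byCell <- aggregate(sites[, c("spA", "spB")], by = list(cell = cell), FUN = max)
```

With a cell size of 1, three sites collapse to two cells; halving the cell size would keep all three sites separate, which is why results depend on raster resolution.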

bioData = x, y, species list; bioFormat = 2: assumes a table of 3 or 4 columns, the first two being the x & y coordinates of species records, the third (sppColumn) being the name of the taxon observed at that location, and optionally a fourth column indicating the observed abundance. If an abundance column is not provided, presence-only data are assumed. If a raster stack is provided for the environmental data (predData), the x-y coordinates will be intersected with the raster stack, and if the number of unique cells intersected by the points is less than the number of unique site IDs (e.g., multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Model fitting will therefore be sensitive to raster cell size.

bioData = site-site distance (dissimilarity) matrix; bioFormat = 3: is used when a site-site distance (dissimilarity) matrix has already been created. Only the lower triangle of the matrix is needed to create the site-pair output table; the function automatically removes the upper triangle if present. This is the only case in which the environmental data MAY NOT be a raster stack.
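The reason only the lower triangle is needed can be seen in a short base-R sketch: each unordered pair of sites appears exactly once below the diagonal. The matrix values and the output column names (distance, site1, site2) are hypothetical and are not the column names of the table the function returns.

```r
# Hypothetical symmetric dissimilarity matrix for three sites.
siteNames <- c("s1", "s2", "s3")
dissim <- matrix(
  c(0.0, 0.4, 0.9,
    0.4, 0.0, 0.5,
    0.9, 0.5, 0.0),
  nrow = 3, byrow = TRUE, dimnames = list(siteNames, siteNames)
)

# Row/column indices of the lower triangle: one entry per unordered site pair.
idx <- which(lower.tri(dissim), arr.ind = TRUE)

# One row per site pair; the column names here are illustrative only.
sitePairs <- data.frame(
  distance = dissim[idx],
  site1    = siteNames[idx[, "col"]],
  site2    = siteNames[idx[, "row"]],
  stringsAsFactors = FALSE
)
```

For n sites this yields n(n - 1)/2 rows, so three sites give three site pairs.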

NOTE: The function assumes that the x-y coordinates and (when used) raster stack are in the same coordinate system. No checking is performed to confirm this is the case.

Value

A site-pair formatted table containing the response (biological distance or dissimilarity), predictors, and weights as required for fitting Generalized Dissimilarity Models.

Author(s)

Matthew Lisk <mlisk@al.umces.edu>, Matthew Fitzpatrick <mfitzpatrick@al.umces.edu>, Glenn Manion, Simon Ferrier


Gdm01 documentation built on May 2, 2019, 4:54 p.m.