deriveVars: Derive variables by transformation.

deriveVars                R Documentation

Description

deriveVars produces derived variables (DVs) from explanatory variables (EVs) by transformation, and returns them as a list of data frames. The available transformation types, described in Halvorsen et al. (2015), are: L, M, D, HF, HR, T (for continuous EVs), and B (for categorical EVs). For spline transformation types (HF, HR, T), a subset of possible DVs is pre-selected by the criteria described under Details.

Usage

deriveVars(
  data,
  transformtype = c("L", "M", "D", "HF", "HR", "T", "B"),
  allsplines = FALSE,
  algorithm = "maxent",
  write = FALSE,
  dir = NULL,
  quiet = FALSE
)

Arguments

data

Data frame containing the response variable in the first column and explanatory variables in subsequent columns. The response variable should represent either presence and background (coded as 1/NA) or presence and absence (coded as 1/0). The explanatory variable data should be complete (no NAs). See readData.
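As an illustration of the expected layout, the following minimal sketch builds a small presence/background data frame and checks the two requirements stated above. The object and column names (toy, RV, elev, soil) are hypothetical and are not part of the package.

```r
# Hypothetical toy data: response variable first, explanatory variables after it.
set.seed(1)
n <- 10
toy <- data.frame(
  RV   = c(rep(1, 4), rep(NA, 6)),   # presence (1) and background (NA)
  elev = runif(n, 0, 1000),          # continuous EV
  soil = factor(sample(c("clay", "sand"), n, replace = TRUE))  # categorical EV
)

# The response should contain only 1 and NA (or 1 and 0 for presence-absence).
stopifnot(all(toy[[1]] %in% c(1, NA)))

# The explanatory variables should be complete (no NAs).
stopifnot(all(complete.cases(toy[, -1])))
```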

transformtype

Specifies the types of transformations to be performed. Default is the full set of the following transformation types: L (linear), M (monotone), D (deviation), HF (forward hinge), HR (reverse hinge), T (threshold), and B (binary).

allsplines

Logical. If TRUE, keep all spline transformations created, rather than pre-selecting particular splines based on fraction of total variation explained. Default is FALSE.

algorithm

Character string matching either "maxent" or "LR", which determines the type of model used for spline pre-selection. Default is "maxent". See Details.

write

Logical. Write the transformation functions to .Rdata file? Default is FALSE.

dir

Directory for file writing if write = TRUE. Defaults to the working directory.

quiet

Logical. Suppress progress messages from spline pre-selection? Default is FALSE.

Details

The linear transformation "L" is a simple rescaling to the range [0, 1].
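A minimal sketch of such a rescaling follows. The helper name rescale01 and the example values are hypothetical, and the package's own implementation may differ in detail.

```r
# Rescale a numeric vector linearly so its minimum maps to 0 and its maximum to 1.
# (Hypothetical helper for illustration only.)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

elev <- c(120, 480, 300, 950)
elev_L <- rescale01(elev)
range(elev_L)  # the rescaled values span 0 to 1
```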

The monotone transformation "M" is a zero-skew transformation (Økland et al. 2001).

The deviation transformation "D" is performed around an optimum EV value that is found by looking at frequency of presence (see plotFOP). Three deviation transformations are created with different steepness and curvature around the optimum.

For spline transformations ("HF", "HR", and "T"), DVs are created around 20 different break points (knots) which span the range of the EV. Only DVs which satisfy all of the following criteria are retained:

  1. 3 <= knot <= 18 (DVs with knots at the extremes of the EV are never retained).

  2. Chi-square test of the single-variable model from the given DV compared to the null model gives a p-value < 0.05.

  3. The single-variable model from the given DV shows a local maximum in fraction of variation explained (D^2, sensu Guisan & Zimmermann, 2000) compared to DVs from the neighboring 4 knots.
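To make the spline types concrete, the sketch below builds one forward hinge, one reverse hinge, and one threshold DV around a single knot. It uses the conventional definitions of these feature types. The even knot spacing, the scaling to [0, 1], and all function and column names are assumptions for illustration, not taken from the MIAmaxent source.

```r
# Illustrative only: knot placement and scaling are assumptions.
ev <- seq(0, 100, by = 5)                         # toy explanatory variable
knots <- seq(min(ev), max(ev), length.out = 20)   # 20 evenly spaced knots

# Forward hinge: 0 up to the knot, then rising linearly to 1 at the EV maximum.
forward_hinge <- function(x, k, xmax) ifelse(x <= k, 0, (x - k) / (xmax - k))
# Reverse hinge: 1 at the EV minimum, falling linearly to 0 at the knot.
reverse_hinge <- function(x, k, xmin) ifelse(x >= k, 0, (k - x) / (k - xmin))
# Threshold: step from 0 to 1 at the knot.
threshold <- function(x, k) as.numeric(x >= k)

k10 <- knots[10]   # an interior knot
dv <- data.frame(
  ev   = ev,
  HF10 = forward_hinge(ev, k10, max(ev)),
  HR10 = reverse_hinge(ev, k10, min(ev)),
  T10  = threshold(ev, k10)
)
head(dv)
```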

The models used in this pre-selection procedure may be maxent models (algorithm="maxent") or standard logistic regression models (algorithm="LR").
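The three retention criteria can be sketched as a simple filter over per-knot results. The p-values and D^2 values below are made up for illustration, and the sketch assumes that "the neighboring 4 knots" means the two knots on either side. Neither the values nor the helper is_local_max come from the package.

```r
# Hypothetical per-knot results for one EV and one spline type.
knot <- 1:20
pval <- c(0.40, 0.30, 0.08, 0.03, 0.01, 0.02, 0.04, 0.06, 0.04, 0.02,
          0.001, 0.01, 0.03, 0.09, 0.20, 0.07, 0.03, 0.01, 0.005, 0.02)
d2   <- c(0.01, 0.02, 0.05, 0.09, 0.12, 0.10, 0.07, 0.06, 0.08, 0.11,
          0.15, 0.13, 0.09, 0.05, 0.04, 0.06, 0.10, 0.14, 0.16, 0.12)

# Criterion 3: D^2 exceeds that of the neighbors within `halfwidth` knots
# on either side (assumed interpretation of "neighboring 4 knots").
is_local_max <- function(v, i, halfwidth = 2) {
  nb <- setdiff(max(1, i - halfwidth):min(length(v), i + halfwidth), i)
  all(v[i] > v[nb])
}
local_max <- vapply(knot, function(i) is_local_max(d2, i), logical(1))

# Combine criteria 1 (knot range), 2 (significance), and 3 (local maximum).
keep <- knot >= 3 & knot <= 18 & pval < 0.05 & local_max
knot[keep]  # knots 5 and 11 pass all three criteria
```

In this toy example knot 19 is also a local maximum in D^2, but it is dropped by the first criterion.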

For categorical variables, 1 binary derived variable (type "B") is created for each category.
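A minimal sketch of this one-indicator-per-category encoding follows. The variable soil and the "_B" column naming are assumptions made for illustration.

```r
# Toy categorical EV with three categories.
soil <- factor(c("clay", "sand", "peat", "sand", "clay"))

# One 0/1 indicator column per category.
binary_dvs <- sapply(levels(soil), function(lv) as.numeric(soil == lv))
colnames(binary_dvs) <- paste0("soil_B", levels(soil))  # naming is an assumption
binary_dvs <- as.data.frame(binary_dvs)
binary_dvs
```

Each row has exactly one indicator equal to 1, so the binary DVs jointly carry the same information as the original factor.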

The maximum entropy algorithm (algorithm = "maxent"), which is implemented in MIAmaxent as an infinitely-weighted logistic regression with presences added to the background, is conventionally used with presence-only occurrence data. In contrast, standard logistic regression (algorithm = "LR") is conventionally used with presence-absence occurrence data.

Explanatory variables should be uniquely named. Underscores ('_') and colons (':') are reserved to denote derived variables and interaction terms respectively, and deriveVars will replace these — along with other special characters — with periods ('.').
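The effect of this renaming can be previewed with base R's make.names, which replaces invalid characters with periods and, with allow_ = FALSE, does the same for underscores. Whether deriveVars uses exactly this call internally is an assumption, and the example names are hypothetical.

```r
# Hypothetical EV names containing reserved and special characters.
ev_names <- c("mean_temp", "soil:type", "pH (H2O)")

# Replace underscores, colons, and other special characters with periods.
clean <- make.names(ev_names, unique = TRUE, allow_ = FALSE)
clean
```

Renaming the columns yourself before calling deriveVars keeps the mapping between original and cleaned names under your control.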

Value

List of 2:

  1. dvdata: List containing first the response variable, followed by data frames of derived variables produced for each explanatory variable. This item is recommended as input for dvdata in selectDVforEV.

  2. transformations: List containing first the response variable, followed by all the transformation functions used to produce the derived variables.

References

Guisan, A., & Zimmermann, N. E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling, 135(2-3), 147-186.

Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.

Økland, R. H., Økland, T., & Rydgren, K. (2001). Vegetation-environment relationships of boreal spruce swamp forests in Østmarka Nature Reserve, SE Norway. Sommerfeltia, 29, 1-190.

Examples

toydata_dvs <- deriveVars(toydata_sp1po, c("L", "M", "D", "HF", "HR", "T", "B"))
str(toydata_dvs$dvdata)
summary(toydata_dvs$transformations)

## Not run: 
# From vignette:
grasslandDVs <- deriveVars(grasslandPO,
                           transformtype = c("L","M","D","HF","HR","T","B"))
summary(grasslandDVs$dvdata)
head(summary(grasslandDVs$transformations))
length(grasslandDVs$transformations)
plot(grasslandPO$terslpdg, grasslandDVs$dvdata$terslpdg$terslpdg_D2, pch=20,
     ylab="terslpdg_D2")
plot(grasslandPO$terslpdg, grasslandDVs$dvdata$terslpdg$terslpdg_M, pch=20,
     ylab="terslpdg_M")

## End(Not run)


julienvollering/MIAmaxent documentation built on July 6, 2023, 11:22 p.m.