bvStep: Clarke and Ainsworth's BVSTEP routine

bvStep R Documentation

Description

The bvStep function performs Clarke and Ainsworth's (1993) "BVSTEP" routine, an algorithm that searches for the highest correlation (Mantel test) between the dissimilarities of a fixed and a variable multivariate dataset. The test is the same as that performed by the bioEnv function, but the routine searches the combinations more efficiently when the number of variables is large.

Usage

bvStep(
  fix.mat,
  var.mat,
  fix.dist.method = "bray",
  var.dist.method = "euclidean",
  scale.fix = FALSE,
  scale.var = TRUE,
  max.rho = 0.95,
  min.delta.rho = 0.001,
  random.selection = TRUE,
  prop.selected.var = 0.2,
  num.restarts = 10,
  var.always.include = NULL,
  var.exclude = NULL,
  output.best = 10
)

Arguments

fix.mat

The "fixed" matrix of community or environmental sample by variable values

var.mat

A "variable" matrix of community or environmental sample by variable values

fix.dist.method

The method for calculating dissimilarity indices between samples in the fixed matrix. Distance matrices are calculated with the vegdist function from the vegan package; see its documentation for available methods. Defaults to Bray-Curtis dissimilarity ("bray").

var.dist.method

The method for calculating dissimilarity indices between samples in the variable matrix. Defaults to Euclidean distance ("euclidean").

scale.fix

Logical. Should the fixed matrix be centered and scaled? Defaults to FALSE, which is recommended for biological data.

scale.var

Logical. Should the variable matrix be centered and scaled? Defaults to TRUE, which is recommended for environmental data to correct for differing units between variables.

max.rho

Numeric value between 0 and 1. The maximum Spearman rank correlation ("rho") at which the search stops. This is especially important in a "BIOBIO" or "ENVENV" type setup, where rho equals 1 with the full set of variables (see bioEnv for an explanation of these types of setups). Defaults to max.rho=0.95.

min.delta.rho

Numeric value. The minimum improvement in Spearman rank correlation ("rho") required for the search to continue. When this improvement is not achieved, bvStep terminates the search and returns the best variable combinations found. Defaults to min.delta.rho=0.001.

random.selection

Logical. When random.selection=TRUE (Default), the algorithm begins each restart with a random subset of variables from the variable dataset. When random.selection=FALSE, a single search is conducted starting with all variables.

prop.selected.var

Numeric. Value between 0 and 1 indicating the proportion of variables to include at each restart (Default: prop.selected.var=0.2).

num.restarts

Numeric. Number of restarts (Default: num.restarts=10).

var.always.include

Numeric vector. Column numbers from the variable dataset to include at each restart.

var.exclude

Numeric vector. Column numbers from the variable dataset to always exclude, both at each restart and during the search process.

output.best

Numeric value. Number of best combinations to return in the results object (Default=10).
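
To illustrate how the max.rho and min.delta.rho stopping rules interact, the following is a minimal base R sketch of a forward-only stepwise search. It is not the sinkr implementation: the helpers subset_rho and forward_search are hypothetical names, the variable matrix is always scaled and compared using Euclidean distance, and backward elimination and random restarts are left out for brevity.

```r
# Hypothetical helper (not part of sinkr): Spearman rho between the fixed
# distances and Euclidean distances computed from the selected, scaled columns.
subset_rho <- function(fix.d, var.mat, cols) {
  var.d <- dist(scale(var.mat[, cols, drop = FALSE]))
  cor(as.vector(fix.d), as.vector(var.d), method = "spearman")
}

# Simplified forward-only search (an assumption made for illustration).
forward_search <- function(fix.d, var.mat, max.rho = 0.95,
                           min.delta.rho = 0.001) {
  selected <- integer(0)
  best.rho <- -Inf
  repeat {
    candidates <- setdiff(seq_len(ncol(var.mat)), selected)
    if (length(candidates) == 0) break
    # rho obtained by adding each remaining variable to the current subset
    rhos <- vapply(
      candidates,
      function(j) subset_rho(fix.d, var.mat, c(selected, j)),
      numeric(1)
    )
    # stop when the best candidate improves rho by less than min.delta.rho
    if (max(rhos) - best.rho < min.delta.rho) break
    selected <- c(selected, candidates[which.max(rhos)])
    best.rho <- max(rhos)
    # stop once rho reaches max.rho
    if (best.rho >= max.rho) break
  }
  list(vars = selected, rho = best.rho)
}

# Simulated data: the fixed distances depend only on columns 1 and 2
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)
fix.d <- dist(scale(X[, 1:2]))
res <- forward_search(fix.d, X)
res
```

In an ENVENV-like setup such as this one, where the fixed distances are built from the same variables that are being searched, rho can reach 1, which is why a max.rho below 1 is useful as a stopping point.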

Details

The variable multivariate dataset has 2^n - 1 possible combinations to test, where n is the number of variables. Testing all combinations is thus computationally unrealistic when the number of variables is high (e.g. 20 variables yield >1e6 combinations). This is often the case in a BIOBIO type analysis, where the number of species combinations to search can be quite large (see bioEnv for an explanation of other types of analyses beyond the typical "BIOENV"). The Examples section shows a two-step search refinement for finding subsets of variables that best correlate with a fixed multivariate set.
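
As a quick check of how fast the search space grows, the count of non-empty variable subsets can be computed directly. The variable names here are illustrative only.

```r
# Number of non-empty subsets of n variables: 2^n - 1
n <- c(5, 10, 20, 30)
n.comb <- 2^n - 1
data.frame(n.vars = n, n.combinations = n.comb)
```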

References

Clarke, K. R. & Ainsworth, M. 1993. A method of linking multivariate community structure to environmental variables. Marine Ecology Progress Series, 92, 205-219.

Examples



library(sinkr) # provides bvStep
library(vegan)
data(varespec)
data(varechem)

# Example of a 2-round BIO-BIO search. Uses the most frequently included variables
# in the first round at the beginning of each restart in the second round
# first round
set.seed(1)
res.biobio1 <- bvStep(wisconsin(varespec), wisconsin(varespec), 
 fix.dist.method="bray", var.dist.method="bray",
 scale.fix=FALSE, scale.var=FALSE, 
 max.rho=0.95, min.delta.rho=0.001,
 random.selection=TRUE,
 prop.selected.var=0.3,
 num.restarts=50,
 output.best=10,
 var.always.include=NULL
)
res.biobio1 # Best rho equals 0.833 (10 of 44 variables)

#second round - always includes variables 23, 26, and 29 ("Cla.ran" "Cla.coc" "Cla.fim")
set.seed(1)
res.biobio2  <- bvStep(wisconsin(varespec), wisconsin(varespec), 
 fix.dist.method="bray", var.dist.method="bray",
 scale.fix=FALSE, scale.var=FALSE, 
 max.rho=0.95, min.delta.rho=0.001,
 random.selection=TRUE,
 prop.selected.var=0.3,
 num.restarts=50,
 output.best=10,
 var.always.include=c(23,26,29)
)
res.biobio2 # Best rho equals 0.895 (15 of 44 variables)

# A plot of best variables
MDS_res <- metaMDS(wisconsin(varespec), distance = "bray", k = 2, trymax = 50)
bio.keep <- as.numeric(unlist(strsplit(res.biobio2$order.by.best$var.incl[1], ",")))
bio.fit <- envfit(MDS_res, varespec[,bio.keep], perm=999)
bio.fit 

plot(MDS_res$points, t="n",xlab="NMDS1", ylab="NMDS2")
plot(bio.fit, col="gray50", cex=0.8, font=4) # display all fitted variables
text(MDS_res$points, as.character(1:length(MDS_res$points[,1])), cex=0.7)
mtext(paste("Stress =",round(MDS_res$stress, 2)), side=3, adj=1, line=0.5)

# Display only those with envfit p <= 0.1
plot(MDS_res$points, t="n",xlab="NMDS1", ylab="NMDS2")
plot(bio.fit, col="gray50", p.max=0.1, cex=0.8, font=4) # p.max=0.1
text(MDS_res$points, as.character(1:length(MDS_res$points[,1])), cex=0.7)
mtext(paste("Stress =",round(MDS_res$stress, 2)), side=3, adj=1, line=0.5)




marchtaylor/sinkr documentation built on July 4, 2022, 5:48 p.m.