humboldt.doitall: 'Do it all'. Performs most analyses in Humboldt and outputs...
In jasonleebrown/humboldt: Analysis of Species in Environmental Space

humboldt.doitall

R Documentation

'Do it all'. Performs most analyses in Humboldt and outputs the results as Jason likes!

Description

'Do it all'. Performs most analyses in Humboldt and outputs the results as Jason likes!

Usage

humboldt.doitall(
  inname = "DoItAll",
  env1,
  env2,
  sp1,
  sp2,
  rarefy.dist = 0,
  rarefy.units = "dd",
  env.reso,
  reduce.env = 2,
  reductype = "PCA",
  non.analogous.environments = "NO",
  nae.window = 5,
  env.trim = T,
  env.trim.type = "MCP",
  trim.mask1,
  trim.mask2,
  trim.buffer.sp1 = 200,
  trim.buffer.sp2 = 200,
  color.ramp = 1,
  correct.env = T,
  pcx = 1,
  pcy = 2,
  col.env = e.var,
  e.var,
  R = 100,
  kern.smooth = 1,
  e.reps = 100,
  b.reps = 100,
  b.force.equal.sample = F,
  nae = "YES",
  thresh.espace.z = 1e-04,
  p.overlap = T,
  p.boxplot = F,
  p.scatter = F,
  run.silent = F,
  ncores = 1
)

Arguments

`inname`	string for labelling results output from function
`env1`	environmental variables for all sites of the study area 1 (env1). Column names should be x,y,X1,X2,...,Xn; with X1-Xn being any string label. If env1=env2, input the same file twice.
`env2`	environmental variables for all sites of the study area 2 (env2). Column names should be x,y,X1,X2,...,Xn; with X1-Xn being any string label. If env1=env2, input the same file twice.
`sp1`	occurrence sites for the species/population 1 at study area 1 (env1). Column names should be 'sp', 'x','y'
`sp2`	occurrence sites for the species/population 2 at study area 2 (env2). Column names should be 'sp', 'x', 'y'
`rarefy.dist`	remove occurrences closer than a minimum distance to each other (this uses the humboldt.occ.rarefy function). Values need to be in km (recommended) or decimal degrees. See the associated parameter rarefy.units. Note: rarefy.dist=0 will remove no occurrences
`rarefy.units`	the units of rarefy.dist parameter, either "km" for kilometers or "dd" for decimal degrees
`env.reso`	the resolution of the input environmental data grid in decimal degrees
`reduce.env`	the format for trimming environmental space so that it is shared. If reduce.env=1, the second input environment (env2) will be trimmed to match the first input (env1). If reduce.env=2, both input environments are trimmed so that their extents are identical (for each PC/environmental variable, the lower of the maximum values observed in env1 and env2 and the higher of the minimum values observed in env1 and env2 are used to trim environmental space). If reduce.env=0, trimming of environmental space is skipped
`reductype`	if reduce.env=1 or 2, the 'reductype' parameter specifies how to reduce environmental space ("PCA" or "STANDARD"). If reductype="PCA", the environmental space will be trimmed based on two principal components. If reductype="STANDARD", the environmental space will be trimmed by each included variable specified in col.env. If reduce.env=0, do not include this parameter
`non.analogous.environments`	allow non-analogous environments in environmental space? If non.analogous.environments="YES", non-analogous environments between env1 and env2 will be retained. If non.analogous.environments="NO", non-analogous environments between env1 and env2 will be removed. This parameter is only usable under the combinations reductype="PCA" & reduce.env=1 or reductype="PCA" & reduce.env=2. Under any other combination, this parameter will not be used.
`nae.window`	the spatial window from which non-analogous environments will be quantified. Non-analogous environments are characterized by gridding the espace of env1 and env2 into an R x R grid (e.g. 100 x 100). If nae.window=0, values absent from a cell in one environment will be removed from the other. If nae.window>0, values absent from a window (or neighborhood of cells) in one environment will be trimmed from the other. The nae.window value sets the number of cells to search, from the focal cell, for environmental space values in the other environment. The larger the nae.window value, the fewer non-analogous environments are removed. This parameter allows imperfect overlap of environments: if areas of environmental space are a little patchy between environments, but generally present, a larger nae.window value will retain more of the patchy environments. The default value is nae.window=5
`env.trim`	Trim the extent of environmental data in geographic space. Necessary for comparing whether species have different access to habitats. If env.trim=T, non-accessible environments will be removed. See the associated parameters "env.trim.type", "trim.buffer.sp1", "trim.buffer.sp2"
`env.trim.type`	This parameter is only used if env.trim=TRUE. Three options exist for trimming environmental data: a buffered minimum-convex-polygon (env.trim.type="MCP"), a buffer around all occurrence localities (env.trim.type="RADIUS"), and an advanced input mask option (env.trim.type="MASK") that allows a user to input a shapefile. For env.trim.type="RADIUS" & "MCP", the parameters 'trim.buffer.sp1' and 'trim.buffer.sp2' specify the buffer distance (in km) used to trim accessible environments. Default="MCP"
`trim.mask1`	This parameter is only used if env.trim.type="MASK". It allows users to input their own mask to trim the environmental data based on access to habitats. This parameter specifies the shapefile for species 1. Input the text name of the shapefile stored in your R global environment. Important note: the CRS of the shapefile must equal "+proj=longlat +datum=WGS84"
`trim.mask2`	This parameter is only used if env.trim.type="MASK". It allows users to input their own mask to trim the environmental data based on access to habitats. This parameter specifies the shapefile for species 2. Input the text name of the shapefile stored in your R global environment. Important note: the CRS of the shapefile must equal "+proj=longlat +datum=WGS84"
`trim.buffer.sp1`	buffer distance (in km) for trimming available environmental space for sp1
`trim.buffer.sp2`	buffer distance (in km) for trimming available environmental space for sp2
`color.ramp`	An integer from 1-6 selecting one of Humboldt's six color ramps: 1=rgb, 2=plasma, 3=viridis, 4=sunset, 5=rainbow, 6=greyscale. For a visual depiction, see: https://github.com/jasonleebrown/humboldt/blob/master/HumboldtInputExp.pdf
`correct.env`	if correct.env=T, the analysis corrects occurrence densities of each species by the prevalence of the environments in their range. If correct.env=F, the overlap measure does not correct occurrence densities of each species by the prevalence of the environments in their range. Default=T
`pcx`	An integer that identifies one (of two) principal components used to perform niche quantification and quantitative tests on. Default=1. Both defaults result in the 1 and 2 PCs being compared.
`pcy`	An integer that identifies the second (of two) principal components used to perform niche quantification and quantitative tests on. Default=2. Both defaults result in the 1 and 2 PCs being compared
`col.env`	if reductype="STANDARD", this parameter specifies the columns on which to trim environmental space. This can be any number of columns, and can be a subset or all of the environment layers input.
`e.var`	selection of variables to include in all of the analyses of E-space. This is a separate parameter than col.env, but must contain all variables included in col.env. Note that it can include more variables than those in col.env, as long as those in col.env are also included.
`R`	resolution of grid in environmental space (RxR)
`kern.smooth`	scale at which kernel smoothing occurs on environmental data. Larger values (e.g. 2) increase the scale (making espace transitions smoother and typically larger) and smaller values (e.g. 0.5) decrease the scale (making occupied espace clusters more dense and irregular). Default value is 1. You can also input "auto", which estimates the kernel parameter by calculating the standard deviation of rescaled PC1 and PC2 coordinates divided by the sixth root of the number of locations. This method can be unreliable when used on multimodal espace distributions, as it results in over-smoothing of home ranges. Multimodal espace occupancy can be somewhat common when a species occupies an extreme aspect of habitat or when espace is not broadly accessible in both dimensions of espace (PCs 1 & 2)
`e.reps`	the number of iterations for the equivalence statistic (humboldt.equivalence.stat). Values higher than 200 are recommended for final analyses
`b.reps`	the number of iterations for the background statistic (humboldt.background.stat). Values higher than 200 are recommended for final analyses
`nae`	should non-analogous environments be included in the niche similarity measurement? If nae="NO" (use capital letters), non-analogous environments are removed from both input environments during overlap measurement, and only environments present in both datasets are used to measure overlap. If nae="YES", no change is made to the input z1 and z2. Note: this is separate from trimming non-analogous environments from your input dataset (as done by humboldt.g2e, specified by the parameter non.analogous.environments). This parameter physically removes non-analogous environments from the datasets ONLY before the niche similarity measurement. Technically, removing non-analogous environments either way should result in similar overlap measurements (though they may not be identical). This is because removing NAE from the dataset prior to gridding results in only analogous environments being gridded (and typically a finer grain applied to each grid cell), whereas with this parameter (nae), which removes non-analogous environments only in the gridded environmental space used for overlap measurements, all of the input environmental space is gridded (likely increasing the environmental space per grid cell). A second cause of differences in values can be the rescaling of espace values during niche-overlap measurements so that the sum of the landscape equals one. If occupied non-analogous environments are numerous in one of the datasets, this can theoretically cause overlap values to decrease in analogous environments (vs. nae) because differences in core niches are rescaled to 1 in both scenarios. Rescaling among fewer cells increases the values applied to highly suitable areas and, if not equivalently scaled in both datasets, differences among niches could increase, resulting in a smaller overlap in non-analogous environments (again, values should be similar).
If you remove non-analogous environments in humboldt.g2e, I also suggest that you use this option (as it can remove any slight anomalies caused by gridding environments in humboldt.grid.clim due to the binning of values in the RxR grid).
`thresh.espace.z`	an experimental parameter that controls the kernel density (z) threshold below which values are removed when creating areas of analogous environmental space. Higher values raise the threshold below which low-density areas are removed from the environmental space of z1 and z2. Basically, values above this are retained and values below are removed. Default=0.0001
`p.overlap`	if p.overlap=T, the niche overlap plot will be output (humboldt.plot.overlap). Set this to 'F' if you want to speed up analyses. This plot takes a lot of CPU time because it calculates kernel densities for all environment and species datasets input
`p.boxplot`	if p.boxplot=T, a boxplot niche overlap plot will be created. The whiskers of the boxplot depict the environmental space of each environment, whereas the bars depict the environmental space of each species within that environment. Dots on bars depict the density of species localities in environmental space. This is a quick plot and doesn't require much CPU time
`p.scatter`	if p.scatter=T, this will plot scatter plots with histograms of your environment and species datasets using the humboldt.plot.scatter function. This is a quick plot and doesn't require much CPU time
`run.silent`	if run.silent=T, text boxes displaying 'sampling', 'rarefying', 'equivalence statistic', 'Background statistic' progress will not be displayed
`ncores`	number of CPUs to use for tests. If unsure on the number of cores and want to use all but 1 CPU, input ncores="All"
`b.force.equal.sample`	During execution of the background statistic, points are occasionally shifted into areas without environment data. If b.force.equal.sample=T, the points without environment data are shifted iteratively. Each round, if environment data are present in the new location, the environment is sampled and that point is added back to the original dataset. This is repeated until all points have sampled areas with existing environment data. In practice, when clusters of points are shifted to areas with no environmental data, the entire cluster is subsequently shifted back into an area with data. If b.force.equal.sample=F, the points shifted into areas without environmental data are excluded from niche quantification.
`g2e`	an espace file output from humboldt.g2e
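
The reduce.env=2 rule described above (keep only values between the higher of the two minima and the lower of the two maxima) can be illustrated with a minimal base-R sketch. This is not Humboldt's internal code: the helper trim_to_shared and the toy values are hypothetical, and the sketch trims on raw variables in a single pass, which is closest to reductype="STANDARD". Under reductype="PCA" the documentation states that the same rule is applied to principal components instead.

```r
# Sketch only: shared-range trimming on raw variables (assumed logic, not Humboldt's code).
# Toy environments with the documented column layout x, y, X1, X2 (values are made up).
env1 <- data.frame(x = 1:5, y = 1:5,
                   X1 = c(0, 2, 4, 6, 8),
                   X2 = c(10, 20, 30, 40, 50))
env2 <- data.frame(x = 6:10, y = 6:10,
                   X1 = c(3, 5, 7, 9, 11),
                   X2 = c(25, 35, 45, 55, 65))

# Hypothetical helper: keep rows whose values fall inside the range shared by both inputs.
trim_to_shared <- function(a, b, vars) {
  keep_a <- rep(TRUE, nrow(a))
  keep_b <- rep(TRUE, nrow(b))
  for (v in vars) {
    lo <- max(min(a[[v]]), min(b[[v]]))  # higher of the two minima
    hi <- min(max(a[[v]]), max(b[[v]]))  # lower of the two maxima
    keep_a <- keep_a & a[[v]] >= lo & a[[v]] <= hi
    keep_b <- keep_b & b[[v]] >= lo & b[[v]] <= hi
  }
  list(env1 = a[keep_a, , drop = FALSE], env2 = b[keep_b, , drop = FALSE])
}

shared <- trim_to_shared(env1, env2, vars = c("X1", "X2"))
print(shared$env1)
print(shared$env2)
```

With these toy values the shared range is 3 to 8 for X1 and 25 to 50 for X2, so each environment keeps three of its five rows.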

Value

Performs almost all the analyses in Humboldt and outputs the results as Jason likes! See my example for my recommended workflow. For each comparison I recommend running the 'doitall' function twice.

In the first run, I highly recommend running the analysis on the full environment input for both the background and equivalence statistics. In this case you are testing the total equivalence (or divergence) between species in their current realized distributions. It asks the question: how equivalent are the two species' realized niches? This test is also the best gauge of the background statistic and of the power to discriminate differences. I only use background values from this run.

However, I do re-run the analyses with trimmed, shared espace. This is extremely important if you are interested in quantifying niche evolution/divergence between two populations/species. This tests how two species diverge in shared analogous e-space and is the only actual test of niche divergence/evolution. Note that non-significant background statistics are not uncommon in shared analogous espace and should not be given priority over the total espace measurements from run 1.

Where there is very little or no analogous e-space, you may not be able to conclude much regarding niche divergence. In this situation, if the species' 'niches' are not truncated by espace boundaries, this suggests that the g2e conversion is a decent approximation of the species' fundamental niches. Because of this, the results should be accurate and species divergence can also be concluded. However, if the core densities of one or both species are on or near the margins of espace, this suggests that the fundamental niche may be much larger than the available climate space provides. In this latter case, you cannot conclude that the species have diverged at all.

In the former case, where there is no truncation of the espace of each species in their habitats but little overlap between them, this is evidence that the species have diverged. In this case, simply report the niche similarity values in the total environmental space and the amount of analogous espace.

IMPORTANT Go to the following webpage for a detailed explanation of all figures output: https://github.com/jasonleebrown/humboldt/blob/master/HumboldtFigsExp.pdf

Also go to the following webpage for a visual explanation of input parameters: https://github.com/jasonleebrown/humboldt/blob/master/HumboldtInputExp.pdf
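
Before running the examples, the input layout described under Arguments (env columns x, y, X1, ..., Xn; occurrence columns sp, x, y) can be checked with a small base-R sketch. The helper check_inputs and the toy data below are hypothetical and not part of humboldt. The sketch assumes the columns must appear in the documented order, which the documentation implies but does not state outright.

```r
# Hypothetical helper (not part of humboldt): check the documented column layout.
# Assumption: columns must appear in the order given in the Arguments section.
check_inputs <- function(env, sp) {
  stopifnot(is.data.frame(env), is.data.frame(sp))
  # env: x, y, then at least one environmental variable with any label
  env_ok <- ncol(env) >= 3 && identical(names(env)[1:2], c("x", "y"))
  # sp: species label, then coordinates
  sp_ok <- ncol(sp) >= 3 && identical(names(sp)[1:3], c("sp", "x", "y"))
  env_ok && sp_ok
}

# Toy data (made-up coordinates and values, for illustration only)
env_toy <- data.frame(x = c(-70.1, -70.5), y = c(-10.2, -10.6),
                      bio1 = c(250, 243), bio12 = c(1800, 2100))
sp_toy <- data.frame(sp = "species_a", x = -70.1, y = -10.2)

ok_layout  <- check_inputs(env_toy, sp_toy)                       # documented order
bad_layout <- check_inputs(env_toy, sp_toy[, c("x", "y", "sp")])  # columns reordered
print(c(ok_layout, bad_layout))
```

A check like this is cheap to run before a long analysis, since a misordered column would otherwise only surface partway through the run.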

Examples

#######################################################################################
###################################    EXAMPLE 1    ###################################
#######################################################################################
library(humboldt)
##load environmental variables for all sites of study area 1 (env1). Column names should be x,y,X1,X2,...,Xn
env1<-read.delim("env1.txt",h=T,sep="\t")

##load environmental variables for all sites of study area 2 (env2). Column names should be x,y,X1,X2,...,Xn
env2<-read.delim("env2.txt",h=T,sep="\t") 

## remove NAs and make sure all variables are imported as numbers
env1<-humboldt.scrub.env(env1)
env2<-humboldt.scrub.env(env2)

##load occurrence sites for the species at study area 1 (env1). Column names should be sp,x,y
occ.sp1<-na.exclude(read.delim("sp1.txt",h=T,sep="\t"))

##load occurrence sites for the species at study area 2 (env2). Column names should be sp,x,y 
occ.sp2<-na.exclude(read.delim("sp2.txt",h=T,sep="\t"))

##It is highly recommended that you use the function "humboldt.top.env" to select only the important environmental variables for humboldt.doitall. This step can be skipped, but you should use it if you downloaded a large number of environmental variables. If you skip this step, input env1/env2 in place of reduc.vars$env1/reduc.vars$env2
reduc.vars<- humboldt.top.env(env1=env1,env2=env2,sp1=occ.sp1,sp2=occ.sp2,rarefy.dist=40, rarefy.units="km", env.reso=0.416669,learning.rt1=0.01,learning.rt2=0.01,e.var=(3:21),pa.ratio=4,steps1=50,steps2=50,method="contrib",contrib.greater=5)

##Adjust the number of variables input for e.var after reduction to only important variables
num.var.e<-ncol(reduc.vars$env1)
##run it first with the full environment for background tests and the equivalence statistic (total equivalence or divergence in current distributions)
full<-humboldt.doitall(inname="full_extent", env1=reduc.vars$env1, env2=reduc.vars$env2, sp1=occ.sp1, sp2=occ.sp2, rarefy.dist=50, rarefy.units="km", env.reso=0.416669, reduce.env=0, reductype="PCA", non.analogous.environments="YES", correct.env=T, env.trim=F,  env.trim.type="RADIUS", trim.buffer.sp1=200, trim.buffer.sp2=200, pcx=1, pcy=2, col.env=e.var, e.var=c(3:num.var.e), R=100, kern.smooth=1, e.reps=100, b.reps=100, nae="YES",thresh.espace.z=0.0001, p.overlap=T, p.boxplot=F, p.scatter=F, run.silent=F, ncores=1, color.ramp=3)

##run it a second time with a trimmed, shared espace. Here the equivalence statistic tests for niche evolution or niche divergence. For comparing results, change only the following model parameters: reduce.env, non.analogous.environments, env.trim
shared_ae<-humboldt.doitall(inname="shared_espace_ae", env1=reduc.vars$env1, env2=reduc.vars$env2, sp1=occ.sp1, sp2=occ.sp2, rarefy.dist=50, rarefy.units="km", env.reso=0.416669, reduce.env=2, reductype="PCA", non.analogous.environments="NO", correct.env=T, env.trim=T, env.trim.type="RADIUS", trim.buffer.sp1=200, trim.buffer.sp2=200, pcx=1,pcy=2, col.env=e.var, e.var=c(3:num.var.e), R=100, kern.smooth=1, e.reps=100, b.reps=100, nae="YES",thresh.espace.z=0.0001, p.overlap=T, p.boxplot=F, p.scatter=T,run.silent=F, ncores=1, color.ramp=3)
#######################################################################################
###################################    EXAMPLE 2    ###################################
#######################################################################################
############################  Using Provided Example Data   ###########################
#######################################################################################
library(humboldt)
##load environmental variables for all sites of study area 1 (env1). Column names should be x,y,X1,X2,...,Xn
data(env1)
##load environmental variables for all sites of study area 2 (env2). Column names should be x,y,X1,X2,...,Xn
data(env2)

## remove NAs and make sure all variables are imported as numbers
env1<-humboldt.scrub.env(env1)
env2<-humboldt.scrub.env(env2)

##load occurrence sites for the species at study area 1 (env1). Column names should be sp,x,y
data(sp1)

##load occurrence sites for the species at study area 2 (env2). Column names should be sp,x,y
data(sp2)

##It is highly recommended that you use the function "humboldt.top.env" to select only the important environmental variables for humboldt.doitall. This step can be skipped, but you should use it if you downloaded a large number of environmental variables. If you skip this step, input env1/env2 in place of reduc.vars$env1/reduc.vars$env2
reduc.vars<- humboldt.top.env(env1=env1,env2=env2,sp1=sp1,sp2=sp2,rarefy.dist=50, rarefy.units="km", env.reso=0.416669,learning.rt1=0.01,learning.rt2=0.01,e.var=(3:21),pa.ratio=4,steps1=50,steps2=50,method="contrib",contrib.greater=5)

##Adjust the number of variables input for e.var after reduction to only important variables
num.var.e<-ncol(reduc.vars$env1)
##run it first with the full environment for background tests and the equivalence statistic (total equivalence or divergence in current distributions)
full<-humboldt.doitall(inname="full_extent", env1=reduc.vars$env1, env2=reduc.vars$env2, sp1=sp1, sp2=sp2, rarefy.dist=50, rarefy.units="km", env.reso=0.416669, reduce.env=0, reductype="PCA", non.analogous.environments="YES", correct.env=T, env.trim=F,  env.trim.type="RADIUS", trim.buffer.sp1=200, trim.buffer.sp2=200, pcx=1, pcy=2, col.env=e.var, e.var=c(3:num.var.e), R=100, kern.smooth=1, e.reps=100, b.reps=100, nae="YES",thresh.espace.z=0.0001, p.overlap=T, p.boxplot=F, p.scatter=F, run.silent=F, ncores=1)

##run it a second time with a trimmed, shared espace. Here the equivalence statistic tests for niche evolution or niche divergence. For comparing results, change only the following model parameters: reduce.env, non.analogous.environments, env.trim
shared_ae<-humboldt.doitall(inname="shared_espace_ae", env1=reduc.vars$env1, env2=reduc.vars$env2, sp1=sp1, sp2=sp2, rarefy.dist=50, rarefy.units="km", env.reso=0.416669, reduce.env=2, reductype="PCA", non.analogous.environments="NO", correct.env=T, env.trim=T, env.trim.type="RADIUS", trim.buffer.sp1=200, trim.buffer.sp2=200, pcx=1,pcy=2, col.env=e.var, e.var=c(3:num.var.e), R=100, kern.smooth=1, e.reps=100, b.reps=100, nae="YES",thresh.espace.z=0.0001, p.overlap=T, p.boxplot=F, p.scatter=T,run.silent=F, ncores=1)

jasonleebrown/humboldt documentation built on Jan. 4, 2024, 7:46 a.m.