calculate_spec_and_pval: calculate_spec_and_pval

View source: R/calculate_spec_and_pval.r

calculate_spec_and_pvalR Documentation

calculate_spec_and_pval

Description

This function is called by phy_or_env_spec(). It is made available as a standalone function in the (rare) case a user wishes to calculate Spec using their own null model. calculate_spec_and_pval() takes empirical rao values and sim rao values (from a null model) and calculates Spec and P-values. To do that, use your own null model to make species data, and use rao1sp() and/or raoperms() to get raw rao values. This function expects a vector of empirical values, and a list of vectors of sim values (see below). Most of the inputs for this function are the same as phy_or_env_spec(). Think of this function as the final component of "build your own phy_or_env_spec()". Note that for this custom approach, the environmental variable must be a dist.

Usage

calculate_spec_and_pval(
  emp_raos,
  sim_raos,
  abunds_mat,
  env,
  p_adj = "fdr",
  tails = 1,
  n_cores = 2,
  verbose = TRUE,
  p_method = "raw",
  center = "mean",
  denom_type = "index",
  diagnostic = FALSE,
  ga_params = get_ga_defaults()
)

Arguments

emp_raos

vector. Empirical rao values, one per species ("feature").

sim_raos

list of numeric vectors. Sim rao values, generated under null hypothesis. Each item in list corresponds to an entry in emp_raos. As such, length(emp_raos) must equal length(sim_raos). Each item within sim_raos is a vector or rao values (length=n_sim in the case of phy_or_env_spec()).

abunds_mat

site x species matrix. See ?phy_or_env_spec.

env

MUST BE A dist OBJECT!!!! VERY IMPORTANT!!!! See ?phy_or_env_spec.

p_adj

string. Type of multiple hypothesis testing correction performed on P-values. Can take any valid method argument to p.adjust, including "none", "bonferroni", "holm", "fdr", and others (default: "fdr").

tails

integer. 1 = 1-tailed, test for specificity only. 2 = 2-tailed. 3 = 1-tailed, test for cosmopolitanism only. 0 = no test, P=1.0 (default: 1).

n_cores

integer. Number of CPU cores to use for parallel operations. If set to 1, lapply will be used instead of mclapply. A warning will be shown if n_cores > 1 on Windows, which does not support forked parallelism (default: 2).

verbose

logical. Should status messages be displayed? (default: TRUE).

p_method

string. "raw" for quantile method, or "gamma_fit" for calculating P by fitting a gamma distribution (default: "raw").

center

string. Type of central tendency to use for simulated RQE values. Options are "mean", "median", and "mode". If mode is chosen, a reversible gamma distribution is fit and mode is calculated using that distribution (default: mean).

denom_type

string. Type of denominator (d) to use (default: "index"). Note that denominator type does NOT affect P-values.

"ses":

d for species s is calculated as the standard deviation of RQE values calculated from permuted species weights. This makes the output specificity a standardized effect size (SES). Unfortunately, this makes SES counterintuitively sensitive to occupancy, where species with high occupancy have more extreme SES than rare species, due to their more deterministic sim specificities. Included for comparative purposes, not suggested.

"raw":

d is 1 for all species, so output specificity has units of distance, i.e. the raw difference between empirical and simulated RQE. This means that results from different variables are not comparable, since it is not scale-invariant to env or hosts_phylo. It not scale-invariant to the species weights in aunds_mat, either. Not sensitive to number of samples. Not suggested because units are strange, and isn't comparable between variables.

"index":

d is the center of simulated (permuted) RQE values for species that have stronger specificity than expected by chance, resulting in specificity values with range [-1, 0), with 0 as the null hypothesis. In this case, -1 indicates perfect specificity, where a species is associated with zero environmental variability. In the euclidean sense, this could be a species that is always found at the exact same elevation or the exact same pH. For species that have weaker specificity than expected by chance, d is x minus the center (see above) of simulated RQE values, where x is the maximum possible dissimilarity observable given species weights. x is estimated using a genetic algorithm. This d has other useful properties: scale invariance to env/hosts_phylo, insensitivity to the number of samples, insensitivity to occupancy, and strong sensitivity to specificity (default).

"sim_center":

d is always the center of simulated (permuted) RQE values. For species that have stronger specificity than expected by chance, this will return the same Spec values as "index". For species with weaker specificity than expected by chance, instead of values that range between 0 and 1, they will range between 0 and Inf. This is much faster than "index" because the genetic algorithm is not used. So if species with weaker specificity than expected by chance are not interesting to you, this may be a good option.

diagnostic

logical. If true, changes output to include different parts of SES. This includes Pval, SES, raw, denom, emp, and all sim values with column labels as simN where N is the number of sims (default: FALSE)

ga_params

list. Parameters for genetic algorithm that maximizes RQE. Only used with denom_type="index". Default is the output of get_ga_defaults(). If different parameters are desired, start with output of get_ga_defaults and modify accordingly.

Value

data.frame where each row is an input species. First column is P-value ($Pval), second column is specificity ($Spec).

Author(s)

John L. Darcy

Examples

# # calculating regular old elevational specificity the hard way
# attach(endophyte)
# library(parallel)
# otutable <- occ_threshold(prop_abund(otutable), 10)
# env <- dist(metadata$Elevation)
# emp_raos <- apply(X=otutable, MARGIN=2, FUN=rao1sp, 
#   D=env, perm=F, seed=12345)
# sim_raos <- mclapply(X=as.data.frame(otutable), FUN=function(p){
#   replicate(200, rao1sp(p, D=env, perm=TRUE, seed=0))}, mc.cores=20)
# calculate_spec_and_pval(emp_raos, sim_raos, otutable, env, 
#   n_cores=20)



darcyj/specificity documentation built on Aug. 1, 2023, 8 a.m.