View source: R/phy_or_env_spec.r
phy_or_env_spec | R Documentation |
Calculates species' specificities to either a 1-dimensional variable (vector), 2-dimensional variable (matrix), or to a phylogeny. Transforms all variable input types into a matrix D, and calculates specificity by comparing empirical Rao's Quadratic Entropy to simulated RQE (same but with permuted abundances). By default (denom_type = "index"), an index is calculated from emp and sim values such that Spec=0 indicates random assortment (null hypothesis), and more negative values indicate stronger specificity.
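The comparison described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `rqe()` is a hypothetical helper, the data are made up, and the null distribution comes from simple shuffling with `sample()`.

```r
# Hypothetical helper (not part of the package): Rao's Quadratic Entropy for one
# species, summing D[i, j] * p[i] * p[j] over sample pairs in the lower triangle.
rqe <- function(abund, D) {
  p <- abund / sum(abund)          # scale abundances to proportions
  D <- as.matrix(D)
  lt <- lower.tri(D)
  sum(D[lt] * outer(p, p)[lt])
}

set.seed(42)
env   <- seq(0, 100, length.out = 30)        # made-up 1-dimensional variable
D     <- dist(env)                           # pairwise differences between samples
abund <- c(rep(0, 20), rpois(10, 50) + 1)    # species found only at high env values

emp  <- rqe(abund, D)                           # empirical RQE
sims <- replicate(200, rqe(sample(abund), D))   # RQE with permuted abundances

emp < mean(sims)    # a specific species has lower RQE than its permutations
mean(sims <= emp)   # rough one-tailed proportion, in the spirit of a quantile P-value
```

A species concentrated in a narrow range of the variable has empirical RQE well below the center of its simulated values, which is the direction that gives negative Spec.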
phy_or_env_spec(
abunds_mat,
env = NULL,
hosts = NULL,
hosts_phylo = NULL,
n_sim = 1000,
p_adj = "fdr",
seed = 1234567,
tails = 1,
n_cores = 2,
verbose = TRUE,
p_method = "raw",
center = "mean",
denom_type = "index",
diagnostic = F,
chunksize = 1000,
ga_params = get_ga_defaults()
)
abunds_mat |
matrix or data frame of numeric values. Columns represent species, rows are samples. For columns where the value is nonzero for two or fewer data points, specificity cannot be calculated, and NAs will be returned. Negative values in abunds_mat are not allowed (REQUIRED). |
env |
numeric vector, dist, or square matrix. Environmental variable corresponding to abunds. For example, temperature, or geographic distance. Not required for computing phylogenetic specificity. If square matrix provided, note that only the lower triangle will be used (default: NULL). |
hosts |
character vector. Host identities corresponding to abunds. Only required if calculating phylogenetic specificity (default: NULL). |
hosts_phylo |
phylo object. Tree containing all unique hosts as tips. Only required if calculating phylogenetic specificity (default: NULL). |
n_sim |
integer. Number of simulations of abunds_mat to do under the null hypothesis that host or environmental association is random. P-values will not be calculated if n_sim < 100 (default: 1000). |
p_adj |
string. Type of multiple hypothesis testing correction performed on P-values. Can take any valid method argument to p.adjust, including "none", "bonferroni", "holm", "fdr", and others (default: "fdr"). |
seed |
integer. Seed to use so that results are repeatable. The same seed is used for each species in abunds_mat, so all species experience the same permutations. This can be disabled by setting seed=0, which makes permutation non-deterministic (not repeatable) AND gives each species different permutations (default: 1234567). |
tails |
integer. 1 = 1-tailed, test for specificity only. 2 = 2-tailed. 3 = 1-tailed, test for cosmopolitanism only. 0 = no test, P=1.0 (default: 1). |
n_cores |
integer. Number of CPU cores to use for parallel operations. If set to 1, lapply will be used instead of mclapply. A warning will be shown if n_cores > 1 on Windows, which does not support forked parallelism (default: 2). |
verbose |
logical. Should status messages be displayed? (default: TRUE). |
p_method |
string. "raw" for quantile method, or "gamma_fit" for calculating P by fitting a gamma distribution (default: "raw"). |
center |
string. Type of central tendency to use for simulated RQE values. Options are "mean", "median", and "mode". If "mode" is chosen, a reversible gamma distribution is fit and the mode is calculated using that distribution (default: "mean"). |
denom_type |
string. Type of denominator (d) to use (default: "index"). Note that denominator type does NOT affect P-values. |
diagnostic |
logical. If TRUE, changes output to include the different parts of Spec. This includes Pval, Spec, raw, denom, emp, and all sim values, with column labels simN where N is the simulation number (default: FALSE). |
chunksize |
integer. If greater than zero, computation of sim RQE values will be done using chunked evaluation, which lowers memory use considerably for larger data sets. Can be disabled by setting to 0. Default value is 1000 species per chunk (default: 1000). |
ga_params |
list. Parameters for genetic algorithm that maximizes RQE. Only used with denom_type="index". Default is the output of get_ga_defaults(). If different parameters are desired, start with output of get_ga_defaults and modify accordingly. |
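As a small base-R illustration of the input forms accepted by env, the sketch below builds the same distances from a numeric vector, a dist, and a square matrix. It assumes a vector is turned into pairwise absolute differences, which the text above does not state explicitly; the values are made up.

```r
env_vec <- c(10, 12, 19, 30)     # made-up variable, one value per sample

# dist: pairwise absolute differences between samples (assumed conversion)
d_from_vec <- dist(env_vec)

# square matrix: only the lower triangle matters, so the upper can hold anything
m <- as.matrix(d_from_vec)
m[upper.tri(m)] <- NA
d_from_mat <- as.dist(m)         # as.dist() reads the lower triangle

all.equal(as.vector(d_from_vec), as.vector(d_from_mat))
```

Checking a square matrix this way before passing it as env helps confirm that the values intended for use sit in the lower triangle.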
data.frame where each row is an input species. First column is P-value ($Pval), second column is specificity ($Spec).
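A minimal sketch of working with a result of this shape. The data frame below is a made-up stand-in, not real output, and the species names and the 0.05 cutoff are arbitrary choices for illustration.

```r
# Mock result with the documented columns; sp3 mimics a species whose
# specificity could not be calculated (NA returned).
res <- data.frame(
  Pval = c(0.001, 0.200, NA, 0.030),
  Spec = c(-0.80, -0.10, NA, -0.45),
  row.names = c("sp1", "sp2", "sp3", "sp4")
)

ok <- !is.na(res$Pval)                               # drop species with NA results
specific <- res[ok & res$Pval < 0.05 & res$Spec < 0, ]
specific[order(specific$Spec), ]                     # most negative Spec first
```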
John L. Darcy
Poulin et al. (2011) Host specificity in phylogenetic and geographic space. Trends Parasitol 27(8):355-361. doi: 10.1016/j.pt.2011.05.003
Rao CR (2010) Quadratic entropy and analysis of diversity. Sankhya 72:70-80. doi: 10.1007/s13171-010-0016-3
Rao CR (1982) Diversity and dissimilarity measurements: A unified approach. Theor Popul Biol 21:24-43.
# library(specificity)
# attach(endophyte)
# # only analyze species with occupancy >= 20
# m <- occ_threshold(prop_abund(otutable), 20)
# # create list to hold phy_or_env_spec outputs
# specs_list <- list()
#
# # phylogenetic specificity using endophyte data set
# specs_list$host <- phy_or_env_spec(
# abunds_mat=m,
# hosts=metadata$PlantGenus,
# hosts_phylo=supertree,
# n_sim=100, p_method="gamma_fit",
# n_cores=4
# )
#
# # environmental specificity using elevation from endophyte data set:
# specs_list$elev <- phy_or_env_spec(
# abunds_mat=m,
# env=metadata$Elevation,
# n_sim=100, p_method="gamma_fit",
# n_cores=4
# )
#
# # geographic specificity using spatial data from endophyte data set:
# specs_list$geo <- phy_or_env_spec(
# abunds_mat=m,
# env=distcalc(metadata$Lat, metadata$Lon),
# n_sim=100, p_method="gamma_fit",
# n_cores=4
# )
#
# plot_specs_violin(specs_list, cols=c("forestgreen", "red", "black"))