View source: R/space_similarity.R
space_similarity | R Documentation |
space_similarity
Estimate pairwise similarities of phenotype spaces
space_similarity(
formula,
data,
cores = 1,
method = "mcp.overlap",
pb = TRUE,
outliers = 0.95,
pairwise.scale = FALSE,
distance.method = "Euclidean",
seed = NULL,
...
)
formula |
An object of class "formula" (or one that can be coerced to that class). Must follow the form |
data |
Data frame containing columns for the dimensions of the phenotypic space (numeric) and a categorical or factor column with group labels. |
cores |
Numeric vector of length 1. Controls whether parallel computing is applied by specifying the number of cores to be used. Default is 1 (i.e. no parallel computing). |
method |
Character vector of length 1. Controls the (di)similarity metric used to compare the phenotypic sub-spaces of two groups at a time. Seven built-in metrics are available, which quantify either pairwise sub-space overlap ('similarity') or pairwise distance between bi-dimensional sub-spaces ('dissimilarity'):
In addition, machine learning classification models can be used to quantify dissimilarity as a measure of how discriminable two groups are. These models can use more than two dimensions to represent phenotypic spaces. The following classification models can be used: "AdaBag", "avNNet", "bam", "C5.0", "C5.0Cost", "C5.0Rules", "C5.0Tree", "gam", "gamLoess", "glmnet", "glmStepAIC", "kernelpls", "kknn", "lda", "lda2", "LogitBoost", "msaenet", "multinom", "nnet", "null", "ownn", "parRF", "pcaNNet", "pls", "plsRglm", "pre", "qda", "randomGLM", "rf", "rFerns", "rocc", "rotationForest", "rotationForestCp", "RRF", "RRFglobal", "sda", "simpls", "slda", "smda", "snn", "sparseLDA", "svmLinear2", "svmLinearWeights", "treebag", "widekernelpls" and "wsrf". See https://topepo.github.io/caret/train-models-by-tag.html for details on each of these models. Additional arguments can be passed using |
pb |
Logical argument to control if progress bar is shown. Default is |
outliers |
Numeric vector of length 1. A value between 0 and 1 controlling the proportion of outlier observations to be excluded. Outliers are determined as those farthest away from the sub-space centroid. Ignored when using machine learning methods. |
pairwise.scale |
Logical argument to control if pairwise phenotypic spaces are scaled (i.e. z-transformed) prior to similarity estimation. If so ( |
distance.method |
Character vector of length 1 indicating the method to be used for measuring distances (hence only applicable when distances are calculated). Available distance measures are: "Euclidean" (default), "Manhattan", "supremum", "Canberra", "Wave", "divergence", "Bray", "Soergel", "Podani", "Chord", "Geodesic" and "Whittaker". If a similarity measure is used, similarities are converted to distances. |
seed |
Integer used to set the random number generator (RNG) state so that results from the stochastic machine learning methods are replicable. |
... |
Additional arguments to be passed to |
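The outliers argument above describes outliers as the observations farthest from the sub-space centroid. The base R sketch below illustrates that idea only; it is not the package's internal code. The helper trim_outliers and its keep argument are hypothetical names, and the sketch assumes the value (e.g. 0.95) is read as the proportion of observations retained, which the text above does not state explicitly.

```r
# Hypothetical helper (not part of PhenotypeSpace): keep the fraction `keep`
# of observations that lie closest to the centroid of a point cloud.
trim_outliers <- function(xy, keep = 0.95) {
  xy <- as.matrix(xy)
  centroid <- colMeans(xy)
  # Euclidean distance from each observation to the centroid
  d <- sqrt(rowSums(sweep(xy, 2, centroid)^2))
  # retain observations whose distance does not exceed the `keep` quantile
  xy[d <= stats::quantile(d, probs = keep), , drop = FALSE]
}

# simulated two-dimensional data (made up for illustration)
set.seed(1)
pts <- cbind(dimension_1 = rnorm(100), dimension_2 = rnorm(100))
trimmed <- trim_outliers(pts, keep = 0.95)
nrow(trimmed)
```

Under this assumption, lowering keep discards more of the peripheral observations before any overlap or distance is computed.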
The function quantifies pairwise similarity between phenotypic sub-spaces. The built-in methods quantify either the overlap (similarity, or machine learning based discriminability) or the distance (dissimilarity) between groups. Machine learning methods implemented in the caret package function train are available to assess the similarity of spaces as the proportion of observations that are incorrectly classified. In this case group overlaps are the class-wise errors (if available), while the mean overlap is calculated as 1 - model accuracy.
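The arithmetic behind class-wise errors and 1 - model accuracy can be sketched in base R from a confusion matrix. The labels below are made-up toy values, and the sketch only illustrates the calculation; how the package extracts these quantities from a fitted caret model is not shown here.

```r
# Toy observed and predicted group labels (made-up values for illustration)
observed  <- factor(c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2"))
predicted <- factor(c("G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
                    levels = levels(observed))

# confusion matrix: rows are observed groups, columns are predicted groups
conf_mat <- table(observed = observed, predicted = predicted)

# class-wise error: proportion of each observed group that was misclassified
class_error <- 1 - diag(conf_mat) / rowSums(conf_mat)

# mean overlap taken as 1 - overall accuracy
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
mean_overlap <- 1 - accuracy

class_error
mean_overlap
```

In this toy case one of four G1 observations and two of four G2 observations are misclassified, so the two class-wise errors differ even though a single accuracy value summarises the model.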
A data frame containing the similarity metric for each pair of groups. If the similarity metric is not symmetric (e.g. the proportional area of A that overlaps B is not necessarily the same as the area of B that overlaps A, see space_similarity), separate columns are supplied for the two comparisons.
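To see why a proportional overlap is not symmetric, consider the following base R sketch. It uses two axis-aligned rectangles as a simplified stand-in for the sub-spaces of groups A and B; the coordinates and the rect_area helper are invented for illustration and do not reflect how the package computes overlaps.

```r
# Two rectangles standing in for the sub-spaces of groups A and B
# (made-up coordinates, used only to illustrate the asymmetry)
A <- c(xmin = 0, xmax = 4, ymin = 0, ymax = 4)
B <- c(xmin = 3, xmax = 5, ymin = 1, ymax = 3)

# hypothetical helper: area of an axis-aligned rectangle
rect_area <- function(r) unname((r["xmax"] - r["xmin"]) * (r["ymax"] - r["ymin"]))

# width and height of the intersection (zero if the rectangles do not overlap)
inter_w <- max(0, min(A["xmax"], B["xmax"]) - max(A["xmin"], B["xmin"]))
inter_h <- max(0, min(A["ymax"], B["ymax"]) - max(A["ymin"], B["ymin"]))
inter_area <- inter_w * inter_h

# the same shared area is a different proportion of each group's space
prop_A_overlapped <- inter_area / rect_area(A)
prop_B_overlapped <- inter_area / rect_area(B)

c(A = prop_A_overlapped, B = prop_B_overlapped)
```

The shared region covers a small share of the larger space A but half of the smaller space B, which is why two columns are needed to report such a metric for each pair.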
Marcelo Araya-Salas (marcelo.araya@ucr.ac.cr)
Araya-Salas, M. & Odom, K. 2022. PhenotypeSpace: an R package to quantify and compare phenotypic trait spaces. R package version 0.1.0.
rarefact_space_similarity, space_size_difference
{
# load data
data("example_space")
# get proportion of space that overlaps
prop_overlaps <- space_similarity(
formula = group ~ dimension_1 + dimension_2,
data = example_space,
method = "proportional.overlap")
# get symmetric triangular matrix
rectangular_to_triangular(prop_overlaps)
# get minimum convex polygon overlap for each group (non-symmetric)
mcp_overlaps <- space_similarity(
formula = group ~ dimension_1 + dimension_2,
data = example_space,
method = "mcp.overlap")
# convert to non-symmetric triangular matrix
rectangular_to_triangular(mcp_overlaps, symmetric = FALSE)
# check available distance measures
summary(proxy::pr_DB)
# get Euclidean distances (default)
area_dist <- space_similarity(
formula = group ~ dimension_1 + dimension_2,
data = example_space,
method = "distance",
distance.method = "Euclidean")
# get Canberra distances
area_dist <- space_similarity(
formula = group ~ dimension_1 + dimension_2,
data = example_space,
method = "distance",
distance.method = "Canberra")
## using machine learning classification methods
# check if caret package and needed dependencies are available
rlang::check_installed("caret")
rlang::check_installed("randomForest")
# random forest on 3-dimensional data, using repeated CV resampling with 5 repeats
# extract data subset
sub_data <- example_space[example_space$group %in% c("G1", "G2", "G3"), ]
# set method parameters
ctrl <- caret::trainControl(method = "repeatedcv", repeats = 5)
# get similarities ("overlap")
space_similarity(
formula = group ~ dimension_1 + dimension_2 + dimension_3,
data = sub_data,
method = "rf",
trControl = ctrl,
tuneLength = 4,
seed = 123
)
# Single C5.0 Tree using boot resampling
ctrl <- caret::trainControl(method = "boot")
space_similarity(
formula = group ~ dimension_1 + dimension_2,
data = sub_data,
method = "C5.0Tree",
trControl = ctrl,
tuneLength = 3
)
}