knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of TrEvol is to provide easy to use functions to study functional trait syndromes evolutionary patterns. More concretly, this package is focused on separating trait variance and the covariance (represented as correlation) among pairs of traits into a phylogenetic and a non-phylogenetic component. This allows to quantify the amount of phylogenetic conservatism (phylogenetic signal) and evolutionary lability in traits and their correlation.
Moreover, the package allows to consider one environmental variable when doing that, which allows to further separate trait variances and correlation among pairs of traits into four components, quantifying its relationship with the phylogeny and an environmental variable of interest. The first component is the non-attributed phylogenetic variances and correlation, and represents the variance and correlation that is related only to the phylogeny (not related to the environmental variable). The second component is the environmental phylogenetic variances and correlation, representing the variances and correlation related both to the phylogeny and an environmental variable. The third component is the labile environmental variances and correlation, and represents the amount of variance and correlation that is only related to the environmental variable (not related to the phylogeny). Finally, the fourth component is the residual variances and correlation, which is not related to the phylogeny nor to an environmental variable of interest. Total correlation among traits is also calculated.
The package allows then to characterize variance and covariance patterns for a group of traits. To represent these results, the package use trait networks, showing the variance components in the nodes and the correlation among traits as edges. The package allows to display network metrics describing the structure of the network. Networks can be represented for each of the variance and covariance components previously described, and they can be compared.
The package also includes an imputation framework which uses random forest to predict missing values including phylogenetic and environmental information as well as the relationship between traits. This imputation framework optimizes the use of the information by selecting the elements that are expected to be bette predictors and uses them to impute missing values in a given dataset.
You can install the development version of TrEvol from GitHub with:
# install.packages("devtools") devtools::install_github("pablosanchezmart/TrEvol")
This is a basic example showing how to use the package using simulated data. First of all, let's simulate some data to work with. To do so, we will use the simulateDataSet function from the package using the default parameters.
library(TrEvol)
simulated.data <- simulateDataSet(number_observations = 100)
The simulateDataSet function simulates a set of correlated variables and a phylogeny. The variance-covariance matrix used to simulate traits variances (diagonal) and correlations (off-diagonal) is reported by the function and can be modified using the vcv_matrix argument.
simulated.data$vcv_matrix
The simulateDataSet function produces two sets of variables. The firs set uses the variance-covariance matrix to simulate phylogenetically structured data under a Brownian motion model of evolution. These variables are preceded by "phylo_". To do so, the function simulates a phylogeny with the same number of tips as number of observations to be simulated (set in the number_observations argument). A phylogeny can be introduced manually using the phylogeny argument. The second set of variables are simulated in a non-phylogenetically structured manner. So their variances and covariances will not present a phylogenetic component. These variables are preceded by "nonPhylo_". Each of the sets have 6 variables, 4 will be considered traits and one will be considered as an enviornmental variable. Names of each variable show the expected correlation group (G1 or G2, for each set of traits). So we expect phylo G1 to be correlated among them and to present variances and covariances related to the phylogeny and to the phylo_G1_env environmental variable.
head(simulated.data$data)
We can use the function plotData to plot some simulated traits. Let's plot the phylo_G1 traits and environmental variable.
plotData(variables = c("phylo_G1_trait1", "phylo_G1_trait2", "phylo_G1_env"), dataset = simulated.data$data, phylogeny = simulated.data$phylogeny, terminal.taxon = "animal" )
In this plot we can se how the simulated variances and covariances are phylogenetically conserved. Let's now plot the nonPhylo_G1 traits and environmental variable.
plotData(variables = c("nonPhylo_G1_trait1", "nonPhylo_G1_trait2", "nonPhylo_G1_env"), dataset = simulated.data$data, phylogeny = simulated.data$phylogeny, terminal.taxon = "animal" )
We see how these variables variances and covariances are not phylogenetically conserved.
Now we can use the function computeVarianceCovariancePartition to report the correlation structure between traits.
Let's imagine that we are only interested in computing the phylogenetic and non-phylogenetic variances and covariances. So, for now, let's nos include an environmental variable in the function. In this case, phylogenetic variances and correlation and non-phylogenetic variances and correlation will be reported, jointly with total correlation. We will use the default model specifications, but users can modify this by using the defineModelsSpecifications function to create an object that then can be introduced in the model_specifications argument of the computeVarianceCovariancePartition function. This will be specially needed with complex data, where number of iterations, burning and thinning will need to be increased. Users can also explore using different priors.
specif <- defineModelsSpecifications(number_iterations = 100000, burning = 1000, thinning = 10) variance_covariance_results.list <- computeVarianceCovariancePartition( traits = c("nonPhylo_G1_trait1", "nonPhylo_G1_trait2", "phylo_G1_trait1", "phylo_G1_trait2"), dataset = simulated.data$data, terminal_taxa = "animal", phylogeny = simulated.data$phylogeny, model_specifications = specif )
We can now explore the variance-covariance results. First, let's look at the variance results.
variance_covariance_results.list$varianceResults
Now, let's look at the covariance results, which are reported as correlation.
variance_covariance_results.list$covarianceResults
We can now plot this results as trait networks using the plotNetworks. Let's first plot the phylogenetic part of the variance and covariance.
plotNetwork(variance_results = variance_covariance_results.list$varianceResults, correlation_results = variance_covariance_results.list$covarianceResults, variance_type = "phylogenetic_variance", correlation_type = "phylogenetic_correlation", edge_label = T, only_significant = T, correlation_threshold = 0.3, label_size = 1)
Let's now plot the non-phylogenetic variances and covariances.
plotNetwork(variance_results = variance_covariance_results.list$varianceResults, correlation_results = variance_covariance_results.list$covarianceResults, variance_type = "non_phylogenetic_variance", correlation_type = "non_phylogenetic_correlation", edge_label = T, only_significant = T, correlation_threshold = 0.3)
We observe how much of the variance and covariance in the phylo_G1 traits is related to the phylogeny, and that the variances and covariances in nonPhylo_G1 traits are not related to the phylogeny, as expected.
Let's now ilustrate how to include an environmental variable. This will allow us to calculate four different variance and covariance components: the non-attributed phylogenetic variance and covariance, the phylogenetic environmental variance and covariance, the labile environmental variance and covariance and the residual variance and covariance. Let's use the phylo_G1_env variable as an enviornmental variable, which is expected to be related to variances and covariances of phylo_G1 traits in a phylogenetically conserved manner (i.e., phylogenetic environmental variance and covariance).
variance_covariance_environment_results.list <- computeVarianceCovariancePartition( traits = c("nonPhylo_G1_trait1", "nonPhylo_G1_trait2", "phylo_G1_trait1", "phylo_G1_trait2"), environmental_variable = "phylo_G1_env", dataset = simulated.data$data, terminal_taxa = "animal", phylogeny = simulated.data$phylogeny )
We can see now these components in both variance and covariance results.
variance_covariance_environment_results.list$varianceResults
variance_covariance_environment_results.list$covarianceResults
Let's plot the phylogenetic environmental variances and covariances as an example.
plotNetwork(variance_results = variance_covariance_environment_results.list$varianceResults, correlation_results = variance_covariance_environment_results.list$covarianceResults, variance_type = "environmental_phylogenetic_variance", correlation_type = "environmental_phylogenetic_correlation", edge_label = T, only_significant = T, correlation_threshold = 0.3)
We can see how part of the pattern of variances and covariances of the traits phylo_G1 are attributed to the phylogenetically conserved effect of the environmental variable phylo_G1_env.
Now, let's use this information to inform an imputation framework performed by the function imputeTraits of this package. As our simulated data does not have NAs, let's first produce some NAs (20%) in a variable to impute. Then, we will run the imputeTraits to perform predictions using the variance-covariance structures reported before to inform the process.
to_impute.data <- simulated.data$data to_impute.data[, "phylo_G1_trait1"] <- missForest::prodNA(as.data.frame(simulated.data$data[, "phylo_G1_trait1"]), 0.2) imputed.data <- imputeTraits(variables_to_impute = "phylo_G1_trait1", dataset = to_impute.data, terminal_taxon = "animal", phylogeny = simulated.data$phylogeny, predictors = "phylo_G1_env")
head(imputed.data$round3$ximp)
Let's assign the imputed data as a new column in the dataset.
imputed.data <- merge(to_impute.data, imputed.data$round3$ximp, suffixes = c("", "_imputed"), by = "animal", all.x = T)
Let's look at the summary of the imputed data and compare it with the complete data.
Before imputation:
summary(imputed.data$phylo_G1_trait1)
After imputation:
summary(imputed.data$phylo_G1_trait1_imputed)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.