imputeTraits: Impute traits using phylogenetic, environmental and traits...

View source: R/imputeTraits.R

imputeTraits    R Documentation

Impute traits using phylogenetic, environmental and trait relationships

Description

Predict missing values using random forests, with phylogeny, environmental and trait data as predictors when they show a relationship with the variable to be imputed. The method runs 3 rounds, in each of which the updated predicted values are used to predict the remaining variables to impute.
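The round-based idea described above can be illustrated with a toy sketch in base R. This is not the package's implementation: lm() is swapped in for the random forests, there are no phylogenetic predictors, and all object names (dat, imp, miss) are made up for the illustration.

```r
# Toy sketch of round-based imputation; NOT TrEvol's implementation.
# lm() stands in for the random forests the package uses.
set.seed(1)
n  <- 100
x  <- rnorm(n)                        # complete predictor
t1 <- 2 * x + rnorm(n, sd = 0.3)      # trait 1, related to x
t2 <- -1 * t1 + rnorm(n, sd = 0.3)    # trait 2, related to trait 1
dat <- data.frame(x = x, t1 = t1, t2 = t2)

# Choose which cells go missing in each trait, then blank them out
miss <- list(t1 = sample(n, 20), t2 = sample(n, 20))
for (v in names(miss)) dat[miss[[v]], v] <- NA

# Round 0: fill with the observed mean to get a complete starting table
imp <- dat
for (v in names(miss)) imp[miss[[v]], v] <- mean(dat[[v]], na.rm = TRUE)

# Rounds 1-3: re-predict each trait from the other columns, which
# already contain the other trait's most recent predictions
for (round in 1:3) {
  for (v in names(miss)) {
    fit <- lm(reformulate(setdiff(names(imp), v), response = v),
              data = imp[-miss[[v]], ])
    imp[miss[[v]], v] <- predict(fit, newdata = imp[miss[[v]], ])
  }
}

head(imp)
```

Each pass overwrites only the originally missing cells, so observed values are never altered; later rounds simply benefit from better estimates of the other trait.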

Usage

imputeTraits(
  variables_to_impute = NULL,
  dataset = NULL,
  terminal_taxon = NULL,
  phylogeny = NULL,
  correlation_results = NULL,
  variance_results = NULL,
  number_of_phylo_axis = NULL,
  predictors = NULL,
  proportion_NAs = 0,
  number_iterations = 10,
  number_clusters = 2,
  model_specifications = NULL,
  save = F,
  force_run = T
)

Arguments

variables_to_impute

(character). Names of the variables with NAs to be imputed. If more than one is given, covariation among them is considered when deciding whether to use them as predictors.

dataset

(data frame). Data frame containing the variables of interest, including those with missing values, and a column describing the terminal taxa of the phylogeny (e.g., species).

terminal_taxon

(character). Terminal taxon as named in the dataset (e.g., species).

phylogeny

(phylo). Phylogeny with tip labels contained in dataset.

correlation_results

(list). Correlation results from the computeVarianceCovariance() function, or NULL (default) to compute them internally for the variables to be imputed using the data and phylogeny provided.

variance_results

(list). Variance results from the computeVarianceCovariance() function.

number_of_phylo_axis

(integer). Number of phylogenetic axes to include as predictors.

predictors

(character). Names of the variables without NAs that will be considered as potential predictors. These variables need to be included in the dataset and need to be complete (no NAs).

proportion_NAs

(numeric). Between 0 and 1. Proportion of artificial NAs to be introduced in a given dataset, for evaluation purposes. Default is 0, so no extra NAs are produced.

number_iterations

(integer). Number of iterations of the imputation process. Results reported are summarized as mean and standard deviation for each variable to be imputed.

number_clusters

(integer). Number of clusters to use in parallelization. If set to 1, no parallelization is performed (not recommended). Default is 2.

model_specifications

(list). MCMCglmm model specifications as returned by the defineModelsSpecification() function of this package. If not defined, the defaults of defineModelsSpecification() are used.

save

(logical). If FALSE (default), results are not saved in the outputs folder.

force_run

(logical). If FALSE, previously calculated phylogenetic axes (saved internally) are reused. This can be useful with large phylogenies, as calculating the phylogenetic axes can take some time. Default is TRUE.
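To make the roles of proportion_NAs and number_iterations more concrete, here is a minimal base R sketch of the general idea: hide a known proportion of values, predict them repeatedly, and summarize the predictions as mean and standard deviation. The helper mask_values() and the placeholder "model" are hypothetical and not part of TrEvol; how the package does this internally is not described on this page.

```r
# Hypothetical helper (not part of TrEvol): set a proportion of the
# observed values to NA and return the positions that were masked.
mask_values <- function(values, proportion) {
  observed <- which(!is.na(values))
  n_mask   <- round(proportion * length(observed))
  masked   <- observed[sample.int(length(observed), n_mask)]
  values[masked] <- NA
  list(values = values, masked = masked)
}

set.seed(42)
trait <- rnorm(50, mean = 10)
res   <- mask_values(trait, proportion = 0.2)
sum(is.na(res$values))  # 10 of the 50 values are now NA

# Stand-in for repeated imputations: one column of predictions per
# iteration. The "model" is just the observed mean plus noise.
n_iter <- 10
preds <- sapply(seq_len(n_iter), function(i) {
  mean(res$values, na.rm = TRUE) + rnorm(length(res$masked), sd = 0.1)
})

# Summarize across iterations and compare against the hidden truth
summary_tab <- data.frame(
  position = res$masked,
  mean     = rowMeans(preds),
  sd       = apply(preds, 1, sd),
  truth    = trait[res$masked]
)
rmse <- sqrt(mean((summary_tab$mean - summary_tab$truth)^2))
```

Because the masked values are known, the error of the averaged predictions can be measured directly, which is the point of introducing artificial NAs for evaluation.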

Examples

## Not run: 
# Simulate example data
simulated_traits.data <- simulateDataSet()

# Impute missing values (created within the function)
imputed.data <- imputeTraits(
  variables_to_impute = c("phylo_G1_trait1", "phylo_G1_trait2"),
  dataset = simulated_traits.data$data,
  phylogeny = simulated_traits.data$phylogeny,
  proportion_NAs = 0.2
)

## End(Not run)

pablosanchezmart/TrEvol documentation built on April 23, 2024, 4:05 p.m.