assess_normality: Assess normality of traits

View source: R/metapipe.R

assess_normality R Documentation

Assess normality of traits

Description

Assess normality of traits in a data frame.

Usage

assess_normality(
  raw_data,
  excluded_columns,
  cpus = 1,
  out_prefix = "metapipe",
  plots_dir = tempdir(),
  transf_vals = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10),
  alpha = 0.05,
  pareto_scaling = FALSE,
  show_stats = TRUE
)

Arguments

raw_data

Data frame containing the raw data.

excluded_columns

Numeric vector containing the indices of the non-numeric columns (dataset properties) to exclude from the normality assessment.

cpus

Number of CPUs to be used in the computation.

out_prefix

Prefix for output files and plots.

plots_dir

Path to the directory where plots should be stored.

transf_vals

Numeric vector with the transformation values.

alpha

Significance level.

pareto_scaling

Boolean flag to indicate whether or not to perform Pareto scaling on the normalised data.

show_stats

Boolean flag to indicate whether or not to show the normality assessment statistics (how many traits are normal, how many were transformed/normalised, and which transformations were applied).
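For readers unfamiliar with the pareto_scaling option, the block below is a minimal sketch of Pareto scaling as it is commonly defined: each column is centred on its mean and divided by the square root of its standard deviation. The helper pareto_scale and the toy traits data frame are hypothetical, introduced only for illustration. This is not necessarily how MetaPipe implements the step internally.

```r
# Pareto scaling sketch: centre each column on its mean, then divide by the
# square root of its standard deviation.
# NOTE: `pareto_scale` is a hypothetical helper, not a MetaPipe function.
pareto_scale <- function(x) {
  x <- as.matrix(x)
  centred <- sweep(x, 2, colMeans(x, na.rm = TRUE), "-")
  sweep(centred, 2, sqrt(apply(x, 2, sd, na.rm = TRUE)), "/")
}

# Hypothetical numeric traits on very different scales
traits <- data.frame(T1 = c(1, 2, 3, 4, 5),
                     T2 = c(10, 20, 30, 40, 50))
scaled <- pareto_scale(traits)
colMeans(scaled)  # both columns are now centred on zero
```

Dividing by the square root of the standard deviation, rather than by the standard deviation itself, reduces the dominance of large-valued traits while keeping more of the original data structure than unit-variance scaling would.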

Details

The normality of each trait is assessed using a Shapiro-Wilk test, under the following hypotheses:

  • H_0: the sample comes from a normally distributed population.

  • H_1: the sample does not come from a normally distributed population.

The test uses the significance level set by alpha (α = 0.05 by default). If the conclusion is that the sample does not come from a normally distributed population, a number of transformations are attempted, based on the transformation values passed with transf_vals. By default, the values a = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10) are used with the logarithmic (log_a(x)), power (x^a), and radical/root (x^(1/a)) functions.
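The procedure described above can be sketched in base R with stats::shapiro.test. The helpers is_normal and try_transformations are hypothetical names used only for illustration, and the decision rule (treat a trait as normal when the p-value is at least alpha) is the usual reading of the hypotheses above. How MetaPipe chooses among several transformations that pass the test is not covered here.

```r
# Sketch only: `is_normal` and `try_transformations` are hypothetical helpers,
# not MetaPipe functions.
is_normal <- function(x, alpha = 0.05) {
  # Fail to reject H_0 when the Shapiro-Wilk p-value is at least alpha
  stats::shapiro.test(x)$p.value >= alpha
}

try_transformations <- function(x, transf_vals = c(2, exp(1), 3), alpha = 0.05) {
  results <- list()
  for (a in transf_vals) {
    candidates <- list(log   = log(x, base = a),  # log_a(x)
                       power = x^a,               # x^a
                       root  = x^(1 / a))         # x^(1/a)
    for (name in names(candidates)) {
      results[[length(results) + 1]] <-
        data.frame(transf = name, a = a,
                   normal = is_normal(candidates[[name]], alpha))
    }
  }
  do.call(rbind, results)
}

set.seed(42)
skewed_trait <- exp(rnorm(50))  # log-normal sample, strictly positive

is_normal(skewed_trait)                   # assessment of the raw trait
res <- try_transformations(skewed_trait)  # one row per (function, a) pair
res
```

The logarithmic and root transformations assume strictly positive values, so a real trait containing zeros or negative values would need to be handled before applying them.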

Value

List of data frames for the normal (norm) and skewed (skew) traits.

Examples


# Toy dataset
example_data <- data.frame(ID = c(1,2,3,4,5), 
                           P1 = c("one", "two", "three", "four", "five"), 
                           T1 = rnorm(5), 
                           T2 = rnorm(5))
example_data_normalised <- MetaPipe::assess_normality(example_data, c(1, 2))
example_data_norm <- example_data_normalised$norm
example_data_skew <- example_data_normalised$skew

# Normal traits
knitr::kable(example_data_norm)

# Skewed traits (empty)
# knitr::kable(example_data_skew)


# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics,
                                          excluded_columns = c(1, 2),
                                          replace_na = TRUE)
ionomics_normalised <-
  MetaPipe::assess_normality(ionomics_rev,
                             excluded_columns = c(1, 2),
                             out_prefix = "ionomics",
                             transf_vals = c(2, exp(1)))

ionomics_norm <- ionomics_normalised$norm
ionomics_skew <- ionomics_normalised$skew

# Normal traits
knitr::kable(ionomics_norm[1:5, ])

# Skewed traits (partial output)
knitr::kable(ionomics_skew[1:5, 1:8])

# Clean up example outputs
MetaPipe:::tidy_up(c("HIST_", "ionomics_", "metapipe_"))


villegar/MetaPipe documentation built on Nov. 22, 2022, 10:44 p.m.