knitr::opts_chunk$set( collapse = TRUE, comment = "#>"#, #fig.path = "man/figures/assess-normality-"#, # out.width = "100%" ) options(knitr.kable.NA = '') hcolor <- "#363B74" tcolor <- "#EEEEEE" knitr::opts_chunk$set(dpi = 300, fig.width = 7) `%>%` <- dplyr::`%>%`
library(MetaPipe)
MetaPipe
assesses the normality of variables (traits) by performing a
Shapiro-Wilk test on the raw data (see Load Raw Data and
Replace Missing Data). Based on whether or not the
data approximates a normal distribution, an array of transformations will be
computed, and the normality assessed one more time.
The diagram below shows the tree of transformations that can be performed, the
user can specify the transformation values passing a vector with the argument
transf_vals
to the function assess_normality
; by
default, [2, e, 3, 4, 5, 6, 7, 8, 9, 10]
.
DiagrammeR::grViz(system.file("figures/assess-normality-graph.gv", package = "MetaPipe"), height = "200px", width = "100%")
The function call is as follows:
assess_normality(raw_data = raw_data, excluded_columns = c(2, 3, ..., M), # Optional cpus = 1, out_prefix = "metapipe", plots_dir = tempdir(), transf_vals = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10), alpha = 0.05, pareto_scaling = FALSE, show_stats = TRUE)
where raw_data
is a data frame containing the raw data, as described in
Load Raw Data and excluded_columns
is a vector containing
the indices of the properties, e.g. c(2, 3, ..., M)
. The other arguments are
optional, cpus
is the number of cores to use, in other words, the number of
concurrent traits to process, out_prefix
is the prefix for output files,
plots_dir
is the output directory where the plots will be stored, and
transf_vals
is a vector containing the transformation values to be used when
transforming the original data.
The following histogram shows a sample data obtained from a normal distribution
with the command rnorm
, but it was transformed using the power (base 2
)
function; thus, the data seems to be skewed:
set.seed(123) test_data <- rnorm(500) MetaPipe:::generate_hist(2^test_data, NULL, xlab = latex2exp::TeX("2^{data}"), save = FALSE, alpha = 1, fill = hcolor)
Using MetaPipe
we can find an optimal transformation that "normalises" this
data set:
example_data <- data.frame(ID = 1:500, T1 = test_data, T2 = 2^test_data) normalised_data <- MetaPipe::assess_normality(example_data, c(1)) normalised_data_norm <- normalised_data$norm normalised_data_skew <- normalised_data$skew transformed_data <- read.csv("metapipe_raw_data_normalised_all.csv")
The output of this function is a long table of each trait, with the following format:
nrows <- 1 raw_df <- data.frame(A = rep(" ", nrows), B = rep(NA, nrows), C = rep(NA, nrows), D = rep(NA, nrows), E = rep(NA, nrows), G = rep(NA, nrows)) colnames(raw_df) <- c("index", "trait", "values", "flag", "transf", "transf_val") knitr::kable(raw_df, format = "html", escape = FALSE, table.attr = "style='width:100%;'", align = 'c') %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% kableExtra::row_spec(0, extra_css = "vertical-align: middle;", background = hcolor, color = tcolor) %>% kableExtra::column_spec(1:6, border_left = TRUE, border_right = TRUE)
where index
is a simple numeric value to uniquely identify each trait,
trait
is the trait/variable name, values
is the actual entry, flag
indicates whether the entry is parametric (Normal) or skewed (Non-normal),
transf
is the transformation function (empty for untransformed traits), and
transf_val
is the transformation value used.
From the previous example, the top 5 entries for the trait T1
are:
knitr::kable(head(transformed_data), format = "html", escape = FALSE, table.attr = "style='width:100%;'", align = 'c') %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% kableExtra::row_spec(0, extra_css = "vertical-align: middle;", background = hcolor, color = tcolor) %>% kableExtra::column_spec(1:6, border_left = TRUE, border_right = TRUE)
And for trait T2
:
tmp <- head(transformed_data[-c(1:500), ]) rownames(tmp) <- c(1:6) knitr::kable(tmp, format = "html", escape = FALSE, table.attr = "style='width:100%;'", align = 'c') %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% kableExtra::row_spec(0, extra_css = "vertical-align: middle;", background = hcolor, color = tcolor) %>% kableExtra::column_spec(1:6, border_left = TRUE, border_right = TRUE)
knitr::kable(head(transformed_data_norm), format = "html", escape = FALSE, table.attr = "style='width:100%;'", align = 'c') %>% kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% kableExtra::row_spec(0, extra_css = "vertical-align: middle;", background = hcolor, color = tcolor)
As expected both tables show the same entries; however, the latter indicates that
T2
was transformed using $\log_2$. The function will generate histograms for
all the traits, the naming convention used is:
HIST_[index]_[transf]_[transf_val]_[trait].png
for transformed traitsHIST_[index]_NORM_[trait].png
for those that were not transformed. For the previous data set HIST_1_NORM_T1.png
and HIST_2_LOG_2_T2.png
:
MetaPipe:::generate_hist(test_data, NULL, xlab = latex2exp::TeX("data"), save = FALSE, alpha = 1, fill = hcolor) MetaPipe:::generate_hist(log(2^test_data), NULL, xlab = latex2exp::TeX("log_2(2^{data})"), save = FALSE, alpha = 1, fill = hcolor)
filenames <- c(list.files(".", "metapipe_*")) output <- lapply(filenames, file.remove)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.