library(tidyverse)
library(knitr)
library(kableExtra)
knitr::opts_chunk$set(echo = FALSE, fig.align = "center")
knitr::opts_chunk$set(fig.pos = 'H')
pagebreak <- function() {
  if(knitr::is_latex_output())
    return("\\newpage")
  else
    return('<div style="page-break-before: always;" />')
}

\newpage

Data set

Analysis report of the data set loaded from the file r params$analysis_parameter$fileName. The data set comprises r params$analysis_parameter$numberOfInstances instances that are characterized by r params$analysis_parameter$numberOfFeatures features, from which r params$analysis_parameter$numberOfMetaFeatures are defined as meta-features. The features are further split into r params$analysis_parameter$numberOfNumericFeatures numeric features, and r params$analysis_parameter$numberOfNonNumericFeatures non numeric features. The data set is missing r params$analysis_parameter$totalNumberOfMissings values. The analysis was prepared on r format(Sys.time(), '%d %B, %Y').

\newpage

Filter {#test-label}

The data set was filtered using the parameters described in Table \ref{tab:filter}.

params$filter_parameter %>%
  knitr::kable(format="latex", caption = "User defined filter set.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

The following features were selected:

cat(paste("- ", params$selected_features), sep = "\n")

\newpage

Limits of quantification

Missing values have been indicated as r params$analysis_parameter$loq_na_handling. The LLOQ substitute was defined as r params$analysis_parameter$lloq_substitute. The ULOQ substitute was defined as r params$analysis_parameter$uloq_substitute. The statistics of the LOQ detection procedure are listed in table \ref{tab:detect-loq}.

params$loq_statistics %>%
  knitr::kable(format="latex", caption = "Result of the LOQ detection analysis.")%>%
  kableExtra::kable_styling(latex_options = c("scale_down","HOLD_position"))

\newpage

Data transformation

The transformation procedure was performed using the parameters listed in table \ref{tab:trafo-para}.

params$trafo_parameter %>%
  knitr::kable(format="latex", caption = "User defined transformation parameter.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

Modeling

The transformed data are fitted to the model of a normal distribution (see eq. \ref{eq:normal}). \begin{align} p\left(x | \mu, \sigma \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\left(\frac{\left(x-\mu \right)^2}{2\sigma^{2}} \right)} \label{eq:normal} \end{align} Here, $\mu$ defines the expectation value, and $\sigma$ defines the standard deviation of the model. The likelihood $p\left(x | \mu, \sigma \right)$ was optimized on the transformed data regarding to the model parameters $\mu$ and $\sigma$ in order to determine the model parameters $\hat{\mu}$ and $\hat{\sigma}$ that maximize the likelihood $\hat{L} = p\left(x | \hat{\mu}, \hat{\sigma} \right)$.

The determined model parameters for the transformed data are listed in table \ref{tab:model-para}.

params$model_parameter %>%
  knitr::kable(format="latex", caption = "Model parameter of transformed feature distributions.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

Quality ensurance of modeling procedure

The model quality is estimated on the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and the Root Mean Squared Error (RMSE). The BIC is calculated as depicted in equation \ref{eq:bic}. \begin{align} \text{BIC} &= k\ln(n)-2\ln(\hat{L}) \label{eq:bic} \end{align} with: \begin{itemize} \item $k :=$ the number of parameters estimated by the model. \item $n :=$ the number of instances in the feature. \item $\hat{L} :=$ the maximum likelihood function of the model. \end{itemize}

The AIC is calculated as depicted in equation \ref{eq:aic}. \begin{align} \text{AIC} &= 2k-2\ln(\hat{L}) \label{eq:aic} \end{align} with: \begin{itemize} \item $k :=$ the number of parameters estimated by the model. \item $\hat{L} :=$ the maximum likelihood function of the model. \end{itemize}

The RSME is calculated as depicted in equation \ref{eq:rmse}. \begin{align} \text{RMSE} &= \sqrt{\sum^{n}{i}\frac{\left(\hat{x}{i} - x_{i} \right)^{2}}{n}} \label{eq:rmse} \end{align} with: \begin{itemize} \item $n :=$ the number of instances in the feature. \item $i :=$ running parameter. \item $x_{i} :=$ observed values. \item $\hat{x}_{i} :=$ predicted vales. \end{itemize}

The model quality is listed in table \ref{tab:model-qual}.

params$model_quality %>%
  knitr::kable(format="latex", caption = "Model quality.")%>%
  kableExtra::kable_styling(latex_options = c("scale_down","HOLD_position"))

Statistics of modeling procedure

The statistics of the modeling procedure are listed in table \ref{tab:model-stats}.

params$model_statistics %>%
  knitr::kable(format="latex", caption = "Statistics of modeling procedure.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

Rescaling

The selected rescaling method was r params$analysis_parameter$normalization_type. The choices for rescaling are min-max scaling, mean scaling, sigmoidal scaling, and a z transformation. The mathematical implementation of min-max scaling is described in equation \ref{eq:min-max-scaling}. Min-max scaling \begin{align} x' &= \frac{x - \text{min}\left( x\right)}{\text{max}\left( x\right) - \text{min}\left( x\right)} \label{eq:min-max-scaling} \end{align} with: \begin{itemize} \item $x :=$ observed values. \item $\text{max}(x) :=$ maximum of $x$. \item $\text{min}(x) :=$ minimum of $x$. \end{itemize}

The mathematical implementation of mean scaling is described in equation \ref{eq:mean-scaling}. \begin{align} x' &= \frac{x - \overline{x}}{\text{max}\left( x\right) - \text{min}\left( x\right)} \label{eq:mean-scaling} \end{align} with: \begin{itemize} \item $x :=$ observed values. \item $\overline{x} :=$ average of $x$. \item $\text{max}(x) :=$ maximum of $x$. \item $\text{min}(x) :=$ minimum of $x$. \end{itemize}

The mathematical implementation of sigmoidal scaling is described in equation \ref{eq:sigmoidal-scaling}. \begin{align} x' &= \frac{1}{1 + e^{-x}} \label{eq:sigmoidal-scaling} \end{align} with: \begin{itemize} \item $x :=$ observed values. \end{itemize}

The mathematical implementation of the z-transformation is described in equation \ref{eq:z-transformation}. \begin{align} x' &= \frac{x - \overline{x}}{\sigma} \label{eq:z-transformation} \end{align} with: \begin{itemize} \item $x :=$ observed values. \item $\overline{x} :=$ average of $x$. \item $\sigma :=$ standard deviation of $x$. \end{itemize}

\newpage

Imputation

The imputation of anomalies first requires their identification. Two types of anomalies are distinguished: Missing values and outliers. After their identification, outliers are replaced by missing values. Subsequently, the entire set of missing values is imputed.

\newpage

Missings

The appearance of missing data per feature is listed in table \ref{tab:missing-stats}.

params$missings_statistics %>%
  knitr::kable(format="latex", caption = "Statistics of missing values per feature.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

Recurring patterns formed by the occurrence of missing values in the instances examined are listed in table \ref{tab:missing-dist}.

params$missings_distribution %>%
  knitr::kable(format="latex", caption = "Occurrence of missing values per instance.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

Outliers

The user-defined method for outliers detection was r params$analysis_parameter$anomalies_method. Grubbs test parameters were set to $\alpha =$ r params$analysis_parameter$alpha. DBSCAN parameters were set to $\epsilon =$ r params$analysis_parameter$epsilon and minSamples $=$ r params$analysis_parameter$minSamples. SVM parameters were set to $\gamma =$ r params$analysis_parameter$gamma and $\nu =$ r params$analysis_parameter$nu. SVM parameters were set to cutoff $=$ r params$analysis_parameter$cutoff and k $=$ r params$analysis_parameter$k. The seed was set to r params$analysis_parameter$anomalies_seed.

\newpage The appearance of outliers per feature is listed in table \ref{tab:outliers-stats}.

params$outliers_statistics %>%
  knitr::kable(format="latex", caption = "Statistics of outliers per feature.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

Mutate

The user-defined method for imputation site mutation was r params$analysis_parameter$imputation_method. The following analysis parameters were set by the user: \begin{itemize} \item $\text{number of neighbors} = $ r params$analysis_parameter$number_of_neighbors. \item $\text{fraction of predictors} = $ r params$analysis_parameter$fraction_of_predictors. \item $\text{outflux threshold} = $ r params$analysis_parameter$outflux_threshold. \item $\text{imputation seed} = $ r params$analysis_parameter$imputation_seed. \item $\text{Iterations} = $ r params$analysis_parameter$iterations. \end{itemize}

\newpage The appearance of imputation sites per feature is listed in table \ref{tab:imputation-stats}.

params$imputation_statistics %>%
  knitr::kable(format="latex", caption = "Statistics of missing values per feature.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage Recurring patterns formed by the occurrence of missing values in the instances examined are listed in table \ref{tab:imputation-dist}.

params$imputation_distribution %>%
  knitr::kable(format="latex", caption = "Occurrence of missing values per instance.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage

Validation

The resulting data set was compared with the original data set. The distribution of the two data sets was compared using statistical hypothesis testing. The test results are summarized in Table \ref{tab:validation-test}.

params$validation_test %>%
  knitr::kable(format="latex", caption = "Validation via hypothesis tests.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage In a second test procedure, the correlation of all transformed and scaled features with each other was calculated for both data sets. The deviations of the correlation coefficients scattered around a mean value of $\mu$ with a standard deviation of $\sigma$. The test results are summarized in Table \ref{tab:validation-correlation}.

params$validation_correlation %>%
  knitr::kable(format="latex", caption = "Validation via correlation analysis.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")

\newpage The distribution of the deviation of the two distributions correlation coefficients $\Delta(\text{corr})$ was analyzed and tested against the Null-Hypothesis of $\mu = 0$. using a two-sided one-sample t-test. The Analysis results are listed in table \ref{tab:validation-corrSum}.

params$validation_corrSum %>%
  knitr::kable(format="latex", caption = "Summary of the $\\Delta$(corr) distribution.")%>%
  kableExtra::kable_styling(latex_options = "HOLD_position")


Try the pguIMP package in your browser

Any scripts or data that you put into this service are public.

pguIMP documentation built on Sept. 30, 2021, 5:08 p.m.