knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette provides an overview of the different statistical tests provided in the package. The first section discusses univariate tests, which are repeated for each feature.
Unless otherwise stated, all functions return separate data frames or other objects with the results. These can then be added to the results stored in the object using join_results(object, results).
These functions provide summary statistics and effect sizes for all features:
summary_statistics
cohens_d
fold_change
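As a rough illustration of the quantities these functions report, the following base R sketch computes Cohen's d and a fold change for a single variable. It assumes the textbook pooled-standard-deviation definition of Cohen's d and a simple ratio of group means, and uses the built-in mtcars data with mpg standing in for one feature and am for the study group. The exact definitions used by cohens_d and fold_change may differ, so check their documentation.

```r
# Hypothetical stand-ins: mpg plays the role of one feature, am the study group
x <- mtcars$mpg[mtcars$am == 1]  # group 1 (manual transmission)
y <- mtcars$mpg[mtcars$am == 0]  # group 2 (automatic transmission)

# Pooled standard deviation of the two groups
pooled_sd <- sqrt(
  ((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
    (length(x) + length(y) - 2)
)

# Cohen's d: standardized difference between the group means
d <- (mean(x) - mean(y)) / pooled_sd

# Fold change: ratio of the group means
fc <- mean(x) / mean(y)

c(cohens_d = d, fold_change = fc)
```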
These functions perform univariate hypothesis tests for each feature, report relevant statistics and correct the p-values using FDR correction. For features where the model fails for some reason, all statistics are recorded as NA. NOTE: setting all_features = FALSE does not prevent the tests from being run on flagged features. It only affects p-value correction: flagged features are left out of the correction and thus do not get an FDR-corrected p-value. To prevent the testing of flagged features altogether, use drop_flagged before the tests.
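The effect of leaving flagged features out of the correction can be sketched in base R. This is only an illustration, not the package's own code: the p-values and the flagged vector are made up, and the Benjamini-Hochberg method is assumed as the FDR correction.

```r
# Made-up raw p-values for four features; the second feature is flagged
p_values <- c(0.001, 0.02, 0.04, 0.30)
flagged <- c(FALSE, TRUE, FALSE, FALSE)

# All four features were tested, but only unflagged ones enter the correction
p_fdr <- rep(NA_real_, length(p_values))
p_fdr[!flagged] <- p.adjust(p_values[!flagged], method = "BH")

data.frame(P = p_values, Flagged = flagged, P_FDR = p_fdr)
```

The flagged feature keeps its raw p-value but has NA as its corrected p-value, and the correction for the remaining features is based on three tests instead of four.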
Many R functions for statistical tests use a so-called formula interface. For example, the lm function, which fits linear models, uses the formula interface. To predict fuel consumption (mpg, miles per gallon) from car weight (wt) in the built-in mtcars dataset, we would run:
lm(mpg ~ wt, data = mtcars)
Many of the univariate statistical test functions in this package use the formula interface, where the formula is provided as a character string, with one special condition: the word "Feature" is replaced at each iteration by the corresponding feature name. So, for example, when testing if any of the features predict the difference between study groups, the formula would be "Group ~ Feature". Or, when testing if group and time point affect metabolite levels, the formula could be "Feature ~ Group + Time + Group:Time", with the last term being an interaction term ("Feature ~ Group * Time" is equivalent).
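The placeholder mechanism can be sketched with base R alone. The example below is a hypothetical illustration, not the package's internal code: it treats three columns of mtcars as "features", substitutes each name into the formula string, fits lm and collects the results. The column names of the result (Feature_ID, Estimate, P, P_FDR) are chosen for this sketch only.

```r
# Formula given as a character string, with "Feature" as the placeholder
formula_char <- "Feature ~ wt"
feature_names <- c("mpg", "qsec", "hp")  # columns treated as features here

results <- lapply(feature_names, function(feature) {
  # Replace the placeholder with the current feature name
  feature_formula <- as.formula(gsub("Feature", feature, formula_char))
  fit <- lm(feature_formula, data = mtcars)
  coefs <- summary(fit)$coefficients
  data.frame(
    Feature_ID = feature,
    Estimate = coefs["wt", "Estimate"],
    P = coefs["wt", "Pr(>|t|)"]
  )
})
results <- do.call(rbind, results)

# Correct the p-values across features (Benjamini-Hochberg FDR)
results$P_FDR <- p.adjust(results$P, method = "BH")
results
```

Keeping the formula as a string is what makes the substitution possible: the same template can be reused for every feature before it is converted to a formula object.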
Now that we know what the formula interface looks like, let's list the univariate statistical functions available:
perform_lm
perform_lmer (uses the lmer function from the lme4 package, with the lmerTest package for p-values)
perform_homoscedasticity_tests
perform_kruskal_wallis
perform_oneway_anova
perform_t_test
Some functions do not use the formula interface. They include:
perform_pairwise_t_test
perform_correlation_tests
perform_auc
Model diagnostic visualizations are currently available for linear models and linear mixed models; see the documentation of save_lm_diagnostic_plots.
fit_rf fits a random forest predicting a column in the sample information (pData(object)) by the features
importance_rf extracts the feature importance in random forest prediction in a nice format
Not yet implemented, but coming soon!