
Run Biomarker

To identify biomarkers for a specific binary classification problem, users need to specify the taxonomy level and the target variable. Under Advanced Options, users can also set the number of CV repeats, the number of CV folds, and the top biomarker proportion. For example, with 3-repeat, 3-fold cross-validation, animalcules randomly splits the dataset into 3 folds and runs CV, then repeats this procedure 3 times, using a different random split each time. The top biomarker proportion sets the threshold for selecting biomarkers: animalcules computes a model-based importance score for each microbe/feature and keeps the top-scoring fraction of features as biomarkers (0.2 by default, i.e. the top 20%).
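The repeated k-fold procedure described above can be sketched in base R as follows. This is a minimal illustration, not the animalcules implementation: `make_repeated_folds()` is a hypothetical helper, and the split is not stratified by class (with imbalanced classes, a stratified split keeps class proportions similar across folds).

```r
# Hypothetical helper (not part of animalcules): assign every sample a fold
# label, once per repeat, using a fresh random permutation each time.
make_repeated_folds <- function(n, nfolds = 3, nrepeats = 3, seed = 99) {
  set.seed(seed)
  lapply(seq_len(nrepeats), function(r) {
    # Balanced labels 1..nfolds, shuffled so each repeat splits differently
    sample(rep(seq_len(nfolds), length.out = n))
  })
}

folds <- make_repeated_folds(n = 12, nfolds = 3, nrepeats = 3)

n_splits <- 0
for (r in seq_along(folds)) {
  for (k in seq_len(3)) {
    test_idx  <- which(folds[[r]] == k)  # held-out samples for this split
    train_idx <- which(folds[[r]] != k)  # samples used to train the model
    # A model would be fit on train_idx and evaluated on test_idx here
    n_splits <- n_splits + 1
  }
}
n_splits  # 9 train/test splits in total (3 repeats x 3 folds)
```

Each sample is held out exactly once per repeat, so a 3-repeat, 3-fold scheme yields nine train/test splits.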

Users can also choose the binary classification model, with options including logistic regression and random forest. After clicking the "Run" button, the biomarker list appears on the right-hand side.
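To illustrate how a logistic regression model can yield a per-feature importance score, here is a base R sketch on synthetic data. The taxon names and data are made up, and using the absolute z statistic as the importance score is an assumption chosen for illustration; animalcules may compute importance differently, and a random forest would instead use a tree-based importance measure.

```r
# Synthetic abundances for 4 hypothetical taxa across 60 samples
set.seed(1)
n <- 60
x <- data.frame(taxonA = rnorm(n), taxonB = rnorm(n),
                taxonC = rnorm(n), taxonD = rnorm(n))

# Only taxonA and taxonB influence the synthetic 0/1 label
eta <- 2 * x$taxonA - 1.5 * x$taxonB
dat <- data.frame(x, y = rbinom(n, 1, plogis(eta)))

# Fit a binary logistic regression on all taxa
fit <- glm(y ~ ., data = dat, family = binomial)

# Importance (assumed here): |z value| of each coefficient, intercept dropped
coefs <- summary(fit)$coefficients
importance <- abs(coefs[-1, "z value"])
sort(importance, decreasing = TRUE)
```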


Importance Plot

This tab shows a plot of the ranked feature importance scores for the identified biomarkers. The higher the score, the more a feature (species, genus, etc.) contributes to the model's predictive power.
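Ranking features by importance and keeping the top proportion can be sketched as below. The scores are illustrative values, not results, and `select_top_biomarkers()` is a hypothetical helper; rounding up so that at least one feature is kept is an assumption, since the exact cutoff rule animalcules uses is not stated here.

```r
# Illustrative importance scores for six taxa (made-up values)
importance <- c(Bacteroides = 0.42, Prevotella = 0.15, Faecalibacterium = 0.88,
                Ruminococcus = 0.05, Bifidobacterium = 0.61, Escherichia = 0.27)

# Hypothetical helper: rank by score and keep the top `percent_top` fraction
select_top_biomarkers <- function(importance, percent_top = 0.2) {
  ranked <- sort(importance, decreasing = TRUE)
  # round() guards against floating-point error before rounding up;
  # max() ensures at least one feature is returned
  n_keep <- max(1, ceiling(round(percent_top * length(ranked), 6)))
  ranked[seq_len(n_keep)]
}

top <- select_top_biomarkers(importance, 0.2)
names(top)  # "Faecalibacterium" "Bifidobacterium"
```

With six features and the default 0.2, the cutoff rounds 1.2 up to 2 features. Calling base R's `barplot()` on the sorted vector draws a ranked chart similar in spirit to the importance plot.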

CV ROC Plot

The identified biomarkers are used to retrain the model with cross-validation, and the resulting ROC plot is shown automatically in this tab.
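As a rough guide to what the ROC plot summarizes, the sketch below computes ROC points and the AUC from predicted scores and 0/1 labels in base R. It assumes out-of-fold predicted probabilities pooled across CV splits; the scores and labels are toy values, and both functions are hypothetical helpers rather than animalcules code.

```r
# ROC points: true/false positive rates at each distinct score threshold
roc_curve <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))  # start the curve at (0, 0)
}

# AUC: probability that a random positive outscores a random negative
# (ties count as one half)
auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Toy out-of-fold predicted probabilities and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    0,   1,   0)

roc <- roc_curve(scores, labels)
auc(scores, labels)  # 0.75
```

The pairwise AUC equals the trapezoidal area under the ROC points, so either form can be reported alongside the curve.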



compbiomed/animalcules documentation built on Feb. 7, 2024, 12:13 p.m.