Predomics - Interpretable machine learning for omics data

DALL-E3's view of PredOmics

The predomics package provides an original machine learning framework that implements several heuristics for discovering sparse and interpretable models in large datasets. These models are efficient and suited to classification and regression tasks in metagenomics and in other datasets with commensurable variables. We introduce the custom BTR (BIN, TER, RATIO) languages, which describe different types of associations between variables. The same framework also implements several state-of-the-art (SOTA) methods, including RF, ENET and SVM. The predomics package started in 2015 and has evolved quickly since. A major improvement came in 2023. The package also comes with predomicsapp, an R Shiny application for easy training and exploration of results.

Badges

R Package License: GPL v3 GitHub issues GitHub forks GitHub stars

Table of Contents

  1. Installation
  2. Usage
  3. Features
  4. Screenshots
  5. Technologies
  6. License
  7. Authors
  8. FAQs

Overview

We introduce here the predomics package, which is designed to search for simple and interpretable predictive models in omics data, and more specifically in metagenomics. These models, called BTR (for Bin/Ter/Ratio), are based on a novel family of languages designed to represent interactions in microbial ecosystems. The package also proposes four different optimization heuristics that make it possible to discover some of the best predictive models. A model in predomics is a set of indexes from the dataset (i.e. variables), their respective coefficients from the ternary set {-1, 0, 1}, and an intercept, which together give a decision rule of the form (A + B + C - K - L - M < intercept). The number of variables in a model, also known as model size, sparsity or parsimony, can vary within a range provided as a parameter to the experiment.
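To make the model form concrete, here is a minimal base-R sketch of a ternary rule applied to a toy table. It does not use the predomics API: the data, the `model` list layout and the `score_model()` helper are all hypothetical names chosen for illustration.

```r
# Toy abundance table (hypothetical data): rows are samples, columns are variables.
X <- matrix(
  c(0.10, 0.20, 0.05, 0.30,
    0.40, 0.10, 0.20, 0.05,
    0.05, 0.05, 0.60, 0.10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), c("A", "B", "K", "L"))
)

# Hypothetical model layout: variable indexes, ternary coefficients, intercept.
model <- list(
  indexes      = c(1L, 2L, 3L, 4L),
  coefficients = c(1, 1, -1, -1),
  intercept    = 0
)

# Score of each sample: sum of the +1 variables minus sum of the -1 variables.
score_model <- function(model, X) {
  stopifnot(all(model$coefficients %in% c(-1, 0, 1)))
  as.vector(X[, model$indexes, drop = FALSE] %*% model$coefficients)
}

scores <- score_model(model, X)

# Decision rule from the text, here A + B - K - L < intercept.
below <- scores < model$intercept

# Model size (sparsity): number of variables with a non-zero coefficient.
sparsity <- sum(model$coefficients != 0)
```

Because every coefficient is -1, 0 or 1, the rule can be read directly as "the summed abundance of one group of variables compared with that of another", which is what keeps such models interpretable.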

In predomics we have implemented the following types of objects:

All these objects can be quickly viewed with the printy() function. Other functions convert one object type into another, for instance modelCollectionToPopulation(). An experiment can be explored using the digest() routine, along with many other functions among the more than 18K lines of code that make up this package.

Heuristics

In this package we have proposed four different heuristics to search for the best predictive models.

Predomics languages

A predomics model is coded in R as an S3 object with a number of attributes, including the learner (algorithm) that generated it and the language it uses. The languages proposed in the current version are the following.
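As an illustration of what "an S3 object carrying its learner and language" can look like, the following base-R sketch defines a toy class. It is not the package's actual model structure: the class name `toy_model`, the constructor `new_toy_model()`, the field names and the learner label are all assumptions. Only the language labels BIN, TER and RATIO come from the text above.

```r
# Hypothetical constructor for a toy S3 model (not the predomics layout).
new_toy_model <- function(indexes, coefficients, intercept,
                          learner, language = c("BIN", "TER", "RATIO")) {
  language <- match.arg(language)  # rejects labels outside the three above
  stopifnot(length(indexes) == length(coefficients))
  structure(
    list(indexes = indexes, coefficients = coefficients,
         intercept = intercept, learner = learner, language = language),
    class = "toy_model"
  )
}

# One-line summary of the toy model.
format.toy_model <- function(x, ...) {
  sprintf("<toy_model> learner=%s language=%s size=%d",
          x$learner, x$language, length(x$indexes))
}

# S3 print method: dispatches automatically on the class attribute.
print.toy_model <- function(x, ...) {
  cat(format(x), "\n", sep = "")
  invisible(x)
}

m <- new_toy_model(c(3L, 17L, 42L), c(1, -1, 1), intercept = 0,
                   learner = "some_heuristic", language = "TER")
print(m)
```

Storing the learner and language on the object itself means that any function receiving a model can tell how it was produced and how its coefficients should be interpreted, without needing the original experiment settings.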

Contact

If you have any questions or feedback, please contact us at:



predomics/predomicspkg documentation built on Dec. 11, 2024, 11:06 a.m.