README.md

SIAMCAT - Statistical Inference of Associations between Microbial Communities And host phenoType

Overview

SIAMCAT is a pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes. A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, and statistical modeling (LASSO logistic regression), including tools for evaluating and interpreting these models (such as cross-validation, parameter selection, ROC analysis, and diagnostic model plots). SIAMCAT is available in three different flavors:

+ Galaxy web server
+ command line tool
+ R package

Please see the Support section below if you run into problems when using SIAMCAT.

Input data format

The input data should be organized in the same way for every version of SIAMCAT. All files are in tab-separated column format.
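The exact layout of each input file is not described here, so the following is only a generic sanity check under that caveat, not a SIAMCAT tool: it verifies that every line of a tab-separated file has the same number of fields as the first line, which catches ragged rows and lines where spaces were used instead of tabs. The helper name check_tsv is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SIAMCAT): check that every line of a
# tab-separated file has as many fields as its first line.
# Prints one message per offending line and returns non-zero if any are found.
check_tsv() {
  awk -F'\t' '
    NR == 1 { n = NF; next }          # remember the column count of line 1
    NF != n {
      printf "%s:%d: %d field(s), expected %d\n", FILENAME, FNR, NF, n
      bad = 1
    }
    END { exit bad }                  # exit status 1 if any line was ragged
  ' "$1"
}
```

For example, `check_tsv features.tsv && echo "rectangular"` (the file name is only a placeholder) prints nothing but "rectangular" for a well-formed file.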

Galaxy interface

The Galaxy interface can be found here: http://siamcat.embl.de/

Galaxy in brief

Additional info: https://usegalaxy.org/ (in particular the Help menu) and https://wiki.galaxyproject.org/Learn

Getting started with Galaxy

Start by uploading your data (see Input data format above) using the DATA IMPORT / Import Data module / Upload File.

Then proceed by executing all SIAMCAT modules in order (from A to I). See the example history / workflow, as well as each module's description, for specific information on input and output data.

Command-line version

The command-line version is a collection of modules implemented in R that are called via a bash script.

# in the folder in which you'd like to clone the siamcat repository, type:
git clone beta:/g/bork4/zeller/dev/siamcat

R packages required to run SIAMCAT:

install.packages('optparse')
install.packages('LiblineaR')
install.packages('pROC')
install.packages('colorRamps')
install.packages('RColorBrewer')
install.packages('beanplot')
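When R is run from a script rather than an interactive session, install.packages typically cannot prompt for a CRAN mirror, so the mirror has to be passed explicitly. A minimal sketch, assuming Rscript is on the PATH and CRAN is reachable; the helper name install_siamcat_deps and the choice of mirror are illustrative, not part of SIAMCAT:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SIAMCAT): install the R packages listed
# above in a single non-interactive Rscript call.
SIAMCAT_R_DEPS=(optparse LiblineaR pROC colorRamps RColorBrewer beanplot)
CRAN_MIRROR="https://cloud.r-project.org"   # assumption: any CRAN mirror works

install_siamcat_deps() {
  local pkgs
  # Build an R character vector body: 'optparse', 'LiblineaR', ...
  pkgs=$(printf "'%s', " "${SIAMCAT_R_DEPS[@]}")
  pkgs=${pkgs%, }   # drop the trailing ", "
  # Passing repos avoids the interactive mirror prompt.
  Rscript -e "install.packages(c(${pkgs}), repos = '${CRAN_MIRROR}')"
}
```

Source the snippet and run `install_siamcat_deps`; it is equivalent to the six install.packages calls above with the mirror set.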

Using the command-line version

...COMING SOON...

R package

The SIAMCAT R package ...COMING SOON...

Using the R package

...COMING SOON...

Support

Google user group for support:

https://groups.google.com/d/forum/siamcat-users

Known issues

Examples are weighted differently depending on their class (a remnant of our colorectal cancer microbiome study). This is fixed in the Galaxy version, and the fix will be pushed to GitHub soon.

Class labels are somehow swapped in the LASSO module, so that prediction scores are 1 - p instead of p (the posterior probability). Consequently, precision-recall curves are incorrect, whereas ROC curves are unaffected. This appears to occur only with recent versions of R and/or the LiblineaR package and will be fixed with high priority.
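A general property that is consistent with this behavior (our reading, not a diagnosis of the SIAMCAT code): ROC AUC does not change if you exchange which class counts as positive and replace each score p by 1 - p, whereas average precision, a summary of the precision-recall curve, does change. The toy shell/awk sketch below checks this on five made-up samples; the functions auc, swap and avg_precision are hypothetical helpers, not part of SIAMCAT.

```shell
#!/usr/bin/env bash
# Toy illustration (not SIAMCAT code). Input lines are "label score",
# with label 1 = positive class and score = predicted P(positive).

# ROC AUC via the Mann-Whitney statistic: fraction of (positive, negative)
# pairs in which the positive sample scores higher (ties count 1/2).
auc() {
  awk '{ y[NR] = $1 + 0; s[NR] = $2 + 0 }
       END {
         for (i = 1; i <= NR; i++) if (y[i] == 1)
           for (j = 1; j <= NR; j++) if (y[j] == 0) {
             n++
             if (s[i] > s[j]) { w += 1 } else if (s[i] == s[j]) { w += 0.5 }
           }
         printf "%.4f\n", w / n
       }'
}

# Make the other class the positive one and report its probability, 1 - p.
swap() {
  awk '{ printf "%d %.10g\n", 1 - $1, 1 - $2 }'
}

# Average precision: rank by score (highest first) and average the
# precision observed at the rank of each positive sample.
avg_precision() {
  LC_ALL=C sort -k2,2nr |
    awk '$1 == 1 { tp++; sum += tp / NR } END { printf "%.4f\n", sum / tp }'
}

# Made-up example: three positives, two negatives.
data=$'1 0.9\n0 0.8\n1 0.7\n1 0.6\n0 0.3'
echo "AUC: $(auc <<< "$data") vs $(swap <<< "$data" | auc)"                  # AUC: 0.6667 vs 0.6667
echo "AP:  $(avg_precision <<< "$data") vs $(swap <<< "$data" | avg_precision)"  # AP:  0.8056 vs 0.7500
```

The AUC depends only on how the two classes are ordered relative to each other, which the joint flip preserves; precision is computed for one designated class and depends on its prevalence, so it is not symmetric.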

Contact

Please let me know if you run into any issues: zeller@embl.de



KonradZych/SIAMCAT documentation built on May 17, 2019, 6:20 p.m.