mds_paper_notes.md

Ideas and TODO

Manuscript outline:

Name: metagenboot: Visual Diagnostics for Metagenomic Datasets via Dirichlet-Multinomial Bootstrapping

Introduction

Jackknifed beta diversity (jackknifed_beta_diversity.py in QIIME): works by simply stacking the PCoA plots on top of one another and adding confidence ellipses. This maybe works because the jackknife removes one sample at a time, so the changes between plots are unlikely to be large.

Resampling models
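
A minimal sketch of what a single Dirichlet-multinomial resample of one sample's count vector could look like. This is an illustration under assumptions, not the metagenboot implementation: the function name, the symmetric pseudo-count `prior` and the choice to keep the original sequencing depth are all hypothetical. The idea is to draw taxon proportions from a Dirichlet centred on the observed counts and then draw new counts from a multinomial with those proportions.

```python
import random


def dirichlet_multinomial_resample(counts, prior=0.5, depth=None, rng=None):
    """Draw one resampled count vector (hypothetical helper, not package API).

    counts: observed counts per taxon for a single sample
    prior:  assumed symmetric pseudo-count added to every taxon
    depth:  number of reads to draw (defaults to the observed depth)
    """
    rng = rng or random.Random()
    depth = sum(counts) if depth is None else depth

    # Dirichlet(counts + prior) draw via normalised Gamma variates.
    gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(gammas)
    probs = [g / total for g in gammas]

    # Multinomial(depth, probs) draw: assign each read to a taxon.
    resampled = [0] * len(counts)
    for taxon in rng.choices(range(len(counts)), weights=probs, k=depth):
        resampled[taxon] += 1
    return resampled


observed = [120, 30, 0, 5, 845]  # made-up counts for illustration
replicates = [
    dirichlet_multinomial_resample(observed, rng=random.Random(seed))
    for seed in range(3)
]
for rep in replicates:
    print(rep)
```

A larger `prior` pulls the resampled proportions towards uniform (more noise); a smaller one keeps them closer to the observed proportions.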

Evaluating reliability of dimensionality reduction techniques

Dunthorn 17: We see that the two clusters for Panama are probably not real

Confidence ellipses (as used in QIIME) are likely misleading, as the clouds of model-based resamples (or even rarefied samples) look nothing like an ellipse.

Why not run one big MDS over all samples, originals and resamples together? This has two problems. The first is practical: NMDS (at least the vegan implementation) tends to have convergence issues on large datasets and takes a very long time to compute. The second is theoretical: I have artificially added similar points to the dataset. The more resamples I add, the more the NMDS optimization is rewarded for keeping the bootstrap samples close to their originals, even when this means representing the distances between the original points poorly. And this is what actually happens, at least to an extent: for example, when using the DESeq2 model to resample Tijana's data, NMDS run on the big matrix showed the original observations neatly grouped by the DESeq2 predictors, making the plot hugely different from what you see when you run NMDS on the originals alone.
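
A back-of-the-envelope illustration of the theoretical problem, assuming the stress function weights all pairwise dissimilarities equally: with n original samples and b resamples per original, the pairs between two originals make up a rapidly shrinking share of all pairs entering the optimization. The function name and the example numbers are made up for illustration.

```python
def original_pair_share(n_original, n_boot_per_sample):
    """Fraction of all pairwise distances that connect two original samples."""
    n_total = n_original * (1 + n_boot_per_sample)
    pairs_total = n_total * (n_total - 1) // 2
    pairs_original = n_original * (n_original - 1) // 2
    return pairs_original / pairs_total


# Example: 20 original samples with a growing number of resamples each.
for b in (0, 1, 10, 100):
    print(f"{b:>3} resamples per sample: "
          f"{original_pair_share(20, b):.5f} of pairs are original-original")
```

With even a handful of resamples per sample, almost all terms in the stress involve at least one resampled point, so the fit between the originals carries little weight.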

Falsifying approach: A reliable clustering should survive high-noise resampling. Homogeneity must still be visible after low-noise resampling.

Diagnosing Coverage of Biological Variability

Appendix



cas-bioinf/metagenboot documentation built on Feb. 25, 2021, 3:58 p.m.