
BEEM

BEEM is an approach to infer models for microbial community dynamics based on metagenomic sequencing data (16S or shotgun-metagenomics). It is based on the commonly used generalized Lotka-Volterra modelling (gLVM) framework. BEEM uses an iterative EM algorithm to simultaneously infer scaling factors (microbial biomass) and model parameters (microbial growth rate and interaction terms) from longitudinal data and can thus work directly with the relative abundance values that are obtained with metagenomic sequencing.
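To make the role of the scaling factor concrete, the sketch below simulates a two-species gLVM with a simple Euler scheme and then converts the absolute abundances into the relative abundances that sequencing would report. It is not BEEM code: the helper name `simulate_glv`, the parameter values and the step size are all illustrative assumptions.

```r
# Minimal gLVM sketch (illustrative only; not part of BEEM).
# Model: dx_i/dt = x_i * (mu_i + sum_j beta_ij * x_j)
simulate_glv <- function(x0, mu, beta, dt = 0.01, n_steps = 2000) {
  x <- matrix(NA_real_, nrow = n_steps + 1, ncol = length(x0))
  x[1, ] <- x0
  for (k in seq_len(n_steps)) {
    cur <- x[k, ]
    growth <- cur * (mu + as.vector(beta %*% cur))
    # Forward Euler step, clipped at zero so abundances stay non-negative
    x[k + 1, ] <- pmax(cur + dt * growth, 0)
  }
  x
}

mu   <- c(1.0, 0.8)                                    # hypothetical growth rates
beta <- matrix(c(-1.0, -0.2,
                 -0.3, -1.0), nrow = 2, byrow = TRUE)  # hypothetical interaction terms
abs_abund <- simulate_glv(x0 = c(0.1, 0.1), mu = mu, beta = beta)

biomass   <- rowSums(abs_abund)   # total biomass at each time point
rel_abund <- abs_abund / biomass  # each row now sums to 1
```

The division in the last line discards `biomass`, which is the per-sample scaling factor that BEEM aims to recover alongside `mu` and `beta`.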

Note: BEEM stands for Biomass Estimation and model inference with an Expectation Maximization algorithm. We have now extended the BEEM framework to be able to work with cross-sectional data (BEEM-static, check out our R package here).

Dependencies

BEEM was written in R (>=3.3.1) and requires the following packages:

- foreach
- doMC: this currently only works on macOS or Linux
- lokern
- pspline
- monomvn
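One way to check which of these packages are already installed is sketched below. It uses only base R; the variable names are illustrative, and the commented-out install line assumes the packages are available from CRAN for your platform.

```r
# Packages listed in the Dependencies section
beem_deps <- c("foreach", "doMC", "lokern", "pspline", "monomvn")

# requireNamespace() returns FALSE (rather than raising an error) when a package is missing
installed <- vapply(beem_deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- beem_deps[!installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # assumption: available from CRAN on your platform
}
```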

You can install BEEM as an R package using devtools:

devtools::install_github('csb5/beem')

Input data

The input files for BEEM should have the same format as described in the manual for MDSINE. The following two files are required by BEEM:

OTU table

This should be a tab-delimited text file whose first row has the sample IDs and the first column has the OTU IDs (or taxonomic annotations). Each row should then contain the relative abundance of one OTU across all samples and each column should contain the relative abundances of all OTUs in that sample.
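As an illustration of this layout, the sketch below writes a toy table to a temporary file and reads it back. The OTU and sample names are made up, and whether BEEM expects the top-left header cell to be empty or labelled is not stated here, so treat that detail as an assumption.

```r
# Toy relative-abundance table: rows are OTUs, columns are samples (hypothetical values)
otu <- matrix(c(0.6, 0.5, 0.4,
                0.3, 0.3, 0.4,
                0.1, 0.2, 0.2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("OTU1", "OTU2", "OTU3"),
                              c("S1", "S2", "S3")))

otu_file <- tempfile(fileext = ".txt")
# col.names = NA writes an empty top-left cell so the header aligns with the data columns
write.table(otu, otu_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back: sample IDs become column names, OTU IDs become row names
otu_read <- read.table(otu_file, header = TRUE, row.names = 1, sep = "\t")

# Each column holds one sample, so its relative abundances should sum to 1
col_totals <- colSums(otu_read)
```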

Metadata

The metadata file should be a tab-delimited text file with the following columns:

sampleID    isIncluded    subjectID    measurementID
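A minimal sketch of building such a file in R follows. The values are hypothetical, and the column meanings given in the comments are assumptions based on the column names; the MDSINE manual is the authoritative description.

```r
# Toy metadata with the four required columns (all values are hypothetical)
metadata <- data.frame(
  sampleID      = c("S1", "S2", "S3", "S4"),
  isIncluded    = c(1, 1, 1, 1),  # assumed: 1 = include the sample in the analysis
  subjectID     = c(1, 1, 2, 2),  # assumed: the subject (time series) a sample belongs to
  measurementID = c(0, 1, 0, 1)   # assumed: time point of the sample within its subject
)

# Check that the columns are present and in the expected order
required <- c("sampleID", "isIncluded", "subjectID", "measurementID")
stopifnot(identical(colnames(metadata), required))

# Write as tab-delimited text and read it back
meta_file <- tempfile(fileext = ".txt")
write.table(metadata, meta_file, sep = "\t", quote = FALSE, row.names = FALSE)
meta_read <- read.table(meta_file, header = TRUE, sep = "\t")
```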

Sample data

We have provided several sample input files that were also analyzed in our manuscript.

Data from Props et al. (2016)

Data from Gibbons et al. (2017)

Usage

Basic Usage (R commands)

## Load functions
library(beem)
## Read inputs
counts <- read.table('counts.txt', head=F, row.names=1)
metadata <- read.table('metadata.txt', head=T)
## Run BEEM
res <- EM(dat=counts, meta=metadata)
## Estimate parameters
biomass <- biomassFromEM(res)
write.table(biomass, 'biomass.txt', col.names=F, row.names=F, quote=F)
gLVparameters <- paramFromEM(res, counts, metadata)
write.table(gLVparameters, 'gLVparameters.txt', col.names=T, row.names=F, sep='\t' , quote=F)

Output format

The parameters estimated by BEEM are returned as an R data.frame (a table) with the following columns in order:

Analyses in the manuscript

The commands for reproducing the analyses reported in the manuscript are presented as Jupyter notebooks: (1) a notebook with a demo of the gLVM simulation, (2) a notebook for Props et al. and (3) a notebook for Gibbons et al.

Citation

C Li, K R Chng, J S Kwah, T V Av-Shalom, L Tucker-Kellogg & N Nagarajan. (2019). An expectation-maximization algorithm enables accurate ecological modeling using longitudinal metagenome sequencing data. Microbiome.

Contact

Please direct any questions or feedback to Chenhao Li (cli40@mgh.harvard.edu) and Niranjan Nagarajan (nagarajann@gis.a-star.edu.sg).



lch14forever/BEEM documentation built on April 5, 2025, 11:24 p.m.