In waldronlab/MicrobiomeBenchmarkData: Datasets for benchmarking in microbiome research

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Introduction

The MicrobiomeBenchmarkData package provides access to a collection of datasets with biological ground truth for benchmarking differential abundance methods. The datasets are deposited on Zenodo: https://doi.org/10.5281/zenodo.6911026

Installation

## Install BiocManager if it is not already installed
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Release version (not yet available in Bioconductor, so this call does not work yet)
BiocManager::install("MicrobiomeBenchmarkData")

## Development version
BiocManager::install("waldronlab/MicrobiomeBenchmarkData")

library(MicrobiomeBenchmarkData)
library(purrr)

Sample metadata

All sample metadata is merged into a single data frame and provided as a data object:

data('sampleMetadata', package = 'MicrobiomeBenchmarkData')
## Get columns present in all samples
sample_metadata <- sampleMetadata |> 
    discard(~any(is.na(.x))) |> 
    head()
knitr::kable(sample_metadata)
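
To make the column filter above concrete, here is a minimal base R sketch of the same idea: keep only the columns that contain no missing values. The toy data frame, its column names, and its values are hypothetical stand-ins and are not taken from sampleMetadata.

```r
## Toy stand-in for sampleMetadata; column names and values are hypothetical
toy_metadata <- data.frame(
    sample_id = c("s1", "s2", "s3"),
    body_site = c("oral", "oral", "stool"),
    ph = c(4.5, NA, 5.1),
    stringsAsFactors = FALSE
)

## Keep only the columns without any missing value
## (base R counterpart of purrr::discard(~any(is.na(.x))))
complete_cols <- Filter(function(col) !anyNA(col), toy_metadata)
names(complete_cols)
```

In this toy example the ph column is dropped because one of its values is NA, while the other two columns are kept.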

Accessing datasets

The number of datasets currently available through the MicrobiomeBenchmarkData package is given by nrow(MicrobiomeBenchmarkData::getBenchmarkData()). These datasets are accessed through the getBenchmarkData function.

Print available datasets

If no arguments are provided, the list of available datasets is printed on screen and a data.frame is returned with the description of the datasets:

dats <- getBenchmarkData()

dats
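
Because the result is an ordinary data.frame, dataset names can be picked from its Dataset column before any download. The sketch below uses a hypothetical stand-in table; only the Dataset column name comes from this vignette, and the dataset names are invented for illustration.

```r
## Hypothetical stand-in for the data.frame returned by getBenchmarkData();
## the dataset names below are made up
toy_dats <- data.frame(
    Dataset = c("study_A_16S", "study_B_WMS", "study_C_16S"),
    stringsAsFactors = FALSE
)

## Select the names that contain the literal string "16S"
selected <- toy_dats$Dataset[grepl("16S", toy_dats$Dataset, fixed = TRUE)]
selected
```

A character vector built this way could then be passed as the first argument of getBenchmarkData together with dryrun = FALSE.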

Access a single dataset

To import a dataset, call the getBenchmarkData function with the name of the dataset as the first argument (x) and the dryrun argument set to FALSE. The output is a list that contains the imported dataset as a TreeSummarizedExperiment object.

tse <- getBenchmarkData('HMP_2012_16S_gingival_V35_subset', dryrun = FALSE)[[1]]
tse

Access a few datasets

Several datasets can be imported simultaneously by giving the names of the different datasets in a character vector:

list_tse <- getBenchmarkData(dats$Dataset[2:4], dryrun = FALSE)
str(list_tse, max.level = 1)
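
A named list like this can be summarized with one call per element. The sketch below tabulates the dimensions of each element; it uses plain matrices with hypothetical names as stand-ins, on the assumption that dim() also returns the number of rows (features) and columns (samples) for the imported objects.

```r
## Hypothetical stand-in for the list returned by getBenchmarkData();
## plain matrices replace the TreeSummarizedExperiment objects
toy_list <- list(
    dataset_A = matrix(0L, nrow = 5, ncol = 3),
    dataset_B = matrix(0L, nrow = 8, ncol = 2)
)

## One row per dataset, one column per dimension
dims <- t(vapply(toy_list, dim, integer(2)))
colnames(dims) <- c("features", "samples")
dims
```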

Access all of the datasets

If all of the datasets must be imported, this can be done by providing the dryrun = FALSE argument alone.

mbd <- getBenchmarkData(dryrun = FALSE)
str(mbd, max.level = 1)

Annotations for each taxon are included in rowData

The biological annotation of each taxon is provided as a column in the rowData slot of the TreeSummarizedExperiment object.

## In this case, the column is named taxon_annotation
tse <- mbd$HMP_2012_16S_gingival_V35_subset
rowData(tse)
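
One way to inspect such an annotation column is to count how many taxa fall into each group. The sketch below does this on a hypothetical stand-in table; the taxon names and annotation labels are invented, and whether the real column contains missing values is an assumption.

```r
## Hypothetical stand-in for rowData(tse); labels and taxon names are made up
toy_row_data <- data.frame(
    taxon_annotation = c("group_1", "group_2", NA, "group_1"),
    row.names = c("taxon_1", "taxon_2", "taxon_3", "taxon_4")
)

## Count taxa per annotation, reporting missing annotations if there are any
annotation_counts <- table(toy_row_data$taxon_annotation, useNA = "ifany")

## Names of the taxa that carry an annotation
annotated <- rownames(toy_row_data)[!is.na(toy_row_data$taxon_annotation)]
annotation_counts
```

With the real object, the same calls would presumably be applied to rowData(tse)$taxon_annotation.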

Cache

The datasets are cached so they're only downloaded once. The cache and all of the files contained in it can be removed with the removeCache function.

removeCache()

Session information

sessionInfo()

waldronlab/MicrobiomeBenchmarkData documentation built on June 1, 2025, 11:14 p.m.
