cbea: Enrichment analysis using competitive compositional balances...

cbea R Documentation

Enrichment analysis using competitive compositional balances (CBEA)

Description

cbea computes per-sample enrichment scores for pre-defined sets using the CBEA (Competitive Balances for Enrichment Analysis) method.

Usage

cbea(
  obj,
  set,
  output,
  distr = NULL,
  adj = FALSE,
  n_perm = 100,
  parametric = TRUE,
  thresh = 0.05,
  init = NULL,
  control = NULL,
  parallel_backend = NULL,
  ...
)

## S4 method for signature 'TreeSummarizedExperiment'
cbea(
  obj,
  set,
  output,
  distr = NULL,
  abund_values,
  adj = FALSE,
  n_perm = 100,
  parametric = TRUE,
  thresh = 0.05,
  init = NULL,
  control = NULL,
  parallel_backend = NULL,
  ...
)

## S4 method for signature 'data.frame'
cbea(
  obj,
  set,
  taxa_are_rows = FALSE,
  id_col = NULL,
  output,
  distr = NULL,
  adj = FALSE,
  n_perm = 100,
  parametric = TRUE,
  thresh = 0.05,
  init = NULL,
  control = NULL,
  parallel_backend = NULL,
  ...
)

## S4 method for signature 'matrix'
cbea(
  obj,
  set,
  taxa_are_rows = FALSE,
  output,
  distr = NULL,
  adj = FALSE,
  n_perm = 100,
  parametric = TRUE,
  thresh = 0.05,
  init = NULL,
  control = NULL,
  parallel_backend = NULL,
  ...
)

Arguments

obj

An object of class TreeSummarizedExperiment, data.frame, or matrix. phyloseq is not supported because of conflicting dependencies, and TreeSummarizedExperiment is much more compact.

set

BiocSet. Sets to be tested for enrichment in the BiocSet format. Taxa names must be in the same format as elements in the set.

output

(String). The form of the output of the model. Must be one of zscore, cdf, raw, pval, or sig.

distr

(String). The choice of distribution for the null. Can be either mnorm (2-component mixture normal), norm (normal distribution), or NULL if parametric is FALSE.

adj

(Logical). Whether the correlation adjustment procedure is used. Defaults to FALSE.

n_perm

(Numeric). Number of bootstrap resamples to add to both the permuted and unpermuted data sets. This might help stabilize the distribution fitting procedure, especially if the sample size is low. Defaults to 100.

parametric

(Logical). Indicates whether a parametric distribution will be fitted to estimate z-scores, CDF values, and p-values. Defaults to TRUE.

thresh

(Numeric). Threshold for significant p-values if sig is the output. Defaults to 0.05.

init

(Named List). Initialization parameters for estimating the null distribution. Default is NULL.

control

(Named List). Additional arguments to be passed to fitdistr and normmixEM. Defaults to NULL.

parallel_backend

See documentation cbea

...

Additional arguments not used at the moment.

abund_values

(Character). Name of the assay to use as the input to cbea.

taxa_are_rows

(Logical). Indicates whether the data frame or matrix has taxa as rows.

id_col

(Character Vector). Names of the metadata columns to keep (for example, sample_id).

Details

This function supports different formats of the OTU table; for best results, please use TreeSummarizedExperiment. phyloseq objects have no method here (see obj and Usage): CBEA does not import the phyloseq package, so users who work with phyloseq need to install it separately and supply their data in one of the supported classes. When using a data.frame or matrix, specify whether taxa are rows via the taxa_are_rows argument. Additionally, for a data.frame, the metadata columns to keep can be specified via the id_col argument.
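As a rough illustration of the two table layouts, the base R sketch below builds a small made-up count table with samples as rows and then transposes it. The object names (counts_df, counts_mat, counts_mat_t) and the sample and taxon labels are hypothetical; only taxa_are_rows and id_col come from this page.

```r
# Hypothetical toy data: 3 samples, 4 taxa (all names are made up)
counts_df <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  taxon_a = c(10, 0, 3),
  taxon_b = c(5, 2, 8),
  taxon_c = c(0, 7, 1),
  taxon_d = c(4, 4, 4)
)

# Samples are rows and taxa are columns here, which corresponds to
# taxa_are_rows = FALSE, with id_col = "sample_id" naming the metadata column.
id_col <- "sample_id"
taxa_cols <- setdiff(colnames(counts_df), id_col)

# The same counts as a numeric matrix, still with samples as rows
counts_mat <- as.matrix(counts_df[, taxa_cols])
rownames(counts_mat) <- counts_df[[id_col]]

# Transposing puts taxa in rows, which corresponds to taxa_are_rows = TRUE
counts_mat_t <- t(counts_mat)

dim(counts_mat)    # samples by taxa
dim(counts_mat_t)  # taxa by samples
```

Either orientation carries the same information; the flag only tells the function which dimension holds the taxa.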
The output argument specifies what type of values will be returned in the final matrix. The options pval and sig return, respectively, unadjusted p-values and dummy variables indicating whether a set is significantly enriched in that sample (based on unadjusted p-values thresholded at thresh). The option raw returns the raw scores computed for each set without any distribution fitting or inference procedure. Users can use this option to examine the distribution of CBEA scores under the null.
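How the output options might relate to one another can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the raw and null scores are simulated, the null is fitted as a single normal using the sample mean and standard deviation (the page mentions fitdistr and normmixEM, which are not used here), and p-values are taken from the upper tail. All object names are hypothetical.

```r
set.seed(1)

# Hypothetical raw scores: 5 samples by 2 sets (names are made up)
raw <- matrix(rnorm(10, mean = 0.5), nrow = 5,
              dimnames = list(paste0("S", 1:5), c("set_1", "set_2")))

# Hypothetical null scores per set, standing in for scores computed
# on permuted data (an assumption of this sketch)
null_scores <- matrix(rnorm(200), nrow = 100,
                      dimnames = list(NULL, c("set_1", "set_2")))

# Fit a normal null per set from the sample mean and standard deviation
mu <- colMeans(null_scores)
sigma <- apply(null_scores, 2, sd)

# "zscore": standardize each column of raw by its fitted null
zscore <- sweep(sweep(raw, 2, mu, "-"), 2, sigma, "/")

# "cdf" and "pval": standard normal CDF and upper-tail probability (assumed)
cdf <- pnorm(zscore)
pval <- 1 - cdf

# "sig": dummy variable, 1 if the unadjusted p-value is below thresh
thresh <- 0.05
sig <- (pval < thresh) * 1
```

Every result keeps the samples-by-sets shape of raw, matching the matrix described under Value.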

Value

R: An n by m matrix of enrichment scores at the sample level.

Examples

library(CBEA)
# Load the example data set shipped with the package
data(hmp_gingival)
seq <- hmp_gingival$data
set <- hmp_gingival$set
# n_perm = 10 to reduce runtime
# thresh is only used when output = "sig"
mod <- cbea(obj = seq, set = set, output = "zscore",
    abund_values = "16SrRNA",
    distr = "norm", parametric = TRUE,
    adj = TRUE, thresh = 0.05, n_perm = 10)

qpmnguyen/CBEA documentation built on April 4, 2022, 7:25 p.m.