data_sanity_check: Data Sanity and Integrity Check

In FrederickHuangLin/ANCOMBC: Microbiome differential abundance and correlation analyses with bias correction

View source: R/data_sanity_check.R

data_sanity_check

R Documentation

Data Sanity and Integrity Check

Description

Determine whether the input data are in the correct format.

Usage

data_sanity_check(
  data,
  taxa_are_rows = TRUE,
  assay.type = assay_name,
  assay_name = "counts",
  rank = tax_level,
  tax_level = NULL,
  aggregate_data = NULL,
  meta_data = NULL,
  fix_formula,
  group = NULL,
  struc_zero = FALSE,
  global = FALSE,
  pairwise = FALSE,
  dunnet = FALSE,
  mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
  trend = FALSE,
  trend_control = list(contrast = NULL, node = NULL, solver = "ECOS", B = 100),
  verbose = TRUE
)

Arguments

`data`	the input data. The `data` parameter should be a `matrix`, a `data.frame`, a `phyloseq` object, or a `TreeSummarizedExperiment` object. Both `phyloseq` and `TreeSummarizedExperiment` objects consist of a feature table (microbial count table), a sample metadata table, a taxonomy table (optional), and a phylogenetic tree (optional). If a `matrix` or `data.frame` is provided, ensure that the row names of `meta_data` match the sample names in `data` (column names if `taxa_are_rows` is TRUE, and row names otherwise). If a `phyloseq` or a `TreeSummarizedExperiment` object is used, this standard has already been enforced. For detailed information, refer to `?phyloseq::phyloseq` or `?TreeSummarizedExperiment::TreeSummarizedExperiment`. It is recommended to use low taxonomic levels, such as OTU or species level, because the estimation of sampling fractions requires a large number of taxa.
`taxa_are_rows`	logical. Whether taxa are positioned in the rows of the feature table. Default is TRUE.
`assay.type`	alias for `assay_name`.
`assay_name`	character. Name of the count table in the data object (only applicable if data object is a `(Tree)SummarizedExperiment`). Default is "counts". See `?SummarizedExperiment::assay` for more details.
`rank`	alias for `tax_level`.
`tax_level`	character. The taxonomic or non-taxonomic (`rowData`) level of interest. The input data can be analyzed at any taxonomic or `rowData` level without prior agglomeration. Note that `tax_level` must be a value from `taxonomyRanks` or `rowData`, such as "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species", etc. See `?mia::taxonomyRanks` for more details. Default is NULL, i.e., do not perform agglomeration, and the ANCOM-BC2 analysis will be performed at the lowest taxonomic level of the input `data`.
`aggregate_data`	The abundance data that has been aggregated to the desired taxonomic level. This parameter is required only when the input data is in `matrix` or `data.frame` format. For `phyloseq` or `TreeSummarizedExperiment` data, aggregation is performed by specifying the `tax_level` parameter.
`meta_data`	a `data.frame` containing sample metadata. This parameter is mandatory when the input `data` is a generic `matrix` or `data.frame`. Ensure that the row names of `meta_data` match the sample names in `data` (column names if `taxa_are_rows` is TRUE, and row names otherwise).
`fix_formula`	a character string expressing how the microbial absolute abundances for each taxon depend on the fixed effects in the metadata. When specifying `fix_formula`, make sure to include the `group` variable in the formula if `group` is not NULL.
`group`	character. The name of the group variable in the metadata. The `group` variable should be discrete, meaning it consists of categorical values. Specifying `group` is required if you are interested in detecting structural zeros and performing multi-group comparisons (global test, pairwise directional test, Dunnett's type of test, and trend test). If these analyses are not of interest, you can leave `group` as NULL. If the `group` variable of interest contains only two categories, you can also leave `group` as NULL. Default is NULL.
`struc_zero`	logical. Whether to detect structural zeros based on `group`. Default is FALSE. See `Details` for a more comprehensive discussion on structural zeros.
`global`	logical. Whether to perform the global test. Default is FALSE.
`pairwise`	logical. Whether to perform the pairwise directional test. Default is FALSE.
`dunnet`	logical. Whether to perform Dunnett's type of test. Default is FALSE.
`mdfdr_control`	a named list of control parameters for the mixed directional false discovery rate (mdFDR), including 1) `fwer_ctrl_method`: the family-wise error rate (FWER) controlling procedure, such as "holm", "hochberg", "bonferroni", etc. (default is "holm"), and 2) `B`: the number of bootstrap samples (default is 100). Increasing `B` leads to more accurate p-values. See `Details` for a more comprehensive discussion on mdFDR.
`trend`	logical. Whether to perform trend test. Default is FALSE.
`trend_control`	a named list of control parameters for the trend test, including 1) `contrast`: the list of contrast matrices for constructing inequalities, 2) `node`: the list of positions for the nodal parameter, 3) `solver`: a string indicating the solver to use (default is "ECOS"), and 4) `B`: the number of bootstrap samples (default is 100). Increasing `B` leads to more accurate p-values. See `vignette` for the corresponding trend test examples.
`verbose`	logical. Whether to display detailed progress messages. Default is TRUE.
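
Several of the arguments above impose conventions on `matrix` or `data.frame` input: the sample names must match between `data` and `meta_data`, a non-NULL `group` must be included in `fix_formula`, and `group` should be discrete. The sketch below shows what such checks might look like in base R. The helper `check_inputs()` and the toy data are hypothetical and written only for illustration; this is not the package's implementation, and the sample-name check here only tests set equality, not order.

```r
# Hypothetical helper, not part of ANCOMBC: a base-R sketch of the input
# conventions described in the Arguments section.
check_inputs <- function(data, meta_data, fix_formula, group = NULL,
                         taxa_are_rows = TRUE) {
  # Sample names are column names when taxa are rows, row names otherwise.
  sample_names <- if (taxa_are_rows) colnames(data) else rownames(data)
  if (is.null(sample_names)) {
    stop("`data` has no sample names.")
  }
  if (!setequal(sample_names, rownames(meta_data))) {
    stop("Row names of `meta_data` do not match the sample names in `data`.")
  }

  # Variables named in the formula string must exist in the metadata.
  formula_vars <- all.vars(stats::as.formula(paste("~", fix_formula)))
  missing_vars <- setdiff(formula_vars, colnames(meta_data))
  if (length(missing_vars) > 0) {
    stop("Not found in `meta_data`: ", paste(missing_vars, collapse = ", "))
  }

  if (!is.null(group)) {
    # A non-NULL `group` should appear in `fix_formula` ...
    if (!group %in% formula_vars) {
      stop("`group` must be included in `fix_formula`.")
    }
    # ... and should be categorical.
    if (!(is.factor(meta_data[[group]]) || is.character(meta_data[[group]]))) {
      stop("`group` should be a discrete (factor or character) variable.")
    }
  }
  invisible(TRUE)
}

# Toy data (invented): 3 taxa in rows, 4 samples in columns.
counts <- matrix(c(10, 0, 3, 7, 5, 2, 0, 8, 1, 4, 6, 9), nrow = 3,
                 dimnames = list(paste0("taxon", 1:3), paste0("s", 1:4)))
meta <- data.frame(age = c(30, 45, 52, 28),
                   bmi_group = c("lean", "obese", "lean", "obese"),
                   row.names = paste0("s", 1:4))

ok <- check_inputs(counts, meta, fix_formula = "age + bmi_group",
                   group = "bmi_group")
```

Running a check of this kind before calling `data_sanity_check` can surface mismatched sample names or a `group` variable missing from `fix_formula` early, with a message that points at the offending argument.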

Value

a list containing the outputs formatted appropriately for downstream analysis.

Author(s)

Huang Lin

Examples

data(atlas1006, package = "microbiome")
check_results = data_sanity_check(data = atlas1006,
                                  tax_level = "Family",
                                  fix_formula = "age + sex + bmi_group",
                                  group = "bmi_group",
                                  struc_zero = TRUE,
                                  global = TRUE,
                                  verbose = TRUE)
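
The example above passes a `phyloseq` object. For the `matrix` or `data.frame` route described under `data` and `meta_data`, the inputs might be prepared as in the following base-R sketch. All values, sample names, and taxon names are invented for illustration, and the control lists simply restate the defaults shown under Usage.

```r
# Toy inputs for the matrix route; all values are invented for illustration.
set.seed(1)
n_taxa <- 5
n_samples <- 6
sample_ids <- paste0("sample", seq_len(n_samples))

# Feature table with taxa in rows (corresponds to taxa_are_rows = TRUE).
counts <- matrix(rpois(n_taxa * n_samples, lambda = 20),
                 nrow = n_taxa,
                 dimnames = list(paste0("taxon", seq_len(n_taxa)), sample_ids))

# Metadata whose row names match the column names of `counts`.
meta <- data.frame(
  age = c(25, 31, 47, 52, 38, 60),
  bmi_group = factor(rep(c("lean", "overweight", "obese"), each = 2),
                     levels = c("lean", "overweight", "obese")),
  row.names = sample_ids
)

# Control lists mirroring the defaults shown under Usage.
mdfdr_control <- list(fwer_ctrl_method = "holm", B = 100)
trend_control <- list(contrast = NULL, node = NULL, solver = "ECOS", B = 100)
```

Objects like these would then be supplied as `data = counts`, `meta_data = meta`, `mdfdr_control = mdfdr_control`, and `trend_control = trend_control`, together with a `fix_formula` such as "age + bmi_group" and `group = "bmi_group"`. A table this small is only meant to show the expected shapes; as noted under `data`, a large number of taxa is recommended in practice.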

FrederickHuangLin/ANCOMBC documentation built on June 11, 2025, 6:22 p.m.
