impute_abundance: Imputation methods for missing values
In HuaZou/MicrobiomeAnalysis: Data analysis toolkits in metagenomics

impute_abundance

R Documentation

Imputation methods for missing values

Description

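This function imputes missing values in the abundance data of a phyloseq or SummarizedExperiment object using one of several methods. Features with too many missing values in every group can optionally be removed before imputation.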

Usage

impute_abundance(
   object,
   level = c(NULL, "Kingdom", "Phylum", "Class",
           "Order", "Family", "Genus",
           "Species", "Strain", "unique"),
   group,
   ZerosAsNA = FALSE,
   RemoveNA = TRUE,
   cutoff = 20,
   method = c("none", "LOD", "half_min", "median",
       "mean", "min", "knn", "rf",
       "global_mean", "svd", "QRILC"),
   LOD = NULL,
   knum = 10)

Arguments

`object`	(Required). a `phyloseq::phyloseq` or `SummarizedExperiment::SummarizedExperiment` object.
`level`	(Optional). character. Summarization level (from `rank_names(pseq)`, default: NULL).
`group`	(Required). character. Name of the grouping variable (e.g. "LiverFatClass") within which missing values are counted.
`ZerosAsNA`	(Optional). logical. Whether zeros in the data are treated as missing values (default: FALSE).
`RemoveNA`	(Optional). logical. Whether to remove features whose percentage of missing values exceeds `cutoff` in every group (default: TRUE).
`cutoff`	(Optional). numeric. Percentage of missing values allowed in each group (default: 20). A feature is kept if at least one group has fewer missing values than the cutoff.
`method`	(Optional). character. Imputation method (default: "none"). Options are:
	"none": all missing values are replaced by zero.
	"LOD": missing values are replaced by the limit of detection supplied by the user (see `LOD`).
	"half_min": half of the minimum non-zero value across samples.
	"median": median of the non-zero values across samples.
	"mean": mean of the non-zero values across samples.
	"min": minimum non-zero value across samples.
	"knn": imputation from the k nearest neighbors (see `knum`).
	"rf": nonparametric missing value imputation using Random Forest.
	"global_mean": values from a normal distribution whose mean is down-shifted from the sample mean and whose standard deviation is a fraction of the sample standard deviation.
	"svd": missing value imputation based on singular value decomposition.
	"QRILC": missing value imputation based on quantile regression (Quantile Regression Imputation of Left-Censored data).
`LOD`	(Optional). numeric. Limit of detection, used when `method = "LOD"` (default: NULL).
`knum`	(Optional). numeric. Number of neighbors used when `method = "knn"` (default: 10).
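
The interplay of `ZerosAsNA`, `RemoveNA`, `cutoff` and a simple method such as "half_min" can be illustrated on a plain matrix. The following is a minimal base-R sketch, not the package's implementation: the toy matrix `abund`, the `group` vector and all intermediate names are hypothetical, and it assumes that the half-minimum is computed per feature.

```r
# Toy abundance matrix: features in rows, samples in columns (hypothetical data).
abund <- matrix(
  c(5, 2, 3, 4, 0, 6,
    0, 0, 0, 0, 0, 2,
    1, 2, 0, 4, 5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feat", 1:3), paste0("S", 1:6))
)
group <- c("A", "A", "A", "B", "B", "B")

# ZerosAsNA = TRUE: treat zeros as missing values.
abund[abund == 0] <- NA

# Percentage of missing values per feature within each group
# (rows = features, columns = groups).
na_pct <- sapply(split(seq_along(group), group), function(idx) {
  rowMeans(is.na(abund[, idx, drop = FALSE])) * 100
})

# RemoveNA = TRUE, cutoff = 20: keep a feature if at least one group
# has fewer missing values than the cutoff.
cutoff <- 20
keep <- apply(na_pct, 1, function(p) any(p < cutoff))
filtered <- abund[keep, , drop = FALSE]

# "half_min" (assumed per feature): replace each NA with half of the
# smallest observed value of that feature.
imputed <- t(apply(filtered, 1, function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}))

print(na_pct)
print(imputed)
```

In this toy example `feat2` exceeds the cutoff in both groups and is dropped, while `feat1` and `feat3` each have one group without missing values and are kept before imputation.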

Value

A phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object (matching the input class) whose abundance data have been filtered and imputed according to the chosen options.

Author(s)

Created by Pol Castellano-Escuder; Modified by Hua Zou (12/02/2022 Shenzhen China)

References

Armitage, E. G., Godzien, J., Alonso‐Herranz, V., López‐Gonzálvez, Á., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050-3060.

Examples


## Not run: 
# phyloseq object
data("Zeybel_2022_gut")
impute_abundance(
  Zeybel_2022_gut,
  level = "Phylum",
  group = "LiverFatClass",
  ZerosAsNA = TRUE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = "knn")

# SummarizedExperiment object
data("Zeybel_2022_protein")
impute_abundance(
  Zeybel_2022_protein,
  group = "LiverFatClass",
  ZerosAsNA = TRUE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = "knn")

## End(Not run)

HuaZou/MicrobiomeAnalysis documentation built on May 13, 2024, 11:10 a.m.