impute_mv: Impute missing values (MVs) in data.

Description

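Impute missing values (MVs) in the expression data of a mass_dataset object, for the selected samples only, using one of several imputation methods.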

Usage

impute_mv(
  object,
  sample_id,
  method = c("knn", "rf", "mean", "median", "zero", "minimum", "bpca", "svdImpute",
    "ppca"),
  k = 10,
  rowmax = 0.5,
  colmax = 0.8,
  maxp = 1500,
  rng.seed = 362436069,
  maxiter = 10,
  ntree = 100,
  decreasing = FALSE,
  nPcs = 2,
  maxSteps = 100,
  threshold = 1e-04,
  ...
)

Arguments

object

A mass_dataset object.

sample_id

Samples in which to impute missing values, given either as an index vector or as a character vector of sample IDs (the sample_id column of sample_info).

method

Imputation method. One of "knn", "rf" (missForest), "mean", "median", "zero", "minimum", "bpca" (BPCA), "svdImpute" (SVD), and "ppca" (PPCA). Default is "knn". Details of each method can be found in the documentation of the underlying functions and in the corresponding papers.

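k

Number of nearest neighbors used for imputation. See ?impute.knn

rowmax

Maximum proportion of missing values allowed in any row; rows above this limit are imputed with the overall mean per sample. See ?impute.knn

colmax

Maximum proportion of missing values allowed in any column; impute.knn stops with an error if a column exceeds it. See ?impute.knn

maxp

Largest block of rows imputed by the kNN algorithm at once; larger blocks are first split by two-means clustering. See ?impute.knn

rng.seed

Seed for the random number generator, for reproducible results. See ?impute.knn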

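maxiter

Maximum number of iterations, if the stopping criterion is not met earlier. See ?missForest

ntree

Number of trees grown in each forest. See ?missForest

decreasing

Logical; if TRUE, columns are sorted (and imputed) in order of decreasing amount of missing values. See ?missForest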

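nPcs

Number of principal components to calculate. See ?bpca

maxSteps

Maximum number of estimation steps. See ?bpca

threshold

Convergence threshold. See ?bpca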

...

Other arguments.
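The simpler methods ("zero", "mean", "median", "minimum") can be pictured on a plain matrix with variables in rows and samples in columns. The base R sketch below is an illustration under stated assumptions, not masscleaner's implementation: simple_impute() is a hypothetical helper, and it assumes that mean, median, and minimum are computed per variable over the selected samples only. It also shows how a sample_id given as indices or as names selects the columns to impute.

```r
# Hypothetical helper for illustration; not masscleaner's implementation.
# Rows are variables, columns are samples. Only the columns selected by
# sample_id (indices or sample names) are imputed.
simple_impute <- function(mat, sample_id,
                          method = c("zero", "mean", "median", "minimum")) {
  method <- match.arg(method)
  cols <- if (is.numeric(sample_id)) sample_id else match(sample_id, colnames(mat))
  if (anyNA(cols)) stop("unknown sample_id")
  sub <- mat[, cols, drop = FALSE]
  for (i in seq_len(nrow(sub))) {
    miss <- is.na(sub[i, ])
    if (!any(miss)) next
    obs <- sub[i, !miss]
    # Assumption: mean/median/minimum are taken per variable over the
    # selected samples; a variable with no observed value stays NA.
    fill <- if (method == "zero") {
      0
    } else if (length(obs) == 0) {
      NA_real_
    } else {
      switch(method, mean = mean(obs), median = median(obs), minimum = min(obs))
    }
    sub[i, miss] <- fill
  }
  mat[, cols] <- sub
  mat
}

m <- matrix(c(1, NA, 3,
              4, 5, NA),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("v1", "v2"), c("S1", "S2", "S3")))
simple_impute(m, sample_id = c("S1", "S2", "S3"), method = "mean")
# v1 becomes 1, 2, 3; v2 becomes 4, 5, 4.5
simple_impute(m, sample_id = 1:2, method = "zero")  # S3 is left untouched
```

Restricting the computation to the selected columns mirrors the role of sample_id: values from samples outside the selection neither receive imputed values nor contribute to the fill value in this sketch.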
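The k, rowmax, colmax, maxp, and rng.seed arguments refer to impute.knn. As a rough picture of k-nearest-neighbour imputation, the sketch below fills each missing entry of a row with the average of that column's values in the k most similar rows, where similarity is the root-mean-square difference over the columns both rows have observed. knn_impute_rows() is a hypothetical, simplified helper: it ignores rowmax, colmax, and maxp, chooses neighbours per missing cell, and impute.knn's actual neighbour selection differs in detail.

```r
# Hypothetical, simplified kNN imputation over rows; not impute.knn itself.
# It ignores rowmax, colmax and maxp, and picks neighbours per missing cell.
knn_impute_rows <- function(mat, k = 10) {
  out <- mat
  for (i in which(rowSums(is.na(mat)) > 0)) {
    obs_i <- !is.na(mat[i, ])
    # Distance from row i to every other row: root-mean-square difference
    # over the columns observed in both rows (Inf if none are shared).
    d <- vapply(seq_len(nrow(mat)), function(j) {
      shared <- obs_i & !is.na(mat[j, ])
      if (j == i || !any(shared)) return(Inf)
      sqrt(mean((mat[i, shared] - mat[j, shared])^2))
    }, numeric(1))
    for (col in which(!obs_i)) {
      # Candidate neighbours must have an observed value in this column.
      cand <- which(!is.na(mat[, col]) & is.finite(d))
      if (length(cand) == 0) next
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      out[i, col] <- mean(mat[nn, col])
    }
  }
  out
}

m <- rbind(a = c(1, 2, 3, 4),
           b = c(1.1, 2.1, 3.1, 4.1),
           c = c(10, 20, 30, 40),
           d = c(1, 2, NA, 4))
knn_impute_rows(m, k = 2)["d", ]  # the NA becomes mean(3, 3.1) = 3.05
```

Row c is far from d, so with k = 2 only rows a and b contribute; a larger k would pull the estimate toward c, which is why k trades robustness against locality.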
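The nPcs, maxSteps, and threshold arguments refer to bpca. To illustrate the general idea behind low-rank imputation, here is a generic iterative SVD scheme in the spirit of svdImpute; it is neither pcaMethods' svdImpute code nor BPCA, which uses a Bayesian model. It starts from row means, repeatedly reconstructs the matrix from its first nPcs singular components, and overwrites only the missing entries until the relative change falls below threshold or maxSteps is reached. svd_impute() is hypothetical; its parameter names simply mirror the arguments above, and it does no centering or scaling.

```r
# Hypothetical iterative low-rank SVD imputation (svdImpute-like idea only;
# not pcaMethods' svdImpute, and not BPCA). No centering or scaling is done.
svd_impute <- function(mat, nPcs = 2, maxSteps = 100, threshold = 1e-4) {
  miss <- is.na(mat)
  x <- mat
  # Start by filling each missing value with its row mean.
  x[miss] <- rowMeans(mat, na.rm = TRUE)[row(mat)[miss]]
  for (step in seq_len(maxSteps)) {
    s <- svd(x, nu = nPcs, nv = nPcs)
    # Rank-nPcs reconstruction of the current filled-in matrix.
    approx <- s$u %*% diag(s$d[seq_len(nPcs)], nPcs) %*% t(s$v)
    # Relative change of the imputed entries between iterations.
    change <- sqrt(sum((x[miss] - approx[miss])^2) / max(1e-12, sum(x[miss]^2)))
    x[miss] <- approx[miss]  # observed entries are never overwritten
    if (change < threshold) break
  }
  x
}

truth <- outer(1:5, 1:4)  # an exactly rank-1 matrix
m <- truth
m[3, 2] <- NA             # true value is 6
svd_impute(m, nPcs = 1, maxSteps = 500, threshold = 1e-10)[3, 2]
```

Because the toy matrix is exactly rank 1, a one-component reconstruction recovers the hidden value; on real data nPcs controls how much structure the imputation can borrow, and rows that are entirely missing would need separate handling.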

Value

A new mass_dataset object in which the missing values of the selected samples have been imputed.

Author(s)

Xiaotao Shen shenxt1990@outlook.com

Examples

library(massdataset)
library(masscleaner)
library(dplyr)
data("expression_data")
data("sample_info")
data("variable_info")
object =
  create_mass_dataset(
    expression_data = expression_data,
    sample_info = sample_info,
    variable_info = variable_info
  )
object

get_mv_number(object)
massdataset::get_mv_number(object, by = "sample")

###remove variables with MVs in 20% or more of QC samples, or in 50% or more of Subject samples
qc_id =
  object %>%
  activate_mass_dataset(what = "sample_info") %>%
  filter(class == "QC") %>%
  pull(sample_id)

subject_id =
  object %>%
  activate_mass_dataset(what = "sample_info") %>%
  filter(class == "Subject") %>%
  pull(sample_id)

object =
  object %>%
  mutate_variable_na_freq(according_to_samples = qc_id) %>%
  mutate_variable_na_freq(according_to_samples = subject_id) %>%
  activate_mass_dataset(what = "variable_info") %>%
  filter(na_freq < 0.2 & na_freq.1 < 0.5)

###remove QC and Subject samples with 50% or more MVs (Blank samples are not filtered)
object =
  filter_samples(
    object = object,
    flist = function(x) {
      mean(is.na(x)) < 0.5
    },
    apply_to = c(qc_id, subject_id),
    prune = TRUE
  )

blank_id =
  object %>%
  activate_mass_dataset(what = "sample_info") %>%
  filter(class == "Blank") %>%
  pull(sample_id)

object1 =
  impute_mv(object = object,
            sample_id = blank_id,
            method = "zero")

object1 %>%
  activate_mass_dataset(what = "expression_data") %>%
  select(dplyr::contains("Blank")) %>%
  extract_expression_data() %>%
  head()

object2 =
  impute_mv(object = object,
            sample_id = subject_id,
            method = "knn")

object2 %>%
  activate_mass_dataset(what = "sample_info") %>%
  filter(class == "Subject") %>%
  extract_expression_data() %>%
  head()
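The two filtering steps in the examples can be mirrored on a plain matrix, which may help when checking the thresholds by hand. The sketch below uses a hypothetical filter_mv() helper and toy data: it keeps variables with less than 20% MVs in QC samples and less than 50% in Subject samples, then drops QC and Subject samples with 50% or more MVs while always keeping Blank samples. It assumes, as the example's comment suggests, that filter_samples() keeps the samples for which flist returns TRUE.

```r
# Hypothetical helper mirroring the example's filtering steps on a plain
# matrix (rows = variables, columns = samples). Requires QC and Subject samples.
filter_mv <- function(expr, sample_class,
                      qc_max = 0.2, subject_max = 0.5, sample_max = 0.5) {
  # Fraction of missing values per variable within each sample class.
  na_qc <- rowMeans(is.na(expr[, sample_class == "QC", drop = FALSE]))
  na_subject <- rowMeans(is.na(expr[, sample_class == "Subject", drop = FALSE]))
  expr <- expr[na_qc < qc_max & na_subject < subject_max, , drop = FALSE]
  # Fraction of missing values per sample over the remaining variables;
  # Blank samples are always kept.
  keep <- sample_class == "Blank" | colMeans(is.na(expr)) < sample_max
  expr[, keep, drop = FALSE]
}

expr <- rbind(v1 = c(1, 1, NA, 1, 1, NA),
              v2 = c(1, 1, NA, 1, 1, 1),
              v3 = c(NA, 1, 1, 1, 1, 1),
              v4 = c(1, 1, 1, 1, 1, NA))
colnames(expr) <- c("QC1", "QC2", "S1", "S2", "S3", "B1")
sample_class <- c("QC", "QC", "Subject", "Subject", "Subject", "Blank")
filter_mv(expr, sample_class)
# v3 is dropped (50% MVs in QC); S1 is dropped (2 of 3 remaining values missing);
# B1 is kept although it also has 2 of 3 values missing.
```

As in the example, the sample filter is applied after the variable filter, so each sample's MV fraction is computed over the variables that survived the first step.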

tidymass/masscleaner documentation built on Sept. 4, 2023, 3:21 a.m.