Tune_ConQuR: Tune over variations of ConQuR

View source: R/ConQuR_main_tune.R


Tune over variations of ConQuR

Description

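Tune over variations of ConQuR: for each prevalence bin of taxa, select the combination of candidate settings (reference batch, logistic and quantile regression type, penalization, matching, interpolation) that achieves maximum removal of batch variation, and assemble the resulting corrected taxa read count table.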

Usage

Tune_ConQuR(
  tax_tab,
  batchid,
  covariates,
  batch_ref_pool,
  logistic_lasso_pool,
  quantile_type_pool,
  simple_match_pool,
  lambda_quantile_pool,
  interplt_pool,
  frequencyL,
  frequencyU,
  cutoff = 0.1,
  delta = 0.4999,
  taus = seq(0.005, 0.995, by = 0.005),
  num_core = 2
)

Arguments

tax_tab

The taxa read count table, samples (row) by taxa (col).

batchid

The batch indicator, must be a factor.

covariates

A data.frame containing the key variable of interest and other covariates, e.g., data.frame(key, x1, x2).

batch_ref_pool

A character vector of candidate reference batches, e.g., c("0", "2").

logistic_lasso_pool

A logical vector indicating whether to use L1-penalized logistic regression, e.g., c(T, F).

quantile_type_pool

A character vector of candidate quantile regression types, e.g., c("standard", "lasso").

simple_match_pool

A logical vector indicating whether to use simple quantile-quantile matching, e.g., c(T, F).

lambda_quantile_pool

A character vector of candidate penalization parameters for penalized quantile regression ("lasso" or "composite"), e.g., c(NA, "2p/n", "2p/logn").

interplt_pool

A logical vector indicating whether to use data-driven linear interpolation between zero and non-zero quantiles, e.g., c(T, F).

frequencyL

A real constant between 0 and 1, the lower bound of the prevalence range over which tuning is performed.

frequencyU

A real constant between 0 and 1, the upper bound of the prevalence range over which tuning is performed.

cutoff

A real constant, the width of each prevalence bin (grid step) used for tuning; default is 0.1.

delta

A real constant in (0, 0.5) determining the size of the interpolation window when interplt=TRUE (a larger delta leads to a narrower interpolation window); default is 0.4999.

taus

A sequence of quantile levels determining the "precision" of estimating the conditional quantile functions; default is seq(0.005, 0.995, by=0.005).

num_core

An integer, the number of cores used for computation; default is 2.
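The following sketch prepares simulated inputs in the shapes described above: a samples-by-taxa count table, a factor batchid, a covariates data.frame with the key variable first, and candidate pools taken from the e.g. values listed for each argument. The data and sizes are hypothetical, and storing tax_tab as a matrix is an assumption (the documentation only says "table"). The Tune_ConQuR call itself is left as a comment because it requires the ConQuR package.

```r
# Hypothetical simulated inputs; sizes and distributions are illustrative only.
set.seed(1)
n_samples <- 60
n_taxa <- 20

# tax_tab: read counts, samples (rows) by taxa (columns).
# Zero inflation is mimicked by zeroing about 40% of Poisson draws.
tax_tab <- matrix(rpois(n_samples * n_taxa, lambda = 5),
                  nrow = n_samples, ncol = n_taxa,
                  dimnames = list(paste0("S", seq_len(n_samples)),
                                  paste0("taxon", seq_len(n_taxa))))
tax_tab[runif(length(tax_tab)) < 0.4] <- 0

# batchid must be a factor; three hypothetical batches labelled "0", "1", "2".
batchid <- factor(rep(c("0", "1", "2"), each = n_samples / 3))

# covariates: the key variable of interest first, then other covariates.
covariates <- data.frame(key = rbinom(n_samples, size = 1, prob = 0.5),
                         x1 = rnorm(n_samples),
                         x2 = rnorm(n_samples))

# Candidate pools, using the example values given under Arguments.
batch_ref_pool <- c("0", "2")
logistic_lasso_pool <- c(TRUE, FALSE)
quantile_type_pool <- c("standard", "lasso")
simple_match_pool <- c(TRUE, FALSE)
# NA is included because "standard" is among the quantile types (see Details).
lambda_quantile_pool <- c(NA, "2p/n", "2p/logn")
interplt_pool <- c(TRUE, FALSE)

# Basic consistency checks on the inputs before tuning.
stopifnot(is.factor(batchid),
          nrow(tax_tab) == length(batchid),
          nrow(covariates) == nrow(tax_tab),
          all(batch_ref_pool %in% levels(batchid)))

# With the ConQuR package attached, the call would then look like:
# res <- Tune_ConQuR(tax_tab, batchid, covariates,
#                    batch_ref_pool, logistic_lasso_pool, quantile_type_pool,
#                    simple_match_pool, lambda_quantile_pool, interplt_pool,
#                    frequencyL = 0.2, frequencyU = 0.5)
```

Checking that every candidate reference batch is an actual level of batchid is a cheap safeguard added here for illustration; whether Tune_ConQuR performs the same check internally is not stated.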

Details

  • "original", i.e., the original data without correction, is always included as a candidate.

  • If "standard" is one of the candidates in quantile_type_pool, always include NA in lambda_quantile_pool, since standard quantile regression takes no penalization parameter.

  • Use the candidate "composite" in quantile_type_pool with caution: its underlying assumption is strong and the computation can be slow.

  • The tuning procedure finds the locally optimal combination within each prevalence bin. For example, if frequencyL=0.2, frequencyU=0.5 and cutoff=0.1, the function determines the combination achieving maximum removal of batch variation separately for taxa present in 20%-30%, 30%-40% and 40%-50% of the samples.

  • The same reference batch is used across taxa in the final optimal corrected table.
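To make the prevalence bins concrete, the hypothetical helper below (not part of ConQuR) computes each taxon's prevalence as the fraction of samples with a non-zero count and assigns it to a bin of width cutoff between frequencyL and frequencyU. Treating bins as left-closed, with the last bin also closed on the right, is an assumption; the documentation does not say how boundary values are handled.

```r
# Hypothetical helper (not part of ConQuR): assign each taxon to a prevalence
# bin of width `cutoff` between frequencyL and frequencyU.
prevalence_bins <- function(tax_tab, frequencyL, frequencyU, cutoff = 0.1) {
  # Prevalence = fraction of samples (rows) in which a taxon has a non-zero count.
  prevalence <- colMeans(tax_tab > 0)
  # Bin edges, e.g. 0.2, 0.3, 0.4, 0.5; rounding removes floating-point drift
  # from seq() so that an edge such as 0.3 compares equal to a prevalence of 3/10.
  breaks <- round(seq(frequencyL, frequencyU, by = cutoff), 10)
  # Left-closed bins [0.2,0.3), [0.3,0.4), with the last bin [0.4,0.5] closed
  # on both sides; taxa outside [frequencyL, frequencyU] get NA.
  bins <- cut(prevalence, breaks = breaks, right = FALSE, include.lowest = TRUE)
  names(bins) <- names(prevalence)
  bins
}

# Toy table: 10 samples, taxa present in 1, 3, 4 and 5 samples.
toy <- matrix(0, nrow = 10, ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
toy[1, "a"] <- 1
toy[1:3, "b"] <- 2
toy[1:4, "c"] <- 3
toy[1:5, "d"] <- 4
bins <- prevalence_bins(toy, frequencyL = 0.2, frequencyU = 0.5)
print(bins)
# a is NA (prevalence 0.1), b falls in [0.3,0.4), c and d fall in [0.4,0.5]
```

Under this reading, tuning would be run once per non-empty bin, with the winning settings for each bin recorded in method_final.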

Value

A list

  • tax_final - The optimal corrected taxa read count table, samples (row) by taxa (col).

  • method_final - A table summarizing the variations of ConQuR chosen for each prevalence bin.

References

  • Ling, W. et al. (2021+). ConQuR: batch effects removal for microbiome data in large-scale epidemiology studies via conditional quantile regression.

  • Ling, W. et al. (2020+). Statistical inference in quantile regression for zero-inflated outcomes. Statistica Sinica.

  • Machado, J.A.F. & Silva, J.S. (2005). Quantiles for counts. Journal of the American Statistical Association 100(472), 1226-1237.

  • Koenker, R. & Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 33-50.

  • Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. New York: Cambridge University Press.

  • Zou, H. & Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics 36, 1108-1126.

  • Anderson, M.J. (2014). Permutational multivariate analysis of variance (PERMANOVA). Wiley StatsRef: Statistics Reference Online, 1-15.


wdl2459/ConQuR documentation built on Aug. 28, 2022, 6:08 a.m.