View source: R/ConQuR_main_tune.R
Tune_ConQuR | R Documentation |
Tune over variations of ConQuR
Tune_ConQuR( tax_tab, batchid, covariates, batch_ref_pool, logistic_lasso_pool, quantile_type_pool, simple_match_pool, lambda_quantile_pool, interplt_pool, frequencyL, frequencyU, cutoff = 0.1, delta = 0.4999, taus = seq(0.005, 0.995, by = 0.005), num_core = 2 )
tax_tab |
The taxa read count table, samples (row) by taxa (col). |
batchid |
The batch indicator, must be a factor. |
covariates |
A data.frame containing the key variable of interest and other covariates, e.g., data.frame(key, x1, x2). |
batch_ref_pool |
A vector of characters, the candidates for reference batch, e.g., c("0", "2"). |
logistic_lasso_pool |
A vector of logical values, whether or not using the L1-penalized logistic regression, e.g., c(T, F). |
quantile_type_pool |
A vector of characters, the candidates for quantile regression type, e.g., c("standard", "lasso"). |
simple_match_pool |
A vector of logical values, whether or not using the simple quantile-quantile matching, e.g., c(T, F). |
lambda_quantile_pool |
A vector of characters, the candidates for the penalization parameter in quantile regression (“lasso” or “composite”), e.g., c(NA, "2p/n", "2p/logn"). |
interplt_pool |
A vector of logical values, whether or not using the data-driven linear interpolation between zero and non-zero quantiles, e.g., c(T, F). |
frequencyL |
A real constant between 0 and 1, the lower bound of prevalence that needs tuning. |
frequencyU |
A real constant between 0 and 1, the upper bound of prevalence that needs tuning. |
cutoff |
A real constant, the grid size of prevalence for tuning; default is 0.1. |
delta |
A real constant in (0, 0.5), determining the size of the interpolation window if interplt=TRUE (a larger delta leads to a narrower interpolation window); default is 0.4999. |
taus |
A sequence of quantile levels, determining the “precision” of estimating conditional quantile functions; default is seq(0.005, 0.995, by=0.005). |
num_core |
A real constant, the number of cores used for computing; default is 2. |
“original”, i.e., the original data without correction is always a default candidate.
If “standard” is one candidate for quantile_type_pool, always include NA as one candidate for lambda_quantile_pool.
Be cautious with the candidate “composite” for quantile_type_pool: the underlying assumption is strong and the computation might be slow.
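The two notes on quantile_type_pool can be checked before starting a long tuning run. The following is a minimal sketch in base R; check_pools is a hypothetical helper that is not part of ConQuR, and it encodes only the rules stated in these notes.

```r
# Hypothetical helper (not part of ConQuR): checks the candidate pools against
# the notes above before a potentially long tuning run.
check_pools <- function(quantile_type_pool, lambda_quantile_pool) {
  # Documented rule: "standard" requires NA among the lambda candidates.
  if ("standard" %in% quantile_type_pool && !any(is.na(lambda_quantile_pool))) {
    stop("'standard' is in quantile_type_pool, so lambda_quantile_pool must include NA")
  }
  # "composite" is allowed, but the documentation advises caution.
  if ("composite" %in% quantile_type_pool) {
    warning("'composite' relies on a strong assumption and may be slow")
  }
  invisible(TRUE)
}

quantile_type_pool   <- c("standard", "lasso")
lambda_quantile_pool <- c(NA, "2p/n", "2p/logn")
pools_ok <- check_pools(quantile_type_pool, lambda_quantile_pool)
```

Calling check_pools("standard", "2p/n") stops with an error, because NA is missing from the lambda candidates.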
The tuning procedure finds the local optimum within each prevalence interval of width cutoff. If frequencyL=0.2, frequencyU=0.5 and cutoff=0.1, the function determines the combination achieving maximum removal of batch variation on taxa present in 20%-30%, 30%-40%, and 40%-50% of the samples, respectively.
The same reference batch is used across taxa in the final optimal corrected table.
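To make the prevalence intervals concrete, the sketch below bins the taxa of a toy table using frequencyL=0.2, frequencyU=0.5 and cutoff=0.1. It illustrates the description above and is not the package's internal code; the toy data, the make_taxon helper, and the choice of which side of each interval is closed are assumptions of the sketch.

```r
# Toy read-count table: 20 samples (rows) by 5 taxa (columns).
# Each taxon is non-zero in a known number of samples.
n_samples <- 20
make_taxon <- function(n_present) c(rep(3, n_present), rep(0, n_samples - n_present))
tax_tab <- cbind(
  taxon_a = make_taxon(1),   # prevalence 0.05
  taxon_b = make_taxon(5),   # prevalence 0.25
  taxon_c = make_taxon(7),   # prevalence 0.35
  taxon_d = make_taxon(9),   # prevalence 0.45
  taxon_e = make_taxon(16)   # prevalence 0.80
)

frequencyL <- 0.2
frequencyU <- 0.5
cutoff     <- 0.1

# Prevalence: fraction of samples in which each taxon has a non-zero count.
prevalence <- colMeans(tax_tab > 0)

# Interval edges 0.2, 0.3, 0.4, 0.5. Which side of each interval is closed is
# an assumption of this sketch; the documentation does not specify it.
breaks <- seq(frequencyL, frequencyU, by = cutoff)
bin <- cut(prevalence, breaks = breaks, include.lowest = TRUE)

# Taxa grouped by prevalence interval. Taxa outside [frequencyL, frequencyU]
# get NA in this sketch and are dropped by split().
taxa_by_bin <- split(names(prevalence), bin)
```

Here taxon_b, taxon_c and taxon_d each fall into one of the three intervals, while taxon_a and taxon_e lie outside the tuned range.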
A list with two elements:
tax_final - The optimal corrected taxa read count table, samples (row) by taxa (col).
method_final - A table summarizing variations of ConQuR chosen for each prevalence cutoff.
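A call might be put together as in the sketch below, which uses only the argument names from the usage above. The synthetic inputs, the pool values, and the run_tuning wrapper are illustrative assumptions; whether tuning succeeds on such toy data has not been verified, so the call itself is left commented out.

```r
set.seed(1)
n_samples <- 60
n_taxa    <- 8

# Synthetic inputs shaped as the arguments require (toy data, for illustration only).
tax_tab <- matrix(rpois(n_samples * n_taxa, lambda = 2), nrow = n_samples,
                  dimnames = list(NULL, paste0("taxon_", seq_len(n_taxa))))
batchid <- factor(rep(c("0", "1", "2"), each = n_samples / 3))   # must be a factor
covariates <- data.frame(
  key = factor(rep(c("case", "control"), times = n_samples / 2)),
  x1  = rnorm(n_samples)
)

# Basic shape checks before a potentially long run.
stopifnot(is.factor(batchid),
          is.data.frame(covariates),
          nrow(tax_tab) == length(batchid),
          nrow(tax_tab) == nrow(covariates))

# Wrapped in a function so that nothing is fitted until it is called;
# calling it requires the ConQuR package to be installed.
run_tuning <- function() {
  ConQuR::Tune_ConQuR(
    tax_tab = tax_tab, batchid = batchid, covariates = covariates,
    batch_ref_pool       = c("0", "2"),
    logistic_lasso_pool  = c(TRUE, FALSE),
    quantile_type_pool   = c("standard", "lasso"),
    simple_match_pool    = c(TRUE, FALSE),
    lambda_quantile_pool = c(NA, "2p/n"),   # NA included because "standard" is a candidate
    interplt_pool        = c(TRUE, FALSE),
    frequencyL = 0, frequencyU = 1,         # illustrative bounds
    num_core = 2
  )
}

# result <- run_tuning()
# result$tax_final     # optimal corrected table, samples (row) by taxa (col)
# result$method_final  # variation chosen for each prevalence cutoff
```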
Ling, W. et al. (2021+). ConQuR: batch effects removal for microbiome data in large-scale epidemiology studies via conditional quantile regression.
Ling, W. et al. (2020+). Statistical inference in quantile regression for zero-inflated outcomes. Statistica Sinica.
Machado, J.A.F., Silva, J.S. (2005). Quantiles for counts. Journal of the American Statistical Association 100(472), 1226–1237.
Koenker, R. & Bassett Jr, G. (1978). Regression quantiles. Econometrica: journal of the Econometric Society, 33-50.
Koenker, R. (2005). Econometric Society Monographs: Quantile Regression. New York: Cambridge University.
Zou, H. & Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics 36, 1108-1126.
Anderson, M. J. (2014). Permutational multivariate analysis of variance (PERMANOVA). Wiley statsref: statistics reference online, 1-15.