MVR-package: Mean-Variance Regularization Package

Description

MVR is a non-parametric method for joint adaptive mean-variance regularization and variance stabilization of high-dimensional data.

It is suited to the difficult problems posed by high-dimensional multivariate datasets (the p >> n paradigm), such as omics-type data. Among these problems: the variance is often a function of the mean, variable-specific variance estimators are unreliable, and test statistics have low power because of a lack of degrees of freedom.
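To make the unreliability of variable-specific variance estimates concrete, the short Python simulation below is a standalone illustration, not part of the MVR package. It draws p variables that all share the same true variance of 1 and estimates each variance from only n = 3 observations. The values p = 1000, n = 3 and the seed are arbitrary choices for the sketch.

```python
import random
import statistics


def sample_variance_spread(p=1000, n=3, seed=1):
    """Simulate p independent variables with true variance 1, each observed
    n times, and return the smallest and largest sample variance."""
    rng = random.Random(seed)
    variances = [
        statistics.variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(p)
    ]
    return min(variances), max(variances)


lo, hi = sample_variance_spread()
print(f"sample variances range from {lo:.4f} to {hi:.2f}")
```

Although every simulated variable has the same true variance, the per-variable estimates typically span several orders of magnitude. This is the instability that borrowing strength across variables is meant to reduce.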

Key features include:

Normalization and/or variance stabilization of the data
Computation of mean-variance-regularized t-statistics (F-statistics to come)
Generation of diverse diagnostic plots
Computationally efficient implementation using C/C++ interfacing, with an option for parallel computing, for fast and easy use in the R environment

Details
The following describes all the end-user functions and internal R subroutines needed to run a complete MVR procedure. Other internal subroutines are not meant to be called by the end-user. For computational efficiency, the end-user regularization functions offer the option to configure a cluster; this is indicated by an asterisk (* = optionally involving cluster usage). The R functions are categorized as follows:

END-USER REGULARIZATION & VARIANCE STABILIZATION FUNCTION
mvr (*) Function for Mean-Variance Regularization and Variance Stabilization.
End-user function for Mean-Variance Regularization (MVR) and Variance Stabilization by similarity statistic, under a sample-group homoscedasticity or heteroscedasticity assumption. The function takes advantage of the R package parallel, which allows users to create a cluster of workstations on local and/or remote machines, enabling parallel execution of this function that scales with the number of available CPU cores.
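The core idea, standardizing each variable by the mean and standard deviation of the cluster it falls in rather than by its own noisy estimates, can be sketched in a few lines of Python. This is a simplified, hypothetical illustration for a single sample group, not the package's implementation or its R interface. The helper names are made up. Variables are grouped by equal-frequency binning of their sample means and standard deviations instead of by the clustering MVR performs, and the numbers of clusters (k_mean, k_var) are fixed by hand rather than chosen with the similarity statistic.

```python
import statistics


def quantile_bins(values, k):
    """Assign each value to one of k equal-frequency bins (0..k-1) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def regularize_sketch(data, k_mean=2, k_var=2):
    """data: list of variables, each a list of the same n observations.
    Returns each variable centered and scaled by the pooled mean and
    standard deviation of its joint (mean bin, sd bin) cluster."""
    means = [statistics.fmean(v) for v in data]
    sds = [statistics.stdev(v) for v in data]
    cells = {}
    for j, key in enumerate(zip(quantile_bins(means, k_mean),
                                quantile_bins(sds, k_var))):
        cells.setdefault(key, []).append(j)
    out = [None] * len(data)
    for members in cells.values():
        cluster_mean = statistics.fmean(means[j] for j in members)
        # pooled sd: square root of the average within-variable variance
        cluster_sd = statistics.fmean(sds[j] ** 2 for j in members) ** 0.5
        for j in members:
            out[j] = [(x - cluster_mean) / cluster_sd for x in data[j]]
    return out


data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 11.0, 15.0], [9.0, 12.0, 13.0]]
z = regularize_sketch(data)
```

Pooling within joint (mean, variance) clusters is what lets each variable borrow strength from variables with similar first and second moments; with one variable per cluster the sketch falls back to ordinary per-variable standardization.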
END-USER REGULARIZED TEST-STATISTICS FUNCTIONS
mvrt.test (*) Function for Computing Mean-Variance Regularized T-test Statistic and Its Significance.
End-user function for computing the MVR t-test statistic and its significance (p-value) under a sample-group homoscedasticity or heteroscedasticity assumption. The function takes advantage of the R package parallel, which allows users to create a cluster of workstations on local and/or remote machines, enabling parallel execution of this function that scales with the number of available CPU cores.
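A Welch-type t-statistic in which each variable's per-group variance is replaced by a pooled value borrowed from variables with similar variances can be sketched as follows. This is a hypothetical Python illustration, not the package's mvrt.test: the equal-frequency binning rule, the helper names, and the choice of k are assumptions, and no p-values are computed.

```python
import math
import statistics


def binned_pooled_var(variances, k):
    """Replace each variance by the mean variance of its equal-frequency bin,
    where bins are formed by ranking the variances and cutting into k groups."""
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    bins = {}
    for rank, i in enumerate(order):
        bins.setdefault(rank * k // len(variances), []).append(i)
    pooled = [0.0] * len(variances)
    for members in bins.values():
        value = statistics.fmean(variances[i] for i in members)
        for i in members:
            pooled[i] = value
    return pooled


def regularized_t(group1, group2, k=2):
    """group1, group2: lists of variables (same variables, same order), each a
    list of observations. Returns one Welch-type t-statistic per variable,
    using binned, pooled variances for each group separately."""
    n1, n2 = len(group1[0]), len(group2[0])
    v1 = binned_pooled_var([statistics.variance(v) for v in group1], k)
    v2 = binned_pooled_var([statistics.variance(v) for v in group2], k)
    t = []
    for j in range(len(group1)):
        diff = statistics.fmean(group1[j]) - statistics.fmean(group2[j])
        t.append(diff / math.sqrt(v1[j] / n1 + v2[j] / n2))
    return t
```

With k equal to the number of variables, every bin holds a single variable and the statistic reduces to the ordinary Welch t-statistic; smaller k trades variable-specific detail for more stable denominators.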
END-USER DIAGNOSTIC PLOTS FOR QUALITY CONTROL
cluster.diagnostic Function for Plotting Summary Cluster Diagnostic Plots.
Plot similarity statistic profiles and the optimal joint clustering configuration for the means and the variances by group. Plot quantile profiles of means and standard deviations by group and for each clustering configuration, to check that the distributions of first and second moments of the MVR-transformed data approach their respective null distributions under the optimal configuration found, assuming independence and normality of all the variables.

target.diagnostic Function for Plotting Summary Target Moments Diagnostic Plots.
Plot comparative distribution densities of the means and standard deviations of the data before and after Mean-Variance Regularization, to check for location shifts between the observed first and second moments and their expected target values under a target-centered homoscedastic model. Plot comparative QQ scatterplots to look for departures between the observed distributions of the first and second moments of the MVR-transformed data and their theoretical distributions, assuming independence and normality of all the variables.
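The checks behind these plots can be approximated numerically. The hypothetical Python helpers below, which are not part of the package, measure the median shift of the per-variable means and standard deviations away from assumed targets of 0 and 1, and pair the sorted per-variable means with quantiles of N(0, 1/n), the distribution of the mean of n independent standard normal values. Those pairs are what a QQ scatterplot of the means would display.

```python
import statistics
from statistics import NormalDist


def moment_shift(data, target_mean=0.0, target_sd=1.0):
    """Median shift of the per-variable sample means and standard deviations
    away from their target values."""
    means = [statistics.fmean(v) for v in data]
    sds = [statistics.stdev(v) for v in data]
    return (statistics.median(means) - target_mean,
            statistics.median(sds) - target_sd)


def qq_means(data):
    """(theoretical, observed) quantile pairs for the per-variable sample
    means, compared with N(0, 1/n): the distribution of the mean of n iid
    N(0, 1) values."""
    n = len(data[0])
    null = NormalDist(mu=0.0, sigma=1.0 / n ** 0.5)
    observed = sorted(statistics.fmean(v) for v in data)
    p = len(observed)
    theoretical = [null.inv_cdf((i + 0.5) / p) for i in range(p)]
    return list(zip(theoretical, observed))
```

Shifts close to (0, 0) and QQ pairs lying near the identity line are the numerical counterparts of the visual checks described above.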

stabilization.diagnostic Function for Plotting Summary Variance Stabilization Diagnostic Plots.
Plot comparative variance-mean plots to check the variance stabilization across variables before and after Mean-Variance Regularization.
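A variance-mean plot of this kind can be summarized by ordering the variables by their mean and smoothing their standard deviations with a running median; a roughly flat trend suggests that the variance no longer depends on the mean. The Python helper below is a hypothetical sketch in the spirit of such plots (the running-median smoother and the window size are assumptions), not the package's plotting code.

```python
import statistics


def mean_sd_trend(data, window=3):
    """Order variables by their sample mean and return a running median of
    their sample standard deviations (window width, truncated at the ends)."""
    means = [statistics.fmean(v) for v in data]
    sds = [statistics.stdev(v) for v in data]
    order = sorted(range(len(data)), key=lambda j: means[j])
    sd_by_mean = [sds[j] for j in order]
    half = window // 2
    trend = []
    for r in range(len(sd_by_mean)):
        lo, hi = max(0, r - half), min(len(sd_by_mean), r + half + 1)
        trend.append(statistics.median(sd_by_mean[lo:hi]))
    return trend


# toy data whose standard deviation grows with the mean
data = [[10 * k - k, 10 * k, 10 * k + k] for k in range(1, 6)]
trend = mean_sd_trend(data)
```

For the toy data above the trend increases with the mean; applying the same helper after a successful variance stabilization should give a roughly constant trend.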

normalization.diagnostic Function for Plotting Summary Normalization Diagnostic Plots.
Plot comparative Box-Whisker and Heatmap plots of variables across samples to check the effectiveness of normalization before and after Mean-Variance Regularization.
OTHER END-USER FUNCTIONS
MVR.news Display the MVR Package News
Function to display the NEWS log file of updates to the MVR package.
END-USER DATASETS
A Real dataset coming from a quantitative proteomics experiment, consisting of n=6 samples split into a control ("M") and a treated group ("S") with p=9052 unique peptides or predictor variables. This is a balanced design with two sample groups (G=2), under unequal sample group variance.

A Synthetic dataset with n=10 observations (samples) and p=100 variables, where nvar=20 of them are significantly different between the two sample groups. This is a balanced design with two sample groups (G=2), under unequal sample group variance.
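A dataset with the same design as the synthetic one described above (balanced, two groups, n = 10, p = 100, nvar = 20 differential variables, unequal group variances) can be simulated as follows. This is a hypothetical Python sketch: the mean shift, the two group standard deviations, and the seed are arbitrary assumptions, and it does not reproduce the dataset shipped with the package.

```python
import random


def synthetic_dataset(n=10, p=100, nvar=20, shift=1.0, sds=(1.0, 2.0), seed=0):
    """Simulate a balanced two-group design: n samples (rows) x p variables.
    The first nvar variables have their mean shifted by `shift` in group 2,
    and each group has its own standard deviation (unequal group variances).
    Returns (data, groups)."""
    rng = random.Random(seed)
    groups = [1] * (n // 2) + [2] * (n - n // 2)
    data = []
    for g in groups:
        row = []
        for j in range(p):
            mu = shift if (g == 2 and j < nvar) else 0.0
            row.append(rng.gauss(mu, sds[g - 1]))
        data.append(row)
    return data, groups


data, groups = synthetic_dataset()
```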

Known Bugs/Problems: None at this time.

Acknowledgments
This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. This project was partially funded by the National Institutes of Health (P30-CA043703).

Author(s)
Jean-Eudes Dazard, Ph.D. <jean-eudes.dazard@case.edu>
Hua Xu, Ph.D. <huaxu77@gmail.com>
Alberto Santana, MBA <ahs4@case.edu>

Maintainer: Jean-Eudes Dazard, Ph.D. <jean-eudes.dazard@case.edu>

References
Dazard, J-E. and Rao, J. S. (2010). "Regularized Variance Estimation and Variance Stabilization of High-Dimensional Data." In JSM Proceedings, Section for High-Dimensional Data Analysis and Variable Selection. Vancouver, BC, Canada: American Statistical Association IMS - JSM, 5295-5309.
Dazard, J-E., Xu, H. and Rao, J. S. (2011). "R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization." In JSM Proceedings, Section for Statistical Programmers and Analysts. Miami Beach, FL, USA: American Statistical Association IMS - JSM, 3849-3863.
Dazard, J-E. and Rao, J. S. (2012). "Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data." Comput. Statist. Data Anal., 56(7):2317-2333.

See Also
makeCluster (R package parallel)
justvsn (R package vsn): Variance stabilization and calibration for microarray data (Huber, 2002)
eBayes (R package limma): Bayesian Regularized t-test statistic (Smyth, 2004)
samr (R package samr): SAM Regularized t-test statistic (Tusher et al., 2001; Storey, 2003)
matest (R package maanova): James-Stein shrinkage estimator-based Regularized t-test statistic (Cui et al., 2005)
ebam (R package siggenes): Empirical Bayes Regularized z-test statistic (Efron, 2001)
bayesT: Hierarchical Bayesian Regularized t-test statistic (Baldi et al., 2001)