SPRING: Semi-Parametric Rank-based approach for INference in...

Semi-Parametric Rank-based approach for INference in Graphical model (SPRING)


SPRING follows the neighborhood selection methodology of the "mb" method (Meinshausen and Bühlmann, 2006).


SPRING(
  data,
  quantitative = FALSE,
  method = "mb",
  lambda.min.ratio = 0.01,
  nlambda = 20,
  lambdaseq = exp(seq(log(0.6), log(0.6 * lambda.min.ratio), length.out = nlambda)),
  seed = 10010,
  ncores = 1,
  thresh = 0.1,
  subsample.ratio = 0.8,
  rep.num = 20,
  Rtol = 1e-06,
  verbose = TRUE,
  verboseR = FALSE,
  Rmethod = "original"
)



data: n by p matrix of microbiome count data, either quantitative or compositional counts. Each row represents a subject/sample and each column represents an OTU (operational taxonomic unit).


quantitative: default is FALSE, which means the input "data" is compositional and will be normalized with the mclr transformation inside the function. If TRUE, the input is treated as quantitative counts and no normalization is applied.
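The mclr (modified centered log-ratio) normalization applied to compositional input can be pictured roughly as follows. This is a minimal base-R sketch, not the package's implementation: the name mclr_sketch, the atleast argument, and the use of one global shift for the whole matrix are assumptions made for illustration.

```r
# Hypothetical sketch of a modified centered log-ratio transform.
# Zeros stay zero; nonzero entries are log-transformed, centered within each
# sample (row), and then shifted so that every nonzero value is positive.
mclr_sketch <- function(counts, atleast = 1) {
  counts <- as.matrix(counts)
  out <- matrix(0, nrow(counts), ncol(counts), dimnames = dimnames(counts))
  for (i in seq_len(nrow(counts))) {
    pos <- counts[i, ] > 0
    if (any(pos)) {
      logs <- log(counts[i, pos])
      out[i, pos] <- logs - mean(logs)  # center over nonzero entries only
    }
  }
  nz <- counts > 0
  if (any(nz)) {
    # Assumed: one shift for the whole matrix, chosen so the smallest
    # nonzero value equals `atleast`.
    shift <- abs(min(out[nz])) + atleast
    out[nz] <- out[nz] + shift
  }
  out
}

counts <- matrix(c(1, 0, 4,
                   2, 2, 0), nrow = 2, byrow = TRUE)
transformed <- mclr_sketch(counts)
```

Because each row is centered on the log scale, rescaling a row (for example dividing by its row sum) leaves the result of this sketch unchanged.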


method: graph estimation method. Currently, only the "mb" method is available.


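lambda.min.ratio: ratio of the smallest to the largest value in the default lambda sequence (see lambdaseq). The default is 0.01.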


nlambda: number of values in the lambda sequence. The default is 20.


lambdaseq: a sequence of decreasing positive numbers that controls the regularization. The default sequence has 20 values equally spaced on a logarithmic scale from 0.6 to 0.006. Users can supply their own sequence to override the default. If set to "data-specific", the lambda sequence is generated from the rank-based correlation matrix estimated from the data.
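As a quick check of the default sequence, the expression from the usage above can be evaluated on its own with the default lambda.min.ratio and nlambda values. Only base R is used here:

```r
# Reproduce the default regularization path from the function signature.
lambda.min.ratio <- 0.01
nlambda <- 20
lambdaseq <- exp(seq(log(0.6), log(0.6 * lambda.min.ratio), length.out = nlambda))

length(lambdaseq)          # number of lambda values
range(lambdaseq)           # smallest and largest value (about 0.006 and 0.6)
all(diff(lambdaseq) < 0)   # the sequence is strictly decreasing
```

Consecutive values differ by a constant ratio, which is what "equally spaced on a logarithmic scale" means.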


seed: the seed for subsampling.


ncores: number of cores to use for subsampling. The default is 1.


thresh: threshold for the StARS selection criterion. 0.1 is recommended (default). A smaller threshold returns a sparser graph.
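To see why a smaller threshold gives a sparser graph, here is a minimal sketch of a StARS-style selection rule. It is an illustration under stated assumptions, not the code used by pulsar: the function stars_select_sketch and its input format (a list of edge selection probability matrices ordered along a decreasing lambda path) are hypothetical.

```r
# Hypothetical sketch: choose the densest graph whose instability stays
# at or below `thresh`.
stars_select_sketch <- function(prob_list, thresh = 0.1) {
  # Instability at one lambda: mean of 2 * theta * (1 - theta) over edges.
  instability <- vapply(prob_list, function(theta) {
    ut <- theta[upper.tri(theta)]
    mean(2 * ut * (1 - ut))
  }, numeric(1))
  # Lambda decreases along the path, so enforce non-decreasing instability.
  monotone <- cummax(instability)
  ok <- which(monotone <= thresh)
  # 0 signals that no lambda satisfies the threshold.
  opt.index <- if (length(ok) > 0) max(ok) else 0
  list(summary = instability, opt.index = opt.index)
}

# Toy path with three lambda values and constant edge probabilities.
p <- 4
make_theta <- function(prob) {
  m <- matrix(prob, p, p)
  diag(m) <- 0
  m
}
prob_list <- lapply(c(0, 0.05, 0.5), make_theta)

loose  <- stars_select_sketch(prob_list, thresh = 0.1)
strict <- stars_select_sketch(prob_list, thresh = 0.05)
```

In this toy path the stricter threshold stops at an earlier index, that is, at a larger lambda and hence a sparser graph.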


subsample.ratio: fraction of the samples drawn in each subsample. The default is 0.8. The recommended value is 10*sqrt(n)/n for n > 144 and 0.8 otherwise.
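The recommendation above can be written as a small helper. The function name recommended_subsample_ratio is hypothetical and is not part of the package:

```r
# Hypothetical helper encoding the recommended subsampling ratio:
# 10 * sqrt(n) / n when n > 144, and 0.8 otherwise.
recommended_subsample_ratio <- function(n) {
  if (n > 144) 10 * sqrt(n) / n else 0.8
}

recommended_subsample_ratio(100)    # small sample size: 0.8
recommended_subsample_ratio(1000)   # about 0.316
```

For a dataset with n = 1000 samples, this gives roughly 0.316, so each subsample would contain about 316 samples.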


rep.num: the number of subsampling repetitions for StARS edge stability selection. The default is 20.


Rtol: desired accuracy when solving the bridge function in the estimateR function.


verbose: if verbose = FALSE, printing of tracing information from HUGE (High-dimensional Undirected Graph Estimation) for the specified method (currently only "mb" is available) is disabled. The default is TRUE.


verboseR: if verboseR = FALSE, messages about whether nearPD is used when calculating rank-based correlation matrices are suppressed. The default is FALSE.


Rmethod: the calculation method for the latent correlation, either "original" or "approx". If Rmethod = "approx", a multilinear approximation is used, which is much faster than the original method. If Rmethod = "original", optimization of the bridge inverse function is used. The default is "original".


SPRING returns a list containing

  • output: Output results of pulsar::pulsar based on StARS criterion. It contains:

    • merge: a list of length nlambda; each element is a matrix of edge selection probabilities. For each lambda value, these probabilities are computed across the rep.num subsamples.

    • summary: the summary statistic over rep.num graphs at each value of lambda

    • opt.index: index (along the path) of optimal lambda selected by the criterion at the desired threshold. Will return 0 if no optimum is found or NULL if selection for the criterion is not implemented.

    • criterion: we use StARS for our stability criterion.

  • fit: Output results of pulsar::refit function. It contains:

    • est: a list containing

      • beta: estimates of the beta coefficient matrices (of size p by p) by the "mb" method on the whole data at each value of the full lambda sequence.

      • path: estimates of the precision matrix (of size p by p) on the whole data at each value of the full lambda sequence.

    • refit: final estimates of precision matrix (of size p by p).

  • lambdaseq: lambda sequence used in the analysis
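Since opt.index may be 0 or NULL, it helps to guard against those cases before indexing into lambdaseq. The helper below is a hypothetical sketch that takes the two documented values as plain arguments. How they are nested inside the returned object is not assumed here.

```r
# Hypothetical helper: map the selected index back to its lambda value,
# returning NA when no optimum was found (0) or none is available (NULL).
selected_lambda <- function(opt.index, lambdaseq) {
  if (is.null(opt.index) || opt.index == 0) {
    return(NA_real_)
  }
  lambdaseq[opt.index]
}

path <- c(0.6, 0.3, 0.1)     # toy decreasing lambda sequence
selected_lambda(2, path)     # second value on the path
selected_lambda(0, path)     # no optimum found
selected_lambda(NULL, path)  # criterion not implemented
```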


Meinshausen N. and Bühlmann P. (2006) "High-dimensional graphs and variable selection with the lasso", The Annals of Statistics, Vol 34, No. 3, 1436 - 1462.

Yoon G., Gaynanova I. and Müller C. (2019) "Microbial Networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data", Frontiers in Genetics, 10:516.


rm(list = ls())

# Load the synthetic count data
data("QMP") # n = 1000 and p = 100 synthetic dataset

# SPRING on Synthetic Data, when assuming the data as quantitative counts.
# The same setting used in Yoon et al. (2019) Frontiers in Genetics.
## Not run: 
# This takes around 23 minutes.
fit.spring <- SPRING(QMP, quantitative = TRUE, lambdaseq = "data-specific",
                     nlambda = 50, seed = 10010, ncores = 2, rep.num = 50)

## End(Not run)

# SPRING on Compositional data. Row sums are scaled to 1. Then, mclr-transformation will be applied.
## Not run: 
compoData <- QMP/rowSums(QMP)
fit.spring <- SPRING(compoData, quantitative = FALSE, lambdaseq = "data-specific",
                     nlambda = 10, rep.num = 10)

## End(Not run)

