SPRING: Semi-Parametric Rank-based approach for INference in...

SPRING R Documentation

Semi-Parametric Rank-based approach for INference in Graphical model (SPRING)

Description

SPRING follows the neighborhood selection methodology outlined in the "mb" method (Meinshausen and Buhlmann (2006)).

Usage

SPRING(
  data,
  quantitative = FALSE,
  method = "mb",
  lambda.min.ratio = 0.01,
  nlambda = 20,
  lambdaseq = exp(seq(log(0.6), log(0.6 * lambda.min.ratio), length.out = nlambda)),
  seed = 10010,
  ncores = 1,
  thresh = 0.1,
  subsample.ratio = 0.8,
  rep.num = 20,
  Rtol = 1e-06,
  verbose = TRUE,
  verboseR = FALSE,
  Rmethod = "original"
)

Arguments

data

n by p matrix of microbiome count data, either quantitative or compositional counts. Each row represents a subject/sample and each column represents an OTU (operational taxonomic unit).

quantitative

default is FALSE, which means the input "data" is compositional and will be normalized with the mclr transformation inside the function. If TRUE, the input is treated as "quantitative" counts and no normalization is applied.

method

graph estimation method. Currently, only the "mb" method is available.

lambda.min.ratio

ratio of the smallest to the largest lambda in the default lambdaseq. The default is 0.01.

nlambda

number of lambda values in the default lambdaseq. The default is 20.

lambdaseq

a sequence of decreasing positive numbers to control the regularization. The default sequence has 20 values equally spaced on a logarithmic scale from 0.6 to 0.006. Users can specify a sequence to override the default. If set to "data-specific", the lambda sequence is generated from the rank-based correlation matrix estimated from the data.

seed

the seed for subsampling.

ncores

number of cores to use for subsampling. The default is 1.

thresh

threshold for the StARS selection criterion. 0.1 is recommended (default). A smaller threshold returns a sparser graph.

subsample.ratio

the subsampling ratio. The default is 0.8. The recommended value is 10*sqrt(n)/n for n > 144, or 0.8 otherwise.

rep.num

the number of subsampling repetitions for StARS edge stability selection. The default value is 20.

Rtol

Desired accuracy when calculating the solution of the bridge function in the estimateR function. The default is 1e-06.

verbose

If verbose = FALSE, printing of tracing information for HUGE (High-dimensional Undirected Graph Estimation) with the specified method (currently only "mb" is available) is disabled. The default value is TRUE.

verboseR

If verboseR = FALSE, printing of information on whether nearPD is used when calculating rank-based correlation matrices is disabled. The default value is FALSE.

Rmethod

The calculation method of latent correlation, either "original" or "approx". If Rmethod = "approx", the multilinear approximation method is used, which is much faster than the original method. If Rmethod = "original", optimization of the bridge inverse function is used. The default shown in Usage is "original".
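
Two of the argument descriptions above involve small calculations, sketched here in base R for illustration. The lambda sequence expression is copied from the Usage signature. The helper recommended_subsample_ratio is a hypothetical name introduced for this sketch and is not part of SPRING; it only encodes the rule stated under subsample.ratio.

```r
# Default lambda sequence, using the expression from the Usage signature.
lambda.min.ratio <- 0.01
nlambda <- 20
lambdaseq <- exp(seq(log(0.6), log(0.6 * lambda.min.ratio), length.out = nlambda))

# Decreasing values, equally spaced on the log scale, from 0.6 down to 0.006.
range(lambdaseq)

# Hypothetical helper (not part of SPRING): the recommended subsampling
# ratio as described for the subsample.ratio argument.
recommended_subsample_ratio <- function(n) {
  if (n > 144) 10 * sqrt(n) / n else 0.8
}

recommended_subsample_ratio(100)   # n <= 144, so the fixed value 0.8 applies
recommended_subsample_ratio(1000)  # n > 144, so 10 * sqrt(n) / n applies
```

Under these assumptions, the computed ratio could be passed as subsample.ratio, and a custom decreasing sequence could be passed as lambdaseq in place of the default.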

Value

SPRING returns a data.frame containing

  • output: Output results of pulsar::pulsar based on StARS criterion. It contains:

    • merge: a list of length nlambda; each element contains a matrix of edge selection probabilities. For each lambda value, the edge selection probability is calculated across the rep.num subsamples.

    • summary: the summary statistic over rep.num graphs at each value of lambda

    • opt.index: index (along the path) of optimal lambda selected by the criterion at the desired threshold. Will return 0 if no optimum is found or NULL if selection for the criterion is not implemented.

    • criterion: we use StARS for our stability criterion.

  • fit: Output results of pulsar::refit function. It contains:

    • est: a data frame containing

      • beta: Estimates of the beta coefficient matrices (of size p by p) by the "mb" method on the whole data at each value of the full lambda sequence.

      • path: Estimates of the precision matrix (of size p by p) on the whole data at each value of the full lambda sequence.

    • refit: final estimates of precision matrix (of size p by p).

  • lambdaseq: lambda sequence used in the analysis
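
To make the merge and summary entries above more concrete, the following toy sketch shows how an edge selection probability matrix and a StARS-style instability summary could be computed for a single lambda value. It is an illustration on simulated graphs, not the code used by pulsar::pulsar; sim_graph, p, and the edge probability 0.3 are assumptions made for this example.

```r
set.seed(1)
p <- 5         # number of variables (toy size)
rep.num <- 20  # number of subsamples, matching the default rep.num

# Hypothetical helper: simulate one symmetric 0/1 adjacency matrix,
# standing in for the graph estimated on one subsample.
sim_graph <- function(p, prob = 0.3) {
  A <- matrix(0, p, p)
  A[upper.tri(A)] <- rbinom(p * (p - 1) / 2, 1, prob)
  A + t(A)
}
graphs <- replicate(rep.num, sim_graph(p), simplify = FALSE)

# Edge selection probability: the fraction of subsamples in which
# each edge was selected (the analogue of one element of merge).
sel.prob <- Reduce(`+`, graphs) / rep.num

# StARS-style instability: 2 * theta * (1 - theta) per edge,
# averaged over the p * (p - 1) / 2 possible edges.
instab <- 2 * sel.prob * (1 - sel.prob)
instab.summary <- mean(instab[upper.tri(instab)])
instab.summary
```

In the StARS procedure, a summary of this kind is computed at every lambda and compared against thresh to choose the optimal index; the sketch covers only the single-lambda calculation.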

References

Meinshausen N. and Buhlmann P. (2006) "High-dimensional graphs and variable selection with the lasso", The Annals of Statistics, Vol 34, No. 3, 1436 - 1462.

Yoon G., Gaynanova I. and Müller C. (2019) "Microbial Networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data", Frontiers in Genetics, 10:516.

Examples

rm(list = ls())
library(SPRING)

# Load the synthetic count data
data("QMP") # n = 1000 and p = 100 synthetic dataset

# SPRING on Synthetic Data, when assuming the data as quantitative counts.
# The same setting used in Yoon et al. (2019) Frontiers in Genetics.
## Not run: 
# This takes around 23 minutes.
fit.spring <- SPRING(QMP, quantitative = TRUE, lambdaseq = "data-specific",
                     nlambda = 50, seed = 10010, ncores = 2, rep.num = 50)

## End(Not run)

# SPRING on Compositional data. Row sums are scaled to 1. Then, mclr-transformation will be applied.
## Not run: 
compoData <- QMP/rowSums(QMP)
fit.spring <- SPRING(compoData, quantitative = FALSE, lambdaseq = "data-specific",
                     nlambda = 10, rep.num = 10)

## End(Not run)

GraceYoon/SPRING documentation built on June 29, 2022, 4:14 p.m.