combi: Perform model-based data integration

View source: R/combi.R

combi R Documentation

Description

Perform model-based data integration

Usage

combi(
  data,
  M = 2L,
  covariates = NULL,
  distributions,
  compositional,
  maxIt = 300L,
  tol = 0.001,
  verbose = FALSE,
  prevCutOff = 0.95,
  minFraction = 0.1,
  logTransformGaussian = TRUE,
  confounders = NULL,
  compositionalConf = rep(FALSE, length(data)),
  nleq.control = list(maxit = 1000L, cndtol = 1e-16),
  record = TRUE,
  weights = NULL,
  fTol = 1e-05,
  meanVarFit = "spline",
  maxFeats = 2000,
  dispFreq = 10L,
  allowMissingness = FALSE,
  biasReduction = TRUE,
  maxItFeat = 20L,
  initPower = 1
)

Arguments

data

A list of data objects with the same number of samples. See details.

M

the required dimension of the fit, a non-negative integer

covariates

a dataframe of n samples with sample-specific variables.

distributions

a character vector describing which distributional assumption should be used. See details.

compositional

A logical vector with the same length as "data", indicating if the datasets should be treated as compositional

maxIt

an integer, the maximum number of iterations

tol

A small scalar, the convergence tolerance

verbose

Logical. Should verbose output be printed to the console?

prevCutOff

a scalar, the prevalence cutoff for the trimming.

minFraction

a scalar, each taxon's total abundance should equal at least the number of samples n times minFraction, otherwise it is trimmed.

logTransformGaussian

A boolean, should the Gaussian data be log-transformed, i.e. are they log-normal?

confounders

A dataframe or a list of dataframes with the same length as data. In the former case the same dataframe is used for conditioning in every view; in the latter case each view has its own conditioning variables (or NULL).

compositionalConf

A logical vector with the same length as "data", indicating if the datasets should be treated as compositional when correcting for confounders. Numerical problems may occur when set to TRUE.

nleq.control

A list of arguments to the nleqslv function

record

A boolean, should intermediate estimates be stored? Can be useful to check convergence

weights

A character string, either 'marginal' or 'uniform', indicating how the feature parameters should be weighted in the normalization

fTol

The tolerance for solving the estimating equations

meanVarFit

The type of mean variance fit, see details

maxFeats

The maximal number of features for a Newton-Raphson procedure to be feasible

dispFreq

An integer, the period after which the variances should be reestimated

allowMissingness

A boolean, should NA values be allowed?

biasReduction

A boolean, should bias reduction be applied to allow for confounder correction in groups with all zeroes? Not guaranteed to work

maxItFeat

An integer, the maximum allowed number of iterations in the estimation of the feature parameters

initPower

The power to be applied to the residual matrix used to calculate the starting value. Must be positive; can be tweaked in case of numerical problems (i.e. infinite values returned by nleqslv)
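
The trimming controlled by prevCutOff and minFraction can be pictured with a minimal base R sketch. This is not the package's internal code: the helper name trimFeatures and the toy matrix are hypothetical, and reading prevCutOff as the maximum tolerated fraction of zero entries per feature is an assumption. Only the minFraction rule is taken directly from the argument description above.

```r
# Hypothetical helper illustrating the trimming rules; not part of combi.
trimFeatures <- function(mat, prevCutOff = 0.95, minFraction = 0.1) {
    n <- nrow(mat)  # samples in rows, features in columns
    # Assumption: prevCutOff caps the fraction of zero entries per feature
    zeroFrac <- colMeans(mat == 0)
    # Documented rule: total abundance must be at least n * minFraction
    totals <- colSums(mat)
    keep <- zeroFrac <= prevCutOff & totals >= n * minFraction
    mat[, keep, drop = FALSE]
}

# Toy count matrix: 40 samples, 3 features
toy <- cbind(
    common = rep(5, 40),            # no zeroes, total 200
    rare   = c(1, 1, 1, rep(0, 37)),  # total 3 is below 40 * 0.1 = 4
    sparse = c(100, rep(0, 39))     # 97.5% zeroes exceeds the 0.95 cutoff
)
trimmed <- trimFeatures(toy)
colnames(trimmed)  # only "common" passes both rules
```

Under these assumptions a feature must pass both rules to be retained, so a single very large count does not rescue a feature that is zero in almost every sample.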

Details

Data can be provided as raw matrices with features in the columns, or as phyloseq, SummarizedExperiment or ExpressionSet objects.

Estimation of the independence model and of the view-wise parameters can be parallelized. See ?BiocParallel::bplapply and ?BiocParallel::register.

meanVarFit = "spline" yields a cubic spline fit for the abundance-variance trend, "cubic" gives a third-degree polynomial. Both converge to the diagonal line with slope 1 for small means.

Each entry of distributions can be either "quasi" for quasi-likelihood or "gaussian" for Gaussian data.
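
The pointer to ?BiocParallel::register suggests that the parallel backend is chosen by registering it before calling combi. A minimal sketch of that step follows; it assumes the BiocParallel package is installed and that combi picks up the registered default backend, which this page does not state explicitly.

```r
# Assumes the BiocParallel package is installed.
library(BiocParallel)

# Run everything sequentially (simplest option, useful for debugging)
register(SerialParam())

# Alternatively, use two worker processes. MulticoreParam relies on forking
# and is not supported on Windows, where SnowParam(workers = 2) is the
# usual substitute.
# register(MulticoreParam(workers = 2))

# Inspect the backend that bplapply() will use by default
bpparam()
```

Starting with SerialParam() and switching to a multi-worker backend only once a fit runs cleanly tends to make errors easier to trace.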

Value

An object of the "combi" class, containing all information on the data integration and fitting procedure

Examples

data(Zhang)
# The method works on several datasets at once and is not very fast,
# hence the "Not run" statement
## Not run: 
#Unconstrained
microMetaboInt = combi(
    list("microbiome" = zhangMicrobio, "metabolomics" = zhangMetabo),
    distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
    logTransformGaussian = FALSE, verbose = TRUE)
#Constrained
microMetaboIntConstr = combi(
    list("microbiome" = zhangMicrobio, "metabolomics" = zhangMetabo),
    distributions = c("quasi", "gaussian"), compositional = c(TRUE, FALSE),
    logTransformGaussian = FALSE, covariates = zhangMetavars, verbose = TRUE)
## End(Not run)
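
When starting from raw matrices rather than the Zhang objects, the inputs might be assembled as in the sketch below. All data are simulated and the object names (views, conf, counts, intens) are hypothetical. The checks only restate the length requirements from the argument descriptions, and the combi() call is left commented out because the fit is slow and toy data are not meant to give meaningful results.

```r
set.seed(1)
n <- 30  # every view must have the same number of samples

# Raw matrices with samples in rows and features in columns (simulated)
counts <- matrix(rpois(n * 50, lambda = 10), nrow = n,
    dimnames = list(paste0("s", seq_len(n)), paste0("taxon", 1:50)))
intens <- matrix(rlnorm(n * 20), nrow = n,
    dimnames = list(paste0("s", seq_len(n)), paste0("metab", 1:20)))
views <- list(microbiome = counts, metabolomics = intens)

# One entry per view
distributions <- c("quasi", "gaussian")
compositional <- c(TRUE, FALSE)

# Per-view confounders: a list as long as views, NULL meaning no conditioning
conf <- list(
    data.frame(batch = factor(rep(c("a", "b"), length.out = n))),
    NULL
)

# Sanity checks mirroring the documented requirements
stopifnot(
    length(unique(vapply(views, nrow, integer(1)))) == 1,
    length(distributions) == length(views),
    length(compositional) == length(views),
    length(conf) == length(views)
)

# fit <- combi(views, distributions = distributions,
#     compositional = compositional, confounders = conf)
```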

CenterForStatistics-UGent/compIntegrate documentation built on Aug. 4, 2023, 1:08 p.m.