aldex.clr.function: Compute an 'aldex.clr' Object Generate Monte Carlo samples of...

aldex.clr.function    R Documentation

Compute an aldex.clr Object

Description

Generate Monte Carlo samples of the Dirichlet distribution for each sample. Convert each instance using a centered log-ratio transform. The resulting aldex.clr object is the input for all further analyses.

Usage

aldex.clr.function(
  reads,
  conds,
  mc.samples = 128,
  denom = "all",
  verbose = FALSE,
  useMC = FALSE,
  summarizedExperiment = NULL,
  gamma = NULL
)

Arguments

reads

A data.frame or RangedSummarizedExperiment object containing non-negative integers only and with unique names for all rows and columns, where each row is a different gene and each column represents a sequencing read-count sample. Rows with 0 reads in each sample are deleted prior to analysis.
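The input requirements above can be checked before calling the function. The following is a minimal base R sketch, not the package's own validation code; `check_reads` is a hypothetical helper name and the toy counts are invented for illustration.

```r
# Hypothetical helper (not part of the package): validate a count table
# and drop rows that are zero in every sample.
check_reads <- function(reads) {
  m <- as.matrix(reads)
  stopifnot(
    is.numeric(m),
    all(m >= 0),                                         # non-negative
    all(m == round(m)),                                  # integer-valued
    !is.null(rownames(m)), !anyDuplicated(rownames(m)),  # unique row names
    !is.null(colnames(m)), !anyDuplicated(colnames(m))   # unique column names
  )
  reads[rowSums(m) > 0, , drop = FALSE]
}

# Toy table: rows are genes, columns are samples.
toy <- data.frame(
  T1 = c(0, 20, 3, 0),
  T2 = c(2, 12, 2, 0),
  N1 = c(0, 19, 0, 0),
  row.names = c("Gene_00001", "Gene_00002", "Gene_00003", "Gene_00004")
)

clean <- check_reads(toy)   # Gene_00004 is zero everywhere and is dropped
```

Running a check like this first makes it easier to see which rows will disappear before the analysis starts.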

conds

A vector containing a descriptor for the samples, allowing them to be grouped and compared.

mc.samples

The number of Monte Carlo instances to use to estimate the underlying distributions; since we are estimating central tendencies, 128 is usually sufficient, but larger numbers may be needed with small sample sizes.
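To make the role of the Monte Carlo instances concrete, here is a minimal base R sketch of Dirichlet sampling followed by a centered log-ratio transform for a single sample. It is an illustration only, not the package's implementation: the helper names `rdirichlet_counts` and `clr`, the prior of 0.5 added to each count, and the use of log base 2 are assumptions made for this sketch.

```r
set.seed(1)

# Draw n Dirichlet instances for one sample via normalized gamma variates.
# The prior added to each count is an assumption of this sketch.
rdirichlet_counts <- function(counts, n, prior = 0.5) {
  alpha <- counts + prior
  # Filled column-wise, so row i always uses shape alpha[i].
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = length(alpha))
  sweep(g, 2, colSums(g), "/")   # features x n, each column sums to 1
}

# Centered log-ratio: subtract the mean log value within each instance.
clr <- function(p) {
  lp <- log2(p)
  sweep(lp, 2, colMeans(lp), "-")
}

counts <- c(Gene_00001 = 2, Gene_00002 = 12, Gene_00003 = 2)
p <- rdirichlet_counts(counts, n = 128)
z <- clr(p)
rownames(z) <- names(counts)

rowMeans(z)   # Monte Carlo estimate of each feature's central clr value
```

Increasing `n` reduces the Monte Carlo noise in summaries such as `rowMeans(z)`, which is the trade-off the `mc.samples` argument controls.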

denom

A variable (all, iqlr, zero, lvha, median, user) indicating the features to use as the denominator for the geometric mean calculation. The default "all" uses the geometric mean abundance of all features. Using "median" uses the median abundance of all features. Using "iqlr" uses the features that are between the first and third quartile of the variance of the clr values across all samples. Using "zero" uses the non-zero features in each group as the denominator. This approach is intended for the extreme case where there are many non-zero features in one condition but many zeros in another. Using "lvha" uses features that have low variance (bottom quartile) and high relative abundance (top quartile in every sample). It is also possible to supply a vector of row indices to use as the denominator. Here, the experimentalist determines a priori which rows are thought to be invariant. In the case of RNA-seq, this could include ribosomal protein genes and other housekeeping genes. User-supplied indices should be used with caution: because aldex.clr removes features that are 0 in all samples, the row offsets in the original data may differ from those in the data used by the function.
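The caution about row offsets can be illustrated with a small base R sketch. It shows how a set of presumed-invariant features, chosen by name, maps to different row indices before and after all-zero rows are dropped, and how a log-ratio against a user-chosen denominator differs from one against all features. The toy counts, the variable names, and the 0.5 pseudocount are assumptions of this sketch and not the package's code.

```r
# Toy count table; Gene_00002 is zero in every sample and would be dropped.
reads <- matrix(
  c(5, 0, 40, 8, 100,
    7, 0, 35, 2, 120,
    6, 0, 50, 9,  90),
  nrow = 5,
  dimnames = list(paste0("Gene_0000", 1:5), c("S1", "S2", "S3"))
)

# Features assumed a priori to be invariant, selected by name.
invariant <- c("Gene_00003", "Gene_00005")

# Same filtering rule as described above: remove all-zero rows.
kept <- reads[rowSums(reads) > 0, , drop = FALSE]

idx_original <- match(invariant, rownames(reads))   # 3 5
idx_filtered <- match(invariant, rownames(kept))    # 2 4

# Log-ratios against two different denominators (pseudocount is assumed).
lp <- log2(kept + 0.5)
lr_all  <- sweep(lp, 2, colMeans(lp), "-")                                # "all"
lr_user <- sweep(lp, 2, colMeans(lp[idx_filtered, , drop = FALSE]), "-")  # user rows
```

Selecting the rows by name and converting to indices only after filtering, as done here with `match`, is one way to avoid pointing at the wrong features.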

verbose

Print diagnostic information while running. Useful mainly for debugging if the function fails on large datasets.

useMC

Whether to use multicore processing (default FALSE). If TRUE, multicore processing will be attempted with the BiocParallel package, and serial processing will be used if this is not possible. In practice, serial and multicore processing run at nearly the same speed because of the overhead in setting up the parallel processes.

summarizedExperiment

Must be set to TRUE if the input data are in SummarizedExperiment format (default NULL).

gamma

Use scale simulation if not NULL. If a matrix is supplied, scale simulation will be used assuming that matrix denotes the scale samples. If a numeric is supplied, scale simulation will be applied by relaxing the geometric mean assumption with the numeric representing the standard deviation of the scale distribution.
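The numeric case can be pictured with a short base R sketch. It assumes, for illustration only, that the log2 scale of each Monte Carlo instance is drawn from a normal distribution centered on the value implied by the plain clr (the negative mean of the log2 proportions) with standard deviation `gamma`. The variable names and toy proportions are hypothetical, and this is not the package's implementation.

```r
set.seed(1)
gamma <- 0.5   # standard deviation of the scale distribution

# p: features x Monte Carlo instances; each column holds proportions (toy values).
p <- matrix(c(0.1, 0.6, 0.3,
              0.2, 0.5, 0.3), nrow = 3)
lp <- log2(p)

# Plain clr: the log2 scale is fixed at exactly -mean(log2 p) per instance.
log_scale_clr <- -colMeans(lp)

# Relaxed version: the log2 scale is uncertain, with standard deviation gamma.
log_scale_sim <- rnorm(ncol(lp), mean = log_scale_clr, sd = gamma)

z_clr <- sweep(lp, 2, log_scale_clr, "+")   # columns sum to zero
z_sim <- sweep(lp, 2, log_scale_sim, "+")   # each column shifted by random noise
```

Within an instance the differences between features are unchanged; only the overall offset varies, which is what adds uncertainty about scale to downstream comparisons between conditions.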

Value

The object produced by the clr function contains the log-ratio transformed values for each Monte-Carlo Dirichlet instance, which can be accessed through getMonteCarloInstances(x), where x is the clr function output. Each list element is named by the sample ID. getFeatures(x) returns the features, getSampleIDs(x) returns sample IDs, and getFeatureNames(x) returns the feature names.

Examples

# The 'reads' data.frame or
# RangedSummarizedExperiment object should
# have row and column names that are unique,
# and looks like the following:
#
#              T1a T1b  T2  T3  N1  N2  Nx
#   Gene_00001   0   0   2   0   0   1   0
#   Gene_00002  20   8  12   5  19  26  14
#   Gene_00003   3   0   2   0   0   0   1
#   ... many more rows ...

library(ALDEx2)

data(selex)
# subset for efficiency
selex <- selex[1201:1600, ]
conds <- c(rep("NS", 7), rep("S", 7))
x <- aldex.clr(selex, conds, mc.samples = 4, gamma = NULL, verbose = FALSE)


ggloor/ALDEx_bioc documentation built on Oct. 31, 2023, 1:13 a.m.