unmix: Unmix samples using loss in a variance stabilized space

View source: R/helper.R

unmix R Documentation

Description

Unmixes the samples in x into contributions from the components in pure, using numerical optimization. The components in pure are combined additively on the scale of gene expression (either normalized counts or TPMs), whereas the loss comparing the fitted expression to the samples in x is computed in a variance stabilized space. This task is sometimes referred to as "deconvolution" and can be used, for example, to identify the contributions of various tissues.

Note: some groups have found that the mixing contributions may be more accurate if genes with very low expression across x and pure are removed first. We have not explored this fully.

Note: if the pbapply package is installed, a progress bar is displayed while the mixing components are fit.
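To make the two scales concrete, here is a minimal base R sketch. It is not the package's actual code: it assumes a log-shift transform, log(value + shift), as the variance stabilized space, and the helper name unmix_one is hypothetical. Contributions are added on the expression scale (pure %*% p), the loss is computed after transformation, and the fitted weights are rescaled to sum to 1.

```r
# Hypothetical helper (not part of DESeq2): unmix a single sample y.
# Assumption: log(value + shift) stands in for the variance stabilized space.
unmix_one <- function(y, pure, shift = 1, power = 1) {
  loss <- function(p) {
    # components are added on the expression scale
    fitted <- as.vector(pure %*% p)
    # the loss is evaluated on the transformed scale
    sum(abs(log(y + shift) - log(fitted + shift))^power)
  }
  fit <- optim(par = rep(1, ncol(pure)), fn = loss,
               method = "L-BFGS-B", lower = 0, upper = 100)
  # rescale the non-negative weights so they sum to 1
  fit$par / sum(fit$par)
}

# three genes (rows), three pure components (columns)
pure <- matrix(c(10, 0, 0,
                 0, 0, 10,
                 0, 10, 0), ncol = 3, byrow = TRUE)

# build a sample as an exact mixture with known proportions
true_p <- c(0.5, 0.3, 0.2)
y <- as.vector(pure %*% true_p)

est <- unmix_one(y, pure, shift = 1, power = 2)
round(est, 3)
```

Because the sample here is constructed as an exact mixture, est should land close to true_p. With real data the fit is only approximate, and the choice of transform and loss power affects the result.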

Usage

unmix(x, pure, alpha, shift, power = 1, format = "matrix", quiet = FALSE)

Arguments

x

normalized counts or TPMs of the samples to be unmixed

pure

normalized counts or TPMs of the "pure" samples

alpha

for normalized counts, the dispersion of the data when a negative binomial model is fit. This can be found by examining the asymptotic value of dispersionFunction(dds) when using fitType="parametric", or the mean value when using fitType="mean".

shift

for TPMs, the shift which approximately stabilizes the variance of log shifted TPMs. Can be assessed with vsn::meanSdPlot.

power

the power of the loss function: either 1 (for L1, i.e. absolute, loss) or 2 (for squared loss). Default is 1.

format

"matrix" or "list", default is "matrix". Whether to output just the matrix of mixture components, or a list (see Value).

quiet

suppress the progress bar. Default is FALSE, which shows a progress bar if pbapply is installed.
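As a rough illustration of how a value for alpha might be read off a fitted dispersion trend, the sketch below uses simulated data from makeExampleDESeqDataSet(). The variable names are illustrative only. It assumes the parametric fit succeeds, in which case the fitted coefficients are stored as an attribute of the dispersion function.

```r
library(DESeq2)

set.seed(1)
# simulated counts, for illustration only
dds0 <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds0 <- estimateSizeFactors(dds0)

# parametric trend: take the asymptotic dispersion
dds_par <- estimateDispersions(dds0, fitType = "parametric")
coefs <- attr(dispersionFunction(dds_par), "coefficients")
alpha_param <- unname(coefs["asymptDisp"])

# cross-check: evaluating the trend at a very large mean
# should give nearly the same value
alpha_check <- dispersionFunction(dds_par)(1e6)

# mean fit: the trend is constant, so any mean returns the same value
dds_mean <- estimateDispersions(dds0, fitType = "mean")
alpha_mean <- dispersionFunction(dds_mean)(1)

c(parametric = alpha_param, check = alpha_check, mean = alpha_mean)
```

Either alpha_param or alpha_mean could then be passed as the alpha argument, depending on which fitType was used for the data at hand.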

Value

a matrix of mixture components, with one row per sample in x and one column per "pure" sample. Each row sums to 1. If the input matrices have colnames, these are propagated to the output matrix. If format="list", a list is returned instead, containing: (1) the matrix of mixture components, (2) the correlations, in the variance stabilized space, of the fitted samples to the samples in x, and (3) the fitted samples, as a matrix with the same dimensions as x.
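A short sketch of inspecting the list output follows. It passes plain matrices of artificial values, and it accesses the elements by position, following the order given above, because the element names are not stated here.

```r
library(DESeq2)

# artificial data: 3 genes (rows), 4 samples (columns)
x <- matrix(c(80, 50, 1, 100,
              1, 1, 60, 100,
              0, 50, 60, 100), ncol = 4, byrow = TRUE)
colnames(x) <- paste0("sample", 1:4)

# 3 genes (rows), 3 pure components (columns)
pure <- matrix(c(10, 0, 0,
                 0, 0, 10,
                 0, 10, 0), ncol = 3, byrow = TRUE)
colnames(pure) <- letters[1:3]

res <- unmix(x, pure, alpha = 0.01, format = "list", quiet = TRUE)

mix    <- res[[1]]  # mixture components, one row per sample in x
cors   <- res[[2]]  # correlations of fitted vs. observed, per sample
fitted <- res[[3]]  # fitted samples, same dimensions as x

rowSums(mix)  # each row should sum to 1
dim(fitted)
```

Low values in cors can help flag samples that the components in pure describe poorly.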

Examples


library(DESeq2)

# some artificial data
cts <- matrix(c(80,50,1,100,
                1,1,60,100,
                0,50,60,100), ncol=4, byrow=TRUE)
# make a DESeqDataSet
dds <- DESeqDataSetFromMatrix(cts,
  data.frame(row.names=seq_len(ncol(cts))), ~1)
colnames(dds) <- paste0("sample",1:4)

# note! here you would instead use
# estimateSizeFactors() to do actual normalization
sizeFactors(dds) <- rep(1, ncol(dds))

norm.cts <- counts(dds, normalized=TRUE)

# 'pure' should also have normalized counts...
pure <- matrix(c(10,0,0,
                 0,0,10,
                 0,10,0), ncol=3, byrow=TRUE)
colnames(pure) <- letters[1:3]

# for real data, you need to find alpha after fitting estimateDispersions()
mix <- unmix(norm.cts, pure, alpha=0.01)


mikelove/DESeq2 documentation built on July 25, 2024, 11:11 p.m.