expression_cutoff: Find expression cutoffs

View source: R/expression_cutoff.R

expression_cutoffR Documentation

Find expression cutoffs

Description

This function finds expression cutoffs based on estimating the second breakpoint using segmented applied to two curves. The first curve is the number of features passing the cutoff assuming an initial large drop, an intermediate section with moderate drop, and then a stable tail. The second curve is the mean number of expressed samples (non-zero expression) for all genes at each given cutoff. This curve increases rapidly then has a section with a moderate increase, and finally a long tail. Using segmented we find that second breakpoint. The suggested expression cutoff returned is the average of the two suggested cutoffs.

Usage

expression_cutoff(expr, max_cut = 1, seed = NULL, n.boot = 2000, k = 2)

Arguments

expr

A matrix with the expression values with features in the rows and samples in the columns in RPKM format (typically). expr can be any matrix as expression_cutoff() will work with the row means of the matrix. You'll most likely want to use a normalized expression matrix, although this function will work with any matrix.

max_cut

Maximum expression cutoff to consider. Increasing this value does then to increase the suggested cutoffs. If set to NULL, this will be chosen automatically from 1 to 5.

seed

Set this argument for increased reproducibility of the results. This is passed to seg.control. We highly recommend specifiying this value.

n.boot

This argument controls the stability of the suggested cutoffs. It is passed to seg.control.

k

The number of breakpoints to consider. If you increase this value the estimated breakpoints are less stable.

Value

A two element vector with the suggested cutoffs based on the number of features expressed and the mean number of expressed samples across all features. The function also makes plots that visualize both curves and show the identified breakpoints as vertical grey lines.

Author(s)

Leonardo Collado-Torres

Examples


## Simulate expression data. Based on the limma::lmFit() man page
sd <- 0.3 * sqrt(4 / rchisq(100, df = 4))
y <- matrix(rnorm(100 * 6, sd = sd), 100, 6)
rownames(y) <- paste("Gene", seq_len(100))
y[seq_len(2), seq_len(3) + 3] <- y[seq_len(2), seq_len(3) + 3] + 2

## Make the expression data look like RPKM values
y2 <- apply(y, 2, function(z) {
    res <- z + abs(min(z))
    res[sample(seq_len(100), 10)] <- 0
    res
})
summary(y2)

expression_cutoff(y2)

## Or same the plots to a PDF file

pdf("test_expression_cutoff.pdf", width = 12)
expression_cutoff(y2)
dev.off()

## View the pdf with the following code
utils::browseURL("test_expression_cutoff.pdf")

LieberInstitute/jaffelab documentation built on Dec. 15, 2024, 10:44 p.m.