Summarise expression values across features

Description

Create a new SCESet with counts summarised at a different feature level. A typical use would be to summarise transcript-level counts at gene level.

Usage

summariseExprsAcrossFeatures(object, exprs_values = "tpm",
  summarise_by = "feature_id", scaled_tpm_counts = TRUE, lib_size = NULL)

Arguments

object

an SCESet object.

exprs_values

character string indicating which slot of the assayData from the SCESet object should be used as expression values. Valid options are 'exprs' (the expression slot), 'tpm' (the transcripts-per-million slot) or 'fpkm' (the FPKM slot).

summarise_by

character string giving the column of fData(object) that will be used as the features for which summarised expression levels are to be produced. Default is 'feature_id'.

scaled_tpm_counts

logical, should feature-summarised counts be computed from summed TPM values scaled by total library size? This approach is recommended (see https://f1000research.com/articles/4-1521/v2), so the default is TRUE and it is applied if TPM values are available in the object.

lib_size

optional numeric vector, with one value per column of the SCESet object, giving the total library size (e.g. the count of mapped reads) for each cell/sample.
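
As a rough illustration of what "summed TPM values scaled by total library size" can mean, the base R sketch below sums TPM values within each feature group and rescales each column so that it adds up to the supplied library size. It is only a sketch under stated assumptions, not the package's internal code: the matrix `tpm_mat`, the mapping `gene_of_tx` and the `lib_size` values are all made up for the example.

```r
# Toy TPM matrix: 4 transcripts (rows) x 2 cells (columns).
# matrix() fills column-wise, so each column sums to 1e6.
tpm_mat <- matrix(c(2e5, 3e5, 1e5, 4e5,
                    5e5, 1e5, 2e5, 2e5),
                  nrow = 4,
                  dimnames = list(paste0("tx", 1:4), c("cell1", "cell2")))

# Hypothetical transcript-to-gene mapping (plays the role of summarise_by).
gene_of_tx <- c("geneA", "geneA", "geneB", "geneB")

# Hypothetical library sizes, one per cell (plays the role of lib_size).
lib_size <- c(cell1 = 2e6, cell2 = 5e5)

# Sum TPM values across the transcripts of each gene.
tpm_gene <- rowsum(tpm_mat, group = gene_of_tx)

# Scale each column so that the summed TPMs add up to the library size.
scaled_counts <- sweep(tpm_gene, 2, lib_size / 1e6, `*`)

scaled_counts
#       cell1 cell2
# geneA 1e+06 3e+05
# geneB 1e+06 2e+05
```

Each column of `scaled_counts` sums to the corresponding entry of `lib_size`, so the result is on a count-like scale while keeping the length-adjusted proportions of the TPM values.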

Details

Only transcripts-per-million (TPM) and fragments per kilobase of exon per million reads mapped (FPKM) expression values should be aggregated across features. Counts are not scaled by the length of the feature, so expression in count units is not comparable between features within a sample unless it is adjusted for feature length. We therefore cannot sum counts over a set of features to get the expression of that set (for example, we cannot sum counts over transcripts to get accurate expression estimates for a gene). See the following link for a discussion of RNA-seq expression units by Harold Pimentel: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/. For more details about the effects of summarising transcript expression values at the gene level, see Soneson et al., 2016 (https://f1000research.com/articles/4-1521/v2).
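
The point about feature length can be made concrete with a small worked example. The sketch below uses two invented transcripts with equal read counts but different effective lengths, and applies the standard TPM definition (counts divided by length, rescaled to sum to one million). The names and numbers are illustrative assumptions only.

```r
# Two hypothetical transcripts with identical read counts.
counts  <- c(tx_short = 100, tx_long = 100)

# Hypothetical effective lengths in bases.
eff_len <- c(tx_short = 1000, tx_long = 4000)

# Reads per base: adjusts each count for the length of its transcript.
rate <- counts / eff_len

# Standard TPM: length-adjusted rates rescaled to sum to one million.
tpm <- rate / sum(rate) * 1e6

tpm
# tx_short  tx_long
#    8e+05    2e+05
```

Although both transcripts have 100 reads, the shorter one is four times as abundant once length is taken into account, which is why raw counts from features of different lengths are not directly comparable or additive.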

Value

an SCESet object

Examples

# Load the example data and build an SCESet
data("sc_example_counts")
data("sc_example_cell_info")
pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)

# Feature annotation: assign every four consecutive features to one feature_id
fd <- new("AnnotatedDataFrame", data =
    data.frame(gene_id = featureNames(example_sceset),
               feature_id = paste("feature", rep(1:500, each = 4), sep = "_")))
rownames(fd) <- featureNames(example_sceset)
fData(example_sceset) <- fd

# Compute TPM values from counts using made-up effective lengths
effective_length <- rep(c(1000, 2000), times = 1000)
tpm(example_sceset) <- calculateTPM(example_sceset, effective_length,
                                    calc_from = "counts")

# Summarise at the feature_id level from the TPM values
example_sceset_summarised <-
    summariseExprsAcrossFeatures(example_sceset, exprs_values = "tpm")
# The same call using the counts and exprs values instead
example_sceset_summarised <-
    summariseExprsAcrossFeatures(example_sceset, exprs_values = "counts")
example_sceset_summarised <-
    summariseExprsAcrossFeatures(example_sceset, exprs_values = "exprs")
