Create a new SCESet object.

Description

Scater requires that all data be housed in SCESet objects. SCESet extends Bioconductor's ExpressionSet class, and the same basic interface is supported. newSCESet() expects a matrix of expression values as its first argument, with rows as features (usually genes) and columns as cells. Per-feature and per-cell metadata can be supplied with the featureData and phenoData arguments, respectively. Use of these optional arguments is strongly encouraged. The SCESet also includes a slot 'counts' to store an object containing raw count data.

Usage

newSCESet(exprsData = NULL, countData = NULL, tpmData = NULL,
  fpkmData = NULL, cpmData = NULL, phenoData = NULL, featureData = NULL,
  experimentData = NULL, is_exprsData = NULL,
  cellPairwiseDistances = dist(vector()),
  featurePairwiseDistances = dist(vector()), lowerDetectionLimit = 0,
  logExprsOffset = 1, logged = FALSE, useForExprs = "exprs")

Arguments

exprsData

expression data matrix for an experiment

countData

data matrix containing raw count expression values

tpmData

matrix of class "numeric" containing transcripts-per-million (TPM) expression values

fpkmData

matrix of class "numeric" containing fragments per kilobase of exon per million reads mapped (FPKM) expression values

cpmData

matrix of class "numeric" containing counts per million (CPM) expression values (optional)

phenoData

data frame containing attributes of individual cells

featureData

data frame containing attributes of features (e.g. genes)

experimentData

MIAME class object containing metadata and details about the experiment and dataset.

is_exprsData

matrix of class "logical", indicating whether or not each observation is above the lowerDetectionLimit.

cellPairwiseDistances

object of class "dist" (or a class that extends "dist") containing cell-cell distance or dissimilarity values.

featurePairwiseDistances

object of class "dist" (or a class that extends "dist") containing feature-feature distance or dissimilarity values.

lowerDetectionLimit

the minimum expression level that constitutes true expression (defaults to zero and uses count data to determine if an observation is expressed or not).

logExprsOffset

numeric scalar, providing the offset used when doing log2-transformations of expression data to avoid trying to take logs of zero. Default offset value is 1.

logged

logical; if a value is supplied for the exprsData argument, indicates whether the expression values are already on the log2 scale.

useForExprs

character string, either 'exprs' (default), 'tpm', 'counts' or 'fpkm', indicating which expression representation both internal methods and external packages should use when performing analyses.
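
As a rough illustration of how is_exprsData relates to lowerDetectionLimit, the base R sketch below marks an observation as expressed when its count is above the limit. The toy matrix and the strict ">" comparison are assumptions made for illustration; this is not the package's internal code.

```r
# Toy count matrix: rows are features (genes), columns are cells.
counts <- matrix(c(0, 3, 12,
                   1, 0, 7),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Default detection limit documented above.
lowerDetectionLimit <- 0

# Logical matrix with the same dimensions as the counts:
# TRUE where the observation is above the detection limit.
is_exprs <- counts > lowerDetectionLimit

# Number of detected features per cell.
colSums(is_exprs)
```

A matrix of this shape (logical, same dimensions and names as the expression data) is what the is_exprsData argument expects.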

Details

SCESet objects store a matrix of expression values. These values are typically transcripts-per-million (TPM), counts-per-million (CPM), fragments per kilobase per million mapped (FPKM) or some other output from a program that calculates expression values from RNA-Seq reads. We recommend that expression values on the log2 scale are used for the 'exprs' slot in the SCESet. For example, you may wish to store raw TPM values in the 'tpm' slot and log2(TPM + 1) values in the 'exprs' slot. However, expression values could also be values from a single-cell qPCR run or some other type of assay. The newSCESet function can also accept raw count values. In this case, see calculateTPM and calculateFPKM for computing TPM and FPKM expression values, respectively, from counts. The function cpm from the package edgeR can be used to compute log2(counts-per-million), if desired.
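
The log2 transformation described above can be sketched in base R. The example computes counts-per-million by hand from a toy count matrix and then applies log2(x + 1). It is a simplified stand-in, not a call to edgeR's cpm function, and the matrix values are made up for illustration.

```r
# Toy count matrix: rows are features (genes), columns are cells.
counts <- matrix(c(10, 0, 90,
                   5, 5, 40),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Counts-per-million: divide each column by its library size,
# then scale to one million.
cpm_vals <- sweep(counts, 2, colSums(counts), "/") * 1e6

# log2 scale with an offset of 1 so that zero counts map to zero.
log_cpm <- log2(cpm_vals + 1)
log_cpm
```

Values on this scale are the kind recommended for the 'exprs' slot, with the untransformed values kept in their own slot.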

An SCESet object must have the 'exprs' slot defined, so if the exprsData argument is NULL, this function will define 'exprs' with the following order of precedence: log2(TPM + logExprsOffset), if tpmData is defined; log2(FPKM + logExprsOffset), if fpkmData is defined; otherwise log2(counts-per-million + logExprsOffset). The cpm function from the edgeR package is used to compute CPM. Note that for many analyses counts-per-million are not recommended, and if possible transcripts-per-million should be used.
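
The order of precedence can be written out as a small base R function. The helper name choose_exprs is hypothetical and is not part of scater. It computes CPM by hand rather than calling edgeR, and it ignores the logged argument, so treat it as a sketch of the documented rule rather than the actual implementation.

```r
# Hypothetical helper mirroring the documented precedence for 'exprs'.
choose_exprs <- function(exprsData = NULL, tpmData = NULL, fpkmData = NULL,
                         countData = NULL, logExprsOffset = 1) {
  # 1. Expression values supplied directly are used as given.
  if (!is.null(exprsData)) return(exprsData)
  # 2. TPM takes precedence when available.
  if (!is.null(tpmData)) return(log2(tpmData + logExprsOffset))
  # 3. Then FPKM.
  if (!is.null(fpkmData)) return(log2(fpkmData + logExprsOffset))
  # 4. Otherwise fall back to counts-per-million from the raw counts.
  if (is.null(countData)) stop("no expression data supplied")
  cpm_vals <- sweep(countData, 2, colSums(countData), "/") * 1e6
  log2(cpm_vals + logExprsOffset)
}

# Toy inputs (made-up values): TPM wins over counts when both are given.
tpm <- matrix(c(3, 1, 0, 7), nrow = 2)
counts <- matrix(c(30, 10, 0, 70), nrow = 2)
exprs_from_tpm <- choose_exprs(tpmData = tpm, countData = counts)
exprs_from_counts <- choose_exprs(countData = counts)
```

With both tpmData and countData supplied, the result is log2(TPM + 1). With only countData, the CPM fallback is used.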

In many downstream functions you will likely find it most convenient if the 'exprs' values are on the log2-scale, so this is recommended.

Value

a new SCESet object

Examples

data("sc_example_counts")
data("sc_example_cell_info")
pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)
example_sceset
