reduceFunctions: Return matrix from ClusterExperiment with reduced dimensions

getReducedData,ClusterExperiment-method    R Documentation

Return matrix from ClusterExperiment with reduced dimensions

Description

Returns a matrix of data from a ClusterExperiment object based on the choices of dimensionality reduction given by the user.

Functions for calculating and manipulating either filtering statistics, stored in rowData, or the dimensionality reduction results, stored in reducedDims.

Usage

## S4 method for signature 'ClusterExperiment'
getReducedData(
  object,
  reduceMethod,
  filterIgnoresUnassigned,
  nDims = defaultNDims(object, reduceMethod),
  whichCluster = "primary",
  whichAssay = 1,
  returnValue = c("object", "list"),
  reducedDimName
)

## S4 method for signature 'SingleCellExperiment'
defaultNDims(object, reduceMethod, typeToShow)

## S4 method for signature 'matrixOrHDF5'
defaultNDims(object, ...)

## S4 method for signature 'SummarizedExperiment'
makeFilterStats(
  object,
  filterStats = listBuiltInFilterStats(),
  transFun = NULL,
  isCount = FALSE,
  filterNames = NULL,
  whichAssay = 1
)

## S4 method for signature 'matrixOrHDF5'
makeFilterStats(object, ...)

## S4 method for signature 'ClusterExperiment'
makeFilterStats(
  object,
  whichClusterIgnoreUnassigned = NULL,
  filterStats = listBuiltInFilterStats(),
  ...
)

listBuiltInFilterStats()

## S4 method for signature 'SummarizedExperiment'
filterData(
  object,
  filterStats,
  cutoff,
  percentile,
  absolute = FALSE,
  keepLarge = TRUE,
  whichAssay = 1
)

## S4 method for signature 'SummarizedExperiment'
filterNames(object)

## S4 method for signature 'SingleCellExperiment'
makeReducedDims(
  object,
  reducedDims = "PCA",
  maxDims = 500,
  transFun = NULL,
  isCount = FALSE,
  whichAssay = 1
)

## S4 method for signature 'matrixOrHDF5'
makeReducedDims(object, ...)

## S4 method for signature 'SummarizedExperiment'
makeReducedDims(object, ...)

## S4 method for signature 'ClusterExperiment'
makeReducedDims(object, ...)

listBuiltInReducedDims()

Arguments

object

For makeReducedDims, makeFilterStats, and defaultNDims, either a matrix-like, SingleCellExperiment, or ClusterExperiment object. For getReducedData, only a ClusterExperiment object is allowed.

reduceMethod

character. A method (or methods) for reducing the size of the data, either by filtering the rows (genes) or by a dimensionality reduction method. Must match either 1) the name of a built-in method, in which case, if it does not already exist in the object, it will be passed to makeFilterStats or makeReducedDims, or 2) a stored filtering statistic or dimensionality reduction in the object.

filterIgnoresUnassigned

logical. Whether filtering statistics should ignore the unassigned samples within the clustering. Only relevant if reduceMethod matches one of the built-in filtering statistics in listBuiltInFilterStats(), in which case the clustering identified in whichCluster is passed to makeFilterStats and the unassigned samples are excluded when calculating the statistic. See makeFilterStats for more details.

nDims

The number of dimensions to keep from reduceMethod. If missing, defaultNDims is called to determine the value.

whichCluster

A single numeric or character value indicating the single clustering to be used. Values that resolve to more than one clustering will result in an error. See the details of getClusterIndex.

whichAssay

numeric or character specifying which assay to use. See assay for details.

returnValue

The format of output. Users will generally want to keep the default (see details)

reducedDimName

The name given to the reducedDims slot storing result (if returnValue="object"). If missing, the function will create a default name: if reduceMethod is a dimensionality reduction, then reduceMethod will be given as the name; if a filtering statistic, "filteredBy_" followed by reduceMethod.

typeToShow

character (optional). If given, should be one of "filterStats" or "reducedDims", indicating that only the values in the reduceMethod vector corresponding to that type should be shown.

...

Values passed on to the 'SingleCellExperiment' method.

filterStats

character vector of statistics to calculate. Each element must be one of the character values given by listBuiltInFilterStats().

transFun

a transformation function to be applied to the data. If the transformation applied to the data creates an error or NA values, then the function will throw an error. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.

isCount

if transFun=NULL, then isCount=TRUE will set the transformation to function(x){log2(x+1)}, and isCount=FALSE will set it to the identity, function(x){x}. Ignored if transFun is not NULL. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.

filterNames

if given, defines the names that will be assigned to the filtering statistics in the rowData of the object. If missing, the values of the filterStats argument are used as the names.

whichClusterIgnoreUnassigned

indicates clustering that should be used to filter out unassigned samples from the calculations. If NULL no filtering of samples will be done. See details for more information.

cutoff

numeric. A value at which to filter the rows (genes) for the test statistic

percentile

numeric. Either a number between 0 and 1 indicating what proportion of the rows (genes) to keep, or an integer value indicating the number of rows (genes) to keep.

absolute

whether to take the absolute value of the filter statistic

keepLarge

logical whether to keep rows (genes) with large values of the test statistic or small values of the test statistic.

reducedDims

a vector of character values indicating the methods of dimensionality reduction to be performed. Currently only "PCA" is implemented.

maxDims

Numeric vector of integers giving the number of PC dimensions to calculate. maxDims can also take values in the interval (0,1) to indicate keeping the number of dimensions necessary to account for that proportion of the variance. maxDims should be of the same length as reducedDims, indicating the number of dimensions to keep for each method (if maxDims is of length 1, the same number of dimensions will be used for each).
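As a rough illustration of the two ways maxDims can be specified, the following base R sketch resolves either form to a number of principal components. The helper resolveMaxDims is hypothetical and is not part of clusterExperiment; the package's own handling (for example of rounding or boundary values) may differ.

```r
# Hypothetical helper (not part of clusterExperiment): turn a maxDims value
# into a number of principal components, given the PC standard deviations.
resolveMaxDims <- function(sdev, maxDims) {
  if (maxDims > 0 && maxDims < 1) {
    # Proportion: smallest number of PCs whose cumulative variance reaches it
    propVar <- cumsum(sdev^2) / sum(sdev^2)
    which(propVar >= maxDims)[1]
  } else {
    # Count: cannot exceed the number of available PCs
    min(as.integer(maxDims), length(sdev))
  }
}

sdev <- c(3, 2, 1, 1)        # variances 9, 4, 1, 1 (total 15)
resolveMaxDims(sdev, 0.8)    # 2, since the first two PCs hold 13/15 of the variance
resolveMaxDims(sdev, 500)    # 4, capped at the number of available PCs
```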

Details

getReducedData determines the matrix of values that can be used for computation based on the user's choice of dimensionality reduction methods. The methods can be either of the filtering kind or the more general dimensionality reduction. The function will first look at any stored reducedDims or filtering statistics already present in the data, and if none matches, will assume that reduceMethod is one of the built-in methods provided by the package and calculate the necessary values. Note that if reduceMethod is a filtering statistic, in addition to filtering the features, the function will also apply the stored transformation to the data.

Note that this is used internally by functions, but is mainly only of interest for the user if they want to have the filtered, transformed data available as a matrix for continual use.
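The idea of a filtered, transformed matrix can be pictured with a small base R sketch. It assumes the variance as the filtering statistic and the log2(x+1) transformation that isCount=TRUE implies. All object names are illustrative, and computing the statistic on the transformed values is an assumption made for this sketch, not a statement about the package's internals.

```r
set.seed(1)
# Toy count data: 20 features (rows) by 6 samples (columns)
counts <- matrix(rpois(20 * 6, lambda = 5), nrow = 20, ncol = 6)

transFun <- function(x) log2(x + 1)    # transformation implied by isCount = TRUE
transformed <- transFun(counts)

# Illustrative filtering statistic: per-feature variance
rowVar <- apply(transformed, 1, var)

# Keep the 5 features with the largest statistic, in their original row order
keep <- sort(order(rowVar, decreasing = TRUE)[1:5])
filteredTransformed <- transformed[keep, , drop = FALSE]
dim(filteredTransformed)               # 5 features by 6 samples
```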

If returnValue="object", then the output is a single, updated ClusterExperiment object with the reduced data matrix stored as an element of the list in the reducedDims slot (with the name given by reducedDimName, if provided). If "list", then the output is a list with one element that is the object and another that is the reduced data matrix. Either way, the object returned will be updated to contain the filtering statistics or the dimensionality reduction. The only difference is that if "list", the reduced dimension matrix is NOT saved in the object (so the choice only really makes a difference if the reduceMethod argument is a filtering method). The option "list" is mainly for internal use, where we do not want to continually save subsetted datasets.

If nDims is missing, it will be given a default value depending on the value of reduceMethod. See defaultNDims for details.

If filterIgnoresUnassigned is missing, then it is set to TRUE unless: reduceMethod matches a stored filtering statistic in rowData AND does not match a built-in filtering method provided by the package.

For a reduceMethod that corresponds to a filtering statistic, the current default number of dimensions is 1000 (or the number of features, if less). For a dimensionality reduction saved in the reducedDims slot, the default is 50, or the maximum number of dimensions if less than 50.
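The defaults described above amount to capping a fixed value by what is available. A minimal base R sketch, using a hypothetical helper sketchDefaultNDims that is not part of the package:

```r
# Hypothetical helper mirroring the documented defaults:
# 1000 for filtering statistics, 50 for dimensionality reductions,
# each capped by the number of features or stored dimensions available.
sketchDefaultNDims <- function(type = c("filterStats", "reducedDims"), available) {
  type <- match.arg(type)
  cap <- if (type == "filterStats") 1000 else 50
  min(cap, available)
}

sketchDefaultNDims("filterStats", available = 5000)   # 1000
sketchDefaultNDims("filterStats", available = 300)    # 300
sketchDefaultNDims("reducedDims", available = 20)     # 20
```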

reduceMethod will first be checked to see if it corresponds with an existing saved filtering statistic or a dimensionality reduction to determine which of these two types it is. If it does not match either, then it will be checked against the built-in functions provided by the package. For example:

se<-SingleCellExperiment(matrix(rnorm(5000*100),nrow=5000,ncol=100))
defaultNDims(se,"PCA")
defaultNDims(se,"mad")

whichClusterIgnoreUnassigned is only an option when applied to an object of class ClusterExperiment and indicates that the filtering statistics should be calculated excluding the samples that are unassigned by the designated clustering. The name given to the filter in this case is of the form <filterStats>_<clusterLabel>, i.e. the label of the clustering is appended to the standard name for the filtering statistic.
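Following the description of the whichClusterIgnoreUnassigned argument (unassigned samples are filtered out of the calculation), the effect can be sketched in base R. The sketch assumes that negative cluster values mark unassigned samples; the clustering label "myClustering" and all object names are made up for illustration.

```r
set.seed(2)
x <- matrix(rnorm(8 * 6), nrow = 8, ncol = 6)   # 8 features by 6 samples
clusterIds <- c(1, 1, -1, 2, 2, -1)             # assumption: negative = unassigned
clusterLabel <- "myClustering"                  # hypothetical clustering label

# Compute the statistic using only the assigned samples
assigned <- clusterIds > 0
madAssigned <- apply(x[, assigned, drop = FALSE], 1, mad)

# Name the result following the <filterStats>_<clusterLabel> pattern
filterName <- paste("mad", clusterLabel, sep = "_")
rowStats <- data.frame(madAssigned)
names(rowStats) <- filterName
head(rowStats)
```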

Note that filterData returns a SingleCellExperiment object. To get the actual data out use either assay or transformData if transformed data is desired.
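How the cutoff, percentile, absolute and keepLarge arguments interact can be sketched with a small base R function that picks row indices from a vector of filter statistics. selectRows is a hypothetical helper, not the package's code; in particular, the strict inequality at the cutoff and the use of ceiling for proportions are assumptions.

```r
# Hypothetical helper: choose rows from a vector of filter statistics.
selectRows <- function(stat, cutoff, percentile, absolute = FALSE, keepLarge = TRUE) {
  if (absolute) stat <- abs(stat)
  if (!missing(cutoff)) {
    # Threshold form: keep rows on the chosen side of the cutoff
    if (keepLarge) return(which(stat > cutoff))
    return(which(stat < cutoff))
  }
  # A value in (0,1) is a proportion of rows; anything else is a number of rows
  if (percentile > 0 && percentile < 1) {
    n <- ceiling(percentile * length(stat))
  } else {
    n <- min(as.integer(percentile), length(stat))
  }
  # Rank by the statistic, then return the kept rows in original order
  sort(order(stat, decreasing = keepLarge)[seq_len(n)])
}

stat <- c(-5, 1, 3, 0.5, 2, -1)
selectRows(stat, percentile = 2)                     # rows 3 and 5
selectRows(stat, percentile = 0.5)                   # rows 2, 3 and 5
selectRows(stat, cutoff = 1.5, absolute = TRUE)      # rows 1, 3 and 5
selectRows(stat, percentile = 2, keepLarge = FALSE)  # rows 1 and 6
```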

The PCA method uses either prcomp from the stats package or svds from the RSpectra package to perform PCA. Both are called on t(assay(x)) with center=TRUE and scale=TRUE (i.e. the features are centered and scaled), so that PCA is performed on the correlation matrix of the features.
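The statement that centering and scaling the features makes this a PCA of the correlation matrix can be checked directly in base R. The sketch below uses only prcomp (whose scaling argument is spelled scale.) on simulated data; it does not call clusterExperiment and the object names are illustrative.

```r
set.seed(3)
x <- matrix(rnorm(5 * 40), nrow = 5, ncol = 40)   # 5 features by 40 samples

# PCA with samples as rows, each feature centered and scaled
pca <- prcomp(t(x), center = TRUE, scale. = TRUE)

# The PC variances match the eigenvalues of the feature correlation matrix
eig <- eigen(cor(t(x)), symmetric = TRUE)$values
all.equal(pca$sdev^2, eig)

# Keep the first 3 PCs: one row per sample, one column per dimension
reduced <- pca$x[, 1:3]
dim(reduced)
```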

Note that makeReducedDims does not check whether such a reducedDims value already exists, and will recalculate (and overwrite) it if it does.

Value

If returnValue="object", a ClusterExperiment object.

If returnValue="list", a list with elements:

  • objectUpdate: the object, potentially updated if the dimensionality reduction or filtering statistic had to be calculated

  • dataMatrix: the reduced dimensional matrix with the samples in columns, features in rows

defaultNDims returns a numeric vector giving the default dimensions the methods in clusterExperiment will use for reducing the size of the data. If typeToShow is missing, the resulting vector will have the same length as reduceMethod. Otherwise, it will be a vector with all the unique valid default values for the given typeToShow (note that different dimensionality reduction methods can have different maximal dimensions, so the result may not be of length one in this case).

makeFilterStats returns a SummarizedExperiment object in which the requested filtering statistics have been added to the DataFrame in the rowData slot, with names corresponding to the filterStats values. Warning: the function will overwrite existing columns in rowData with the same name. Columns in the rowData slot with different names should not be affected.

filterData returns a SingleCellExperiment object with the rows (genes) removed based on filters

filterNames returns a vector of the columns of the rowData that are considered valid filtering statistics. Currently any numeric column in rowData is a valid filtering statistic.
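The rule that any numeric column counts as a valid filtering statistic can be mimicked in base R. The sketch uses a plain data.frame as a stand-in for the rowData DataFrame, with made-up column names; it is not the package's implementation.

```r
# Stand-in for rowData: one annotation column and two numeric statistics
rowInfo <- data.frame(
  geneName = c("a", "b", "c"),
  var = c(0.2, 1.5, 0.7),
  mad = c(0.1, 0.9, 0.4),
  stringsAsFactors = FALSE
)

# Keep the names of the numeric columns only
numericCols <- names(rowInfo)[vapply(rowInfo, is.numeric, logical(1))]
numericCols   # "var" "mad"
```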

makeReducedDims returns a SingleCellExperiment containing the calculated dimensionality reductions in the reducedDims slot, with names corresponding to the values given in reducedDims.

See Also

makeFilterStats, makeReducedDims, filterData, reducedDim

Examples

data(simData)
listBuiltInFilterStats()
scf<-makeFilterStats(simData,filterStats=c("var","mad"))
scf
scfFiltered<-filterData(scf,filterStats="mad",percentile=10)
scfFiltered
assay(scfFiltered)[1:10,1:10]
data(simData)
listBuiltInReducedDims()
scf<-makeReducedDims(simData, reducedDims="PCA", maxDims=3)
scf

epurdom/clusterExperiment documentation built on Oct. 12, 2022, 5:27 a.m.