reduceFunctions: Return matrix from ClusterExperiment with reduced dimensions
In epurdom/clusterExperiment: Compare Clusterings for Single-Cell Sequencing

getReducedData,ClusterExperiment-method

R Documentation

Return matrix from ClusterExperiment with reduced dimensions

Description

Returns a matrix of data from a ClusterExperiment object based on the choices of dimensionality reduction given by the user.

Functions for calculating and manipulating either filtering statistics, stored in rowData, or the dimensionality reduction results, stored in reducedDims.

Usage

## S4 method for signature 'ClusterExperiment'
getReducedData(
  object,
  reduceMethod,
  filterIgnoresUnassigned,
  nDims = defaultNDims(object, reduceMethod),
  whichCluster = "primary",
  whichAssay = 1,
  returnValue = c("object", "list"),
  reducedDimName
)

## S4 method for signature 'SingleCellExperiment'
defaultNDims(object, reduceMethod, typeToShow)

## S4 method for signature 'matrixOrHDF5'
defaultNDims(object, ...)

## S4 method for signature 'SummarizedExperiment'
makeFilterStats(
  object,
  filterStats = listBuiltInFilterStats(),
  transFun = NULL,
  isCount = FALSE,
  filterNames = NULL,
  whichAssay = 1
)

## S4 method for signature 'matrixOrHDF5'
makeFilterStats(object, ...)

## S4 method for signature 'ClusterExperiment'
makeFilterStats(
  object,
  whichClusterIgnoreUnassigned = NULL,
  filterStats = listBuiltInFilterStats(),
  ...
)

listBuiltInFilterStats()

## S4 method for signature 'SummarizedExperiment'
filterData(
  object,
  filterStats,
  cutoff,
  percentile,
  absolute = FALSE,
  keepLarge = TRUE,
  whichAssay = 1
)

## S4 method for signature 'SummarizedExperiment'
filterNames(object)

## S4 method for signature 'SingleCellExperiment'
makeReducedDims(
  object,
  reducedDims = "PCA",
  maxDims = 500,
  transFun = NULL,
  isCount = FALSE,
  whichAssay = 1
)

## S4 method for signature 'matrixOrHDF5'
makeReducedDims(object, ...)

## S4 method for signature 'SummarizedExperiment'
makeReducedDims(object, ...)

## S4 method for signature 'ClusterExperiment'
makeReducedDims(object, ...)

listBuiltInReducedDims()

Arguments

`object`	For `makeReducedDims`, `makeFilterStats`, and `defaultNDims`, either a matrix-like, `SingleCellExperiment`, or `ClusterExperiment` object. For `getReducedData`, only a `ClusterExperiment` object is allowed.
`reduceMethod`	character. A method (or methods) for reducing the size of the data, either by filtering the rows (genes) or by a dimensionality reduction method. Must either 1) match the name of a built-in method, in which case, if it does not already exist in the object, it will be passed to `makeFilterStats` or `makeReducedDims`, or 2) match a stored filtering statistic or dimensionality reduction in the object.
`filterIgnoresUnassigned`	logical. Whether filtering statistics should ignore the unassigned samples within the clustering. Only relevant if 'reduceMethod' matches one of built-in filtering statistics in `listBuiltInFilterStats()`, in which case the clustering identified in `whichCluster` is passed to `makeFilterStats` and the unassigned samples are excluded in calculating the statistic. See `makeFilterStats` for more details.
`nDims`	The number of dimensions to keep from `reduceMethod`. If missing calls `defaultNDims`.
`whichCluster`	argument that can be a single numeric or character value indicating the single clustering to be used. Giving values that result in more than one clustering will result in an error. See details of `getClusterIndex`.
`whichAssay`	numeric or character specifying which assay to use. See `assay` for details.
`returnValue`	The format of output. Users will generally want to keep the default (see details)
`reducedDimName`	The name given to the reducedDims slot storing result (if `returnValue="object"`). If missing, the function will create a default name: if `reduceMethod` is a dimensionality reduction, then `reduceMethod` will be given as the name; if a filtering statistic, "filteredBy_" followed by `reduceMethod`.
`typeToShow`	character (optional). If given, should be one of "filterStats" or "reducedDims", indicating that, of the values in the `reduceMethod` vector, only those corresponding to the "filterStats" or "reducedDims" options should be shown.
`...`	Values passed on to the 'SingleCellExperiment' method.
`filterStats`	character vector of statistics to calculate. Each must be one of the character values given by `listBuiltInFilterStats()`.
`transFun`	a transformation function to be applied to the data. If the transformation applied to the data creates an error or NA values, then the function will throw an error. If object is of class `ClusterExperiment`, the stored transformation will be used and giving this parameter will result in an error.
`isCount`	if `transFun=NULL`, then `isCount=TRUE` will determine the transformation as defined by `function(x){log2(x+1)}`, and `isCount=FALSE` will give a transformation function `function(x){x}`. Ignored if `transFun` is not `NULL`. If object is of class `ClusterExperiment`, the stored transformation will be used and giving this parameter will result in an error.
`filterNames`	if given, defines the names that will be assigned to the filtering statistics in the `rowData` of the object. If missing, will be just the value of `filterStats` argument
`whichClusterIgnoreUnassigned`	indicates clustering that should be used to filter out unassigned samples from the calculations. If `NULL` no filtering of samples will be done. See details for more information.
`cutoff`	numeric. A value at which to filter the rows (genes) for the test statistic
`percentile`	numeric. Either a number between 0 and 1 indicating what proportion of the rows (genes) to keep, or an integer value indicating the number of rows (genes) to keep.
`absolute`	whether to take the absolute value of the filter statistic
`keepLarge`	logical whether to keep rows (genes) with large values of the test statistic or small values of the test statistic.
`reducedDims`	a vector of character values indicating the methods of dimensionality reduction to be performed. Currently only "PCA" is implemented.
`maxDims`	Numeric vector of integers giving the number of PC dimensions to calculate. `maxDims` can also take values in (0,1) to indicate keeping the number of dimensions necessary to account for that proportion of the variance. `maxDims` should be of the same length as `reducedDims`, indicating the number of dimensions to keep for each method (if `maxDims` is of length 1, the same number of dimensions will be used for each).
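
The interplay between `transFun` and `isCount` described above can be illustrated with a small base R sketch. This is not the package's code; `resolve_transform` is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper (not part of clusterExperiment): choose the transformation
# implied by the `transFun` and `isCount` arguments as documented above.
resolve_transform <- function(transFun = NULL, isCount = FALSE) {
  if (!is.null(transFun)) {
    return(transFun)  # a user-supplied function takes precedence over isCount
  }
  if (isCount) function(x) log2(x + 1) else function(x) x
}

counts <- matrix(c(0, 1, 3, 7), nrow = 2)
log_counts <- resolve_transform(isCount = TRUE)(counts)   # log2(x + 1)
unchanged  <- resolve_transform(isCount = FALSE)(counts)  # identity
custom     <- resolve_transform(transFun = sqrt, isCount = TRUE)(counts)
```

In the last call `isCount` has no effect because a transformation function was supplied.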

Details

getReducedData determines the matrix of values that can be used for computation based on the user's choice of dimensionality reduction methods. The methods can be either of the filtering kind or the more general dimensionality reduction. The function will first look at any stored reducedDims or filtering statistics already present in the data, and if the requested one is missing, will assume that reduceMethod is one of the built-in methods provided by the package and calculate the necessary values. Note that if reduceMethod is a filtering statistic, in addition to filtering the features, the function will also perform the stored transformation of the data.

Note that this function is used internally by other functions; it is mainly of interest to users who want to have the filtered, transformed data available as a matrix for continued use.

If returnValue="object", then the output is a single, updated ClusterExperiment object with the reduced data matrix stored as an element of the list in the reducedDims slot (with the name given by reducedDimName, if provided). If "list", then the output is a list with two elements: the object and the reduced data matrix. Either way, the returned object will be updated to contain the filtering statistics or the dimensionality reduction. The only difference is that if "list", the reduced data matrix is NOT saved in the object (so the choice only really makes a difference if the reduceMethod argument is a filtering method). The option "list" is mainly for internal use, where we do not want to continually save subsetted datasets.

If nDims is missing, it will be given a default value depending on the value of reduceMethod. See defaultNDims for details.

If filterIgnoresUnassigned is missing, then it is set to TRUE unless: reduceMethod matches a stored filtering statistic in rowData AND does not match a built-in filtering method provided by the package.

For a reduceMethod that corresponds to a filtering statistic, the current default for nDims is 1000 (or the number of features, if less). For a dimensionality reduction saved in the reducedDims slot, the default is 50, or the maximum number of dimensions if less than 50.
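
The default rule just described amounts to capping a fixed value at what is available. A minimal base R sketch follows; `default_n_dims` and its `available` argument are hypothetical names for illustration, not the package's `defaultNDims`.

```r
# Hypothetical helper (not the package's defaultNDims): reproduce the documented
# default rule. `available` is the number of features (for a filtering statistic)
# or the number of stored dimensions (for a dimensionality reduction).
default_n_dims <- function(type = c("filterStats", "reducedDims"), available) {
  type <- match.arg(type)
  cap <- if (type == "filterStats") 1000 else 50
  min(cap, available)
}

n_filter_large <- default_n_dims("filterStats", available = 5000)  # capped at 1000
n_filter_small <- default_n_dims("filterStats", available = 300)   # fewer features than the cap
n_pca          <- default_n_dims("reducedDims", available = 20)    # fewer dimensions than 50
```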

reduceMethod will first be checked to see if it corresponds with an existing saved filtering statistic or a dimensionality reduction to determine which of these two types it is. If it does not match either, then it will be checked against the built-in functions provided by the package. For example:

se <- SingleCellExperiment(matrix(rnorm(5000*100), nrow=5000, ncol=100))
defaultNDims(se, "PCA")
defaultNDims(se, "mad")

whichClusterIgnoreUnassigned is only an option when applied to an object of class ClusterExperiment and indicates that the filtering statistics should be calculated excluding the samples that are unassigned by the designated clustering. The name given to the filter in this case is of the form <filterStats>_<clusterLabel>, i.e. the label of the clustering is appended to the standard name for the filtering statistic.
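
The idea of computing a statistic without the unassigned samples, and naming the result after the clustering, can be sketched in base R. This is only an illustration: the convention that negative cluster ids mark unassigned samples, and the label "myClustering", are assumptions made for the example.

```r
set.seed(1)
x <- matrix(rnorm(6 * 8), nrow = 6, ncol = 8)  # 6 features, 8 samples
cluster_ids <- c(1, 1, 2, -1, 2, -1, 1, 2)     # assumption: negative id = unassigned
cluster_label <- "myClustering"                # hypothetical clustering label

# Drop unassigned samples before computing the per-feature statistic.
keep <- cluster_ids > 0
var_assigned <- apply(x[, keep, drop = FALSE], 1, var)

# Name of the form <filterStats>_<clusterLabel>.
filter_name <- paste("var", cluster_label, sep = "_")
filter_stats <- setNames(data.frame(var_assigned), filter_name)
```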

Note that filterData returns a SingleCellExperiment object. To get the actual data out use either assay or transformData if transformed data is desired.
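
The row selection controlled by the `cutoff`, `percentile`, `absolute`, and `keepLarge` arguments can be pictured with a base R sketch. It is not the package's implementation: `select_rows` is a hypothetical helper, and the strict inequalities, the requirement of exactly one of `cutoff` or `percentile`, and the treatment of values below 1 as proportions are assumptions of the sketch.

```r
# Hypothetical helper (not the package's filterData): pick the rows to keep from
# a vector of per-feature statistics.
select_rows <- function(stat, cutoff = NULL, percentile = NULL,
                        absolute = FALSE, keepLarge = TRUE) {
  if (is.null(cutoff) == is.null(percentile)) {
    stop("supply exactly one of 'cutoff' or 'percentile'")
  }
  if (absolute) stat <- abs(stat)
  if (!is.null(cutoff)) {
    keep <- if (keepLarge) stat > cutoff else stat < cutoff
    return(which(keep))
  }
  # Assumption: values below 1 are proportions, values >= 1 are row counts.
  n_keep <- if (percentile < 1) ceiling(percentile * length(stat)) else percentile
  n_keep <- min(n_keep, length(stat))
  sort(order(stat, decreasing = keepLarge)[seq_len(n_keep)])
}

stat <- c(5, -9, 2, 7, -1, 3)
top2         <- select_rows(stat, percentile = 2)                     # two largest values
top_half_abs <- select_rows(stat, percentile = 0.5, absolute = TRUE)  # top half by |stat|
small        <- select_rows(stat, cutoff = 0, keepLarge = FALSE)      # values below 0
```

The returned indices would then be used to subset the rows of the data matrix.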

The PCA method uses either prcomp from the stats package or svds from the RSpectra package to perform PCA. Both are called on t(assay(x)) with center=TRUE and scale=TRUE (i.e. the features are centered and scaled), so that it is performing PCA on the correlation matrix of the features.
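
The claim that centering and scaling makes this a PCA of the feature correlation matrix can be checked directly in base R. The sketch below uses simulated data and only `prcomp`; it does not reproduce the package's internal call. Note that the formal argument of `prcomp` is spelled `scale.`.

```r
set.seed(42)
x <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)  # 20 features, 50 samples

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(x), center = TRUE, scale. = TRUE)

# Eigenvalues of the feature-by-feature correlation matrix.
eig <- eigen(cor(t(x)), symmetric = TRUE)$values

# The PC variances match the eigenvalues of the correlation matrix.
same_variances <- isTRUE(all.equal(pca$sdev^2, eig))

# Samples-by-dimensions matrix holding the first three components.
scores <- pca$x[, 1:3]
```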

Note that makeReducedDims does not check whether a reducedDims entry with the same name already exists, and will recalculate (and overwrite) it if it does.

Value

If returnValue="object", a ClusterExperiment object.

If returnValue="list" a list with elements:

`objectUpdate`	object, potentially updated if had to calculate dimensionality reduction or filtering statistic
`dataMatrix`	the reduced dimensional matrix with the samples in columns, features in rows

defaultNDims returns a numeric vector giving the default dimensions the methods in clusterExperiment will use for reducing the size of the data. If typeToShow is missing, the resulting vector will have the same length as reduceMethod. Otherwise, it will be a vector with all the unique valid default values for the typeToShow (note that different dimensionality reduction methods can have different maximal dimensions, so the result may not be of length one in this case).

makeFilterStats returns a SummarizedExperiment object with the requested filtering statistics added to the DataFrame in the rowData slot and given names corresponding to the filterStats values. Warning: the function will overwrite existing columns in rowData with the same name. Columns in the rowData slot with different names should not be affected.

filterData returns a SingleCellExperiment object with rows (genes) removed according to the filters.

filterNames returns a vector of the columns of the rowData that are considered valid filtering statistics. Currently any numeric column in rowData is a valid filtering statistic.

makeReducedDims returns a SingleCellExperiment containing the calculated dimensionality reduction in the reducedDims slot, with names corresponding to the names given in reducedDims.

Examples

data(simData)
listBuiltInFilterStats()
scf<-makeFilterStats(simData,filterStats=c("var","mad"))
scf
scfFiltered<-filterData(scf,filterStats="mad",percentile=10)
scfFiltered
assay(scfFiltered)[1:10,1:10]
data(simData)
listBuiltInReducedDims()
scf<-makeReducedDims(simData, reducedDims="PCA", maxDims=3)
scf

epurdom/clusterExperiment documentation built on April 28, 2024, 8:17 p.m.