SingleCellExperiment: S4 Classes for Single Cell Data

#' Reduced dimensions methods
#'
#' Methods to get or set dimensionality reduction results in a \linkS4class{SingleCellExperiment} object.
#' These are typically used to store and retrieve low-dimensional representations of single-cell datasets.
#' Each row of a reduced dimension result is expected to correspond to a column of the SingleCellExperiment object.
#'
#' @section Getters:
#' In the following examples, \code{x} is a \linkS4class{SingleCellExperiment} object.
#' \describe{
#' \item{\code{reducedDim(x, type, withDimnames=TRUE)}:}{
#' Retrieves a matrix (or matrix-like object) containing reduced dimension coordinates for cells (rows) and dimensions (columns).
#' \code{type} is either a string specifying the name of the dimensionality reduction result in \code{x} to retrieve,
#' or a numeric scalar specifying the index of the desired result, defaulting to the first entry if missing.
#'
#' If \code{withDimnames=TRUE}, row names of the output matrix are replaced with the column names of \code{x}.
#' }
#' \item{\code{reducedDimNames(x)}:}{
#' Returns a character vector containing the names of all dimensionality reduction results in \code{x}.
#' This is guaranteed to be of the same length as the number of results, though the names may not be unique.
#' }
#' \item{\code{reducedDims(x, withDimnames=TRUE)}:}{
#' Returns a named \linkS4class{List} of matrices containing one or more dimensionality reduction results.
#' Each result is a matrix (or matrix-like object) with the same number of rows as \code{ncol(x)}.
#'
#' If \code{withDimnames=TRUE}, row names of each matrix are replaced with the column names of \code{x}.
#' }
#' }
#'
#' @section Single-result setter:
#' \code{reducedDim(x, type, withDimnames=TRUE) <- value} will add or replace a dimensionality reduction result
#' in a \linkS4class{SingleCellExperiment} object \code{x}.
#' The value of \code{type} determines how the result is added or replaced:
#' \itemize{
#' \item If \code{type} is missing, \code{value} is assigned to the first result.
#' If the result already exists, its name is preserved; otherwise it is given a default name \code{"unnamed1"}.
#' \item If \code{type} is a numeric scalar, it must be within the range of existing results, and \code{value} will be assigned to the result at that index.
#' \item If \code{type} is a string and a result exists with this name, \code{value} is assigned to to that result.
#' Otherwise a new result with this name is append to the existing list of results.
#' }
#'
#' \code{value} is expected to be a matrix or matrix-like object with number of rows equal to \code{ncol(x)}.
#' Alternatively, if \code{value} is \code{NULL}, the result corresponding to \code{type} is removed from the object.
#'
#' If \code{withDimnames=TRUE}, any non-\code{NULL} \code{rownames(value)} is checked against \code{colnames(x)} and a warning is emitted if they are not the same.
#' Otherwise, any differences in the row names are ignored. 
#' This is inspired by the argument of the same name in \code{\link{assay<-}} but is more relaxed for practicality's sake - 
#' it raises a warning rather than an error and allows \code{NULL} rownames to pass through without complaints.
#'
#' @section Other setters:
#' In the following examples, \code{x} is a \linkS4class{SingleCellExperiment} object.
#' \describe{
#' \item{\code{reducedDims(x, withDimnames=TRUE) <- value}:}{
#' Replaces all dimensionality reduction results in \code{x} with those in \code{value}.
#' The latter should be a list-like object containing any number of matrices or matrix-like objects
#' with number of rows equal to \code{ncol(x)}.
#'
#' If \code{value} is named, those names will be used to name the dimensionality reduction results in \code{x}.
#' Otherwise, unnamed results are assigned default names prefixed with \code{"unnamed"}.
#'
#' If \code{value} is \code{NULL}, all dimensionality reduction results in \code{x} are removed.
#'
#' If \code{value} is a \linkS4class{Annotated} object, any \code{\link{metadata}} will be retained in \code{reducedDims(x)}.
#' If \code{value} is a \linkS4class{Vector} object, any \code{\link{mcols}} will also be retained.
#'
#' If \code{withDimnames=TRUE}, any non-\code{NULL} row names in each entry of \code{value} is checked against \code{colnames(x)} and a warning is emitted if they are not the same.
#' Otherwise, any differences in the row names are ignored. 
#' }
#' \item{\code{reducedDimNames(x) <- value}:}{
#' Replaces all names for dimensionality reduction results in \code{x} with a character vector \code{value}.
#' This should be of length equal to the number of results currently in \code{x}.
#' }
#' }
#'
#' @section Storing dimensionality reduction metadata:
#' When performing dimensionality reduction, we frequently generate metadata associated with a particular method.
#' The typical example is the percentage of variance explained and the rotation matrix from PCA;
#' model-based methods may also report some model information that can be used later to project points onto the embedding.
#' Ideally, we would want to store this information alongside the coordinates themselves.
#'
#' Our recommended approach is to store this metadata as attributes of the coordinate matrix.
#' This is simple to do, easy to extract, and avoids problems with synchronization (when the coordinates are separated from the metadata).
#' The biggest problem with this approach is that attributes are not retained when the matrix is subsetted or combined.
#' To persist these attributes, we suggest wrapping the coordinates and metadata in a \link{reduced.dim.matrix}.
#' More complex matrix-like objects like the \code{\link{LinearEmbeddingMatrix}} can also be used 
#' but may not be immediately compatible with downstream functions that expect an ordinary matrix.
#'
#' The path less taken is to store the metadata in the \code{\link{mcols}} of the \code{\link{reducedDims}} List.
#' This approach avoids the subsetting problem with the attributes but is less ideal as it separates the metadata from the coordinates.
#' Such separation makes the metadata harder to find and remember to keep in sync with the coordinates when the latter changes.
#' The structure of \code{\link{mcols}} is best suited to situations where there are some commonalities in the metadata across entries,
#' but this rarely occurs for different dimensionality reduction strategies.
#'
#' @author Aaron Lun and Kevin Rue-Albrecht
#'
#' @examples
#' example(SingleCellExperiment, echo=FALSE)
#' reducedDim(sce, "PCA")
#' reducedDim(sce, "tSNE")
#' reducedDims(sce)
#'
#' reducedDim(sce, "PCA") <- NULL
#' reducedDims(sce)
#'
#' reducedDims(sce) <- SimpleList()
#' reducedDims(sce)
#'
#' @name reducedDims
#' @docType methods
#' @aliases
#' reducedDim reducedDims reducedDimNames
#' reducedDim,SingleCellExperiment,missing-method
#' reducedDim,SingleCellExperiment,numeric-method
#' reducedDim,SingleCellExperiment,character-method
#' reducedDims,SingleCellExperiment-method
#' reducedDimNames,SingleCellExperiment-method
#' reducedDim<- reducedDims<- reducedDimNames<-
#' reducedDim<-,SingleCellExperiment,missing-method
#' reducedDim<-,SingleCellExperiment,numeric-method
#' reducedDim<-,SingleCellExperiment,character-method
#' reducedDims<-,SingleCellExperiment-method
#' reducedDimNames<-,SingleCellExperiment,character-method
NULL

.red_key <- "reducedDims"

#' @export
setMethod("reducedDims", "SingleCellExperiment", function(x, withDimnames=TRUE) {
    value <- .get_internal_all(x, 
        getfun=int_colData, 
        key=.red_key)

    if (withDimnames) {
        for (i in seq_along(value)) {
            rownames(value[[i]]) <- colnames(x)
        }
    }
    value
})

#' @export
setReplaceMethod("reducedDims", "SingleCellExperiment", function(x, withDimnames=TRUE, ..., value) {
    if (withDimnames) {
        for (v in seq_along(value)) {
            .check_reddim_names(x, value[[v]], withDimnames=TRUE, 
                vname=sprintf("value[[%s]]", v), fun='reducedDims')
        }
    }

    .set_internal_all(x, value, 
        getfun=int_colData,
        setfun=`int_colData<-`,
        key=.red_key,
        convertfun=NULL,
        xdimfun=ncol,
        vdimfun=nrow,
        funstr="reducedDims",
        xdimstr="ncol",
        vdimstr="rows")
})

#' @export
setMethod("reducedDimNames", "SingleCellExperiment", function(x) {
    .get_internal_names(x, 
        getfun=int_colData, 
        key=.red_key)
})

#' @export
setReplaceMethod("reducedDimNames", c("SingleCellExperiment", "character"), function(x, value) {
    .set_internal_names(x, value,
        getfun=int_colData,
        setfun=`int_colData<-`,
        key=.red_key)
})

#' @export
setMethod("reducedDim", c("SingleCellExperiment", "missing"), function(x, type, withDimnames=TRUE) {
    .get_internal_missing(x, 
        basefun=reducedDim, 
        namefun=reducedDimNames, 
        funstr="reducedDim",
        withDimnames=withDimnames)
})

#' @export
setMethod("reducedDim", c("SingleCellExperiment", "numeric"), function(x, type, withDimnames=TRUE) {
    out <- .get_internal_integer(x, type,
        getfun=int_colData,
        key=.red_key,
        funstr="reducedDim",
        substr="type")

    if (withDimnames) {
        rownames(out) <- colnames(x)
    }

    out
})

#' @export
setMethod("reducedDim", c("SingleCellExperiment", "character"), function(x, type, withDimnames=TRUE) {
    out <- .get_internal_character(x, type,
        getfun=int_colData,
        key=.red_key,
        funstr="reducedDim",
        substr="type",
        namestr="reducedDimNames")

    if (withDimnames) {
        rownames(out) <- colnames(x)
    }

    out
})

.check_reddim_names <- function(reference, incoming, withDimnames, fun='reducedDim', vname='value') {
    if (!is.null(incoming)) {
        rni <- rownames(incoming)
        cnr <- colnames(reference)
        if (withDimnames && !is.null(rni)) {
            if (!identical(cnr, rni)) {
                msg <- paste0("non-NULL 'rownames(", vname, ")' should be the same as 'colnames(x)' for '", 
                    fun, "<-'. This will be an error in the next release of Bioconductor.")
                warning(paste(strwrap(msg), collapse="\n"))
            }
        }
    }
    incoming
}

#' @export
setReplaceMethod("reducedDim", c("SingleCellExperiment", "missing"), function(x, type, withDimnames=TRUE, ..., value) {
    .set_internal_missing(x, value,
        withDimnames=withDimnames,
        basefun=`reducedDim<-`,
        namefun=reducedDimNames
    )
})

#' @export
setReplaceMethod("reducedDim", c("SingleCellExperiment", "numeric"), function(x, type, withDimnames=TRUE, ..., value) {
    .check_reddim_names(x, value, withDimnames)

    .set_internal_numeric(x, type, value,
        getfun=int_colData,
        setfun=`int_colData<-`,
        key=.red_key,
        convertfun=NULL,
        xdimfun=ncol,
        vdimfun=nrow,
        funstr="reducedDim",
        xdimstr="ncol",
        vdimstr="rows",
        substr="type")
})

#' @export
setReplaceMethod("reducedDim", c("SingleCellExperiment", "character"), function(x, type, withDimnames=TRUE, ..., value) {
    .check_reddim_names(x, value, withDimnames)

    .set_internal_character(x, type, value, 
        getfun=int_colData,
        setfun=`int_colData<-`,
        key=.red_key,
        convertfun=NULL,
        xdimfun=ncol, 
        vdimfun=nrow,
        funstr="reducedDim", 
        xdimstr="ncol",
        vdimstr="rows", 
        substr="type") 
})