R/RAMultiomeData.R

Defines functions .addAltExp .getSingleRA RAMultiomeData

Documented in RAMultiomeData

#' Mouse gastrulation joint ATAC/RNA data
#'
#' Obtain the processed counts for the mouse gastrulation "multi-omics" dataset.
#'
#' @param type String specifying the type of data to obtain, see Details.
#' Default behaviour is to return all three data types.
#' @param samples Integer or character vector specifying the samples for which processed data should be obtained.
#' If \code{NULL} (default), data are returned for all 11 samples.
#' 
#' @return 
#' If \code{type="all"}, a \linkS4class{SingleCellExperiment} object is returned containing processed data from the selected samples for all data types.
#' The RNA-seq data occupy the primary experiment, in the default \code{counts} assay.
#' The other modalities are stored as alternative experiments and can be accessed using \code{SingleCellExperiment::altExp}; within each of these, the data are again stored in the \code{counts} assay for compatibility with many function defaults.
#'
#' If \code{type="rna"}, \code{type="peaks"}, or \code{type="tss"}, a \linkS4class{SingleCellExperiment} object is returned containing information for a single data type.
#' Each assay will be in the primary \code{counts} slot.
#' RNA data corresponds to RNA-seq read counts.
#' Peak data corresponds to read counts from ATAC-seq quantified over peaks defined using ArchR's peak calling strategy.
#' TSS data corresponds to read counts from ATAC-seq quantified over transcription start sites using ArchR's Gene Scores model.
#' 
#' @details
#' This function downloads the data for the embryo atlas from Argelaguet et al. (2022).
#' The dataset contains 11 10X Genomics multiome samples.
#' 
#' The column metadata contains columns from the following set, depending on modality:
#' \describe{
#' \item{\code{barcode}:}{Character: cell barcode from the 10X Genomics experiment (with appended "-1" from Cellranger).}
#' \item{\code{sample}:}{Integer: index of the sample from which the cell was taken.}
#' \item{\code{sample_name}:}{Character: descriptive name of the sample from which the cell was taken.}
#' \item{\code{stage}:}{Character: stage of the mouse embryo at which the sample was taken.}
#' \item{\code{genotype}:}{Character: cell genotype, wild type (WT) or Brachyury KO (T_KO).}
#' \item{\code{celltype}:}{Character: cell type to which the cell was assigned by mapping to RNA atlas.}
#' \item{\code{nFeature_RNA}:}{Integer: number of genes detected in RNAseq data for the cell.}
#' \item{\code{nCount_RNA}:}{Integer: number of RNA molecules detected in RNAseq data for the cell.}
#' \item{\code{mitochondrial_percent_RNA}:}{Numeric: percent of RNA molecules detected from mitochondrial genome for the cell.}
#' \item{\code{ribosomal_percent_RNA}:}{Numeric: percent of RNA molecules detected from ribosomal genes for the cell.}
#' \item{\code{nFrags_atac}:}{Numeric: number of ATAC fragments detected per cell.}
#' \item{\code{TSSEnrichment_atac}:}{Numeric: quality control metric that represents the ratio of ATAC signal near transcription start sites relative to the flanking regions. Derived from the ArchR package.}
#' \item{\code{doublet_score}:}{Numeric: doublet score for each cell calculated using the \code{cxds_bcds_hybrid} function from the \code{scds} package.}
#' \item{\code{doublet_call}:}{Logical: doublet call for each cell calculated from the "doublet_score" column. Cells with a doublet score larger than 1.25 are assumed to be doublets and thus were removed from downstream analysis.}
#' }
#' Reduced dimension representations of the data are also available in the \code{reducedDims} slot of the SingleCellExperiment object.
#' These are UMAPs calculated either across all the data, or per stage (\code{perstage}).
#' Those labelled either \code{rna} or \code{atac} alone were calculated from the processed count matrices of these modalities; \code{rna_atac}-labelled UMAPs were calculated from the MOFA factors calculated cross-modality.
#' 
#' For the RNA and TSS gene score data, the row metadata contains the Ensembl ID and MGI symbol for each gene.
#' For the ATAC peak data, the row metadata describes each peak.
#' Unlike other datasets in MouseGastrulationData, the rownames for these objects are gene symbols.
#'
#' @author Jonathan Griffiths
#' @examples
#' RA_rna <- RAMultiomeData(samples=1, type = "rna")
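#'
#' # A minimal sketch of inspecting the combined object returned by type = "all".
#' # It downloads all three modalities for one sample, so it may be slow.
#' # The names of the alternative experiments and reduced dimensions are
#' # queried from the object here rather than assumed.
#' RA_all <- RAMultiomeData(samples = 1, type = "all")
#' SingleCellExperiment::altExpNames(RA_all)
#' SingleCellExperiment::reducedDimNames(RA_all)
#' head(SummarizedExperiment::colData(RA_all))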
#'
#' @references
#' Argelaguet R et al. (2022). 
#' Decoding gene regulation in the mouse embryo using single-cell multi-omics. 
#' \emph{bioRxiv} 2022.06.15.496239
#'
#' @export
#' @importFrom ExperimentHub ExperimentHub
#' @importFrom SingleCellExperiment SingleCellExperiment
#' @importFrom SingleCellExperiment altExp<-
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
RAMultiomeData <- function(type=c("all", "rna", "peaks", "tss"), samples=NULL) {
    type <- match.arg(type)
    versions <- list(base="1.12.0")
    if(type!="all"){
        return(.getSingleRA(type, s=samples, v = versions))
    } else {
        ass <- c("rna", "peaks", "tss")
        dat <- lapply(ass, .getSingleRA, s=samples, v=versions)
        # newnames <- c(
        #     "rna" = "RNA_counts",
        #     "peaks" = "ATAC_peak_counts",
        #     "tss" = "TSS_gene_score")
        # names(dat) <- newnames[ass] # more complex names for the assays
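        # Name each element by modality: altExp<- assigns by name, so giving every
        # element the same name would make later modalities overwrite earlier ones.
        names(dat) <- ass
        sce <- .addAltExp(dat)
        names(assays(sce)) <- "counts" # keep the default assay name for the RNA data
        return(sce)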
    }
}

.getSingleRA <- function(type=c("rna", "peaks", "tss"), s, v){
    type <- match.arg(type)
    name <- switch(type, rna="RA_rna", tss="RA_atac_tss", peaks="RA_atac_peaks")
    .getRNAseqData(
        name,
        type="processed",
        version=v,
        samples=s,
        sample.options=as.character(1:11),
        sample.err="1:11",
        ens_rownames=FALSE
    )
}

.addAltExp <- function(sce_list){
    if(length(sce_list)<2){
        stop("List of SCEs not long enough to combine")
    }
    # Keep only the cells present in every modality, in a common order
    shared_cells <- Reduce(intersect, lapply(sce_list, colnames))
    for(i in seq_along(sce_list)){
        sce_list[[i]] <- sce_list[[i]][, shared_cells]
    }
    #add altExps
    names(assays(sce_list[[1]])) <- names(sce_list)[1]
    for(i in seq_along(sce_list)[-1]){
        altExp(sce_list[[1]], names(sce_list)[i]) <- sce_list[[i]]
    }
    sce_list[[1]]
}
MarioniLab/MouseGastrulationData documentation built on Jan. 31, 2024, 11:01 a.m.