#' Mouse gastrulation joint ATAC/RNA data
#'
#' Obtain the processed counts for the mouse gastrulation "multi-omics" dataset.
#'
#' @param type String specifying the type of data to obtain, see Details.
#' Default behaviour is to return all three data types.
#' @param samples Integer or character vector specifying the samples for which processed data should be obtained.
#' If \code{NULL} (default), data are returned for all 11 samples.
#'
#' @return
#' If \code{type="all"}, a \linkS4class{SingleCellExperiment} object is returned containing processed data from selected samples for all data types.
#' RNA-seq data are in the primary experiment, while the other data types are stored as alternative experiments.
#' The default \code{counts} assay on the first level of the SingleCellExperiment object is occupied by the RNA data.
#' The other modalities can be accessed using \code{SingleCellExperiment::altExp}; within each, the \code{counts} assay again holds that modality's data, for compatibility with many function defaults.
#'
#' If \code{type="rna"}, \code{type="peaks"}, or \code{type="tss"}, a \linkS4class{SingleCellExperiment} object is returned containing information for a single data type.
#' Each assay will be in the primary \code{counts} slot.
#' RNA data corresponds to RNA-seq read counts.
#' Peak data corresponds to read counts from ATAC-seq quantified over peaks defined using ArchR's peak calling strategy.
#' TSS data corresponds to read counts from ATAC-seq quantified over transcription start sites using ArchR's Gene Scores model.
#'
#' @details
#' This function downloads the data for the embryo atlas from Argelaguet et al. (2022).
#' The dataset contains 11 10X Genomics multiome samples.
#'
#' The column metadata contains columns from the following set, depending on modality:
#' \describe{
#' \item{\code{barcode}:}{Character: cell barcode from the 10X Genomics experiment (with appended "-1" from Cellranger).}
#' \item{\code{sample}:}{Integer: index of the sample from which the cell was taken.}
#' \item{\code{sample_name}:}{Character: descriptive name of the sample from which the cell was taken.}
#' \item{\code{stage}:}{Character: stage of the mouse embryo at which the sample was taken.}
#' \item{\code{genotype}:}{Character: cell genotype, wild type (WT) or Brachyury KO (T_KO).}
#' \item{\code{celltype}:}{Character: cell type to which the cell was assigned by mapping to RNA atlas.}
#' \item{\code{nFeature_RNA}:}{Integer: number of genes detected in RNAseq data for the cell.}
#' \item{\code{nCount_RNA}:}{Integer: number of RNA molecules detected in RNAseq data for the cell.}
#' \item{\code{mitochondrial_percent_RNA}:}{Numeric: percent of RNA molecules detected from mitochondrial genome for the cell.}
#' \item{\code{ribosomal_percent_RNA}:}{Numeric: percent of RNA molecules detected from ribosomal genes for the cell.}
#' \item{\code{nFrags_atac}:}{Numeric: number of ATAC fragments detected per cell.}
#' \item{\code{TSSEnrichment_atac}:}{Numeric: quality control metric that represents the ratio of ATAC peaks near the transcription start site relative to the flanking regions. Derived from the ArchR package.}
#' \item{\code{doublet_score}:}{Numeric: doublet score for each cell calculated using the \code{cxds_bcds_hybrid} function from the \code{scds} package.}
#' \item{\code{doublet_call}:}{Logical: doublet call for each cell calculated from the "doublet_score" column. Cells with a doublet score larger than 1.25 are assumed to be doublets and thus were removed from downstream analysis.}
#' }
#' Reduced dimension representations of the data are also available in the \code{reducedDims} slot of the SingleCellExperiment object.
#' These are UMAPs calculated either across all the data, or per stage (\code{perstage}).
#' Those labelled either \code{rna} or \code{atac} alone were calculated from the processed count matrices of these modalities; \code{rna_atac}-labelled UMAPs were calculated from the MOFA factors calculated cross-modality.
#'
#' For the RNA and TSS gene score data, the row metadata contains the Ensembl ID and MGI symbol for each gene.
#' The ATAC peak row metadata contains information for each peak.
#' Unlike other datasets in MouseGastrulationData, the rownames for these objects are gene symbols.
#'
#' @author Jonathan Griffiths
#' @examples
#' RA_rna <- RAMultiomeData(samples=1, type = "rna")
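#'
#' # A hedged sketch, not run: fetch all modalities for one sample and list
#' # what the returned object holds. This downloads data from ExperimentHub.
#' \dontrun{
#' RA_all <- RAMultiomeData(samples=1, type="all")
#' SingleCellExperiment::altExpNames(RA_all)
#' SingleCellExperiment::reducedDimNames(RA_all)
#' }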
#'
#' @references
#' Argelaguet R et al. (2022).
#' Decoding gene regulation in the mouse embryo using single-cell multi-omics.
#' \emph{bioRxiv} 2022.06.15.496239
#'
#' @export
#' @importFrom ExperimentHub ExperimentHub
#' @importFrom SingleCellExperiment SingleCellExperiment
#' @importFrom SingleCellExperiment altExp<-
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
RAMultiomeData <- function(type=c("all", "rna", "peaks", "tss"), samples=NULL) {
    type <- match.arg(type)
    versions <- list(base="1.12.0")
    if(type!="all"){
        return(.getSingleRA(type, s=samples, v=versions))
    } else {
        ass <- c("rna", "peaks", "tss")
        dat <- lapply(ass, .getSingleRA, s=samples, v=versions)
        # newnames <- c(
        #     "rna" = "RNA_counts",
        #     "peaks" = "ATAC_peak_counts",
        #     "tss" = "TSS_gene_score")
        # names(dat) <- newnames[ass] # more complex names for the assays
        # The first name becomes the primary assay name in .addAltExp();
        # the remaining names label the altExps. They must be unique,
        # otherwise the second altExp assignment overwrites the first.
        names(dat) <- c("counts", ass[-1])
        return(.addAltExp(dat))
    }
}
.getSingleRA <- function(type=c("rna", "peaks", "tss"), s, v){
    type <- match.arg(type)
    name <- switch(type, rna="RA_rna", tss="RA_atac_tss", peaks="RA_atac_peaks")
    .getRNAseqData(name, type="processed", version=v, samples=s,
        sample.options=as.character(1:11), sample.err="1:11",
        ens_rownames=FALSE)
}
.addAltExp <- function(sce_list){
    if(length(sce_list)<2){
        stop("List of SCEs not long enough to combine")
    }
    # keep only cells present in every modality, in a shared order
    shared_cells <- Reduce(intersect, lapply(sce_list, colnames))
    for(i in seq_along(sce_list)){
        sce_list[[i]] <- sce_list[[i]][, shared_cells]
    }
    # the first SCE is the main experiment; the rest become altExps
    names(assays(sce_list[[1]])) <- names(sce_list)[1]
    for(i in seq_along(sce_list)[-1]){
        altExp(sce_list[[1]], names(sce_list)[i]) <- sce_list[[i]]
    }
    sce_list[[1]]
}