R/BlueprintEncodeData.R

#' Obtain human bulk RNA-seq data from Blueprint and ENCODE
#'
#' Download and cache the normalized expression values of 259 bulk RNA-seq samples of
#' pure stromal and immune cell populations, as generated and supplied by Blueprint and ENCODE.
#'
#' @inheritParams HumanPrimaryCellAtlasData
#' @param rm.NA String specifying how missing values should be handled.
#' \code{"rows"} will remove genes with at least one missing value, 
#' \code{"cols"} will remove samples with at least one missing value,
#' \code{"both"} will remove any gene or sample with at least one missing value,
#' and \code{"none"} will not perform any removal.
#'
#' @details
#' This function provides normalized expression values for 259 bulk RNA-seq samples
#' generated by Blueprint and ENCODE from pure populations of stromal and immune
#' cells (Martens and Stunnenberg, 2013; The ENCODE Project Consortium, 2012).
#' The samples were processed and normalized as described in Aran, Looney and
#' Liu et al. (2019), i.e., the raw RNA-seq counts were downloaded from 
#' Blueprint and ENCODE in 2016 and normalized via edgeR (TPMs).
#' 
#' Blueprint Epigenomics contains 144 RNA-seq samples of pure immune cells annotated to 28 cell types.
#' ENCODE contains 115 RNA-seq samples of pure stromal and immune cells annotated to 17 cell types.
#' Altogether, this reference contains 259 samples with 43 cell types (\code{"label.fine"}),
#' manually aggregated into 24 broad classes (\code{"label.main"}).
#' The fine labels have also been mapped to the Cell Ontology (\code{"label.ont"},
#' if \code{cell.ont} is not \code{"none"}), which can be used for further programmatic
#' queries.
#' 
#' @return A \linkS4class{SummarizedExperiment} object with a \code{"logcounts"} assay
#' containing the log-normalized expression values, along with cell type labels in the 
#' \code{\link{colData}}.
#'
#' @author Friederike Dündar
#'
#' @references
#' The ENCODE Project Consortium (2012).
#' An integrated encyclopedia of DNA elements in the human genome.
#' \emph{Nature} 489, 57–74.
#' 
#' Martens JHA and Stunnenberg HG (2013). 
#' BLUEPRINT: mapping human blood cell epigenomes.
#' \emph{Haematologica} 98, 1487–1489.
#' 
#' Aran D, Looney AP, Liu L et al. (2019). 
#' Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.
#' \emph{Nat. Immunol.} 20, 163–172. 
#' 
#' @examples
#' ref.se <- BlueprintEncodeData(rm.NA = "rows")
#'
#' @export
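BlueprintEncodeData <- function(rm.NA = c("rows", "cols", "both", "none"), ensembl = FALSE,
    cell.ont = c("all", "nonna", "none"), legacy = FALSE)
{
    rm.NA <- match.arg(rm.NA)
    cell.ont <- match.arg(cell.ont)

    if (!legacy && rm.NA == "rows" && cell.ont == "all") {
        # The default settings correspond to the versioned reference, which is fetched directly.
        se <- fetchReference("blueprint_encode", "2024-02-26", realize.assays=TRUE)
    } else {
        # Otherwise, build the object from the individual assay and column-data files,
        # apply the requested NA removal, then attach Cell Ontology labels.
        se <- .create_se("blueprint_encode", 
            version = list(logcounts="1.0.0", coldata="1.2.0"),
            assays="logcounts", rm.NA = rm.NA,
            has.rowdata = FALSE, has.coldata = TRUE)
        se <- .add_ontology(se, "blueprint_encode", cell.ont)
    }

    # Optionally convert the gene symbols in the row names to Ensembl identifiers.
    .convert_to_ensembl(se, "Hs", ensembl)
}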
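
# Illustrative sketch (hypothetical, not part of the package API): how the four
# rm.NA modes documented above could act on a genes-by-samples matrix. The helper
# .sketch_remove_na() is an assumption written for illustration only; it is not
# the package's internal .create_se() code, and the package may apply the row and
# column filters in a different order.
.sketch_remove_na <- function(mat, rm.NA = c("rows", "cols", "both", "none")) {
    rm.NA <- match.arg(rm.NA)
    # Flag genes (rows) and samples (columns) with at least one missing value,
    # computed on the input so that "both" drops every offending gene and sample.
    bad.rows <- rowSums(is.na(mat)) > 0
    bad.cols <- colSums(is.na(mat)) > 0
    keep.rows <- if (rm.NA %in% c("rows", "both")) !bad.rows else rep(TRUE, nrow(mat))
    keep.cols <- if (rm.NA %in% c("cols", "both")) !bad.cols else rep(TRUE, ncol(mat))
    # drop = FALSE keeps a matrix even when only one row or column survives.
    mat[keep.rows, keep.cols, drop = FALSE]
}

# Usage on toy data (4 genes x 3 samples; genes 1 and 3 and samples 2 and 3 contain NA):
#   toy <- matrix(c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10, 11, 12), nrow = 4, byrow = TRUE)
#   dim(.sketch_remove_na(toy, "rows"))  # 2 3
#   dim(.sketch_remove_na(toy, "cols"))  # 4 1
#   dim(.sketch_remove_na(toy, "both"))  # 2 1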
LTLA/CellTypeReferences documentation built on June 1, 2024, 12:12 p.m.