library(BiocStyle) knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
Dvir Aran's original r Githubpkg("dviraran/SingleR")
github repository contains Robjects with normalized expression values from reference data sets such as those collected by Blueprint & Encode, the Human Primary Cell Atlas (HPCA), and the Immunological Genome Project (ImmGen).
These expression data are based on bulk RNA-seq or microarrays from purified cell populations or single-cell RNA-seq of individual tissues.
Every sample represents the transcriptome of a specific cell type; this data is therefore well suited to be used as a general training data set for the typical SingleR analysis.
For now, we're going to retrieve the processed data from the legacy SingleR repository:
dataset <- "blueprint_encode" full.url <- sprintf("https://github.com/dviraran/SingleR/blob/master/data/%s.rda?raw=true", dataset) library(BiocFileCache) bfc <- BiocFileCache(ask=FALSE) ref <- bfcrpath(bfc, full.url) env <- new.env() load(ref, envir = env) ref.set <- get(dataset, envir = env) names(ref.set)
The original objects contain numerous nested lists. We only need the matrix of normalized expression values and the labels assigned to each sample/cell.
Extract the normalized expression matrix:
logcounts <- ref.set$data dim(logcounts)
There's a problem with the naming of the samples -- there's a duplicated name:
table(duplicated(colnames(logcounts)))
Uniquifying the colnames:
chosen <- anyDuplicated(colnames(logcounts)) colnames(logcounts)[chosen] <- "hematopoietic.multipotent.progenitor.cell.3" table(duplicated(colnames(logcounts)))
Fixing a reference to adipocytes, which should actually be astrocytes (see https://github.com/LTLA/SingleR/issues/96):
offenders <- ref.set$types == "Astrocytes" colnames(logcounts)[offenders] ref.set$main_types[offenders] ref.set$main_types[offenders] <- "Astrocytes"
Extract cell labels, which represent the metadata:
library(S4Vectors) coldata <- DataFrame(row.names = colnames(logcounts), label.main = ref.set$main_types, label.fine = ref.set$types)
Saving counts and metadata to upload them to r Biocpkg("ExperimentHub")
.
path <- file.path("celldex", dataset, "1.2.0") dir.create(path, showWarnings = FALSE, recursive = TRUE) ## saving counts saveRDS(logcounts, file = file.path(path, "logcounts.rds")) ## saving metadata saveRDS(coldata, file = file.path(path, "coldata.rds"))
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.