library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)

Overview

Dvir Aran's original dviraran/SingleR GitHub repository contains R objects with normalized expression values from reference data sets such as those collected by Blueprint and ENCODE, the Human Primary Cell Atlas (HPCA), and the Immunological Genome Project (ImmGen). These expression data are based on bulk RNA-seq or microarrays from purified cell populations, or on single-cell RNA-seq of individual tissues. Every sample represents the transcriptome of a specific cell type; these data are therefore well suited for use as a general training data set for a typical SingleR analysis.

Data retrieval

For now, we're going to retrieve the processed data from the legacy SingleR repository:

dataset <- "mouse.rnaseq" 
full.url <- sprintf("https://github.com/dviraran/SingleR/blob/master/data/%s.rda?raw=true", dataset)

library(BiocFileCache)
bfc <- BiocFileCache(ask=FALSE)
ref <- bfcrpath(bfc, full.url)

env <- new.env()
load(ref, envir = env)
ref.set <- get(dataset, envir = env)
names(ref.set)

The original objects contain numerous nested lists. We only need the matrix of normalized expression values and the labels assigned to each sample/cell.
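
Before extracting anything, it can help to summarise the top-level elements of the list by class and size. The following is a minimal sketch that assumes a small mock list, `mock.set`, standing in for `ref.set`. The element names `data`, `types` and `main_types` are the ones used in the extraction steps; `name`, `extra` and the helper `summarize_element()` are hypothetical and only serve the illustration.

```r
# Mock stand-in for ref.set (hypothetical values, for illustration only).
mock.set <- list(
    name = "mock",
    data = matrix(rnorm(6), nrow = 2,
        dimnames = list(c("GeneA", "GeneB"), c("s1", "s2", "s3"))),
    types = c("T cells:CD4", "T cells:CD8", "B cells:naive"),
    main_types = c("T cells", "T cells", "B cells"),
    extra = list(nested = list(1, 2))
)

# Summarise one element as "class [size]", using dim() when available.
summarize_element <- function(x) {
    size <- if (is.null(dim(x))) length(x) else paste(dim(x), collapse = " x ")
    sprintf("%s [%s]", class(x)[1], size)
}

# Apply the summary to every top-level element of the list.
overview <- vapply(mock.set, summarize_element, character(1))
print(overview)
```

On the real object, the same call with `ref.set` in place of `mock.set` should show which elements hold the expression matrix and the label vectors.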

Data extraction

Extract the normalized expression matrix:

logcounts <- ref.set$data

Extract cell labels, which represent the metadata:

library(S4Vectors)
coldata <- DataFrame(row.names = colnames(logcounts),
    label.main = ref.set$main_types,
    label.fine = ref.set$types)
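
Because the labels are attached to samples purely by position, a quick consistency check may be worthwhile before saving. The sketch below uses mock inputs (hypothetical values) with the same shapes as `logcounts` and the two label vectors. The last check assumes that each fine label belongs to exactly one main label, which is an assumption about the label hierarchy rather than something guaranteed by the source data.

```r
# Mock inputs (hypothetical), mirroring the shapes of logcounts and the labels.
logcounts <- matrix(0, nrow = 2, ncol = 3,
    dimnames = list(c("GeneA", "GeneB"), c("s1", "s2", "s3")))
label.main <- c("T cells", "T cells", "B cells")
label.fine <- c("T cells:CD4", "T cells:CD8", "B cells:naive")

# One label per column of the expression matrix, and no missing labels.
stopifnot(
    length(label.main) == ncol(logcounts),
    length(label.fine) == ncol(logcounts),
    !anyNA(label.main),
    !anyNA(label.fine)
)

# Assumption: each fine label maps to exactly one main label.
main.per.fine <- tapply(label.main, label.fine, function(x) length(unique(x)))
stopifnot(all(main.per.fine == 1))
```

With the real data, `label.main` and `label.fine` would be replaced by `ref.set$main_types` and `ref.set$types`.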

Saving to file

Save the normalized expression values and the metadata so that they can be uploaded to ExperimentHub.

path <- file.path("celldex", dataset, "1.0.0")
dir.create(path, showWarnings = FALSE, recursive = TRUE)

## saving normalized expression values
saveRDS(logcounts, file = file.path(path, "logcounts.rds"))

## saving metadata
saveRDS(coldata, file = file.path(path, "coldata.rds"))
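
As a sanity check before uploading, the saved files can be read back and compared with the objects in memory. This is a minimal sketch that assumes mock objects (hypothetical values) in place of the real `logcounts` and `coldata`, uses a base `data.frame` instead of a `DataFrame` to stay self-contained, and writes to a temporary directory rather than the real upload path.

```r
# Mock objects (hypothetical) standing in for logcounts and coldata.
logcounts <- matrix(c(1.5, 0, 2.25, 3), nrow = 2,
    dimnames = list(c("GeneA", "GeneB"), c("s1", "s2")))
coldata <- data.frame(label.main = c("T cells", "B cells"),
    label.fine = c("T cells:CD4", "B cells:naive"),
    row.names = colnames(logcounts))

# Write to a temporary directory rather than the real upload path.
path <- file.path(tempdir(), "celldex-check", "1.0.0")
dir.create(path, showWarnings = FALSE, recursive = TRUE)
saveRDS(logcounts, file = file.path(path, "logcounts.rds"))
saveRDS(coldata, file = file.path(path, "coldata.rds"))

# Read both files back and compare them with the in-memory objects.
logcounts2 <- readRDS(file.path(path, "logcounts.rds"))
coldata2 <- readRDS(file.path(path, "coldata.rds"))
roundtrip.ok <- identical(logcounts, logcounts2) &&
    identical(coldata, coldata2) &&
    identical(colnames(logcounts2), rownames(coldata2))
stopifnot(roundtrip.ok)
```

The final comparison also confirms that the column names of the expression matrix still match the row names of the metadata after the round trip.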

Session info

sessionInfo()


LTLA/celldex documentation built on June 3, 2024, 4:53 p.m.