```{r}
library(BiocStyle)
knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
```
We obtain a MARS-seq single-cell RNA sequencing dataset of human bone marrow plasma cells and circulating plasma cells from Ledergor et al. (2018).
Counts for endogenous genes are available from the Gene Expression Omnibus
using the accession number GSE117156.
We download and cache them using the `r Biocpkg("BiocFileCache")` package.

```{r}
library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
mat.path <- bfcrpath(bfc, "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE117156&format=file")
```
The download is a tar archive with one count file per 384-well plate from the MARS-seq experiment. We read each file into memory as a sparse matrix. Rows correspond to the features (identical across all matrices) and columns correspond to the wells. To obtain the full count matrix, we `cbind` the matrices together.
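```{r}
# Unpack the tar archive into a temporary directory
tmp <- tempfile()
untar(mat.path, exdir=tmp)
all.files <- list.files(tmp)

library(BiocParallel)
library(scuttle)

# Read each plate's count file as a sparse matrix, using the default BiocParallel backend
all.counts <- bplapply(file.path(tmp, all.files), readSparseCounts)
names(all.counts) <- sub("\\.txt\\.gz$", "", all.files)

# Dimensions of each plate: rows are features, columns are wells
do.call(rbind, lapply(all.counts, dim))

# Combine all plates column-wise into a single count matrix
counts <- do.call(cbind, all.counts)
dim(counts)
```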
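Column-binding is only valid if every plate lists the same features in the same order. The following minimal sketch makes that check explicit before combining. It uses two small hypothetical sparse matrices built with the Matrix package, not the real MARS-seq files. The gene and well names are invented for illustration.

```r
library(Matrix)

# Hypothetical toy "plates": sparse count matrices sharing the same genes (rows)
toy.genes <- c("GeneA", "GeneB", "GeneC")
toy.plate1 <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(5, 2), dims = c(3, 2),
                           dimnames = list(toy.genes, c("W1", "W2")))
toy.plate2 <- sparseMatrix(i = 2, j = 1, x = 7, dims = c(3, 1),
                           dimnames = list(toy.genes, "W3"))
toy.plates <- list(toy.plate1, toy.plate2)

# Per-plate dimensions, mirroring the summary table above
toy.dims <- do.call(rbind, lapply(toy.plates, dim))

# Verify that all plates share identical row names before column-binding
toy.same.rows <- vapply(toy.plates, function(p) {
    identical(rownames(p), rownames(toy.plates[[1]]))
}, logical(1))
stopifnot(all(toy.same.rows))

# Combine the plates into one sparse matrix with all wells as columns
toy.combined <- do.call(cbind, toy.plates)
dim(toy.combined)
```

If the row names differed between plates, `stopifnot()` would halt rather than silently misaligning genes.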
We create a `SingleCellExperiment` object from the combined count matrix:

```{r}
library(SingleCellExperiment)
sce <- SingleCellExperiment(list(counts=counts))
```
We download the metadata file from GEO and load it as a `DataFrame`:

```{r}
meta.path <- bfcrpath(bfc, "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE117156&format=file&file=GSE117156%5Fmetadata%2Etxt%2Egz")
meta <- read.delim(meta.path, row.names=1)
meta <- DataFrame(meta, check.names=FALSE)
meta
```
We inspect the extraneous `X` column and then remove it:

```{r}
summary(meta$X)
meta$X <- NULL
```
We check that every cell in the count matrix has a metadata entry. We also reorder the metadata to follow the column order of the matrix.

```{r}
m <- match(colnames(counts), rownames(meta))
stopifnot(all(!is.na(m)))
meta <- meta[m,]
```
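The following toy sketch shows how `match()` aligns the metadata with the count matrix. The cell names and the `plate` column are hypothetical and do not come from GSE117156. For each matrix column, `match()` returns the position of the metadata row with the same name, or `NA` if there is none. Indexing by the result therefore both reorders the metadata and exposes any missing cells.

```r
# Hypothetical cell names: column order of a count matrix vs. row order of the metadata
toy.cells <- c("W3", "W1", "W2")
toy.meta <- data.frame(plate = c("P1", "P1", "P2"),
                       row.names = c("W1", "W2", "W3"))

# For each matrix column, find the metadata row with the same name
toy.m <- match(toy.cells, rownames(toy.meta))
toy.m  # [1] 3 1 2
stopifnot(all(!is.na(toy.m)))  # every cell has a metadata entry

# Reorder the metadata; drop = FALSE stops a one-column data.frame collapsing to a vector
toy.meta <- toy.meta[toy.m, , drop = FALSE]
stopifnot(identical(rownames(toy.meta), toy.cells))
```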
The `Experiment_ID` column encodes information on the subject, tissue and treatment status. We extract these into separate columns for easier access.
```{r}
splt <- strsplit(meta$Experiment_ID, split = "_")

# The subject ID is the second component; stripping its trailing digits gives the condition
meta$Subject_ID <- vapply(splt, `[[`, character(1), 2)
meta$Condition <- sub("\\d+$", "", meta$Subject_ID)
meta$Condition[which(meta$Condition == "hip")] <- "Control"

## Treated IDs have 4 components, with `postRx` between subject ID and tissue,
## so the tissue is always taken from the last component
meta$Tissue <- vapply(splt, function(x) x[[length(x)]], character(1))
meta$Tissue <- sub("#\\d$", "", meta$Tissue)
meta$Treated <- grepl("postRx", meta$Experiment_ID)
meta
```
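The parsing rules can be checked in isolation with a small wrapper. Both `parse_experiment_id()` and the example IDs are hypothetical. The IDs were invented to fit the structure implied by the code: subject in the second field, tissue in the last field with an optional `#<digit>` suffix, and `postRx` marking treated samples. They are not real identifiers from the dataset.

```r
# Hypothetical helper applying the same rules as the chunk above to a character vector
parse_experiment_id <- function(ids) {
    splt <- strsplit(ids, split = "_", fixed = TRUE)
    subject <- vapply(splt, `[[`, character(1), 2)
    condition <- sub("\\d+$", "", subject)        # drop trailing digits
    condition[condition == "hip"] <- "Control"    # "hip" subjects are controls
    tissue <- vapply(splt, function(x) x[[length(x)]], character(1))
    tissue <- sub("#\\d$", "", tissue)            # drop a "#<digit>" suffix
    data.frame(Subject_ID = subject, Condition = condition, Tissue = tissue,
               Treated = grepl("postRx", ids, fixed = TRUE),
               stringsAsFactors = FALSE)
}

# Invented IDs: a control, an untreated case, and the same case after treatment
toy.ids <- c("p1_hip1_BM", "p2_case7_PB#2", "p3_case7_postRx_BM")
toy.parsed <- parse_experiment_id(toy.ids)
toy.parsed
```

Taking the tissue from the last field, rather than a fixed position, is what lets the 3- and 4-component IDs share one rule.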
```{r}
colData(sce) <- meta
sce
```
We polish the dataset to remove redundant names and convert assay types to more appropriate formats:
```{r}
library(scRNAseq)
sce <- polishDataset(sce)
sce
```
We then save all of the relevant components to disk for upload:
```{r}
meta <- list(
    title="Single cell dissection of plasma cell heterogeneity in symptomatic and asymptomatic myeloma",
    description="Multiple myeloma, a plasma cell malignancy, is the second most common blood cancer. Despite extensive research, disease heterogeneity is poorly characterized, hampering efforts for early diagnosis and improved treatments. Here, we apply single cell RNA sequencing to study the heterogeneity of 40 individuals along the multiple myeloma progression spectrum, including 11 healthy controls, demonstrating high interindividual variability that can be explained by expression of known multiple myeloma drivers and additional putative factors. We identify extensive subclonal structures for 10 of 29 individuals with multiple myeloma. In asymptomatic individuals with early disease and in those with minimal residual disease post-treatment, we detect rare tumor plasma cells with molecular characteristics similar to those of active myeloma, with possible implications for personalized therapies. Single cell analysis of rare circulating tumor cells allows for accurate liquid biopsy and detection of malignant plasma cells, which reflect bone marrow disease. Our work establishes single cell RNA sequencing for dissecting blood malignancies and devising detailed molecular characterization of tumor cells in symptomatic and asymptomatic patients.",
    taxonomy_id="9606",
    genome="GRCh38",
    sources=list(
        list(provider="GEO", id="GSE117156"),
        list(provider="PubMed", id="30523328")
    ),
    maintainer_name="Milan Malfait",
    maintainer_email="milan.malfait94@gmail.com"
)

saveDataset(sce, "2023-12-20_output", meta)
```
```{r}
sessionInfo()
```