knitr::opts_chunk$set(echo = TRUE, eval = TRUE, fig.show = "hold", out.width = "50%", message = FALSE, warning = FALSE)

devtools::load_all()
library(SPATAData)
library(dplyr)
library(stringr)
library(ggplot2)

1. Introduction

The package SPATAData gives access to our database of spatial transcriptomics samples. It also provides easy access to data sets that have already been published. We continuously update this collection, so make sure to check for package updates regularly. Please note that many of these data sets are not owned by us! Make sure to use the correct citation if you download and use them for your analysis. See section 4. Citation for details.

# install SPATAData with:
devtools::install_github(repo = "theMILOlab/SPATAData")

# and, if it is not installed yet, confuns:
devtools::install_github("kueckelj/confuns")

We, the MILOlab, are a workgroup focused on neuro-oncology. Our database consists predominantly of human samples from the cerebrum (plot a). Nonetheless, we have also curated multiple objects from various other organs (plot b) with distinct histological classifications.

Use the source data.frame, as described below, to get an overview and to filter for samples that may be relevant to your research. Note that the organ is labeled Brain for mouse tissue donors and Cerebrum for human tissue donors. This distinction matters because, in Visium data sets, mouse samples usually encompass the entire intracranial central nervous system (commonly referred to as the brain). Human brains are too large for that, so human samples are derived from specific organs (column organ, with values such as Cerebrum, Midbrain, Cerebellum) and specific locations (column organ_part, with values such as frontal lobe, temporal lobe, corpus callosum).

# load the source data.frame (one row per data set)
source_df <- sourceDataFrame()

# overview counts: samples, organs and histological classifications
ns <- nrow(source_df)
no <- n_distinct(source_df$organ)
nh <- n_distinct(source_df$histo_class)

# a) histological classes among human cerebrum samples
p1 <- 
  ggplot(filter(source_df, organ == "Cerebrum")) + 
  geom_bar(mapping = aes(x = histo_class), color = "black", fill = "steelblue") + 
  theme_bw() + 
  coord_flip() + 
  labs(subtitle = "a) Organ: Cerebrum")

# b) number of samples per organ, excluding Cerebrum and Brainstem
p2 <- 
  ggplot(filter(source_df, !organ %in% c("Cerebrum", "Brainstem"))) + 
  geom_bar(mapping = aes(x = organ), color = "black", fill = "steelblue") + 
  theme_bw() + 
  coord_flip() + 
  labs(subtitle = "b) Other organs")

p1
p2
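To see the Brain/Cerebrum naming convention in practice, you can cross-tabulate donor species against organ. The sketch below runs on a small, made-up stand-in for sourceDataFrame() so that nothing needs to be downloaded. Only the column names donor_species and organ come from this vignette; the sample names and the species labels are hypothetical and may be spelled differently in the real table.

```r
# Hypothetical toy stand-in for sourceDataFrame(); all values are invented
toy_df <- data.frame(
  sample_name   = c("toy_m1", "toy_m2", "toy_h1", "toy_h2", "toy_h3"),
  donor_species = c("Mus musculus", "Mus musculus",
                    "Homo sapiens", "Homo sapiens", "Homo sapiens"),
  organ         = c("Brain", "Brain", "Cerebrum", "Cerebrum", "Cerebellum"),
  stringsAsFactors = FALSE
)

# Cross-tabulate species against organ: mouse samples are labeled "Brain",
# human samples carry finer-grained organs such as "Cerebrum" or "Cerebellum"
species_by_organ <- table(toy_df$donor_species, toy_df$organ)
species_by_organ
```

With the real data, replace toy_df by source_df to check which organ labels occur for which species before you filter.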

2. The source data.frame

The previous version of SPATAData had an interactive interface in which data samples could be viewed and downloaded with a mouse click. This interface is currently not available (but will hopefully return in the coming months). Until then, you can use the source data.frame directly in combination with some dplyr logic.

2.1 Structure

The source data.frame of SPATAData, as obtained by sourceDataFrame(), contains web links as well as metadata for multiple published spatial data sets. Its current size is given by the values ns (samples), no (organs) and nh (histological classifications) computed in the first code chunk. Every row of the source data.frame corresponds to one data set. Hence, you can use dplyr to filter for the data sets that fit your interest by their characteristics. The following variables provide metadata about each data set.

Furthermore, there are variables that describe the sample data set and quality control results.

2.2 Usage

You can apply unique() to each non-numeric variable to obtain the groups by which to filter the data.frame.

# load required packages
library(SPATA2)
library(SPATAData)
library(dplyr)
library(stringr)

# assign the data.frame
source_df <- sourceDataFrame()

# get unique donor species types
unique(source_df$donor_species)

# get the different organs for which data exists
unique(source_df$organ)

# get additional specifications of anatomical location
unique(source_df$organ_part)

# get the different histo subclasses for which data exists
unique(source_df$histo_class)
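Instead of calling unique() column by column, you can collect the unique values of every non-numeric column in one named list. A minimal base R sketch on a hypothetical toy data.frame (invented values; n_spots is a made-up numeric column standing in for the quality control variables):

```r
# Hypothetical toy stand-in for sourceDataFrame(); all values are invented
toy_df <- data.frame(
  donor_species = c("Homo sapiens", "Homo sapiens", "Homo sapiens", "Mus musculus"),
  organ         = c("Cerebrum", "Cerebrum", "Heart", "Brain"),
  histo_class   = c("Glioblastoma", "Cortex", "Myocardium", "Cortex"),
  n_spots       = c(3000, 2500, 4100, 2800),  # hypothetical numeric QC column
  stringsAsFactors = FALSE
)

# Drop numeric columns, then list the unique values of the remaining ones
non_numeric <- Filter(Negate(is.numeric), toy_df)
groups <- lapply(non_numeric, unique)

# Values available for filtering by histological class
groups$histo_class
```

Applied to source_df, the resulting list gives an overview of all groups at once; numeric quality control variables are better filtered with thresholds than with unique values.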

To filter the source data.frame, use dplyr::filter() in combination with logical tests that describe the data sets you need. For instance, if you want all glioblastoma samples from the frontal and temporal lobe, the code would look like this:

# filter for frontal and temporal glioblastoma
filter(source_df, histo_class == "Glioblastoma" & organ_part %in% c("frontal", "temporal")) %>% 
  select(sample_name, donor_id, histo_class, organ, organ_part, pub_citation, everything())
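Exact matching with %in% only works if the labels are spelled exactly as in the table. Section 1 mentions organ_part values such as "frontal lobe", while the filter above uses "frontal" and "temporal", so it is worth checking unique(source_df$organ_part) first or matching by pattern. The base R sketch below compares both approaches on hypothetical toy data (invented values):

```r
# Hypothetical toy data; organ_part spellings are invented for illustration
toy_df <- data.frame(
  sample_name = c("toy_1", "toy_2", "toy_3", "toy_4"),
  histo_class = c("Glioblastoma", "Glioblastoma", "Glioblastoma", "Cortex"),
  organ_part  = c("frontal lobe", "temporal", "parietal lobe", "frontal lobe"),
  stringsAsFactors = FALSE
)

# Exact matching misses "frontal lobe"
exact <- subset(toy_df, histo_class == "Glioblastoma" &
                  organ_part %in% c("frontal", "temporal"))

# Pattern matching catches both "frontal lobe" and "temporal"
pattern <- subset(toy_df, histo_class == "Glioblastoma" &
                    grepl("frontal|temporal", organ_part))

exact$sample_name    # "toy_2"
pattern$sample_name  # "toy_1" "toy_2"
```

In the dplyr style of this vignette, the pattern-based test would read str_detect(organ_part, "frontal|temporal") inside filter().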

If you want samples from a specific publication:

# look for publications and journals with string subsetting
filter(source_df, str_detect(pub_citation, pattern = "^Kuppe") & pub_journal == "Nature") %>% 
  select(sample_name, pathology, organ, histo_class, histo_class_sub, pub_citation, pub_journal, everything())

You can combine as many conditions as you need. You can also process the data.frame before filtering to answer more specific queries, e.g. if you want to match several samples from the same patient:

# look for several samples from one single patient
filter(source_df, !is.na(donor_id) & organ == "Cerebrum") %>% 
  group_by(donor_id) %>% # count the Cerebrum samples by donor
  mutate(ns_by_donor = n()) %>% 
  filter(ns_by_donor > 1) %>% # keep only those samples with n > 1
  arrange(donor_id)
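If you first want to inspect how many samples each donor contributes, you can split the logic above into an explicit per-donor count and a lookup. A base R sketch on hypothetical toy data (invented donor IDs and samples):

```r
# Hypothetical toy data; donor IDs and samples are invented
toy_df <- data.frame(
  sample_name = c("toy_1", "toy_2", "toy_3", "toy_4", "toy_5"),
  donor_id    = c("d1", "d1", "d2", NA, "d3"),
  organ       = c("Cerebrum", "Cerebrum", "Cerebrum", "Cerebrum", "Heart"),
  stringsAsFactors = FALSE
)

# Keep Cerebrum samples with a known donor
cerebrum <- subset(toy_df, !is.na(donor_id) & organ == "Cerebrum")

# Count samples per donor and keep donors with more than one sample
ns_by_donor  <- table(cerebrum$donor_id)
multi_donors <- names(ns_by_donor)[ns_by_donor > 1]

# Expand back to the matching samples, sorted by donor
matched <- cerebrum[cerebrum$donor_id %in% multi_donors, ]
matched <- matched[order(matched$donor_id), ]
matched
```

Printing ns_by_donor gives the per-donor overview; matched corresponds to the result of the dplyr pipeline above.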

3. Downloads

Whether you find them by filtering the source data.frame or already know them, you need the sample names to download SPATA2 objects. There are two functions to do so: downloadSpataObject() for single objects and downloadSpataObjects() for several objects at once.

Note that the downloaded objects are completely unprocessed. Hence, the surface plots in section 3.1 are based on raw counts. Refer to the vignettes on object creation and processing to find the pipeline that suits your data samples.

3.1 Download and assign

This code chunk downloads single objects by sample name. It assigns the result to a variable in your global environment and you can immediately start with analysis and visualization.

# download objects by sample name and assign them to environment variables
object_heart <- downloadSpataObject(sample_name = "ACH0010")
object_gbm <- downloadSpataObject(sample_name = "UKF242T")

# left plot
plotSurface(object_heart, color_by = "HM_HYPOXIA")

# right plot
plotSurface(object_gbm, color_by = "GFAP", alpha_by = "GFAP")

3.2 Download and save on disk

This code chunk combines filtering with downloadSpataObjects() to download a complete set of samples into a single folder.

# filter source data.frame
healthy_human_cortex_samples <- 
  filter(source_df, organ == "Cerebrum" & histo_class == "Cortex") %>% 
  pull(sample_name)

# show results
healthy_human_cortex_samples

# download samples
downloadSpataObjects(
  sample_names = healthy_human_cortex_samples, 
  folder = "spata_objects/healthy_cortex" # make sure this directory exists or adjust the path
  )
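The comment in the download chunk asks you to create the target directory beforehand. Whether downloadSpataObjects() creates missing folders itself is not stated here, so a safe approach is to create it with base R first. The sketch uses a hypothetical path under tempdir() so that it runs anywhere; in practice you would use your own folder, e.g. "spata_objects/healthy_cortex".

```r
# Hypothetical target folder; tempdir() keeps the sketch side-effect free
target_folder <- file.path(tempdir(), "spata_objects", "healthy_cortex")

# Create the folder and any missing parent folders; stay silent if it exists
dir.create(target_folder, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder is in place before starting the download
dir.exists(target_folder)
```

You can then pass target_folder as the folder argument of downloadSpataObjects().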

5. Sample metadata

Metadata about the sample are stored in slot @meta_sample. It is a list that can be extended flexibly with addSampleMetaData(). We recommend, however, sticking to the naming suggested by our source data.frame.

getSampleMetaData(object_heart)

6. Source code & Sharing

The SPATA2 objects have been curated manually by us without any further processing. Data sets that derive from other publications have been accessed as suggested in the respective data availability statement. SPATA2 objects have been created in batches, as can be tracked in the script /scripts/populate_spata2v3_objects.R in the main repository of SPATAData. If you want to make your data set easily accessible to users via SPATAData, please contact jan.kueckelhaus@uk-erlangen.de.



theMILOlab/SPATAData documentation built on Aug. 27, 2024, 5:04 p.m.