knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(SimBu)
This vignette covers the integration of the public database sfaira. \
sfaira [@Fischer2020] is a dataset and model repository for single-cell RNA-sequencing
data. It gives access to multiple datasets from human and mouse, with more than 3 million cells in total.
You can browse them interactively here: https://theislab.github.io/sfaira-portal/Datasets. Note that only annotated datasets will be downloaded! Some datasets have private URLs and cannot be downloaded automatically; SimBu will skip these datasets. \
In order to use this database, we first need to install it. This is done by running the setup_sfaira() function for the first time. In the background we use the basilisk package to establish a conda environment that has all sfaira dependencies installed. The installation is performed only once, even if you close your R session and call setup_sfaira() again. The given directory serves as the storage for all future downloaded datasets from sfaira:
setup_list <- SimBu::setup_sfaira(basedir = tempdir())
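Because tempdir() is removed when the R session ends, datasets stored there have to be downloaded again in the next session. The following is a minimal sketch of pointing basedir at a persistent location instead. The choice of tools::R_user_dir() and the sub-directory name "sfaira" are assumptions made for illustration, not something SimBu prescribes, and the setup call itself is left commented out because it triggers the conda installation.

```r
# Assumption: a per-user cache directory is an acceptable storage location.
# tools::R_user_dir() requires R >= 4.0.0.
basedir <- file.path(tools::R_user_dir("SimBu", which = "cache"), "sfaira")

# Create the directory if it does not exist yet
dir.create(basedir, recursive = TRUE, showWarnings = FALSE)

# Not run here: this sets up the conda environment on first use
# setup_list <- SimBu::setup_sfaira(basedir = basedir)
```

Any writable directory works the same way; the only point is that it outlives the current R session.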
We will now create a dataset of samples from human pancreas using the organisms and tissues parameters.
You can provide a single word (like we do here) or, for example, a vector of tissues you want to download: c("pancreas","lung"). An additional parameter is assays, with which you subset the database further to only download datasets from certain sequencing assays (for example Smart-seq2). \
The name parameter is used later on to give each sample (cell) a unique name.
ds_pancrease <- SimBu::dataset_sfaira_multiple(
  sfaira_setup = setup_list,
  organisms = "Homo sapiens",
  tissues = "pancreas",
  name = "human_pancreas"
)
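The tissues vector and the assays filter described above can be combined in one call. The sketch below only assembles the arguments and wraps the call in a helper; download_query() and the name value "human_pancreas_lung" are hypothetical and introduced purely for illustration. Whether sfaira actually contains annotated Smart-seq2 datasets for both tissues is not checked here, so the download itself is left commented out.

```r
# Query arguments as described in the text; `name` is an arbitrary label
query <- list(
  organisms = "Homo sapiens",
  tissues = c("pancreas", "lung"),
  assays = "Smart-seq2",
  name = "human_pancreas_lung"
)

# Hypothetical helper: passes the query plus the sfaira setup on to SimBu
download_query <- function(setup_list, query) {
  do.call(
    SimBu::dataset_sfaira_multiple,
    c(list(sfaira_setup = setup_list), query)
  )
}

# Not run here: requires the sfaira setup and an internet connection
# ds_multi <- download_query(setup_list, query)
```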
Currently there are three datasets in sfaira from human pancreas which have cell-type annotation. The package will download them for you automatically and merge them into a single expression matrix and a streamlined annotation table, which we can use for our simulation. \
Some datasets from sfaira may not (yet) be ready for automatic download. In that case an error message will appear in R, telling you which file to download and where to put it. \
If you wish to see all datasets which are included in sfaira, you can use the following command:
all_datasets <- SimBu::sfaira_overview(setup_list = setup_list)
head(all_datasets)
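The overview can be long, so it may help to search it for a keyword before picking an ID. The sketch below assumes the overview behaves like a data frame (as the head() call suggests) but makes no assumption about its column names: it searches every column. find_datasets() is a hypothetical helper, and mock_overview with its id and tissue columns is made-up stand-in data, not the real output of sfaira_overview().

```r
# Hypothetical helper: keep rows where any column matches the pattern
find_datasets <- function(overview, pattern) {
  overview <- as.data.frame(overview, stringsAsFactors = FALSE)
  # One logical vector per column, combined with a row-wise OR
  hits <- Reduce(`|`, lapply(overview, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  }))
  overview[hits, , drop = FALSE]
}

# Stand-in data; the column names here are assumptions
mock_overview <- data.frame(
  id = c(
    "homosapiens_lungparenchyma_2019_10x3v2_madissoon_001_10.1186/s13059-019-1906-x",
    "hypothetical_pancreas_dataset_id"
  ),
  tissue = c("lung parenchyma", "pancreas"),
  stringsAsFactors = FALSE
)

lung_hits <- find_datasets(mock_overview, "lung")
```

With the real table, the call would be find_datasets(all_datasets, "lung"), and the matching IDs can then be looked up in whichever column holds them.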
This allows you to find the specific IDs of datasets, which you can download directly:
SimBu::dataset_sfaira(
  sfaira_id = "homosapiens_lungparenchyma_2019_10x3v2_madissoon_001_10.1186/s13059-019-1906-x",
  sfaira_setup = setup_list,
  name = "dataset_by_id"
)
utils::sessionInfo()