fetch_processed_quant: Fetch preprocessed quantification result.
In COMBINE-lab/roe: R utilities for alevin-fry

fetch_processed_quant

R Documentation

Description

Fetch alevin-fry processed quantification result of publicly available datasets.

Usage

fetch_processed_quant(
  dataset_ids = c(),
  fetch_dir = "processed_quant",
  force = FALSE,
  delete_tar = FALSE,
  quiet = FALSE
)

Arguments

`dataset_ids`	integer scalar or vector providing the id of the available dataset(s) to be fetched.
`fetch_dir`	path to the directory where the fetched quantification results will be stored. It is created if it does not exist.
`force`	logical; whether to force re-fetching of datasets that have already been fetched.
`delete_tar`	logical; whether to delete the compressed datasets after decompressing. If FALSE, the tar files are kept in a folder called quant_tar under `fetch_dir`.
`quiet`	logical; whether to suppress messages.

Details

The raw data for many single-cell and single-nucleus RNA-seq experiments is publicly available. Certain datasets are used again and again, both to demonstrate data processing in tutorials and as benchmark datasets for novel methods (e.g. for clustering, dimensionality reduction, or cell type identification). In particular, 10x Genomics hosts on their website various publicly available datasets that were generated using their technology and processed via their Cell Ranger software.

We have created a Nextflow-based alevin-fry workflow that quantifies single-cell RNA-sequencing data in a single workflow. The pipeline can be found here. To test this initial pipeline, we have begun to reprocess the publicly available datasets collected from the 10x website. We have focused the initial effort on standard single-cell and single-nucleus gene-expression data generated using the Chromium v2 and v3 chemistries, but hope to soon expand the pipeline to more complex protocols (e.g. feature barcoding experiments) and process those data as well. These more complex protocols can already be processed with alevin-fry (see the alevin-fry tutorials); they have just not yet been incorporated into the automated Nextflow-based workflow linked above.

Each currently available dataset, i.e. each dataset whose quantification result is ready to be fetched, is described by a name, a link and a dataset id. To obtain these details as a data frame, run 'fetch_processed_quant()' with no arguments in R.

Note that because the dataset names are long, the stored datasets are named by their id.

Value

If `dataset_ids` is empty, a data frame describing the available datasets is returned; otherwise, a list of ProcessedQuant class objects is returned, in which each ProcessedQuant object stores the information of one fetched dataset. The 'quant_path' field holds the path to the quantification result of the fetched dataset.
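
The two return shapes can be handled with a small helper. The following is only a sketch: `collect_quant_paths` is a hypothetical function, not part of roe, and it assumes that each ProcessedQuant object exposes `quant_path` as a length-one character slot (as the Examples section suggests) and that the returned list is named by dataset id.

```r
# Hypothetical helper (not part of roe): gather the quant_path slot of
# every fetched dataset into a character vector that keeps the list names.
collect_quant_paths <- function(fetched) {
  # With an empty dataset_ids, the documented return value is a data frame
  # of available datasets, which has no quant_path slot to read.
  if (is.data.frame(fetched)) {
    stop("expected a list of ProcessedQuant objects, got a data frame")
  }
  # Assumption: quant_path is a single character string per object.
  vapply(fetched,
         function(pq) methods::slot(pq, "quant_path"),
         character(1))
}

# Usage (needs roe and network access, so it is left commented out):
# fetched <- roe::fetch_processed_quant(dataset_ids = c(1, 3))
# collect_quant_paths(fetched)
```

Returning a plain named vector makes it easy to pass the paths to downstream loading code, while the explicit data frame check avoids a confusing slot-access error when no ids were supplied.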

Author(s)

Dongze He

Examples


## Not run: 
library(roe)
# run the function
available_datasets = load_processed_quant()
# fetch the quantification results of datasets 1 and 3
fetched_quant_list = fetch_processed_quant(dataset_ids = c(1, 3),
                                           fetch_dir = "processed_quant",
                                           force = FALSE,
                                           delete_tar = FALSE,
                                           quiet = FALSE)

# print the path of each fetched dataset, indexing the list by dataset id
print(fetched_quant_list$"1"@quant_path)
print(fetched_quant_list$"3"@quant_path)

## End(Not run)

COMBINE-lab/roe documentation built on Nov. 8, 2022, 5:23 p.m.