available_projects: List available projects in recount3

View source: R/available_projects.R

available_projects    R Documentation

Description

List available projects in recount3

Usage

available_projects(
  organism = c("human", "mouse"),
  recount3_url = getOption("recount3_url", "http://duffel.rail.bio/recount3"),
  bfc = recount3_cache(),
  available_homes = project_homes(organism = organism, recount3_url = recount3_url)
)

Arguments

organism

A character(1) specifying which organism you want to download data from. Supported options are "human" or "mouse".

recount3_url

A character(1) specifying the home URL for recount3 or a local directory where you have mirrored recount3. Defaults to the load balancer http://duffel.rail.bio/recount3. Other options are the AWS Open Data mirror https://recount-opendata.s3.amazonaws.com/recount3/release (listed at https://registry.opendata.aws/recount/) and the SciServer datascope from IDIES at JHU, https://sciserver.org/public-data/recount3/data. If you have a favorite mirror, you can set the R option recount3_url (for example in your .Rprofile).
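As a minimal sketch of that last point, the option can be set with base R's options(), either interactively or in .Rprofile. The URL used here is the AWS mirror named above; the lookup at the end mirrors the getOption() default shown in Usage.

```r
## Point recount3 at a preferred mirror for this session.
## The URL is the AWS Open Data mirror mentioned in the argument description.
options(
    recount3_url = "https://recount-opendata.s3.amazonaws.com/recount3/release"
)

## Same lookup as the recount3_url default in Usage: the option wins when it
## is set, otherwise the load balancer URL is used as the fallback.
mirror <- getOption("recount3_url", "http://duffel.rail.bio/recount3")
print(mirror)
```

Functions that take recount3_url with this default should then use the chosen mirror without it being passed explicitly.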

bfc

A BiocFileCache-class object in which the files will be cached, typically created by recount3_cache().

available_homes

A character() vector with the available project homes for the given recount3_url. If you use a non-standard recount3_url, you will likely need to manually specify the valid values for available_homes.

Value

A data.frame() with one row per project and the following columns: the project ID (project), the organism, the file_source from which the data was accessed, the recount3 project home location (project_home), the project_type, which differentiates between data_sources and compilations, and n_samples, the number of samples in the given project.
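The returned object is a regular data.frame, so base R tools apply directly. The sketch below uses a toy stand-in with the documented column names so that it runs offline; the project IDs, sample counts and other values are made up for illustration and are not real recount3 output.

```r
## Toy stand-in for the returned data.frame. All values are hypothetical;
## only the column names follow the description above.
toy_projects <- data.frame(
    project = c("PROJ_A", "PROJ_B", "PROJ_C"),
    organism = "human",
    file_source = c("sra", "sra", "gtex"),
    project_home = c("data_sources/sra", "data_sources/sra", "data_sources/gtex"),
    project_type = "data_sources",
    n_samples = c(12L, 250L, 40L),
    stringsAsFactors = FALSE
)

## Keep the SRA projects with at least 20 samples
big_sra <- subset(toy_projects, file_source == "sra" & n_samples >= 20)
print(big_sra$project)

## Total number of samples per file source
samples_by_source <- tapply(
    toy_projects$n_samples,
    toy_projects$file_source,
    sum
)
print(samples_by_source)
```

The same subset() and tapply() calls should work unchanged on the real output of available_projects().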

Examples


## Find all the human projects
human_projects <- available_projects()

## Explore the results
dim(human_projects)
head(human_projects)

## How many are from a data source vs a compilation?
table(human_projects$project_type, useNA = "ifany")

## What are the unique file sources?
table(
    human_projects$file_source[human_projects$project_type == "data_sources"]
)

## Note that big projects are broken up to make them easier to access
## For example, GTEx and TCGA are broken up by tissue
head(subset(human_projects, file_source == "gtex"))
head(subset(human_projects, file_source == "tcga"))

## Find all the mouse projects
mouse_projects <- available_projects(organism = "mouse")

## Explore the results
dim(mouse_projects)
head(mouse_projects)

## How many are from a data source vs a compilation?
table(mouse_projects$project_type, useNA = "ifany")

## What are the unique file sources?
table(
    mouse_projects$file_source[mouse_projects$project_type == "data_sources"]
)

## Not run: 
## Use with a custom recount3_url:
available_projects(
    recount3_url = "http://snaptron.cs.jhu.edu/data/temp/recount3test",
    available_homes = "data_sources/sra"
)

## You can also rely on project_homes() if the custom URL has a text file
## that can be read with readLines() at:
## <recount3_url>/<organism>/homes_index
available_projects(
    recount3_url = "http://snaptron.cs.jhu.edu/data/temp/recount3test"
)

## End(Not run)
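The homes_index file mentioned in the last example is a plain text file that readLines() can read. The sketch below writes one for a throwaway local directory and reads it back, using only base R. The directory name and the two home values are assumptions for illustration, and whether project_homes() accepts such a mirror as-is depends on the package.

```r
## Build a throwaway local "mirror" with a hand-written homes_index file.
## The directory name and the listed homes are hypothetical.
local_mirror <- file.path(tempdir(), "recount3_mirror_sketch")
dir.create(
    file.path(local_mirror, "human"),
    recursive = TRUE,
    showWarnings = FALSE
)
writeLines(
    c("data_sources/sra", "data_sources/gtex"),
    file.path(local_mirror, "human", "homes_index")
)

## Read it back from <recount3_url>/<organism>/homes_index
homes <- readLines(file.path(local_mirror, "human", "homes_index"))
print(homes)
```

Each line of the file is one project home, which is the kind of value passed as available_homes in the first custom-URL example.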

LieberInstitute/recount3 documentation built on Dec. 11, 2024, 8:35 p.m.