downloadStudy: Manually download, untar, and load study tarballs

View source: R/cBioDataPack.R

downloadStudy R Documentation

Manually download, untar, and load study tarballs

Description

Use these functions when a particular study is not currently available as a MultiAssayExperiment representation; otherwise, use cBioDataPack. Provide a cancer_study_id from getStudies to retrieve the study tarball from the cBio Genomics Portal. cBioDataPack uses these functions under the hood to download, untar, and load the tarball datasets with caching. As stated in ?cBioDataPack, not all studies currently load as MultiAssayExperiment objects. As of July 2020, about 80% of datasets can be successfully imported into the MultiAssayExperiment data class. Please open an issue if you would like the team to prioritize a study. You may also check getStudies(buildReport = TRUE)$pack_build for the current status.
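The build status mentioned above can be checked before attempting a download. The following is a minimal sketch, not a definitive recipe. It assumes the cBioPortalData package is installed and the portal is reachable, that getStudies takes an API object created with cBioPortal() as its first argument, and that the returned table has a studyId column. These assumptions come from the current cBioPortalData release rather than from this page.

```r
# Sketch only: requires the cBioPortalData package and network access.
study_id <- "acc_tcga"

if (requireNamespace("cBioPortalData", quietly = TRUE)) {
  # Assumption: cBioPortal() creates the API object that getStudies() expects.
  cbio <- cBioPortalData::cBioPortal()
  studies <- cBioPortalData::getStudies(cbio, buildReport = TRUE)

  # Assumption: the study identifier column is named studyId.
  # pack_build reports whether the tarball currently builds as a
  # MultiAssayExperiment.
  print(studies[studies$studyId == study_id, c("studyId", "pack_build")])
}
```

If pack_build is FALSE for the study of interest, the manual download, untar, and load steps documented here are the route to try.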

Usage

downloadStudy(
  cancer_study_id,
  use_cache = TRUE,
  force = FALSE,
  url_location = getOption("cBio_URL", .url_location),
  ask = interactive()
)

untarStudy(cancer_study_file, exdir = tempdir())

loadStudy(
  filepath,
  names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene", "Composite.Element.REF"),
  cleanup = TRUE
)

Arguments

cancer_study_id

character(1) The study identifier from cBioPortal as seen in the dataset links at https://www.cbioportal.org/datasets.

use_cache

logical(1) (default TRUE) create the default cache location and use it to track downloaded data. If the data are found in the cache, they will not be re-downloaded. A path to the data cache location can also be provided.

force

logical(1) (default FALSE) whether to force re-downloading the data from the remote location

url_location

character(1) (default "https://cbioportal-datahub.s3.amazonaws.com") the URL location for downloading packaged data. Can be set using the 'cBio_URL' option (see ?cBioDataPack for more details)
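As a small illustration of the option mechanism described here, the sketch below sets and then restores the 'cBio_URL' option using base R only. The URL is the default quoted above. The package-internal fallback .url_location is not reproduced, so getOption is called without a default value.

```r
# Set the option and keep the previous state so it can be restored.
old_opts <- options(cBio_URL = "https://cbioportal-datahub.s3.amazonaws.com")

# The same lookup that appears in the default of `url_location`,
# minus the package-internal fallback.
current_url <- getOption("cBio_URL")
print(current_url)

# Restore whatever was set before (unsets the option if it was not set).
options(old_opts)
```

Setting the option once per session avoids passing url_location to every call.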

ask

logical(1) Whether to prompt the user before downloading and loading a study MultiAssayExperiment that is not currently building based on previous testing. Set to interactive() by default. In a non-interactive session, data download will be attempted; this is equivalent to ask = FALSE. The argument is also used when a cache directory needs to be created by downloadStudy.

cancer_study_file

character(1) indicates the on-disk location of the downloaded tarball

exdir

character(1) indicates the folder location to put the contents of the tarball (default tempdir(); see also ?untar)

filepath

character(1) indicates the folder location where the contents of the tarball are located (usually the same as exdir)

names.field

character() Possible column names for the column that will be used to label ranges for data such as mutations or copy number (default: c("Hugo_Symbol", "Entrez_Gene_Id", "Gene", "Composite.Element.REF")). Values are cycled through and eliminated when no data are present or duplicates are found. Values in the corresponding column must be unique in each row.
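The selection rule described for names.field can be pictured with a toy example. The helper below, pick_names_field, is hypothetical and is not the package's implementation. It shows one reading of "cycled through and eliminated": candidates are tried in order and skipped when the column is absent, contains missing values, or contains duplicates. The data frame is made up for illustration.

```r
# Hypothetical helper (not part of the package): return the first candidate
# column that exists, has no missing values, and labels every row uniquely.
pick_names_field <- function(df, candidates) {
  for (field in candidates) {
    if (!field %in% names(df)) next        # column absent: eliminate
    values <- df[[field]]
    if (anyNA(values)) next                # missing labels: eliminate
    if (anyDuplicated(values) > 0L) next   # duplicated labels: eliminate
    return(field)
  }
  NA_character_                            # no candidate qualified
}

# Made-up toy table: Hugo_Symbol repeats, Entrez_Gene_Id is unique.
toy <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS"),
  Entrez_Gene_Id = c(101L, 102L, 103L),
  value = c(0.1, 0.2, 0.3)
)

chosen <- pick_names_field(
  toy,
  c("Hugo_Symbol", "Entrez_Gene_Id", "Gene", "Composite.Element.REF")
)
print(chosen)
```

In this toy table the first candidate is rejected because of the repeated symbol, so the second candidate is chosen.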

cleanup

logical(1) whether to delete the untarred contents from the exdir folder (default TRUE)

Details

When attempting to load a dataset using loadStudy, note that the cleanup argument is set to TRUE by default. Change the argument to FALSE if you would like to keep the untarred data in the exdir location. downloadStudy and untarStudy are not affected by this change. The tarball of the downloaded data is cached via BiocFileCache when use_cache is TRUE.
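To keep the extracted files for inspection, the three steps can be run with cleanup = FALSE. This is a sketch under stated assumptions: the cBioPortalData package is installed and provides the functions documented here, network access is available, and the folder name acc_tcga_files is an arbitrary choice made for this example.

```r
# Arbitrary folder name chosen for this example.
keep_dir <- file.path(tempdir(), "acc_tcga_files")

# Sketch only: requires the cBioPortalData package and network access.
if (requireNamespace("cBioPortalData", quietly = TRUE)) {
  library(cBioPortalData)

  # Download (or reuse the cached copy of) the study tarball.
  acc_file <- downloadStudy("acc_tcga")

  # Extract into a dedicated folder instead of the tempdir() root.
  dir.create(keep_dir, showWarnings = FALSE)
  file_dir <- untarStudy(acc_file, exdir = keep_dir)

  # cleanup = FALSE leaves the extracted files in place after loading.
  acc <- loadStudy(file_dir, cleanup = FALSE)

  # The raw study files should still be listed here.
  print(list.files(file_dir))
}
```

Extracting into a dedicated folder makes it easier to remove the files by hand once they are no longer needed.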

Value

  • downloadStudy - The file location of the data tarball

  • untarStudy - The directory location of the contents

  • loadStudy - A MultiAssayExperiment-class object

See Also

cBioDataPack, MultiAssayExperiment

Examples


(acc_file <- downloadStudy("acc_tcga"))

(file_dir <- untarStudy(acc_file, tempdir()))

loadStudy(file_dir)


waldronlab/MultiAssayExperimentData documentation built on Dec. 22, 2024, 12:06 p.m.