pull_data_synapse: Obtain clinical & genomic data files for GENIE BPC Project

View source: R/pull_data_synapse.R

pull_data_synapse    R Documentation

Description

Function to access specified versions of clinical and genomic GENIE BPC data from Synapse and read them into the R environment. See the pull_data_synapse vignette for further documentation and examples.

Usage

pull_data_synapse(
  cohort = NULL,
  version = NULL,
  download_location = NULL,
  username = NULL,
  password = NULL,
  pat = NULL
)

Arguments

cohort

Vector or list specifying the cohort(s) of interest. Each value must be one of "NSCLC" (Non-Small Cell Lung Cancer), "CRC" (Colorectal Cancer), "BrCa" (Breast Cancer), "PANC" (Pancreatic Cancer), "Prostate" (Prostate Cancer), or "BLADDER" (Bladder Cancer). Matching is not case sensitive.

version

Vector specifying the version of each cohort. Each value must match one of the release versions available for the corresponding 'cohort' (see 'synapse_version()' for available cohort versions). When multiple cohorts are given, versions are matched to cohorts by position: the first version applies to the first cohort, the second to the second, and so on. 'cohort' and 'version' must therefore be in the same order to ensure the correct data versions are pulled. See examples below for details.

download_location

If 'NULL' (default), data are returned as a list of data frames, with each requested data set as a list item. Otherwise, specify a folder path to have the data downloaded there automatically. When a path is specified, data are not read into the R environment.

username

'Synapse' username

password

'Synapse' password

pat

'Synapse' personal access token
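
As a hedged illustration of the 'cohort', 'version', and 'download_location' arguments described above, the sketch below pairs cohorts with versions by position and prepares a download folder. The variable names, the temporary folder, and the guard on the 'SYNAPSE_PAT' environment variable are assumptions made for this example. The call to 'pull_data_synapse()' only runs when genieBPC is installed and a token is available.

```r
# Cohorts and versions are matched by position: the first version
# belongs to the first cohort, the second to the second, and so on.
cohorts  <- c("NSCLC", "CRC")
versions <- c("v2.0-public", "v2.0-public")
stopifnot(length(cohorts) == length(versions))

# Lay the pairs side by side to confirm the order before pulling
requested <- data.frame(
  cohort = cohorts,
  version = versions,
  stringsAsFactors = FALSE
)
print(requested)

# Illustrative download folder (an assumption made for this example)
out_dir <- file.path(tempdir(), "genie_bpc")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Guarded so the sketch still runs without the package or credentials
can_pull <- requireNamespace("genieBPC", quietly = TRUE) &&
  nzchar(Sys.getenv("SYNAPSE_PAT"))

if (can_pull) {
  # With a path supplied, files are downloaded rather than read into R
  genieBPC::pull_data_synapse(
    cohort = cohorts,
    version = versions,
    download_location = out_dir
  )
}
```

Building the small 'requested' table first makes a mismatched order visible before any data are pulled.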

Value

Returns a nested list of clinical and genomic data corresponding to the specified cohort(s).

Authentication

To access data, users must have a valid 'Synapse' account with permission to access the data set, and they must have accepted any necessary 'Terms of Use'. Users must authenticate themselves in each R session (see README: Data Access and Authentication for details). To set your 'Synapse' credentials during each session, call:

'set_synapse_credentials(username = "your_username", password = "your_password")'

In addition to passing your 'Synapse' username and password, you may choose to set your 'Synapse' Personal Access Token (PAT) by calling: 'set_synapse_credentials(pat = "your_pat")'.

If your credentials are stored as environment variables, you do not need to call 'set_synapse_credentials()' explicitly each session. To store authentication information in your environment variables, add the following to your .Renviron file, then restart your R session (tip: you can use 'usethis::edit_r_environ()' to easily open and edit this file):

  • 'SYNAPSE_USERNAME = <your-username>'

  • 'SYNAPSE_PASSWORD = <your-password>'

  • 'SYNAPSE_PAT = <your-pat>'
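
One way to confirm that the entries above were picked up after restarting R is to check which variables are set, without printing their values. This is a minimal base R sketch; the names 'vars' and 'is_set' are illustrative and not part of genieBPC.

```r
# Variable names as listed above
vars <- c("SYNAPSE_USERNAME", "SYNAPSE_PASSWORD", "SYNAPSE_PAT")

# TRUE where a non-empty value is set; the values themselves are not shown
is_set <- nzchar(Sys.getenv(vars))
names(is_set) <- vars
print(is_set)
```

Reporting only whether each variable is set keeps credentials out of the console and any saved logs.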

Alternatively, you can pass your username and password or your PAT to each individual data pull function if preferred, although it is recommended that you manage your passwords outside of your scripts for security purposes.

Analytic Data Guides

Documentation corresponding to the clinical data files can be found on 'Synapse' in the Analytic Data Guides.

Author(s)

Karissa Whiting, Michael Curry

Examples


# Example 1 ----------------------------------
library(genieBPC)

# Set up 'Synapse' credentials
# (called without arguments, so this assumes credentials are stored
# as environment variables)
set_synapse_credentials()

# Print the most recent available version of each cohort
synapse_version(most_recent = TRUE)

# Pull version 2.0-public for non-small cell lung cancer
# and version 2.0-public for colorectal cancer data
ex1 <- pull_data_synapse(
  cohort = c("NSCLC", "CRC"),
  version = c("v2.0-public", "v2.0-public")
)

# Inspect the names of the returned list
names(ex1)


genieBPC documentation built on Sept. 11, 2024, 8:29 p.m.