Tutorial: pull_data_synapse"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# exit if user doesn't have synapser, a log in, or access to data.
if (genieBPC:::.is_connected_to_genie() == FALSE){
  knitr::knit_exit()
}
library(genieBPC)
library(dplyr)
library(tibble)
library(magrittr)
library(gt)

set_synapse_credentials()

Introduction

The pull_data_synapse() function accesses the specified version of the clinical and genomic GENIE BPC data from Synapse and reads it into the R environment.  

This vignette will walk a user through the pull_data_synapse() function.

Setup

Access to the GENIE BPC data release folders on 'Synapse' is required in order to use this function. To obtain access:

For public data releases:

  1. Register for a 'Synapse' account. Accept the Synapse account terms of use.

  2. Navigate to the data release and request accept terms of use (e.g., for the NSCLC 2.0-public data release, navigate to the 'Synapse' page for the data release). Towards the top of the page, there is information including the 'Synapse' ID, DOI, Item count, and Access. Next to Access is a link that reads Request Access.

  3. Select Request Access, review the terms of data use and select Accept

Note that permissions for Synapse and permissions for each data release are distinct. Both permissions must be accepted to successfully access the data.

For consortium data releases (restricted to GENIE consortium members & BPC pharmaceutical partners):

  1. Register for a 'Synapse' account

  2. Use this link to access the GENIE BPC team list and request to join the team. Please include your full name and affiliation in the message before sending out the request.

  3. Once the request is accepted, you may access the data in the GENIE Biopharma Collaborative projects.

Note: Please allow up to a week to review and grant access.

Authenticate yourself

  1. Whether you are using public or consortium data, you will need to authenticate yourself at the beginning of each R session in which you use {genieBPC} to pull data. To set your Synapse credentials during each session, call:

set_synapse_credentials(username = "your_username", password = "your_password")

SYNAPSE_USERNAME = <your-username> SYNAPSE_PASSWORD = <your-password>

Alternatively, you can pass your username and password to each individual data pull function if preferred, although it is recommended that you manage your passwords outside of your scripts for security purposes.

Usage

Let's start by reviewing the pull_data_synapse() arguments.

tibble::tribble(
  ~Argument,       ~Description,
  "`cohort`",       "Vector or list specifying the cohort(s) of interest. Must be one of 'NSCLC' (Non-Small Cell Lung Cancer), 'CRC' (Colorectal Cancer), 'BrCa' (Breast Cancer), 'BLADDER' (Bladder Cancer), 'PANC' (Pancreatic Cancer), or 'Prostate' (Prostate Cancer)",
  "`version`",        "Vector specifying the version of the data. Must be one of the following: 'v1.1-consortium', 'v1.2-consortium', 'v2.1-consortium', 'v2.0-public'. When entering multiple cohorts, the order of the version numbers corresponds to the order that the cohorts are specified; the cohort and version number must be in the same order in order to pull the correct data.",
  "`download_location`",        "If `NULL` (default), data will be returned as a list of dataframes with requested data as list items. Otherwise, specify a folder path to have data automatically downloaded there. When a path is specified, data are not read into the R environment.",
  "`username`",        "Synapse username",
  "`password`",        "Synapse password"
) %>%
  gt::gt() %>%
  gt::fmt_markdown(columns = c(Argument)) %>%
  gt::tab_options(table.font.size = 'small',
                  data_row.padding = gt::px(1),
                  summary_row.padding = gt::px(1),
                  grand_summary_row.padding = gt::px(1),
                  footnotes.padding = gt::px(1),
                  source_notes.padding = gt::px(1),
                  row_group.padding = gt::px(1))

 

Examples

Example 1

Pull version 2.0-public of the NSCLC data from Synapse and store in the local environment.

nsclc_2_0 = pull_data_synapse("NSCLC", version = "v2.0-public")

The resulting nsclc_data object is a list of elements, such that each element represents a dataset:

nsclc_2_0 = pull_data_synapse("NSCLC", version = "v2.0-public")
ls(nsclc_2_0$NSCLC_v2.0) %>% 
  as.data.frame() %>% 
  mutate(clinical_vs_genomic = case_when(. %in% c("cna", "fusions",
                                                  "mutations_extended") ~ "Genomic Data",
                                         TRUE ~ "Clinical Data")) %>% 
  gt::gt() %>% 
  gt::cols_label(.data = .,
                 `.` = md("**Datasets Returned**"),
             clinical_vs_genomic = md("**Clinical/Genomic Data**"))

Example 2

Pull version 2.1-consortium of the NSCLC data and version 1.1-consortium of the CRC data.

pull_data_synapse(c("NSCLC", "CRC"), 
                  version = c("v2.1-consortium","v1.1-consortium"))

Example 3

Pull version 1.2-consortium of the NSCLC data and version 1.1-consortium of the CRC data.

pull_data_synapse(c("NSCLC", "CRC"),
                  version = c("v1.2-consortium", "v1.1-consortium"))

Future Work



Try the genieBPC package in your browser

Any scripts or data that you put into this service are public.

genieBPC documentation built on March 7, 2023, 6:46 p.m.