
Document Status: Working
Estimated Reading Time: 8 min

Special acknowledgments

Utils demonstrated in this vignette benefited greatly from code originally written by hhunterzinck.

Important note

The requirements for cBioPortal change over time, as with any software or database. The package is updated to keep up with them on a yearly submission cycle, so there may be times when the workflow is out of date with this external system.

Intro

This vignette describes how to package Synapse processed data as a cBioPortal study dataset. A cBioPortal study contains one or more data types (see the cBioPortal docs). The current API covers only the subset of data types relevant to the NF workflow, not all of them. The design is inspired by the R package usethis and should feel similar to work with: data types are added to the study package interactively.

Some checks are performed depending on the data type, but final validation with the official cBioPortal validation tools/scripts should still be run.

Breaking changes are possible as the API is still in development.

Set up

First load the nfportalutils package and log in. The recommended way to use syn_login is to call it without passing credentials directly. Instead, store your token in the SYNAPSE_AUTH_TOKEN environment variable so that it is available to the R session.

library(nfportalutils)
syn_login()
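
Because syn_login() is called without arguments here, it can help to confirm that the token is visible to the R session before logging in. The following is a minimal base R sketch; the helper name has_synapse_token is hypothetical and not part of nfportalutils.

```r
# Hypothetical helper (not part of nfportalutils): report whether the
# SYNAPSE_AUTH_TOKEN environment variable is set to a non-empty value.
has_synapse_token <- function() {
  nzchar(Sys.getenv("SYNAPSE_AUTH_TOKEN", unset = ""))
}

if (!has_synapse_token()) {
  message("SYNAPSE_AUTH_TOKEN is not set; add it to ~/.Renviron and restart R.")
}
```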

Create a new study dataset

First create the study dataset "package" in which the data will be assembled. Each study dataset combines multiple data types, such as clinical, gene expression, and gene variants. The study meta file can be edited after it has been created. This step also sets the working directory to the new study directory.

cbp_new_study(cancer_study_identifier = "npst_nfosi_ntap_2022",
              name = "Plexiform Neurofibroma and Neurofibroma (Pratilas 2022)",
              type_of_cancer = "nfib", # required -- see https://oncotree.mskcc.org/
              citation = "TBD")
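
For orientation, a cBioPortal study-level meta file is a plain-text list of `key: value` lines. The sketch below writes one by hand in base R. It is not how cbp_new_study is implemented, and whether that function emits exactly these fields is an assumption; the description text and the temporary directory are placeholders chosen for this example.

```r
# Illustrative only: write a minimal meta_study.txt by hand.
# Field names follow the cBioPortal file format docs; the description
# text and the temporary directory are assumptions for this sketch.
study_dir <- file.path(tempdir(), "npst_nfosi_ntap_2022")
dir.create(study_dir, showWarnings = FALSE, recursive = TRUE)

meta <- c(
  type_of_cancer = "nfib",
  cancer_study_identifier = "npst_nfosi_ntap_2022",
  name = "Plexiform Neurofibroma and Neurofibroma (Pratilas 2022)",
  description = "Placeholder description",
  citation = "TBD"
)

# One "key: value" line per field.
meta_file <- file.path(study_dir, "meta_study.txt")
writeLines(paste0(names(meta), ": ", meta), meta_file)
readLines(meta_file)
```

Editing the study meta afterwards amounts to changing the values in this file.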

Add data types to study

The cbp_add* functions are the easiest way to add data types, and they can be called in any order. Each function downloads the data files and creates the meta for them.

Add mutations data

maf_data <- "syn36553188"

cbp_add_maf(maf_data)

Add copy number alterations (CNA) data

cna_data <- "syn********"

cbp_add_cna(cna_data)

Add expression data

mrna_data <- "syn********"
mrna_data_raw <- "syn********"

cbp_add_expression(mrna_data,
                   expression_data_raw = mrna_data_raw)

Add clinical data

clinical_data <- "select * from syn43278088" # query when the table already contains just the releasable patients
ref_map <- "https://raw.githubusercontent.com/nf-osi/nf-metadata-dictionary/main/mappings/cBioPortal/cBioPortal.yaml"

cbp_add_clinical(clinical_data, ref_map)
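
To illustrate what the mapping step is for, the sketch below renames source columns to cBioPortal attribute IDs and builds the header that cBioPortal clinical files expect: four `#`-prefixed rows (display name, description, datatype, priority) followed by the attribute row. The column names, the mapping table, and the sample rows are all made up for this example; the real mapping comes from the YAML file referenced by ref_map, and this is not the implementation of cbp_add_clinical.

```r
# Hypothetical mapping from source column names to cBioPortal attribute IDs;
# the real mapping lives in the YAML file referenced by ref_map.
mapping <- data.frame(
  source = c("individualID", "sex"),
  attribute = c("PATIENT_ID", "SEX"),
  display_name = c("Patient ID", "Sex"),
  description = c("Patient identifier", "Sex of the patient"),
  datatype = c("STRING", "STRING"),
  priority = c("1", "1"),
  stringsAsFactors = FALSE
)

# Made-up input rows standing in for the Synapse table query result.
clinical <- data.frame(
  individualID = c("P1", "P2"),
  sex = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Select the mapped columns and rename them to attribute IDs.
out <- clinical[, mapping$source, drop = FALSE]
names(out) <- mapping$attribute

# Four '#'-prefixed metadata rows, then the attribute row, then the data.
tab <- function(x) paste(x, collapse = "\t")
header <- c(
  paste0("#", tab(mapping$display_name)),
  paste0("#", tab(mapping$description)),
  paste0("#", tab(mapping$datatype)),
  paste0("#", tab(mapping$priority)),
  tab(mapping$attribute)
)
body <- unname(apply(out, 1, tab))
clinical_lines <- c(header, body)
writeLines(clinical_lines)
```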

Validation

Validation has to be done with a cBioPortal instance. Each portal may have specific configurations (such as genomic reference) to validate against.

As an example of simple offline validation, assume you are in ~/datahub/public and a study folder called npst_nfosi_ntap_2022 has been placed there. Mount the current directory into the container and run validation like this:

STUDY=npst_nfosi_ntap_2022
# -s: study directory inside the container, -n: no portal checks (offline), -v: verbose
sudo docker run --rm -v "$(pwd)":/datahub cbioportal/cbioportal:6.0.25 validateData.py -s "/datahub/$STUDY" -n -v

See the general docs for dataset validation for more examples.
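
Before running the official validator, a quick local sanity check can catch missing files early. The sketch below is a hypothetical helper (check_data_files is not part of nfportalutils or cBioPortal) that assumes meta files are named meta_*.txt and reference their data file through a data_filename entry. It does not replace validateData.py.

```r
# Hypothetical pre-check (not a replacement for validateData.py):
# for every meta_*.txt in a study directory, read its data_filename
# entry, if any, and report whether the referenced file exists.
check_data_files <- function(study_dir) {
  meta_files <- list.files(study_dir, pattern = "^meta_.*\\.txt$",
                           full.names = TRUE)
  result <- lapply(meta_files, function(f) {
    lines <- readLines(f, warn = FALSE)
    hit <- grep("^data_filename:", lines, value = TRUE)
    if (length(hit) == 0) return(NULL)  # e.g. meta_study.txt has no data file
    data_file <- trimws(sub("^data_filename:", "", hit[1]))
    data.frame(
      meta = basename(f),
      data_file = data_file,
      exists = file.exists(file.path(study_dir, data_file)),
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when no meta file references a data file.
  do.call(rbind, result)
}

# Usage, from the directory that contains the study folder:
# check_data_files("npst_nfosi_ntap_2022")
```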



nf-osi/nfportalutils documentation built on June 10, 2025, 5:08 a.m.