knitr::opts_chunk$set(cache = TRUE)

Introduction

This vignette aims to help developers migrate from the now-defunct cgdsr CRAN package. The cgdsr code is shown for comparison only and is not guaranteed to work. If you have questions about the contents, please open an issue at the GitHub repository: https://github.com/waldronlab/cBioPortalData/issues

Loading the package

library(cBioPortalData)

Discovering studies {.tabset .tabset-fade .tabset-pills}

cBioPortalData setup

Here we show the default inputs to the cBioPortal function for clarity.

cbio <- cBioPortal(
    hostname = "www.cbioportal.org",
    protocol = "https",
    api. = "/api/v2/api-docs"
)
getStudies(cbio)

Note that the studyId column is important for further queries.

head(getStudies(cbio)[["studyId"]])
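Since later queries are keyed on studyId, it can help to search the study table for a keyword. The following is a minimal sketch, not part of cBioPortalData: find_studies is a hypothetical helper, and mock_studies is a made-up stand-in for the table returned by getStudies(cbio), assumed here to carry studyId and name columns, so that the snippet runs without network access.

```r
# Hypothetical helper (not part of cBioPortalData): keep the rows whose
# studyId or name matches a pattern, ignoring case.
find_studies <- function(studies, pattern) {
    hit <- grepl(pattern, studies[["studyId"]], ignore.case = TRUE) |
        grepl(pattern, studies[["name"]], ignore.case = TRUE)
    studies[hit, c("studyId", "name"), drop = FALSE]
}

# Made-up stand-in for getStudies(cbio); only the column names are assumed
mock_studies <- data.frame(
    studyId = c("gbm_tcga_pub", "mock_study_2"),
    name = c("Mock glioblastoma study", "Another mock study"),
    stringsAsFactors = FALSE
)

gbm_hits <- find_studies(mock_studies, "gbm")
gbm_hits[["studyId"]]
```

With a live connection, the equivalent call would be find_studies(getStudies(cbio), "gbm").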

cgdsr setup

library(cgdsr)
cgds <- CGDS("http://www.cbioportal.org/")
getCancerStudies.CGDS(cgds)

Obtaining Cases {.tabset .tabset-fade .tabset-pills}

cBioPortalData (Cases)

Notes

sampleLists

For the sample list identifiers, you can use sampleLists and inspect the sampleListId column.

samps <- sampleLists(cbio, "gbm_tcga_pub")
samps[, c("category", "name", "sampleListId")]

samples from sampleLists

It is possible to get case_ids directly when using the samplesInSampleLists function. The function handles multiple sampleList identifiers.

samplesInSampleLists(
    api = cbio,
    sampleListIds = c(
        "gbm_tcga_pub_expr_classical", "gbm_tcga_pub_expr_mesenchymal"
    )
)

getSampleInfo

To get more information about patients, we can query with the getSampleInfo function.

getSampleInfo(api = cbio, studyId = "gbm_tcga_pub", projection = "SUMMARY")

cgdsr (Cases)

Notes

getCaseLists and getClinicalData

We obtain the first case_list_id in the cgds object from above and the corresponding clinical data for that case list (gbm_tcga_pub_all as the case list in this example).

clist1 <-
    getCaseLists.CGDS(cgds, cancerStudy = "gbm_tcga_pub")[1, "case_list_id"]

getClinicalData.CGDS(cgds, clist1)

Obtaining Clinical Data {.tabset .tabset-fade .tabset-pills}

cBioPortalData (Clinical)

All clinical data

Note that a sampleListId is not required when using the fetchAllClinicalDataInStudyUsingPOST internal endpoint. Data for all patients can be obtained using the clinicalData function.

clinicalData(cbio, "gbm_tcga_pub")

By sample data

You can use a different endpoint to obtain data for a single sample. First, obtain a single sampleId with the samplesInSampleLists function.

clist1 <- "gbm_tcga_pub_all"
samplist <- samplesInSampleLists(cbio, clist1)
onesample <- samplist[["gbm_tcga_pub_all"]][1]
onesample

Then we use the API endpoint to retrieve the data. Note that you would run httr::content on the output to extract the data.

cbio$getAllClinicalDataOfSampleInStudyUsingGET(
    sampleId = onesample, studyId = "gbm_tcga_pub"
)
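The extraction step mentioned above can be sketched as follows. For a JSON response, httr::content typically returns a list of records. The hypothetical records_to_df helper below binds such a list into a data frame. The field names in mock_parsed (clinicalAttributeId, sampleId, value) are assumptions about the response shape and the values are placeholders; the live call is shown only in comments so that the snippet runs offline.

```r
# With a live connection one would start from (not run here):
#   resp <- cbio$getAllClinicalDataOfSampleInStudyUsingGET(
#       sampleId = onesample, studyId = "gbm_tcga_pub"
#   )
#   parsed <- httr::content(resp)

# Hypothetical helper: bind a list of flat, same-shaped records by row
records_to_df <- function(parsed) {
    rows <- lapply(parsed, function(rec) {
        as.data.frame(rec, stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
}

# Placeholder records standing in for the parsed response (assumed fields)
mock_parsed <- list(
    list(clinicalAttributeId = "ATTR_A", sampleId = "SAMPLE-1", value = "x"),
    list(clinicalAttributeId = "ATTR_B", sampleId = "SAMPLE-1", value = "y")
)

sample_clin <- records_to_df(mock_parsed)
sample_clin
```

The helper assumes that every record is flat and has the same fields; nested records would need additional handling.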

cgdsr (Clinical)

Notes

getClinicalData

We query clinical data for the gbm_tcga_pub_expr_classical case list identifier which is part of the gbm_tcga_pub study.

getClinicalData.CGDS(x = cgds,
    caseList = "gbm_tcga_pub_expr_classical"
)

Clinical Data Summary

cgdsr returns clinical data for a case list subset (54 cases in gbm_tcga_pub_expr_classical), whereas the cBioPortalData clinicalData function returns clinical data for all 206 samples in gbm_tcga_pub.
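To mimic the cgdsr behavior of returning clinical data for a single case list, one option is to filter the full clinical table by the sample identifiers in that list. This is a minimal sketch under stated assumptions: subset_clinical is a hypothetical helper, the clinical table is assumed to have a sampleId column, and the mock objects below are placeholders standing in for the outputs of clinicalData and samplesInSampleLists.

```r
# Hypothetical helper: restrict a clinical table to the samples of one list
subset_clinical <- function(clin, sample_ids) {
    clin[clin[["sampleId"]] %in% sample_ids, , drop = FALSE]
}

# Placeholder stand-in for clinicalData(cbio, "gbm_tcga_pub")
mock_clin <- data.frame(
    sampleId = c("SAMPLE-1", "SAMPLE-2", "SAMPLE-3"),
    patientId = c("PATIENT-1", "PATIENT-2", "PATIENT-3"),
    stringsAsFactors = FALSE
)

# Placeholder stand-in for the unlisted output of samplesInSampleLists
mock_list <- c("SAMPLE-1", "SAMPLE-3")

classical_clin <- subset_clinical(mock_clin, mock_list)
classical_clin
```

With a live connection, the inputs would presumably be clinicalData(cbio, "gbm_tcga_pub") and unlist(samplesInSampleLists(cbio, "gbm_tcga_pub_expr_classical")).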

You may be interested in other clinical data endpoints. For a list, use the searchOps function.

searchOps(cbio, "clinical")

Molecular or Genetic Profiles {.tabset .tabset-fade .tabset-pills}

cBioPortalData (molecularProfiles)

molecularProfiles(api = cbio, studyId = "gbm_tcga_pub")

Note that we want to pull the molecularProfileId column to use in other queries.

cgdsr (getGeneticProfiles)

getGeneticProfiles.CGDS(cgds, cancerStudy = "gbm_tcga_pub")

Genomic Profile Data for a set of genes {.tabset .tabset-fade .tabset-pills}

cBioPortalData (Identify samples and genes)

Currently, if you only have Hugo symbols, some conversion is needed before you can use the molecularData function directly. First, convert the symbols to Entrez gene IDs, and then obtain all the samples in the sample list of interest.

Convert hugoGeneSymbol to entrezGeneId

genetab <- queryGeneTable(cbio,
    by = "hugoGeneSymbol",
    genes = c("NF1", "TP53", "ABL1")
)
genetab
entrez <- genetab[["entrezGeneId"]]
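Before passing the identifiers on, it may be worth checking that every requested symbol was actually mapped. The sketch below uses a hypothetical map_symbols helper and a mock table that stands in for the queryGeneTable output, assumed to have hugoGeneSymbol and entrezGeneId columns. ABL1 is deliberately left out of the mock table to show how an unmapped symbol surfaces.

```r
# Hypothetical helper: Entrez IDs in the order requested, NA when unmapped
map_symbols <- function(genetab, requested) {
    idx <- match(requested, genetab[["hugoGeneSymbol"]])
    stats::setNames(genetab[["entrezGeneId"]][idx], requested)
}

# Mock stand-in for the queryGeneTable() result; ABL1 omitted on purpose
mock_genetab <- data.frame(
    hugoGeneSymbol = c("TP53", "NF1"),
    entrezGeneId = c(7157L, 4763L),
    stringsAsFactors = FALSE
)

ids <- map_symbols(mock_genetab, c("NF1", "TP53", "ABL1"))
unmapped <- names(ids)[is.na(ids)]
unmapped
```

Any symbols reported in unmapped would need to be corrected or dropped before querying molecular data.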

Obtain all samples in study

allsamps <- samplesInSampleLists(cbio, "gbm_tcga_pub_all")

In the next section, we will show how to use the genes and sample identifiers to obtain the molecular profile data.

cgdsr (Profile Data)

The getProfileData function allows for straightforward retrieval of the molecular profile data with only a case list and genetic profile identifiers.

getProfileData.CGDS(x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_mrna",
    caseList = "gbm_tcga_pub_all"
)

Molecular Data with cBioPortalData

cBioPortalData provides a number of options for retrieving molecular profile data depending on the use case. Note that molecularData is mostly used internally and that the cBioPortalData function is the user-friendly method for downloading such data.

molecularData

We use the translated entrez identifiers from above.

molecularData(cbio, "gbm_tcga_pub_mrna",
    entrezGeneIds = entrez, sampleIds = unlist(allsamps))

getDataByGenes

The getDataByGenes function automatically figures out all the sample identifiers in the study and it allows Hugo and Entrez identifiers, as well as genePanelId inputs.

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)

cBioPortalData: the main end-user function

It is important to note that end users who wish to obtain the data as easily as possible should use the main cBioPortalData function:

gbm_pub <- cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)

assay(gbm_pub[["gbm_tcga_pub_mrna"]])[, 1:4]

Mutation Data {.tabset .tabset-fade .tabset-pills}

cBioPortalData (mutationData)

Similar to molecularData, mutation data can be obtained with the mutationData function or the getDataByGenes function.

mutationData(
    api = cbio,
    molecularProfileIds = "gbm_tcga_pub_mutations",
    entrezGeneIds = entrez,
    sampleIds = unlist(allsamps)
)
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mutations"
)

cgdsr (getMutationData)

getMutationData.CGDS(
    x = cgds,
    caseList = "gbm_tcga_pub_all",
    geneticProfile = "gbm_tcga_pub_mutations",
    genes = c("NF1", "TP53", "ABL1")
)

Copy Number Alteration (CNA) {.tabset .tabset-fade .tabset-pills}

cBioPortalData (CNA)

Copy Number Alteration data can be obtained with the getDataByGenes function or with the main cBioPortalData function.

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)
cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)

cgdsr (CNA)

getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_cna_rae",
    caseList = "gbm_tcga_pub_cna"
)

Methylation Data {.tabset .tabset-fade .tabset-pills}

cBioPortalData (Methylation)

Similar to Copy Number Alteration data, methylation data can be obtained with the getDataByGenes function or with the cBioPortalData function.

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)
cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)

cgdsr (Methylation)

getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_methylation_hm27",
    caseList = "gbm_tcga_pub_methylation_hm27"
)

sessionInfo

sessionInfo()


waldronlab/MultiAssayExperimentData documentation built on May 4, 2024, 2:29 p.m.