knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache = FALSE)

knitr::opts_chunk$set(fig.width = 6, fig.height = 6, width=8)
library(TCGAbiolinks)
library(MMRFBiolinks)
library(SummarizedExperiment)
library(dplyr)
library(DT)

Overview of MMRFBiolinks functions

MMRFBiolinks Vs TCGABiolinks
MMRFBiolinks relies on the usage of TCGABiolinks functions among which TCGAbiolinks::GDCquery function for searching MMRF-COMMPASS data available at GDC Data Portal. [TCGAbiolinks package](https://bioconductor.org/packages/TCGAbiolinks/) which should be consulted for more detailed information. Left side of the following Table shows MMRFBiolinks functions while right side shows the incorporated TCGABiolink functions, if any. If the right entry is empty, it means that the MMRFBiolinks function on the right entry is not provided byTCGABiolinks (i.e. it is a new function, not covered by TCGABiolinks package). In MMRFBiolinks, the function name allows to indentify the data source: the functions whose name contains 'GDC' deal with NCI-GDC dataset while those whose name contains 'RG' deal with MMRF Research Gateway (MMRF-RG) dataset (e.g MMRFGDC\_GetIdentifierByTherapy deal with NCI-GDC data, MMRFRG\_SurvivalKM deals with MMRF-RG data). | MMRFBiolinks | TCGABiolinks | |--------------------------------------|------------------------| | MMRFGDC\_QuerySummary | | | MMRFGDC\_ProjectSummary | | | MMRFGDC\_QueryClinic | GDCquery\_Clinic | | MMRFGDC\_prepare | GDC\_prepare | | MMRFGDC\_GetTherapyByIdentifier | | | MMRFRG\_GetBor | | | MMRFGDC\_QuerySamples | | | MMRFRG\_GetBorPlot | | | MMRFRG\_TimeBorPlot | | | MMRFRG\_TreatBorDurationPlot | | | MMRFRG\_SurvivalKM | | | MMRFRG\_VariantCountPlot | | | MMRFRG\_GetIDSamplebyVariant | |
Different sources for MMRF-COMMPASS data
MMRF-Compass data can be retrieved from two different sources:

- **GDC Data Portal**: MMRF-Commpass information about data available at GDC Data Portal can be retrieved through *MMRFGDC_QuerySummary* function. MMRF-Commpass data can be queried and downloaded using rispectively *TCGABiolinks::GDCquery* and *TCGABiolinks::GDCdownload* belonging to [TCGAbiolinks package](https://bioconductor.org/packages/TCGAbiolinks/).

- **MMRF-CoMMpass Researcher Gateway**: MMRF-CoMMpass data can be directly downloaded from MMRF-Commpass Researcher Gateway that has more information compared to GDC Data Portal data (e.g Best overall response). So the previous one is only a subset of this last. Once logged, the user can download data from MMRF-CoMMpass Researcher Gateway and import them as a dataframe into R environment for the further analysis.

Getting information and searching MMRF-COMMPASS data from GDC Data Portal

MMRFqueryGDC_Summary

Data Search

MMRFqueryGDC\_Summary allows to get information (including the number of cases) about query obteined from GDCquery function belonging to TCGABiolinks package.

query.mm <- GDCquery(project = "MMRF-COMMPASS", 
                     data.category = "Transcriptome Profiling",
                     data.type = "Gene Expression Quantification",
                     workflow.type = "HTSeq - FPKM")
# Download 
GDCdownload(query.mm)
summary<-MMRFGDC_QuerySummary(query.mm)

# Only first 100 to make faster
datatable(summary, rownames = TRUE)

TCGABiolinks::GDCquery arguments for MMRF-CoMMpass Project

The useful arguments for searching MMRF-CoMMpass Project data are:

Arguments | Description -----|----- Data.category| A valid data category in the list below Data.type| A data type to filter the files to download Workflow.type| GDC workflow type Access| Filter by access type. Possible values: controlled experimental_strategy| Filter to experimental strategy barcode| A list of barcodes to filter the files to download sample.type| A sample type to filter the files to download

The arguments options for filtering MMRF-COMMPASS data are listed below:

| Data.category | Data.type | Workflow Type | Access | experimental_strategy
|----------------------------- | ----------------------------------- | ------------------------------- | -------------------- |------------------------ | | Transcriptome Profiling | Gene Expression Quantification | HTSeq - Counts | Open / Controlled | RNA-Seq |
| | | HTSeq - FPKM-UQ | Open / Controlled | RNA-Seq |
| | | HTSeq - FPKM | Open / Controlled | RNA-Seq |
| | | STAR - Counts | Open / Controlled | RNA-Seq | | | Splice Junction Quantification | STAR - Counts | Open / Controlled | RNA-Seq |
| Simple Nucleotide Variation | Raw Simple Somatic Mutation | MuSE | Controlled | WXS | | | | SomaticSniper | Controlled | WXS | | | | VarScan2 | Controlled | WXS | | | | Pindel | Controlled | WXS | | | | MuTect2 | Controlled | WXS | | | Annotated Somatic Mutation | MuSE Annotation | Controlled | WXS | | | | VarScan2 Annotation | Controlled | WXS | | | | Pindel Annotation | Controlled | WXS | | | | MuTect2 Annotation | Controlled | WXS | | | | SomaticSniper Annotation | Controlled | WXS | | Sequencing Reads | Aligned Reads | BWA with Mark Duplicates and BQSR | Controlled | WXS / RNA-Seq / WGS | | | | STAR 2-Pass Genome | Controlled | WXS / RNA-Seq / WGS | | | | STAR 2-Pass Transcriptome | Controlled | WXS / RNA-Seq / WGS |

Below we provide further details about Sample.type an Barcode arguments:

Sample.type argument

The options for the sample.type field in MMRF-COMPASS Project are:

sample_type.code|sample_type.def -----|----- TRBM|Recurrent Blood Derived Cancer - Bone Marrow TBM|Primary Blood Derived Cancer - Bone Marrow NB|Blood Derived Normal TB|Primary Blood Derived Cancer - Peripheral Blood TRB|Recurrent Blood Derived Cancer - Peripheral Blood

Example:

#library(DT)
query.mm<-GDCquery(project = "MMRF-COMMPASS",
                   data.category = "Transcriptome Profiling",
                   data.type = "Gene Expression Quantification",
                   workflow.type="HTSeq - FPKM",
                   sample.type="Primary Blood Derived Cancer - Peripheral Blood")
getResults(query.mm, cols = c("cases.submitter_id","sample_type","cases")) %>% DT::datatable(options = list(scrollX = TRUE, keys = TRUE))

Barcode argument

Example:

library(DT)
query.mm<-GDCquery(project = "MMRF-COMMPASS",
                   data.category = "Transcriptome Profiling",
                   data.type = "Gene Expression Quantification",
                   workflow.type="HTSeq - FPKM",
                   barcode = c("MMRF_2473","MMRF_2111",
                                "MMRF_2362","MMRF_1824",
                                "MMRF_1458","MRF_1361",
                                "MMRF_2203","MMRF_2762",
                                "MMRF_2680","MMRF_1797"))
getResults(query.mm, cols = c("cases.submitter_id","sample_type","cases")) %>% datatable(options = list(scrollX = TRUE, keys = TRUE))

Getting MMRF-CoMMpass data from MMRF Research Gateway

Once logged in to the MMRF Research Gateway Web Portal, you can download dataset you are interested in and import it as a dataframe into R environment for the further analysis. Note:actually, files containing the data used by MMRFBiolinks functions are:

File name | Description -----|----- MMRF_CoMMpass_IA14_STAND_ALONE_TRTRESP.csv|containing the data about the response to treatment MMRF_CoMMpass_IA14a_All_Canonical_Variants.txt|containing the data about variants MMRF_CoMMpass_IA14_PER_PATIENT.csv|containing the data about (eg. age, sex, date of death, date of the last follow up)



marziasettino/MMRFBiolinks documentation built on Jan. 24, 2023, 4:49 a.m.