knitr::opts_chunk$set(collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, fig.width = 8, fig.height = 4)
install.packages("devtools")
devtools::install_github("AxelitoMartin/gnomeR")
gnomeR requires the following packages: ComplexHeatmap, iClusterPlus, and cluster (installed with gnomeR).
library(gnomeR)
library(knitr)
library(dplyr)
library(dtplyr)
library(tibble)
library(plotly)
gnomeR is an R package that aims to process and analyze genetic data from cBioPortal. The package includes the mutation, copy number alteration (CNA), fusion, and clinical information of all publicly available data from cBioPortal.
as.tbl(mut) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Variant_Classification,Variant_Type,Reference_Allele,Tumor_Seq_Allele2)
as.tbl(cna[1:5,1:5])
as.tbl(fusion) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Fusion)
as.tbl(head(clin.patients))
as.tbl(head(clin.sample))
MAF files are the standard file format for mutation information. Each line represents a single mutation mapped to a sample, a particular gene, and a specific effect. All of these fields are required to process the file properly. The IMPACT platform sequences a targeted set of oncogenic genes, which are available on cBioPortal.
We can create a binary matrix of genetic events from the files described above using the binmat function. If a patient has a mutation in gene X, the entry is marked with a 1; otherwise it is a 0. This function has the following arguments:
This function returns a binary matrix of genetic events with patients as rows and genes as columns, along with a list of patients who were found to have no events (if any).
patients <- as.character(unique(mut$Tumor_Sample_Barcode))[1:1000]
bin.mut <- binmat(patients = patients, maf = mut, mut.type = "SOMATIC", SNP.only = FALSE, include.silent = FALSE, spe.plat = TRUE)
as.tbl(bin.mut)
Similarly including fusions and CNAs:
bin.mut <- binmat(patients = patients, maf = mut, mut.type = "SOMATIC", SNP.only = FALSE, include.silent = FALSE, fusion = fusion, cna = cna, spe.plat = TRUE)
as.tbl(bin.mut)
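To make the shape of this binary matrix concrete, here is a minimal base-R sketch that builds one from a toy MAF-like table. The data, the object names (maf.toy, patients.toy, bin.toy), and the approach are illustrative assumptions; this is not how binmat is implemented, and it ignores mutation type, silent mutations, fusions, and CNAs.

```r
# Toy MAF-like table (hypothetical data, not taken from gnomeR)
maf.toy <- data.frame(
  Tumor_Sample_Barcode = c("P1", "P1", "P2", "P3", "P1"),
  Hugo_Symbol          = c("TP53", "KRAS", "TP53", "EGFR", "TP53"),
  stringsAsFactors     = FALSE
)
patients.toy <- c("P1", "P2", "P3", "P4")

# Keep only mutations belonging to the requested patients
maf.sub <- maf.toy[maf.toy$Tumor_Sample_Barcode %in% patients.toy, ]
genes <- sort(unique(maf.sub$Hugo_Symbol))

# Patients as rows, genes as columns, all entries initialised to 0
bin.toy <- matrix(0L, nrow = length(patients.toy), ncol = length(genes),
                  dimnames = list(patients.toy, genes))

# Mark each (patient, gene) pair that appears in the table with a 1;
# repeated pairs simply set the same cell again
bin.toy[cbind(maf.sub$Tumor_Sample_Barcode, maf.sub$Hugo_Symbol)] <- 1L

# Patients with no recorded event
no.events <- rownames(bin.toy)[rowSums(bin.toy) == 0]

bin.toy
no.events
```

In this toy example, P4 has no mutation in the table, so its row stays all zeros and it is reported separately, mirroring the two outputs described above.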
We include a function to visualize summaries of the mutations in a given cohort:
maf.summary(maf = mut %>% filter(Tumor_Sample_Barcode %in% patients), mut.type = "SOMATIC")
The gen.tab function allows us to test for potential differences in genetic event frequencies using Fisher's exact test for unpaired data and the McNemar exact test for paired data. This function takes the following arguments:
outcome <- as.character(clin.sample$Sample.Type[match(patients, clin.sample$Sample.Identifier)])
gen.dat <- bin.mut
gen.tab(gen.dat = gen.dat, outcome = outcome, filter = 0.05, paired = FALSE, cont = FALSE, rank = TRUE)
Similarly we show here an example with a simulated continuous outcome:
set.seed(1)
outcome <- rnorm(n = nrow(gen.dat))
tab.out <- gen.tab(gen.dat = gen.dat, outcome = outcome, filter = 0.05, paired = FALSE, cont = TRUE, rank = TRUE)
tab.out$fits
tab.out$vPlot
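For the unpaired binary-outcome case, the comparison for a single gene can be sketched in base R with fisher.test. The data below are a toy example, and the outcome labels "Primary" and "Metastasis" are assumed for illustration; the internals of gen.tab may differ from this sketch.

```r
# Toy data (hypothetical): mutation status of one gene in 10 samples
gene <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
# Assumed two-level outcome labels, one per sample
outcome.toy <- c(rep("Metastasis", 4), "Primary",
                 "Metastasis", rep("Primary", 4))

# 2 x 2 contingency table: mutation status by outcome group
tab <- table(gene = gene, outcome = outcome.toy)
tab

# Fisher's exact test for association between mutation and outcome
fit <- fisher.test(tab)
fit$p.value   # two-sided p-value
fit$estimate  # conditional maximum-likelihood odds ratio
```

Repeating this test gene by gene and sorting by p-value gives a ranked table similar in spirit to what the rank argument suggests.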
We further include uni.cox for univariate survival analysis if time-to-event data is available. This function takes as inputs:
A survival formula of the form Surv(time,status)~. Note that delayed entry is allowed, in the form Surv(time1,time2,status)~.
surv.dat <- clin.patients %>%
  filter(X.Patient.Identifier %in% abbreviate(patients, strict = TRUE, minlength = 9)) %>%
  select(X.Patient.Identifier, Overall.Survival..Months., Overall.Survival.Status) %>%
  rename(DMPID = X.Patient.Identifier, time = Overall.Survival..Months., status = Overall.Survival.Status) %>%
  mutate(time = as.numeric(as.character(time)), status = ifelse(status == "LIVING", 0, 1)) %>%
  filter(!is.na(time))
X <- bin.mut[match(surv.dat$DMPID, abbreviate(rownames(bin.mut), strict = TRUE, minlength = 9)), ]
uni.cox(X = X, surv.dat = surv.dat, surv.formula = Surv(time, status) ~ ., filter = 0.05)
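As a rough illustration of what a univariate survival screen involves, the sketch below fits one Cox proportional hazards model per gene with the survival package. It assumes simulated data, and the names X.toy, surv.toy, and fit.one are hypothetical; it is not the implementation of uni.cox, whose filtering and output format may differ.

```r
library(survival)

# Simulated toy data (assumption for illustration only)
set.seed(1)
n <- 100
X.toy <- matrix(rbinom(n * 3, 1, 0.3), nrow = n,
                dimnames = list(NULL, c("GENE_A", "GENE_B", "GENE_C")))
surv.toy <- data.frame(time   = rexp(n, rate = 0.1),
                       status = rbinom(n, 1, 0.7))

# Fit one Cox model for a single gene and extract its summary row
fit.one <- function(g) {
  dat <- cbind(surv.toy, gene = X.toy[, g])
  fit <- coxph(Surv(time, status) ~ gene, data = dat)
  s <- summary(fit)$coefficients
  data.frame(Feature     = g,
             Coefficient = s[1, "coef"],
             HR          = s[1, "exp(coef)"],
             Pvalue      = s[1, "Pr(>|z|)"])
}

# One model per gene, stacked into a single table and ranked by p-value
res <- do.call(rbind, lapply(colnames(X.toy), fit.one))
res[order(res$Pvalue), ]
```

Because the toy outcome is simulated independently of the genes, no association is expected here; the point is only the one-model-per-feature structure.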
OncoPrints are a convenient way to study co-mutation patterns in our cohort, available through the plot_oncoPrint function. It takes as argument a binary matrix of genetic events, such as the one returned by the binmat function. We show here an example with the most common genes.
gen.dat <- bin.mut[1:1000, names(sort(apply(bin.mut, 2, sum), decreasing = TRUE))[1:15]]
plot_oncoPrint(gen.dat)
Similarly we include here an example adding patients' clinical variables:
clin.patients.dat <- clin.patients[match(abbreviate(rownames(gen.dat), strict = TRUE, minlength = 9), clin.patients$X.Patient.Identifier), ] %>%
  rename(DMPID = X.Patient.Identifier, Smoker = Smoking.History) %>%
  select(DMPID, Sex, Smoker) %>%
  filter(!is.na(DMPID)) %>%
  distinct(DMPID, .keep_all = TRUE)
gen.dat <- gen.dat[match(clin.patients.dat$DMPID, abbreviate(rownames(gen.dat), strict = TRUE, minlength = 9)), ]
clin.patients.dat <- clin.patients.dat %>% tibble::column_to_rownames('DMPID')
rownames(gen.dat) <- rownames(clin.patients.dat)
plot_oncoPrint(gen.dat = gen.dat, clin.dat = clin.patients.dat)
FACETs is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy, and clonal heterogeneity, with comprehensive output and integrated visualization. We integrate the output of this tool into our package to visualize the copy number alteration events in our cohort.
The segmentation file is now integrated into cBioPortal, and we include it in our package. The FACETs output can be visualized using the facets.heatmap function, which takes as input:
This function returns a heatmap and the merged segmentation dataset used to create it:
patients.seg <- as.character(unlist(clin.sample %>%
  filter(Sample.Identifier %in% patients, as.numeric(as.character(Tumor.Purity)) > 30) %>%
  select(Sample.Identifier)))
facet <- facets.heatmap(seg = seg, patients = patients.seg[1:100])
facet$p
as.tbl(facet$out.cn)