knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE, warning = FALSE,
  fig.width = 8,
  fig.height = 4
)

Installation

install.packages("devtools")
devtools::install_github("AxelitoMartin/gnomeR")

Requirements

gnomeR requires the following packages, which are installed along with it: ComplexHeatmap, iClusterPlus and cluster.

library(gnomeR)
library(knitr)
library(dplyr)
library(dtplyr)
library(tibble)
library(plotly)

Introduction {.tabset .tabset-fade .tabset-pills}

gnomeR is an R package that aims to process and analyze genetic data from cBioPortal. The package includes the mutation, copy number alteration (CNA), fusion and clinical information of all publicly available data from cBioPortal.

Mutations

as.tbl(mut) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Variant_Classification,Variant_Type,Reference_Allele,Tumor_Seq_Allele2)

CNA

as.tbl(cna[1:5,1:5])

Fusions

as.tbl(fusion) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Fusion)

Clinical

Patient information

as.tbl(head(clin.patients))

Sample information

as.tbl(head(clin.sample))

MAF processing

MAF files are the standard file format for mutation information. Each line represents a single mutation mapped to a sample, a particular gene and a specific effect. All of these fields are required to process the file properly. The IMPACT platform sequences a targeted set of oncogenic genes, which are available on cBioPortal.
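As a rough illustration of the "sample, gene, effect" requirement, the sketch below builds a toy MAF-like table and checks it for missing fields. The column names are the ones selected in the Mutations example above; the rows are invented, `missing_maf_columns` is a hypothetical helper, and which columns gnomeR itself validates is an assumption here.

```r
# Toy MAF-like table; the rows are made up for illustration only.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Nonsense_Mutation"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of gnomeR): names of required columns
# that are absent from the table.
missing_maf_columns <- function(maf,
                                required = c("Tumor_Sample_Barcode",
                                             "Hugo_Symbol",
                                             "Variant_Classification")) {
  setdiff(required, colnames(maf))
}

# A complete table reports nothing missing.
missing_cols <- missing_maf_columns(toy_maf)

# Dropping the sample column is detected.
incomplete <- missing_maf_columns(toy_maf[, c("Hugo_Symbol",
                                              "Variant_Classification")])
```

Running a check like this before further processing makes a malformed file fail early with a clear list of what is absent.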

Creating a binary matrix

We can create a binary matrix of genetic events from the files described above. If a patient has a mutation in gene X, the entry is marked with a 1; otherwise it is a 0. This function has the following arguments:

[...] of patients that were not sequenced on that platform

This function returns a binary matrix of genetic events with patients as rows and genes as columns, along with a list of patients that were not found to have any events (if any).

patients <- as.character(unique(mut$Tumor_Sample_Barcode))[1:1000]
bin.mut <- binmat(patients = patients, maf = mut, mut.type = "SOMATIC",
                  SNP.only = FALSE, include.silent = FALSE, spe.plat = TRUE)
as.tbl(bin.mut)

Similarly including fusions and CNAs:

bin.mut <- binmat(patients = patients, maf = mut, mut.type = "SOMATIC",
                  SNP.only = FALSE, include.silent = FALSE,
                  fusion = fusion, cna = cna, spe.plat = TRUE)
as.tbl(bin.mut)
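To make the 0/1 encoding described above concrete, here is a minimal base-R sketch on invented data. It is not how binmat is implemented (binmat also handles mutation types, fusions, CNAs and platform-specific genes); the objects `toy_maf`, `toy_patients` and `toy_bin` are hypothetical.

```r
# Toy mutation calls: S2 carries two TP53 mutations, S3 carries none.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53", "TP53"),
  stringsAsFactors = FALSE
)
toy_patients <- c("S1", "S2", "S3")

# Count mutations per sample/gene pair. Fixing the factor levels keeps
# S3 in the table as an all-zero row.
counts <- table(
  factor(toy_maf$Tumor_Sample_Barcode, levels = toy_patients),
  toy_maf$Hugo_Symbol
)

# Any positive count becomes 1, so repeated hits in a gene are not
# counted twice.
toy_bin <- (unclass(counts) > 0) * 1L

# Patients without any recorded event.
no_events <- rownames(toy_bin)[rowSums(toy_bin) == 0]
```

The result has one row per patient and one column per gene, mirroring the orientation of the matrix returned above.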

Visualizing genetics

We include a function to visualize summaries of the mutations in a given cohort:

maf.summary(maf = mut %>% filter(Tumor_Sample_Barcode %in% patients),
            mut.type = "SOMATIC")

Correlating genetics with outcome

Binary outcome

The gen.tab function allows us to test for potential differences in genetic event frequencies using Fisher's exact test for unpaired data and the McNemar exact test for paired data. This function takes the following arguments:

outcome <- as.character(clin.sample$Sample.Type[match(patients,clin.sample$Sample.Identifier)])
gen.dat <- bin.mut
gen.tab(gen.dat = gen.dat,
        outcome = outcome,
        filter = 0.05, paired = FALSE, cont = FALSE, rank = TRUE)
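For readers who want to see the unpaired test in isolation, the sketch below applies base R's `fisher.test` gene by gene to a simulated binary matrix. The data, the outcome labels and the helper `fisher_by_gene` are all made up for illustration, and the output layout is not meant to match what gen.tab returns.

```r
set.seed(1)
n <- 40

# Simulated two-level outcome (labels are placeholders).
toy_outcome <- rep(c("Primary", "Metastasis"), each = n / 2)

# GENE_A is simulated to be more frequent in one group; GENE_B is not.
toy_bin <- cbind(
  GENE_A = rbinom(n, 1, ifelse(toy_outcome == "Metastasis", 0.7, 0.2)),
  GENE_B = rbinom(n, 1, 0.3)
)

# Hypothetical helper: one Fisher's exact test per gene, sorted by p-value.
fisher_by_gene <- function(bin, outcome) {
  res <- lapply(colnames(bin), function(g) {
    # Fixed factor levels guarantee a 2 x 2 table.
    tab <- table(factor(bin[, g], levels = c(0, 1)), outcome)
    ft <- fisher.test(tab)
    data.frame(gene = g,
               odds_ratio = unname(ft$estimate),
               p_value = ft$p.value)
  })
  out <- do.call(rbind, res)
  out[order(out$p_value), ]
}

toy_fisher <- fisher_by_gene(toy_bin, toy_outcome)
```

With many genes, the resulting p-values would normally be adjusted for multiple testing, for example with `p.adjust`.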

Continuous outcome

Similarly we show here an example with a simulated continuous outcome:

set.seed(1)
outcome <- rnorm(n = nrow(gen.dat))
tab.out <- gen.tab(gen.dat = gen.dat,
                   outcome = outcome,
                   filter = 0.05, paired = FALSE, cont = TRUE, rank = TRUE)
tab.out$fits
tab.out$vPlot
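The model gen.tab fits for a continuous outcome is not spelled out here. As an assumed stand-in, the sketch below regresses a simulated outcome on each gene separately with `lm`; the helper `lm_by_gene` and all data are hypothetical, and the approach may differ from gnomeR's.

```r
set.seed(1)
n <- 50

# Simulated binary matrix and a continuous outcome shifted by GENE_A.
toy_bin <- cbind(GENE_A = rbinom(n, 1, 0.4), GENE_B = rbinom(n, 1, 0.3))
toy_y <- rnorm(n) + 1.5 * toy_bin[, "GENE_A"]

# Hypothetical helper: one simple linear regression per gene.
lm_by_gene <- function(bin, y) {
  res <- lapply(colnames(bin), function(g) {
    coefs <- summary(lm(y ~ bin[, g]))$coefficients
    # Row 2 holds the slope for the mutation indicator.
    data.frame(gene = g,
               estimate = coefs[2, "Estimate"],
               p_value = coefs[2, "Pr(>|t|)"])
  })
  do.call(rbind, res)
}

toy_lm <- lm_by_gene(toy_bin, toy_y)
```

Each estimate is the difference in mean outcome between mutated and non-mutated samples for that gene.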

Time to event outcome

We further include uni.cox for univariate survival analysis when time-to-event data are available. This function takes as inputs:

surv.dat <- clin.patients %>%
  filter(X.Patient.Identifier %in% abbreviate(patients,strict = T, minlength = 9)) %>%
  select(X.Patient.Identifier,Overall.Survival..Months., Overall.Survival.Status) %>% 
  rename(DMPID = X.Patient.Identifier, time = Overall.Survival..Months.,status = Overall.Survival.Status) %>% 
  mutate(time = as.numeric(as.character(time)),
    status = ifelse(status == "LIVING",0,1)) %>%
    filter(!is.na(time))
X <- bin.mut[match(surv.dat$DMPID,abbreviate(rownames(bin.mut),strict = T, minlength = 9)),]
uni.cox(X = X, surv.dat = surv.dat,surv.formula = Surv(time,status)~.,filter = 0.05)
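To show what a univariate survival analysis amounts to, the sketch below fits one Cox proportional hazards model per gene with `coxph` from the survival package on simulated data. It is an illustration under assumed inputs, not the implementation of uni.cox; `cox_by_gene`, `toy_bin` and `toy_surv` are hypothetical names.

```r
library(survival)

set.seed(1)
n <- 100

# Simulated binary matrix; GENE_A is simulated to raise the hazard.
toy_bin <- cbind(GENE_A = rbinom(n, 1, 0.4), GENE_B = rbinom(n, 1, 0.3))
toy_surv <- data.frame(
  time = rexp(n, rate = 0.1 * exp(0.8 * toy_bin[, "GENE_A"])),
  status = rbinom(n, 1, 0.7)  # 1 = event observed, 0 = censored
)

# Hypothetical helper: one Cox model per gene, with the gene as the
# only covariate.
cox_by_gene <- function(bin, surv) {
  res <- lapply(colnames(bin), function(g) {
    fit <- coxph(Surv(time, status) ~ x, data = cbind(surv, x = bin[, g]))
    s <- summary(fit)$coefficients
    data.frame(gene = g,
               hazard_ratio = s[1, "exp(coef)"],
               p_value = s[1, "Pr(>|z|)"])
  })
  do.call(rbind, res)
}

toy_cox <- cox_by_gene(toy_bin, toy_surv)
```

A hazard ratio above 1 indicates a higher event rate in mutated samples for that gene.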

Advanced genetic visuals

OncoPrints

OncoPrints are a convenient way to study co-mutation patterns in our cohort. They are drawn with the plot_oncoPrint function, which takes the following arguments:

We show here an example with the most common genes.

gen.dat <- bin.mut[1:1000,names(sort(apply(bin.mut,2, sum),decreasing = T))[1:15]]
plot_oncoPrint(gen.dat)

Similarly we include here an example adding patients' clinical variables:

clin.patients.dat <- clin.patients[match(abbreviate(rownames(gen.dat),strict = T, minlength = 9),clin.patients$X.Patient.Identifier),] %>% 
  rename(DMPID = X.Patient.Identifier, Smoker = Smoking.History) %>% 
  select(DMPID, Sex,Smoker) %>% 
  filter(!is.na(DMPID)) %>%
  distinct(DMPID,.keep_all = TRUE)
gen.dat <- gen.dat[match(clin.patients.dat$DMPID,abbreviate(rownames(gen.dat),strict = T, minlength = 9)),]
clin.patients.dat <- clin.patients.dat %>%
  tibble::column_to_rownames('DMPID')
rownames(gen.dat) <- rownames(clin.patients.dat)
plot_oncoPrint(gen.dat = gen.dat,clin.dat = clin.patients.dat)
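The co-mutation patterns that an OncoPrint displays can also be summarized numerically. The sketch below, on an invented 4-sample matrix, picks the most frequently altered genes with `colSums` and counts how often each pair is altered together with `crossprod`. It is a complement to the plot, not part of plot_oncoPrint, and all object names are hypothetical.

```r
# Toy binary matrix: rows are samples, columns are genes.
toy_bin <- matrix(
  c(1, 1, 0,
    1, 0, 0,
    1, 1, 1,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"), c("TP53", "KRAS", "EGFR"))
)

# Rank genes by how many samples carry an event and keep the top two.
gene_freq <- sort(colSums(toy_bin), decreasing = TRUE)
top_genes <- names(gene_freq)[1:2]

# t(X) %*% X: the diagonal holds per-gene counts, the off-diagonal
# entries hold the number of samples altered in both genes.
co_mut <- crossprod(toy_bin[, top_genes])
```

Here `co_mut["TP53", "KRAS"]` is 2, because S1 and S3 carry events in both genes.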

FACETs

FACETs is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We integrate the output of this tool into our package to visualize the copy number alteration events in our cohort. The segmentation file is now integrated into cBioPortal and we include it in our package. The FACETs output can be visualized using the facets.heatmap function, which takes as input:

This function returns a heatmap and the merged segmentation dataset used to create it:

patients.seg <- as.character(unlist(clin.sample %>% filter(Sample.Identifier %in% patients, as.numeric(as.character(Tumor.Purity)) > 30) %>% select(Sample.Identifier)))
facet <- facets.heatmap(seg = seg, patients = patients.seg[1:100])
facet$p
as.tbl(facet$out.cn)


margarethannum/gnomeR documentation built on Feb. 26, 2020, 8:16 p.m.