In margarethannum/gnomeR: What the Package Does (One Line, Title Case)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message=F,warning=F,
  fig.width = 8,
  fig.height = 4
)

Installation

install.packages("devtools")
devtools::install_github("AxelitoMartin/gnomeR")

Requirements

gnomeR requires the following packages - ComplexHeatmap, iClusterPlus, cluster (installed with gnomeR).

library(gnomeR)

library(knitr)
library(dplyr)
library(dtplyr)
library(tibble)
library(plotly)

Introduction {.tabset .tabset-fade .tabset-pills}

gnomeR is a R package that aims to process and analyze genetic data from cBioportal. We include in this package the mutation, copy number alteration (CNA), fusion and clin.patientsical information of all publicly available data from cBioPortal.

Mutations

as.tbl(mut) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Variant_Classification,Variant_Type,Reference_Allele,Tumor_Seq_Allele2)

CNA

as.tbl(cna[1:5,1:5])

Fusions

as.tbl(fusion) %>% select(Tumor_Sample_Barcode,Hugo_Symbol,Fusion)

Clinical

Patients information

as.tbl(head(clin.patients))

Samples information

as.tbl(head(clin.sample))

MAF processing

MAF files are the standard file format for mutation information. Each line represents a single mutation mapped to a sample, a particular gene and specific effect. All these fields are required in order to properly process the file. The IMPACT platform sequences a set of targeted oncogenic genes that are on cBioportal.

Creating a binary matrix

We can create a binary matrix of genetic events from the files described above. If a patient has a mutation in gene X the entry will be marked with a 1, otherwise it will be a 0. This function has the following arguments:

patients : a character vector of sample IDs to be matched in the MAF file. Default is NULL in which case all samples will be used.
maf : the corresponding MAF file (required)
mut.type : the mutation type to be used, options are "SOMATIC", "GERMLINE" or "ALL". Default is "SOMATIC"
SNP.only : boolean specifying if only SNPs should be used. Default is F
include.silent : boolean specifying if silent mutations should be included. Default is F (recommended)
fusion : optional fusion file to be included. Fusions will be added as their own column with ".fus" suffix
cna : optional copy number alteration file to be included. CNAs will be added as their column with ".Amp" suffix for amplifications and ".Del" suffix for deletions
cna.relax : boolean specifying if shallow deletions and low level gains should be counted as deletions and amplifications respectively
spe.plat : boolean specifying if specific IMPACT platforms should be considered. When TRUE NAs will fill the cells for genes

' of patients that were not sequenced on that plaform

This function will return a binary matrix of genetics events with patients as rows and columns as genes. Along with a list of patients that weren't found to have any events (if any).

patients <- as.character(unique(mut$Tumor_Sample_Barcode))[1:1000]
bin.mut <- binmat(patients = patients,maf = mut,mut.type = "SOMATIC",SNP.only = F,include.silent = F, spe.plat = T)
as.tbl(bin.mut)

Similarly including fusions and CNAs:

bin.mut <- binmat(patients = patients,maf = mut,mut.type = "SOMATIC",SNP.only = F,include.silent = F, fusion = fusion, cna = cna, spe.plat = T)
as.tbl(bin.mut)

Visualizing genetics

We include a function to visualize summaries of the mutations in a given cohort:

maf.summary(maf = mut %>% filter(Tumor_Sample_Barcode %in% patients),
            mut.type = "SOMATIC")

Correlating genetics with outcome

Binary outcome

The gen.tab function allows us to test for potential differences in genetic event frequencies using Fisher's exact test for unpaired data and the McNemar exact test for paired data. This function takes the following arguments:

gen.dat : a binary matrix of genetic events with patients as rows and columns as genes
outcome : a binary vector corresponding to the patients in the gen.dat argument
filter : a value in [0,1) that will remove all genes that have a frequency below that threshold. Default is 0, including all
paired : boolean specifying is the observations are paired. Default is false
cont : boolean specifying if the outcome is continuous. Default is false (see following section for an example)
rank : boolean specifying if the output table should be ranked by pvalue

outcome <- as.character(clin.sample$Sample.Type[match(patients,clin.sample$Sample.Identifier)])
gen.dat <- bin.mut
gen.tab(gen.dat = gen.dat,
        outcome = outcome,
        filter = 0.05,paired = F,cont = F,rank = T)

Continuous outcome

Similarly we show here an example with a simulated continuous outcome:

set.seed(1)
outcome <-  rnorm(n = nrow(gen.dat))
tab.out <- gen.tab(gen.dat = gen.dat,
        outcome = outcome,
        filter = 0.05,paired = F,cont = T,rank = T)
tab.out$fits
tab.out$vPlot

Time to event outcome

We further include uni.cox for univariate survival analysis if time to event data is available. This function takes as inputs:

X : a data frame containing the covariates to be used
surv.dat : a data frame containing the time and status information
surv.formula : survival formula of the form Surv(time,status)~. . Note that delayed entry is allowed of the form Surv(time1,time2,status)~.
filter : a value in [0,1) that will remove all genes that have a frequency below that threshold. Default is 0, including all

surv.dat <- clin.patients %>%
  filter(X.Patient.Identifier %in% abbreviate(patients,strict = T, minlength = 9)) %>%
  select(X.Patient.Identifier,Overall.Survival..Months., Overall.Survival.Status) %>% 
  rename(DMPID = X.Patient.Identifier, time = Overall.Survival..Months.,status = Overall.Survival.Status) %>% 
  mutate(time = as.numeric(as.character(time)),
    status = ifelse(status == "LIVING",0,1)) %>%
    filter(!is.na(time))
X <- bin.mut[match(surv.dat$DMPID,abbreviate(rownames(bin.mut),strict = T, minlength = 9)),]
uni.cox(X = X, surv.dat = surv.dat,surv.formula = Surv(time,status)~.,filter = 0.05)

Advanced genetic visuals

OncoPrints

OncoPrints are a convenient way to study comutation patterns in our cohort through the plot_oncoPrint function. It takes as argument:

gen.dat : data frame of binary genetic events as created by the binmat function
clin.patients.dat : optional data frame of clin.patientsical covariates
ordered : order in which patients should be printed in the OncoPrint

We show here an example with the most common genes.

gen.dat <- bin.mut[1:1000,names(sort(apply(bin.mut,2, sum),decreasing = T))[1:15]]
plot_oncoPrint(gen.dat)

Similarly we include here an example adding patients' clinical variables:

clin.patients.dat <- clin.patients[match(abbreviate(rownames(gen.dat),strict = T, minlength = 9),clin.patients$X.Patient.Identifier),] %>% 
  rename(DMPID = X.Patient.Identifier, Smoker = Smoking.History) %>% 
  select(DMPID, Sex,Smoker) %>% 
  filter(!is.na(DMPID)) %>%
  distinct(DMPID,.keep_all = TRUE)
gen.dat <- gen.dat[match(clin.patients.dat$DMPID,abbreviate(rownames(gen.dat),strict = T, minlength = 9)),]
clin.patients.dat <- clin.patients.dat %>%
  tibble::column_to_rownames('DMPID')
rownames(gen.dat) <- rownames(clin.patients.dat)
plot_oncoPrint(gen.dat = gen.dat,clin.dat = clin.patients.dat)

FACETs

FACETs is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We integrate the output of this tool to our package to vizualise the copy number alteration events in our cohort. The segmentation file is now integrated to the cBioPortal and we include it in our package. The FACETs output can be vizualized using the facets.heatmap function which takes as input:

seg : a segmentation file containing all the samples of interest
filenames : character vector of file names in case you are given individual segmentation files for each patients
path : the path to the files of the filenames character vector
patients : DMPIDs of the samples to be included in the heatmap. Default is NULL where all patients will be used
min.purity : if filenames argument is provided you may select a minimum purity to be included in the heatmap. Default is 0.3
epsilon : the maximum Euclidean distance between adjacent probes tolerated for denying a nonredundant region. epsilon=0 is equivalent to taking the union of all unique break points across the n samples. See CNregions function iClusterPlus for more information. Default is 0.005
ordered : order in which patients should be printed in the heatmap. Default is NULL, where we use hierarchical clustering.
outcome : potential outcome of interest to be plotted along the X-axis

This function returns the a heatmap and the merged segmentation dataset used to created:

patients.seg <- as.character(unlist(clin.sample %>% filter(Sample.Identifier %in% patients, as.numeric(as.character(Tumor.Purity)) > 30) %>% select(Sample.Identifier)))
facet <- facets.heatmap(seg = seg, patients=patients.seg[0:100])
facet$p
as.tbl(facet$out.cn)

margarethannum/gnomeR documentation built on Feb. 26, 2020, 8:16 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

margarethannum/gnomeR
What the Package Does (One Line, Title Case)

In margarethannum/gnomeR: What the Package Does (One Line, Title Case)

Installation

Requirements

Introduction {.tabset .tabset-fade .tabset-pills}

Mutations

CNA

Fusions

Clinical

Patients information

Samples information

MAF processing

Creating a binary matrix

' of patients that were not sequenced on that plaform

Visualizing genetics

Correlating genetics with outcome

Binary outcome

Continuous outcome

Time to event outcome

Advanced genetic visuals

OncoPrints

FACETs

R Package Documentation

Browse R Packages

We want your feedback!

margarethannum/gnomeR What the Package Does (One Line, Title Case)

In margarethannum/gnomeR: What the Package Does (One Line, Title Case)

Installation

Requirements

Introduction {.tabset .tabset-fade .tabset-pills}

Mutations

CNA

Fusions

Clinical

Patients information

Samples information

MAF processing

Creating a binary matrix

' of patients that were not sequenced on that plaform

Visualizing genetics

Correlating genetics with outcome

Binary outcome

Continuous outcome

Time to event outcome

Advanced genetic visuals

OncoPrints

FACETs

R Package Documentation

Browse R Packages

We want your feedback!

margarethannum/gnomeR
What the Package Does (One Line, Title Case)