BiocStyle::markdown()
knitr::opts_chunk$set(echo = TRUE, dev = "png")
library("BAGEL")

Introduction

BAGEL Mutational Signature TookKit discovers novel signatures and predicts sample exposure to known signatures across multiple motif classes including single base substitutions (SBS), double base substitutions (DBS), insertions (INS) and deletions (DEL) and SBS with replication strand and SBS with transcription strand. BAGEL also plots signatures and sample exposures along with advanced downstream analysis including UMAP.

Installation

Currently BAGEL can be installed from Github; in the future it will be available on Bioconductor. To install from Github directly please use the code below: library(devtools) install_github("campbio/BAGEL")

Setting up a bagel object

In order to discover or predict, we must first set up our BAGEL object by 1) extracting variants, 2) selecting a genome 3) creating our BAGEL object 4) building a counts table for our BAGEL (SBS, DBS, INS, DEL, etc.)

Reproducibility note

Many functions in BAGEL make use of stochastic algorithms or procedures which require the use of random number generator (RNG) for simulation or sampling. To maintain reproducibility, all these functions use a default seed of 1 to make sure same results are generated each time one of these functions is called. Explicitly setting the seed arguments is needed for greater control and randomness. These functions include discover_signatures, predict_exposure, and create_umap.

Extracting variants

Variants can be extracted from VCFs and MAFs, and R objects such as data.frames and VCF and MAF representation via VariantAnnotation and maftools.

# Extract variants from a MAF File
lusc_maf <- system.file("testdata", "public_TCGA.LUSC.maf", package = "BAGEL") 
lusc.variants <- extract_variants_from_maf_file(maf_file = lusc_maf)

# Extract variants from an individual VCF file
luad_vcf <- system.file("testdata", "public_LUAD_TCGA-97-7938.vcf", 
                         package = "BAGEL")
luad.variants <- extract_variants_from_vcf_file(vcf_file = luad_vcf)

# Extract variants from multiple files and/or objects
melanoma_vcfs <- list.files(system.file("testdata", package = "BAGEL"), 
                           pattern = glob2rx("*SKCM*vcf"), full.names = TRUE)
variants <- extract_variants(c(lusc_maf, luad_vcf, melanoma_vcfs), verbose = TRUE)

Choosing genome

BAGEL uses BSgenome objects to access genomic information such as reference bases for count tables. The BSgenome represents and stores full genome sequeces for different organisms. To make selection easy we provide an accessor method for human genome build versions 38, and 19 (hg38, hg19), but many other less common builds and organisms are available via BSgenome as well.

g = select_genome("hg38")

Create bagel object

bagel = create_bagel(x = variants, genome = g)

Create mutation count tables

Motifs are the building blocks of mutational signatures. Motifs themselves are a mutation combined with other genomic information. For instance, SBS96 motifs are constructed from an SBS mutation and one upsteam and one downstream base sandwiched together. We build tables by counting these motifs for each sample.

build_standard_table(bagel, g, "SBS96")

Discover Signatures/Exposures

Discovery and prediction result are loaded into a self-contained result object that includes signatures and sample exposures.

result = discover_signatures(bagel = bagel, table_name = "SBS96", 
                              num_signatures = 4, method = "lda", nstart = 10, 
                             seed = 1)

Plotting

Signatures

##Plot results
plot_signatures(result)

# We can name signatures based on prior knowledge
name_signatures(result, c("unknown", "smoking", "apobec", "uv"))

Exposures

plot_exposures(result, proportional = TRUE)
plot_exposures(result, proportional = FALSE)
plot_sample_counts(bagel, "SBS96", get_sample_names(bagel)[1])

Comparison to external signatures (e.g. COSMIC)

##Compare to COSMIC signatures by leaving the second result as default
compare_cosmic_v2(result, threshold = 0.78)

Predicting exposures using existing signatures

#List which signatures correspond to subtypes including "lung"
cosmic_v2_subtype_map("lung")

#Calculate posterior based on COSMIC signatures 4, 11, 12, 15
cosmic_post = predict_exposure(bagel = bagel, g, "SBS96", signature_res = 
                                 cosmic_v2_sigs, signatures_to_use = 
                                 c(12, 4, 11, 15), algorithm = "lda")

#Calculate posterior based on our novel signatures
our_sigs_post = predict_exposure(bagel = bagel, g, "SBS96", signature_res = 
                                   result, algorithm = "lda")

#Plot results from posterior calculation
plot_signatures(cosmic_post)
plot_exposures(cosmic_post, proportional = TRUE)

plot_signatures(our_sigs_post)
plot_exposures(our_sigs_post, proportional = TRUE)

#Compare posterior results to each other
compare_results(result = cosmic_post, other_result = our_sigs_post, 
                threshold = 0.60)

Use of sample annotations for advanced sample comparisons

Adding annotations

sample_annotations <- read.table(system.file("testdata", 
                                             "sample_annotations.txt", 
                                             package = "BAGEL"), sep = "\t", 
                                 header=TRUE)
init_sample_annotations(bagel)
add_sample_annotations(bay = bagel, annotations = sample_annotations, 
                       sample_column = "Sample_Names", 
                       columns_to_add = "Tumor_Subtypes")

Standard discovery using BAGEL with sample annotations

#Add tumor type annotations to our samples
res <- discover_signatures(bagel, table_name = "SBS96", num_signatures = 3, 
                           seed = 1)

plot_exposures_by_annotation(res, annotation = "Tumor_Subtypes")
plot_exposures_by_annotation(res, annotation = "Tumor_Subtypes", 
                             proportional = FALSE)
plot_exposures_by_annotation(res, annotation = "Tumor_Subtypes", 
                             by_group = FALSE)

Other functions

#Display what samples were added
get_sample_names(bagel)

Session Information

sessionInfo()


campbio/BAGEL documentation built on Oct. 6, 2020, 3:59 a.m.