In qingjian1991/MPTevol: Clonal Evolutionary History and Metastatic Routines Analysis for Multiple Primary Tumors

knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  cache = TRUE
)

1. Introduction

1.1 Brief introduction

Multiple primary tumors (MPT) is a special and rare cancer type, defined as more than two primary tumors presenting at the diagnosis in a single patient. The molecular characteristics and tumorigenesis of MPT remain unclear due to insufficient approaches.

Here, we present MPTevol, a practical computational framework for comprehensively exploring the MPT from multiregional sequencing (MRS) experiments. MPTevol facilitates comparison genomic profiles across multiple primary tumor samples, detection of clonal evolutionary history and metastatic routines in MPT, and quantification of metastatic history. This package incorporates multiple cancer evolution analyses, for a one-stop solution of MPT analysis.

1.2 Packages Structure.

Blue circles: the input data;

Green circles: the functions of MPTevol;

Purple circles: the functions inherited from MesKit.

1.3 Installation

You can install the development version of MPTevol from GitHub with:

# install.packages("remotes")
remotes::install_github("qingjian1991/MPTevol")

1.4 Citation

If you are using MPTevol in academic research, please cite the following paper:

Chen, Q., Wu, Q.-N., Rong, Y.-M., Wang, S., Zuo, Z., Bai, L., . . . Zhao, Q. (2022). Deciphering clonal dynamics and metastatic routines in a rare patient of synchronous triple-primary tumors and multiple metastases with MPTevol. Briefings in Bioinformatics. doi:10.1093/bib/bbac175

2. Input data format

MPTevol takes the SNVs and CNVs information as the input. The format is compatible with MesKit.

To analyze with MPTevol, you need to provide:

For mutation data

A MAF file of multi-region samples from patients. (*.maf / *.maf.gz). Required
A clinical file contains the clinical data of tumor samples from each patient. Required
Cancer cell fraction (CCF) data of somatic mutations. Optional but recommended

For CNA data

A segmentation file. Required
GISTIC analysis outputs. Optional

Note: Tumor_Sample_Barcode should be consistent in all input files.

2.1 MAF file

Mutation Annotation Format (MAF) files are tab-delimited text files with aggregated mutations information from VCF Files. The input MAF file could be gzip compressed, and allowed values of Variant_Classificationcolumn can be found at Mutation Annotation Format Page.

The following fields are required to be contained in the MAF file:

Hugo_Symbol, Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type, Reference_Allele, Tumor_Seq_Allele2, Ref_allele_depth, Alt_allele_depth, VAF, Tumor_Sample_Barcode

Note:

The Tumor_Sample_Barcode of each sample should be unique.
The VAF (variant allele frequencies) ranges from 0-1 or 0-100.

Example MAF file

MAFtable <- read.table(system.file("extdata", "CRC_HZ.maf", package = "MesKit"), header = TRUE)
extractLines <- rbind(MAFtable[1, ], MAFtable[6600, ])
extractLines <- rbind(extractLines, MAFtable[15000, ])
data.frame(extractLines, row.names = NULL)

2.2 Clinical data file

Clinical data file contains clinical information about each patient and their tumor samples, and mandatory fields are Tumor_Sample_Barcode, Tumor_ID, Patient_ID, and Tumor_Sample_Label.

Example clinical data file

ClinInfo <- read.table(system.file("extdata", "CRC_HZ.clin.txt", package = "MesKit"), header = TRUE)
ClinInfo[1:5, ]

2.3 CCF file

By default, there are six mandatory fields in input CCF file: Patient_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, CCF and CCF_Std/CCF_CI_High (required when identifying clonal/subclonal mutations). The Chromosome field of your MAF file and CCF file should be in the same format (both in number or both start with "chr"). Notably, Reference_Allele and Tumor_Seq_Allele2 are also required if you want to include INDELs in the CCF file.

Optionally, the cluster field can be included in this file. cluster denotes the mutation cluster IDs.

Example CCF file

ccfInfo <- read.table(system.file("extdata", "CRC_HZ.ccf.tsv", package = "MesKit"), header = TRUE)
ccfInfo[1:5, ]

2.4 Segmentation file

The segmentation file is a tab-delimited file with the following columns:

Patient_ID - patient ID
Tumor_Sample_Barcode - tumor sample barcode
Chromosome - chromosome name or ID
Start_Position - genomic start position of segments (1-indexed)
End_Position - genomic end position of segments (1-indexed)
SegmentMean/CopyNumber - segment mean value or absolute integer copy number
Minor_CN - copy number of minor allele
Major_CN - copy number of major allele
Tumor_Sample_Label - the specific label of each tumor sample.

Note: Positions are in base pair units. By default, the Minor_CN and Major_CN fields are optional for MesKit, but are required for MPTevol.

Example Segmentation file

segInfo <- read.table(system.file("extdata", "CRC_HZ.seg.txt", package = "MesKit"), header = TRUE)
segInfo[1:5, ]

3. Start with the MAF object

readMaf function creates Maf/MafList objects by reading MAF files, clinical files and cancer cell fraction (CCF) data (optional but recommended). Parameter refBuild is used to set reference genome version for Homo sapiens reference ("hg18", "hg19" or "hg38"). You should set use.indel.ccf = TRUE when your ccfFile contains INDELs apart from SNVs.

library(tidyverse)
library(MesKit)
library(MPTevol)

Example data contains a rare patients with three primary tumors, including a Breast cancer (BRCA), a colorectal cancer(READ) and a lung cancer(LNET) were collected with Multi-region sequencing (MRS) for each primary tumors. Three Metastatic tumors, most of which are from READ, are also sequenced, including OvaryLM, OvaryRM and UterusM.

#For split data, the tumors are divided according to their histological types.
data.type <- "split"

maf <- readMaf(
  mafFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.mutation.txt", data.type)),
  ccfFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.CCF.txt", data.type)),
  clinicalFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.clinical.txt", data.type)),
  refBuild = "hg19",
  ccf.conf.level = 0.95
)

4. Genomic compariosn of multiple primary and metastatic tumors

4.1 Driver mutation landscapes

In order to explore the genomic alterations during cancer progression with multi-region sequencing approach, we provided classifyMut() function to categorize mutations. The classification is based on shared pattern or clonal status (CCF data is required) of mutations, which can be specified by class option. Additionally, option classByTumor can be used to reveal the mutational profile within tumors.

driverGene <- read.delim(system.file(package = "MPTevol", "extdata", "IntOGen-Drivers-Cancer_Genes.tsv"), header = T) %>%
  filter(CANCER_TYPE %in% c("BRCA", "COREAD", "LUAD", "LUSC")) %>%
  pull(SYMBOL) %>%
  unique()

mut.class <- classifyMut(maf, class = "SP", patient.id = "BRCA")
head(mut.class)

The MesKit plotMutProfile function can visualize the mutational profile of tumor samples.

plotMutProfile(maf, class = "SP", geneList = driverGene, use.tumorSampleLabel = TRUE, removeEmptyCols = FALSE)

From the driver mutational landscape, the three primary tumors, BRCA, LNET and READ, in this MPT case were characterized by distinct driver genes, supporting they were derived from different tumor ancestors. Two metastatic tumors, OvaryLM and OvaryRM were the descendants of READ. Whereas only 2 out 7 samples in UterusM (UterusM_1 and UterusM_3 ) were derived from READ.

4.2 Copy Number Alterations Profile.

4.2.1 Plot CNAs Profile

The plotCNA function can characterize the CNA landscape across samples based on copy number data from segmentation algorithms.

```{R CNAs, fig.align='left', fig.height=6.5, fig.width=11, message=FALSE }

segCN = system.file( "extdata", "meskit.sequenza.CNAs.txt" ,package = "MPTevol")

seg = readSegment(segFile = segCN)

Using the default plot.

plotCNA(seg, chrSilent = "X")

```r


# define the new columns
# 0-6 copy numbers, add cnLOH, LOH >=2 copys
Copy_cutoff = function(seg){
  seg %>%
    mutate(CopyNumber1 = ifelse(CopyNumber >5, 6, CopyNumber) ) %>%
    mutate(CopyNumber1 = ifelse(CopyNumber == 2 & Minor_CN == 0, "cnLOH",  CopyNumber1)) 
}

seg1 = lapply(seg, Copy_cutoff)

plotCNA(seg1,
        Type.name = "CopyNumber1",
        Type.colors = setNames(
           c("#7D8BCD", "#B1B9E7", "#F6F7F7", "#E4A8B5", "#CB7185","#B03D5E","#99143C", "#91BAA7"), 
           nm = c(seq(0,6), "cnLOH")
        ),
        showRownames = TRUE,
        rect.patients.size = 0,
        chrSilent = "X"

)

#Plot Minor_CNVs
plotCNA(seg1,
        Type.name = "Minor_CN",
        Type.colors = setNames(
           c("#7D8BCD", "#F6F7F7", "#E4A8B5", "#CB7185","#B03D5E"), 
           nm = 0:4
        ),
        showRownames = TRUE,
        rect.patients.size = 0,
        chrSilent = "X"
)

4.3 Intratumor Heterogeneity (ITH)

To quantify the genetic divergence of ITH between regions or tumors, we introduced two classical metrics derived from population genetics, which were Wright’s fixation index (Fst) and Nei’s genetic distance.

4.3.1 Fixation Index

The fixation index (FST) is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNP) or microsatellites.

FST is the proportion of the total genetic variance contained in a subpopulation (the S subscript) relative to the total genetic variance (the T subscript). Values can range from 0 to 1. High FST implies a considerable degree of differentiation among populations. A particularly simple estimator applicable to DNA sequence data is

$F_{st} = \frac{T-S}{T}$

Where T is the total genetic variance and S is the genetic variance in subpopulation. Here, we use the MesKit calFst to calculate the Fst between tumor regions.

```{R message=FALSE, fig.align='left'}

library(ggpubr) library(rstatix)

cols_samples = setNames( set.colors(6), nm = c("BRCA","READ","LNET","OvaryLM","OvaryRM","UterusM") )

scores.Fst = calFst(maf, plot = TRUE, use.tumorSampleLabel = TRUE, withinTumor = FALSE, number.cex = 10)

Fst.data = list()

for(i in names(scores.Fst) ){ Fst = scores.Fst[[i]]$Fst.pair Fst = Fst[lower.tri(Fst)]

Fst.data[[i]] = data.frame( Fst = Fst, tumor = i ) }

Fst.data = purrr::reduce(Fst.data, rbind)

statistical test

stat.test = Fst.data %>% wilcox_test(Fst ~ tumor) %>% add_significance("p") %>% add_xy_position(fun ="mean") %>% dplyr::filter(group1 == "Coad" & group2 != "Lung")

p1 = Fst.data %>% ggplot(aes(x = tumor, y = Fst) ) + theme_classic2() + geom_boxplot(aes(col = tumor, shape = tumor), width = 0.6 ) + geom_jitter(aes(col = tumor, shape = tumor), width = 0.1, size = 2) + labs(x = NULL, y = "Fst" ) + scale_color_manual( values = cols_samples ) + stat_pvalue_manual(stat.test, label = "p.signif") + theme( legend.position = "none", axis.text = element_text(size = 15), axis.text.x = element_text(hjust = 1, angle = 45) )

### 4.4 Search the clinically targtable sites

`getClinSites()` search the clinically targetable sites. Briefly, we match the drivers genes between MAF files and clinical sites in OncoKB. We only match the gene names, whereas ignoring the cancer types and gene alterations. The main targetable alterations include (1) gene fusions (like BCR-ABL1 fusion), (2) Oncogenic mutations, (3) Exon deletions/insertion, (4) Amplifications, (5) Deletions and (6) single-nucleotide mutation (BRAC V600E). Please manually check the mutation status.

Please see more information in [oncokb](oncokb.org)

```r

# get all cancers
sites <- getClinSites(maf)

#get BRCA
sites <- getClinSites(maf, Patient_ID = "BRCA")

# DT::datatable(sites)

sites[c(1,2,4:6),c("Hugo_Symbol", "Variant_Classification", "Tumor_Sample_Barcode", "Clonal_Status", "Level", "Alterations", "Cancer.Types", "Drugs")]

We found that the BRCA contains one targetable site in PIK3CA. The PIK3CA mutation in BRCA is clonal (shared by all sampling sites) and the corresponding drug is Alpelisib + Fulvestrant in Level 1

4.5 Estimate natural selection

The natural selection may driver multiple primary tumors in MPT under different selective advantages. The Ka/Ks (ratio of non-synonymous rate relative to synonymous rate), also known Dn/Ds, can be used to measure the pressure of natural selection. The Ka/Ks is informative about the selection, where Dn/Ds > 1 indicates positive selection and Dn/Ds < 1 indicates negative selection.

The function calKaKs estimates the selective pressure in different sets of mutations, according to the mutation class defined by classifyMut. The Ka/Ks is calculated based on dndscv.

kaks <- calKaKs(maf, patient.id = "BRCA", class = "SP", parallel = TRUE, vaf.cutoff = 0.05)

On the other hand,selective pressure can be estimated simply by the ratio of driver mutations relative to total mutation. Like calKaKs, the function calPropDriver estimates the selective pressure in different sets of mutations, according to the ratio of driver mutations defined by classifyMut.

prop <- calPropDriver(maf, patient.id = "BRCA", class = "SP", driverGene = driverGene)
cowplot::plot_grid(kaks$BRCA$plot + ggpubr::theme_pubr(),
                   prop$BRCA$plot + ggpubr::theme_pubr()
                   )

The mutations in BRCA were divided into Public(shared by all sampling site), Shared(shared by two or more samples but not all sampling site), and Private(Private in one sampling site). The Public mutations were under positive selection (Dn/Ds >=1 and The prop of driver mutations is high ), coherent with the expectation that the ancestral mutations might contribute to cancer development.

5. Build phylogenetic tress

For MPT analysis, the core purpose is to dissect the evolutionary relationships between multiple primary and metastatic tumors. The phylogenetic samples tree delineates the relationships between samples by using the genetic summary of each sample, including somatic mutations and allele-specific CNAs. In MPTevol, the mutation-based and CNA-based trees are built by the function plotMutTree and plotCNAtree, respectively.

5.1 Build mutation-based sample trees.

With MPTevol, the construction of phylogenetic mutation tree is based on the binary present/absence matrix of mutations across all tumor regions.

Based on the Maf object, phyloTree() function reconstructs phylogenetic tree in different methods, including "NJ" (Neibor-Joining) , "MP" (maximum parsimony), "ML" (maximum likelihood), "FASTME.ols" and "FASTME.bal", which can be set by method parameter.

By group and group.colors parameters, the samples in the tree can be color by their histological types.

# For split1, the READ and its Metastases were indicated as one patient.
data.type <- "split1"

maf1 <- readMaf(
  mafFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.mutation.txt", data.type)),
  ccfFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.CCF.txt", data.type)),
  clinicalFile = system.file(package = "MPTevol", "extdata", sprintf("meskit.%s.clinical.txt", data.type)),
  refBuild = "hg19",
  ccf.conf.level = 0.95
)

# Set the Group Information
group <- list(
  Coad = paste0("READ_", 1:5),
  OveryLM = paste0("OvaryLM_", 1:5),
  OveryRM = paste0("OvaryRM_", 1:6),
  UterusM = paste0("UterusM_", c(1, 3))
)

# set group colors
group.colors <- setNames(c("#7570B3", "#E6AB02", "#003C30", "#666666"), nm = names(group))


mutTrees <- plotMutTree(maf1,
  patient.id = "Met1", group = group,
  group.colors = group.colors,
  title = "mutation-based Tree"
)

mutTrees$plot

5.2 Build CNA-based sample trees

MPTevol used MEDICC to infer CNA-based samples trees. MEDICC calculates the minimal copy number events from the ancestor(major = 1, minor = 1) to the mutant samples. The estimations are based on each chromosome and the finial genetic distances is the sum of the distances in total chromosomes.

First, we obtain shared genomic regions across samples by splitSegment. The splitSegment is used to generate the input format required by MEDICC. In current develop version, the input is based on the called genomic allele-specific CNAs by Sequenza.

# Running MEDICC
folder <- "/data1/qingjian/Rproject/Three/medicc/Seg.new1"
segfiles <- list.files(folder, pattern = ".txt", full.names = T)
sampleid <- list.files(folder, pattern = ".txt") %>% stringr::str_remove("_segments.txt")

# Running for Breast
library(IRanges)

seg <- splitSegment(
  segfiles = segfiles[1:5],
  sampleid = sampleid[1:5],
  project.names = "Breast",
  out.dir = "medicc/Breast",
  N.baf = 30,
  cnv_min_length = 1e+05,
  max_CNt = 15,
  minLength = 1e+05,
  maxCNV = 4
)

We can check shared genomic regions by using heatmaps to show the CNA changes (major or minor) in total samples.

plotMediccSeg <- function(file) {
  major <- read.table(file, header = T)

  major <- major %>%
    mutate(seq = str_c(seqnames, start, end, sep = "_")) %>%
    column_to_rownames(var = "seq") %>%
    mutate(seqnames = NULL, start = NULL, end = NULL, width = NULL)

  Heatmap(major,
    row_names_gp = gpar(fontsize = 6)
  )
}

# See Major and Minor
plotMediccSeg("medicc/Breast/Breast.major.txt")
plotMediccSeg("medicc/Breast/Breast.minor.txt")

Then we run MEDICC in the shell. A simple run command is as follows:

```{sh eval=FALSE}

install position of medicc

medicc=/soft/medicc/medicc.py

install position of python

python=/anaconda3/envs/pyclone/bin/python

$python $medicc medicc/Breast/Breast.descr.txt medicc/Breast.out -v > medicc/Breast.runinfo.txt

After Running `MEDICC`, we get the distance file (`tree_final.dist`, generated by `MEDICC`) to plot the CNA-based trees.`plotCNAtree()` first estimates the bootstrap values of CNA-based trees by replacing tree distances. Then, `plotCNAtree()` visualzied the trees.


```r
# read samples distances.
# This dist file is the output of MEDICC
dist <- system.file(package = "MPTevol", "extdata", "tree_final.dist")

# set group information
# set group information
group <- list(
  Coad = paste0("READ_", 1:5),
  OveryLM = paste0("OvaryLM_", 1:5),
  OveryRM = paste0("OvaryRM_", 1:6),
  UterusM = paste0("UterusM_", c(1, 3))
)

# set group colors
group.colors <- setNames(c("#7570B3", "#E6AB02", "#003C30", "#666666"), nm = names(group))

# built trees
cnaTree <- plotCNAtree(
  dist = dist,
  group = group,
  group.colors = group.colors,
  title = "CNA-based Trees"
)

cnaTree$plot

5.4 Compare the mutation-based trees and CNA-based trees

We compare mutation-based trees and CNA-based trees to see their discrepancy in the phylogenetic relationships.

library(patchwork)

(mutTrees$plot + theme(legend.position = "none")) +
  (cnaTree$plot + theme(legend.position = "none"))

The READ_5,UterusM_1, UterusM_3 exhibit different relationships in the two trees.

To DO: check how the mutation cutoffs influence the mutation-based trees.

6. Clonal dynamics

The samples trees reflecting the overall genetic similarity are often suffered from the tumor admixture in bulk samples. In contrast, the clonal trees represent the clonal history among tumor cell lineage. The clonal trees can reflect the clonal dynamics from primary to metastatic tumors.

6.1 Inferring clonal structures

This step is to infer the clonal structures. The sciClone 4 and PyClone 5 could infer the clonal structures.

Two prominent approaches in clonal evolution studies are:

using only diploid heterozygous variants (variants in regions without copy number alteration), hence excluding copy-altered variants. When only diploid heterozygous variants are used, VAFs can be estimated as the ratio of the variant read count and total read count and clustering can be performed by tools such as sciClone.
including copy-altered variants. When copy-altered variants are included, clustering should be performed using copy-number aware tools such as PyClone, and copy number corrected VAFs can be obtained by dividing the CCFs estimated by such tools by two.

In MPTevol, the format of variants is used.

# load data
data("variants", package = "MPTevol")
data("variants.ref", package = "MPTevol")

head(variants,3)

6.2 Check clonal prevalence across samples.

library(clonevol)

vaf.col.names <- c(
  paste0("READ_", 1:5), paste0("OvaryLM_", 1:5),
  paste0("OvaryRM_", 1:6), paste0("UterusM_", c(1, 3))
)

sample.groups <- mapply(function(x) x[1], strsplit(vaf.col.names, "_"))
names(sample.groups) <- vaf.col.names

cluster.col.name <- "cluster"

clones.number <- 10
clone.colors <- set.colors(10)

# Check data.
pp <- plotVafCluster(
  variants = variants,
  cluster.col.name = "cluster",
  vaf.col.names = vaf.col.names[c(1, 6, 10, 11)],
  # highlight = "is.driver",
  # highlight.note.col.name = "gene_site",
  box = TRUE,
  violin = FALSE
)

pp

# check cluster changes.
plot.cluster.flow(variants,
  cluster.col.name = cluster.col.name,
  vaf.col.names = vaf.col.names,
  sample.names = vaf.col.names,
  colors = set.colors(clones.number),
  y.title = "Variant Allele Frequency %"
) +
  theme(axis.text.x = element_text(angle = 90))

6.3 Inferring clonal trees

6.3.1 tips about buidling clonal trees.

Inferring the clonal trees is the central process in clonal construction. However, users always find it difficult to build clonal trees. Therefore, we should check the cluster structure before building clonal trees. Here are some suggestions about building clonal trees.

chose the optimal clustering methods. Before mutation clustering. we should removed low-quality mutations. It is recommended to delete mutations in the LOH regions and INDELs. The mutations in the cnv-regions should be carefully checked.
chose the right founding cluster. When the mean VAFs of founding cluster is smaller than a certain cluster, tree construction fails.
ignore some false-negative clusters. (1) The clusters with low cellular prevalence is probably clustering error, especially clusters that have low VAF in all samples. (2) small cluster. Removing clusters that having too few mutations.
try different cutoffs. The two parameters sum.p and alpha are used to determine whether a cluster is present in a given sample. A relaxed cutoffs (small values of the two parameters) enables more clusters are though to be present in the sample.

sel <- 1:18

y <- inferClonalTrees(
  project.names = "Met",
  variants = variants.ref,
  ccf.col.names = vaf.col.names[sel],
  sample.groups = sample.groups[sel],
  cancer.initiation.model = "monoclonal",
  founding.cluster = 1, ignore.clusters = 4,
  cluster.col.name = "cluster",
  subclonal.test.model = "non-parametric",
  sum.p = 0.01, alpha = 0.05, weighted = FALSE,
  plot.pairwise.CCF = FALSE,
  highlight.note.col.name = NULL,
  highlight = "is.driver", highlight.CCF = TRUE
)

# pdf(file = sprintf("%s/%s.trees.pdf", output, output), width = 6, height = 6)

plot.all.trees.clone.as.branch(y,
  branch.width = 0.5,
  node.size = 2, node.label.size = 0.5,
  tree.rotation = 180,
  angle = 20
)

# dev.off()

6.3.1 Buidling clonal trees for MPT Patients.

For cancer evolution, most models assume that all cancer samples in a case originates from a monoclonal ancestor(single-primary tumor). However, the metastatic tumors might form from multiple primary tumors, so polyclonal model (multiple founding clones) should be considered.

6.4 Combined MRS samples into a meta sample.

The merge.samples function can merge MRS samples into a meta sample. This can be used to represent the clonal admixture of whole tumor.

Note: (1) This function required the number of ref and var alleles for each sample. In the examples, we used the variants.ref data sets. (2) The CCF ranges between 0-100.

# merge coad
sel <- 1:5
y.merge <- merge.samples(y,
  samples = vaf.col.names[sel],
  new.sample = "READ", new.sample.group = "READ",
  ref.cols = str_c(vaf.col.names[sel], ".ref"),
  var.cols = str_c(vaf.col.names[sel], ".var")
)
# merge overyLM
sel <- 6:10
y.merge <- merge.samples(y.merge,
  samples = vaf.col.names[sel],
  new.sample = "OvaryLM", new.sample.group = "OvaryLM",
  ref.cols = str_c(vaf.col.names[sel], ".ref"),
  var.cols = str_c(vaf.col.names[sel], ".var")
)
# merge overyRM
sel <- c(11:13, 14, 15, 16)
y.merge <- merge.samples(y.merge,
  samples = vaf.col.names[sel],
  new.sample = "OvaryRM", new.sample.group = "OvaryRM",
  ref.cols = str_c(vaf.col.names[sel], ".ref"),
  var.cols = str_c(vaf.col.names[sel], ".var")
)

# merge UterusM
sel <- c(17, 18)
y.merge <- merge.samples(y.merge,
  samples = vaf.col.names[sel],
  new.sample = "UterusM", new.sample.group = "UterusM",
  ref.cols = str_c(vaf.col.names[sel], ".ref"),
  var.cols = str_c(vaf.col.names[sel], ".var")
)

6.5 View clonal dynamic by timescape

TimeScape is an automated tool for navigating temporal clonal evolution data. The key attributes of this implementation involve the enumeration of clones, their evolutionary relationships and their shifting dynamics over time. TimeScape requires two inputs: (i) the clonal phylogeny and (ii) the clonal prevalences. The output is the TimeScape plot showing clonal prevalence vertically, time horizontally, and the plot height optionally encoding tumour volume during tumour-shrinking events. At each sampling time point (denoted by a faint white line), the height of each clone accurately reflects its proportionate prevalence. These prevalences form the anchors for bezier curves that visually represent the dynamic transitions between time points

The tree2timescape transforms the tree information into the required format of TimeScape. Then user can view the clonal dynamic along with tumor locations or time.

library(timescape)

samples <- names(y.merge$models)

times <- tree2timescape(results = y.merge, samples = names(y.merge$models))


# run timescape
i <- 1
timescape(
  clonal_prev = times$clonal_prev[[i]],
  tree_edges = times$tree_edges[[i]],
  clone_colours = times$clone_colours[[i]],
  genotype_position = "centre",
  xaxis_title = NULL
)

# save svg file plot
# rsvg::rsvg_pdf( svg = sprintf("%s/%s.model%s.svg", project.names, project.names, i),
#                file = sprintf("%s/%s.model%s.pdf", project.names, project.names, i)
# )

Say something about the tree.

7. Estimating Metastatic routines

Metastatic routines could be estimated by comparing the CCFs of mutations from the primary and metastatic tumors. The H index is calculated as

$H=\frac{L_m}{L_p+1}$

Where $L_m$ and $L_p$ are the number of metastatic-private clonal and primary-private clonal mutations, respectively 7. The H index is positively correlated with the time of dissemination, therefore larger H index is associated with later dissemination (H >=20) 7.

The Jaccard similarity index (JSI) between the primary and metastatic tumors was calculated as

$JSI=\frac{W_s}{L_p+L_m+W_s}$

Where $W_s$ is the number of shared subclonal mutations 8. The JSI index is informative about the monoclonal (JSI <= 0.3) and polyclonal metastatic seeding (JSI >0.3) 8.

## add driver information

maf_driver <- data.frame(
  Mut_ID = c("5:112170777:CAGA:-", "1:147092680:-:C"),
  is.driver = c(TRUE, TRUE)
)


cal <- calRoutines(
  maf = maf1,
  patient.id = "Met1",
  PrimaryId = "READ",
  pairByTumor = TRUE,
  use.tumorSampleLabel = TRUE,
  subtitle = "both",
  maf_drivers = maf_driver
)

wrap_plots(plotlist = cal$Met1$plist, nrow = 1)

8. References {#refer}

Liu M, Chen J, Wang X et al. MesKit: a tool kit for dissecting cancer evolution of multi-region tumor biopsies through somatic alterations, Gigascience 2021;10.
Favero F, Joshi T, Marquard AM et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data, Ann Oncol 2015;26:64-70.
Schwarz RF, Trinh A, Sipos B et al. Phylogenetic quantification of intra-tumour heterogeneity, PLoS Comput Biol 2014;10:e1003535.
Miller CA, White BS, Dees ND et al. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution, PLoS Comput Biol 2014;10:e1003665.
Roth A, Khattra J, Yap D et al. PyClone: statistical inference of clonal population structure in cancer, Nat Methods 2014;11:396-398.
Dang HX, White BS, Foltz SM et al. ClonEvol: clonal ordering and visualization in cancer sequencing, Ann Oncol 2017;28:3076-3082.
Hu Z, Ding J, Ma Z et al. Quantitative evidence for early metastatic seeding in colorectal cancer, Nat Genet 2019;51:1113-1122.
Hu Z, Li Z, Ma Z et al. Multi-cancer analysis of clonality and the timing of systemic spread in paired primary tumors and metastases, Nat Genet 2020;52:701-708.

qingjian1991/MPTevol documentation built on Jan. 30, 2023, 10:16 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

qingjian1991/MPTevol Clonal Evolutionary History and Metastatic Routines Analysis for Multiple Primary Tumors