In pneuvial/c3co: Cancer Cell Clonality using Copy-Number

Introduction

The c3co package implements a constrained dictionary learning problem to recover cancer subclones from several DNA copy number profiles described in @pierre-jean:c3co. The c3co model may be seen as an extension of the Fused Lasso Latent Feature Model (FLLat) method of @nowak2011fused, and of the Enhanced FLLat (E-FLLat) method of @masecchia2013dictionary, with the following original features:

interpretable weights: we model each profile as a convex combination of latent profiles, making the corresponding weights directly interpretable as proportions of latent features;
parent-specific copy numbers: we leverage the allelic signals available from SNP array or sequencing data in order to explicitly integrate parent-specific copy numbers (@olshen11parent-specific) in the model.
segment-level: we model tumor clonality at the level of copy number segments (not individual loci), which is the level of information at which such events occur.

Notes:

Although the c3co model is designed to deal with allelic signals, we emphasize that it is also applicable to data where only total copy number estimates are available, such as array-CGH data or low-pass sequencing data.
The FLLat method is implemented in the R package FLLat. The E-FLLat method is implemented in Python package PyCGH.

Model

The figure below illustrates the model used in the c3co package. Two heterogeneous tumor samples (green and yellow circles) are composed of a collection of normal cells (gray discs) and two cancer subclones (red triangles and blue squares). One of the cancer subclones is present in both tumor samples.

The corresponding (noiseless) copy number profiles are displayed in the figure below. They are given by a linear combination of the latent profiles. This Figure is adapted from @nowak2011fused.

Setup

library("c3co")
library("ggplot2")
library("mclust")
library("reshape")

Creating a synthetic data set

len <- 1000L   ## Number of loci
K <- 5L        ## Number of subclones
J <- 11L       ## Number of segments
n <- 20L       ## Number of samples
eps <- 1.0

Noiseless profiles

dat <- getToyData(n, len = len, nbClones = K, nbSegs = J, eps = 0.0)
matplot(t(dat$locus$Y), t = "s")
matplot(t(dat$segment$Y), t = "s")

Noisy profiles

dat <- getToyData(n, len = len, nbClones = K, nbSegs = J, eps = 0.2)  ## noisy
matplot(t(dat$locus$Y), t = "l")
matplot(t(dat$segment$Y), t = "s")
W <- dat$W

Fitting the c3co model

l1 <- seq(from = 1e-6, to = 1e-4, length.out = 10L)
candP <- 2:10
parameters.grid <- list(lambda = l1, nb.arch = candP)

Y <- dat$segment$Y
fit <- fitC3co(Y, parameters.grid = parameters.grid, verbose = TRUE)

For each candidate number of subclones $p$ in r candP, c3co() only retains the combination of the penalization coefficients $(\lambda_1, \lambda_2)$ which minimizes the Bayesian Information Criterion (BIC) of the model. The next step is to choose the best $p$ (number of subclones). Following @nowak2011fused, we compare the models for different values of $p$ through their percentage of variance explained (PVE).

pvePlot2(fit$config$best, ylim = c(0.8, 1))

Here we arbitrarily choose five subclones (because we know it's the true number of subclones).

We can compare the true and the estimated weight matrices:

best <- 4

res.clustTRUE <- hclust(dist(cbind(W, 1-rowSums(W))), method = "ward.D")
col <-  grDevices::colorRampPalette(RColorBrewer::brewer.pal(9, 'GnBu'))(100)

cfit <- new("c3coFit")  ## terrible hack to bypass lacking plot methods
cfit@fit <- fit$fit
Wplot(cfit, idxBest = best, cexCol = 0.4)

heatmap.3(W, dendrogram = "row", main = "TRUE",
          Rowv = as.dendrogram(res.clustTRUE), col = col, scale = "none")

Further analysis

Performance of clustering

In this section, we show the ability of the method to recover the simulated clustering of patients.

fitW <- cfit@fit[[best]]@W
adjustedRandIndex(cutree(hclust(dist(W), method = "ward.D2"), k = 4),
                  cutree(hclust(dist(fitW), method = "ward.D2"), k = 4))

Session information

sessionInfo()

References

pneuvial/c3co documentation built on May 25, 2019, 10:21 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

pneuvial/c3co
Cancer Cell Clonality using Copy-Number

In pneuvial/c3co: Cancer Cell Clonality using Copy-Number

Introduction

Model

Setup

Creating a synthetic data set

Noiseless profiles

Noisy profiles

Fitting the c3co model

Further analysis

Performance of clustering

Session information

References

R Package Documentation

Browse R Packages

We want your feedback!

pneuvial/c3co Cancer Cell Clonality using Copy-Number

In pneuvial/c3co: Cancer Cell Clonality using Copy-Number

Introduction

Model

Setup

Creating a synthetic data set

Noiseless profiles

Noisy profiles

Fitting the c3co model

Further analysis

Performance of clustering

Session information

References

R Package Documentation

Browse R Packages

We want your feedback!

pneuvial/c3co
Cancer Cell Clonality using Copy-Number