gene.hypotheses = c('KRAS', 'NRAS', 'IDH1', 'IDH2', 'TET2', 'SF3B1', 'ASXL1')
alterations = events.selection(as.alterations(aCML), filter.freq = .05)
aCML.clean = events.selection(aCML,, gene.hypotheses))
aCML.clean = annotate.description(aCML.clean, 
    'CAPRI - Bionformatics aCML data (selected events)')
aCML.hypo = hypothesis.add(aCML.clean, 'NRAS xor KRAS', XOR('NRAS', 'KRAS'))
aCML.hypo = hypothesis.add(aCML.hypo, 'SF3B1 xor ASXL1', XOR('SF3B1', XOR('ASXL1')),
    '*'), genes = 'TET2') 
aCML.hypo = hypothesis.add(aCML.hypo,
    'TET2 xor IDH2',
    XOR('TET2', 'IDH2'),
aCML.hypo = hypothesis.add(aCML.hypo,
    'TET2 or IDH2',
    OR('TET2', 'IDH2'),
aCML.hypo = hypothesis.add.homologous(aCML.hypo)
aCML.hypo = annotate.description(aCML.hypo, '')
aCML.clean = annotate.description(aCML.clean, '')
model.capri = tronco.capri(aCML.hypo, boot.seed = 12345, nboot = 5)
model.capri = annotate.description(model.capri, 'CAPRI - aCML')
model.caprese = tronco.caprese(aCML.clean)
model.caprese = annotate.description(model.caprese, 'CAPRESE - aCML')
model.edmonds = tronco.edmonds(aCML.clean, nboot = 5, boot.seed = 12345)
model.edmonds = annotate.description(model.edmonds, 'MST Edmonds - aCML')
model.gabow = tronco.gabow(aCML.clean, nboot = 5, boot.seed = 12345)
model.gabow = annotate.description(model.gabow, 'MST Gabow - aCML')
model.chowliu = tronco.chowliu(aCML.clean, nboot = 5, boot.seed = 12345)
model.chowliu = annotate.description(model.chowliu, 'MST Chow Liu - aCML')
model.prim = tronco.prim(aCML.clean, nboot = 5, boot.seed = 12345)
model.prim = annotate.description(model.prim, 'MST Prim - aCML data')

TRONCO provides functions to plot a model, access information about the probabilities used to extract it from data, and two types of confidence measures: those used to infer the model, and those computed a posteriori from it.

Function view provides updated information about a model if this is available.


Visualizing a reconstructed model

We can plot a model by using function \Rfunction{tronco.plot}. Here, we plot the aCML model inferred by CAPRI with BIC and AIC as a regolarizator. We set some parameters to get a nice plot (scaling etc.), and distinguish the edges detected by the two regularization techniques. The confidence of each edge is shown in terms of temporal priority and probability raising (selective advantage scores) and hypergeometric testing (statistical relevance of the dataset of input). Events are annotated as in the oncoprint, edge p-values above a minium threshold (default 0.05) are red.

    fontsize = 12, 
    scale.nodes = 0.6, 
    confidence = c('tp', 'pr', 'hg'), 
    height.logic = 0.25, 
    legend.cex = 0.35, 
    pathways = list(priors = gene.hypotheses), 
    label.edge.size = 10)

We can also make a multiplot with this function, which in this case we do by showing the models inferred by the other algorithms based on Minimum Spanning Trees.

par(mfrow = c(2,2))
tronco.plot(model.caprese, fontsize = 22, scale.nodes = 0.6, legend = FALSE)
tronco.plot(model.edmonds, fontsize = 22, scale.nodes = 0.6, legend = FALSE)
tronco.plot(model.chowliu, fontsize = 22, scale.nodes = 0.6, legend.cex = .7)
tronco.plot(model.prim, fontsize = 22, scale.nodes = 0.6, legend = FALSE)

Accessing information within a model (e.g., confidence)

We can visualize a summary of the parameters used for the reconstruction, test if an object has a model or delete it (which shall be done to retrieve the original dataset).
dataset = delete.model(model.capri)

Model structure

A set of functions can be used to visualize the content of object which contains the reconstructed model. For instance, we can access the adjacency matrix of a model by using as.adj.matrix which will return a matrix for each one of the regularizators used -- in this case because CAPRI was run with both BIC/AIC.


Empirical probabilities

Every model is inferred by estimating the empirical marginal, joint and conditional probabilities for all the events, from input data. These in some cases are estimated by a bootstrap procedure (see the algorithms implemented). TRONCO has functions to extract such table, that could be in turn printed by using external functions for, e.g., heatmap visualization (see below for an example via the pheatmap package). We show these functions working with the CAPRI model; in this case the tables are the same for both BIC/AIC structures as they are computed before performing penalized likelihood-fit. The marginal P(x) for x an event in the dataset are obtained by as.marginal.probs.

marginal.prob = as.marginal.probs(model.capri)

Similarly, the joint P(x,y) for every pair of events in the dataset is given by as.joint.probs.

joint.prob = as.joint.probs(model.capri, models='capri_bic')
joint.prob$capri_bic[1:3, 1:3]

And as.conditional.probs finally gives the conditional P(x|y) for every edge in the dataset.

conditional.prob = as.conditional.probs(model.capri, models='capri_bic')

Confidence measures

Confidence scores can be accessed by function \Rfunction{as.confidence}, which takes as parameter the type of confidence measure that one wants to access to. This will work for either confidence measures assessed before reconstructing the model -- if available --, or afterwards.

str(as.confidence(model.capri, conf = c('tp', 'pr', 'hg')))

Other functions visualize tables summarizing the statistics for each edge in the model, For instance, if one uses function as.selective.advantage.relations the p-values for temporal priority, probability raising and hypergeometric testing, as well as other information about each edge can be accessed, e.g., the number of observations for the upstream and the downstream events.


Confidence via non-parametric and statistical bootstrap

TRONCO provides three different strategies to perform bootstrap and assess confidence of each edge in terms of a score in the range [0, 100], where 100 is the highest confidence). Non-parametric (default) and statistical bootstrap strategies are available, and can be executed by calling function tronco.bootstrap with type parameter set appropriately. This function is parallel, and parameter cores.ratio (default 1) can be used to percentage of available cores that shall be used to compute the scores. Parameter nboot controls the number of bootstrap iterations.

model.boot = tronco.bootstrap(model.capri, nboot = 3, cores.ratio = 0)
model.boot = tronco.bootstrap(model.boot, nboot = 3, cores.ratio = 0, type = 'statistical')

Bootstrap scores can be annotated to the tronco.plot output by setting them via the confidence parameter confidence=c('npb', 'sb'). In this case edge thickness will be proportional to the non-parametric npb) scores -- the last to appear in the confidence parameter.

    fontsize = 12, 
    scale.nodes = .6,   
    confidence=c('sb', 'npb'), 
    height.logic = 0.25, 
    legend.cex = .35, 
    pathways = list(priors= gene.hypotheses), 

Bootstrap scores can extracted or visualized even with other TRONCO functions. For instance, we can accessall scores via as.bootstrap.scores, which resembles function as.selective.advantage.relations and will display the scores per edge. Notice that even function view gives an update output by mentioning the available bootstrap scores.


If we want to access a matrix with the scores and visualize that in a heatmap we can use for instance the pheatmap function of TRONCO. In this case we need to use also function keysToNames to translate internal TRONCO keys to mnemonic names in the plot

pheatmap(keysToNames(model.boot, as.confidence(model.boot, conf = 'sb')$sb$capri_aic) * 100, 
           main =  'Statistical bootstrap scores for AIC model',
           fontsize_row = 6,
           fontsize_col = 6,
           display_numbers = TRUE,
           number_format = "%d"

Confidence via cross-validation (entropy loss, prediction and posterior classification errors)

TRONCO implements k-fold cross-validation routines (from the bnlearn package) to provide estimates of the following statistics:

By default, a 10 repetitions from 10-fold cross-validation experiment are perfomed, for all the models which are found inside a TRONCO object -- in this case 2, one for CAPRI with BIC and one for CAPRI with AIC.

model.boot = tronco.kfold.eloss(model.boot)
model.boot = tronco.kfold.prederr(model.boot, runs = 2, cores.ratio = 0)
model.boot = tronco.kfold.posterr(model.boot, runs = 2, cores.ratio = 0)

These results can be visualized in terms of summary tables, as for the other confidence scores.


Notice that these can be combined to create a nice table with all these statistics -- we make here the example of a table with all the BIC statistics. This format can be readily exported to external spreadsheets for further visualization.

tabular = function(obj, M){
    tab = Reduce(
        function(...) merge(..., all = TRUE), 
            list(as.selective.advantage.relations(obj, models = M),
                as.bootstrap.scores(obj, models = M),
                as.kfold.prederr(obj, models = M),
                as.kfold.posterr(obj,models = M)))

    # merge reverses first with second column
    tab = tab[, c(2,1,3:ncol(tab))]
    tab = tab[order(tab[, paste(M, '.NONPAR.BOOT', sep='')], na.last = TRUE, decreasing = TRUE), ]

head(tabular(model.boot, 'capri_bic'))

We finally show the plot of the model with the confidences by cross-validation.

    fontsize = 12, 
    scale.nodes = .6, 
    confidence=c('npb', 'eloss', 'prederr', 'posterr'), 
    height.logic = 0.25, 
    legend.cex = .35, 
    pathways = list(priors= gene.hypotheses), 

Try the TRONCO package in your browser

Any scripts or data that you put into this service are public.

TRONCO documentation built on Nov. 8, 2020, 5:51 p.m.