saturationPlot: Plot your panel along with incremental genomic space occupied...
In PrecisionTrialDrawer: A Tool to Analyze and Design NGS Based Custom Gene Panels

Description Usage Arguments Details Value Author(s) See Also Examples

This plot method returns a scatter plot with genomic space on X axis and average/absolute number of alterations on Y axis. The way the plot is built is incremental. We add one feature at a time starting from the most altered and we see how many samples we include at each step and how much space is occupied.

saturationPlot(object
    , alterationType = c( "copynumber", "expression", "mutations", "fusions")
    , grouping = c(NA, "drug", "group", "alteration_id", "tumor_type")
    , adding = c( "alteration_id", "gene_symbol", "drug", "group")
    , tumor_type = NULL
    , y_measure = c( "mean", "absolute")
    , adding.order=c( "absolute", "rate")
    , sum.all.feature=FALSE
    , collapseMutationByGene=TRUE
    , collapseByGene=FALSE
    , labelling=TRUE
    , tumor.weights=NULL
    , main=""
    , legend=c("in" , "out")
    , noPlot = FALSE)

`object`	a CancerPanel object
`alterationType`	what kind of alteration to include. It can be one or more between "copynumber", "expression", "mutations", "fusions". Default is to include all kind of alterations.
`grouping`	One of the following: "drug", "group", "alteration_id", "tumor_type". This parameter draws a curve for every level of the chosen grouping. if set to NA, the panel is not split and the plot is a single curve.
`adding`	One of the following: "alteration_id", "gene_symbol", "drug", "group". This parameter will set which variable is added at every point of the plot. see details
`tumor_type`	only plot one or more tumor types among the ones available in the object.
`y_measure`	if 'mean', the measure on Y axis is the mean number of alterations per sample. Confidence interval of the measure is also reported. If 'absolute', the relative frequency of samples covered by at least one alteration is reported, similarly to `coveragePlot.`
`adding.order`	This parameter modifies the order of entrance of the adding variable. If 'absolute', the adding variable starts from the most altered up to the less frequently altered. If 'rate', the order of entrance, from left to right is based on the number of alterations divided by the length in kb.
`sum.all.feature`	logical. if TRUE every gene length of the panel is summed up by the adding variable. The effect is that if a gene is considered both for SNV and CNA, it is counted twice.
`collapseMutationByGene`	A logical that collapse all mutations on the same gene for a single patient as a single alteration.
`collapseByGene`	A logical that collapse all alterations on the same gene for a single patient as a single alteration. e.g. if a sample has TP53 both mutated and deleted as copynumber, it will count for one alteration only.
`labelling`	if FALSE, the dots are not labelled. It is useful for very large comparative plots. Default TRUE.
`tumor.weights`	A named vector of integer values containing an amount of samples to be randomly sampled from the data. Each element should correspond to a different tumor type and is named after its tumor code. See details
`main`	Set a name for the plot
`legend`	if 'in' the legend is plotted in the top left corner, if 'out', outside of the plotting area
`noPlot`	if TRUE, the plot is not shown and data to create it are reported instead.

This plot is particularly useful to evaluate the panel piece by piece. At the last point, we can observe the maximum coverage or maximum mean value of the panel, as reported in the coveragePlot. If we go back one point at a time, we can appreciate how many samples we gained by adding a new drug or a new gene to the panel. It is often the case that our panel is redundant for certain drugs or genes and there is no point in wasting sequencing space for a gene that is poorly altered and doesn't allow further improvement to our clinical trial.

By default, saturationPlot will use all the available data from the object, using all the samples for the requested alterationTypes. Nevertheless, one could be interested in creating a compound design that is composed by a certain number of samples per tumor type. This is the typical situation of basket trials, where you seek for specific alteration, rather than specific tumor types and your design can be stopped when the desired sample size for a given tumor type is reached. By adding tumor.weights, we can achieve such target (see examples). Unfortunately, there are two main drawbacks in doing so:

small sample size: by selecting small random samples, the real frequency can be distorted. to avoid this, it is better to run several small samples and then bootstrap them
recycling: if the sample size for a tumor type requested by the user is above the available number of cBioportal samples, the samples are recycled. This has the effect of stabilizing the frequencies but y_measure = "absolute" will have no real meaning when the heterogeneity of the samples is lost.

If noPlot is TRUE, the method returns a data.frame with 8 or 9 columns, depending on how the adding.order parameter was set:

gene_symbol , drug , alteration_id or group: the adding variable chosen by the user
grouping: the grouping variable chosen by the user
Mean: the value plotted on Y axis if 'mean' is chosen as y_measure parameter
Coverage: the value plotted on Y axis if 'absolute' is chosen as y_measure parameter
SD: standard deviation of Mean
SE: standard error of Mean
CI: confidence interval of Mean
Space: genomic space in kBases ordered by grouping variable
num_of_variants_per_KB: if adding.order='rate', this additional column is added. It represents the number of alteration divided by the feature length

An incremental scatter plot if noPlot is FALSE, a data.frame otherwise.

Giorgio Melloni, Alessandro Guida

coveragePlot

# Load example CancerPanel object
data(cpObj)
# Plot the saturation of this panel by tumor type adding one drug at a time
# Using mutations and copynumber data
saturationPlot(cpObj 
    , alterationType=c( "mutations" , "copynumber")
    , adding="drug"
    , grouping="tumor_type"
    , y_measure="absolute")
# Plot with no grouping giving more weight to lung cancer samples
# Note that we ask for more samples than the availables in luad dataset
# the code will recycle the samples to account for this forced disequilibrium
saturationPlot(cpObj 
    , alterationType=c( "mutations" , "copynumber") 
    , adding="gene_symbol"
    , y_measure="mean"
    , tumor.weights=c(brca=500 , luad=2000))