squaremodel: Calculate potential fits for a single sample using ploidy as...

View source: R/ACE.R

squaremodelR Documentation

Calculate potential fits for a single sample using ploidy as a variable

Description

squaremodel performs a "two-dimensional" fitting algorithm on a single sample. It calculates the error of the fit at each cellularity over a range of "ploidies". Input can be either a template or a QDNAseq-object with the index of the sample specified. Returns a list with input parameters (method, penalty, and penploidy) and model characteristics (an error matrix, a logical matrix specifying minima, a data frame with all information, a data frame with only minima, and a graphical representation of the error matrix).

Usage

squaremodel(template, QDNAseqobjectsample = FALSE, prows=100, 
  ptop=5, pbottom=1, method = 'RMSE', exclude = c("X", "Y"), 
  sgc = c(), penalty = 0, penploidy = 0, cellularities = seq(5,100), 
  highlightminima = TRUE, standard)

Arguments

template

Object. Either a data frame as created by objectsampletotemplate, or a QDNAseq-object

QDNAseqobjectsample

Integer. Specifies which sample to analyze from the QDNAseqobject. Required when using a QDNAseq-object as template. Default = FALSE

prows

Integer. Sets the resolution of the ploidy-axis. Determines how many decrements are used to get from ptop to pbottom (see below). Therefore, the actual number of rows is actually prows + 1. Default = 100

ptop

Numeric. Sets the highest ploidy at which to start testing fits. Default = 5

pbottom

Numeric. Sets the lowest ploidy to be tested. Default = 1

method

Character string specifying which error method to use. For more documentation, consult the vignette. Can be "RMSE", "SMRE", or "MAE". Default = "RMSE"

exclude

Integer or character vector. Specifies which chromosomes to exclude for model fitting. Default = c("X", "Y")

sgc

Integer or character vector. Specifies which chromosomes occur with only a single copy in the germline

penalty

Numeric. Penalizes fits at lower cellularities. Suggested values between 0 and 1. Default = 0 (no penalty)

penploidy

Numeric. Penalizes fits that diverge from 2N with the formula (1+abs(ploidy-2))^penploidy. Default = 0

cellularities

Numeric vector. Specifies the cellularities (in percentage) to be tested

highlightminima

Logical. Minima are highlighted in the matrixplot by a black dot. Default = TRUE

standard

Numeric. Force the ploidy to represent this raw value. When omitted, the standard will be calculated from the data

Details

Unlike other functionality of ACE, squaremodel does not use the "standard", but it fits all tested ploidies to "standard = 1". It is therefore necessary that segment values are normalized to 1 (which they are by default coming from QDNAseq). The penalty parameter is the same as in singlemodel. Additionally, it is possible to penalize fits at ploidies diverging from 2N using the penploidy parameter. For other details on the fitting algorithm, see singlemodel. Range of ploidies is set by parameters ptop and pbottom, and resolution is determined by prows. Resolution on the X-axis can be adapted by changing the cellularities option. To create good contrast in the matrixplot, the color scale derives from the inverse of the error, and the opacity of the dots marking the minima is calculated as min(error)/error.

Value

Returns a list, containing

method

Applied error method

penalty

Applied penalty factor for low cellularities

penploidy

Applied penalty factor for diverging ploidies

errormatrix

Numeric matrix with errors of all combinations of ploidy and cellularity

minimatrix

Logical matrix indicating whether the combination of ploidy and cellularity represents a minimum

errordf

Data frame with columns ploidy, cellularity, error, and minimum

minimadf

Same as errordf, but only containing minima and sorted by error

matrixplot

ggplot2-graph of the relative errors calculated at each combination of ploidy and cellularity

Note

squaremodel() only needs a data frame with columns named chr and segments. Every row should contain an individual genomic feature, i.e. a bin or a probe. If you have data with each row representing a segment, and the size of the segment given in a column (e.g. NumBins or NumProbes), you can create the data frame as follows (giving the correct variable names of course):

template <- data.frame(chr = rep(Chromosome, NumProbes), segments = rep(SegmentMean, NumProbes))

Alternatively you can look into segmentstotemplate.

If your data contains sex chromosomes and you wish to include these for model fitting, then make sure to specify exclude = c(), and sgc = c("X", "Y") when analyzing data from a male individual.

Author(s)

Jos B. Poell

See Also

objectsampletotemplate, squaremodel, singleplot

Examples

## toy data assuming each chromosome comprises 100 bins
s <- jitter(c(1, 1, 0.8, 1.2, rep(1, 5), 1.4, rep(1, 13)), amount = 0)
n <- c(100, 100, 40, 60, rep(100, 5), 100, rep(100, 13))
df <- data.frame(chr = rep(1:22, each = 100), segments = rep(s, n))
squaremodel(df)$matrixplot
sm <- squaremodel(df, method = 'MAE', penalty = 0.5, penploidy = 0.5)
sm$matrixplot
mdf <- sm$minimadf
head(mdf[order(mdf$error,-mdf$cellularity),])

## using segmented data from a QDNAseq-object
data("copyNumbersSegmented")
sqm <- squaremodel(copyNumbersSegmented, QDNAseqobjectsample = 2, 
  penalty = 0.5, penploidy = 0.5, 
  ptop = 4.3, pbottom = 1.8, prows = 250)
sqm$matrixplot
mdf <- sqm$minimadf
head(mdf[order(mdf$error,-mdf$cellularity),])

tgac-vumc/ACE documentation built on Nov. 29, 2022, 12:15 a.m.