In tgac-vumc/ACE: Absolute Copy Number Estimation from Low-coverage Whole Genome Sequencing

squaremodel

R Documentation

Calculate potential fits for a single sample using ploidy as a variable

Description

squaremodel performs a "two-dimensional" fitting algorithm on a single sample. It calculates the error of the fit at each cellularity over a range of "ploidies". Input can be either a template or a QDNAseq-object with the index of the sample specified. Returns a list with input parameters (method, penalty, and penploidy) and model characteristics (an error matrix, a logical matrix specifying minima, a data frame with all information, a data frame with only minima, and a graphical representation of the error matrix).

Usage

squaremodel(template, QDNAseqobjectsample = FALSE, prows=100, 
  ptop=5, pbottom=1, method = 'RMSE', exclude = c("X", "Y"), 
  sgc = c(), penalty = 0, penploidy = 0, cellularities = seq(5,100), 
  highlightminima = TRUE, standard)

Arguments

`template`	Object. Either a data frame as created by `objectsampletotemplate`, or a QDNAseq-object
`QDNAseqobjectsample`	Integer. Specifies which sample to analyze from the QDNAseq-object. Required when using a QDNAseq-object as template. Default = FALSE
`prows`	Integer. Sets the resolution of the ploidy-axis. Determines how many decrements are used to get from ptop to pbottom (see below). The number of rows is therefore prows + 1. Default = 100
`ptop`	Numeric. Sets the highest ploidy at which to start testing fits. Default = 5
`pbottom`	Numeric. Sets the lowest ploidy to be tested. Default = 1
`method`	Character string specifying which error method to use. For more documentation, consult the vignette. Can be "RMSE", "SMRE", or "MAE". Default = "RMSE"
`exclude`	Integer or character vector. Specifies which chromosomes to exclude for model fitting. Default = c("X", "Y")
`sgc`	Integer or character vector. Specifies which chromosomes occur with only a single copy in the germline. Default = c()
`penalty`	Numeric. Penalizes fits at lower cellularities. Suggested values between 0 and 1. Default = 0 (no penalty)
`penploidy`	Numeric. Penalizes fits that diverge from 2N with the formula (1+abs(ploidy-2))^penploidy. Default = 0
`cellularities`	Numeric vector. Specifies the cellularities (in percentage) to be tested. Default = seq(5,100)
`highlightminima`	Logical. Minima are highlighted in the matrixplot by a black dot. Default = TRUE
`standard`	Numeric. Force the ploidy to represent this raw value. When omitted, the standard will be calculated from the data
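
As a minimal sketch of how prows, ptop, pbottom, and cellularities define the tested grid: the snippet below assumes the ploidy axis is an evenly spaced sequence from ptop down to pbottom, which matches the description of prows decrements but is not taken from the package source. The variable names ploidies and step are illustrative only.

```r
# Assumption: the ploidy axis is evenly spaced, going from ptop down to
# pbottom in prows equal decrements (so prows + 1 tested ploidies).
ptop <- 5
pbottom <- 1
prows <- 100

ploidies <- seq(ptop, pbottom, length.out = prows + 1)
step <- (ptop - pbottom) / prows   # size of one decrement: 0.04

# Default cellularity axis as given in Usage (percentages 5 to 100)
cellularities <- seq(5, 100)

length(ploidies)        # 101 rows
length(cellularities)   # 96 columns
```

With the defaults, the grid therefore covers 101 ploidies by 96 cellularities. Raising prows or narrowing the ptop to pbottom range gives a finer ploidy resolution.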

Details

Unlike other functionality in ACE, squaremodel does not use the "standard"; instead, it fits all tested ploidies to "standard = 1". Segment values must therefore be normalized to 1, which is the default for data coming from QDNAseq. The penalty parameter is the same as in singlemodel. Additionally, fits at ploidies diverging from 2N can be penalized using the penploidy parameter. For other details on the fitting algorithm, see singlemodel.

The range of ploidies is set by the parameters ptop and pbottom, and the resolution is determined by prows. The resolution on the X-axis can be adapted by changing the cellularities argument. To create good contrast in the matrixplot, the color scale derives from the inverse of the error, and the opacity of the dots marking the minima is calculated as min(error)/error.
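
The two formulas mentioned on this page, (1+abs(ploidy-2))^penploidy and min(error)/error, can be evaluated by hand to get a feel for their scale. The following is only a sketch with made-up numbers: the error values are hypothetical, and how the ploidy factor is combined with the error inside squaremodel is not shown here.

```r
# Ploidy penalty factor from the argument description:
# (1 + abs(ploidy - 2))^penploidy
ploidy <- c(1, 2, 3, 4)
penploidy <- 0.5
ploidy_factor <- (1 + abs(ploidy - 2))^penploidy
# equals 1 at ploidy 2 and grows as ploidy moves away from 2N

# Opacity of the dots marking minima: min(error) / error
# (hypothetical errors at four minima)
error <- c(0.02, 0.05, 0.04, 0.10)
opacity <- min(error) / error
# the best minimum gets opacity 1; worse minima fade proportionally
```

With penploidy = 0 the factor is 1 for every ploidy, which is consistent with the default of no ploidy penalty.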

Value

Returns a list containing:

`method`	Applied error method
`penalty`	Applied penalty factor for low cellularities
`penploidy`	Applied penalty factor for diverging ploidies
`errormatrix`	Numeric matrix with errors of all combinations of ploidy and cellularity
`minimatrix`	Logical matrix indicating whether the combination of ploidy and cellularity represents a minimum
`errordf`	Data frame with columns ploidy, cellularity, error, and minimum
`minimadf`	Same as errordf, but only containing minima and sorted by error
`matrixplot`	ggplot2-graph of the relative errors calculated at each combination of ploidy and cellularity

Note

squaremodel() only needs a data frame with columns named chr and segments. Every row should contain an individual genomic feature, i.e. a bin or a probe. If each row of your data represents a segment, with the size of the segment given in a column (e.g. NumBins or NumProbes), you can create the data frame as follows (substituting your own variable names):

template <- data.frame(chr = rep(Chromosome, NumProbes), segments = rep(SegmentMean, NumProbes))

Alternatively you can look into segmentstotemplate.
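
To illustrate the data.frame construction above, here is a small self-contained sketch. The segment table segs and its values are hypothetical; only the column names Chromosome, NumProbes, and SegmentMean follow the note.

```r
# Hypothetical segment-level table: one row per segment
segs <- data.frame(
  Chromosome  = c(1, 1, 2),
  NumProbes   = c(3, 2, 4),
  SegmentMean = c(1.0, 0.8, 1.2)
)

# Expand to one row per probe: each segment value is repeated
# as many times as the segment has probes
template <- data.frame(
  chr      = rep(segs$Chromosome, segs$NumProbes),
  segments = rep(segs$SegmentMean, segs$NumProbes)
)

nrow(template)                    # 9 rows, one per probe
rle(template$segments)$lengths    # 3 2 4, the original segment sizes
```

The run lengths of the segments column recover the original segment sizes, which is a quick way to check that the expansion went as intended.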

If your data contain sex chromosomes and you wish to include them in model fitting, specify exclude = c(). When analyzing data from a male individual, also specify sgc = c("X", "Y").

Author(s)

Jos B. Poell

Examples

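library(ACE)

## toy data assuming each chromosome comprises 100 bins
## s holds the segment values, n the number of bins in each segment
s <- jitter(c(1, 1, 0.8, 1.2, rep(1, 5), 1.4, rep(1, 13)), amount = 0)
n <- c(100, 100, 40, 60, rep(100, 5), 100, rep(100, 13))
df <- data.frame(chr = rep(1:22, each = 100), segments = rep(s, n))
## default fit: RMSE without penalties
squaremodel(df)$matrixplot
## MAE with penalties for low cellularities and for ploidies diverging from 2N
sm <- squaremodel(df, method = 'MAE', penalty = 0.5, penploidy = 0.5)
sm$matrixplot
## best minima first: lowest error, ties broken by higher cellularity
mdf <- sm$minimadf
head(mdf[order(mdf$error,-mdf$cellularity),])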

## using segmented data from a QDNAseq-object
data("copyNumbersSegmented")
sqm <- squaremodel(copyNumbersSegmented, QDNAseqobjectsample = 2, 
  penalty = 0.5, penploidy = 0.5, 
  ptop = 4.3, pbottom = 1.8, prows = 250)
sqm$matrixplot
mdf <- sqm$minimadf
head(mdf[order(mdf$error,-mdf$cellularity),])
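
The ordering idiom used on minimadf in the examples can be tried without running a fit. The sketch below uses a hand-made data frame with the documented columns (ploidy, cellularity, error, minimum); all values, including the scale of cellularity, are made up for illustration.

```r
# Hypothetical stand-in for a minimadf result
mdf <- data.frame(
  ploidy      = c(2.0, 3.0, 4.0, 2.5),
  cellularity = c(0.40, 0.60, 0.80, 0.50),
  error       = c(0.03, 0.01, 0.03, 0.02),
  minimum     = TRUE
)

# Sort by ascending error; among equal errors, higher cellularity comes first
ranked <- mdf[order(mdf$error, -mdf$cellularity), ]
ranked$ploidy   # 3.0 2.5 4.0 2.0
```

The two rows with error 0.03 are tied, so the one with the higher cellularity is listed first.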

tgac-vumc/ACE documentation built on Nov. 29, 2022, 12:15 a.m.