singlemodel: Calculate potential fits for a single sample
In tgac-vumc/ACE: Absolute Copy Number Estimation from Low-coverage Whole Genome Sequencing

singlemodel

R Documentation

Calculate potential fits for a single sample

Description

singlemodel runs the basic ACE fitting algorithm on a single sample. The input is either a template data frame or a QDNAseq-object together with the index of the sample to analyze. It returns a list with the input parameters (ploidy, standard, method, and penalty) and the model characteristics (the calculated minima, the relative error at each minimum, and the errors calculated at every cellularity), as well as the plot of the error list. The minima are cellularities, as can be seen in the plot.

Usage

singlemodel(template, QDNAseqobjectsample = FALSE, ploidy = 2, 
            standard, method = 'RMSE', exclude = c("X","Y"), 
            sgc = c(), penalty = 0, highlightminima = TRUE)

Arguments

`template`	Object. Either a data frame as created by `objectsampletotemplate`, or a QDNAseq-object
`QDNAseqobjectsample`	Integer. Specifies which sample of the QDNAseq-object to analyze. Required when using a QDNAseq-object as template. Default = FALSE
`ploidy`	Integer. Calculate fits assuming the median of segments has this absolute copy number. Default = 2
`standard`	Numeric. Force the given ploidy to represent this raw value. When omitted, the standard will be calculated from the data
`method`	Character string specifying which error method to use. Can be "RMSE", "SMRE", or "MAE". For more documentation, consult the vignette. Default = "RMSE"
`exclude`	Integer or character vector. Specifies which chromosomes to exclude for model fitting. Default = c("X", "Y")
`sgc`	Integer or character vector. Specifies which chromosomes occur with only a single copy in the germline
`penalty`	Numeric value. Penalizes fits at lower cellularities. Suggested values between 0 and 1. Default = 0 (no penalty)
`highlightminima`	Logical. Minima are highlighted in the errorplot by a red color. Default = TRUE

Details

All ACE fitting algorithms work by calculating the "expected values" of integer copy numbers at a given cellularity. They calculate these expected values for 1-12 copies at cellularities from 0.05 to 1, in increments of 0.01. This has two consequences. First, fits at cellularities below 0.05 are not calculated. Such low-cellularity fits do not give very meaningful results and only obscure more plausible fits. Second, 0 copies and >12 copies are not "fitted". This prevents fits that predict many and/or large segments with 0 or >12 copies, which is biologically unlikely. More explanation is given in the vignette.
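
To make the idea of "expected values" concrete, the sketch below computes them for 1-12 copies over the cellularity grid described above. It is an illustration only: the helper name expected_values is hypothetical and not part of ACE, and the formula assumes that the admixed normal cells contribute two copies of every segment. Consult the vignette for the expression ACE actually uses.

```r
# Hypothetical helper, NOT part of the ACE API.
# Assumption: normal (non-tumor) cells carry 2 copies of every segment.
expected_values <- function(cellularity, ploidy = 2, standard = 1,
                            copies = 1:12) {
  normal_fraction <- 1 - cellularity
  # Signal of a segment with `copies` copies in the tumor cells, scaled so
  # that a segment with `ploidy` copies lands exactly on `standard`.
  standard * (copies * cellularity + 2 * normal_fraction) /
    (ploidy * cellularity + 2 * normal_fraction)
}

# Cellularity grid: 0.05 to 1 in steps of 0.01 (96 values)
cellularities <- (5:100) / 100

# One column per cellularity, one row per copy number
expected <- sapply(cellularities, expected_values)
dim(expected)                 # 12 96

# At 50% cellularity, 1, 2 and 3 copies are expected at these values
expected_values(0.5)[1:3]     # 0.75 1.00 1.25
```

Under this assumption, a segment whose copy number equals the ploidy sits at the standard for every cellularity, while the spacing between neighboring copy numbers shrinks as cellularity decreases.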

Value

Returns a list, containing

`ploidy`	Absolute copy number that corresponds with the median segment value
`standard`	Raw data value that corresponds to the given ploidy. Unless specified as an argument, this is the median segment value
`method`	Applied error method
`penalty`	Applied penalty factor
`minima`	Vector with cellularities at which the error reached a minimum
`rerror`	Vector with relative errors corresponding to the minima
`errorlist`	List of errors of all cellularities tested
`errorplot`	ggplot2-graph of the relative errors calculated at each cellularity

Note

singlemodel() only needs a data frame with columns named chr and segments. Every row should contain an individual genomic feature, i.e. a bin or a probe. If you have data with each row representing a segment, and the size of the segment given in a column (e.g. NumBins or NumProbes), you can create the data frame as follows:

template <- data.frame(chr = rep(Chromosome, NumProbes), segments = rep(SegmentMean, NumProbes))

Alternatively you can look into segmentstotemplate.
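
As a small worked example of the rep() construction above, the following sketch expands a segment-level table into one row per probe. The data frame seg and its values are made up for illustration; only the column names follow the text.

```r
# Hypothetical segment-level table: one row per segment
seg <- data.frame(
  Chromosome  = c(1, 1, 2),
  NumProbes   = c(3, 2, 4),
  SegmentMean = c(1.0, 0.8, 1.2)
)

# rep() with a vector of counts repeats each element that many times,
# so every probe gets the chromosome and mean of its segment
template <- data.frame(
  chr      = rep(seg$Chromosome, seg$NumProbes),
  segments = rep(seg$SegmentMean, seg$NumProbes)
)

nrow(template)                   # 9, the sum of NumProbes
rle(template$segments)$lengths   # 3 2 4, the original segment sizes
```

The resulting template has the two columns singlemodel() needs, chr and segments.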

If your data contain sex chromosomes and you wish to include them in model fitting, specify exclude = c(). When analyzing data from a male individual, also specify sgc = c("X", "Y").

Author(s)

Jos B. Poell

Examples

## toy data assuming each chromosome comprises 100 bins
s <- jitter(c(1, 1, 0.8, 1.2, rep(1, 5), 1.4, rep(1, 13)), amount = 0)
n <- c(100, 100, 40, 60, rep(100, 5), 100, rep(100, 13))
df <- data.frame(chr = rep(1:22, each = 100), segments = rep(s, n))
singlemodel(df)
singlemodel(df, ploidy = 3)
singlemodel(df, method = 'MAE', penalty = 0.5)
singlemodel(df, exclude = 1:3)

## using segmented data from a QDNAseq-object
data("copyNumbersSegmented")
singlemodel(copyNumbersSegmented, QDNAseqobjectsample = 2)

tgac-vumc/ACE documentation built on Nov. 29, 2022, 12:15 a.m.