singlemodel: Calculate potential fits for a single sample

singlemodel R Documentation

Calculate potential fits for a single sample

Description

singlemodel performs the basic ACE fitting algorithm on a single sample. The input is either a template data frame or a QDNAseq-object together with the index of the sample to analyze. It returns a list with the input parameters (ploidy, standard, method, and penalty) and the model characteristics (the cellularities at which the error reaches a minimum, the relative errors at those minima, and the errors calculated at every cellularity), along with the plot of the error list. The minima are cellularities, as can be seen in the plot.

Usage

singlemodel(template, QDNAseqobjectsample = FALSE, ploidy = 2, 
            standard, method = 'RMSE', exclude = c("X","Y"), 
            sgc = c(), penalty = 0, highlightminima = TRUE)

Arguments

template

Object. Either a data frame as created by objectsampletotemplate, or a QDNAseq-object

QDNAseqobjectsample

Integer. Specifies which sample to analyze from the QDNAseq-object. Required when using a QDNAseq-object as template. Default = FALSE

ploidy

Integer. Calculate fits assuming the median of segments has this absolute copy number. Default = 2

standard

Numeric. Force the given ploidy to represent this raw value. When omitted, the standard will be calculated from the data

method

Character string. Specifies which error method to use. Can be "RMSE", "SMRE", or "MAE". For more documentation, consult the vignette. Default = "RMSE"

exclude

Integer or character vector. Specifies which chromosomes to exclude for model fitting. Default = c("X", "Y")

sgc

Integer or character vector. Specifies which chromosomes occur with only a single copy in the germline. Default = c()

penalty

Numeric value. Penalizes fits at lower cellularities. Suggested values between 0 and 1. Default = 0 (no penalty)

highlightminima

Logical. If TRUE, minima are highlighted in red in the errorplot. Default = TRUE

Details

All ACE fitting algorithms work by calculating "expected values" of integer copies given a certain cellularity. They calculate these expected values for 1-12 copies at cellularities 0.05-1 (in increments of 0.01). This has two consequences. First, fits at cellularities below 0.05 are not calculated. These low-cellularity fits do not give very meaningful results and only obscure more plausible fits. Second, 0 copies and >12 copies are not "fitted". This prevents fits that predict many and/or large segments with 0 or >12 copies, which is biologically unlikely. More explanation is given in the vignette.
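The idea of expected values can be illustrated with a minimal sketch. It is not the ACE implementation: expected_value and fit_error are hypothetical helpers, the formula assumes that tumor cells are mixed with normal cells carrying 2 copies, the error is a plain RMSE to the nearest expected value, and no penalty is applied.

```r
# Illustrative sketch only; these helpers are hypothetical and not part of ACE.

# Expected raw segment value for `copies` tumor copies, assuming a fraction
# `cellularity` of tumor cells mixed with normal cells that carry 2 copies.
expected_value <- function(copies, cellularity, ploidy = 2, standard = 1) {
  standard * (cellularity * copies + 2 * (1 - cellularity)) /
    (cellularity * ploidy + 2 * (1 - cellularity))
}

cellularities <- (5:100) / 100   # 0.05 to 1 in steps of 0.01
copies <- 1:12                   # 0 and >12 copies are not fitted

# Rows are copy numbers, columns are cellularities.
expected <- outer(copies, cellularities, expected_value)

# RMSE between each bin and its nearest expected value (assumed error measure).
fit_error <- function(expected_column, segments) {
  nearest <- sapply(segments, function(s) min(abs(s - expected_column)))
  sqrt(mean(nearest^2))
}

# Toy data: 80 bins at the standard and 20 bins at a single-copy gain,
# generated for a sample with cellularity 0.5.
segments <- c(rep(1, 80), rep(expected_value(3, 0.5), 20))
errors <- apply(expected, 2, fit_error, segments = segments)

# Cellularities at which the toy data fit (almost) perfectly.
cellularities[errors < 1e-8]
```

In this toy case the error vanishes at cellularity 0.5, but also at lower cellularities such as 0.25, where the same two values correspond to 2 and 4 copies. This is the kind of ambiguity that the penalty argument and the lower bound on cellularity are meant to reduce.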

Value

Returns a list, containing

ploidy

Absolute copy number that corresponds with the median segment value

standard

Raw data value that corresponds to the given ploidy. Unless specified as an argument, this is the median segment value

method

Applied error method

penalty

Applied penalty factor

minima

Vector with cellularities at which the error reached a minimum

rerror

Vector with relative errors corresponding to the minima

errorlist

List of errors of all cellularities tested

errorplot

ggplot2-graph of the relative errors calculated at each cellularity
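As a brief sketch of how the returned list might be inspected, the following reuses the toy data from the Examples section. The requireNamespace guard, the set.seed call, and the variable names model and lowest are additions for illustration; the list elements are those documented above.

```r
# Sketch: inspect the list returned by singlemodel(); runs only if ACE is installed.
if (requireNamespace("ACE", quietly = TRUE)) {
  library(ACE)
  set.seed(1)  # makes the jitter reproducible (added for illustration)

  # Toy data assuming each chromosome comprises 100 bins.
  s <- jitter(c(1, 1, 0.8, 1.2, rep(1, 5), 1.4, rep(1, 13)), amount = 0)
  n <- c(100, 100, 40, 60, rep(100, 5), 100, rep(100, 13))
  df <- data.frame(chr = rep(1:22, each = 100), segments = rep(s, n))

  model <- singlemodel(df)

  model$minima   # cellularities at which the error reaches a minimum
  model$rerror   # relative errors at those minima

  # Minimum with the lowest relative error.
  lowest <- which.min(model$rerror)
  model$minima[lowest]

  # Draw the error plot.
  print(model$errorplot)
}
```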

Note

singlemodel() only needs a data frame with columns named chr and segments. Every row should contain an individual genomic feature, i.e. a bin or a probe. If you have data with each row representing a segment, and the size of the segment given in a column (e.g. NumBins or NumProbes), you can create the data frame as follows:

template <- data.frame(chr = rep(Chromosome, NumProbes), segments = rep(SegmentMean, NumProbes))

Alternatively, see segmentstotemplate.
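The rep() construction can be made concrete with a small worked example. The table segs below is made-up illustration data; only the column names Chromosome, NumProbes, and SegmentMean are taken from the note.

```r
# Hypothetical segment-level table: one row per segment.
segs <- data.frame(
  Chromosome  = c(1, 1, 2),
  NumProbes   = c(3, 2, 4),
  SegmentMean = c(1.0, 1.2, 0.8)
)

# Repeat each segment's chromosome and mean once per probe it covers,
# so that every row of the template is an individual probe.
template <- data.frame(
  chr      = rep(segs$Chromosome, segs$NumProbes),
  segments = rep(segs$SegmentMean, segs$NumProbes)
)

nrow(template)        # 9, the total number of probes
table(template$chr)   # number of probes per chromosome
```

The resulting data frame has the two columns, chr and segments, that singlemodel() needs.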

If your data contain sex chromosomes and you wish to include them in model fitting, specify exclude = c(). When analyzing data from a male individual, also specify sgc = c("X", "Y").

Author(s)

Jos B. Poell

See Also

objectsampletotemplate, squaremodel, singleplot

Examples

## toy data assuming each chromosome comprises 100 bins
s <- jitter(c(1, 1, 0.8, 1.2, rep(1, 5), 1.4, rep(1, 13)), amount = 0)
n <- c(100, 100, 40, 60, rep(100, 5), 100, rep(100, 13))
df <- data.frame(chr = rep(1:22, each = 100), segments = rep(s, n))
singlemodel(df)
singlemodel(df, ploidy = 3)
singlemodel(df, method = 'MAE', penalty = 0.5)
singlemodel(df, exclude = 1:3)

## using segmented data from a QDNAseq-object
data("copyNumbersSegmented")
singlemodel(copyNumbersSegmented, QDNAseqobjectsample = 2)

tgac-vumc/ACE documentation built on Nov. 29, 2022, 12:15 a.m.