CodomMarker: Function to fit a multiple mixture model to a vector of...
In fitPoly: Genotype Calling for Bi-Allelic Marker Assays

CodomMarker

R Documentation

Function to fit a multiple mixture model to a vector of signal ratios of a single bi-allelic marker

Description

This function fits a specified mixture model to a vector of signal ratios of multiple samples for a single bi-allelic marker. It returns a list with the results of the fitted mixture model.

Usage

CodomMarker(y, ng, pop.parents=matrix(c(NA,NA), nrow=1),
pop=rep(1, length(y)), mutype=0, sdtype="sd.const", ptype=NA,
clus=TRUE, mu.start=NA, sd=rep(0.075, ng), p=NA,
maxiter=500, maxn.bin=200, nbin=200, plothist=TRUE, nbreaks=40,
maintitle=NULL, closeScreen=TRUE, fPinfo=NA)

Arguments

`y`	the vector of signal ratios (each value is from one sample, vector y contains the values for one marker). All values must be between 0 and 1 (inclusive), NAs are not allowed. The minimum length of y is 10*ng.
`ng`	the number of possible genotypes (mixture components) to be fitted: one more than the ploidy of the samples.
`pop.parents`	a matrix with 2 columns and 1 row per population; the cells contain the row numbers of the parental populations in case of an F1 and NA otherwise. The rows must be sorted such that all F1s occur above their parental populations. By default 1 row with elements NA, i.e. all samples belong to a single non-F1 population. If parameter pop is a factor or character vector, its levels or elements must correspond to the rownames of pop.parents.
`pop`	an integer vector specifying the population to which each sample in y belongs. All values must index rows of pop.parents. By default a vector of 1's, i.e. all samples belong to a single non-F1 population. Alternatively, pop can be a factor or character vector whose levels or elements match the rownames of pop.parents.
`mutype`	an integer in 0:6; default 0. Describes how to fit the means of the components of the mixture model: with mutype=0 the means are not constrained, requiring ng degrees of freedom. With mutype in 1:6 the means are constrained based on the ng possible allele ratios according to one of 6 models; see Details.
`sdtype`	one of "sd.const", "sd.free", "sd.fixed"; default "sd.const". Describes how to fit the standard deviations of the components of the mixture model: with "sd.const" all standard deviations (on the transformed scale) are equal (requiring 1 degree of freedom); with "sd.free" all standard deviations are fitted separately (ng d.f.); with "sd.fixed" all sd's ON THE TRANSFORMED SCALE are equal to parameter sd (0 d.f.).
`ptype`	a character vector of length nrow(pop.parents) containing for each population one of "p.free", "p.fixed", "p.HW" or "p.F1". The default NA is interpreted as "p.F1" for F1 populations and "p.free" for all other populations; this is not necessarily the best choice for GWAS panels where "p.HW" may be more appropriate. Describes per population how to fit the mixing proportions of the components of the mixture model: with "p.free", the proportions are not constrained (and require ng-1 degrees of freedom per population); with "p.fixed" the proportions given in parameter p are fixed; with "p.HW" the proportions are calculated per population from an estimated allele frequency, requiring only 1 degree of freedom per population; with "p.F1" polysomic (auto-polyploid) F1 segregation ratios are calculated based on the fitted dosages of the F1 parents and require no extra d.f.
`clus`	boolean. If TRUE, the initial means and standard deviations are based on a kmeans clustering of all samples into ng or fewer groups. If FALSE, the initial means are equally spaced on the transformed scale between the values corresponding to 0.02 and 0.98 on the original scale and the initial standard deviations are 0.075 on the transformed scale.
`mu.start`	vector of ng values. If present, gives the start values of mu (the means of the mixture components) on the original (untransformed) scale. Must be strictly ascending (mu[i] > mu[i-1]) between 0 and 1 (inclusive). Overrides the start values determined by clus TRUE or FALSE.
`sd`	vector of ng values. If present, gives the initial (or fixed, if sdtype is "sd.fixed") values of sd (the standard deviations of the mixture components) ON THE TRANSFORMED SCALE. Overrides the start values determined by clus TRUE or FALSE.
`p`	a matrix of nrow(pop.parents) rows and ng columns, each row summing to 1. If present, specifies the initial (or fixed, for populations where ptype is "p.fixed") mixing proportions of the mixture model components.
`maxiter`	a single integer: the maximum number of times the nls function is called (0 = no limit, default=500).
`maxn.bin`	a single integer, default=200: if the length of y is larger than maxn.bin, the values of y (after arcsine square root transformation) are binned: the range of the transformed y (0 to pi/2) is divided into nbin bins of equal width, and the number of y values in each bin is used as the weight of that bin's midpoint. This results in significant speed improvement with large numbers of samples without noticeable effects on model fitting.
`nbin`	a single integer, default=200: the number of bins (see maxn.bin).
`plothist`	if TRUE (default), a histogram of y is plotted with the fitted distributions superimposed.
`nbreaks`	number of breaks (default 40) for plotting the histogram; does not have an effect on fitting the mixture model.
`maintitle`	string, used as title in the plotted histogram.
`closeScreen`	logical, only has an effect if plothist is TRUE. closeScreen should be TRUE (default) unless CodomMarker will plot on a device that is managed outside CodomMarker.
`fPinfo`	NA (default), for internal use only. Prevents unneeded checking and recalculation of input parameters when called from fitOneMarker.
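
Several arguments refer to "the transformed scale" and to binning (maxn.bin, nbin). The following base R sketch illustrates the arcsine square root transformation and equal-width binning described above. It is not the code used inside CodomMarker; the toy data and the variable names (yt, breaks, counts, mids) are assumptions made for illustration only.

```r
# Toy signal ratios (hypothetical data, not taken from fitPoly_data)
set.seed(1)
y <- c(rnorm(300, 0.1, 0.03), rnorm(300, 0.5, 0.04), rnorm(300, 0.9, 0.03))
y <- pmin(pmax(y, 0), 1)            # keep all values within [0, 1]

# Arcsine square root transformation: maps [0, 1] onto [0, pi/2]
yt <- asin(sqrt(y))

# Divide the range 0..pi/2 into nbin bins of equal width
nbin <- 200
breaks <- seq(0, pi / 2, length.out = nbin + 1)

# Count the transformed values per bin; the counts act as weights
# of the bin midpoints
bin_index <- findInterval(yt, breaks, rightmost.closed = TRUE)
counts <- tabulate(bin_index, nbins = nbin)
mids <- (breaks[-1] + breaks[-(nbin + 1)]) / 2

# Back-transformation of the midpoints to the original 0..1 scale
mids_back <- sin(mids)^2
```

With 900 values reduced to at most 200 weighted midpoints, the amount of data passed to the fitting step no longer grows with the number of samples, which is where the speed improvement comes from.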

Details

This function takes as input a vector of ratios of the signals of two alleles (a and b) at one genetic marker locus (ratios as b/(a+b)), one for each sample, and fits a mixture model with ng components (for a tetraploid species: ng=5 components representing the nulliplex, simplex, duplex, triplex and quadruplex genotypes). Ideally these signal ratios should reflect the possible allele ratios (for a tetraploid: 0, 0.25, 0.5, 0.75, 1) but in real life they show a continuous distribution with a number of more or less clearly defined peaks. The samples can represent multiple populations, each with their own segregation type (polysomic F1 ratios, Hardy-Weinberg ratios or free ratios). Multiple arguments specify what model to fit and with what values the iterative fitting process should start.
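
As a small illustration of the quantities mentioned above, the sketch below computes the ideal allele ratios for a given ploidy and a set of Hardy-Weinberg mixing proportions. The binomial form is the standard textbook expression for polysomic inheritance without double reduction; that CodomMarker computes its "p.HW" proportions in exactly this way is an assumption, and the allele frequency q is a hypothetical value.

```r
ploidy <- 4                  # tetraploid
ng <- ploidy + 1             # number of genotype classes (mixture components)
dosage <- 0:ploidy           # nulliplex .. quadruplex

# Ideal signal ratios b/(a+b) if both signals were perfectly proportional
# to allele dosage
ideal_ratio <- dosage / ploidy
ideal_ratio                  # 0.00 0.25 0.50 0.75 1.00

# Hardy-Weinberg genotype proportions from one allele frequency
# (assumption: binomial sampling of alleles, no double reduction)
q <- 0.3                     # hypothetical frequency of allele b
p_hw <- dbinom(dosage, size = ploidy, prob = q)
round(p_hw, 4)
```

Because the whole vector p_hw follows from the single value q, such a model needs only one degree of freedom per population instead of ng-1.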
Parameter mutype determines how the means of the mixture model components are constrained based on the possible allele ratios, as follows:

0: all means are fitted without restrictions (ng parameters)
1: a basic model assuming that both allele signals have a linear response to the allele dosage; one parameter for the ratio of the slopes of the two signal responses, and two parameters for the background levels (intercepts) of both signals (total 3 parameters)
2: as 1, but with the same background level for both signals (2 parameters)
3: as 1, with two parameters for a quadratic effect in the signal responses (5 parameters)
4: as 3, but with the same background level for both signals (4 parameters)
5: as 3, but with the same quadratic parameter for both signal responses (4 parameters)
6: as 5, but with the same background level for both signals (3 parameters)
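
To make model 1 more concrete, the sketch below derives component means from a linear signal response with three parameters: the ratio of the two slopes (f) and one background level per signal (ca, cb, both expressed in units of the slope of signal a). This is one possible parameterisation that matches the verbal description above; the function linear_ratio_means and its argument names are hypothetical and are not part of fitPoly, whose internal parameterisation may differ. The means are shown on the original (untransformed) scale.

```r
# Hypothetical helper (not a fitPoly function): expected signal ratios
# b/(a+b) for each dosage under a linear response of both signals
linear_ratio_means <- function(ng, f = 1, ca = 0, cb = 0) {
  ploidy <- ng - 1
  d <- 0:ploidy                  # dosage of allele b
  sig_a <- ca + (ploidy - d)     # signal a: background + dosage of allele a
  sig_b <- cb + f * d            # signal b: background + relative slope * dosage
  sig_b / (sig_a + sig_b)
}

# Equal slopes and no background: the ideal allele ratios
linear_ratio_means(5)

# Unequal slopes and backgrounds: means shift away from 0, 0.25, ..., 1
linear_ratio_means(5, f = 1.5, ca = 0.2, cb = 0.3)

# Analogue of model 2: the same background level for both signals
linear_ratio_means(5, f = 1.5, ca = 0.25, cb = 0.25)
```

The point of such constraints is that the ng means are driven by a few parameters only, so peaks with few or no samples still receive a plausible position.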

Value

A list; if an error occurs the only list component is

message: the error message

If no error occurs the list has the following components:

loglik: the optimized log-likelihood
npar: the number of fitted parameters
AIC: Akaike's Information Criterion
BIC: Bayesian Information Criterion
psi: a list with components mu, sigma and p: mu and sigma are each a vector of length ng with the means and standard deviations of the components of the fitted mixture model ON THE TRANSFORMED SCALE; p is a matrix with one row per population and ng columns, containing the mixing proportions of the mixture components for each population
post: a matrix of ng columns and length(y) rows; each row r gives the ng probabilities that y[r] belongs to the ng components
nobs: the number of observations in y (excluding NA's)
iter: the number of iterations
message: an error message, "" if no error
back: a list with components mu.back and sigma.back: each a vector of length ng with the means and standard deviations of the mixture model components back-transformed to the original scale
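
The post matrix can be turned into genotype calls by taking, for each sample, the component with the highest probability. The sketch below uses a toy matrix in place of a real result. It assumes that the components are ordered by increasing dosage (so column k corresponds to dosage k-1), and the threshold of 0.9 is a hypothetical choice, not a value prescribed by CodomMarker.

```r
# Toy posterior matrix: 4 samples x 3 components (hypothetical values;
# in practice this would be the post component of the returned list)
post <- matrix(c(0.98, 0.02, 0.00,
                 0.05, 0.90, 0.05,
                 0.40, 0.55, 0.05,
                 0.00, 0.01, 0.99),
               ncol = 3, byrow = TRUE)

# Highest posterior probability per sample
maxP <- apply(post, 1, max)

# Most likely component per sample, converted to a dosage 0..ng-1
# (assumption: column k corresponds to dosage k-1)
geno <- max.col(post, ties.method = "first") - 1

# Leave samples unassigned when the call is not confident enough
min_prob <- 0.9                  # hypothetical threshold
geno[maxP < min_prob] <- NA
geno
```

Before using post from a real call, it is worth checking that the returned message is empty, since the list contains only message when an error occurred.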

Examples

data(fitPoly_data)
mrkdat <- fitPoly_data$ploidy6$dat6x[fitPoly_data$ploidy6$dat6x$MarkerName == "mrk001",]

# hexaploid, without specified populations
cdm <- CodomMarker(mrkdat$ratio, ng=7)
names(cdm)

# hexaploid, with specified populations (4 F1 populations and a cultivar panel)
# first set the ptype for each population: p.F1 for F1 populations,
# p.HW for the panel, p.free for the F1 parents
ptype <- rep("p.HW", nrow(fitPoly_data$ploidy6$pop.parents))
ptype[!is.na(fitPoly_data$ploidy6$pop.parents[,1])] <- "p.F1"
ptype[unique(fitPoly_data$ploidy6$pop.parents)] <- "p.free" #all F1 parents
cdm <- CodomMarker(y=mrkdat$ratio, ng=7,
                   pop=fitPoly_data$ploidy6$pop,
                   pop.parents=fitPoly_data$ploidy6$pop.parents,
                   mutype=5, ptype=ptype)

fitPoly documentation built on April 3, 2025, 8:58 p.m.