fitModelParameters: Peak parameter estimation
In IPPD: Isotopic peak pattern deconvolution for Protein Mass Spectrometry by template matching

Description Usage Arguments Details Value Warning Warning See Also Examples

In the template-based approach of this package, each template/peak pattern is composed of several single basic peaks. The shape of such a basic peak may be modeled either as Gaussian or as Exponentially Modified Gaussian (EMG). The second model assumes that the shape of each peak equals the shape of the function one obtains when convolving the probability density function of a Gaussian distribution with the probability density function of the exponential distribution. This is a more complex as well as more flexible model, since it allows one to account for skewness. Both peak models depend on a set of parameters which are usually unknown a priori. Moreover, these parameters tend to vary over the spectrum. The documented method provides the following functionality: given a raw spectrum and a linear model that describes how the set of parameters varies as a function of m/z, we detect well-resolved peaks within the spectrum. For each detected peak, we determine the parameters of our model in such a way that the resulting peak shape matches that of the detected peak as good as possible in a least-squares sense. In this manner, we obtain a set of estimated parameters for different m/z positions. These estimates are used as response, the m/z positions as explanatory variable in a linear regression model (not necessarily linear in m/z). To be resistant to outliers (e.g. as occurring due to overlapping peaks), we use least absolute deviation regression to infer the model parameters of these linear models. The result can directly be used for the argument model.parameters in the main method getPeakList.

fitModelParameters(mz, intensities, model = c("Gaussian", "EMG"),
                                           fitting = c("most_intense", "model"), 
                                           formula.alpha =  formula(~1),
                                           formula.sigma = formula(~1),
                                           formula.mu = formula(~1),
                                           control = list(window = 6,
                                           threshold = NULL, rlm.maxit = 20))

`mz`	A `numeric` vector of m/z (mass/charge) values (in Thomson).
`intensities`	A `numeric` vector of intensities corresponding to `mz`.
`model`	Basic model for the shape of a single peak. Must be `"Gaussian"` or `"EMG"` (exponentially modified Gaussian). See details below.
`fitting`	A character specifying the mode of peak extraction and -fitting. If `fitting = "most_intense"`, then the most intense peak is detected and the parameter(s) is (are) fitted using only this peak. Then, the resulting functions of the form `parameter(mz)` are all constant functions. If `fitting = "model"`, as many peaks as possible satisfying the criteria of `control` are used to estimate linear models of the form `parameter(mz) = beta_0 + beta_1 g_1(mz) + ... + beta_p g_p(mz)`, where the `g`'s are fixed functions. Their specification is performed by specifying one-sided formulae according to the Wilkinson-Roger notation for linear models as in the function `lm`. The model formulae have to satisfy the following criteria: The formula is one sided, i.e. no term appears on the left hand side of `~`. The right hand side consists only of functions in `mz`, and `mz` is the only variable that may be used. Product terms involving `` are not* admitted. Important: Note that, for example `~ 1 + mz + sqrt(mz)` is a valid formula in the sense that no error will occur, but it does not correspond to the linear model `parameter(mz) = beta_0 + beta_1 mz + beta_2 sqrt(mz)`. The correct model formula instead reads `~ 1 + mz + I(sqrt(mz))`, i.e. each function has to be bracketed by `I()`.
`formula.alpha`	A one-sided formula describing the dependence of the EMG-parameter `alpha` as a function of m/z, s. `fitting`. The default assumes that the parameter is independent of m/z, in which case one obtains a function returning the median of the estimates obtained for different detected peaks. `formula.alpha`, `formula.sigma` and `formula.mu` are needed if and only if `fitting = "model"`.
`formula.sigma`	Parameter used for both peak models. For further information, see `formula.alpha`
`formula.mu`	See `formula.alpha`. Used only if `model = "EMG"`.
`control`	A `list` controlling peak detection. The parameter `window` refers to the minimal resolution of a peak. According to `window`, a sequence of intensities at adjacent m/z positions is considered as peak if and only if there are at least `window` m/z positions with increasing intensity followed by a second sequence of `window` m/z positions with decreasing intensity. If in addition `threshold` is specified, only peaks whose maximum intensity is equal to or greater than `threshold` is considered. Note: Usually, `threshold` is specified, since otherwise the maximum intensity among the complete spectrum minus some epsilon is taken as `threshold`. `rlm.maxit` allows to control the maximum number of iterations used to fit peaks to the data.

Let the variable x represent m/z. Then model = "Gaussian" assumes that a single peak can be described as

gaussfun(x;sigma,mu) = exp(-(x - mu)^2/sigma)

The parameter mu is not considered as model parameter: in the computation of the resulting basis function matrix, mu is always set to a known m/z position where the leading peak of a peak pattern might be present.
Model = "EMG" assumes that a single peak can be described as

EMG(x;alpha,sigma,mu) = exp(sigma^2/(2 * alpha^2) + (mu - x)/alpha) (1 - Phi(sigma/alpha + (mu - x)/(sigma)))/alpha,

where Phi represents the cumulative density function of the standard Gaussian distribution. Alternatively, EMG(.;alpha,sigma,mu) can be expressed as
EMG(x;alpha,sigma,mu) = (phi ** gamma)(x),
where ** denotes convolution, phi is the density function of the Gaussian distribution with mean mu and standard deviation sigma and gamma is the density function of an exponential distribution with expectation alpha.
The parameters of EMG can be interpreted as follows.

alpha: The lower alpha, the more the shape of the peak resembles that of a Gaussian. Conversely, large values of alpha lead to long right tails.
sigma: Controls the width of the peak (together with alpha).
mu: A location parameter. Note that in general mu does not coincide with the mode of EMG. Therefore, if model = "EMG", all three parameters are estimated from detected peaks.

Moreover, the skewness of EMG is characterized by the ratio alpha/sigma.

An object of class modelfit.

Parameter estimation by fitting detected peaks is possible only if single peaks are sufficiently well-resolved. A peak composed of, say, five (m/z, intensity) pairs, is inappropriate to infer three parameters.

The choice model = "EMG" in conjunction with fitting = "model" can be extremely slow (taking up to several minutes of computing time) if many peaks are detected and fitted. This is caused by a grid search over a grid of 10^6 different combinations of alpha, sigma and mu performed prior to nonlinear least squares estimation in order to find suitable starting values.

getPeaklist, modelfit

### load data
data(toyspectrum)
### estimate parameter sigma of a Gaussian model,
### assumed to be independent of m/z

simplegauss <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "Gaussian",
             fitting = c("model"),
             formula.sigma = formula(~1),
             control = list(window = 6, threshold = 1))

show(simplegauss)
visualize(simplegauss, type = "peak", xlab = "m/z", ylab = "intensity",
          main = "Gaussian fit")

### fit the model sigma(m/z) = beta_0 + beta_1 m/z + beta_2 m/z^2

gaussquadratic <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "Gaussian",
             fitting = "model",
             formula.sigma = formula(~mz + I(mz^2) ),
             control = list(window = 6, threshold = 1))

show(gaussquadratic)
visualize(gaussquadratic, type = "model", modelfit = TRUE)

### estimate parameters for EMG-shaped peaks

EMGlinear <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "EMG",
             fitting = "model",
             formula.alpha = formula(~mz),
             formula.sigma = formula(~mz),
             formula.mu = formula(~1),
             control = list(window = 6, threshold = 1))

show(EMGlinear)

visualize(EMGlinear, type = "peak", xlab = "m/z", ylab = "intensities",
          main = "EMG fit")

visualize(EMGlinear, type = "model", parameters = c("alpha", "sigma"), modelfit = TRUE)